Structural heterogeneity assessment among the isoforms of fungal 1-aminocyclopropane-1-carboxylic acid (ACC) deaminase: a comparative in silico perspective

Background The primary amino acid sequence of a protein is translated from its gene sequence and carries important information concealed therein. The present study unveils the structure-function and evolutionary aspects of 1-aminocyclopropane-1-carboxylic acid deaminase (ACCD) proteins of fungal origin. ACCD, an important plant growth-promoting microbial enzyme, is less frequent in fungi than in bacteria. Hence, a comprehensive understanding of fungal ACC deaminases (fACCD) is presented here. Results In silico investigation of 40 fACCD proteins retrieved from the NCBI database revealed that fACCD are prevalent in Colletotrichum (25%), Fusarium (15%), and Trichoderma (10%). The fACCD were found to be 16.18–82.47 kDa proteins of 149–750 amino acid residues. Their isoelectric points (4.76–10.06) suggest that the enzymes could be active over a wide pH range. High aliphatic indices (81.49–100.13) and instability indices below 40 for most proteins indicated their thermostable nature. Secondary structural analysis further supports this stability, owing to a high α-helix content. The built tertiary protein models, designated ACCNK1–ACCNK40, have been deposited in the PMDB under accessions PM0083418–39 and PM0083476–93. All proteins were found to be homodimers except ACCNK13, a homotetramer. Conclusions These anticipated features would help to explore and identify novel variants of fungal ACCD in vitro, aiming at industrial-scale applications. Supplementary Information The online version contains supplementary material available at 10.1186/s43141-021-00294-0.


Pramanik and Mandal Journal of Genetic Engineering and Biotechnology (2022) 20:18

Background
Ethylene is a volatile phytohormone synthesized from methionine through two intermediates, S-adenosyl-L-methionine (SAM) and 1-aminocyclopropane-1-carboxylic acid (ACC) [1]. It is involved in normal plant growth and development, including seed germination, fruit ripening, flowering, and senescence [2]. Under biotic and abiotic stresses, however, the phytohormone is overproduced (so-called "stress ethylene"), which alters plant growth and development and often leads to plant death [3]. This overproduction follows a rapid surge of ACC, the immediate precursor of ethylene, in plant cells, either during interaction with phytopathogens [3] or upon exposure to abiotic stresses such as heavy metals, drought, and salinity [4,5]. The influence of pathogen-induced ethylene on virulence and disease development has been studied earlier [6,7]. A group of microorganisms possesses 1-aminocyclopropane-1-carboxylic acid deaminase (ACCD) [EC 3.5.99.7] activity, which plays a major role in ethylene signaling in plants. During synergistic plant-microbe interactions under stress, microbial ACCD cleaves plant ACC into α-ketobutyrate and ammonia, lowering "stress ethylene" levels and thereby supporting normal plant functioning. ACCD is known to be prevalent in bacteria, and it also occurs in some beneficial fungi and in stramenopiles [8]. However, ACCD of fungal origin has been studied less than that of other organisms [9]. Interestingly, ACCD is also found in some plant-pathogenic species of Alternaria, Aspergillus, Colletotrichum, and Fusarium, which suggests a likely role for ACCD in the ecological fitness of these fungi [3].
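ACC deaminase (EC 3.5.99.7) acts hydrolytically: besides the α-ketobutyrate (2-oxobutanoate) and ammonia named above, the reaction consumes one water molecule. As a quick sanity check, the Python sketch below confirms the atom balance of ACC + H2O → α-ketobutyrate + NH3 using neutral molecular formulas; the formula parser and the choice of neutral (rather than ionized) species are our own illustrative assumptions, not part of the study.

```python
import re
from collections import Counter

def parse_formula(formula: str) -> Counter:
    """Count atoms in a simple molecular formula such as 'C4H7NO2' (no brackets)."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts

def side_total(formulas):
    """Sum the atom counts of all species on one side of a reaction."""
    total = Counter()
    for formula in formulas:
        total += parse_formula(formula)
    return total

# ACC (C4H7NO2) + water -> alpha-ketobutyrate (C4H6O3) + ammonia (NH3)
substrates = ["C4H7NO2", "H2O"]
products = ["C4H6O3", "NH3"]

balanced = side_total(substrates) == side_total(products)
print("balanced:", balanced)
```

Dropping "H2O" from the substrate list makes the check fail, which is why the water molecule belongs in the equation.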
Ethylene perception was necessary for spore germination and appressorium formation in Colletotrichum gloeosporioides, while in Botrytis cinerea the hormone enhanced the transcriptional reprogramming of genes associated with plant interaction [10]. As in plant growth-promoting rhizobacteria (PGPR), ACC deaminase is also found in plant growth-promoting fungi (PGPF), such as several Trichoderma species [9,11]. Still, the known distribution of ACCD among fungal species is limited, which in turn limits our understanding of the structure-function aspects of fungal ACCD (fACCD). Hence, the overall structural and functional features of fACCD need to be explored to facilitate the discovery of novel ACCD variants from different fungal classes.
The lack of fundamental structural information, including the three-dimensional structure of a protein of interest, clearly limits knowledge of its biological function.
While X-ray crystallography or nuclear magnetic resonance (NMR) spectroscopy yields accurate structural information for a particular protein, these methods are often inaccessible and impractical, especially for screening large sets of proteins. Moreover, some proteins fail to retain their native state because of their chemical properties and technical limitations, which argues for adopting predictive approaches as a complement to wet-lab work [12]. The present study was undertaken to unravel the structural, functional, and phylogenetic perspectives of known fungal ACC deaminases, which are often encoded by the gene acdS. To the best of our knowledge, no in-depth investigation of fungal ACC deaminase has been reported to date. Here, open-source bioinformatic tools, web servers, and offline tools were used to analyze the linear chain of amino acids, which is the principal source of the information hidden therein. Starting from phylogenetic analysis, a thorough physicochemical characterization and secondary structural conformations were derived, followed by representation of tertiary structural arrangements. This was accompanied by structural validation to assess the quality of the structures, and functional analysis was performed to find conserved residues in the proteins of interest. Finally, we submitted the built 3-D models of fACCD proteins to public repositories for further research.

Amino acid sequence recovery
The amino acid sequences of different fungal ACCD proteins (fACCD) were retrieved from the National Center for Biotechnology Information (https://www.ncbi.nlm.nih.gov/). Proteins annotated as "hypothetical proteins," "probable ACC deaminase," or "unnamed protein product" were excluded to avoid ambiguity in selecting appropriate protein sequences. The remaining sequences were saved in FASTA format for bioinformatic analyses.
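The screening step can be sketched as a header filter over FASTA text. This is a minimal illustration, assuming the exclusion is a case-insensitive substring match on the three annotation phrases quoted above; the function names and the toy records are hypothetical and do not come from the study.

```python
# Annotation phrases that mark a record as ambiguous (matched case-insensitively).
EXCLUDED_TERMS = ("hypothetical protein", "probable acc deaminase", "unnamed protein product")

def read_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

def keep_record(header):
    """Return False if the header carries one of the ambiguous annotations."""
    lowered = header.lower()
    return not any(term in lowered for term in EXCLUDED_TERMS)

def filter_fasta(text):
    """Return the (header, sequence) pairs that survive the annotation filter."""
    return [(h, s) for h, s in read_fasta(text) if keep_record(h)]

# Toy records (hypothetical):
example = """>seq1 ACC deaminase [Example fungus A]
MKLA
VEGT
>seq2 hypothetical protein [Example fungus B]
MSTT
>seq3 unnamed protein product [Example fungus C]
MPPQ
"""
for header, seq in filter_fasta(example):
    print(header, len(seq))
```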

Phylogeny of fungal ACCD
Evolutionary relationships among the selected taxa were inferred from the fACCD sequences in MEGA X [13] using the Neighbor-Joining method [14] with 1000 bootstrap replicates. Evolutionary distances were computed using the Poisson correction method [15] and are expressed as the number of amino acid substitutions per site. Ambiguous positions were removed for each sequence pair (pairwise deletion option).
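For reference, the Poisson-corrected distance is d = -ln(1 - p), where p is the proportion of differing amino acid sites. The sketch below computes it for one aligned pair with pairwise deletion, treating '-', '?' and 'X' as gap or ambiguous characters (an assumption); it is not MEGA X's code, and a full analysis would build the matrix of such distances before Neighbor-Joining and bootstrapping.

```python
import math

GAP_OR_AMBIGUOUS = set("-?X")

def poisson_distance(seq_a, seq_b):
    """Poisson-corrected distance d = -ln(1 - p) between two aligned protein sequences.

    Pairwise deletion: a site is skipped when either sequence has a gap or an
    ambiguous residue there, so each pair uses every site valid for that pair.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in GAP_OR_AMBIGUOUS or b in GAP_OR_AMBIGUOUS:
            continue
        compared += 1
        differences += a != b
    if compared == 0:
        raise ValueError("no comparable sites")
    p = differences / compared
    if p >= 1.0:
        return math.inf  # the correction is undefined when every site differs
    return -math.log(1.0 - p)

# 7 comparable sites (the gapped column is skipped), 2 of them differ:
print(round(poisson_distance("MKLAV-GT", "MKIAVEGS"), 4))  # 0.3365
```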

Physicochemical characterization
Primary sequence analyses of all selected fACCD proteins were carried out by computing various physical and chemical parameters with the ExPASy ProtParam tool [16]. This tool (https://web.expasy.org/protparam/) computes sequence length, amino acid composition, molecular weight (MW), isoelectric point (pI), extinction coefficient (EC), instability index (II), aliphatic index (AI), grand average of hydropathicity (GRAVY), and the total numbers of negatively and positively charged residues (TNR and TPR, respectively).
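Several of the ProtParam quantities listed above have simple closed forms. The sketch below reproduces three of them: the aliphatic index of Ikai (mole percentages of Ala, Val, Ile, and Leu weighted by 1, 2.9, and 3.9), GRAVY as the mean Kyte-Doolittle hydropathy, and the Asp+Glu and Arg+Lys counts used for TNR and TPR. It is an illustrative re-implementation, assuming an uppercase sequence of the 20 standard residues, not ProtParam itself; the instability index is omitted because it needs a 400-entry dipeptide weight table.

```python
# Kyte-Doolittle hydropathy values for the 20 standard residues.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def aliphatic_index(seq):
    """Ikai aliphatic index: X(Ala) + 2.9*X(Val) + 3.9*(X(Ile) + X(Leu)), X in mole percent."""
    n = len(seq)

    def pct(aa):
        return 100.0 * seq.count(aa) / n

    return pct("A") + 2.9 * pct("V") + 3.9 * (pct("I") + pct("L"))

def gravy(seq):
    """Grand average of hydropathicity: mean Kyte-Doolittle value over all residues."""
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)

def charged_residues(seq):
    """Return (TNR, TPR): counts of Asp + Glu and of Arg + Lys."""
    return seq.count("D") + seq.count("E"), seq.count("R") + seq.count("K")

seq = "MKLAVEGTDRIL"  # hypothetical fragment, not an fACCD sequence
print(round(aliphatic_index(seq), 2), round(gravy(seq), 3), charged_residues(seq))
```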

Secondary structure prediction
Protein secondary structure was predicted with the self-optimized prediction method with alignment (SOPMA) [17] to determine the percentages of α-helices, extended strands, β-turns, and random coils in the fACCD proteins.
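SOPMA reports a predicted state for each residue, and the percentages come from counting those states. The sketch below assumes the prediction has been exported as a one-letter string using h, e, t, and c for helix, extended strand, β-turn, and coil; any other state letters are simply not tallied here, so in that case the percentages need not sum to 100.

```python
from collections import Counter

# Assumed SOPMA-style one-letter states.
STATE_NAMES = {"h": "alpha helix", "e": "extended strand", "t": "beta turn", "c": "random coil"}

def composition(states):
    """Percentage of residues assigned to each of the four tallied states."""
    counts = Counter(states.lower())
    total = len(states)
    return {name: 100.0 * counts[code] / total for code, name in STATE_NAMES.items()}

# Hypothetical per-residue prediction string:
print(composition("hhhhhhcccceett"))
```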

Template selection and homology-based modeling
All selected fACCD proteins were used to build a 3-D model of each protein structure. SWISS-MODEL, a homology-based protein modeling server [18], was used to predict the protein structures in the following order: template search > template selection > model building. The SWISS-MODEL template library (SMTL version 2020-11-04, PDB release 2020-10-30) was searched with BLAST [19] and HHBlits [20] for evolutionarily related structures matching the target sequences. Suitable templates for each target protein were chosen from the 50 templates obtained per search based on sequence similarity, query coverage, global model quality estimate (GMQE), and quaternary structure quality estimate (QSQE). One template per protein was then selected based on the target-template alignment to build the final protein models using ProMod3 3.
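The text lists the criteria used to choose a template but not how they were combined. As a purely hypothetical sketch, the rule below drops candidates under a coverage cutoff and then ranks the rest by GMQE, breaking ties with QSQE and sequence identity; the 0.8 cutoff, the ordering, the template IDs, and all scores are illustrative assumptions, not the authors' procedure.

```python
from dataclasses import dataclass

@dataclass
class Template:
    template_id: str   # SMTL-style identifier (hypothetical values below)
    identity: float    # sequence identity to the target, 0-100
    coverage: float    # fraction of target residues covered, 0-1
    gmqe: float        # global model quality estimate, 0-1
    qsqe: float        # quaternary structure quality estimate, 0-1

def pick_template(candidates, min_coverage=0.8):
    """Hypothetical rule: drop low-coverage hits, then rank by GMQE, QSQE, identity."""
    eligible = [t for t in candidates if t.coverage >= min_coverage]
    if not eligible:
        raise ValueError("no template meets the coverage threshold")
    return max(eligible, key=lambda t: (t.gmqe, t.qsqe, t.identity))

hits = [
    Template("tmplA.1.A", identity=45.0, coverage=0.95, gmqe=0.78, qsqe=0.70),
    Template("tmplB.1.A", identity=52.0, coverage=0.60, gmqe=0.55, qsqe=0.40),
    Template("tmplC.1.A", identity=41.0, coverage=0.92, gmqe=0.78, qsqe=0.81),
]
print(pick_template(hits).template_id)  # tmplC.1.A
```

A lexicographic key keeps the rule transparent; a weighted composite score would be an equally valid assumption.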

Structure assessment
The 3-D structures were evaluated with the SWISS-MODEL Structure Assessment tool (https://swissmodel.expasy.org/assess), followed by the Structure Analysis and Verification Server (SAVES v6.0), which determines the stereochemical quality of a protein structure by evaluating residue-by-residue geometry as well as overall structural geometry [21].
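Ramachandran analysis rests on the backbone torsion angles φ (C(i-1), N, CA, C) and ψ (N, CA, C, N(i+1)). The sketch below computes a torsion angle from four atom coordinates using the standard projection-and-atan2 construction; it illustrates the geometry only and is not the code used by SAVES or SWISS-MODEL.

```python
import math

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees, in [-180, 180], defined by four atom positions."""
    b0 = sub(p0, p1)
    b1 = sub(p2, p1)
    b2 = sub(p3, p2)
    norm = math.sqrt(dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = sub(b0, tuple(dot(b0, b1) * c for c in b1))
    w = sub(b2, tuple(dot(b2, b1) * c for c in b1))
    x = dot(v, w)
    y = dot(cross(b1, v), w)
    return math.degrees(math.atan2(y, x))

print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))  # 90.0
```

Applied to successive backbone atoms of a model, such angles give the φ/ψ pairs that are then binned into favored, allowed, and outlier regions.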

Model deposition
The built 3-D protein models were deposited in the Protein Model Database (PMDB), a public resource that stores protein models, makes them accessible, and supports the validation of experimental data [22]. The PMDB (http://srv00.recas.ba.infn.it/PMDB/) assigns a unique identifier to each submitted model for direct access to the relevant data.

Functional analyses
To find conserved domains among the fACCD proteins, the multiple sequence alignment program Clustal Omega (https://www.ebi.ac.uk/Tools/msa/clustalo/), which generates alignments of multiple sequences, was used [23]. Additionally, a motif finder tool (https://www.genome.jp/tools/motif/) was used to find common motifs among the selected proteins.

Phylogeny of fungal ACCD
To decode the evolutionary relationships among the selected genera, a phylogenetic tree was constructed from the fACCD sequences (Fig. 1). The sequence-homology-based phylogeny depicts the clustering pattern among the fungal genera, among which Colletotrichum spp. occupied the major clades of homologs (Fig. 1). This genus clustered most closely with the second most abundant genus, Fusarium spp., in two different clades. The third most abundant genus, Trichoderma, however, formed a separate clade far from Colletotrichum spp. and Fusarium spp. (Fig. 1). Besides, Metarhizium spp. and Lachnellula spp. showed closer affinity to Colletotrichum spp. and Fusarium spp. … found in between them with two different clades (Fig. 1).
Other, less frequent fungal species were distributed across the phylogenetic tree with no definitive or inferable pattern (Fig. 1).

Physicochemical characterization
The 40 selected fACCD were characterized to obtain theoretical information on their physical and chemical features (Fig. 2). Heatmap analysis revealed considerable variation in amino acid composition among the fACCD (Fig. 2). The linear chains of the fACCD proteins range from 149 to 750 amino acid residues, with molecular weights (MW) between 16.18 and 82.47 kDa (Table 1). The isoelectric points (pI) of 4.76–10.06 suggest that enzyme activity could be optimal over a wide pH range (Table 1). Assuming that all pairs of Cys residues form cystines, the extinction coefficients (EC) at 280 nm were 13,075–90,800 M⁻¹ cm⁻¹ (Table 1). Moreover, the instability indices of most fACCD were below 40, which predicts stable proteins, while their high aliphatic indices (AI) indicate thermostability (Table 1). GRAVY values were low in every case, and the computed TNR and TPR are presented in Table 1.
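ProtParam estimates the molar extinction coefficient at 280 nm from the numbers of Trp, Tyr, and cystine residues, weighting them by 5500, 1490, and 125 M⁻¹ cm⁻¹, respectively. The sketch below re-implements that formula; the all_cys_paired switch mirrors the assumption stated above that every pair of Cys forms a cystine, and the function name is our own.

```python
# Molar absorptivities at 280 nm (M^-1 cm^-1) used by ProtParam.
EXT_TRP = 5500
EXT_TYR = 1490
EXT_CYSTINE = 125

def extinction_coefficient(seq, all_cys_paired=True):
    """Molar extinction coefficient at 280 nm from Trp, Tyr and cystine content.

    With all_cys_paired=True, every pair of Cys counts as one cystine; an odd
    leftover Cys contributes nothing. With False, no cystines are assumed.
    """
    n_trp = seq.count("W")
    n_tyr = seq.count("Y")
    n_cystine = seq.count("C") // 2 if all_cys_paired else 0
    return n_trp * EXT_TRP + n_tyr * EXT_TYR + n_cystine * EXT_CYSTINE

print(extinction_coefficient("MWKYCCLAY"))  # hypothetical sequence
```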

Secondary structure prediction
Secondary structure comprises the local folded conformations that form within a polypeptide chain through hydrogen bonding between backbone atoms (the amide hydrogen and the carbonyl oxygen). α-Helices and β-sheets are the two most common conformations and are indicators of the stability of a protein. Here, we analyzed the primary amino acid chains of all fACCD to predict their secondary structure. The results suggested that the proteins are rich in α-helices and random coils, whereas extended strands and β-turns are the least abundant (Fig. 3).

Template selection and homology-based modeling
Structural information reveals more about protein function than primary sequence information alone. The overall 3-D arrangement of a polypeptide chain results from interactions among its amino acid residues, including polar and charged residues. Homology (comparative) modeling is a useful tool for protein structure prediction, and it starts from a target-template alignment. In total, three templates, 1f2d.1.A, 1tzm.1.A, and 1j0a.1.A, were chosen for homology modeling of the 40 fACCD "target" proteins (Fig. 4). The 40 models built with suitable templates were designated consecutively as ACCNK1 to ACCNK40 (Table 2, Fig. 4). Most of the proteins were homodimers, except ACCNK13, which was a homotetramer (Fig. 4).

Structure assessment
The next essential step of homology-based modeling is validating the quality of the built protein models. Several quality parameters, namely QMEAN score, MolProbity score, SAVES ERRAT overall quality factor, and the distribution of amino acid residues in the Ramachandran plot, were considered to assess the quality of the built structures (Table 2). QMEAN and MolProbity scores were numerically low, while the overall quality factor was greater than 90% in most cases (Table 2). Also, the Ramachandran plot showed more than 90% of residues in the favored region (Table 2).

Model deposition
All fACCD, i.e., ACCNK1 to ACCNK40 were finally deposited in the protein model database (PMDB) which stores annotated protein models for further studies. The accession numbers PM0083418-39 and PM0083476-93 were assigned automatically by the server for ACCNK1-ACCNK22 and ACCNK23-ACCNK40 respectively. The models can be accessed anytime from the server using the PMDB identifiers.
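Because the two accession ranges together contain exactly 40 identifiers (22 + 18), the model-to-accession mapping can be written down directly. The sketch below assumes that PMDB assigned the accessions consecutively in model order within each range, which the text implies but does not state explicitly.

```python
def pmdb_accession(model_number):
    """Map ACCNKn to its PMDB accession, assuming consecutive assignment in model order."""
    if 1 <= model_number <= 22:
        number = 83418 + (model_number - 1)    # PM0083418-PM0083439
    elif 23 <= model_number <= 40:
        number = 83476 + (model_number - 23)   # PM0083476-PM0083493
    else:
        raise ValueError("model number must be between 1 and 40")
    return f"PM{number:07d}"

print(pmdb_accession(13))  # accession assumed for the homotetramer ACCNK13
```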

Functional analyses
The multiple sequence alignment (MSA) of all fACCD performed with Clustal Omega identified several fully or partially conserved residues within the amino acid chains (Fig. 5). In the MSA, an asterisk (*) marks a fully conserved residue, a colon (:) marks conservation between groups of strongly similar properties, and a period (.) marks conservation between groups of weakly similar properties (Fig. 5). Furthermore, the motif analysis revealed that the proteins contained 1–4 functional motifs (Fig. S3).
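The conservation line under a Clustal alignment can be reproduced column by column. The sketch below uses the strong and weak residue groups documented for Clustal W/X and assumes that Clustal Omega applies the same groups; treating any column that contains a gap as unannotated is a simplifying assumption of this sketch.

```python
# Residue groups as documented for Clustal W/X (assumed to match Clustal Omega).
STRONG_GROUPS = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW"]
WEAK_GROUPS = ["CSA", "ATV", "SAG", "STNK", "STPA", "SGND", "SNDEQK",
               "NDEQHK", "NEQHRK", "FVLIM", "HFY"]

def column_symbol(column):
    """Return '*', ':', '.', or ' ' for one alignment column."""
    residues = set(column.upper())
    if "-" in residues:
        return " "  # assumption: gapped columns receive no symbol
    if len(residues) == 1:
        return "*"
    if any(residues <= set(group) for group in STRONG_GROUPS):
        return ":"
    if any(residues <= set(group) for group in WEAK_GROUPS):
        return "."
    return " "

def conservation_line(aligned_seqs):
    """Build the conservation annotation for a list of equal-length aligned sequences."""
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be aligned to the same length")
    return "".join(column_symbol("".join(s[i] for s in aligned_seqs)) for i in range(length))

alignment = ["MSAWG", "MTAWG", "MAGYG"]  # toy alignment
print(conservation_line(alignment))  # *:.:*
```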

Discussion
Microbial ACC deaminase is an inducible enzyme, induced by the presence of its substrate, ACC. ACC has been reported to be utilized as a sole nitrogen source by Fusarium graminearum [24] and by the biocontrol PGPF Trichoderma asperellum T203 [9]. The gene acdS encodes the enzyme AcdS, which is regulated differentially under different environmental stresses [3]. Unlike bacterial ACC deaminase, this enzyme is not frequently found in fungal species. As … (Fig. 1, Fig. S1). Interestingly, all the fungal taxa are restricted to the divisions Ascomycota (74%) and Basidiomycota (26%) (Fig. S2). Nevertheless, further examination is needed to explore the structural and functional characteristics of these proteins. This requires experimental structure determination with biophysical techniques such as nuclear magnetic resonance (NMR), X-ray crystallography, or X-ray free-electron lasers (FELs) [25][26][27]. These experimental processes are, however, time-consuming, expensive, and often difficult to perform for a large number of isolated proteins. For selection and screening from large protein datasets, bioinformatics tools can be useful to predict protein-folding patterns and 3-D structures and to generate hypotheses about a protein's function that direct future work on a protein of interest [28].
In this study, we selected 40 fungal ACC deaminase proteins from 19 fungal genera in the NCBI database after eliminating ambiguous sequences. Phylogenetic analysis showed that Colletotrichum spp. occupied the major clades of homologs among the fungal genera (Fig. 1). After Colletotrichum spp., Fusarium spp. and Trichoderma spp. are the most dominant genera possessing AcdS (Fig. 1). An earlier phylogenetic study of different microbial taxa supported that acdS genes are predominantly vertically inherited in various fungal classes [8]. The fACCD sequences were further characterized to decipher their physical and chemical properties, which revealed that fACCD are 16.18–82.47 kDa proteins with isoelectric points between 4.76 and 10.06 (Table 1). Isoelectric points below and above neutral pH indicate acidic and basic proteins, respectively, reflecting the amphoteric nature of amino acid residues [29]. For comparison, Tas-acdS (the ACCD of Trichoderma asperellum) has 348 amino acids with an expected molecular weight of 37 kDa [9]. Besides, most fACCD in this study showed instability indices < 40 together with high aliphatic indices (the relative volume of a protein occupied by aliphatic side chains), which supports their thermostable nature [30,31]. GRAVY, however, was low in every case, suggesting better interaction with water molecules [31].
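The pI is the pH at which a protein's net charge is zero, and because net charge decreases monotonically as pH rises, it can be found by bisection. The sketch below uses the EMBOSS pKa set as an assumption; ProtParam uses a different pKa scale, so these values will not reproduce Table 1 exactly.

```python
# Assumed pKa values (EMBOSS set); ProtParam's own scale differs.
PKA_POSITIVE = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}

def net_charge(seq, ph):
    """Henderson-Hasselbalch net charge of a linear peptide at the given pH."""
    counts = {aa: seq.count(aa) for aa in "KRHDECY"}
    counts["Nterm"] = counts["Cterm"] = 1
    positive = sum(counts[g] / (1 + 10 ** (ph - pka)) for g, pka in PKA_POSITIVE.items())
    negative = sum(counts[g] / (1 + 10 ** (pka - ph)) for g, pka in PKA_NEGATIVE.items())
    return positive - negative

def isoelectric_point(seq, tol=1e-4):
    """Bisect on pH 0-14 for zero net charge (charge falls as pH rises)."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2
        if net_charge(seq, mid) > 0:
            low = mid
        else:
            high = mid
    return (low + high) / 2

print(round(isoelectric_point("G"), 2))  # 6.1, midway between the two terminal pKa values
```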
Furthermore, the selected fACCD were subjected to secondary structural analysis to uncover the folding pattern of the proteins. This step is crucial as an intermediate between amino acid sequences and tertiary structures [32]. The results suggested a dominance of α-helical conformation (Fig. 3), indicating protein stability. α-Helices are reported to be abundant in thermophiles [33], and a similar trend was found computationally for phytase proteins of Aspergillus niger [34]. Likewise, homology-based protein modeling revealed that fACCD are multimeric proteins, most of them homodimeric except ACCNK13, a homotetramer (Fig. 4). The models built for the 40 fACCD were sequentially designated as ACCNK1 to ACCNK40 (Table 2). Among the available prediction methods, homology modeling is generally accepted as the most successful for protein tertiary structure prediction, provided that at least one suitable, experimentally determined template of the protein family is available in the Protein Data Bank (PDB) [35]. Moreover, the secondary structural elements were in agreement with the 3-D models obtained through homology modeling.
To confirm the reliability of the computed models, several structure-assessment tools were used, generating numerical quality scores, namely the QMEAN score, MolProbity score, SAVES ERRAT overall quality factor, and the distribution of amino acid residues in the Ramachandran plot, which validate the accuracy and stereochemical quality of the structures (Table 2). The QMEAN score should lie within 0–1 for high-resolution structures [36], whereas the MolProbity score is a single number that combines the central MolProbity protein quality statistics; the lower the MolProbity score, the higher the equivalent resolution [37]. An ERRAT overall quality factor > 95% indicates a high-resolution structure [38]. Finally, more than 90% of residues in the favored region of the Ramachandran plot is characteristic of a good model [39].
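The MolProbity score mentioned above combines clashscore, the percentage of rotamer outliers, and the percentage of residues outside the Ramachandran-favored region into one resolution-like number. The sketch below uses the coefficients as we understand them from the published MolProbity definition and its cctbx implementation; treat them as an assumption to verify against the MolProbity documentation. The helper meets_text_thresholds only encodes the ERRAT and Ramachandran cutoffs quoted in this paragraph.

```python
import math

def molprobity_score(clashscore, rotamer_outlier_pct, rama_favored_pct):
    """Resolution-like MolProbity score (lower is better); coefficients assumed."""
    rama_not_favored_pct = 100.0 - rama_favored_pct
    return (0.426 * math.log(1 + clashscore)
            + 0.33 * math.log(1 + max(0.0, rotamer_outlier_pct - 1))
            + 0.25 * math.log(1 + max(0.0, rama_not_favored_pct - 2))
            + 0.5)

def meets_text_thresholds(errat_pct, rama_favored_pct):
    """Cutoffs quoted in the text: ERRAT > 95 % and > 90 % Ramachandran-favored residues."""
    return errat_pct > 95.0 and rama_favored_pct > 90.0

# Hypothetical validation numbers for one model:
print(round(molprobity_score(clashscore=3.0, rotamer_outlier_pct=0.5, rama_favored_pct=96.0), 2))
print(meets_text_thresholds(errat_pct=96.4, rama_favored_pct=96.0))
```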
The built protein models in PDB format were deposited in the protein model database (PMDB) under accession numbers PM0083418–39 and PM0083476–93 for further use. The Clustal Omega alignment revealed conserved residues across the 40 selected fACCD (Fig. 5). Conserved residues play an important role in protein folding and unfolding kinetics as well as in protein stability [40]. Furthermore, the motif search indicated that the common motif shared by all proteins was "PALP" (pyridoxal-phosphate dependent enzyme), which suggests that the proteins belong to the pyridoxal-phosphate-dependent class of enzymes (Fig. S3).

Conclusions
There was a pressing need to assemble experimentally derived fungal ACC deaminase (fACCD) protein sequences and to decipher their physicochemical, stereochemical, and functional features for a comprehensive overview. Given the constraints on using modern biophysical tools to obtain crystal structures, computational annotation proved useful for predicting the structure-function aspects of the proteins of interest. This study characterizes fACCD as multimeric proteins with molecular weights of 16–82 kDa, acidic or basic character, and a thermostable nature. To date, fACCD are predominant in genera of Ascomycota, followed by Basidiomycota. Notably, among Asco- and Basidiomycota members, both beneficial fungi and phytopathogens possess this plant growth-promoting enzyme. This might be an evolutionary consequence and may serve as a meaningful cue in symbiotic as well as host-pathogen interaction studies. Thus, as an integral part of in vitro studies, the anticipated features of fungal ACC deaminase could guide the design, identification, and engineering of novel variants of this plant-stress-related fungal protein for effective industrial-scale application.