Plant catalase in silico characterization and phylogenetic analysis with structural modeling



Catalase is a heme-containing tetrameric enzyme that plays a critical role in signaling and hydrogen peroxide metabolism. It was among the first enzymes to be isolated and crystallized. Catalase is a well-known industrial enzyme used in diagnostic and analytical methods in the form of biomarkers and biosensors, as well as in the textile, paper, food, and pharmaceutical industries. In silico analysis of CAT genes and proteins has attracted growing interest, particularly for the development of biomarkers and drug design. The present work aims to understand the evolutionary relationships of catalases across plant species and to analyze their physicochemical characteristics and homology through phylogenetic tree construction, secondary structure prediction, and 3D modeling of the protein sequences with validation, using a variety of conventional computational methods to assist researchers in better understanding the structure of these proteins.


Sixty-five plant catalase sequences were subjected to bioinformatics assessment covering physicochemical characterization, multiple sequence alignment, phylogenetic construction, motif and domain identification, and secondary and tertiary structure prediction. The phylogenetic tree revealed six unique clusters, and the diversity of plant catalases was largest for Oryza sativa. The proteins appeared to be predominantly thermostable and hydrophilic, as indicated by a relatively high aliphatic index and negative GRAVY values. Five sequence motifs, each 50 residues wide, were uniformly distributed across the sequences, and their best-match amino acid sequences resembled the plant catalase PLN02609 superfamily. Using SOPMA, the predicted secondary structure of the protein sequences revealed the predominance of the random coil. The predicted 3D CAT model from Arabidopsis thaliana was a thermostable homotetramer of 59 kDa, and its structure was validated with PROCHECK (Ramachandran plot), ERRAT, and Verify3D. Functional association analysis identified glutathione reductase as the closest interacting partner of the query protein.


This theoretical in silico analysis of plant catalases provides insight into their physicochemical characteristics, structure, function, and evolutionary behavior, and shows how protein structure-function relationships can be explored when crystal structures are unavailable.


Catalases are iron porphyrin oxidoreductase enzymes that decompose hydrogen peroxide into water and oxygen [1, 2]. They are heme-containing tetrameric enzymes found in subcellular organelles (peroxisomes), the primary site of H2O2 production during oxidative stress conditions via photorespiratory oxidation, beta oxidation of fatty acids, and purine catabolism [3]. CAT is of particular interest because its dysfunction is connected to pathological events such as increased vulnerability to apoptosis, tumor stimulation, effects on aging, and inflammation. It also aids in defensive mechanisms and protects the cell from oxidative damage. Another significant property of catalase is its strong catalytic activity, using H2O2 as a substrate to oxidize phenols, insecticides, herbicides, polyaromatic hydrocarbons, and synthetic textile dyes [4]. Catalase was among the first enzymes to be isolated and crystallized. Catalases are found in various plant species such as tobacco, Arabidopsis thaliana, pepper, mustard, saffron, maize, castor bean, sunflower, cotton, wheat, and spinach [5,6,7,8,9,10,11]. The role of catalase in aging, senescence, and plant defense has been of significant importance. In light of the different applications of catalase mentioned above, the current work presents an in silico analysis of catalases from plant sources. Computational investigation of plant catalase amino acid sequences revealed conserved secondary structure in sequences that play a crucial role in evolution. Primary research on catalases was conducted to examine their characteristics and key biological functions. Analyses of the phylogeny of the catalase gene have indicated the existence of three primary clades that diverged early in the evolution of this gene family through at least two gene duplication events [12]. A phylogenetic approach could help account for the intrinsic divergence in enzyme dynamics induced by the natural evolution of sequence variation across time [13].
As genomics advances, computational tools are becoming increasingly crucial in helping to find and describe possible gene families for various industrial uses. They help untangle the sequence-structure-function relationships of enzyme proteins [14]. The in silico analysis of genes and proteins has attracted growing interest, with an emphasis on the development of biomarkers, drug design, and highly effective microbiological agents suitable for a wide range of industries. The present work aims to understand the evolutionary relationships of catalases across plant species and to analyze their physicochemical characteristics and homology through phylogenetic tree construction, secondary structure prediction, and 3D modeling of the protein sequences with validation, using a variety of conventional computational methods to assist researchers in better understanding the structure of these proteins.


Protein sequence recovery

Sixty-five full-length catalase protein sequences from various plant sources were retrieved in FASTA format from the NCBI (National Center for Biotechnology Information) database for the computational analyses. The accession numbers and source organisms of the protein sequences are given in Table 1.
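As a minimal sketch of how retrieved FASTA records might be loaded and screened by length before analysis, the following Python uses only the standard library. The function names, the example records, and the `min_length` cutoff are hypothetical illustrations; they are not taken from the study, which does not describe its own scripts.

```python
def parse_fasta(text):
    """Return a dict mapping each FASTA header (without '>') to its sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)
    return {h: "".join(parts) for h, parts in records.items()}


def full_length_only(records, min_length):
    """Keep records whose sequence has at least min_length residues (assumed cutoff)."""
    return {h: s for h, s in records.items() if len(s) >= min_length}


# Hypothetical toy records, not real catalase entries.
example = """>seq_A hypothetical catalase fragment
MDPYKYRPSS
AYNAPFW
>seq_B hypothetical short fragment
MDPYK
"""

records = parse_fasta(example)
kept = full_length_only(records, min_length=10)
print({header: len(seq) for header, seq in kept.items()})
```

In practice the cutoff would be chosen from the expected length of a full catalase sequence rather than the toy value used here.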

Table 1 Selected protein sequences of catalases from different plant sources

ProtParam tool for primary sequence analysis

The ExPASy ProtParam tool was used to compute the physicochemical parameters of the selected catalases. ProtParam calculates a variety of physicochemical properties that can be derived from a protein sequence: molecular weight, theoretical pI, amino acid composition, atomic composition, extinction coefficient, estimated half-life, instability index, aliphatic index, and grand average of hydropathicity (GRAVY) [15].

Multiple Sequence Alignment (MSA)

The multiple sequence alignment of the protein sequences was generated and checked for accuracy in MEGA 6.1 software, with the ClustalW program used to perform the alignment.

Amino acid composition

MEGA 11 was used to examine the amino acid composition of the catalase sequences, retrieving the individual amino acid frequencies for all species.

Phylogenetic tree construction

To better understand the evolutionary relationships between plant species, catalase phylogenetic trees were constructed and visualized with MEGA6 software using the neighbor-joining (NJ) method or UPGMA [16].
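To illustrate the idea behind distance-based tree building, the sketch below computes pairwise p-distances from aligned sequences and clusters them with UPGMA, one of the two methods named above. It is a simplified illustration, not MEGA's implementation: it reports only the tree topology in Newick form, and the function names and toy alignment are hypothetical.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Fraction of differing residues over aligned, gap-free columns."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(seq_a, seq_b) if x != "-" and y != "-"]
    if not columns:
        raise ValueError("no gap-free columns to compare")
    return sum(x != y for x, y in columns) / len(columns)


def upgma(names, aligned):
    """Return a Newick topology (no branch lengths) built by UPGMA."""
    clusters = {i: (name, 1) for i, name in enumerate(names)}  # id -> (newick, size)
    dist = {frozenset(p): p_distance(aligned[p[0]], aligned[p[1]])
            for p in combinations(range(len(names)), 2)}
    next_id = len(names)
    while len(clusters) > 1:
        # Merge the closest pair of current clusters.
        i, j = min(combinations(clusters, 2), key=lambda p: dist[frozenset(p)])
        (tree_i, n_i), (tree_j, n_j) = clusters.pop(i), clusters.pop(j)
        # Size-weighted average distance from the merged cluster to the rest.
        for k in clusters:
            dist[frozenset((next_id, k))] = (
                n_i * dist[frozenset((i, k))] + n_j * dist[frozenset((j, k))]
            ) / (n_i + n_j)
        clusters[next_id] = (f"({tree_i},{tree_j})", n_i + n_j)
        next_id += 1
    (tree, _size), = clusters.values()
    return tree + ";"


# Hypothetical toy alignment: A and B differ at one site, C is more distant.
print(upgma(["A", "B", "C"], ["ACDEFG", "ACDEFH", "WCDKLM"]))  # prints (C,(A,B));
```

UPGMA assumes a constant rate of evolution across lineages, which neighbor-joining does not; that difference is one reason the two methods can yield different trees for the same alignment.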

Motifs search and domain discovery

Motifs were analyzed using the MEME tool, and the corresponding protein family was searched using the NCBI conserved domain database (CDD). The biological functions of the conserved protein motifs identified by MEME were analyzed using BLAST, and domains were assessed using InterProScan, which reports the best possible sequence match based on the highest similarity score [17].

Prediction of secondary structure

Secondary structure has a direct impact on how proteins fold and deform, and it describes how the amino acid sequences of plant catalases form helices, sheets, and turns within the molecule. SOPMA (self-optimized prediction method with alignment) was used to predict the secondary structure of the different plant catalases [18]. It is a self-optimized, homology-based method built on the work of Levin and colleagues [19].

Comparative 3D modeling

A query protein sequence from each cluster of the plant catalase phylogenetic tree was selected, and comparative homology modeling was performed using SWISS-MODEL [20], which provides automated comparative 3D modeling of protein structures.

Model evaluation

Model evaluation is the most crucial step in homology modeling, because it demonstrates whether the modeled protein is of acceptable quality. Here, the predicted CAT model was evaluated and verified with the ERRAT [21], Verify3D [22], and PROCHECK [23] programs available from the SAVES server. The quality of the predicted model was also evaluated by Ramachandran plot assessment.

Protein-protein interaction

The STRING v10.0 server was used to determine the interactions of Arabidopsis thaliana catalase with other closely related proteins. The query sequence was the Arabidopsis thaliana catalase with accession number CAA45564.1, and a functional protein association network was created [24].


Retrieval of sequences

The protein sequences of many enzymes, such as peroxidases [25,26,27], pectinases, proteases [28], lipases [29], phytases, polyphenol oxidases [15], and cellulases [29], have been assessed and analyzed using bioinformatics tools. The current study used various bioinformatics tools to analyze the protein sequences of catalases, an industrially important enzyme family, from various plant sources. Around 150 catalase protein sequences from various plant sources were initially retrieved from NCBI using BLAST. Of these, sequences with more than 70% similarity were selected, and the 65 full-length protein sequences among them were evaluated computationally (see Table 1). The diversity of plant sources for catalases was largest for Oryza sativa, with 11 accession numbers forming the main group. Oryza sativa has four catalase genes, OsCATA, OsCATB, OsCATC, and OsCATD [30], with functional variations under various abiotic stress conditions. Multiple accessions from the same catalase source help provide insight into the structural and functional diversity of enzymatic proteins.

Physicochemical characterization

ProtParam was used to elucidate several physicochemical properties of the sequences. The length of the 65 catalase protein sequences studied ranged from 90 to 533 amino acid residues. The molecular weights varied between 10,322.46 and 61,366.87 daltons, while the pI values varied between 4.53 and 7.95. Most catalases had a pI between 5 and 7, while AAF34718 of Capsicum annuum had a pI of 7.11, and the Oryza sequences placed in group F of the phylogenetic tree showed pI values between 4 and 5. Other physicochemical characteristics such as the instability index, aliphatic index, and hydropathicity (GRAVY) also varied among these CAT proteins. The aliphatic index measures the relative volume occupied by the aliphatic side chains of alanine, valine, leucine, and isoleucine and provides information on the thermostability of globular proteins; a higher value is regarded as a positive factor for thermostability. The aliphatic index is determined with the following formula [31].

$$\mathrm{Aliphatic}\ \mathrm{index}=\mathrm{X}\ \left(\mathrm{Ala}\right)+\mathrm{a}\times \mathrm{X}\ \left(\mathrm{Val}\right)+\mathrm{b}\times \left(\mathrm{X}\ \left(\mathrm{Ile}\right)+\mathrm{X}\ \left(\mathrm{Leu}\right)\right)$$

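Here X(Ala), X(Val), X(Ile), and X(Leu) are the mole percentages (100 × mole fraction) of alanine, valine, isoleucine, and leucine, and the coefficients a and b are the volumes of the valine side chain (a = 2.9) and of the Leu/Ile side chains (b = 3.9) relative to the side chain of alanine.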
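The formula can be evaluated directly from a sequence. The following is a minimal sketch, assuming that X denotes mole percent as in ProtParam; the function name and the toy sequence are illustrative only.

```python
def aliphatic_index(sequence, a=2.9, b=3.9):
    """Aliphatic index: X(Ala) + a*X(Val) + b*(X(Ile) + X(Leu)), with X in mole percent."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")

    def mole_percent(residue):
        return 100.0 * seq.count(residue) / len(seq)

    return (mole_percent("A")
            + a * mole_percent("V")
            + b * (mole_percent("I") + mole_percent("L")))


# Toy sequence with 25% each of Ala, Val, Ile, and Leu:
# 25 + 2.9 * 25 + 3.9 * (25 + 25) = 292.5
print(aliphatic_index("AVIL"))
```

A sequence with no Ala, Val, Ile, or Leu scores 0, and the value rises as these residues make up a larger share of the chain.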

Plant catalases are assumed to be thermostable based on the data shown in Table 2. The instability index estimates the stability of a protein molecule and is related to its in vivo half-life: a value greater than 40 suggests a half-life of less than 5 h, while a value below 40 indicates a half-life of more than 16 h [32, 33]. Most plant catalases have an instability index of less than 40, except a few from Oryza, Capsicum annuum, and Brassica juncea. The hydrophobicity of a peptide is represented by the grand average of hydropathicity (GRAVY), which is calculated as the sum of the hydropathy values of all amino acids divided by the sequence length; the negative GRAVY values obtained indicate that the plant catalases are hydrophilic.
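The GRAVY calculation described above can be sketched in a few lines. This example assumes the Kyte-Doolittle hydropathy scale, which is the scale ProtParam uses for GRAVY; the function names and test sequences are illustrative and not part of the study.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence):
    """Sum of hydropathy values divided by sequence length."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    unknown = set(seq) - set(KYTE_DOOLITTLE)
    if unknown:
        raise ValueError(f"unsupported residues: {sorted(unknown)}")
    return sum(KYTE_DOOLITTLE[residue] for residue in seq) / len(seq)


def is_hydrophilic(sequence):
    """A negative GRAVY score is read as overall hydrophilic."""
    return gravy(sequence) < 0


print(is_hydrophilic("DEKR"), is_hydrophilic("ILVF"))
```

Because GRAVY is an average over the whole chain, a protein with a negative score can still contain strongly hydrophobic local segments.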

Table 2 Physiochemical characterization of protein sequences of plant catalases as revealed by ProtParam

Assessment of phylogenetic tree and MSA

The phylogenetic tree revealed six unique clusters labeled A, B, C, D, E, and F, containing 4, 22, 12, 5, 7, and 15 protein sequences, respectively (Fig. 1). Multiple accessions belonging to the same genus were grouped together, suggesting similarity at the sequence level, except for the Oryza sativa protein sequences, which were distributed across groups D and F. Phylogenetic analysis provides an in-depth understanding of how species evolve through genetic alterations. Phylogenetics can be used to trace the path that connects a modern plant catalase to its ancestral origin and to anticipate future genetic divergence. It can also be helpful in comparative genomics, which analyzes the relationship between genomes of different species by gene prediction or discovery, locating specific genetic regions along a genome [34,35,36]. The multiple sequence alignment performed before building the phylogenetic tree is shown in Fig. 2 and reveals the degree of homology between the sequences from different plant sources. This information could be used to design a specific catalase probe or primer that would serve as a marker to recover putative genes from sequenced plant strains. Advances in the comparative genomic study of proteins provide a detailed understanding of functional genes within and between plant species, offering clear evidence for evolution research and gene function hypotheses of plant catalase [37].

Fig. 1 Construction of the phylogenetic tree of protein sequences of plant catalases using the NJ method. The unique clusters A, B, C, D, E, and F are highlighted, consisting of 4, 22, 12, 5, 7, and 15 members, respectively

Fig. 2 Multiple sequence alignment of distinct clusters A, B, C, D, E, and F of plant catalases

Motifs and domain identification

The structural and functional complexity of enzymes can be predicted and assessed using attributes such as sequence and functional features, domains, and motifs. Sequence motifs identified by protein sequence analysis can be used as signature sequences for targeted enzymes to determine their putative functions [38,39,40]. Five sequence motifs, each with a width of 50 residues, were found to be uniformly distributed among the 65 plant catalases; their best-match amino acid sequences are shown in Table 3. When these motifs were subjected to BLAST, they matched the plant catalase superfamily PLN02609.

Table 3 The five motifs with best match possible amino acid sequences with their respective domain

Amino acid composition

MEGA 11 was used to compute the amino acid composition of each sequence. As shown in Table 4, the average amino acid composition was highest for proline (7.38%), followed by aspartate (7.12%), suggesting significant conformational rigidity of the secondary structure of the protein due to the distinctive cyclic structure of the proline side chain [41].
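The per-sequence and average composition figures can be reproduced in principle with a short script. The sketch below is an assumption about how such averages might be computed (an unweighted mean of per-sequence percentages); it is not MEGA's code, and the function names and toy sequences are hypothetical.

```python
from collections import Counter


def composition_percent(sequence):
    """Percentage of each residue type in one sequence."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    counts = Counter(seq)
    return {residue: 100.0 * n / len(seq) for residue, n in counts.items()}


def mean_composition(sequences):
    """Unweighted mean of per-sequence percentages (absent residues count as 0)."""
    if not sequences:
        raise ValueError("no sequences given")
    totals = Counter()
    for seq in sequences:
        totals.update(composition_percent(seq))
    return {residue: total / len(sequences) for residue, total in totals.items()}


# Hypothetical toy sequences, not real catalase data.
average = mean_composition(["PPDA", "PDGG"])
most_abundant = max(average, key=average.get)
print(most_abundant, average[most_abundant])
```

Averaging percentages per sequence gives every sequence equal weight regardless of its length; pooling all residues before computing percentages would instead weight longer sequences more heavily.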

Table 4 Amino acid composition (%) of CAT protein from different plant sources

Prediction of secondary structure

Predicting the secondary structure of proteins is critical to understanding protein folding in three dimensions. The secondary structure is predicted from the primary protein sequence [42]. Using SOPMA, the predicted secondary structure of the protein sequences revealed the predominance of random coils, at more than 40%, except for a few sequences from Capsicum annuum, Solanum melongena, Solanum lycopersicum, Oryza meridionalis, Oryza rufipogon, Oryza glaberrima, and Oryza barthii, in which extended strands predominated. Alpha helices and beta turns were most frequent in Populus deltoides and Oryza sativa, as given in Table 5.
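Percentages such as those reported by SOPMA can be tallied from a per-residue state string. The following sketch assumes a four-state string using the letters h (alpha helix), e (extended strand), t (beta turn), and c (random coil); the function names and the example string are hypothetical and do not come from the study's output.

```python
# Assumed four-state alphabet for a per-residue prediction string.
STATE_NAMES = {
    "h": "alpha helix",
    "e": "extended strand",
    "t": "beta turn",
    "c": "random coil",
}


def state_percentages(states):
    """Percentage of residues assigned to each of the four states."""
    s = states.lower()
    if not s:
        raise ValueError("empty state string")
    return {name: 100.0 * s.count(code) / len(s)
            for code, name in STATE_NAMES.items()}


def predominant_state(states):
    """Name of the state covering the largest share of residues."""
    percentages = state_percentages(states)
    return max(percentages, key=percentages.get)


# Hypothetical 14-residue prediction: 4 helix, 6 coil, 2 strand, 2 turn.
example = "hhhhcccccceett"
print(predominant_state(example), state_percentages(example))
```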

Table 5 Secondary structure prediction of plant catalases using SOPMA

Comparative homology modeling and its functional analysis

Predicting a 3D structure requires a template of known structure that is similar to the query sequence. A single organism from each cluster was selected, as shown in Table 6, and homology modeling of the 3D protein structure was carried out; the Arabidopsis thaliana query sequence had the highest sequence identity and GMQE score. The 3D structure was built by SWISS-MODEL using template 4qol.1.A (Bacillus pumilus catalase), extrapolating experimental data from an evolutionarily related protein structure that serves as the template (Fig. 3); the quality estimation of the predicted model is shown in Fig. 4a. The template shared 53.8% sequence identity with the query sequence, the QMEAN score was −1.44, the GMQE value was 0.81, and the predicted model's oligomeric state was a homotetramer, with a resolution of 1.65 Å [43]. As part of the evaluation and validation process, the predicted protein model of the query sequence (in PDB format) was uploaded to several servers. The Ramachandran plot analysis showed that 89.8% of residues lay in the most favored (red) regions, 10.1% in the additional allowed (brown) regions, and 0.4% in the generously allowed regions, supporting the quality of the modeled structure (Fig. 5).

Table 6 Characterization of selected organism modeling from each cluster evaluated by SWISS-MODEL
Fig. 3 Predicted protein model of the catalase enzyme of Arabidopsis thaliana showing the four distinct chains of the homotetramer

Fig. 4 Predicted protein model quality estimation by SWISS-MODEL

Fig. 5 Ramachandran plot of the predicted CAT model from Arabidopsis thaliana generated with PROCHECK. Residues in most favored regions (A, B, L): 89.8%. Residues in additional allowed regions (a, b, l, p): 10.1%. Residues in generously allowed regions (~a, ~b, ~l, ~p): 0.4%. Residues in disallowed regions: 0.4%

The overall G factor for dihedral angles and covalent forces was −0.16, above the acceptable threshold of −0.5. A high G factor indicates that a stereochemical property corresponds to a high-probability conformation [44, 45]. The predicted model was submitted to the SAVES server. ERRAT plots were used to examine the distribution of the model's atoms relative to one another and to judge the model's reliability from the environment of each amino acid. The overall ERRAT quality factor was 92.5, indicating that only a small fraction of individual residues exceeded the error threshold (Fig. 6). Verify3D indicated that at least 80% of the amino acids in the CAT model scored ≥ 0.2 in the 3D/1D profile, while the average residue value was around 70.2%, suggesting compatibility of the predicted model with its amino acid sequence [46]. The QMEAN Z-score (Fig. 4b and c) was −1.4, within the expected range of 0.0 to −2.0, indicating a well-defined structure [47].

The cellular machinery is built on proteins and their functional relationships, so understanding biological phenomena requires considering the network of interactions among them. The STRING analysis revealed ten predicted interacting partners of the query CAT protein from Arabidopsis thaliana (accession number CAA45564.1), which encodes a peroxisomal catalase, and identified glutathione reductase as the closest interacting protein, with the shortest distance. In contrast, ACX5 (putative peroxisomal acyl-coenzyme A oxidase) remained distant from the query protein (Figs. 7 and 8) [48].

Fig. 6 ERRAT plot of the Arabidopsis thaliana catalase model with an overall quality factor of 92.47

Fig. 7 Map of the protein-protein interactions of the Arabidopsis thaliana catalase protein

Fig. 8 Predicted interacting protein partners of the query sequence from the STRING server


Computational approaches have established themselves as a valuable complement to our understanding of the protein universe and its properties. In silico analysis is one of the most helpful tools in computational biology for exploring the structural and functional properties of proteins. Hence, this study explored the structural and functional properties of catalase enzymes from plants using bioinformatics tools such as ProtParam, MEGA-X, SOPMA, SWISS-MODEL, and the SAVES server. The ExPASy tool revealed several physicochemical characteristics of the retrieved catalase sequences, each reflecting a distinct behavior. The pH at which a protein carries no net electrical charge and is considered neutral is known as its isoelectric or isoionic point [49]. Prediction of the pI is critical when developing buffer systems for purification and isoelectric focusing. The study suggested that the theoretical pI of most plant catalases is acidic, ranging from 5 to 7, whereas Capsicum annuum has a slightly alkaline pI of 7.11. The instability index of the catalases ranged from 28.94 to 44.90; only a few sequences had an index above 40, namely those with accession numbers CAD42908, CAD42909 (Prunus persica), AAD17934, AAD17935, AAD17938 (Brassica juncea), KFK30147 (Arabis alpina), CAA85424 (Nicotiana plumbaginifolia), BAF91369, AAF34718 (Capsicum annuum), BAA81682, BAA81681 (Oryza glaberrima), and BAA81680 (Oryza barthii). The aliphatic index refers to the relative volume of a protein occupied by its hydrophobic aliphatic side chains. The heat stability of a protein is related to its aliphatic index: a higher aliphatic index means that a protein is better able to withstand high temperatures [50]. With aliphatic indices ranging from 65.66 to 75.55, the catalases contain substantial amounts of hydrophobic amino acids and are predicted to be thermally stable.
The hydrophilic nature of the plant catalases was indicated by the GRAVY score. A negative GRAVY score indicates that the protein is likely to be globular (hydrophilic) rather than membranous (hydrophobic). This information could aid in the identification of these proteins [51]. The phylogenetic tree was constructed using the maximum likelihood method to show evolutionary relationships among plant catalases. The distribution of Oryza sativa across clusters C, D, and F revealed its genetic diversity and its similarity with Festuca arundinacea and Saccharum spontaneum. Using a Pfam database search and NCBI/CDD-BLAST, the proteins were categorized into specific families based on the presence of specific domains in their sequences. NCBI BLAST assigned the catalase proteins with conserved domains to the PLN02609 superfamily. A superfamily is a collection of conserved domain models that are evolutionarily related and generate overlapping annotations on the same protein sequences. Protein secondary structure prediction from sequence is regarded as a link between primary and tertiary structure prediction [52]. Secondary structure prediction revealed the predominance of random coils followed by alpha helices in most of the catalases [3], which is highly similar to the results reported for PgCAT1 and the CAT1 genes of Soldanella alpina and Gossypium hirsutum [7]. Random coils are irregular secondary arrangements found in the N- and C-terminal arms and loops of the protein structure; they occur because of electrostatic repulsion and steric hindrance between bulky adjacent residues such as isoleucine or charged residues such as glutamic acid or aspartic acid. In a random coil state, the average conformation of each amino acid residue is independent of the conformations of all residues other than those immediately proximal in the primary structure [53].
The amino acid composition of plant catalases showed that proline was the most abundant residue, which could explain the predominantly coiled structure. Proline has the unique ability to promote coiling by introducing kinks in polypeptide chains that disrupt regular secondary conformations [54]. In silico prediction of a 3D protein model is a difficult task that relies on correlation with data obtained from NMR or crystallography-based approaches [48]. The query sequence (CAA45564) was searched against the PDB with BLAST to find the best template. The highest sequence identity (53.8%), together with the QMEAN value and GMQE score, supported the selection of template 4qol.1.A of Bacillus pumilus catalase. The predicted structure was validated with computational tools; the 89.8% of residues in the most favored regions of the Ramachandran plot implied a good-quality model. The ERRAT and Verify3D tools of the SAVES server and the QMEAN Z-score suggested a well-defined protein structure. The functional association analysis of the query sequence identified glutathione reductase as the closest interacting protein, with the shortest distance, which may be associated with the overlap of their functional roles in the metabolic pathway [55].


In silico analysis of plant catalase proteins provides insight into their numerous catalytic sites, allowing for possible manipulation of desirable qualities relevant to various sectors. Phylogenetic analysis revealed the similarity of various plant catalases, elucidating how species evolve genetically. Phylogenetics can be used to determine the genetic link between a modern organism and its ancestral origin and to anticipate future genetic divergence. The numerous conserved amino acid residues among distinct clusters may allow the development of specific probes or markers that reflect source species from a specific taxon. Secondary structure analysis confirmed the predominance of random coil, followed by alpha helix, extended strand, and beta turn. Plant catalases had the highest proline content in their amino acid composition, which could explain their coiled structural content. Proline has the unique ability to cause coiling in polypeptide chains by disrupting secondary conformations. The predicted 3D CAT model from Arabidopsis thaliana was a thermostable homotetramer of 59 kDa, and its structure was validated with PROCHECK (Ramachandran plot), ERRAT, and Verify3D. In silico protein structure analysis is an extremely valuable technique for exploring protein structure-function relationships when crystal structures are unavailable. It can also support the prediction of ligand-receptor interactions, enzyme-substrate interactions, and loop structures, as well as the design of mutagenesis experiments and the interpretation of SAR data. While these studies build a robust foundation for wet-lab experimentation, they also provide a strong framework for looking at novel sources utilizing metagenomics approaches and directed evolution to incorporate desired functional qualities.

Availability of data and materials

Not applicable


  1. Su Y, Guo J, Ling H, Chen S, Wang S, Xu L, Allan AC, Que Y (2014) Isolation of a novel peroxisomal catalase gene from sugarcane, which is responsive to biotic and abiotic stresses. PLoS One 9(1):1–11.

  2. Takio N, Yadav M, Yadav HS (2021) Catalase-mediated remediation of environmental pollutants and potential application – a review. Biocatal Biotransform 39(6):389–407.

  3. Ashokan KV, Mundaganur DS, Mundaganur YD (2011) Catalase: phylogenetic characterization to explore protein cluster. J Res Bioinform 1:001–008

  4. Garcia R, Kaid N, Vignaud C, Nicolas J (2000) Purification and some properties of catalase from wheat germ (Triticum aestivum L.). J Agric Food Chem 48:1050–1057.

  5. Keyhani J, Keyhani E, Kamali J (2002) Thermal stability of catalases active in dormant saffron corms. Mol Biol Rep 29(1-2):125–128.

  6. Lee SH, An CS (2005) Differential expression of three catalase genes in hot pepper. Mol Cell 20(2):247–255

  7. Purev M, Kim YJ, Kim MK, Pulla RK, Yang DC (2010) Isolation of a novel catalase (Cat1) gene from Panax ginseng and analysis of the response of this gene to various stresses. Plant Physiol Biochem 48(6):451–460.

  8. Chen HJ, Wu SD, Huang GJ, Shen CY, Afiyanti M, Li WJ, Lin YH (2012) Expression of a cloned sweet potato catalase SPCAT1 alleviates ethephon-mediated leaf senescence and H2O2 elevation. J Plant Physiol 169(1):86–97.

  9. Du Y, Wang P, Chen J, Song C (2008) Comprehensive functional analysis of the catalase gene family in Arabidopsis thaliana. J Integr Plant Biol 50(10):1318–1326.

  10. Guan Z, Chai T, Zhang Y, Xu J, Wei W (2009) Enhancement of Cd tolerance in transgenic tobacco plants overexpressing a Cd-induced catalase cDNA. Chemosphere 76(5):623–630.

  11. Sheoran SB, Pandey P, Sharma S, Narwal R et al (2013) Insilico comparative analysis and expression profile of antioxidant proteins in plants. Genet Mol Res 12(1):537–551

  12. Hoseinian GA, Ghaemi N, Rahimi F (2006) Partial purification and properties of catalase from Brassica oleracea capitata. Asian J Plant Sci 5:827–831

  13. Linka B, Szakonyi G, Petkovits T, Nagy LG, Papp T, Vágvölgyi C, Benyhe S, Ötvös F (2012) Homology modeling and phylogenetic relationships of catalases of an opportunistic pathogen Rhizopus oryzae. Life Sci 91(3–4):115–126.

  14. Lai J, Jin J, Kubelka J, Liberles DA (2012) A phylogenetic analysis of normal modes evolution in enzymes and its relationship to enzyme function. J Mol Biol 422(3):442–459.

  15. Yadav M, Yadav S, Yadav D, Yadav K (2017) In-silico analysis of manganese peroxidases from different fungal sources. Curr Proteomics 14(3):1–13.

  16. Saitou N, Nei M (1987) The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol 4(4):406–425.

  17. Bailey TL, Boden M, Buske FA, Frith M, Grant CE, Clementi L, Ren J, Li WW, Noble WS (2009) MEME Suite: tools for motif discovery and searching. Nucleic Acids Res 37:202–208.

  18. Geourjon C, Deléage G (1995) SOPMA: significant improvements in protein secondary structure prediction by consensus prediction from multiple alignments. Comput Appl Biosci 11(6):681–684.

  19. Levin JM, Robson B, Garnier J (1986) An algorithm for secondary structure determination in proteins based on sequence similarity. FEBS Lett 205(2):303–308.

  20. Arnold K, Bordoli L, Kopp J, Schwede T (2006) The SWISS-MODEL workspace: a web-based environment for protein structure homology modelling. Bioinformatics 22(2):195–201.

  21. MacArthur MW, Laskowski RA, Thornton JM (1994) Knowledge-based validation of protein structure coordinates derived by X-ray crystallography and NMR spectroscopy. Curr Opin Struct Biol 4(5):731–737.

  22. Eisenberg D, Lüthy R, Bowie JU (1997) VERIFY3D: assessment of protein models with three-dimensional profiles. Methods Enzymol 277:396–404.

  23. Laskowski RA, MacArthur MW, Moss DS, Thornton JM (1993) PROCHECK: a program to check the stereochemical quality of protein structures. J Appl Crystallogr 26(2):283–291.

  24. Szklarczyk D, Franceschini A, Wyder A et al (2015) STRING v10: protein-protein interaction networks, integrated over the tree of life. Nucleic Acids Res 43:D447–D452.

  25. Purwar S, Gupta A, Vajpayee G, Sundaram S (2014) Isolation and in-silico characterization of peroxidase isoenzymes from wheat (Triticum aestivum) against Karnal Bunt (Tilletia indica). Bioinformation 10(2):87.

  26. Mathé C, Fawal N, Roux C, Dunand C (2019) In silico definition of new ligninolytic peroxidase sub-classes in fungi and putative relation to fungal life style. Sci Rep 9(1):1–14.

  27. Singh AK, Katari SK, Umamaheswari A, Raj A (2021) In silico exploration of lignin peroxidase for unraveling the degradation mechanism employing lignin model compounds. RSC Adv 11(24):14632–14653.

  28. Morya VK, Yadav VK, Yadav S, Yadav D (2016) Active site characterization of proteases sequences from different species of Aspergillus. Cell Biochem Biophys 74:327–335.

  29. Hoda A, Tafaj M, Sallaku E (2021) In silico structural, functional and phylogenetic analyses of cellulase from Ruminococcus albus. J Genet Eng Biotechnol 19(1):58.

  30. Alam NB, Ghosh A (2018) Comprehensive analysis and transcript profiling of Arabidopsis thaliana and Oryza sativa catalase gene family suggests their specific roles in development and stress responses. Plant Physiol Biochem 123:54–64.

  31. Ikai A (1980) Thermostability and aliphatic index of globular proteins. J Biochem 88(6):1895–1898.

  32. Kaur A, Pati PK, Pati AM, Nagpal AK (2020) Physico-chemical characterization and topological analysis of pathogenesis-related proteins from Arabidopsis thaliana and Oryza sativa using in-silico approaches. PLoS One 5:1–15.

  33. Gamage DG, Gunaratne A, Periyannan GR, Russell TG (2019) Applicability of instability index for in vitro protein stability prediction. Protein Pept Lett 26(5):339–347.

  34. Huson DH, Bryant D (2006) Application of phylogenetic networks in evolutionary studies. Mol Biol Evol 23(2):254–267.

  35. Huelsenbeck JP, Bollback JP (2008) Application of the likelihood function in phylogenetic analysis. In: Handbook of Statistical Genetics, vol 1, 3rd edn, pp 460–488.

  36. Alam MT, Merlo ME, Takano E, Breitling R (2010) Genome-based phylogenetic analysis of Streptomyces and its relatives. Mol Phylogenet Evol 54(3):763–772.

  37. Ong Q, Nguyen P, Phuong Thao N, Le L (2016) Bioinformatics approach in plant genomic research. Curr Genomics 17(4):368–378.

  38. Smeets HJM, Brunner HG, Ropers HH, Wieringa B (1989) Use of variable simple sequence motifs as genetic markers: application to study of myotonic dystrophy. Hum Genet 83(3):245–251.

  39. Nettling M, Treutler H, Grau J, Keilwagen J, Posch S, Grosse I (2015) DiffLogo: a comparative visualization of sequence motifs. BMC Bioinformatics 16(1):1–9.

  40. Bork P, Koonin EV (1996) Protein sequence motifs. Curr Opin Struct Biol 6(3):366–376.

  41. Morris AL, MacArthur MW, Hutchinson EG, Thornton JM (1992) Stereochemical quality of protein structure coordinates. Proteins Struct Funct Genet 12(4):345–364.

  42. Mugilan A, Ajitha MC, Devi, Thinagar (2010) Insilico secondary structure prediction method (Kalasalingam University Structure Prediction Method) using comparative analysis. Trends Bioinformatics 3(1):11–19.

  43. Waterhouse A, Bertoni M, Bienert S, Studer G, Tauriello G, Gumienny R, Heer FT, De Beer TAP, Rempfer C, Bordoli L, Lepore R, Schwede T (2018) SWISS-MODEL: homology modelling of protein structures and complexes. Nucleic Acids Res 46(W1):W296–W303.

  44. Kikuchi O (1978) A molecular orbital study of the conformation and g-factors of the HSSH− radical anion. Bull Chem Soc Jpn 51(1):315–316.

  45. Tran NT, Jakovlić I, Wang WM (2015) In silico characterisation, homology modelling and structure-based functional annotation of blunt snout bream (Megalobrama amblycephala) Hsp70 and Hsc70 proteins. J Anim Sci Technol 57(1):1–9.

  46. Aslanzadeh V, Ghaderian M (2012) Homology modeling and functional characterization of PR-1a protein of Hordeum vulgare subsp. vulgare. Bioinformation 8(17):807.

  47. Messaoudi A, Belguith H, Ben Hamida J (2013) Homology modeling and virtual screening approaches to identify potent inhibitors of VEB-1 β-lactamase. Theor Biol Med Model 10(1):1–0.

  48. Pramanik K, Ghosh PK, Ray S, Sarkar A, Mitra S, Maiti TK (2017) An in silico structural, functional and phylogenetic analysis with three-dimensional protein modeling of alkaline phosphatase enzyme of Pseudomonas aeruginosa. J Genet Eng Biotechnol 15(2):527–537.

  49. Hoda A, Tafaj M, Sallaku E (2021) In silico structural, functional and phylogenetic analysis of cellulase from Ruminococcus albus. J Genet Eng Biotechnol 19:58.

  50. Panda S, Chandra G (2012) Physicochemical characterization and functional analysis of some snake venom toxin proteins and related non-toxin proteins of other chordates. Bioinformation 8(18):891–896.

  51. Enany S (2014) Structural and functional analysis of hypothetical and conserved proteins of Clostridium tetani. J Infect Public Health 7:296–307.

  52. Zhang B, Li J, Lü Q (2018) Prediction of 8-state protein secondary structures by a novel deep learning architecture. BMC Bioinformatics 19:293.

  53. Rodwell VW, Kennelly PJ, Bender D, Botham K, Weil PA (2018) Harper’s Illustrated Biochemistry, 31st edn. McGraw-Hill Education, New York.

  54. Krieger F, Möglich A, Kiefhaber T (2005) Effect of proline and glycine residues on dynamics and barriers of loop formation in polypeptide chains. J Am Chem Soc 127:3346–3352.

  55. Szklarczyk D, Gable AL, Lyon D, Junge A, Wyder S, Huerta-Cepas J, Simonovic M, Doncheva NT, Morris JH, Bork P, Jensen LJ, von Mering C (2019) STRING v11: protein–protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets. Nucleic Acids Res 47(D1):D607–D613.


The Department of Chemistry, NERIST, is gratefully acknowledged for providing the necessary facilities.


Not applicable

Author information

Authors and Affiliations



TN carried out the phylogenetic studies and modeling and drafted the manuscript. MY analyzed and interpreted the data. HSY conceived and designed the study. All authors read and approved the final manuscript.

Corresponding authors

Correspondence to Takio Nene or Meera Yadav.

Ethics declarations

Ethics approval and consent to participate

Not applicable

Consent for publication

Not applicable

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit

About this article

Cite this article

Nene, T., Yadav, M. & Yadav, H.S. Plant catalase in silico characterization and phylogenetic analysis with structural modeling. J Genet Eng Biotechnol 20, 125 (2022).


  • Catalase
  • Phylogenetic
  • Homology modeling
  • Thermostable
  • In silico