In silico characterization and structural modeling of bacterial metalloprotease of family M4

Background The M4 family of metalloproteases is comprised of a large number of zinc-containing metalloproteases. A large number of these enzymes are important virulence factors of pathogenic bacteria and therefore potential drug targets. Whereas some enzymes have potential for biotechnological applications, the M4 family of metalloproteases is known almost exclusively from bacteria. The aim of the study was to identify the structure and properties of M4 metalloprotease proteins. Results A total of 31 protein sequences of M4 metalloprotease retrieved from UniProt representing different species of bacteria have been characterized for various physiochemical properties. They were thermostable, hydrophillic protein of a molecular mass ranging from 38 to 66 KDa. Correlation on the basis of both enzymes and respective genes has also been studied by phylogenetic tree. B. cereus M4 metalloprotease (PDB ID: 1NPC) was selected as a representative species for secondary and tertiary structures among the M4 metalloprotease proteins. The secondary structure displaying 11 helices (H1-H11) is involved in 15 helix-helix interactions, while 4 β-sheet motifs composed of 15 β-strands in PDBsum. Possible disulfide bridges were absent in most of the cases. The tertiary structure of B. cereus M4 metalloprotease was validated by QMEAN4 and SAVES server (Ramachandran plot, verify 3D, and ERRAT) which proved the stability, reliability, and consistency of the tertiary structure of the protein. Functional analysis was done in terms of membrane protein topology, disease-causing region prediction, proteolytic cleavage sites prediction, and network generation. Transmembrane helix prediction showed absence of transmembrane helix in protein. Protein-protein interaction networks demonstrated that bacillolysin of B. cereus interacted with ten other proteins in a high confidence score. Five disorder regions were identified. Active sites analysis showed the zinc-binding residues—His-143, His-147, and Glu-167, with Glu-144 acting as the catalytic residues. Conclusion Moreover, this theoretical overview will help researchers to get a details idea about the protein structure and it may also help to design enzymes with desirable characteristics for exploiting them at industrial level or potential drug targets. Supplementary Information The online version contains supplementary material available at 10.1186/s43141-020-00105-y.


Background
Proteases are enzymes that can hydrolyze proteins and are composed of a diverse group of exoproteases and endoproteases depending on their activity. Based on their catalytic mechanism, endoproteases are divided into aspartic proteases, cysteine proteases, metalloproteases, serine proteases, and threonine proteases [1]. Proteases that contain one or two divalent metal ions in their active sites are known as metalloproteases. While most of the metalloproteases contain Zn 2+ , in some cases, Ca 2+ Mg 2+ , Ni 2+ , or Cu 2+ are also found [2]. The role of the catalytic metal ions in metalloproteases is to activate the water molecule, which serves as a nucleophile in catalysis. Metalloproteases are produced by all species of plants, animals, and microorganisms. They are involved in many biological processes such as embryonic development, morphogenesis, processing of peptide hormones, release of cytokines and growth factors, cell-cell fusion, cell adhesion and migration, intestinal absorption of nutrients, viral polyprotein processing, bacterial cell wall biosynthesis, and metabolism of antibiotics [3]. Due to their active relation with many diseases, extracellular metalloproteases have been widely studied [4].
All the well-characterized proteinases to date belong to one or more family in MEROPS database (http:// MEROPS.sanger.ac.uk/). The current MEROPS database (release 12.1) classifies metallopeptidases into 76 families, which are grouped into sixteen clans based on metal ion binding motifs and similarities to their 3-D structure. The proteases in the M4 family belong to clan MA, a big family of metalloproteases that degrade extracellular proteins and peptides for bacterial nutrition. Metalloproteases of the family M4 comprise different types of peptidases, thermolysin, vibriolysin, pseudolysin, coccolysin, aureolysin, vimelysin, lambda toxin, bacillolysin, stearolysin, gelatinase, elastase, etc.
Some of the metalloproteases are widely used in the food, medicine, brewing, leather, film, and baking industries. Thermolysin from Bacillus thermoproteolyticus has diverse industrial usage. Thermolysin is used as a peptide and ester synthetase in the production of Ncarbobenzoxy-L-aspartyl-L-phenylalaninemethyl ester (Z-Asp-Phe-OMe), the precursor to the artificial sweetener aspartame [5][6][7]. Thermolysin is also used in biotechnology industry as a non-specific proteinase to obtain fragments for peptide sequencing. Thermozymes, enzymes from thermophilic microorganisms, have unique characteristics such as extreme temperature persistence, high stability in organic solvents, strict substrate specificity, and pH stability. For these features, thermozymes have been considerably used in many industrial applications [8][9][10][11]. Vimelysin from Vibrio str.T1800 has pertinence in peptide condensation reactions because of its high activity in organic solvents [12]. In addition, vibriolysin from Vibrio proteolyticus are utilized in several industrial as well as biomedical applications [10,11]. It mediates the coupling of N-protected aspartic acid and phenylalanine methyl ester to yield N-protected aspartylphenylalaninemethylester, a precursor of the sweetener aspartame, whereas a new metalloprotease of the M4 family, VP9, was identified in Vibrio pomeroyi strain 12613 from Atlantic seawater that was able to hydrolyze casein and gelatin [13]. Neutrase from B. subtilis was began to use in industrial sector in 1995 for the synthesis of Celite-545 and followed by 1997 for the synthesis of Polyamide-PA6 [14,15]. Another metalloprotease of M4 family, pseudolysin from P. aeruginosa, can be developed for peptide synthesis, which has been demonstrated to be a suitable catalyst for peptide bond formation through reverse proteolysis [16]. M4 metalloprotease obtainable from Actinobacteria is used for wort production [17]. Thus, proteinases of the M4 family have a huge potential for industrial context and have also been found to be a useful catalyst in protein engineering [18,19].
Members of the M4 family that are considered virulence factors of pathogens can also be used as targets in drug and vaccine development [20,21]. Lambda toxin of Clostridium perfringens activates the precursors of clostridial potent toxins and degrades various host proteins that contribute to innate or adaptive immune defense against infections [22]. Hemagglutinin from Vibrio cholerae are the causative agents for gastritis, peptic ulcer, gastric carcinoma [23] and cholera. It has also been shown to affect intracellular tight junctions by degrading occluding [24]. In addition, vibriolysin from V. proteolyticus is used for the removal of necrotic tissue from wounds such as burns or cutaneous ulcers and is reported to stimulate the healing of partial-thickness burn wounds [25]. The peptidase from Legionella may have role in the virulence of Legionnaire's disease and pneumonia [26] as it cleaves α 1 -antitrypsin [27], tumor necrosis factor α, interleukin 2, and CD4 on human T cell surfaces [28]. Pseudolysin, an extracellullar elastase of Pseudomonas aeruginosa, is involved in chronic ulcers by degradation of human wound fluids and human skin proteins [29]. Again, Pseudomonas aeruginosa produces elastase B in the hemolymph after infection of larval silkworm contributes to the growth of P. aeruginosa in the silkworm and pathogenicity of P. aeruginosa to the host [30]. Based on the current knowledge, it is reasonable to believe that particular metalloproteinases associated with human pathogens have been recognized as prominent virulence factors and their therapeutic inhibition has become a novel strategy in the development of secondgeneration antibiotics [31,32].
For a successful integration of M4 metalloprotease in large-scale industrial processes and therapeutic use, a detailed understanding of the enzyme is prerequisite.
The present study was aimed to be utilize in silico tools for the characterization of M4 metalloprotease from different bacterial species for their physicochemical characteristics; primary, secondary, and tertiary structure of proteins; functional analysis; domains and motifs; protein model; and phylogenetic analysis.

Sequence retrieval and alignment
A total of 31 different M4 metalloprotease sequences of bacterial origin have been retrieved from UniProt (https://www.uniprot.org/). Corresponding gene sequences of 31 bacterial M4 metalloprotease proteins were retrieved from NCBI (https://www.ncbi.nlm.nih. gov/). The UniProtKB of protein sequences, accession numbers of the gene sequences along with the source organisms were listed in Supplementary Table 1. Clustal Omega (https://www.ebi.ac.uk/Tools/msa/clustalo/) [33] algorithm was used for the alignment of retrieved protein sequences through multiple sequence alignments and the alignments were inspected using CLC sequence viewer 8.0 (http://www.clcbio.com).

Determination of physical parameters of the proteins
The different physicochemical properties of M4 metalloprotease enzymes were computed using ExPASy'sProt-Param tool (http://web.expasy.org/protparam) [34] and these properties were deduced from a protein sequence. The ProtParam includes the following computed parameters: molecular weight, isoelectric point (pI), extinction coefficient (EC-quantitative study of protein-protein and protein-ligand interactions), instability index (IIstability of proteins), aliphatic index (AI-relative volume of protein occupied by aliphatic side chains), and Grand Average of Hydropathicities (GRAVY-sum of all hydropathicity values of all amino acids divided by number of residues in a sequence).

Phylogenetic tree construction
Two different phylogenetic trees were constructed from amino acid sequences and from gene sequences using the MEGAX software [35] to compare evolutionary relatedness of the taxa. The evolutionary history was inferred using the neighbor-joining method [36]. For amino acid sequence, the evolutionary distances were computed using the Poisson correction method [37] whereas for gene sequence the evolutionary distances were computed using the maximum composite likelihood method [38].

Primary structure analysis
For primary structure analysis, viz., the amino acids present in polypeptide chain, ExPASy-ProtParam tool had been used. For domain search, the Pfam site (http:// www.sanger.ac.uk/Software/Pfam/search.shtml) was used. Motif analysis was done using MEME (http:// meme.sdsc.edu/meme/meme.html) [39]. The conserved protein motifs deduced by MEME were subjected to biological functional analysis using protein BLAST and domains were studied with Interproscan (http://www.ebi. ac.uk/interpro/search/sequence/) providing the best possible match based on highest similarity score.

Tertiary structure analysis and validation
Among the 31 strains, B. cereus was selected as a representative of all the strains to predict the tertiary structure of M4 metalloprotease protein. SWISS-MODEL 3.1.0 (https://swissmodel.expasy.org/) was used to build the 3D models of B. cereus M4 metalloprotease sequence (PDB ID: 1NPC) with energy minimization parameters [41]. PyMOL (Schrödinger Inc.) was used to visualize publishable image of the model [42]. Structure evaluation was the most important component of structure prediction. Predicted protein model of M4 metalloprotease of B. cereus was evaluated and verified from both QMEAN (https://swissmodel.expasy.org/qmean/) and SAVES (https://servicesn.mbi.ucla.edu/SAVES/) server. Ramachandran plot generated from RAMPAGE server (http://mordred.bioc.cam.ac.uk/~rapper/rampage.php) [43]. Verify3D [44] and ERRAT [45] were evaluated from SAVES. The overall quality of the structure was obtained through Ramachandran plot. Verify3D analyzed the compatibility of an atomic model (3D) with its own amino acid sequence (1D) [46]. The verification of the crystallographic structure of proteins was done by Errate.

Functional analysis
Function prediction was done in terms of membrane protein topology, disease-causing region prediction, proteolytic cleavage sites prediction and network generation. TMHMM 2.0 tool (www.cbs.dtu.dk/services/ TMHMM) was used to understand membrane protein topology, more specifically if the protein was membrane spanning or extracellular in nature [47]. GlobPlot 2.3 (http://globplot.embl.de/) was used to identify regions of globularity and disorder within protein sequences. This web service looks for order/globularity or disorder tendency in the query protein based on a running sum of the propensity for an amino acid by searching domain databases and sets of disordered proteins [48]. Proteolytic cleavage sites were identified by using a web-based tool peptide cutter (http://web. expasy.org/peptide_cutter/) [34], which predicted the proteolytic cleavage sites and sites cleaved by chemicals in a given protein sequence. Identification of protein-protein interaction was carried out by STRI NG 11.0 (https://string-db.org/) [49]. STRING is a biological database which is used to construct protein-protein interaction network for different known and predicted protein interactions.

Active site prediction
The Computed Atlas of Surface Topography of proteins 3.0 (CASTp 3.0) server (http://sts.bioe.uic.edu/) [50] was used to predict active binding site pockets of protein. It includes annotated functional information of specific residues on the protein structure.

Sequence retrieval and alignment
The protein sequences of M4 metalloprotease enzymes belonging to different bacterial strains were retrieved from UniProt and FASTA format of these sequences have been selected based on the overall quality parameters in UniProt tool (Supplementary table 1). The homology search and multiple sequence alignment of these 31 M4 metalloprotease sequences revealed a little stretch of conserved region ranging from the amino acid residues 117-146, 304-323, 451-533, 542-579, 595-624, 652-667, and 705-720 as shown in Figure S1. Few highly conserved amino acids were also observed for most of the sequences. Twenty-two 100% conserved positions were found in aligned region comprising nonpolar amino acid, Ala, Leu, Gly, and Pro, Val; polar amino acid, Asn and Ser; aromatic amino acid, Tyr; acidic amino acid, Glu and Asp; and basic amino acid, Arg and His. Alignment of 31 M4 metalloproteases revealed a conserved region (HELTE) occurring between amino acid positions "542-546" in the region blocked with pink areas of Fig. 1.

Phylogenetic tree construction
To compare evolutionary relationship, two phylogenetic trees were constructed with MEGAX, one consisted of amino acid sequences of 31 bacterial M4 metalloprotease enzymes (Fig. 2a), and another one is their corresponding gene sequences (Fig. 2b). The horizontal branches represented evolutionary lineages changing over time. The longer the branch, the larger the amount of change. In Fig. 2a the optimal tree with the sum of branch length = 8.83019656 was shown. Here, amino acid sequences were distributed into two main clades and an outgroup. Dominant clade consisted of amino acid sequences from mostly gram-positive bacteria (except Streptomyces lividans TK24; D6EEH2) along with two gram-negative bacteria, Erwinia_carotovora_subsp._ carotovora; Q99132 and Serratia marcescens; Q06517 and were marked as blue (Fig. 2a). From this figure, it was observed that the two strains Erwinia_carotovora_ subsp._carotovora; Q99132 and Serratia marcescens; Q06517 clustered with Clostridium putrefaciens; A0A381J9B8 which revealed sequence level similarity of these protein sequences. Whereas amino acid sequences from rest gram-negative bacteria with Streptomyces lividans TK24, D6EEH2 exhibited high similarity index and they belonged to another clade and were represented as red ( Figure 2a). In Fig. 2b another phylogenetic tree was constructed to find out the relation among gene sequences of corresponding protein. The optimal tree with the sum of branch length = 20.01342277 was shown. Here, gene sequences of gram-positive bacteria (marked as blue) and gram-negative bacteria (marked as red) formed separate clusters signifying the sequence-based similarity. In both cases, the outgroup contained Renibacterium salmoninarum marked as pink.

Determination of physical parameters of the proteins
In confirmation of the uniqueness of any protein or enzyme molecule, characterization of the biochemical features of these molecules play the preliminary role [34]. The physicochemical features of protease sequences obtaining from ExPASy ProtParam were summarized in Table 1. The total number of amino acid residues ranged from 347 to 611 with variable molecular weights. The pI values of all the proteins showed broad range of 4.88-10. The variability was also observed among these proteins in terms of other physiochemical parameters like negative charge residues (Asp and Glu), positively charged amino acid residues (His, Arg, Lys), hydropathicity (GRAVY), and extinction coefficient (EC) which were listed in Table 1.

Primary sequence analysis
The primary structure analysis of M4 metalloprotease enzymes included amino acid distribution, motif, and domain analysis. The amino acid distribution was represented as a heatmap in Fig. 3 which showed that the most abundant amino acid was Ala and the least common amino acid was cysteine.
A total of 10 motifs were observed in 31 sequences when subjected to MEME and depicted in Fig. 4. The motifs with width and best possible match amino acid sequences were given in Table 2. A set of 41 amino acid residues, i.e., PSGSJDVVAHELTHGVTEQTAGLVYZNZSGAJNEAFSD IFG representing motif 1 was uniformly observed in all sequences revealing its identity with the peptidase_M4 and Peptidase_M4_C domains ( Table 2). The order of amino acid residues "HELTH" in this sequence was associated with the active site of the enzyme. Motif 8 was uniformly observed in most of the proteases sequences except Q99132 (Erwinia carotovora) and Q06517 (Serratia marcescens) represented a signal peptide FTP domain which prevented premature activation of proteases [51]. The region of motif 7 was likely to have a protease inhibitory function since it belonged to the pepSY domain [52]. In case of motifs 3, 5 and 9 belonging to Peptidase_M4 domain were associated with the catalysis of the enzyme. The motifs 2, 4, and 6, belonging to Peptidase_M4_C domain, were important in order to the presence of alpha helix related with the flexibility of protein conformation and protein function. Motif 10 was observed only in five species from genus vibrio (Q00971, V. proteolyticus; P43147, V. anguillarum; A8JNY9, V. aestuarianus; O06694, V. vulnificus; and P24153, V. cholerae) along with I2A62, Aeromonas hydrophila.

Secondary structure analysis
The predicted secondary structure composition of M4 metalloprotease was determined using the NPS@ server which generated a consensus report from twelve secondary structure prediction methods. The secondary structure prediction server revealed that the enzyme is dominated by 41.64% of amino acid in random coils along with 32.12% of amino acids resided in α-helices, while 20.36% of residues were in extended sheet. Finally, less amount of the amino acids was found in extended sheet region of 5.88% (Table 3). A more detailed analysis of the secondary structural elements was performed using the PDBsum tool. Here, the amino acid sequence of M4 metalloprotease from B. cereus was taken as template which is also known as Bacillolysin. The predicted secondary structures generated by the PDBsum tool in Fig. 5a were generally in an idea of the "structural coverage", how much the protein sequence of B. cereus M4 metalloprotease was actually represented by the 3D structure [53]. The secondary structure displaying 11 helices (H1-H11) involved in 15 helix-helix interactions, while 4 β-sheet motifs composed of 15 β-strands. According to the diagram, the catalytic site lid 143 HELT H 147 was located within helix H4. This was also observed in the 3D structure. The topology of B. cereus M4 metalloprotease was illustrated in Fig. 5b which showed the arrangement and connectivity of the helices and strands in protein. Where the protein chain consisted of two domains, a C-terminal domain and an N-terminal domain were both folded into a mixed α/β topology.
Disulfide bonds play an important role in folding and stabilizing the unfolded form of the protein by lowering the entropy. Possible disulfide linkages in the primary structure were determined using CYS_REC was represented in Table 3. In most of the cases disulfide bridges were absent, as the prevalence of cysteine residues were very poor (Fig. 3).

Tertiary structure analysis
The tertiary structure of M4 metalloprotease was generated with the SWISS-MODEL using B. cereus M4 metalloprotease (PDB ID: 1NPC) as a template and PyMOL was used to visualize the model (Fig. 6). The active side lid was shown as a pink loop, four Ca 2+ binding site was illustrated as pink and a Zn 2+ binding site was as blue. This model was further verified by QMEAN4 and SAVE S server. QMEAN PDB result was represented in Fig. 7   and depicted the proper folding of protein into a compact three-dimensional field. Ramachandran plot measured the accuracy of protein model and the results were narrated in Fig. 8a. The profile score above zero in the Verify3D graph correspond to the acceptable environment of the model, in Fig. 8b. ERRAT-verified protein structure and the result depicted in Fig. 8c.

Functional analysis
For functional analysis, the query sequence taken was the amino acid sequence of M4 metalloprotease from B. cereus (PDB ID: 1NPC). Here, transmembrane helix prediction analyzed by TMHMM server 2.0 showed that no transmembrane helix presents in the protein. 5 disorder regions were identified by GlobPlot and the regions were from amino acid number 1-29, 93-135, 186-235, 251-257, and 309-317. In Fig. 9, the bluecolored sections were disordered regions and greencolored regions were globular or ordered domains. Protease digestion is a useful process that is used to know proper metabolism, enzymatic digestion and simplification of high order protein structure. According to results from peptide cutter, there were several cleavage sites for 21 different digestive enzymes for the amino acid sequence of M4 metalloprotease from B. cereus. Table 4 summarized the results obtained by the peptide cutter tool which indicated that total numbers of cleavages were found to be 633.  The protein-protein interacting partners of bacillolysin from B. cereus was generated through STRING 11.0 and presented in Fig. 10. STRING forecasted confidence scores (0.771-0.591) which indicated the functional network among the set of proteins of a given organism. M4 metalloprotease of B. cereus (npr) was predicted to be interacting with 10 proteins, namely ina, ina_2, plc, DJ87_2940, rseP, DJ87_4517, yloB_1, pruA, rssA_1, and fsr in different manner. The STRING database analysis depicted that npr protein-protein interaction (PPI) network comprised of 11 nodes connected with 20 different edges. Whereas, expected number of edges was observed to be 11; while the average node degree score was found to be 3.64 i.e., one node had at least 3.64 interacting nodes. Average local clustering coefficient was predicted to be 0.732 and PPI enrichment p-value was observed as 0.00703.

Prediction of active sites
CASTp 3.0 server was used to identify the possible active site for ligand in the M4 metalloprotease of Bacillus cereus (PDB 1NPC). In the present study, the surpass active site area of the enzyme in addition to the number of amino acids occupied in it were also reported. The preeminent active site was found the largest pocket with 103.076 areas and a volume of 76.471 amino acids (Fig.  11). Many functionally important residues were located in this pocket, including three residues His 143, His 147, and Glu 167 in zinc-binding site and two residues, Glu 144 and His 232, in the catalytic site which were essential for the most common HEXXH zinc-binding motif in metalloprotese. Another pocket with an area of 49.147 and a volume of 32.345 amino acids comprising three residues, Glu 167, Asp 171, and Glu 191, in Calcium 2 binding site; four residues, Glu 178, Asn 184, Asp 186, and Glu 191, in Calcium 3 binding site; and four residues, Tyr 194, Thr 195, Lys 198, and Asp 201 in Calcium 4 binding site, since they have also been reported to be essential for the functioning of other bacterial organisms [2,54,55]. The 3D representation of pockets was shown in Fig. 11 with largest pocket in red color and other pocket in blue color.

Discussion
The aim of the study was to identify the structure and properties of M4 metalloprotease proteins using bioinformatics tools. The present study primarily determined the global similarities among the compared proteins. In amino acid sequence alignment of 31 M4 metalloproteases, a conserved region (HELTE) was observed (Fig. 1). The presence of two His and one Glu is important for activity in all the metallopeptidases that carry the HEXXH zinc-binding motif. In case of metallopeptidases having two catalytic metal ions, Ca 2+ and Zn 2+ , along with two residues, a glutamate and an aspartate are also essential [56]. The existence of conserved amino acid plays significant role in confirmation of protein and helix coil transition. Gly and Pro frequently coincide with the extremities of well-structured beta strands or alpha helices. His and Ser are often involved in catalytic sites, especially in proteases. Charged amino acids like Asp, Glu, and Arg are mostly involved in ligand binding. Highly conserved columns might indicate a salt bridge inside the core of the protein [57].
In this study, phylogenetic trees were constructed using both amino acid sequence versus gene sequences to find if there was any correlation among the taxa in terms of their protein sequences compared with respective cDNA. Results obtained from the evolutionary tree ( Fig. 2) implied that metalloproteases from different bacterial species appeared to be related to each other and clustering in distinct groups based on its source organisms and nature of the mechanism of enzymatic activity. Thus, it can be inferred that the bacterial strains might be diverged from a common evolutionary ancestor.
A physicochemical analysis of the protein sequence was determined by the Expasy server's ProtParam tool. It revealed all the proteins have negative GRAVY scores which attested to their solubility in hydrophilic solvents and substantiated by earlier studies [58][59][60]. Average extinction coefficient 78371.13 referred the quantity of light that may be absorbed by protein in 280 nm. In theory, when the pI value of a protein exceeds 7, it is characterized as alkaline in nature and the value of below 7 indicates the acidic nature. In this study, the pI values of all the proteins showed broad range of 4.88-10 indicating diverse nature of protein. Metalloproteases from all the selected bacteria except P. aeruginosa were found to be stable with instability index less than 40, which justified by the previous studies [58,59]. The aliphatic index of a protein is used to measure the relative volume of protein occupied by amino acids in aliphatic side chain [61] and higher value of aliphatic index is considered a positive factor of increased thermo stability. Here, all strains showed high aliphatic index (Ai) of 64.83-81.98 which indicated the thermostability of the proteins [62]. Based on the amino acid distribution, the most abundant amino acid was Ala which accounted for 9.3% of the enzyme's primary structure (Fig. 3). The least common amino acid was cysteine. Other predominant amino acids were found to be Gly (9.1%), Ser (7.8%), Val (7.5%), Asp (6.6%), Lys (6.6%), and Thr (6.5%). Ala is very rare to be dug inside the protein core due to its hydrophobic nature so that it has less tendency to contact with water. On the other hand, due to not having a side chain, Gly mostly occupied the surface of the protein providing high flexibility to the polypeptide chain. The presence of significant amount of hydrophilic amino acids such as Ser and Thr represented the protein as extracellular in nature. As Asp is charged and polar amino acid, it might be occurred on the surface of proteins and involved in salt bridge. Being positively charged, Lys preferred to be in the side chain of proteins and formed salt bridge [63]. The domain analysis exposed different conserved site present in M4 metalloprotease from bacterial sources. The presence of common and unique domains among different proteases might confer their structural flexibility, which directly influences functional activity of proteases. These conserved regions might be utilized for designing primers for PCR-based amplification and cloning of these proteases genes from different bacterial species.
In this study, B. cereus M4 metalloprotease (1NPC) was selected as a representative species for describing secondary and tertiary structures of the M4 metalloprotease proteins. Family M4 contains a wide range of extracellular thermolysin. Among them, the 3D structures are known for thermolysin from Bacillus cereus (1NPC) [64] and B. thermoproteolyticus (1KEI) [65]. Thermolysin from B. thermoproteolyticus have been well-characterized structurally and enzymatically, i.e., its primary and tertiary structures and substrate-binding site. But very little information is available for thermolysin from Bacillus cereus. This was the reason behind the selection of B. cereus M4 metalloprotease (PDB ID: 1NPC). Results generated by secondary structure prediction tool SOPMA showed the abundance of coiled region (41.64%) indicated higher conservation and stability of the model 1NPC [66,67]. Information from PDBsum aided in determining the overall structural organization of proteins and predicting protein pockets for ligand binding. Thus, the secondary structure arrangement of the protein could help in the prediction of tertiary structures. On the other hand, secondary structural elements prediction may overcome the limitations of X-ray crystallography and NMR for tertiary structure of protein. Crystallization of few proteins is very difficult task by X-ray crystallography and NMR is restricted to relatively small protein molecules. Moreover, Roy et al. [68] reported that prediction of secondary structural elements was vital for detection conformational changes within the protein of interest. The protein 3D model gained from SWISS-MODEL workspace was evaluated by both QMEAN4 and SAVES server. QMEAN output estimated geometrical aspects of the protein structure that characterized the global arrangement of variable residues of protease. According to Benkert et al., the QMEAN z-score determines of the absolute quality of a model by relating it to the reference structures solved by X-ray crystallography [69]. In Fig.  7c, the z-scores of the QMEAN terms of the protein model were − 0.82, − 1.01, − 0.74, and 0.02 for Cβ interaction energy, all atom energy, salvation energy, and torsion angle energy, respectively. These scores implied that the predicted protein model could be considered a quality model. Furthermore, for the estimation of perfect quality of the model, the QMEAN server relates the query model with a representative set of high-resolution X-ray structures of similar size and the resulting QMEA N z-score is an extent of degree of nativeness of the particular structure [70]. For high-resolution models, the average z-score is '0'. Here, QMEAN z-score for the query model was − 0.43, which was lower than the standard deviation '1' from the mean value '0' of good models, so this result indicated that the estimated model was comparable to the high-resolution models. Again, from the estimation of absolute quality of modeled protein in Fig. 7d, the dark zone indicated that the model Table 4 Cleavage of amino acid residues by different enzymes generated by ExPASy peptide cutter

Name of enzyme
No of cleavages
Protein-protein interaction (PPI) networks are used to identify the complex molecular mechanisms and pathways to gain basic knowledge of diseases. PPI network demonstrated that bacillolysin interacted with ten other proteins in a high confidence score, among them the closest annotated interacting protein having the shortest node with score of 0.77 was found ina. Then, ina_2, immune inhibitor A, was found having the score of 0.768 that functioned in degrading host tissue proteins with broad substrate specificity [77]. Again, plc (score 0.662), which stood for Phospholipase C, was involved in hemolysis and cell rupture [24]. DJ87_2940 (score 0.647) was an enterotoxin; a nonhemolytic protein, yolB_1 (score 0.595), was a calciumtranslocating P-type ATPase which functioned to transport a variety of different compounds, including ions and phospholipids across a membrane. PruA (score 0.591) was a putative delta-1-pyrroline-5-carboxylate dehydrogenase which was involved in antibiotic biosynthesis pathway. From PPI network analysis, it can be estimated that bacillolysin from B. cereus may be a part of its immune system.
Analysis of protein structures for active site often considers as the starting point in the protein-ligand docking studies. The active site of an enzyme comprises a substrate-binding site and a catalytic site. The enzyme binds with a specific substrate in order to catalyze a chemical reaction, whereas the catalytic site occurs next to the binding site, carrying out the catalysis. Some enzymes require the help of cofactors for their activities. Mostly cofactors are connected to the active site of an enzyme. The calculated result from CASTp showed that the amino acid position 112-235 was predicted to be the active site. Metalloproteases from M4 family need Ca 2+ and Zn 2+ as cofactors which bind with specific amino acid residues in enzyme active site for catalysis. In this study, the zincbinding residues of His-143, His-147, and Glu-167, with Glu-144 having acted as the catalytic residue. Glu144 was responsible for the polarization of the catalytic water molecule leading to an enhancement of nucleophilicity, whereas Asp226 orientated the imidazolium ring of His231 and His231 acted as proton donor and general base [78]. His143 formed a hydrogen bond with Asp171 and had been shown to be essential for activity [79] Tyr158 played a primary role in transition state stabilization and substrate binding [80]. Due to the importance of this active site, it has been widely studied as a target site for antibacterial agents. The 3-D structure of the enzyme is analyzed to identify active site residues as a target site to design drugs which can fit into enzyme.

Conclusion
Metalloproteases of M4 family are widely dispersed across the nature. The importance of these proteases has been perceived since their roles in bacterial pathogenicity along with in industrial sectors. The present in silico study reveals the bacterial M4 metalloprotease are thermostable, hydrophillic, and extracellular in nature with diverse molecular mass ranging from 38 to 66 KDa. Cross-validation for the 3D model quality assessment was performed by different servers. Hence, this overview may help to get a theoretical idea in developing crossprotective next-generation anti-bacillolysin vaccines as well as to design enzymes with desirable characteristics for biotechnological applications. However, further in vivo studies might be suggested.