Study on cocoonase, sericin, and degumming of silk cocoon: computational and experimental

Background Cocoonase is a proteolytic enzyme that dissolves the silk cocoon shell and enables the silk moth to exit. Chemical degumming of the silk cocoon shell with agents such as anhydrous Na2CO3, Marseille soap, soda, ethylenediamine, and tartaric acid has long been in practice. During this process, the solubility of sericin protein increases, resulting in the release of sericin from the fibroin protein of the silk. However, this process diminishes the natural color and softness of the silk. Cocoonase digests the sericin protein of silk at the anterior portion of the cocoon without disturbing the silk fibroin. However, neither a thorough characterization of cocoonase and sericin proteins nor an imaging analysis of chemical- and enzyme-treated silk sheets has been carried out so far. Therefore, the present study aimed at detailed characterization of cocoonase and sericin proteins, including phylogenetic analysis, secondary and tertiary structure prediction, computational validation, and their interaction with other proteins. Further, identification of the tasar silkworm (Antheraea mylitta) pupal stage suitable for cocoonase collection, cocoonase purification and its effect on silk sheet degumming, scanning electron microscope (SEM)-based comparison of chemical- and enzyme-treated cocoon sheets, and optical coherence tomography (OCT)-based imaging analysis were investigated. Computational tools including Molecular Evolutionary Genetics Analysis (MEGA) X and Figtree, Iterative Threading Assembly Refinement (I-TASSER), Self-Optimized Prediction Method with Alignment (SOPMA), PROCHECK, University of California, San Francisco (UCSF) Chimera, and Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) were used for characterization of cocoonase and sericin proteins. Sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE), protein purification using a Sephadex G-25 column, degumming of cocoon sheets using cocoonase enzyme and Na2CO3, and SEM and OCT analysis of the degummed cocoon sheets were performed.
Results Predicted normalized B-factors for the α and β regions indicated that these regions are structurally more stable in cocoonase and less stable in sericin. Conserved domain analysis revealed that B. mori cocoonase contains a trypsin-like serine protease domain with an active site spanning positions 45 to 180 of the query sequence and a substrate-binding site spanning positions 175 to 200. SDS-PAGE analysis of cocoonase indicated a molecular weight of 25–26 kDa. Na2CO3 treatment produced a greater degumming effect (i.e., cocoon sheet weight loss) than degumming with cocoonase. However, the cocoonase-treated silk cocoon sheet retained the natural color, smoothness, and luster of tasar silk better than the cocoon sheet treated with Na2CO3. SEM-based analysis showed noticeable differences between the surfaces of silk fibers treated with cocoonase and with Na2CO3. OCT analysis also showed differences in the cross-sectional view of the cocoonase- and Na2CO3-treated silk sheets.
Conclusions The present study sheds light on the detailed characteristics of cocoonase and sericin proteins, their comparative degumming activity, and image analysis of cocoonase- and Na2CO3-treated silk sheets. The findings support the use of cocoonase for degumming silk cocoons at a larger scale, which would be a boon to the silk industry.
Supplementary Information The online version contains supplementary material available at 10.1186/s43141-021-00125-2.


Background
Sericulture, the raising of silkworm larvae and the turning of cocoons into thread, is an industry with a history of almost 5000 years. The domestic silk moth (Bombyx mori) is a holometabolous insect with four discrete life-cycle stages (egg, larva, pupa, and adult) that are clearly separated from one another [22]. Silk has been valued as a textile fiber since prehistoric times. Silkworms are reared in different geographical regions of India. The arjuna or arjun tree (Terminalia arjuna) and asan, also called Indian laurel or silver-grey wood (Terminalia tomentosa), are the primary host plants of A. mylitta larvae. Jamun (Syzygium cumini L.) is another potential host of the tropical tasar silkworm, A. mylitta Drury. Tasar cocoons are tougher than the cocoons of other sericigenous insects. The silk fiber produced by the silkworm is a complex material composed of fibroin protein surrounded by sericin protein [45]. To obtain silk thread from cocoons, removal of sericin is an essential step [52]. As a prerequisite to reeling, the cocoon must be cooked. Here, the cocoon is softened by decomposing or partially solubilizing the sericin component that binds the fibroin strands from which the silk thread is reeled.
A. mylitta is native to India. It is an economically important sericigenous insect that produces tasar silk, which is in high demand worldwide. The species is broadly distributed across the tropical belt of India from West Bengal (east) to Karnataka (south). It is also found in the forests of Madhya Pradesh, Bihar, Maharashtra, Andhra Pradesh, Telangana, and Orissa [59]. Tasar silk obtained from the wild silkworm A. mylitta differs in color from the silk of domesticated silkworms. It is coarser and stronger, with higher tensile strength, elongation, and stress-relaxation values than the silk secreted by the domesticated silkworm B. mori, which makes it more favorable in some applications [12,28,56]. Natural silks are broadly categorized as mulberry (obtained from cocoons of B. mori L.) and non-mulberry (tropical tasar, temperate tasar, eri, muga, and anaphe). Tasar accounts for about 95% of worldwide non-mulberry silk production. Other varieties (such as fagara, coan, mussel, and spider silks) are not used for commercial production [45]. Tasar silk fiber has its own distinctive color, is coarse to the touch, and shows higher elongation, elasticity, and stress-relaxation values than mulberry silk fiber. These properties have made tasar silk as capable and attractive as mulberry silk. B. mori has four discrete life-cycle stages, of which only the larval stage (1st to 5th instars) is a feeding period. The remarkable morphological changes from larva to adult occur in the pupa through a finely regulated metabolism that consists of the degradation, remodeling, and neogenesis of tissues [22]. Reeling is the important process of drawing silk thread from the cocoon spun by the silkworm [45]. Demand for tasar silk is growing steadily because of its luster, strength, and copper-brown color.
In India, tasar silk production has long ranked next to mulberry silk, constituting about 4% of total silk production. Cocoon cooking involves boiling the cocoon in water, which releases sericin protein and frees a continuous silk filament that is reeled into a thick silk thread [9]. After degumming, mulberry silk is soft, white, and lustrous, whereas non-mulberry silk is irregular, coarser, and brownish in color. In industry, cocoons are degummed chemically using agents such as soda, soap, H2O2, alkaline solution, and alkali. Chemical treatment affects both sericin and fibroin and thus alters properties of tasar silk such as natural color and softness [51]. Therefore, enzyme-based cocoon degumming is expected to help retain the natural color and softness of tasar silk. Enzymatic methods also have advantages over chemical methods, as they are cheap, eco-friendly, and improve silk quality [17].
A couple of decades ago, it was recognized that enzyme-based degumming of silk cocoons needed to be established because it yields silk yarn with good texture and improved gloss. For enzyme-based degumming, mainly papain, trypsin, and bacterial enzymes were used [31]. Trypsin, a proteolytic enzyme secreted by the pancreas, catalyzes hydrolysis of the peptide bond between the carboxyl group of lysine or arginine and the amino group of the adjacent amino acid. Trypsin is most active at 37°C and in the pH range of 7-8. Sericin is a polar, less crystalline protein with a comparatively high lysine and arginine content and is therefore effectively hydrolyzed by trypsin, whereas fibroin is not affected by trypsin because of the lower quantity of arginine and lysine in its structure [25]. Enzymatic degumming uses proteolytic enzymes (such as papain, bromelain, trypsin, alcalase, and other proteases) that hydrolyze the peptide bonds of sericin and degrade it without disturbing fibroin [18].
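The trypsin specificity described above can be illustrated with a minimal sketch. It assumes only the simple rule stated in the text, that the bond on the carboxyl side of lysine (K) or arginine (R) is cut, and ignores any further sequence-context effects. The function names and the two toy peptides are hypothetical and are not real sericin or fibroin sequences.

```python
def tryptic_sites(sequence):
    """Return 1-based positions of residues after which a cut is assumed.

    Assumption: cleavage occurs after every K or R except the last residue,
    because there is no peptide bond beyond the C-terminus.
    """
    seq = sequence.upper()
    return [i + 1 for i, aa in enumerate(seq[:-1]) if aa in "KR"]


def digest(sequence):
    """Split a sequence into the fragments implied by tryptic_sites()."""
    seq = sequence.upper()
    fragments, start = [], 0
    for pos in tryptic_sites(seq):
        fragments.append(seq[start:pos])
        start = pos
    fragments.append(seq[start:])
    return fragments


# Hypothetical toy peptides: one rich in K/R, one with none.
lys_arg_rich = "SSKGSRSTKDSR"
gly_ala_rich = "GAGAGSGAGAGS"

print(digest(lys_arg_rich))  # several short fragments
print(digest(gly_ala_rich))  # a single uncut fragment
```

Under this rule, a peptide rich in lysine and arginine is cut into many fragments, while a peptide lacking both residues stays intact, which mirrors the contrast the text draws between sericin and fibroin.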
Cocoonases [Enzyme Commission (EC) number: 3.4.21.4] are sericin proteinases secreted by a few sericigenous insects; they soften the end of the silk cocoon and allow the adult moth to escape [34]. Cocoonase is a proteolytic enzyme produced by the silk moth during the pupal-adult transformation. Its main function is to digest the sericin protein at the anterior portion of the cocoon. Cocoonase is synthesized and stored in the maxillary galeae of the silk insect as prococoonase [8,19,36,38]. An SDS-PAGE-based study of freshly collected cocoonase showed a molecular weight of 25-26 kDa [52].
Computational phylogenetic analysis of cocoonase using Molecular Evolutionary Genetics Analysis (MEGA) 5.1 Beta4 software showed the existence of a conserved domain in cocoonase [43]. The current and future prospects of cocoonase, including its possible role in the tasar industry, have been discussed in detail [51]. Modern biotechnological and molecular biology tools have made it easier to obtain detailed information about the genes and genome of any organism. However, detailed gene information and the whole-genome sequence of the tasar silkworm A. mylitta are not yet available. A computational study using all available expressed sequence tags (ESTs) to predict microRNAs (miRNAs) and single nucleotide polymorphisms (SNPs) in A. mylitta has been reported [16].
Detailed information on the sequence characteristics, evolutionary relationships, structures, validation, and interactions of cocoonase and sericin proteins in A. mylitta is not available. Therefore, a holistic study on cocoonase and sericin using computational and experimental approaches is of great interest. In the present study, computational analysis was carried out using B. mori sequences available online. In addition, cocoonase collection and purification, degumming of the silk cocoon shell, and the microscopic differences between enzyme- and chemical-treated cocoon sheets, characterized using a scanning electron microscope (SEM) and optical coherence tomography (OCT), are described.

BLAST
BLAST (Basic Local Alignment Search Tool) is a sequence similarity search program. The cocoonase (GenBank ID: BAJ46146.1) and sericin (GenBank ID: BAD00699.1) protein sequences retrieved from NCBI were searched for similar sequences in NCBI using BLASTP. BLASTP was run with each query sequence to find the template sequences with the highest similarity. Algorithm was set at 250, and only cocoonase and sericin protein sequences of different species were selected.

Phylogenetic tree
The phylogenetic tree was constructed to establish the evolutionary relationship between the template and the sequences retrieved after BLAST analysis. Molecular Evolutionary Genetics Analysis (MEGA) X, an integrated tool for conducting automatic and manual sequence alignment, inferring phylogenetic trees, and mining web-based databases (https://www.megasoftware.net/), was used to construct the tree [39]. Figtree, which is designed as a graphical viewer of phylogenetic trees and as a program for producing publication-ready figures (http://tree.bio.ed.ac.uk/software/figtree/), was used to color the different species in the phylogenetic tree.

SOPMA
SOPMA (Self-Optimized Prediction Method with Alignment) is widely used to analyze the secondary structure of proteins. The sequence length and secondary structure of cocoonase (BAJ46146.1) and sericin (BAD00699.1) proteins were predicted by SOPMA, available at http://npsa-pbil.ibcp.fr/cgi-bin/npsa_automat.pl?page=/NPSA/npsa_sopma.html [23]. SOPMA improves prediction by taking into account information from an alignment of sequences belonging to the same family.

Prediction of ligand-binding sites
Initially, the I-TASSER model was submitted to the COACH algorithm available at https://zhanglab.ccmb.med.umich.edu/COACH/, which produces ligand-binding site predictions by matching the target models with the proteins in the BioLiP database [71,72].

Prediction of Enzyme Commission (EC) numbers and active sites
Predictions of the Enzyme Commission number and active site were generated by COFACTOR through local and global structural comparisons of the I-TASSER models with known proteins in the BioLiP structure-function database (https://zhanglab.ccmb.med.umich.edu/COFACTOR/help.html) [75].

Prediction of normalized B-factor
The B-factor (also known as the temperature factor) is routinely used in X-ray crystallography to indicate the extent of atomic motion. Here, the normalized B-factor was predicted by ResQ [73] using a combination of template-based assignment and machine-learning-based prediction, which employs sequence profiles and predicted structural features.
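As a rough illustration of what "normalized" means here, the sketch below assumes the common z-score convention, B' = (B − mean) / standard deviation, computed over all residues of one chain. This convention, the function name, and the raw values are assumptions made for illustration; they are not ResQ output and are not taken from the present study.

```python
from statistics import mean, pstdev


def normalize_b_factors(b_factors):
    """Z-score normalization of per-residue B-factors (assumed convention)."""
    mu = mean(b_factors)
    sigma = pstdev(b_factors)  # population standard deviation over the chain
    if sigma == 0:
        raise ValueError("all B-factors are identical; z-score is undefined")
    return [(b - mu) / sigma for b in b_factors]


# Invented raw B-factors for four residues.
raw = [10.0, 20.0, 30.0, 40.0]
normalized = normalize_b_factors(raw)

for b, z in zip(raw, normalized):
    label = "more flexible than average" if z > 0 else "more rigid than average"
    print(f"B = {b:5.1f}  normalized = {z:+.2f}  ({label})")
```

With this convention, values above zero mark residues that are more mobile than the chain average and values below zero mark more rigid residues.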

Ramachandran plot by PROCHECK
The Ramachandran plot is used for the validation of tertiary structure. Ramachandran plots were prepared for cocoonase and sericin, and PROCHECK (https://www.ebi.ac.uk/thornton-srv/software/PROCHECK/) was used to validate the tertiary structure and the "stereochemical quality" of each protein structure.

UCSF Chimera
Chimera is organized into a core that offers elementary services and visualization, and extensions that provide most higher-level functionality [54]. Chimera is freely accessible to academic and nonprofit researchers and is available at https://www.cgl.ucsf.edu/chimera/. University of California, San Francisco (UCSF) Chimera was used to visualize the 3D structures of cocoonase and sericin proteins and to generate good-quality images of the protein models [49].

Conserved Domain
The Conserved Domain Database records the location of functional motifs on protein domain models, so that these motifs can be mapped onto protein sequences and facilitate the interpretation of sequence conservation and variation in active sites, chemical binding sites, and protein-protein interaction sites. Cocoonase (BAJ46146.1) and sericin (BAD00699.1) proteins were used to predict the presence of conserved domains using the resource available at https://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml [47,48].

Fig. 1 The phylogenetic tree of cocoonase (a) and sericin (b) that was constructed using MEGA X [39]. The evolutionary history was inferred by using the Maximum Likelihood method and JTT matrix-based model [32].

Networking of proteins
A functional interaction network of cocoonase (BAJ46146.1) and sericin (NP_001037329.1, SGF1 - Silk gland factor 1; regulates the transcription of the sericin-1 gene via interaction with the SA site) proteins was constructed from the protein sequences using the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) 10 software [21,66].

Recognition of specific stage of pupae
The specific pupal stage suitable for maximum cocoonase collection was identified. These pupae were kept for adult emergence and monitored for changes in integument color from natural red-brownish to black [53].

Collection of proteolytic enzyme cocoonase
Freshly pierced cocoons were taken, and before emergence, pupae were transferred to cocoonase collection set-up for cocoonase secretion and collection [53]. Briefly, pupae were kept in a funnel that facilitates collection of secreted cocoonase in a small test tube embedded in ice to keep the collected cocoonase at lower temperature.

Degumming activity assessment of cocoonase
Degumming activity of cocoonase was assessed using the method described previously by Wang and Guo [68], with modifications. For this analysis, silk cocoon sheets (each 30 mg in weight) were dried in a hot air oven to remove the moisture completely.
Degumming activity was studied in three different test tubes: (a) control, silk cocoon sheet immersed in 10 ml of Tris-HCl (pH 8.0) buffer; (b) cocoonase enzyme degumming, silk cocoon sheet immersed in 0.2 ml of cocoonase enzyme + 10 ml of Tris-HCl (pH 8.0) buffer; and (c) Na2CO3-based alkaline degumming, silk cocoon sheet immersed in 0.05% Na2CO3. All three test tubes were incubated at 42°C for 1 h with agitation. The silk cocoon sheets were rinsed with warm water and then three times with distilled water. All three treated cocoon sheets were dried for 1 day at ambient temperature and then at 70°C for 1 h. Subsequently, the dry weight was measured using an analytical balance. The experiment was performed with three independent replicates, and the obtained values were averaged.
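The weight-loss calculation implied by this protocol can be sketched as follows. The sketch assumes that degumming is expressed as percentage loss of dry weight relative to the initial 30 mg sheet and that replicate values are averaged arithmetically. The function names and the replicate weights are placeholders, not measurements from this study.

```python
from statistics import mean


def degumming_percent(initial_mg, final_mg):
    """Percentage dry-weight loss of one cocoon sheet after treatment."""
    if initial_mg <= 0:
        raise ValueError("initial weight must be positive")
    return (initial_mg - final_mg) / initial_mg * 100.0


def mean_degumming_percent(initial_mg, final_weights_mg):
    """Average the percentage weight loss over independent replicates."""
    losses = [degumming_percent(initial_mg, w) for w in final_weights_mg]
    return mean(losses)


# Placeholder dry weights (mg) after treatment for three replicates.
INITIAL_MG = 30.0
example_final_weights = [27.0, 27.3, 26.7]

result = mean_degumming_percent(INITIAL_MG, example_final_weights)
print(f"mean degumming: {result:.1f}% weight loss")
```

The same two functions can be applied separately to the control, cocoonase, and Na2CO3 tubes so that the three treatments are compared on one scale.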

Cocoonase quantification and purification
Protein was estimated according to Bradford's protein assay method [10]. The protein concentration of the sample was determined from a standard curve drawn using bovine serum albumin as a standard. Crude cocoonase was purified using a Sephadex G-25 column.
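Reading a concentration off a standard curve can be sketched as below. The sketch assumes a linear relationship between absorbance and bovine serum albumin concentration over the working range and uses an ordinary least-squares fit. The function names and all numbers are invented for illustration and are not data from this study.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def concentration_from_absorbance(absorbance, slope, intercept):
    """Invert the standard curve to estimate protein concentration."""
    return (absorbance - intercept) / slope


# Invented standards: BSA concentration (mg/ml) and measured absorbance.
bsa_mg_per_ml = [0.0, 0.25, 0.5, 0.75, 1.0]
absorbance = [0.05, 0.30, 0.55, 0.80, 1.05]

slope, intercept = fit_line(bsa_mg_per_ml, absorbance)
unknown = concentration_from_absorbance(0.45, slope, intercept)
print(f"estimated concentration: {unknown:.2f} mg/ml")
```

Samples whose absorbance falls outside the range of the standards would normally be diluted and measured again instead of being extrapolated.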

SDS-PAGE analysis
Sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE) was carried out according to the procedure of Laemmli [40] with slight modification. A 6% stacking gel (pH 6.8) was used with 10%, 12%, and 15% resolving gels (pH 8.8). Tris-glycine with 0.1% SDS (pH 8.6) was used as the running buffer. A total of 10 μl of cocoonase was boiled for 10 min with an equal volume of 1X protein loading buffer. After boiling, the protein samples were immediately chilled on ice for 10 min. The samples were loaded on the gel, and the resolved proteins were visualized by Coomassie blue staining as per the standardized protocol.

Scanning electron microscopy observation of silk filaments
A silk cocoon sheet degummed only in Tris-HCl buffer (control), a sheet obtained after degumming with cocoonase enzyme, and a sheet treated with Na2CO3 were used for morphological observation with a scanning electron microscope (SEM). Comparative morphological analysis of the silk sheets subjected to the different treatments was performed.

Optical coherence tomography-based analysis
Silk cocoon sheets treated only with buffer (control), with cocoonase enzyme, and with chemical were imaged with an OCT system [58]. For this analysis, all three test tubes (control, treated with cocoonase enzyme, treated with chemical) containing cocoon sheets were incubated at 42°C for 1 h only. After treatment, the silk sheets were rinsed with warm water and then three times with distilled water. All three sheets were dried for 1 day at ambient temperature and then at 70°C for 1 h. Subsequently, these treated and dried sheets were used for image acquisition, which was carried out the day after the treatment. To minimize the moisture content, the treated cocoon sheets were stored in an incubator at 37°C. Finally, the sheets were mounted on microscope slides for recording observations and comparative analysis using the OCT-inbuilt software.

Results
The sequences of cocoonase (BAJ46146.1) and sericin 1A' (BAD00699.1) proteins from Bombyx mori, with query lengths of 227 and 722 amino acids, respectively, were retrieved from NCBI. Phylogenetic analysis of all the retrieved protein sequences showed the evolutionary relationship among different species. The evolutionary relationship was inferred using the Maximum Likelihood method and the JTT matrix-based model. The different cocoonases of B. mori showed substantial similarity to each other compared with cocoonases from other species. The various sericin proteins of B. mori likewise showed noteworthy similarity to each other compared with sericins of other species. A total of 32 amino acid sequences of cocoonase and 23 amino acid sequences of sericin protein were found and considered for analysis (Fig. 1 a and b).
I-TASSER produces a full-length protein model by excising continuous fragments from threading alignments and then reassembling them using replica-exchange Monte Carlo simulations. The models are colored using a rainbow scheme, with the N-terminus of the protein colored blue and the C-terminus colored red. α-Helices are shown in red, 3₁₀-helices in blue, β-sheets as yellow arrows, turns in cyan, and coils in purple (Fig. 3 a and b). After the structure assembly simulation, I-TASSER uses the TM-align structural alignment program to match the first I-TASSER model against all structures in the Protein Data Bank (PDB) library. This section reports the top 10 proteins from the PDB with the closest structural similarity, i.e., the highest TM-score, to the predicted I-TASSER model. The predicted tertiary structure of cocoonase had a C-score of 1.19, an estimated TM-score of 0.88 ± 0.07, and an estimated RMSD of 3.2 ± 2.3 Å, while sericin had a C-score of −0.53, an estimated TM-score of 0.65 ± 0.13, and an estimated RMSD of 3.2 ± 2.3 Å (Fig. 3 a and b).
ResQ-based local accuracy estimation for the first model predicted by I-TASSER for cocoonase and sericin proteins was also analyzed. The result showed that the majority of residues in the models were modeled accurately, with an estimated distance to native below 2 Å for cocoonase and below 6 Å for sericin (Fig. 4 a and b).
Biological annotations of the target proteins based on the I-TASSER structure prediction were studied; COFACTOR deduces protein functions (such as ligand-binding sites and EC number) using structure comparison and protein-protein networks.
The functional templates of cocoonase (PDB ID: 5jbcS) and sericin (PDB ID: 3w3lA) were predicted with confidence scores (C-scores) of 0.95 and 0.09, respectively, along with the cluster size (i.e., total number of templates in a cluster), ligand name, and ligand-binding site residues (Fig. 6 a and b). COFACTOR-based protein function prediction using structure, sequence, and protein-protein interaction properties was also used to find the Enzyme Commission (EC) number and ligand-binding sites [77]. COFACTOR-based analysis of cocoonase predicted a template with PDB ID: 1z8gA and EC number 3.4.21.106 (a hepsin belonging to peptidase family S1A). The predicted active-site residues were 45, 88, 180, 182, 183, and 197 (shown as colored ball-and-sticks), with a C-score of 0.75 indicating a reliable EC number prediction (Fig. 7a). Similarly, for sericin, a template with PDB ID: 3h09B and EC number 3.4.21.72 (an IgA-specific serine endopeptidase belonging to peptidase family S1A) was predicted. The predicted active-site residues were 74 and 130 (shown as colored ball-and-sticks), with a C-score of 0.153, which was likewise taken to indicate a reliable EC number prediction (Fig. 7b). The normalized B-factor (also known as the B-factor profile, BFP) was predicted using a combination of template-based assignment and profile-based prediction, where residues with BFP values higher than 0 were less stable in experimental structures [73].

Fig. 6 The predicted ligand-binding sites in cocoonase (a) and sericin (b). The first functional template (PDB ID: 5jbcS) has a high confidence score (C-score = 0.95) for cocoonase, indicating that the structure is stable and functional. The template (PDB ID: 3w3lA) for sericin has a C-score of 0.09, which also indicates a stable structure that will bind a peptide ligand. Besides the predicted peptide, the protein may also bind other ligands, which are available in a PDB file at the "Mult" link.
In the present study, the predicted normalized B-factors of cocoonase for the helix and strand regions were negative or close to zero (Fig. 8a). On the other hand, the predicted normalized B-factors of sericin for the helix and strand regions were close to zero (Fig. 8b). PROCHECK assesses the stereochemical quality of a protein structure by analyzing residue-by-residue geometry and overall structure geometry, checking the compatibility of an atomic (3D) model with its own amino acid sequence. Ramachandran plot outputs (modified from PROCHECK) of cocoonase and sericin were obtained (Fig. 9 a and b). The red regions designate the most allowed regions, while the yellow, light yellow, and white fields designate the additional allowed, generously allowed, and disallowed regions, respectively. The Ramachandran plot revealed that 68.2% of the amino acid residues of cocoonase and 53.7% of the amino acid residues of sericin fell within the most favored region (Fig. 9 a and b). To assess the geometric correctness of the theoretical structures, PROCHECK [42] was used to check the residue-by-residue stereochemical quality of cocoonase and sericin. The plots for cocoonase (Fig. 10a) and sericin (Fig. 10b) show graphs of five main-chain properties of their structures. In each graph, the dark band represents the results from well-refined structures; the central line is a least-squares fit to the mean trend as a function of resolution, while the width of the band on either side of it corresponds to a variation of one standard deviation about the mean [4].
In the present study, Ramachandran plot quality, measured as the percentage of the protein's residues in the most favored or core regions, is shown in (a); planarity of the peptide bond, measured as the standard deviation of the ω torsion angles, in (b); the number of bad contacts per 100 residues in (c); tetrahedral distortion, measured as the standard deviation of the zeta torsion angle, in (d); and the standard deviation of the main-chain hydrogen-bond energies, calculated using the method of Kabsch and Sander [33], in (e).
The hydrophobic and hydrophilic regions, individual atoms, and hydrogen bonds of the I-TASSER-predicted cocoonase and sericin models were visualized with UCSF Chimera. Here, blue indicates the hydrophilic part of the protein while orange indicates the hydrophobic part. Higher positive values correspond to more hydrophobic residues, and negative values correspond to hydrophilic residues. Residues lacking a Kyte-Doolittle hydrophobicity value (i.e., components that are not amino acids, such as ligands in the structure) are shown in the no-value color (Fig. 11 a and b). UCSF Chimera was also used to display every atom of the cocoonase and sericin proteins (Fig. 11 c and d) and the hydrogen bonds in the protein models of cocoonase and sericin (Fig. 11 e and f).
Domains are related to protein structure, and therefore, prediction of domains may be useful in inferring protein function. Considering their importance, conserved domains were predicted in cocoonase and sericin proteins. The result indicated that the conserved domain of B. mori cocoonase (Fig. 12a) was a trypsin-like serine protease with an active site spanning positions 45 to 180 of the query sequence and a substrate-binding site spanning positions 175 to 200, while no conserved domain was found for sericin in NCBI (Fig. 12b).
To identify other proteins that might interact with cocoonase and sericin in performing specific functions, and to predict their association in other biological events through a protein-protein interaction network, a STRING database-based analysis was performed. A functional interaction network of cocoonase (BAJ46146.1) protein was obtained (Fig. 13a). However, no functional interaction network of sericin (BAD00699.1) protein with other proteins was predicted by STRING v10. Therefore, another sericin-related protein from Bombyx mori (NP_001037329.1, SGF1 - Silk gland factor 1; regulates the transcription of the sericin-1 gene via interaction with the SA site) was used for STRING v10 analysis, which indicated its interaction with other proteins.
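The Kyte-Doolittle hydrophobicity values used for the blue-to-orange coloring of the Chimera models can be illustrated with a short sketch. The scale values are the published Kyte-Doolittle hydropathy indices; the function names and the toy sequences are hypothetical, and the sketch only looks up values without reproducing Chimera's rendering.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def per_residue_hydropathy(sequence):
    """Return one value per residue; None where no value is defined."""
    return [KYTE_DOOLITTLE.get(aa) for aa in sequence.upper()]


def mean_hydropathy(sequence):
    """Average hydropathy over residues that have a defined value."""
    values = [v for v in per_residue_hydropathy(sequence) if v is not None]
    if not values:
        raise ValueError("no standard amino acids in sequence")
    return sum(values) / len(values)


# Hypothetical toy peptides, not taken from cocoonase or sericin.
print(per_residue_hydropathy("AKX"))  # X has no value, like a non-amino-acid
print(mean_hydropathy("ILV") > 0)     # hydrophobic stretch
print(mean_hydropathy("KRDE") < 0)    # hydrophilic stretch
```

Positive values correspond to the hydrophobic (orange) end of the coloring and negative values to the hydrophilic (blue) end, while entries without a value correspond to the no-value color described in the text.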
To isolate and collect the maximum amount of cocoonase, it was important to find the most suitable stage at which the maximum amount of enzyme is released. Selection of pupae for cocoonase collection was based on the change in integument color, which turns dark black at the time of metamorphosis, as well as on the softening of pupal tissues (Fig. 14a). Cocoonase is a proteolytic enzyme that is secreted by several sericigenous insects, including A. mylitta, during emergence. An emerging adult exudes around 500-850 μl of cocoonase gradually, drop by drop, and this release process continues for up to 2-4 h (Fig. 14b).
Cocoonase activity was assessed by comparing the weights of silk cocoon sheets treated with buffer (control), with cocoonase (enzyme degumming), and with Na2CO3 (chemical) (Table 1). The degumming percentage was expressed as the decrease in cocoon sheet weight after each treatment. The result indicated that chemical-based degumming produced the greatest degumming effect (cocoon sheet weight loss) compared with degumming with cocoonase.
SDS-PAGE of the collected cocoonase showed several proteins, with molecular weights of 29 kDa, 25-26 kDa, and 17 kDa. However, after Sephadex G-25 column-based purification, cocoonase showed a molecular weight of 25-26 kDa (Fig. 15).
It was also pertinent to examine the changes in structural features of silk sheets treated with buffer, enzyme, and chemical. SEM analysis showed marked variation in the fiber surface of silk cocoon sheets softened using buffer, cocoonase enzyme, and Na2CO3. The silk cocoon sheet obtained after treating/cooking with buffer only (Fig. 16a) was compared with the silk cocoon sheets treated with cocoonase enzyme and with Na2CO3. The result revealed that the silk cocoon sheet treated with cocoonase enzyme (Fig. 16b) retained the natural color, smoothness, and luster of tasar silk better than the cocoon sheet treated with Na2CO3 (Fig. 16c).
Fig. 9 Ramachandran plot output (modified from PROCHECK) of cocoonase (a) and sericin (b). The plot calculations were computed by the PROCHECK server. The red regions in the graph indicate the most allowed regions; additional allowed, generously allowed, and disallowed regions are indicated as yellow, light yellow, and white fields, respectively. The Ramachandran plot for cocoonase showed 68.2% of amino acid residues within the most favored region. Similarly, the Ramachandran plot for sericin showed 53.7% of amino acid residues within the most favored region.

It is of further interest to examine the microstructural changes in silk sheets subjected to chemical- and enzyme-based treatment. To compare the effects of treatment with buffer only, cocoonase enzyme, and chemical at different stages, normalized depth profiles were plotted. The treatment effect was studied mainly using optical coherence tomography (OCT) image analysis, while comparison of morphology was performed using histological images obtained after treatment of the silk cocoon sheet with buffer only, cocoonase enzyme, and Na2CO3. OCT B-scan images of the control silk cocoon sheet (Fig. 17a), the cocoonase-treated cocoon sheet (Fig. 17b), and the Na2CO3-treated sheet (Fig. 17c) were obtained, where the zoomed red dotted rectangular box shows the region of interest (ROI). Figure 17d presents the A-scan image, which displays the different internal layers as peaks. This A-scan image shows the depth attenuation profile of the control silk sheet. The thickness of the silk sheet in Fig. 17d was lower in the control condition than in the treated conditions. The different peaks of the A-scan image represent different layers of the silk cocoon sheet. The A-scan profile in Fig. 17e showed that both the penetration depth and the thickness of the silk sheet increased after cocoonase treatment compared with the control. Cocoonase treatment also led to a higher number of peaks with higher contrast in the A-scan image. The A-scan profile in Fig. 17f showed increased thickness as well as higher contrast.
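The averaged, normalized A-scan profiles described above can be illustrated with a minimal sketch. It assumes a B-scan stored as a list of rows (one row per depth pixel, one value per lateral position); the function names, the ROI column indices, the peak threshold, and the synthetic two-layer image are all hypothetical and are not taken from the processing pipeline used in this study.

```python
import math

def averaged_a_scan(b_scan, col_start, col_stop):
    """Average the A-scans (columns) inside an ROI, then rescale to [0, 1]."""
    width = col_stop - col_start
    # One mean intensity per depth pixel across the selected lateral columns.
    profile = [sum(row[col_start:col_stop]) / width for row in b_scan]
    lo, hi = min(profile), max(profile)
    if hi == lo:
        return [0.0] * len(profile)
    return [(v - lo) / (hi - lo) for v in profile]

def count_peaks(profile, min_height=0.2):
    """Count strict local maxima at or above min_height (assumed threshold)."""
    return sum(
        1
        for i in range(1, len(profile) - 1)
        if profile[i] > profile[i - 1]
        and profile[i] > profile[i + 1]
        and profile[i] >= min_height
    )

# Synthetic B-scan (hypothetical data): two reflective layers at depths 20 and 55.
layer = [
    math.exp(-((d - 20) ** 2) / 8.0) + 0.6 * math.exp(-((d - 55) ** 2) / 8.0)
    for d in range(100)
]
b_scan = [[value] * 50 for value in layer]   # 100 depth pixels x 50 columns

profile = averaged_a_scan(b_scan, 10, 40)
n_peaks = count_peaks(profile)
```

Comparing `n_peaks` and the depth extent of `profile` between control and treated sheets mirrors, in simplified form, the peak-number and thickness comparison made from Fig. 17d-f.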

Discussion
Resemblances and variations among related biological sequences obtained by sequence alignment are represented in the form of phylogenetic trees. A phylogenetic tree, or phylogeny, is an illustration that depicts the lines of evolutionary descent of different species, organisms, or genes from a common ancestor [7]. In the present study, phylogenetic analysis of cocoonase and sericin proteins revealed that these sequences were evolutionarily more conserved in B. mori than the cocoonase and sericin sequences of other species (Fig. 1a and b). Secondary structure information is important for understanding how a protein folds into its stable three-dimensional (tertiary) structure, and predicting protein secondary structure from sequence is considered an intermediate stage bridging the gap between primary sequence and tertiary structure prediction [81]. The predicted secondary structures (α helix, β turn, extended strand, and random coil) of cocoonase and sericin are listed (Fig. 2a and b and Supplementary Figures 1a and b). Accurate 8-state secondary structure prediction can give more precise, higher-resolution structure-based property analysis, and a method for accurate prediction of 8-state protein secondary structures using a novel deep learning architecture has been described [81]. An earlier phylogenetic analysis revealed that B. mandarina and B. mori cocoonase mRNA sequences were closely related, while the A. pernyi cocoonase mRNA sequence showed little variation [43]. The I-TASSER server is an online workbench for high-resolution modeling of protein structure and function [63]. I-TASSER-based 3D structure prediction of cocoonase and sericin showed α-helix (red), 3₁₀ helix (blue), beta sheets (yellow), turns (cyan), and coil (purple) (Fig. 3a and b). The predicted models had C-scores > −1.5, indicating that they have correct global topology. 
Functional characterization of A. pernyi cocoonase protein by predicting its 3D structure using I-TASSER has been reported [43]. ResQ is a model quality assessment program for local structure quality estimation and is used to assess the accuracy of structure models generated by I-TASSER and other structure prediction methods [74]. ResQ-based assessment of the first I-TASSER model for cocoonase and sericin revealed that the majority of residues in the models were modeled accurately (Fig. 4a and b). To find structural analogs of the query proteins [75], the first I-TASSER model was aligned against the PDB library using TM-align [80]. The top 10 PDB proteins structurally closest to cocoonase (1eaxA, 1fiwA, 1fizA, 3w94A, 2f91A, 1ekbB, 1bmaA, 1bruP, 1z8gA, 2anyA) and sericin (5n8pA, 5gr8A, 5hyxB, 5gijB, 2a0zA, 4ecnA, 3cigA, 4mn8A, 6gffI, 4lxrA) are shown in Fig. 5a and b. The structural alignments between the query and the 10 closest proteins were ranked by TM-score [79]. COFACTOR, a protein function prediction web server, uses the 3D structural information of proteins to predict EC numbers and ligand-binding sites [62,77]. Ligand-binding site residues and functional templates for cocoonase and sericin were predicted with C-scores of 0.95 and 0.09, respectively (Fig. 6a and b); the C-score ranges from 0 to 1, and a higher value indicates a more reliable prediction, so the cocoonase prediction is considerably more reliable than that for sericin. For cocoonase, the template PDB ID 1z8gA with EC number 3.4.21.106 (a hepsin belonging to peptidase family S1A), active-site residues (45, 88, 180, 182, 183, and 197), and a C-score of 0.75 were predicted (Fig. 7a). 
Similarly, for sericin, the template PDB ID 3h09B with EC number 3.4.21.72 (an IgA-specific serine endopeptidase belonging to peptidase family S1A), active-site residues (74 and 130), and a C-score of 0.153 were predicted (Fig. 7b). The EC C-score is the confidence score for the EC number prediction; its values range between 0 and 1, where a higher score indicates a more reliable EC number prediction [62]. DEEPre, a sequence-based deep learning method for enzyme EC number prediction that can capture the functional differences of enzyme isoforms, has also been described [44]. The B-factor is a value that indicates the extent of the inherent thermal mobility of residues/atoms in proteins [75]. The predicted normalized B-factors of cocoonase for the helix and strand regions were negative or close to zero (Fig. 8a), indicating that these regions are structurally more stable. On the other hand, the predicted normalized B-factors of sericin for the helix and strand regions were close to zero (Fig. 8b), suggesting that these regions are structurally less stable. Aldose reductase and its analogs provide a good experimental set of structures for exploring the importance of B-factor-based analysis [5]. The Ramachandran plot is an important tool for the analysis of protein structures [27]. Ramachandran plots have been used to validate protein three-dimensional structures determined using crystallographic methods, NMR spectroscopy, or computational modeling techniques [11]. Also, the ϕ and ψ torsion angles in a blocked monopeptide have played a central role in understanding protein structure [60]. Ramachandran plots generated using PROCHECK for cocoonase (Fig. 9a) and sericin (Fig. 9b) confirmed the good quality of the models. The six graphs of main-chain parameters for cocoonase (Fig. 10a) and sericin (Fig. 10b) show how each structure (represented by a solid square) compares with well-refined structures at a similar resolution. 
Similarly, a predictive study of the six graphs of main-chain parameters (namely Ramachandran plot quality, peptide bond planarity, inappropriate non-bonded interactions, C alpha tetrahedral distortion, and main-chain hydrogen bond energy) for the HIV-1 virion infectivity factor (Vif) has been carried out [4]. For the stereochemical quality of protein structures, the trend depends on resolution in some cases, while in other cases it remains independent of it [41]. UCSF Chimera was used to visualize the hydrophobic and hydrophilic regions, single atoms, and hydrogen bonds in the I-TASSER protein models of cocoonase and sericin (Fig. 11a-f). A UCSF Chimera tool, RRDistMaps, has been developed to compute generalized maps for analyzing pairwise variations in intramolecular contacts. RRDistMaps provides an interactive utility to visualize conformational changes, both local (binding-site residues) and global (hinge motion), between unbound and bound proteins through distance patterns [13]. Another tool, ConEVA, available as a web application and as a download, is useful for a range of contact-related analyses and evaluations, including predicted contact comparison, investigation of individual protein folding using predicted contacts, and analysis of contacts in varying structures [1].
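The Ramachandran analysis discussed above rests on the backbone torsion angles ϕ and ψ of each residue. As a minimal sketch of how one such angle can be computed from four atomic positions, the function below uses the standard vector formulation of a dihedral angle. The function name and the toy coordinates are hypothetical; this is not the PROCHECK implementation, and it does not reproduce PROCHECK's region boundaries.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )

def dihedral_deg(p0, p1, p2, p3):
    """Signed torsion angle (degrees) about the p1-p2 bond, in (-180, 180]."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)          # unit vector along the central bond
    # Components of the outer bonds perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))

# For residue i, the backbone angles are defined by these atom quadruples:
#   phi = dihedral_deg(C[i-1], N[i], CA[i], C[i])
#   psi = dihedral_deg(N[i], CA[i], C[i], N[i+1])

# Toy coordinates (hypothetical) forming a right-angle torsion.
angle = dihedral_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
```

Plotting the (ϕ, ψ) pair of every residue and counting how many fall in the most favored regions gives the kind of percentage reported for cocoonase and sericin in Fig. 9.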
Identification of domains in protein sequences is a key step towards structural and functional annotation of proteins [50]. Domains are associated with structures, and their identification has been used to infer protein structure [46,70]. Domain prediction is also helpful in analyses such as comparative analysis of domain families [76], evolution of protein and domain structure and function [20,55], prediction of protein-protein interactions [15,24,35], and identification of the evolutionary relationships of multidomain proteins [65]. A trypsin-like serine protease was predicted as a conserved domain in B. mori cocoonase (Fig. 12a), indicating that cocoonase has proteolytic activity, while no conserved domain was predicted for sericin (Fig. 12b). In the A. pernyi cocoonase mRNA sequence, a common conserved region of trypsin-like serine protease and peptidase S1 domain has been predicted [43]. The co-expression scores in STRING v10 are computed using a revised and improved pipeline [66], making use of all microarray gene expression experiments deposited in the NCBI Gene Expression Omnibus (NCBI GEO) [6]. In the present study, interaction networks of cocoonase (Fig. 13a) and sericin (Fig. 13b) proteins were found, indicating that these proteins have significant interactions. The color saturation of the edges denotes the confidence score of a functional association. The protein-protein interaction network of Litopenaeus vannamei haemocytes has also been reported [26]. STRING-based analysis of the HSP70 interaction network revealed that HSP90AA1, HSF1, HSP90AB1, DNAJB1, DNAJB6, BAG3, LOC783577, DNAJC7, BAG1, and DNAJC2 interact with HSP70 [64].
Identifying the stage at which the maximum amount of cocoonase can be collected is important. Therefore, the appropriate stage was selected, and cocoonase was collected from the pupa using the drop-by-drop method (Fig. 14a and b). Cocoon degumming by chemical treatment resulted in deterioration of silk quality and tensile strength and in the release of relatively more sericin (Table 1). However, cocoonase enzyme-based degumming might have advantages over chemical-based degumming of the cocoon. The degumming effect of various chemicals on Chinese, Bangalore, and Murshidabad silk yarn has been studied, showing the maximum degumming effect with ethylene diamine as compared to Marseille soap, Na2CO3, tartaric acid, and alcalase enzyme [14]. Silk degumming and sericin extraction have also been investigated using 2% anhydrous sodium carbonate, where the degumming loss percentage and the recovery rate of sericin were 26.1% and 75.5%, respectively [68]. A study of the soap-soda-based degumming effect on weight loss, absorbency, bending length, breaking load, elongation at break, and crease recovery using mulberry, muga, tasar, and ericream silk substrates revealed that muga, tasar, and ericream silks required more time and more severe conditions for sericin removal than mulberry silk, indicating that sericin is more strongly embedded in wild silk than in mulberry silk [67]. Silk degumming and sericin extraction from silk fibers have also been studied using enzymatic, high-temperature, and high-pressure methods to compare fiber whiteness, brightness, weight loss, breaking strength, and elongation. The results revealed that the enzymatic process with 8% savinase and the high-temperature process at 1100°C were comparable, suggesting that this might also be used as an alternative method for sericin degumming [2]. SDS-PAGE separation of purified A. mylitta native cocoonase showed a molecular weight of 25-26 kDa (Fig. 15). 
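Degumming loss, as compared across the studies above, is conventionally expressed as the percentage of initial dry weight removed by the treatment. The short sketch below shows that arithmetic under this assumed definition; the function name and the example weights are hypothetical and are not measurements from Table 1.

```python
def degumming_loss_percent(weight_before_g, weight_after_g):
    """Weight lost during degumming as a percentage of the initial dry weight."""
    if weight_before_g <= 0:
        raise ValueError("initial weight must be positive")
    return (weight_before_g - weight_after_g) / weight_before_g * 100.0

# Hypothetical example: a 2.00 g sheet weighing 1.50 g after treatment and drying.
loss = degumming_loss_percent(2.00, 1.50)
```

Applying the same calculation to buffer-only, cocoonase-treated, and Na2CO3-treated sheets puts the three treatments on a common scale.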
Earlier, scanning electron microscopy (SEM) and mechanical testing revealed that the silk sheet of the Antherina suraka cocoon was less compact, with greater thickness and lower tensile strength and stiffness than that of B. mori [57]. SEM-based surface morphology analysis of tasar silk fiber waste protein (sericin) has also been carried out [29]. Recently, an SEM-based study revealed that silk fibers obtained from peptide-treated silkworms were smooth in texture and at least two times thicker than their untreated counterparts [30]. SEM analysis of silk cocoon sheets treated with buffer only (Fig. 16a), cocoonase enzyme (Fig. 16b), and Na2CO3 (Fig. 16c) showed noticeable variations: the cocoonase-treated silk cocoon sheet retained the natural color of tasar silk, smoothness, and luster compared with the sheet treated with Na2CO3. The effect of cocoonase on degumming of silk cocoon, along with elemental analysis and MALDI-TOF-TOF analysis of cocoonase, has been reported, showing a similar degumming effect of cocoonase on silk sheet [53].
Optical microscopy-based analysis of the cocoon sheet was also performed to detect the microstructural correlation between the OCT B-scan and the cocoon sheet cross-section. The present study revealed morphological variations between the control and the cocoon sheet degummed with cocoonase enzyme, in which sericin is released, as observed in the OCT B-scan images (Fig. 17a-c). Sericin is a glue protein that binds to the fibroin protein, and its removal is the target of chemical- and enzyme-based degumming, which leaves fibroin (silk thread) for easy reeling. Observable variations in the peaks of the A-scan images (Fig. 17d-f), indicating release of sericin by chemical and enzyme treatment, were recorded. Degumming with Na2CO3 affected the thermal stability and mechanical properties of silk fibroin membranes, and the Na2CO3 degumming process caused serious damage to the heavy chain of silk fibroin [69]. OCT has been used to obtain 2D images of subsurface structures in infected wheat leaves [58], to study structural changes in rice leaves during senescence [3], and for morphological characterization of rice leaf bulliform and aerenchyma regions [37].

Fig. 17 (caption, continued) ... B-scan of silk cocoon sheet treated with Na2CO3 chemical (c). Image size 6 mm (width) × 0.9 mm (height). Red dotted rectangular boxes show the ROI, which is presented in the middle panel. d, e, f Averaged A-scans of ROIs for control, treatment with cocoonase enzyme, and treatment with Na2CO3 chemical, respectively

Conclusion
Phylogenetic analysis of cocoonase and sericin revealed their evolutionary relationships across different species. Secondary and 3D structure prediction of cocoonase and sericin revealed their structural features, with I-TASSER predicting the most stable structure. Both proteins were searched against the PDB to identify structurally close proteins, ligand-binding sites, and active sites. EC predictions revealed that cocoonase (a hepsin belonging to peptidase family S1A) has EC number 3.4.21.106, while sericin (an IgA-specific serine endopeptidase that also belongs to peptidase family S1A) has EC number 3.4.21.72. Stability analysis by normalized B-factor and Ramachandran plot showed that cocoonase contains residues in the most favored region. UCSF Chimera showed the hydrophobic nature of both proteins and the presence of hydrogen bonds between their atoms. Weight loss induced by natural cocoonase treatment demonstrated its degumming activity. Purification and subsequent SDS-PAGE analysis of cocoonase showed a molecular weight of 25-26 kDa. The silk cocoon sheet obtained after degumming with cocoonase retained its natural color, smoothness, and luster, while the silk sheet treated with chemicals showed comparatively weaker strength. SEM- and OCT-based analysis also revealed morphological variations among control, cocoonase-degummed, and chemically degummed silk cocoon sheets.