In silico modelling and characterization of eight blast resistance proteins in resistant and susceptible rice cultivars

Background Nucleotide-binding site-leucine-rich repeat (NBS-LRR) resistance genes are the largest class of plant resistance genes which play an important role in the plant defense response. These genes are better conserved than others and function as a recognition-based immune system in plants through their encoded proteins. Results Here, we report the effect of Magnaporthe oryzae, the rice blast pathogen inoculation in resistant BR2655 and susceptible HR12 rice cultivars. Transcriptomic profiling was carried out to analyze differential gene expression in these two cultivars. A total of eight NBS-LRR uncharacterized resistance proteins (RP1, RP2, RP3, RP4, RP5, RP6, RP7, and RP8) were selected in these two cultivars for in silico modeling. Modeller 9.22 and SWISS-MODEL servers were used for the homology modeling of eight RPs. ProFunc server was utilized for the prediction of secondary structure and function. The CDvist Web server and Interpro scan server detected the motif and domains in eight RPs. Ramachandran plot of eight RPs confirmed that the modeled structures occupied favorable positions. Conclusions From the present study, computational analysis of these eight RPs may afford insights into their role, function, and valuable resource for studying the intricate details of the plant defense mechanism. Furthermore, the identification of resistance proteins is useful for the development of molecular markers linked to resistance genes. Supplementary Information The online version contains supplementary material available at 10.1186/s43141-020-00076-0.


Background
Rice is the most important food crop in the world and a primary source of food for more than half of the world's population. The causal agent, Magnaporthe oryzae B.C. Couch, has been used for several decades as a model organism for understanding the mechanism underlying the host and fungal pathogen interaction [1]. The most important group of genes that have been used by breeders for disease control is the plant resistance (R) genes. Hence, building up a host resistance gene repertoire is of prime concern [2]. Resistance genes (R) are members of a very large multigene family, and these R genes are distributed throughout the 12 rice chromosomes except for chromosome 3 [3]. The rice plants are protected against the pathogens, upon infection, by inducing defense mechanisms via the induction of hypersensitive response (HR), which occurs via gene-forgene recognition of a pathogen effector and a rice plantencoded resistance (R) protein [4]. The majority of rice blast resistance genes encode proteins that have a putative central nucleotide-binding site (NBS) and carboxy-terminal leucine-rich repeats (LRR). These NBS-LRR proteins are divided into two major classes: the first class has an N-terminal domain that shares homology with the mammalian Toll-interleukin-1-receptor (TIR) domain while the second class encodes an amino-terminal coiled-coil motif (CC-NBS-LRR) [5].
Rice blast resistance genes are usually constitutively expressed in plants [6]. Major blast R genes like Pi-ta, Pi-d3, Pi-b, Pi-k1, and RGA were cloned and studied. These blast R genes encode Nod-like receptor (NLR) family proteins that may directly or indirectly interact with fungal effectors to trigger immunity [7].
Reports on structural studies of proteins encoded by blast resistance genes are scanty. Pita and pi54 are the examples of the characterized proteins, and other than these, relatively little is known about the downstream interacting partners of plant NBS-LRR proteins [5,8].
Crystal structures of mammalian NBS and LRR domains are taken as templates for homology-modeling approaches as complete structures are not available for plant NBS-LRR proteins [9]. NBS-LRR proteins are known to be involved in defense mechanisms in plants. However, other functions carried out by these proteins and their mechanism of action have not been elucidated well. There are many reported protein sequences with functions yet to be experimentally confirmed. These uncharacterized proteins offer a potential for finding numerous applications as biological markers. This can be achieved by using various computational approaches to predict the three-dimensional structure and function of target proteins. Homology modeling is the most accurate method for the structure prediction of uncharacterized protein [10].
In the present study, we modeled eight blast resistance proteins obtained by in silico approach, which are expressed during host-pathogen interaction, predicted their diverse structures, and identified the different domains and binding sites. Structural analysis of these resistance proteins is important for understanding the interaction between Avr effector proteins and resistance proteins which are the hallmark of plant defense mechanism.

Plant materials and inoculation
The seeds of resistant BR2655 and susceptible HR12 rice cultivars are collected from Zonal Agricultural Research Station, V.C. Farm, Mandya. These seeds were surfaces sterilized and grown in a greenhouse in a culture chamber (14 h light/10 h dark at temperature 28 ± 1°C) for 20 days. The rice seedlings were sprayed with conidia at an inoculum concentration of 1 × 10 5 cells per milliliter containing 0.1% Tween-20. Seedlings which were mock inoculated with 0.1% Tween-20 solution served as control. After inoculation, the leaves were collected separately at 24 h intervals and immediately frozen in liquid nitrogen, and then stored at − 80°C.

RNA isolation
Total RNA from rice plant tissues was extracted using the RNeasy Plant Mini Kit (Qiagen, Germany). The quantity and quality of RNA samples were determined using Nanodrop 2000 (Thermo Fisher Scientific) and Agilent 2100 Bioanalyzer (Agilent Technologies). Libraries were prepared using NEBNext® Ultra™ RNA Library Prep Kit for Illumina according to the sample preparation guide. Paired-end sequencing was performed with the TruSeq SBS Kit (Illumina Inc., USA) on Illumina NextSeq 500 (Illumina., USA) (Supplementary Data 2). The sequencing reads were filtered using default parameters for removing the low quality and contaminated reads using readqc analysis. The HQ reads after quality filtering was used for downstream analysis.
Reference-based assembly and differential gene expression The overall work carried out in this study is presented in the flow chart (Supplementary Figure 1). High-quality clean reads were mapped to the rice reference genome RGAP7 (http://rice.plantbiology.msu.edu/) using a reference assembly tool of CLC Genomics Workbench and mapping parameters are presented in (Supplementary  Table 4).
The rice blast resistance genes expressed in BR2655 and HR12 (upregulated > 3) cultivars were selected based on the keywords "Resistance" and "LRR" (Supplementary Table 2 & 3). Eight genes were shortlisted for the structure determination of their encoded proteins (Supplementary Table 1). Out of these, two genes were chosen from BR2655 and HR12 exclusively. The remaining four genes were selected based on their presence in both the cultivars. The transcripts with log 2 fold change ≥ 3 (upregulated genes) and ≤ 3 (downregulated genes) with P value cutoff of ≤ 0.05 were considered as differentially expressed transcripts at a significant level. Eight proteins expressed by these eight genes were considered for further structural characterization.

Amino acid sequence retrieval and analysis
Amino acid sequences of NBS-LRR of eight resistance proteins (RP1, RP2, RP3, RP4, RP5, RP6, RP7, and RP8) were retrieved from the Rice Genome Annotation Project. The amino acid sequences of eight resistance proteins were stored as FASTA format sequence and used for further analysis (Supplementary Data 1). The physical and chemical parameters were determined by using the ExPASy Prot Param. The similarity search was performed against the non-redundant database in protein data bank (PDB), and PDB structures were used to search similar structures to that of eight RP proteins using PSI-BLAST tool [11].

Conserved motif structures and phylogenetic analysis of RP proteins
The amino acid sequences of eight resistance proteins were subjected to domain and motif search by using The CDvist Web server [15] and Interpro scan server [16]. The eight resistant protein sequences were aligned for Multiple Sequence Alignment (http://www.ebi.ac.uk/ Tools/msa/muscle/) to observe the homology sequence alignment among resistance proteins using ClustalW [17]. The phylogenetic analysis was performed to see the evolutionary relationship among reported resistance genes such as Pita, Pid3, Pik2, Pib, Pi54, Pik1, RGA4, and RGA5 and the eight resistance proteins (RP1 to RP8).

Structure and function analysis of RP
The secondary structures of eight resistance proteins were predicted by using the RaptorX (http://raptorx. uchicago.edu) prediction server [18] and ProFunc server [19] which use methods such as fold matching, residue conservation, surface cleft analysis, and functional 3D templates [19].

Differential gene expression analysis
Experiments on disease screening revealed the different stages of resistance between BR2655 and HR12 rice cultivars. BR2655 and HR12 seedlings inoculated with M. oryzae (M036) conidial suspension, showed disease scoring 2 (resistant), and 8 (susceptible) respectively based on the IRRI SES scale.
We observed the difference in gene expression profiling in BR2655 and HR12 rice cultivars during infection by M. oryzae. In total, we obtained 75.8 and 69.7 million raw reads for BR2655 and HR12 rice cultivars, respectively (Supplementary Data 3 and Supplementary Figure  2). We identified 7577 and 4290 differentially expressed genes (DEG) in the resistant line (R) (BR2655) and susceptible line (S) (HR12), respectively. As per the "LRR" keyword search, 22 transcripts, which are upregulated in BR2655 cultivar were shortlisted and with the "Resistance" keyword search, 36 transcripts were enlisted. Correspondingly, the upregulated transcripts were shortlisted in HR 12 cultivar using the same keyword search, and there were 17 and 38 transcripts, respectively.

Amino acid sequence retrieval and analysis
The amino acid sequences of eight RPs were retrieved from the RGAP. The eight RPs were analyzed for amino acid composition by the ExPASy ProtParam tool (Table 1). Leucine was the most frequent amino acid present in the sequence in all eight RPs and the percentage of leucine residues ranged from 11.4 to 15.0%. RP5 was found to have the least percentage of leucine content, i.e., 11.5%, and RP2 showed the highest percentage of leucine content, i.e., 15.0%.
PSI-BLAST analyses of eight RPs were performed against non-redundant protein to determine the protein family, and top 4 best BLAST scores were obtained for each RP ( Table 2). The sequence identity ranged between 57 and 100% in eight RPs, and the query coverage was in the range of 68-100%.

Physico-chemical properties of eight resistance proteins
Physico-chemical parameters like molecular weight, pI, amino acid composition, estimated half-life, and instability index were performed in ExPASy ProtParam. The predicted molecular mass of eight RPs (RP1 to RP8) had 97.8 kDa, 168.3 kDa, 168.2 kDa, 122.7 kDa, 146.2 kDa, 116.4 kDa, 117.4 kDa, and 102.5 kDa, respectively. The isoelectric points (pI) of eight RPs were found to be in the acidic range except for RP3 and RP8, which were slightly basic. The eight RPs showed high aliphatic index values (88.09-103.58), which indicated that the proteins are stable for a broad range of temperatures [22].

Motifs and phylogenetic analysis
The CDvist Web with HMMER3 against Pfam 30.0 and Interpro scan server was performed to identify the domains of eight resistance proteins. NBS-LRR proteins are the plant disease resistance proteins, which share similar sequences and domains [23]. The results of the CDvist analysis revealed that the NB-ARC domain was found in all the eight resistance proteins and the LRR domain was identified in RP1, RP4, RP5, RP6, and RP7 proteins. RX-CC coiled-coil domain was identified in RP4, RP6, RP7, RP8 (Fig. 1a), and most of the NBS-LRR proteins contained some unknown domains, which were symbolized as X. Similarly, an Interpro scan server was used to predict the different domains: NB-ARC and LRR domain architectures were detected in all the eight resistance proteins. RX-CC domains are recognized in RP4, RP6, RP7, and RP8, and P-loop architecture was identified in RP1, RP2, RP3, RP4, RP6, and RP7 ( Table 3).
The amino acid sequences for eight resistance proteins aligned separately to identify their sequence diversity and phylogenetic relationships (Fig. 1b) revealed two distinct clades. RP5 and Pita branched out from RP4 and another cluster formed by RP3 and RP8 and they branched out from Pik2. In the other clade, RP1 and RP2 of resistant cultivar branched out from RP6 and RP7 and clustered out with reported blast resistance proteins.

Protein modeling
SWISS-MODEL servers adopted to model the eight resistance proteins (Table 4) revealed that eight RPs share only 12.84-18.85% sequence identity and 26-65% query coverage with root-mean-square deviation (RMSD) of 1.74 Å-4.50 Å. Further, the top four templates were selected for each RP from the PSI-BLAST program to build the best target model by using Modeller 9.22 server. Template models were taken from Protein Data Bank. The best target models were selected based on the lowest DOPE (discrete optimized protein energy) for each RP in Modeller which led to our secondary structure predictions. These PDB files of eight resistance proteins were visualized by UCSF Chimera software (Fig. 2).

Secondary structure and function prediction
The secondary structure predictions from the ProFunc server showed predominant alpha-helical coiled structures in all eight RPs (Fig. 3). In the RaptorX property prediction of eight RPs showed that 26-41% of residues are involved in the α-helices structure formation, 9-14% residues are arranged in β-strands, and 48-60% of the residues occur as coils. The solvent accesses of exposed, medium, and buried regions of eight RPs were found to be in the range of 26-31%, 38-48%, and 25-30% in RaptorX prediction function.

SuperPose
The root means square deviation (RMSD) that measures the distance between corresponding residues and accurate models should have < 2.0 Å value [27]. RMSD

Structure validation
Ramachandran plot showed the distribution of φ and ψ angle in the eight resistance protein models within the limits (Fig. 5). Ramachandran plot statistics displayed that 647 amino acid residues (83.1%) are in the favored region, 99 amino acid residues (12.7%) are in the additional allowed region, and 24 amino acid residues (3.1%) are in the generously allowed region, while only nine amino acid residues (1.2%) are in the disallowed region in RP1. RP2 showed that 1323 amino acid residues (98.4%) are in the allowed region and 21 amino acid residues (1.6%) are in the disallowed region. RP3 displayed 1223 amino acid residues (98.3%) are in the allowed region and 21 (1.7%) in the disallowed region. RP4 showed 966 amino acid residues (98.2%) are in the allowed and 17 amino acid residues (1.7%) in the disallowed region. RP5 showed 1101 amino acid residues (94.9%) is allowed and 58 amino acid residues (5.8%) in the disallowed region. RP6 showed 908 amino acid residues (98.0%) in the allowed region and 17 amino acid residues (1.8%) in the disallowed region. RP7 identified 905 amino acid residues (97.4%) is allowed and 24 amino acid residues (2.6%) in the disallowed region. RP showed 804 amino acid residues (98.3%) in the allowed and14 amino acid residue (1.7%) in the disallowed region. Ramachandran plot of eight resistance proteins confirmed that the model structures are following dihedral angles of Ramachandran plot occupied favorable positions [12].

Discussion
In rice plants, many resistance genes are mostly polymorphic [28] which are involved to initiate the cascade signaling to trigger the defense response in rice plants.
Understanding the differential gene expression of resistance genes and their action may give new insights to study the differential transcriptional regulation in rice plant and model the blast resistance protein structures which are expressed during M. oryzae infection. Transcriptomic studies carried out during the present investigation revealed some common transcripts for both BR2655 and HR12. We also observed many of the transcripts are unique to the resistant cultivar, viz., BR2655, and susceptible cultivar HR12 separately. A similar difference in the transcriptomic profiles was observed in Italian rice varieties Gigante Vercelli and Vialone Nano cultivars of rice [29]. Many researchers have carried out previously transcriptomic studies in rice cultivars inoculated with different pathogens [30,31]. Many such studies are carried out in different plants like tomato, banana, and maize [32,33]. We observed the difference in the response of pathogenesis and gene expression profiling in BR2655 and HR12 rice cultivars during infection by M. oryzae. When the transcripts of cultivars are analyzed, it is important to analyze different categories of transcripts expressed, viz., only in resistant, both in resistant and susceptible and only in susceptible cultivars. Accordingly, we identified the R genes which are expressed exclusively in the resistance cultivar BR2655 which may play a decisive role in conferring resistance to the plants. The R genes which are expressed in both may be responsible for initiating the defensive mechanism. The r genes which are expressed exclusively in susceptible cultivars may not be effective in overcoming a specific pathogen like M. oryzae due to change in the cascade of multiple signaling pathways resulting in the virulence of a given pathogen.
In the present study, we thus have shortlisted eight blast resistance transcripts, two each exclusively from BR2655 and HR12 and four common to both the rice cultivars, and their protein sequences were retrieved for further computational analysis and characterization. Resistance proteins are broadly classified into eight groups based on their different conserved domain organization and secondary structures. These resistance proteins are the key players in the plant defense signaling mechanisms [4]. All the shortlisted resistance proteins bearing NBS were observed, and physiochemical properties of all the RP proteins were analyzed in the current study.

Motifs and phylogenetic tree
The eight resistance proteins encode nucleotide-binding site leucine-rich repeats (NBS-LRR) involved in the defense mechanism which share similar sequences and domains [23]. NBS-LRR domains investigated during the current study and most of the domains reported by earlier workers contained some unknown domains, which were symbolized as X [34].. Coiled-coil amino acid sequences of resistance proteins are involved in the signal transduction in many cell processes [35]. NB-ARC and LRR domain architectures were detected in all the eight resistance proteins. These domains play an important role in plant resistance gene inactivation of downstream effectors [36]. The conserved amino acid residues of RP1 was involved in the binding of ADP and reported to be involved in signal transduction [37]. RP2 involves in the binding activity for LMB (Leptomycin B), and RP4 conserved amino acid residues were involved in the binding site for 2S2 -(2S)-2-(1H-INDOL-3-YL) hexanoic acid, having binding sites for ADP [37] and RP8 conserved amino acid residues in the binding site residues for 2S2-(2S)-2-(1H-INDOL-3-YL) hexanoic acid, which acts as ADP binding site [38]. The results revealed that these eight resistance proteins might involve in the transmembrane transport of metal ions.
A phylogenetic tree generated based on the amino acid sequences of these 8 resistance proteins is divided into two groups. A similar analysis was reported for resistance proteins in verticillium wilt-resistant cotton plants [32]. The eight resistant proteins are having a structural and functional relationship with each other indicating that these proteins are well conserved and evolved in the same family of their evolutionary history. Similar studies were also reported in Arabidopsis during pathogen attack Pseudomonas syringae [39].
The physico-chemical properties of eight resistance proteins and their amino acid composition revealed the existence of high-frequency leucine-rich repeats. They are involved in hydrophobic interactions and conformational stability of the RP proteins [40]. The aliphatic index of hypothetical eight RPs revealed that they are stable even at high temperatures. These RPs have hydrophilic amino acids which are capable of interacting with surrounding water molecules intracellularly [41].

Homology modeling
SWISS-Model and Modeller servers were used to build the three-dimensional structures of eight RPs. The templates showed similarity or identity for the eight resistance proteins, and these template models are associated with cell death protein 4 (PDB ID: 3lqr.1.A, PDB ID: 2a5y.1.B), which are localized to the nucleus in proliferating cells [42].
Recently, the cryo-electron microscopy structure of NBS-LRR, such as the wheel-like pentameric ZAR1 resistosome, has been revealed in Arbidopsis thaliana [43]. Similar template structures were used for modeling in the current study, and it helped us to decipher the binding site variations that may occur in different resistance proteins. RP2 was found to be similar to LRR receptor-like serine/threonine-protein kinase FLS2 (PDB ID: 4mn8.1.A) which functions as a pattern recognition receptor [37]. RP4 resembles Toll-like receptor 3 (PDB ID: 5gs0.1.A) which functions as pathogen recognition and activation of innate immunity protein [44], RP5 NLR family CARD Domain-containing Protein 4 (PDB  [45,46]. This paper sheds light on the modeling of eight hypothetical resistance proteins showing homology to the template models which are mainly involved in defense mechanisms. Similar protein modeling carried out and reported on the orthologue of Pi54 designated as Pi54of from Oryza officinalis was studied and modeled [8,47].

Protein structure, function, and validation
The secondary structure predicted for the eight-resistance protein by using RaptorX showed that residues are involved in the formation of α-helix, β-sheet, and coils structures. Profunc predicts the probable functions based on the 3D structure of the target protein [26]. Super-Pose detects the root means square deviation (RMSD) that measures the distance between corresponding residues and accurate models should have < 2.0 Å value [27]. Ramachandran plot of eight resistance proteins confirmed that the modeled structures are following the dihedral angles of the Ramachandran plot and occupied favorable positions (Fig. 5) [2]. Further work is needed to understand the difference between resistance proteins of resistant and susceptible rice cultivars. This can be understood once the structure of effector molecules expressed by the Avr genes of the pathogen is available. Hence, there is a need to elucidate the effector molecules to understand the interaction of resistance proteins and effector proteins. Disease resistance in plants is more often regulated by a gene-for-gene mechanism in which Avr proteins encoded by pathogens are particularly detected by plant disease R proteins directly or indirectly. Avr proteins trigger defense response elements by changing the membrane ion flux, irreversible plasma membrane damage, production of extracellular reactive oxygen intermediates, and alter in Biological process gene expression [48]. The protein-protein interaction of Avr and R proteins becomes evident by the ability of a host to detect pathogen effectors. However, there are relatively few reports on direct interactions between Avr and R proteins [49][50][51][52]. Avr proteins presumably enhance the virulence factors by hindering the innate immune systems of host plants in the absence of recognition by R proteins [53,54]. Few Avr genes are identified and their protein interactions with resistance protein are yet to be structurally characterized [6]. The present study was aimed to identify the transcripts involved in rice blast resistance in resistant BR2655 and susceptible HR12 cultivars and model the threedimensional structures, function predictions, conserved motifs, and validations of NBS-LRR of eight hypothetical resistance proteins (RP1, RP2, RP3, RP4, R5, RP6, RP7, and RP8) using computational tools. No previous studies are found on this NBS-LRR of eight resistance proteins. Hence, we have modeled the eight resistance proteins, which are found to be stable, with well-defined compact reliable three-dimensional structures, by using highly reputed computational tools. The eight resistance proteins were modeled using SWISS-MODEL, I-TASSER, and RaptorX server tools. The secondary structure predicted by RaptorX and ProFunc displayed the presence of α-helix, β-strands, and random coils. ProFunc, Motif, SuperPose, and Ramachandran plot servers were used to predict the structure and function of eight resistance proteins. These eight resistance proteins will function as a valuable resource for studying the intricate details of the plant defense mechanism.

Conclusions
In silico studies provide an opportunity to accomplish the modeling and analysis of resistance proteins by employing  various modeling applications. In the current study, blast resistance transcripts expressed were shortlisted by transcriptomic profiling. Protein sequences of expressed transcripts were selected to determine the physicochemical properties and structures of resistance proteins using in silico techniques. Primary structure analysis revealed that all the resistant and susceptible encoded resistance proteins are rich in leucine residues which seem to correlate with the reported resistance proteins. The secondary structure analysis confirmed that in all the eight sequences, alpha helix dominated, followed by beta turns and then coils. Three-dimensional structure predictions were analyzed by different homology servers, viz., Swiss model and Modeller 9.22. The model structures were validated by a protein structure checking tool called Rampage. The in silico modeled eight resistance proteins are promising candidates for providing insights into domain structures. We hope that further studies with the structure of these resistance proteins and their interactions will provide a better insight into the precise molecular mechanism involved in plant defense.