Computational identification of significant immunogenic epitopes of the putative outer membrane proteins from Mycobacterium tuberculosis

Novel vaccines are required to effectively combat the epidemic spread of tuberculosis. Using in silico approaches, this study focuses on the prediction of potential B cell and T cell binding immunogenic epitopes for 30 putative outer membrane proteins of Mtb. Among these, certain immunodominant epitopes of Rv0172, Rv0295c, Rv1006, Rv2264c, and Rv2525c were found, which are capable of binding B cells and a maximum number of MHC alleles. The selected immunodominant epitopes were screened for their allergenic and antigenic properties, their percentage identity against the human proteome, and their structural properties. Further, the binding efficacy of the immunodominant epitopes of Rv0295c and Rv1006 with HLA-DRB1*04:01 was analyzed using molecular docking and molecular dynamics studies. Hence, the in silico-derived immunogenic peptides (epitopes) could potentially be used for the design of subunit vaccines against tuberculosis. Supplementary Information: The online version contains supplementary material available at 10.1186/s43141-021-00148-9.


Introduction
Tuberculosis (TB) is caused by the pathogenic bacillus Mycobacterium tuberculosis (Mtb) and is a deadly disease that affects millions of people worldwide. According to the WHO Global Tuberculosis Report 2018, TB is one of the top ten causes of human death, with an estimated 1.3 million deaths among HIV-negative people. Moreover, 10.0 million people developed TB disease in 2017. The emergence of multi-drug and extensively drug-resistant strains of Mtb increases the burden of the drug treatment regimen for TB. Currently, Bacille Calmette-Guérin (BCG) is the only available vaccine against TB. In infants, it has been shown to have a protective effect against tuberculous meningitis and miliary tuberculosis [28]. However, in adults, it has been shown to offer only limited protection against pulmonary TB. Moreover, it can cause severe complications such as suppurative lymphadenitis, osteomyelitis/osteitis, and disseminated BCG infection. Disseminated BCG infection is a severe adverse reaction that arises in people with impaired immunity. Therefore, the BCG vaccine is not given to HIV-positive patients or to infants born to HIV-positive mothers. Due to the limitations of the BCG vaccine, we need novel and effective vaccines against all forms of TB.
Immunoinformatics involves the use of computational tools to predict the immunogenic epitopes or peptides which could be used to design ideal subunit vaccine candidates. These tools rely only on the organism's genetic information, which reduces the cost and time taken for the development of vaccines [8]. Subunit vaccines usually consist of certain immunoactive biomolecules such as polypeptides and glycolipids, and they usually need the help of an adjuvant to induce immune protection. They can be easily prepared at low cost and are highly specific and efficient, with minimal side effects [18]. Many new and promising subunit TB vaccine candidates are in various stages of clinical trials [10,12].
Outer membrane proteins (OMPs) play an important role in host-pathogen interactions and in maintaining the integrity and permeability of the cell membranes. Due to their localization on the mycobacterial surface, they can be easily targeted by the host immune system, and hence they are ideal candidates for vaccine design [14]. Recently, Baliga et al., in [5], predicted immunogenic epitopes of the OMPs of the pathogen Vibrio anguillarum. Similarly, Rauta et al., in [22], predicted immunogenic epitopes of the OMPs of the pathogen Vibrio cholerae. Song et al., in [27], identified 144 putative OMPs of Mtb which could play a crucial role in mycobacterial pathogenesis. In this study, using computational approaches, we intend to identify the potential immunogenic epitopes of 30 putative OMPs of Mtb. We believe that this study will provide suitable leads for the design of peptide-based subunit vaccines using OMPs of Mtb.

Methods
The overall methodology adopted in this study to determine potential vaccine candidates of the putative OMPs of Mtb is depicted in Fig. 1.

Sequence retrieval
FASTA sequences of 143 putative OMPs of Mtb were retrieved from the UniProtKB protein database and subjected to epitope prediction. The UniProtKB IDs of the retrieved sequences are given in Table S1. The protein sequence of Rv1784 (one of the putative OMPs) was not found in the UniProtKB database.

Sequence-based B cell prediction
B cell epitope prediction for the retrieved FASTA sequences of the OMPs of Mtb was performed using IEDB tools. The BepiPred Linear Epitope prediction method [16] was employed, which uses a propensity scale of amino acids and hidden Markov models for the prediction of potential immunogenic B cell epitopes. Default parameters were employed for the prediction.
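As an illustration of the retrieval step, the following minimal sketch parses FASTA-formatted text into header/sequence pairs before the sequences are passed to a prediction tool. The function name `parse_fasta` and the two example records are hypothetical placeholders, not real UniProtKB entries, and the sketch does not reproduce how the sequences were actually downloaded in this study.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict of {header: sequence}."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts; keep the header without the leading '>'.
            header = line[1:]
            records[header] = []
        elif header is not None:
            # Sequence lines may be wrapped, so collect and join them later.
            records[header].append(line)
    return {h: "".join(parts) for h, parts in records.items()}


# Hypothetical input: accessions and residues are placeholders only.
example = """>sp|X00001|EXAMPLE_A placeholder protein A
MKTAYIAKQR
QISFVKSHFS
>sp|X00002|EXAMPLE_B placeholder protein B
MSTNPKPQRK
"""

sequences = parse_fasta(example)
for header, seq in sequences.items():
    print(header.split("|")[1], len(seq))
```

Joining wrapped lines matters here because the downstream tools expect one uninterrupted sequence per protein.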

Evaluation of antigenic and allergenic properties of the predicted epitopes
Antigenic and allergenic values of the predicted B cell epitopes were calculated with default parameters using the VaxiJen server (antigenic proteins should score above 0.4) [9] and the AlgPred server (non-allergenic protein sequences should score below −0.4) [24].
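The two score thresholds stated above can be combined into a single screening rule. The sketch below is only an illustration: the function name `passes_screen` and the candidate scores are hypothetical, and the scores would in practice be copied from the VaxiJen and AlgPred server outputs.

```python
VAXIJEN_CUTOFF = 0.4    # antigenic if the score is above this (from the text)
ALGPRED_CUTOFF = -0.4   # non-allergenic if the score is below this (from the text)


def passes_screen(vaxijen_score, algpred_score):
    """True if an epitope is predicted antigenic and non-allergenic."""
    return vaxijen_score > VAXIJEN_CUTOFF and algpred_score < ALGPRED_CUTOFF


# Hypothetical scores for three made-up epitopes.
candidates = [
    {"id": "epitope_A", "vaxijen": 0.72, "algpred": -0.85},
    {"id": "epitope_B", "vaxijen": 0.31, "algpred": -0.90},  # not antigenic
    {"id": "epitope_C", "vaxijen": 0.55, "algpred": 0.10},   # allergenic
]

kept = [c["id"] for c in candidates if passes_screen(c["vaxijen"], c["algpred"])]
print(kept)
```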

Homology of the epitopes with the human proteome
The B cell binding epitopes were further screened for their similarity to human proteins in order to avoid cross-reactivity. The BLASTp program [2] was used to check the similarity of the epitopes against humans. The default non-redundant protein sequences (nr) database was employed for similarity searching. All other parameters were set to default values. The epitopes with less than 80% similarity to the human proteome were further analyzed for their structural properties.

Fig. 1 The overall methodology adopted in this study
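The 80% similarity screen can be illustrated with a small calculation. The sketch assumes that similarity is measured as percent identity over an alignment, which is one of the values BLASTp reports; the function names and the aligned strings are hypothetical and do not come from this study.

```python
SIMILARITY_CUTOFF = 80.0  # epitopes below this value are retained (from the text)


def percent_identity(aligned_query, aligned_subject):
    """Percent of identical positions over the alignment length ('-' marks a gap)."""
    if not aligned_query or len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must be non-empty and of equal length")
    matches = sum(
        1 for q, s in zip(aligned_query, aligned_subject) if q == s and q != "-"
    )
    return 100.0 * matches / len(aligned_query)


def is_retained(aligned_query, aligned_subject):
    """True if the epitope is dissimilar enough to the human hit to keep."""
    return percent_identity(aligned_query, aligned_subject) < SIMILARITY_CUTOFF


# Hypothetical alignments of a made-up epitope against two made-up human hits.
print(percent_identity("ACDEFGHIKL", "ACDEFGHIKV"), is_retained("ACDEFGHIKL", "ACDEFGHIKV"))
print(percent_identity("ACDEFGHIKL", "ACDQFGWIKV"), is_retained("ACDEFGHIKL", "ACDQFGWIKV"))
```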

T cell epitope prediction
The predicted B cell epitopes were further subjected to T cell binding prediction. The MHC-I binding predictions were made using the IEDB analysis resource Consensus tool [13], which combines predictions from ANN, also known as NetMHC (4.0) [3,17,20], SMM [21], and Comblib [26]. The reference set of 27 MHC-I alleles was used for the prediction [32]. The peptide length was set to 10. The high-affinity binding epitopes were selected using a percentile rank cutoff of 20.
The MHC-II binding predictions were made using the IEDB analysis resource Consensus tool [30,31]. The reference set of 27 MHC-II alleles was used for the prediction [11]. The peptide length was set to 15. The high-affinity binding epitopes were selected using a percentile rank cutoff of 20.
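The peptide lengths and the percentile rank cutoff described above can be sketched as follows. The epitope string is the Rv1006 epitope reported in this study; the function names, the decision to treat the cutoff as inclusive, and the percentile values in `rows` are assumptions made for illustration, not IEDB output.

```python
PERCENTILE_CUTOFF = 20.0  # cutoff from the text; treating it as inclusive is an assumption


def sliding_peptides(sequence, length):
    """Return every overlapping peptide of the given length."""
    return [sequence[i:i + length] for i in range(len(sequence) - length + 1)]


def alleles_bound(predictions, cutoff=PERCENTILE_CUTOFF):
    """Alleles with at least one peptide at or below the percentile cutoff."""
    return {allele for allele, _peptide, rank in predictions if rank <= cutoff}


# Rv1006 epitope (residues 24 to 59), written as the fragments given in the text.
epitope = "LNGCSSSASHRG" "PLNAMG" "SPAI" "PSTAQEIPNPLRGQ"
mhc1_peptides = sliding_peptides(epitope, 10)  # MHC-I peptide length 10
mhc2_peptides = sliding_peptides(epitope, 15)  # MHC-II peptide length 15

# Placeholder (allele, peptide, percentile rank) rows; the ranks are invented.
rows = [
    ("HLA-A*02:01", mhc1_peptides[0], 4.5),
    ("HLA-A*02:01", mhc1_peptides[1], 35.0),
    ("HLA-B*07:02", mhc1_peptides[0], 62.0),
    ("HLA-B*07:02", mhc1_peptides[5], 18.0),
    ("HLA-A*01:01", mhc1_peptides[2], 48.0),
]

bound = alleles_bound(rows)
print(len(mhc1_peptides), len(mhc2_peptides), sorted(bound))
```

Counting alleles rather than peptides mirrors how the number of binding alleles per epitope is reported in Table 1: an allele counts once, however many overlapping peptides it binds.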

Selection of immunodominant epitopes (IDEs) of the putative OMPs of Mtb
Immunodominant epitopes (IDEs) are regions that can bind B cells as well as the maximum number of MHC-I and MHC-II alleles. The identification of IDEs has immense potential, as such epitopes can elicit a strong immune response and can be used effectively to design peptide-based vaccines. This method of finding IDEs was successfully employed by Verma et al. in [29] for the design of a DnaK peptide vaccine against S. typhi.

Prediction of transmembrane topology and the solubility of the epitopes
Structural properties of the IDEs such as solvent accessibility, transmembrane topology, and solubility upon overexpression were predicted using the ACCpro, ABTMpro, and SOLpro tools, respectively, found in the SCRATCH protein prediction server [7]. Solvent accessibility of the epitopes is an important criterion, as the epitopes should be exposed so that they can interact with immune cells. Prediction of transmembrane topology for the epitopes is important because proteins spanning the membrane are difficult to clone and express; therefore, non-transmembrane epitopes could be ideal vaccine candidates. The predicted epitopes should also be soluble upon over-expression, so a solubility check was also performed.

Molecular docking and molecular dynamics studies of the IDEs with HLA-DRB1*04:01
The selected IDEs of Rv0295c and Rv1006 were modeled using the PEPFOLD 3 server [25] and docked with HLA-DRB1*04:01 (PDB ID: 5JLZ) using the Cluspro server [15]. The higher-ranked epitope-HLA complex was further subjected to molecular dynamics (MD) studies using GROMACS 2019 [1] for 20 ns. For the MD setup, the GROMOS 43a1 force field was used, and the epitope-HLA complex was placed in a cubic box filled with SPC water molecules. The complex was neutralized by adding the corresponding ions, and its energy was minimized using the steepest-descent algorithm. Further, the complex was subjected to NVT and NPT equilibration steps for 100 ps each. The temperature and the pressure were fixed at 300 K and 1 bar, respectively. Finally, the all-atom MD run was performed for 20 ns. The coordinates were written every 10 ps. The RMSD and RMSF of the epitope-HLA complex were computed using the GROMACS in-built tool, namely rms. Xmgrace was used to plot the graphs.
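To make the two trajectory measures concrete, the sketch below computes RMSD against a reference frame and per-atom RMSF around the time-averaged position for a toy trajectory. It is not the GROMACS implementation: it skips the structural superposition and mass weighting that analysis tools typically apply, and the coordinates are invented values in nm. The frame count assumes that the starting frame is also written.

```python
import math

SIM_LENGTH_PS = 20_000   # 20 ns production run (from the text)
SAVE_INTERVAL_PS = 10    # coordinates written every 10 ps (from the text)
n_saved_frames = SIM_LENGTH_PS // SAVE_INTERVAL_PS + 1  # assumes frame 0 is saved


def rmsd(frame, reference):
    """RMSD between two conformations given as lists of (x, y, z) tuples."""
    squared = sum(
        (a - b) ** 2 for p, q in zip(frame, reference) for a, b in zip(p, q)
    )
    return math.sqrt(squared / len(reference))


def rmsf(trajectory):
    """Per-atom fluctuation around each atom's time-averaged position."""
    n_frames = len(trajectory)
    result = []
    for i in range(len(trajectory[0])):
        mean = [sum(frame[i][k] for frame in trajectory) / n_frames for k in range(3)]
        msf = sum(
            (frame[i][k] - mean[k]) ** 2 for frame in trajectory for k in range(3)
        ) / n_frames
        result.append(math.sqrt(msf))
    return result


# Toy trajectory: 3 frames, 2 atoms; atom 0 is fixed, atom 1 moves along x.
trajectory = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.2, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (0.8, 0.0, 0.0)],
]

print(n_saved_frames)
print(rmsd(trajectory[1], trajectory[0]))
print(rmsf(trajectory))
```

RMSD summarizes how far the whole complex drifts from its starting structure over time, whereas RMSF is reported per atom or residue and highlights the flexible regions.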

Results
Song et al., in [27], identified 144 putative OMPs of Mtb, and we used this list of OMPs for the prediction of potentially immunogenic epitopes. B cell epitopes were predicted for the retrieved OMP sequences of Mtb, and their antigenic and allergenic properties were calculated. The B cell epitopes that are allergenic or non-antigenic were not considered for further analysis. Moreover, in order to avoid cross-reactivity, the predicted B cell epitopes whose similarity to the human proteome is greater than 80% were also excluded from our analysis. By applying all the above criteria, we predicted B cell binding epitopes for 30 putative OMPs of Mtb. Additionally, to predict IDEs for the putative OMPs of Mtb, the B cell binding epitopes were further subjected to T cell binding prediction.
The list of B cell epitopes predicted from the 30 putative OMPs of Mtb, along with their VaxiJen and AlgPred scores and the number of MHC alleles capable of binding these epitopes, is given in Table 1. Further, we selected certain IDEs (given in Table 1) which are predicted to bind B cells and the maximum number of MHC alleles (more than 25 alleles each from Class I and Class II). The selected IDEs were then checked for solvent accessibility, transmembrane topology, and solubility upon overexpression.
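The allele-count criterion for selecting IDEs can be written as a short rule. In the sketch below, the allele counts for the five proteins are those reported in this study, while the function name `is_immunodominant` and the entry `hypothetical_epitope` are illustrative additions used to show a rejected case.

```python
MIN_ALLELES = 25  # an IDE must bind more than 25 alleles in each class (from the text)


def is_immunodominant(n_class1, n_class2, threshold=MIN_ALLELES):
    """True if the epitope binds more than `threshold` alleles of both classes."""
    return n_class1 > threshold and n_class2 > threshold


# (MHC Class I alleles bound, MHC Class II alleles bound) per epitope.
allele_counts = {
    "Rv0172": (27, 27),
    "Rv0295c": (26, 27),
    "Rv1006": (26, 27),
    "Rv2264c": (27, 27),
    "Rv2525c": (27, 27),
    "hypothetical_epitope": (20, 27),  # invented example that fails Class I
}

selected = [
    name for name, (c1, c2) in allele_counts.items() if is_immunodominant(c1, c2)
]
print(selected)
```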
Five selected IDEs are discussed below:
1. "382 ASTASTLPKEIAYSEPRLQPPNGYKDTTVPGIWVPDTPLSHRNTQPGWVVA 432" of Rv0172 is predicted to be a B cell binding epitope and to bind all 27 reference alleles of both MHC Class I and Class II. Moreover, it is predicted to be antigenic and non-allergenic, and no significant similarity to the human proteome was found. This IDE is non-transmembrane, solvent-exposed, and predicted to be soluble when over-expressed. Additionally, Rv0172 belongs to [...]
2. The "[...] LG 264" B-cell binding epitope of Rv0295c is non-allergenic and antigenic and binds 26 alleles of MHC Class I and 27 alleles of MHC Class II. It has no sequence similarity with the human proteome, is solvent-exposed and non-transmembrane, and is predicted to be soluble when over-expressed. Rv0295c is a trehalose 2-sulfotransferase, and it catalyzes the transfer of a sulfuryl group from 3′-phosphoadenosine-5′-phosphosulfate (PAPS) to trehalose, which leads to the synthesis of trehalose-2-sulfate (T2S) [19].
3. "24 LNGCSSSASHRGPLNAMGSPAIPSTAQEIPNPLRGQ 59" from Rv1006 is predicted to be a B-cell binding epitope which also binds 26 alleles of MHC Class I and 27 alleles of MHC Class II. It is antigenic and non-allergenic and has low similarity to the human proteome. Additionally, it is solvent-exposed, non-transmembrane, and soluble upon overexpression. Rv1006 is believed to be a conserved hypothetical protein.
4. "383 AWSEADEDSHIGPAPGYTAARPSLSFDHDAHAEPEPKSPPIPW 425" from Rv2264c is predicted to be a B cell binding epitope, and it also binds all 27 reference alleles of both Class I and Class II. It is also predicted to be antigenic and non-allergenic and has low similarity to the human proteome. It is solvent-exposed, non-transmembrane, and soluble upon overexpression. Rv2264c is a conserved hypothetical protein.
5. "102 YGKGSTADWLGGASAGVQHARRGSELHAAAGGPTSAPIYASIDDNPSYEQYK 153" from Rv2525c is predicted to be a B cell binding epitope, and it also binds all 27 reference alleles of both Class I and Class II. It is also predicted to be antigenic and non-allergenic and has low similarity to the human proteome. It is solvent-exposed, non-transmembrane, and soluble upon overexpression. Rv2525c is a Tat-secreted protein, and it functions as a putative peptidoglycan hydrolase [6].

Molecular docking and molecular dynamics studies
The epitopes of Rv0295c and Rv1006 were modeled using the PEPFOLD 3 server and subjected to molecular docking studies with the 3D structure of HLA-DRB1*04:01 using the Cluspro server. The other epitopes could not be modeled by the PEPFOLD 3 server, as the length of the epitope was greater than 50 amino acids. The epitope of Rv2264c, which has 52.17% similarity (Table 1) with the human proteome, was also excluded from the molecular docking and dynamics studies. The top-ranked epitope-HLA-DRB1*04:01 complex was retrieved. The binding energies for the top-ranked epitope-HLA-DRB1*04:01 complexes are given in Table 2.
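The length argument can be checked directly from the residue positions and sequences reported for the IDEs. The sketch below is a simple consistency check, not part of the original workflow: it confirms that each sequence length matches its start and end positions and flags epitopes longer than the 50-residue limit stated above. The Rv0295c epitope is omitted because its full sequence is not available in this text; the variable names are illustrative.

```python
PEPFOLD_MAX_LENGTH = 50  # modeling limit as stated in the text

# (start position, sequence, end position); sequences are written as the
# fragments given in the text and joined by Python string concatenation.
epitopes = {
    "Rv0172": (382, "ASTASTLPKE" "IAYSEPRLQPPNGYKDTTV" "PGIWVPDTPLSHRNTQPGWVVA", 432),
    "Rv1006": (24, "LNGCSSSASHRG" "PLNAMG" "SPAI" "PSTAQEIPNPLRGQ", 59),
    "Rv2264c": (383, "AWSEADEDSHI" "GPAPGYTAARPSL" "SFDHDA" "HAEPEPKSPPIPW", 425),
    "Rv2525c": (102, "YGKGSTADWLGGA" "SAGVQHARRGSELHA" "AAGGPTSAPIYA" "SIDDNPSYEQYK", 153),
}

lengths = {}
for name, (start, sequence, end) in epitopes.items():
    # Positions are inclusive, so the expected length is end - start + 1.
    assert len(sequence) == end - start + 1, name
    lengths[name] = len(sequence)

too_long = [name for name, n in lengths.items() if n > PEPFOLD_MAX_LENGTH]
print(lengths)
print(too_long)
```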

Discussion
Vaccination is the most efficient method to combat TB. BCG is the currently available vaccine against TB. It expresses an Mtb immunodominant protein, Antigen 85B (Ag85B) [23]. The antigen 85 (Ag85) proteins comprise Ag85A, Ag85B, and Ag85C. They are well-known mycolyltransferases or diacylglycerol acyltransferases of Mtb which are involved in the transfer of mycolic acids to the cell wall arabinogalactan, and they possess a high binding affinity for fibronectin [33]. BCG has been very effective against severe forms of TB in infants, but it has only limited protective efficacy in adults. Due to the adverse effects of BCG, a more effective and protective vaccine against all forms of TB is currently needed. In recent years, many new adjuvanted Ag85B protein and vectored Ag85A subunit vaccine candidates have entered different phases of clinical trials. ESAT-6 and certain other proteins of Mtb have also been tested for their immunogenic competence [12].
Generally, bacterial OMPs serve as potential vaccine candidates, as their exposed epitopes on the bacterial cell surface could be easily recognized by the host immune system [5,22]. In [35], Zvi et al. predicted 45 top-hit antigens covering the entire genome of Mtb as potential vaccine candidates which can be incorporated in [...] [4]. Therefore, from our in silico study of the OMPs of Mtb, we have retrieved five IDEs (Table 1) which can bind both B cells and the maximum number of MHC alleles, are antigenic and non-allergenic, have little or no sequence similarity with the human proteome, are non-transmembrane, and are predicted to be soluble when over-expressed. These five IDEs of the putative OMPs (Rv0172, Rv0295c, Rv1006, Rv2264c, and Rv2525c) of Mtb could serve as ideal candidates for the design of subunit vaccines against tuberculosis.

Conclusion
In this study, through an immunoinformatics approach, potentially immunogenic epitopes for 30 putative OMPs of Mtb have been identified. Immunodominant epitopes identified for Rv0172, Rv0295c, Rv1006, Rv2264c, and Rv2525c were predicted to be non-allergenic, antigenic, and capable of binding B cells and a maximum number of MHC alleles. These epitopes also show little or no sequence similarity with the human proteome and are solvent-exposed, non-transmembrane, and soluble upon overexpression. Molecular docking and molecular dynamics analysis of the Rv0295c and Rv1006 epitope-HLA-DRB1*04:01 complexes further supports our study. Thus, we suggest that these in silico-derived epitopes could be useful in developing peptide-based subunit vaccines against tuberculosis.
Additional file 1: Table S1. UniProtKB IDs of the putative OMPs of Mtb. Table S2. List of MHC-I and MHC-II alleles employed in the study. Figure S3. Ramachandran plot of IDE of Rv0295c. Figure S4. Ramachandran plot of IDE of Rv1006. Figure S5. Ramachandran plot of IDE of Rv2265.