The complete workflow of the methodology used in this study is described in Fig. 1.
Data selection
The M. lepromatosis and M. leprae proteins ML0091, ML0405, ML1636, ML2055, ML2331, ML2346, and ML1556 were previously shown to detect leprosy to some degree [26]. ML2028, ML2055, ML2380, and ML2531 were tested as immunizing antigens in mice, in which they reduced the bacterial burden [4]. NP_301196.1, NP_301663.1, NP_301805.1, NP_301958.1, NP_302056.1, NP_302185.1, NP_302232.1, NP_302292.1, NP_302342.1, NP_302490.1, and NP_302503.1 were identified as immunogenic proteins in our previous reverse vaccinology analysis [37]. The sequences of these proteins were retrieved from the National Center for Biotechnology Information (NCBI) in FASTA format [38]. The antigenicity of the selected proteins was evaluated with VaxiJen [39]. In total, 22 proteins shared between both strains were used in the next steps.
Prediction of epitopes that bind to MHC I alleles
The epitopes able to bind to MHC I alleles and activate cytotoxic T lymphocytes (CTL) were predicted with two different platforms to improve the confidence of the prediction. The Immune Epitope Database and Analysis Resource (IEDB) contains thousands of high- and low-affinity epitopes, which are used as training data to enhance the accuracy of its predictor [40, 41]. Because we aimed to develop a diagnostic tool for use in all endemic areas, we selected all 27 alleles with high frequency in the global population. The peptide length was set to 9 amino acid residues [42]. Default parameters were chosen for the prediction since they combine an artificial neural network (ANN), the scoring matrix method (SMM), and a combinatorial library. Epitopes with a percentile rank below 1% were selected for our study because of their higher probability of being immunogenic. The NetCTL-1.2 server integrates binding affinity, antigen processing, and transport into the epitope prediction, using both ANN and SMM [43, 44]. The same alleles used in IEDB were used in NetCTL-1.2.
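The percentile-rank selection described above can be illustrated with a minimal Python sketch. It assumes the predictor output has already been parsed into (allele, peptide, percentile rank) rows; the `Prediction` type, the function name, and the example peptides and ranks are hypothetical and are not taken from the IEDB output format or from the study data.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class Prediction(NamedTuple):
    allele: str             # e.g. "HLA-A*02:01"
    peptide: str            # candidate epitope sequence
    percentile_rank: float  # lower rank = stronger predicted binder


def select_mhc1_binders(
    predictions: Iterable[Prediction],
    max_rank: float = 1.0,  # cutoff stated in the text (rank below 1%)
    length: int = 9,        # peptide length stated in the text
) -> Dict[str, List[str]]:
    """Map each retained peptide to the sorted alleles it is predicted to bind."""
    binders = defaultdict(set)
    for p in predictions:
        if len(p.peptide) == length and p.percentile_rank < max_rank:
            binders[p.peptide].add(p.allele)
    return {pep: sorted(alleles) for pep, alleles in binders.items()}


# Toy rows; the peptides and ranks are invented for illustration only.
rows = [
    Prediction("HLA-A*02:01", "LLDVTAAGL", 0.4),
    Prediction("HLA-B*07:02", "LLDVTAAGL", 0.9),
    Prediction("HLA-A*01:01", "GTSDKLPAY", 2.5),
]
selected = select_mhc1_binders(rows)
```

Grouping the retained peptides by sequence makes it easy to see how many of the selected alleles each epitope is predicted to cover.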
Prediction of epitopes that bind to MHC II alleles
For epitopes that activate helper T lymphocytes (HTL), i.e., MHC II-binding epitopes, we also used two different predictors, the IEDB tool [40] and the NetMHCII-2.3 server [45]. The MHC II cleft can accommodate epitopes of 13 to 25 amino acids; we therefore adopted a length of 15 residues as the standard, since the NetMHCII-2.3 server also supports this length, which allows the results of both programs to be compared. In IEDB, we selected only epitopes with a percentile rank lower than 3%. For the IC50, which is used to determine the affinity of the epitopes for the MHC, we adopted a cutoff of IC50 < 1000 nM [45]. The NetMHCII-2.3 server also uses ANN, trained on various epitope databases, to predict the epitopes [46].
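As a rough illustration of the fixed 15-residue length and the two cutoffs, the sketch below enumerates all overlapping 15-mers of a protein and tests a candidate against both thresholds. The function names are hypothetical, and applying the percentile-rank and IC50 criteria jointly to one peptide is an assumption made here for illustration; the text does not state how the two criteria were combined.

```python
from typing import List


def sliding_peptides(sequence: str, length: int = 15) -> List[str]:
    """Return every overlapping window of the given length (15 in the text)."""
    if length <= 0:
        raise ValueError("length must be positive")
    # A protein of n residues yields n - length + 1 windows (none if too short).
    return [sequence[i:i + length] for i in range(len(sequence) - length + 1)]


def passes_mhc2_cutoffs(
    percentile_rank: float,
    ic50_nm: float,
    max_rank: float = 3.0,        # percentile rank lower than 3% (text)
    max_ic50_nm: float = 1000.0,  # IC50 < 1000 nM (text)
) -> bool:
    """Assumption: a peptide is kept only if it satisfies both cutoffs."""
    return percentile_rank < max_rank and ic50_nm < max_ic50_nm


# Toy sequence of 20 residues, invented for illustration only.
windows = sliding_peptides("MKTAYIAKQRQISFVKSHFS")
```

For the 20-residue toy sequence this yields six 15-mers, each of which would be scored by both predictors.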
Prediction of B cell epitopes
To predict linear B cell epitopes, we used ABCpred [47, 48], which uses ANN for its predictions, and the LBtope server [49], which uses support vector machine (SVM)-based models. We set the epitope length to 16 residues because of its better accuracy [48, 50, 51].
Filtering and immunogenicity assessment of MHC I epitopes
All predicted epitopes were filtered with an in-house Python script that compares the results of both programs for each epitope (Fig. 1A). After identifying the epitopes predicted by both programs, the same script was used to find overlapping B cell and MHC II epitopes sharing at least nine sequential amino acid residues. Finally, the script was used to search for overlaps between the class I epitopes predicted as immunogenic by the immunogenicity tool and the remaining class I epitopes. The Class I immunogenicity tool [52] uses amino acid properties and their positions within the peptide to predict immunogenicity. Only peptides with a score greater than 0.1 were chosen.
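The in-house script itself is not reproduced here. The following is only a minimal sketch of how the three filtering steps could be implemented, assuming each predictor's output has been reduced to a collection of peptide strings and that "overlap" means a shared contiguous run of residues. All function names and example peptides are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple


def consensus(tool_a: Iterable[str], tool_b: Iterable[str]) -> Set[str]:
    """Epitopes reported by both predictors (exact sequence match)."""
    return set(tool_a) & set(tool_b)


def longest_common_run(a: str, b: str) -> int:
    """Length of the longest contiguous substring shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1  # extend the run ending at i-1, j-1
                best = max(best, curr[j])
        prev = curr
    return best


def overlapping_pairs(
    b_cell: Iterable[str], mhc2: Iterable[str], min_run: int = 9
) -> List[Tuple[str, str]]:
    """Pairs sharing at least `min_run` sequential residues (nine in the text)."""
    return [
        (b, m)
        for b in sorted(b_cell)
        for m in sorted(mhc2)
        if longest_common_run(b, m) >= min_run
    ]


def immunogenic(scores: Dict[str, float], cutoff: float = 0.1) -> Set[str]:
    """Class I peptides with an immunogenicity score greater than the cutoff."""
    return {pep for pep, score in scores.items() if score > cutoff}


# Toy peptides, invented for illustration only.
pairs = overlapping_pairs({"AKLMNPQRSTVWYACD"}, {"LMNPQRSTVWYACDE"})
```

The dynamic-programming search runs in time proportional to the product of the two peptide lengths, which is negligible for 15- and 16-residue epitopes.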
Sequence construction
The epitopes that passed all of these filters were then merged into different constructs, using the peptide linker sequences AAY for MHC I epitopes and GPGPG for MHC II epitopes, which help in protein folding [53].
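A construct of this kind can be assembled by simple string joining, as in the sketch below. The linkers AAY and GPGPG come from the text; the order of the two blocks (CTL epitopes first, HTL epitopes second) and the use of GPGPG at the junction between them are assumptions made only for illustration, and the function name is hypothetical.

```python
from typing import Sequence

CTL_LINKER = "AAY"    # linker between MHC I epitopes (from the text)
HTL_LINKER = "GPGPG"  # linker between MHC II epitopes (from the text)


def build_construct(
    ctl_epitopes: Sequence[str],
    htl_epitopes: Sequence[str],
    junction: str = HTL_LINKER,  # assumption: not specified in the text
) -> str:
    """Join each epitope class with its linker, then concatenate the blocks."""
    ctl_block = CTL_LINKER.join(ctl_epitopes)
    htl_block = HTL_LINKER.join(htl_epitopes)
    # Skip empty blocks so no dangling junction linker is emitted.
    return junction.join(block for block in (ctl_block, htl_block) if block)


# Toy epitopes, invented for illustration only.
construct = build_construct(
    ["LLDVTAAGL", "GTSDKLPAY"],
    ["MKTAYIAKQRQISFV"],
)
```

Trying several epitope orders and comparing the resulting constructs is straightforward with this function, since each call returns one candidate sequence.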
Evaluation of host homology and physical–chemical properties
To evaluate the similarity between the constructed protein and human proteins, and thereby reduce the possibility of autoimmunity, a BLASTp search was carried out. The whole multi-epitope protein sequence and its individual epitopes were searched against the UniProtKB Human database.
Molecular mass, theoretical pI, extinction coefficient, aliphatic index, grand average of hydropathicity (GRAVY), estimated half-life in three model organisms (Escherichia coli, yeast, and mammalian cells), and the instability index were calculated from the final construct sequence using ProtParam [54]. The solubility index was also assessed with Protein-Sol [55], which evaluates several properties based on E. coli expression data.
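Two of these indices have simple closed forms and can be checked by hand. The sketch below is not ProtParam itself; it recomputes GRAVY as the mean Kyte-Doolittle hydropathy per residue and the aliphatic index as X(Ala) + 2.9 X(Val) + 3.9 [X(Ile) + X(Leu)], where X is the mole percent of each residue. The function names are hypothetical, and the sketch assumes the sequence contains only the 20 standard amino acids.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence: str) -> float:
    """Grand average of hydropathicity: mean hydropathy per residue."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    # Raises KeyError on non-standard residues, by design of this sketch.
    return sum(KYTE_DOOLITTLE[aa] for aa in sequence) / len(sequence)


def aliphatic_index(sequence: str) -> float:
    """Aliphatic index from the mole percent of Ala, Val, Ile, and Leu."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    n = len(sequence)
    mole_pct = {aa: 100.0 * sequence.count(aa) / n for aa in "AVIL"}
    return (
        mole_pct["A"]
        + 2.9 * mole_pct["V"]
        + 3.9 * (mole_pct["I"] + mole_pct["L"])
    )
```

A negative GRAVY value indicates an overall hydrophilic sequence, whereas a higher aliphatic index is generally associated with greater thermostability.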
Secondary structure prediction
The secondary structure of the final epitope construct was predicted with the RaptorX template-based protein structure modeling server [56] and with PSIPRED. PSIPRED predicts the secondary structure and generates the corresponding images by applying ANN to position-specific scoring matrices (PSSM) [57].
Structural modeling, refinement, and properties assessment
To predict the tertiary (3D) structure, three different programs were used, and the best 3D structure was chosen based on its structural quality. For the evaluation, PROCHECK was used through SAVES v6.0 [58, 59] to generate the Ramachandran plot. The Phyre2 intensive method comprises multiple alignment of the sequence of interest with homologous sequences, using threading and ab initio techniques, followed by secondary structure prediction with PSIPRED. A hidden Markov model (HMM) is then built from the combined information of these two steps. This HMM is searched against an HMM database of known protein structures, and the best-scoring models are used for modeling and error correction [60]. RaptorX uses multiple-template threading (MTT) and scoring methods to predict the 3D structures and to indicate the quality of the predicted models [56]. Finally, I-TASSER uses an iterative, template-based method built on fragment assembly simulations, with further refinement, to construct the models [57].
To enhance the local and global quality of the modeled 3D structure, we used the GalaxyWeb Server, which refines amino acid side chains using light and aggressive relaxation approaches [61].
Antigenicity, IFN-γ, IL-4, and IL-10 inducing potential
The final construct sequence was analyzed for crucial aspects related to the induction of immune responses, toxicity, and allergenicity. We used VaxiJen to assess the antigenic capacity through the automatic cross-covariance method, thus analyzing the physical–chemical properties and predicting the ability to induce immune responses without the need to do alignments [39].
The search for epitopes able to induce IFN-γ production was performed with the IFNepitope predictor, using the MHC II epitopes. This predictor uses a motif-based hybrid SVM method to perform the prediction [62]. IL-4 and IL-10 induction was assessed by the same method with separate predictors (IL-4Pred and IL-10Pred) [63, 64]. The ProInflam web server was also used to predict the pro-inflammatory potential of the peptides included in the protein [65].
Conformational B cell epitopes prediction
The ElliPro web-based tool was used to predict conformational B cell epitopes from the refined predicted structure of our multi-epitope protein [66]. B cell epitopes are generally conformational, meaning that their residues are distant in the linear sequence but close to each other in space [67].
In silico cloning
To verify that the multi-epitope protein could be cloned and expressed in an appropriate expression vector, we performed in silico cloning. Using JCat, we reverse-translated the protein sequence and adapted its codon usage to that of the E. coli K12 expression system. The codon optimization for E. coli K12 returned the Codon Adaptation Index (CAI), which should be higher than 0.8, and the GC content, which should be between 30 and 70%. To clone the final optimized gene sequence, we used the pET28a(+) vector obtained from the Addgene website (https://www.addgene.org/), with BlpI and BamHI restriction sites. Finally, the optimized sequence was inserted into the pET28a(+) vector using the SnapGene tool [68] to ensure protein expression.
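The two acceptance criteria for the optimized sequence can be checked with a few lines of Python. In this sketch the GC content is computed directly from the DNA sequence, while the CAI is assumed to be supplied by the optimization tool rather than recomputed. The function names and the example sequence are hypothetical.

```python
from typing import Tuple


def gc_content(dna: str) -> float:
    """Percentage of G and C bases in a DNA sequence."""
    seq = dna.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def passes_expression_criteria(
    cai: float,
    gc_percent: float,
    min_cai: float = 0.8,                          # CAI higher than 0.8 (text)
    gc_range: Tuple[float, float] = (30.0, 70.0),  # GC between 30 and 70% (text)
) -> bool:
    """True if both the CAI and the GC content fall within the stated limits."""
    low, high = gc_range
    return cai > min_cai and low <= gc_percent <= high


# Toy sequence and CAI value, invented for illustration only.
example_gc = gc_content("ATGGCTGCATACGGTCCGGGT")
accepted = passes_expression_criteria(cai=0.95, gc_percent=example_gc)
```

Treating the GC limits as inclusive is a choice made in this sketch; the text gives only the range itself.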