Tropheryma whipplei causes disease ranging from acute gastroenteritis to neuronal damage in Homo sapiens. Genomic and codon adaptation studies could help advance the prediction of disease evolution and the prevention and treatment of disease. Codon usage data and codon usage measurement tools were used to detect rare and very rare codons and to characterize synonymous codon usage. High effective number of codons (ENC) values indicate low codon usage bias in T. whipplei and in its 23S and 16S ribosomal RNA genes.
Codon usage bias was low across the T. whipplei genomic sets. The third-position base contents of synonymous codons were A3S% (24.47 and 22.88), C3S% (20.99 and 22.88), T3S% (21.47 and 19.53), and G3S% (33.08 and 34.71) for the 23S and 16S rRNA genes, respectively.
Valine, aspartate, leucine, and phenylalanine, which are encoded by the most frequently used codons, were also present in the epitopes KPSYLSALSAHLNDK and FKSFNYNVAIGVRQP, which were screened from the proteins excinuclease ABC subunit UvrC and 3-oxoacyl-ACP reductase FabG, respectively. This approach opens new avenues for designing epitope-based peptide vaccines against different pathogenic organisms.
Tropheryma whipplei is an actinobacterial pathogen that causes Whipple’s disease in Homo sapiens. The disease has been associated with gastroenteritis, endocarditis, and neuronal damage in Caucasian individuals. Its lethal impact has also been observed in dogs. The disease and the bacterium are named after the Nobel laureate G. H. Whipple, who carried out extensive investigations of the lipodystrophy (defective lipid biosynthesis and absorption) now known to be caused by T. whipplei. The bacterium causes a broad spectrum of infections. Caucasian populations, children, sewage workers, and farmers have been found to be most commonly affected. The bacterium causes immunomodulation with increased IL-16 secretion, IL-10 synthesis, and dysregulation of mucosal T-helper cells, and further immunological abnormalities have been described, reflecting the complex nature of Whipple’s disease. Clinical symptoms include severe diarrhea, weight loss, and weakness. T. whipplei invades the lamina propria of the gastrointestinal tract and targets macrophages for its replication. Two strains of T. whipplei (Twist and TW08/27) were sequenced by French researchers, which opened the way for genomic analysis and better treatment of this lethal disease; their analysis showed that this actinobacterium has a low GC content (46%) compared with other members of the same order.
Current treatments, such as doxycycline, hydroxychloroquine, and trimethoprim/sulfamethoxazole, must be given for almost 2 years, and patients require lifelong follow-up [8, 9]. Recent in silico studies on epitope-based vaccine design point to a possible route to prophylaxis for Whipple’s disease. The genome of this actinobacterium encodes a large number of surface proteins, some of which are associated with large stretches of noncoding repetitive DNA. The genome also shows variability, including phase variation that modifies cell proteins, which points to the importance of immune evasion and interaction with the host [1, 7]. These unusual genomic features make the bacterium a good subject for studying codon usage patterns to uncover the effects of natural selection and mutational pressure. A codon is a sequence of three nucleotides that codes for a particular amino acid or acts as a stop signal for translation. Because most amino acids are encoded by several synonymous codons, the unequal use of these codons is called codon usage bias. In many unicellular prokaryotes, synonymous codon usage is consistently linked to directional mutational bias and translational selection. Other factors, such as replication- and translation-related selection and protein hydropathy, can also have a significant effect. In some microbial pathogens, mutational bias has been found to be strand-specific, and these organisms show differing synonymous and nonsynonymous codon usage. This study not only provides insight into the natural and mutational selection pressures acting at the genomic level in T. whipplei but also offers a better understanding of evolutionary developments in this host-adapted bacterium.
This computational analysis provided data on highly translated proteins and enzymes of this bacterium and on amino acids that could be targeted in epitope-based prophylaxis, either to inhibit bacterial activity in the host or to develop better treatments, as in recent immunoinformatics studies [14, 15]. Codon usage patterns of the 16S and 23S ribosomal RNA genes were analyzed to detect changes associated with the evolutionary or phylogenetic patterns of the bacterium. We also identified epitope-based peptide vaccine candidates against Tropheryma whipplei. The aim of the study was to determine codon usage patterns in T. whipplei and, on that basis, to predict epitope-based vaccine candidates using current bioinformatics tools.
Codon data retrieval
To measure codon usage bias, codon usage tables were retrieved from the Codon and Codon-Pair Usage Tables (CoCoPUTs) database. This database reports the relative frequency with which each codon is used in T. whipplei RefSeq genes. Similarly, its codon-pair usage tables give the count of each codon pair in the CDSs of the T. whipplei RefSeq genomic data, from which codon-pair usage bias was calculated.
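The codon-pair statistics described above can be illustrated with a minimal Python sketch. It assumes clean, in-frame CDSs containing only A, C, G, and T, and it scores each pair with the codon pair score (CPS) commonly used in codon-pair bias studies: CPS = ln(N_AB / (N_A × N_B / (N_X × N_Y) × N_XY)), where A and B are codons, X and Y the amino acids they encode, and N denotes counts. Whether CoCoPUTs computes its values in exactly this way is an assumption here, and the function names are illustrative.

```python
import math
from collections import Counter
from itertools import product

# Standard genetic code (NCBI translation table 1); codons enumerated TTT, TTC, TTA, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codons(cds):
    """Split an in-frame CDS into codons, dropping any trailing partial codon."""
    cds = cds.upper()
    return [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]


def count_usage(cds_list):
    """Count codons, adjacent codon pairs, amino acids and amino-acid pairs."""
    codon_n, pair_n, aa_n, aa_pair_n = Counter(), Counter(), Counter(), Counter()
    for cds in cds_list:
        cs = codons(cds)
        codon_n.update(cs)
        aa_n.update(CODE[c] for c in cs)
        for a, b in zip(cs, cs[1:]):  # pairs never span two different CDSs
            pair_n[a + b] += 1
            aa_pair_n[CODE[a] + CODE[b]] += 1
    return codon_n, pair_n, aa_n, aa_pair_n


def codon_pair_score(pair, codon_n, pair_n, aa_n, aa_pair_n):
    """CPS = ln(observed / expected) for an observed six-nucleotide codon pair."""
    a, b = pair[:3], pair[3:]
    x, y = CODE[a], CODE[b]
    expected = codon_n[a] * codon_n[b] / (aa_n[x] * aa_n[y]) * aa_pair_n[x + y]
    return math.log(pair_n[pair] / expected)


# Toy data: GTT is always followed by GAT, and GTC by GAC.
counts = count_usage(["GTTGAT", "GTCGAC"])
print(round(codon_pair_score("GTTGAT", *counts), 4))  # 0.6931, i.e. ln(2)
```

A positive CPS means the pair occurs more often than expected from its codon and amino acid frequencies, a negative CPS means it is under-represented, and 0 means no pair preference.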
Each codon in the original sequence of the T. whipplei strains was replaced with the synonymous codon that has the highest usage frequency. The ATGme tool (http://www.atgme.org/) was used to identify rare codons and optimize the genomic sequences accordingly: genomic sequences in FASTA format were pasted into the sequence box, the codon usage table was pasted into its input field, and the data were processed to identify rare codons and optimize the sequences.
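The replacement rule described above (each codon is swapped for its most frequent synonymous codon) can be sketched in Python as follows. This is not ATGme's implementation; it assumes a usage table given as a codon-to-frequency mapping, treats codons missing from the table as frequency 0, and breaks ties by codon order. The function name `optimize` is hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def optimize(cds, usage):
    """Replace every codon with the synonymous codon that is most frequent in `usage`.

    `usage` maps codon -> usage frequency; codons absent from it count as 0.
    Ties keep the first codon in CODE order. Stop codons are only swapped for stops.
    """
    best = {}
    for codon, aa in CODE.items():
        if aa not in best or usage.get(codon, 0.0) > usage.get(best[aa], 0.0):
            best[aa] = codon
    cds = cds.upper()
    return "".join(best[CODE[cds[i:i + 3]]] for i in range(0, len(cds) - len(cds) % 3, 3))


# GTT, GAT, CTT and TTT are the frequencies reported in this article;
# the GTC, GAC and TTC values are made up for illustration.
usage = {"GTT": 37.06, "GAT": 37.03, "CTT": 32.53, "TTT": 30.88,
         "GTC": 10.0, "GAC": 20.0, "TTC": 5.0}
print(optimize("ATGGTCGACTTC", usage))  # ATGGTTGATTTT
```

This "one amino acid, one codon" rule is the simplest optimization strategy; it preserves the encoded protein but removes all synonymous variation.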
Codon usage measurements
Nucleotide composition was computed from the identified rRNA gene sequences. The G + C content at the first, second, and third codon positions (GC1, GC2, and GC3) and at synonymous positions (GC1s, GC2s, and GC3s) was determined, together with frequencies and mean frequencies. The frequencies of A, T, G, and C at the third position of synonymous codons (A3, T3, G3, and C3) and their percentages (%A3s, %T3s, %G3s, and %C3s) were calculated. To measure synonymous codon bias, the effective number of codons (ENC) was calculated. Codon usage, codon usage per thousand, and relative synonymous codon usage (RSCU) were also calculated using the CAIcal tool (https://ppuigbo.me/programs/CAIcal/).
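The positional measures above can be sketched in Python. This sketch uses simplified definitions: GC1, GC2, and GC3 are computed over all codons, while GC3s and A3s, C3s, G3s, and T3s are computed over synonymous codons only (excluding Met, Trp, and stop codons), so the four third-position percentages sum to 100. CAIcal may use different denominators, so its values need not match these exactly; the function name `composition` is hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def composition(cds):
    """Positional GC content and third-position base composition of synonymous codons (%)."""
    cds = cds.upper()
    cs = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]

    def gc_at(pos, subset):
        return 100 * sum(c[pos] in "GC" for c in subset) / len(subset)

    # Synonymous codons: exclude Met (ATG), Trp (TGG) and stop codons.
    syn = [c for c in cs if CODE[c] not in "MW*"]
    result = {"GC1": gc_at(0, cs), "GC2": gc_at(1, cs), "GC3": gc_at(2, cs),
              "GC3s": gc_at(2, syn)}
    for base in "ACGT":
        result[base + "3s"] = 100 * sum(c[2] == base for c in syn) / len(syn)
    return result


print(composition("ATGGCTGCCGCAGCG"))
# {'GC1': 80.0, 'GC2': 80.0, 'GC3': 60.0, 'GC3s': 50.0, 'A3s': 25.0, 'C3s': 25.0, 'G3s': 25.0, 'T3s': 25.0}
```

In the toy sequence, the start codon ATG raises GC3 to 60% but is excluded from GC3s, which therefore reflects only the four alanine codons.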
Epitope-based vaccine prediction
Proteomic data for Tropheryma whipplei were retrieved from the NCBI GenBank database, and allergenicity was estimated with the AllergenFP server. The NetMHCIIpan-4.0 server was used to screen the selected proteins for epitopes that can interact with human leukocyte antigen (HLA) proteins. VaxiJen 2.0 was used to predict the antigenicity of the screened epitopes. Epitope structures were predicted with PEP-FOLD 3.5, and the structure of the HLA allelic determinant HLA-DRB1_0101 (PDB ID: 1AQD) was retrieved from the RCSB PDB database. Biochemical properties of the epitopes were calculated with the ProtParam tool on the ExPASy server.
Molecular docking of the epitopes with the HLA determinant was performed using PatchDock, FireDock, and the DINC web server. Besides making docking straightforward, these tools calculate parameters such as global energy, atomic contact energy, and binding energy for the docked complexes.
Identified codons and calculated usage bias
The codon-pair usage table and dinucleotide usage data were obtained from the CoCoPUTs database [23, 24]. The T. whipplei taxonomy ID (taxid 2039) was verified with the NCBI Taxonomy tool, and the taxonomy is illustrated in Fig. 1. A heat map of log-transformed codon-pair frequencies generated from the data is shown in Fig. 2. ENC values range from 20 to 61: a value of 20 means that only one codon is used for each amino acid, whereas a value of 61 means that all synonymous codons are used equally. The ENC value computed in our analysis was 56.138, which means that more than one codon was used for each amino acid. Because an ENC value of ≤ 35 is generally taken to indicate significant codon bias, the high ENC value indicates low codon usage bias in T. whipplei. ENC details are given in Table 1.
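The ENC range described above follows from Wright's (1990) estimator, sketched here in Python under the assumption of the standard genetic code and a codon-to-count mapping. For each amino acid, the homozygosity is F = (n Σ p_i² − 1)/(n − 1), where n is the number of codons observed for it and p_i the frequency of each synonymous codon; F values are averaged within degeneracy classes, and ENC = 2 + 9/F2 + 1/F3 + 5/F4 + 3/F6. CAIcal's implementation may differ in details such as the handling of sparsely observed amino acids, and the function name `enc` is hypothetical.

```python
from collections import defaultdict
from itertools import product

# Standard genetic code (NCBI translation table 1).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Number of amino acids in each degeneracy class of the standard code.
FAMILIES = {2: 9, 3: 1, 4: 5, 6: 3}


def enc(codon_counts):
    """Wright's effective number of codons from a codon -> count mapping."""
    synonyms = defaultdict(list)
    for codon, aa in CODE.items():
        if aa != "*":
            synonyms[aa].append(codon)
    f_values = defaultdict(list)
    for group in synonyms.values():
        n = sum(codon_counts.get(c, 0) for c in group)
        if len(group) == 1 or n < 2:
            continue  # Met, Trp and barely observed amino acids carry no information
        p2 = sum((codon_counts.get(c, 0) / n) ** 2 for c in group)
        f_values[len(group)].append((n * p2 - 1) / (n - 1))  # homozygosity F
    mean_f = {k: sum(v) / len(v) for k, v in f_values.items()}
    if 3 not in mean_f and 2 in mean_f and 4 in mean_f:
        mean_f[3] = (mean_f[2] + mean_f[4]) / 2  # Ile absent: average of classes 2 and 4
    if any(mean_f.get(k, 0) <= 0 for k in FAMILIES):
        raise ValueError("too few codons to estimate ENC")
    return min(61.0, 2 + sum(n_aa / mean_f[k] for k, n_aa in FAMILIES.items()))


uniform = {c: 10 for c, aa in CODE.items() if aa != "*"}
print(enc(uniform))  # 61.0 (capped; the raw estimate is about 65.6)
```

Equal use of all synonymous codons reaches the upper bound of 61, whereas using a single codon per amino acid gives F = 1 in every class and hence ENC = 20, the two extremes described above.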
Codon usage details are summarized in Table 2, and the codon usage frequency per 1000 codons is shown in Fig. 3. The T. whipplei RefSeq data (n = 859) comprised 88,597 CDSs and 28,006,357 codons. Table 2 lists the CDSs and their codon pairs. The codons with the highest usage frequency were GTT (37.06), GAT (37.03), CTT (32.53), and TTT (30.88), with frequency values given in parentheses. Dinucleotide frequencies per 1000 dinucleotides are shown in Fig. 4.
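Per-thousand frequencies are simply counts scaled by the total. Read the other way, and assuming GTT's 37.06 is a per-thousand value, the 28,006,357 codons contain roughly 1.04 million GTT codons (28,006,357 × 37.06 / 1000 ≈ 1,037,916). The Python sketch below counts overlapping dinucleotides along each sequence and converts counts to frequencies per 1000; whether CoCoPUTs counts dinucleotides in exactly this way (for example, across codon boundaries) is an assumption, and the function names are hypothetical.

```python
from collections import Counter


def per_thousand(counts):
    """Convert raw counts into frequencies per 1000 of the same unit (codons, dinucleotides)."""
    total = sum(counts.values())
    return {k: 1000 * v / total for k, v in counts.items()}


def dinucleotide_counts(seqs):
    """Count overlapping dinucleotides along each sequence (pairs never span sequences)."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        counts.update(s[i:i + 2] for i in range(len(s) - 1))
    return counts


print(per_thousand(dinucleotide_counts(["ACGT", "CG"])))
# {'AC': 250.0, 'CG': 500.0, 'GT': 250.0}
```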
Tropheryma whipplei str. Twist codon usage table
The complete sequences of the 23S and 16S ribosomal RNA genes of Tropheryma whipplei strain Twist are 3102 bp and 1521 bp long, respectively. The CDSs, codons, frequencies per thousand, and codon counts for the Twist strain are summarized in Tables 3 and 4. These codon usage tables were used to identify rare codons and optimize the sequences.
Rare and very rare codons
The analysis was based on the usage data, the original sequence, and the optimized sequence. For the 23S ribosomal RNA gene of Tropheryma whipplei strain Twist, the usage data showed that GTT and GAT had the highest codon usage frequencies (36.7% and 36.3%). The stop codons TAA, TAG, and TGA had the lowest usage frequencies (0.9%, 1.0%, and 1.1%) and were classified as very rare codons; stop codons terminate protein translation. The rare codons were CGA, TGC, CGG, TGT, CAC, ACG, CCC, and TCG. Details of the rare and very rare codons (what each codes for, count, and percentage usage frequency) for the 23S and 16S rRNA genes are summarized in Tables 5 and 6.
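Rare-codon detection of this kind amounts to comparing each codon's usage frequency with cutoffs and reporting where low-frequency codons occur in a sequence. ATGme's actual thresholds for "rare" and "very rare" are not stated in the text, so in this Python sketch the cutoffs are hypothetical parameters, and the function name `flag_rare` is illustrative.

```python
def flag_rare(cds, usage, rare_cutoff, very_rare_cutoff):
    """List (codon index, codon, label) for codons whose usage falls below the cutoffs.

    `usage` maps codon -> frequency in whatever unit the usage table uses; codons
    missing from the table are treated as frequency 0 (i.e. very rare).
    The cutoff values are hypothetical parameters, not ATGme's thresholds.
    """
    cds = cds.upper()
    hits = []
    for i in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[i:i + 3]
        freq = usage.get(codon, 0.0)
        if freq < very_rare_cutoff:
            hits.append((i // 3, codon, "very rare"))
        elif freq < rare_cutoff:
            hits.append((i // 3, codon, "rare"))
    return hits


# GTT, GAT and TAA values are the 23S figures quoted above; CGA's value and both
# cutoffs are hypothetical.
usage = {"GTT": 36.7, "GAT": 36.3, "TAA": 0.9, "CGA": 3.0}
print(flag_rare("GTTCGAGATTAA", usage, rare_cutoff=5.0, very_rare_cutoff=1.2))
# [(1, 'CGA', 'rare'), (3, 'TAA', 'very rare')]
```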
The overall nucleotide frequencies in the coding sequences of the 23S and 16S rRNA genes of the Tropheryma whipplei Twist strain were A% (25.11 and 23.54), C% (22.76 and 24.0), T% (20.76 and 19.4), and G% (31.37 and 33.07), respectively. The third-position base contents of synonymous codons were A3S% (24.47 and 22.88), C3S% (20.99 and 22.88), T3S% (21.47 and 19.53), and G3S% (33.08 and 34.71) for the 23S and 16S rRNA genes, respectively. The GC content at synonymous third codon positions (GC3S%) was 52.85 and 57.85 for the 23S and 16S rRNA genes, respectively. Figures 5 and 6 show characteristic features of the rRNA genes, such as length and nucleotide composition; Fig. 7 gives the percentages of synonymous codons, and Fig. 8 shows the codon measurements.
Epitope-based vaccine prediction: application of codon usage studies
In silico analysis revealed two 15-residue epitopes (KPSYLSALSAHLNDK and FKSFNYNVAIGVRQP) with strong predicted interactions with HLA-DRB1_0101 (an MHC class II allelic determinant). Table 7 lists the retrieved sequences with their accession numbers, together with their allergenicity as predicted by AllergenFP (which reports a Tanimoto similarity index). Epitopes were identified with the NetMHCIIpan-4.0 server, which draws on data from the IEDB database and uses artificial neural networks (ANNs) to assess the binding of peptide stretches to HLA allelic determinants. Valine, aspartate, leucine, and phenylalanine, which are encoded by the most frequently used codons, were also present in these epitopes, which were screened from excinuclease ABC subunit UvrC and 3-oxoacyl-ACP reductase FabG. A total of 2151 epitopes were identified; Table 8 lists the 10 peptides with good VaxiJen scores, which indicate antigenicity, together with their NetMHCIIpan-4.0 scores. ProtParam analysis showed that only two of these epitopes were stable, and these were finalized (Table 9). Epitope structures were predicted with PEP-FOLD 3.5, and the HLA-DRB1_0101 structure (PDB ID: 1AQD) was retrieved from the RCSB PDB database for molecular docking. Docking of the selected epitopes with HLA-DRB1_0101 showed strong interactions (Table 10). Figure 9 shows the docked complexes visualized in PyMOL.
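The link drawn above between codon usage and the epitopes can be checked directly: translate the four highest-frequency codons (GTT, GAT, CTT, TTT) with the standard genetic code and locate the corresponding residues in each epitope. This Python sketch is only a presence check; it does not by itself show that these residues are preferentially encoded by those codons in the source genes. The function name `top_codon_residues` is hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Highest-frequency codons and the two epitopes reported in this article.
TOP_CODONS = ["GTT", "GAT", "CTT", "TTT"]
EPITOPES = {"UvrC": "KPSYLSALSAHLNDK", "FabG": "FKSFNYNVAIGVRQP"}


def top_codon_residues(peptide, codons):
    """Return (1-based position, residue) for residues encoded by any of `codons`."""
    residues = {CODE[c] for c in codons}
    return [(i + 1, aa) for i, aa in enumerate(peptide) if aa in residues]


for protein, epitope in EPITOPES.items():
    print(protein, top_codon_residues(epitope, TOP_CODONS))
# UvrC [(5, 'L'), (8, 'L'), (12, 'L'), (14, 'D')]
# FabG [(1, 'F'), (4, 'F'), (8, 'V'), (12, 'V')]
```

Each 15-residue epitope contains four such residues: leucine and aspartate in the UvrC epitope, and phenylalanine and valine in the FabG epitope.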
Because Tropheryma whipplei causes disease ranging from acute gastroenteritis to neuronal damage in Homo sapiens, genomic and codon adaptation studies could help predict disease evolution and improve prevention and treatment. The codon-pair usage table and dinucleotide usage data were obtained from the CoCoPUTs database [23, 24]. The ENC value of 56.138 indicates that more than one codon was used for each amino acid, well above the value of ≤ 35 associated with significant codon bias. The CDSs, codons, frequencies per thousand, and codon counts of the T. whipplei Twist strain were used to identify rare codons and optimize sequences. Relative synonymous codon usage (RSCU) is the ratio of the observed frequency of a codon to the frequency expected if all synonymous codons for that amino acid were used equally. The codon adaptation index (CAI), which estimates the degree of bias toward preferred codons, was 0.73 and 0.725 for the 23S and 16S rRNA genes, respectively. CAI ranges from 0 to 1; higher values indicate stronger codon usage bias and higher gene expression levels. Previous studies associated membrane proteins with considerable codon bias, whereas in the present study we identified rare-codon bias across the entire genome of T. whipplei. Codon bias studies help determine amino acid expression patterns, which can be linked to epitope-based vaccine prediction. Recent studies have reported successful vaccine predictions for SARS-CoV-2 [30, 31], dengue [32, 33], Nipah virus, Candida fungi, canine circovirus, and Zika virus. Codon usage analysis can therefore be considered a preliminary step before using an artificial neural network (ANN)-based tool, such as the NetMHC servers, to screen for essential epitopes of short length (8–12 amino acids).
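RSCU and CAI, as defined above, can be sketched in Python. RSCU divides each codon's count by the mean count of its synonymous family; CAI is the geometric mean, over the codons of a sequence, of the relative adaptiveness w = RSCU / (largest RSCU in the same family), skipping Met, Trp, and stop codons. In the original formulation the reference counts come from highly expressed genes; which reference CAIcal used here is not stated. The `min_w` floor for codons absent from the reference is an assumption (implementations differ), and the function names are hypothetical.

```python
import math
from collections import defaultdict
from itertools import product

# Standard genetic code (NCBI translation table 1).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

SYNONYMS = defaultdict(list)
for _codon, _aa in CODE.items():
    SYNONYMS[_aa].append(_codon)


def rscu(counts):
    """RSCU = observed codon count / count expected if synonymous codons were used equally."""
    values = {}
    for group in SYNONYMS.values():
        total = sum(counts.get(c, 0) for c in group)
        for c in group:
            values[c] = counts.get(c, 0) * len(group) / total if total else 0.0
    return values


def cai(cds, reference_counts, min_w=0.01):
    """Geometric mean of w = RSCU / max RSCU of the synonymous family.

    Met, Trp and stop codons are skipped; `min_w` (an assumed value) replaces
    w = 0 for codons never seen in the reference set.
    """
    r = rscu(reference_counts)
    w = {}
    for group in SYNONYMS.values():
        top = max(r[c] for c in group)
        for c in group:
            w[c] = r[c] / top if top else 0.0
    cds = cds.upper()
    logs = [math.log(max(w[c], min_w))
            for c in (cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3))
            if CODE[c] not in "MW*"]
    if not logs:
        raise ValueError("no informative codons in sequence")
    return math.exp(sum(logs) / len(logs))


reference = {"GTT": 30, "GTC": 10, "GAT": 20, "GAC": 20}  # made-up counts
print(rscu(reference)["GTT"])                        # 3.0
print(round(cai("ATGGTTGATGTCTAA", reference), 3))   # 0.693
```

In the toy example, GTT and GAT have w = 1 while GTC has w = 1/3, so the CAI is (1 × 1 × 1/3)^(1/3) ≈ 0.693; a sequence built only from the preferred codons would score 1.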
The overall nucleotide frequencies in the coding sequences of the 23S and 16S rRNA genes of the Tropheryma whipplei Twist strain were A% (25.11 and 23.54), C% (22.76 and 24.0), T% (20.76 and 19.4), and G% (31.37 and 33.07), respectively. In silico analysis revealed two 15-residue epitopes (KPSYLSALSAHLNDK and FKSFNYNVAIGVRQP) with strong predicted interactions with HLA-DRB1_0101 (an MHC class II allelic determinant). Future work could join these epitopes with linkers and adjuvants, produce them by solid-phase peptide synthesis, and test them in model organisms. Recent developments in immunoinformatics offer new ways to predict epitope-based vaccine candidates and therapeutics against harmful pathogens such as Candida auris and human cytomegalovirus. Bioinformatic approaches have also made drug repurposing against harmful pathogens easier. Likewise, proteomes of viruses that infect animals have been screened for vaccine design using immunoinformatics [33, 36, 40]. The approach used in this study can save time and money in designing peptide-based vaccines.
The codon and amino acid usage analyses indicate that T. whipplei has low codon usage bias. The third-position base contents of synonymous codons were A3S% (24.47 and 22.88), C3S% (20.99 and 22.88), T3S% (21.47 and 19.53), and G3S% (33.08 and 34.71) for the 23S and 16S rRNA genes, respectively. The codon usage patterns also indicate that variational or evolutionary changes in the T. whipplei genome are less likely. This analysis could be applied to predicting disease evolution and to developing drugs or vaccine candidates. We also identified two epitopes, KPSYLSALSAHLNDK and FKSFNYNVAIGVRQP, that could act as vaccine candidates against T. whipplei. Wet-lab validation is still required for these epitopes, which are highly expressed in this bacterium and may be able to form therapeutic peptides.
Lagier JC, Lepidi H, Raoult D, Fenollar F (2010) Systemic Tropheryma whipplei: clinical presentation of 142 patients with infections diagnosed or confirmed in a reference center. Medicine 89(5):337–345. https://doi.org/10.1097/MD.0b013e3181f204a8
Gorvel L, Al Moussawi K, Ghigo E, Capo C, Mege JL, Desnues B (2010) Tropheryma whipplei, the Whipple’s disease bacillus, induces macrophage apoptosis through the extrinsic pathway. Cell Death Dis 1(4):e34. https://doi.org/10.1038/cddis.2010.11
Bentley SD, Maiwald M, Murphy LD, Pallen MJ, Yeats CA, Dover LG et al (2003) Sequencing and analysis of the genome of the Whipple’s disease bacterium Tropheryma whipplei. Lancet 361(9358):637–644. https://doi.org/10.1016/S0140-6736(03)12597-4
Romero H, Zavala A, Musto H (2000) Codon usage in Chlamydia trachomatis is the result of strand-specific mutational biases and a complex pattern of selective forces. Nucleic Acids Res 28(10):2084–2090. https://doi.org/10.1093/nar/28.10.2084
Sharma P, Sharma P, Ahmad S, Kumar A (2022) Chikungunya virus vaccine development: through computational proteome exploration for finding of HLA and cTAP binding novel epitopes as vaccine candidates. Int J Pept Res Ther 28(2):1–15. https://doi.org/10.1007/s10989-021-10347-0
Joshi A, Ray NM, Singh J, Upadhyay AK, Kaushik V (2022) T-cell epitope-based vaccine designing against Orthohantavirus: a causative agent of deadly cardio-pulmonary disease. Netw Model Anal Health Inform Bioinform 11(1):1–10. https://doi.org/10.1007/s13721-021-00339-x
Reynisson B, Alvarez B, Paul S, Peters B, Nielsen M (2020) NetMHCpan-4.1 and NetMHCIIpan-4.0: improved predictions of MHC antigen presentation by concurrent motif deconvolution and integration of MS MHC eluted ligand data. Nucleic Acids Res 48(W1):W449–W454
Thévenet P, Shen Y, Maupetit J, Guyon F, Derreumaux P, Tuffery P (2012) PEP-FOLD: an updated de novo structure prediction server for both linear and disulfide bonded cyclic peptides. Nucleic Acids Res 40(W1):W288–W293
Joshi A, Joshi BC, Mannan MAU, Kaushik V (2020) Epitope based vaccine prediction for SARS-COV-2 by deploying immuno-informatics approach. Inform Med Unlocked 19:100338. https://doi.org/10.1016/j.imu.2020.100338
Krishnan S, Joshi A, Akhtar N, Kaushik V (2021) Immunoinformatics designed T cell multi epitope dengue peptide vaccine derived from non structural proteome. Microb Pathog 150:104728. https://doi.org/10.1016/j.micpath.2020.104728
Jain P, Joshi A, Akhtar N, Krishnan S, Kaushik V (2021) An immunoinformatics study: designing multivalent T-cell epitope vaccine against canine circovirus. J Genet Eng Biotechnol 19(1):1–11. https://doi.org/10.1186/s43141-021-00220-4
Akhtar N, Joshi A, Singh J, Kaushik V (2021) Design of a novel and potent multivalent epitope based human Cytomegalovirus peptide vaccine: an immunoinformatics approach. J Mol Liq 116586. https://doi.org/10.1016/j.molliq.2021.116586
Joshi A, Krishnan GS, Kaushik V (2020) Molecular docking and simulation investigation: effect of beta-sesquiphellandrene with ionic integration on SARS-CoV2 and SFTS viruses. J Genet Eng Biotechnol 18(1):1–8. https://doi.org/10.1186/s43141-020-00095-x
Joshi A, Pathak DC, Mannan MAU, Kaushik V (2021) In-silico designing of epitope-based vaccine against the seven banded grouper nervous necrosis virus affecting fish species. Netw Model Anal Health Inform Bioinform 10(1):1–12. https://doi.org/10.1007/s13721-021-00315-5
AJ and VK, peptide identification using codon bias studies. VK, conception of idea of this article and gap identification in existing studies and editing of the paper. AJ and SKG, molecular dynamic simulation study and analysis. The authors read and approved the final manuscript.
Not applicable. There is no impact on ethical standards in this study, and there is no human or animal involvement.
Consent for publication
The authors declare that they have no competing interests.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Joshi, A., Krishnan, S. & Kaushik, V. Codon usage studies and epitope-based peptide vaccine prediction against Tropheryma whipplei.
J Genet Eng Biotechnol 20, 41 (2022). https://doi.org/10.1186/s43141-022-00324-5