Codon usage studies and epitope-based peptide vaccine prediction against Tropheryma whipplei

Background Tropheryma whipplei causes disease in Homo sapiens ranging from acute gastroenteritis to neuronal damage. Genomic and codon adaptation studies can advance the prediction of disease evolution as well as disease prevention and treatment. Codon usage data and codon usage measurement tools were used to detect rare and very rare codons and to measure synonymous codon usage. The high effective number of codons (ENC) indicates low codon usage bias in T. whipplei and in its 23S and 16S ribosomal RNA genes. Results T. whipplei was found to have low codon bias in its genomic sets. The base content at the third position of synonymous codons was calculated as A3S% (24.47 and 22.88), C3S% (20.99 and 22.88), T3S% (21.47 and 19.53), and G3S% (33.08 and 34.71) for 23S and 16S rRNA, respectively. Conclusion Amino acids such as valine, aspartate, leucine, and phenylalanine have high codon usage frequency and were also found in the epitopes KPSYLSALSAHLNDK and FKSFNYNVAIGVRQP, which were screened from the proteins excinuclease ABC subunit UvrC and 3-oxoacyl-ACP reductase FabG, respectively. This method opens new ways to identify epitope-based peptide vaccines against different pathogenic organisms.


Background
Tropheryma whipplei is an actinobacterial pathogen that causes Whipple's disease in Homo sapiens. The disease was found to be associated with gastroenteritis, endocarditis, and neuronal damage in Caucasian individuals [1]. Its lethal impact has also been observed in dogs [2]. The organism is named after the Nobel laureate G. H. Whipple, who carried out extensive investigations of the lipodystrophy (impaired lipid biosynthesis and uptake) caused by T. whipplei [3], which produces a broad-spectrum infection. Caucasian populations, children, and sewage and agricultural workers were found to be most commonly affected by this disease. The bacterium causes immunomodulation with increased IL-16 secretion, IL-10 synthesis, and dysregulation of mucosal T-helper cells. Further immunological abnormalities have been described, reflecting the complex nature of Whipple's disease [4]. Clinical symptoms include severe diarrhea, weight loss, and weakness [5]. T. whipplei invades the lamina propria of the gastrointestinal tract and targets macrophages for its replication [6]. Two strains of T. whipplei (Twist and TW 08/27) were successfully sequenced by French researchers, which opened the way for genomic analysis and the development of better treatment strategies for this lethal disease; that work showed that this actinobacterium has a low GC content (46%) in comparison with other members of the same order [7].
Current medicines such as doxycycline, hydroxychloroquine, and trimethoprim/sulfamethoxazole must be taken for almost 2 years, and patients require lifelong follow-up [8,9]. Recent in silico studies on epitope-based vaccine design suggest a possible prophylaxis for Whipple's disease [10]. This actinobacterium encodes a large number of surface proteins, some of which are associated with a large content of noncoding repetitive DNA. The genome also shows variability in genomic sets, including phase variations that alter cell proteins; this points to the importance of immune evasion and interaction with the host genome [1,7]. Such unusual genomic features of the bacterium open wide scope for studying codon usage patterns to reveal natural and mutational selection. A codon consists of 3 nucleotides in sequence and either codes for a particular amino acid or acts as a STOP signal for translation. Differences in the frequency with which synonymous codons are used are termed codon usage bias. Synonymous codon usage in many unicellular prokaryotes is consistently associated with directional mutational bias and translational selection [11]. Other factors, such as replication-translational selection and protein hydropathy, can also have a significant impact [12]. In some microbial pathogens, mutational bias was found to be strand specific, and those organisms show varied synonymous and nonsynonymous codon usage [13]. This study not only gives insights into the natural and mutational selection pressures acting at the genomic level in T. whipplei but also offers a better understanding of evolutionary developments in this host-adapted bacterium.
This computational study provides information on the highly expressed proteins and enzymes of this bacterium and on the amino acids that can be considered in the design of epitope-based prophylaxis, either to inhibit bacterial activity in the host or to develop better treatments, as in recent immunoinformatics-based studies [14,15]. Codon usage patterns of the ribosomal RNA genes (16S and 23S) were analyzed here to determine the changes associated with evolutionary or phylogenetic patterns of the bacterium. In this study, we also identified epitope-based peptide vaccine candidates against Tropheryma whipplei. The aim of the study was to determine codon usage patterns in T. whipplei and, on that basis, to predict epitope-based vaccine candidates using current bioinformatics tools.

Codon data retrieval
To measure codon usage bias, codon usage tables were retrieved from the codon and codon-pair usage tables (CoCoPUTs) database. This database reports the relative frequency with which different codons are used in genes in the T. whipplei RefSeq data. Similarly, the codon-pair usage tables give the counts of each codon pair in the CDSs of the T. whipplei genomic data (RefSeq), from which codon-pair usage bias was calculated.
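The kind of counts that such tables hold can be illustrated with a minimal Python sketch. It is not the CoCoPUTs implementation; the function names and the toy sequence are assumptions made for illustration, and the input is assumed to be a single in-frame coding sequence.

```python
from collections import Counter


def count_codons_and_pairs(cds):
    """Count codons and adjacent codon pairs in one in-frame coding sequence."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    codon_counts = Counter(codons)
    # A codon pair is two neighbouring codons read in order.
    pair_counts = Counter(zip(codons, codons[1:]))
    return codon_counts, pair_counts


def per_thousand(codon_counts):
    """Convert raw codon counts to frequency per 1000 codons."""
    total = sum(codon_counts.values())
    return {codon: 1000.0 * n / total for codon, n in codon_counts.items()}


# Toy sequence (hypothetical, not taken from T. whipplei).
codon_counts, pair_counts = count_codons_and_pairs("ATGGTTGATGTTGATTAA")
print(codon_counts["GTT"], pair_counts[("GTT", "GAT")])
```

Summing such counts over all CDSs of a genome gives a genome-wide codon usage table and codon-pair table of the type retrieved here.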

Retrieval of genomic data and codon usage table
The complete nucleotide sequences of T. whipplei strains and the selected FASTA sequences of the Twist 16S and 23S ribosomal RNA genes were retrieved from the NCBI RefSeq database (https://www.ncbi.nlm.nih.gov/nuccore). The codon usage dataset was retrieved from the Codon Usage Database (http://www.kazusa.or.jp/codon/).

Genomic sequence optimization
All codons in the original sequence of T. whipplei strains were replaced with the corresponding synonymous codon having the highest usage frequency. The ATGme tool [16] (http://www.atgme.org/) was used to identify rare codons and to optimize the genomic sequences accordingly. Genomic sequences in FASTA format were pasted into the search box and the codon usage table into the respective field, and the data were then processed for analysis of rare codons and sequence optimization.
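The replacement rule described above can be sketched in a few lines of Python. This is an illustration of the rule only, not the ATGme code. The standard codon assignments are assumed, the function names are made up for the example, and in the usage table only the GTT and GAT values come from this study while the GTC and GAC values are hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def split_codons(cds):
    cds = cds.upper()
    return [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]


def translate(cds):
    return "".join(CODON_TABLE[c] for c in split_codons(cds))


def optimize_sequence(cds, usage):
    """Replace every codon by the synonymous codon with the highest usage."""
    best = {}
    for codon, aa in CODON_TABLE.items():
        # Codons missing from the usage table are treated as frequency 0.
        if aa not in best or usage.get(codon, 0.0) > usage.get(best[aa], 0.0):
            best[aa] = codon
    return "".join(best[CODON_TABLE[c]] for c in split_codons(cds))


# GTT and GAT values are from this study; GTC and GAC are hypothetical.
usage = {"GTT": 36.7, "GTC": 10.0, "GAT": 36.3, "GAC": 12.0}
original = "ATGGTCGACTAA"
optimized = optimize_sequence(original, usage)
print(optimized)
```

Because only synonymous codons are swapped, the encoded amino acid sequence is unchanged, which can be checked by comparing `translate(original)` with `translate(optimized)`.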

Codon usage measurements
Nucleotide composition was computed from the identified ribosomal RNA gene sequences. The G + C content at the 1st, 2nd, and 3rd codon positions (GC1s, GC2s, and GC3s) was determined, along with the frequency and mean frequency. The frequency and percentage of each base at the synonymous third codon position, i.e., A3, T3, G3, and C3 and %A3s, %C3s, %T3s, and %G3s, respectively, were calculated. To measure the bias of synonymous codons, the effective number of codons (ENC) was determined. Additionally, codon usage, codon usage per thousand, and relative synonymous codon usage (RSCU) were calculated using the "CAIcal" tool available at https://ppuigbo.me/programs/CAIcal/.
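Two of these measures can be illustrated with a short Python sketch. It is a simplified illustration and not the CAIcal implementation: RSCU is computed as the observed count of a codon divided by the count expected if all synonymous codons of that amino acid were used equally, and the third-position percentages are taken over codons of amino acids that have more than one codon. Tools such as CAIcal may define A3s, C3s, T3s, and G3s slightly differently. The function names and the codon counts are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

FAMILIES = defaultdict(list)
for _codon, _aa in CODON_TABLE.items():
    FAMILIES[_aa].append(_codon)


def rscu(codon_counts):
    """RSCU = observed count * family size / total count of the family."""
    values = {}
    for aa, codons in FAMILIES.items():
        total = sum(codon_counts.get(c, 0) for c in codons)
        if total == 0:
            continue  # amino acid not observed, RSCU undefined
        for c in codons:
            values[c] = codon_counts.get(c, 0) * len(codons) / total
    return values


def third_position_percent(codon_counts):
    """Percentage of A, C, G, T at the third position of synonymous codons."""
    bases = Counter()
    for codon, n in codon_counts.items():
        aa = CODON_TABLE.get(codon)
        # Skip stop codons and single-codon amino acids (Met, Trp).
        if aa is None or aa == "*" or len(FAMILIES[aa]) == 1:
            continue
        bases[codon[2]] += n
    total = sum(bases.values())
    if total == 0:
        raise ValueError("no synonymous codons in the input")
    return {b: 100.0 * bases[b] / total for b in "ACGT"}


# Hypothetical codon counts, for illustration only.
counts = {"GTT": 3, "GTC": 1, "GAT": 2, "GAC": 2, "ATG": 5}
rscu_values = rscu(counts)
percent = third_position_percent(counts)
gc3s = percent["G"] + percent["C"]
print(rscu_values["GTT"], percent, gc3s)
```

An RSCU above 1 means a codon is used more often than expected under equal usage, and a value below 1 means it is used less often.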

Epitope-based vaccine prediction
Proteomic data for Tropheryma whipplei were obtained from the NCBI GenBank database, and allergenicity was estimated with the AllergenFP server [17]. The NetMHCIIpan-4.0 server [18] was used to screen the selected proteins for epitopes that can interact with human leukocyte antigen (HLA) proteins. The VaxiJen 2.0 tool [19] was used to assess the antigenicity of the screened epitopes. Epitope structures were predicted with PEP-FOLD 3.5 [20], and the HLA allelic determinant HLA-DRB1*01:01 (PDB ID: 1AQD) was retrieved from the RCSB PDB database. Biochemical properties of the epitopes were calculated with the ProtParam tool of the ExPASy web server. Molecular docking between epitopes and HLA determinants was performed with PatchDock [21], FireDock, and the DINC web tool [22]. These tools not only support docking in a user-friendly way but also calculate

Identified codons and calculated usage bias
The codon-pair usage table and dinucleotide usage data were obtained from the CoCoPUTs database [23,24]. The T. whipplei taxonomy ID (taxid 2039) was verified with NCBI's taxonomy tool, and the taxonomy is illustrated in Fig. 1. The log-transformed codon-pair frequency heat map obtained from the data analysis is shown in Fig. 2. ENC values range from 20 to 61 [25]. A value of 20 means that only one codon is used for each amino acid, whereas a value of 61 means that all synonymous codons are used equally for each amino acid. The ENC value computed in our analysis was 56.138, which means that more than one codon was used for each amino acid. An ENC value ≤ 35 is taken to indicate significant codon bias [26]. The high ENC value therefore indicates low codon usage bias in T. whipplei. The ENC values are detailed in Table 1.
The codon usage details are summarized in Table 2, and the codon usage frequency per 1000 codons is illustrated in Fig. 3. The RefSeq data (n = 859) for T. whipplei comprised 88597 CDSs and 28006357 codons.
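The two ends of the ENC scale can be reproduced with a short Python sketch of the commonly used formulation by Wright, ENC = 2 + 9/F2 + 1/F3 + 5/F4 + 3/F6, where Fk is the mean codon homozygosity of the amino acids with k synonymous codons. This is an assumption about the formulation, since the text does not state which implementation produced the value of 56.138. The function name and the test inputs are hypothetical.

```python
from collections import defaultdict
from itertools import product

# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def effective_number_of_codons(codon_counts):
    """Effective number of codons from a codon count table (sketch)."""
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families[aa].append(codon)

    homozygosity = defaultdict(list)  # degeneracy class -> F values
    for aa, codons in families.items():
        k = len(codons)
        n = sum(codon_counts.get(c, 0) for c in codons)
        if k == 1 or n < 2:
            continue  # Met, Trp, or too few observations
        sum_p2 = sum((codon_counts.get(c, 0) / n) ** 2 for c in codons)
        f = (n * sum_p2 - 1) / (n - 1)
        if f > 0:
            homozygosity[k].append(f)

    enc = 2.0  # Met and Trp each contribute one codon
    # Number of amino acids in each degeneracy class of the standard code.
    for k, n_amino_acids in ((2, 9), (3, 1), (4, 5), (6, 3)):
        values = homozygosity[k]
        if not values:
            raise ValueError(f"no usable amino acid with {k} synonymous codons")
        enc += n_amino_acids / (sum(values) / len(values))
    return min(enc, 61.0)  # values above 61 are capped


# Extreme bias: a single codon used for every amino acid.
one_codon, seen = {}, set()
for codon, aa in CODON_TABLE.items():
    if aa != "*" and aa not in seen:
        one_codon[codon] = 10
        seen.add(aa)

# No bias: every sense codon used equally often.
uniform = {codon: 10 for codon, aa in CODON_TABLE.items() if aa != "*"}

print(effective_number_of_codons(one_codon))  # 20.0
print(effective_number_of_codons(uniform))    # 61.0
```

On this scale the reported value of 56.138 lies close to the unbiased end and well above the threshold of 35.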

Tropheryma whipplei str. Twist codon usage table
The complete sequences of the 23S and 16S ribosomal RNA genes of Tropheryma whipplei strain Twist comprised 3102 base pairs and 1521 base pairs, respectively. The CDSs, codons, frequency per thousand, and number of codons for the Twist strain are summarized in Tables 3 and 4. These codon usage tables were used for the identification of rare codons and for sequence optimization.

Rare and very rare codons
The analysis yielded usage data, the original sequence, and the optimized sequence. For the Tropheryma whipplei strain Twist 23S ribosomal RNA gene sequence, the usage data showed that GTT and GAT (36.7% and 36.3%) had the highest codon usage frequency. TAA, TAG, and TGA, which code for "STOP", had the lowest usage frequency (0.9%, 1.0%, and 1.1%) and were identified as very rare codons. The rare codons were CGA, TGC, CGG, TGT, CAC, ACG, CCC, and TCG. Stop codons terminate protein translation [27]. The details of rare and very rare codons (coded amino acid, count, and usage frequency percentage) for 23S and 16S rRNA are summarized in Tables 5 and 6.
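Such a classification amounts to comparing each usage frequency with two cut-offs, as in the minimal Python sketch below. The cut-off values used in the example are hypothetical placeholders and are not the thresholds applied by ATGme, which are not stated here. The function name is also an assumption. The five frequencies are the ones reported above.

```python
def classify_codons(usage, rare_below, very_rare_below):
    """Label each codon as 'very rare', 'rare', or 'common' by usage frequency."""
    if very_rare_below > rare_below:
        raise ValueError("very_rare_below must not exceed rare_below")
    labels = {}
    for codon, frequency in usage.items():
        if frequency < very_rare_below:
            labels[codon] = "very rare"
        elif frequency < rare_below:
            labels[codon] = "rare"
        else:
            labels[codon] = "common"
    return labels


# Frequencies reported above for the 23S rRNA gene analysis.
usage = {"GTT": 36.7, "GAT": 36.3, "TAA": 0.9, "TAG": 1.0, "TGA": 1.1}

# Hypothetical cut-offs chosen only to make the example run.
labels = classify_codons(usage, rare_below=10.0, very_rare_below=2.0)
print(labels)
```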

Codon measurement
The compositional properties were calculated for the coding sequences. Figures 5 and 6 show rRNA characteristic features such as length and nucleotide composition. Figure 7 gives the percentage of synonymous codons in rRNA, and Fig. 8 shows the codon measurements.

Epitope-based vaccine prediction: application of codon usage studies
The in silico analysis revealed two epitopes of 15 amino acid residues (i.e., KPSYLSALSAHLNDK and FKSFNYNVAIGVRQP) that interact favorably with HLA-DRB1*01:01 (an MHC class II allelic determinant). Table 7 lists the retrieved sequences with accession numbers together with their allergenicity as predicted by the AllergenFP tool (which reports a Tanimoto similarity index). Epitopes were identified with the NetMHCIIpan-4.0 server, which draws its core data from the IEDB database and uses artificial neural networks (ANN) to assess the interaction of peptide stretches with HLA allelic determinants. Amino acids such as valine, aspartate, leucine, and phenylalanine have high codon usage frequency and were also found in the epitopes screened from excinuclease ABC subunit UvrC and 3-oxoacyl-ACP reductase FabG. A total of 2151 epitopes were discovered; Table 8 provides the VaxiJen and NetMHCIIpan-4.0 scores of the 10 peptides with good VaxiJen scores. The VaxiJen score indicates the antigenicity of a peptide. ProtParam results showed that only the two finalized epitopes are stable (Table 9). Epitope structures were predicted with PEP-FOLD 3.5 [20], and the HLA allelic determinant HLA-DRB1*01:01 (PDB ID: 1AQD) was retrieved from the RCSB PDB database for molecular docking analysis. Molecular docking of the selected epitopes with HLA-DRB1*01:01 showed favorable interaction (Table 10). Figure 9 shows the docked complexes of the selected epitopes with HLA-DRB1*01:01, visualized in PyMOL.
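The first step of such a screen, listing every 15-residue stretch of a protein, and the residue check on the two final epitopes can be sketched in Python as follows. This only illustrates peptide enumeration and composition counting and does not reproduce the NetMHCIIpan-4.0 scoring. The function names and the toy protein, which embeds one of the reported epitopes between arbitrary flanking residues, are hypothetical.

```python
def peptide_windows(protein, length=15):
    """Return all overlapping peptides of the given length."""
    protein = protein.upper()
    return [protein[i:i + length] for i in range(len(protein) - length + 1)]


def residue_counts(peptide, residues="VDLF"):
    """Count valine (V), aspartate (D), leucine (L), and phenylalanine (F)."""
    return {r: peptide.count(r) for r in residues}


epitopes = ["KPSYLSALSAHLNDK", "FKSFNYNVAIGVRQP"]
for epitope in epitopes:
    print(epitope, residue_counts(epitope))

# Toy protein: hypothetical flanks around the first reported epitope.
toy_protein = "AAAAA" + epitopes[0] + "GGG"
windows = peptide_windows(toy_protein)
print(len(windows), epitopes[0] in windows)
```

Taken together, the two epitopes contain all four of the frequently used amino acids: leucine and aspartate occur in the first, and valine and phenylalanine occur in the second.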

Discussion
Tropheryma whipplei causes disease in Homo sapiens ranging from acute gastroenteritis to neuronal damage. Genomic and codon adaptation studies can advance the prediction of disease evolution as well as disease prevention and treatment. The codon-pair usage table and dinucleotide usage data were obtained from the CoCoPUTs database [23,24]. The ENC value computed in our analysis was 56.138, which means that more than one codon was used for each amino acid; an ENC value ≤ 35 is taken to indicate significant codon bias [26]. The CDSs, codons, frequency per thousand, and number of codons of the Tropheryma whipplei Twist strain were used for the identification of rare codons and for sequence optimization. Relative synonymous codon usage (RSCU) is the ratio of the observed frequency of a codon to the frequency expected if all synonymous codons for that amino acid were used equally [28]. The Codon Adaptation Index, which estimates the degree of bias, was 0.73 and 0.725 for 23S and 16S rRNA, respectively. This index ranges between 0 and 1; higher values indicate stronger codon usage bias and a higher gene expression level. In previous studies, membrane proteins were considered to be associated with considerable bias [29], while in the current study, we identified rare codon bias associated with the entire genome of T. whipplei. A major use of codon bias studies is to determine amino acid expression patterns that can be linked to epitope-based vaccine prediction. In recent studies, vaccine predictions were reported to be successful for SARS-CoV-2 [30,31], dengue [32,33], Nipah [34], Candida fungus [35], canine circovirus [36], and Zika virus [37]. So, codon usage pattern determination can be considered as   [35] and human cytomegalovirus [38]. Similarly, drug repurposing against harmful pathogens was made easier by bioinformatic approaches [39]. Likewise, for animal models, viral pathogenic proteomes were screened for vaccine design using immunoinformatics [33,36,40].
This approach offers a time- and cost-saving route to peptide-based vaccine design.
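The Codon Adaptation Index mentioned above can be illustrated with a minimal Python sketch of the usual definition: each codon receives a relative adaptiveness equal to its reference count divided by the count of the most used synonymous codon, and the index is the geometric mean of these values over the codons of a gene. This is a simplified sketch, not the CAIcal implementation. Codons with zero weight are simply skipped here, whereas published implementations handle them differently. The function names and the reference counts are hypothetical.

```python
import math
from collections import defaultdict
from itertools import product

# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def relative_adaptiveness(reference_counts):
    """w = count of a codon / count of the most used synonymous codon."""
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        families[aa].append(codon)
    weights = {}
    for aa, codons in families.items():
        if aa == "*" or len(codons) == 1:
            continue  # stop codons, Met, and Trp carry no information
        top = max(reference_counts.get(c, 0) for c in codons)
        if top == 0:
            continue
        for c in codons:
            weights[c] = reference_counts.get(c, 0) / top
    return weights


def codon_adaptation_index(cds, weights):
    """Geometric mean of the weights of the codons in one coding sequence."""
    cds = cds.upper()
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    logs = [math.log(weights[c]) for c in codons if weights.get(c, 0) > 0]
    if not logs:
        raise ValueError("no scorable codons in the sequence")
    return math.exp(sum(logs) / len(logs))


# Hypothetical reference counts, for illustration only.
reference = {"GTT": 30, "GTC": 10, "GAT": 20, "GAC": 20}
weights = relative_adaptiveness(reference)
print(codon_adaptation_index("GTTGAT", weights))
print(codon_adaptation_index("GTCGTT", weights))
```

A sequence built only from the most used codons scores 1, and every less used codon pulls the index towards 0.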

Conclusions
The analysis of codon usage and amino acid usage indicates that T. whipplei genomic sets have a low codon bias. The analysis could be applied to disease evolution prediction and to the development of drugs or vaccine candidates. We also found that two epitopes, KPSYLSALSAHLNDK and FKSFNYNVAIGVRQP, can possibly act as vaccine candidates against T. whipplei. Future work requires wet-lab validation of these epitopes, which are highly expressed in this bacterium and are capable of forming therapeutic peptides.