
Immunoinformatics approach of epitope prediction for SARS-CoV-2

A Correction to this article was published on 09 May 2022

This article has been updated



The novel coronavirus SARS-CoV-2 has caused lethal infections worldwide during an unprecedented pandemic. Identifying candidate viral epitopes is the first step in designing vaccines against the infection. Several immunoinformatic approaches have been used to identify SARS-CoV-2 epitopes that bind specifically to major histocompatibility complex class I (MHC-I) molecules. We used immunoinformatic tools to analyze the complete set of viral protein sequences and to identify the SARS-CoV-2 epitopes that bind the most frequent human leukocyte antigen (HLA) alleles in the Egyptian population. These alleles are also frequent in other populations worldwide.


A molecular docking approach showed that using the co-crystallized structure of MHC-I with a T cell receptor (TCR), instead of the MHC-I structure alone, substantially improved docking scores and stabilized the conformation and binding of the identified SARS-CoV-2 epitopes. Our approach directly predicts seven potential vaccine subunits from the available SARS-CoV-2 Spike and ORF1ab protein sequences. These predictions are supported by previously published Spike epitopes, both experimentally validated and predicted in silico. In addition, we predicted novel epitopes (RDLPQGFSA and FCLEASFNY) with high docking scores with both MHC-I and TCR and high predicted antigenicity. The antigenicity, allergenicity, toxicity, and physicochemical properties of the predicted SARS-CoV-2 epitopes were also evaluated with state-of-the-art bioinformatic tools, suggesting that the proposed epitopes are promising vaccine candidates.


Our predicted SARS-CoV-2 epitopes may facilitate the development of vaccines with enhanced immunogenicity against SARS-CoV-2 and provide supporting data for further experimental validation. The proposed docking approach, which exploits both MHC and TCR structures, can be used to identify potential epitopes for most microbial pathogens, provided that a crystal structure of the relevant MHC co-crystallized with a TCR is available.


A virus that causes infectious pneumonia broke out at the end of 2019 and rapidly spread worldwide [1]. Because it was phylogenetically similar to severe acute respiratory syndrome coronavirus (SARS-CoV) [2], the pathogen was subsequently identified as a novel coronavirus, SARS-CoV-2 [3], and the associated disease was named coronavirus disease 2019 (COVID-19) [4, 5]. SARS-CoV-2 is more distantly related to Middle East respiratory syndrome coronavirus (MERS-CoV) [6]. T cell responses have been found to confer long-term immunity against viral infections [7]. T cell-mediated immune responses contributed significantly to protection against SARS-CoV infection and to the pathological damage inflicted by MERS-CoV [8]. Cellular T lymphocyte-mediated responses have been shown to provide the most potent immunity against the structural proteins of SARS-CoV in convalescent patients [9, 10], as cytotoxic T lymphocytes (CTLs) are known to induce the strongest response to viral infections [11]. Recent studies showed that an epitope-based vaccine can be developed by identifying the viral peptides presented by human leukocyte antigens (HLAs), especially peptides from the Spike and N proteins [12,13,14]. During the immune response against the virus, antigen-presenting cells (APCs) process viral antigens into epitopes, and these peptide fragments associate with MHC molecules in a form that is specifically recognized by the T cell receptor (TCR).

T cells detect viral antigens presented by MHC class I as immunogenic peptide–MHC class I complexes, which enhances CD8+ T cell cytokine production and cytotoxic activity (active effector CTLs) [15]. The peptide-binding region of the MHC-I molecule, which the TCR engages, is formed by the α1 and α2 domains of a single heavy chain (HC), while the α3 domain and β2-microglobulin (β2m) support it from below. The α1 and α2 domains combine to form a shallow, curved β-sheet floor topped by two α-helices, which accommodate the peptide epitope between them [16]. Binding of the epitope between the two α-helices requires a set of conserved hydrogen bonds (H bonds) between side chains of the MHC molecule and the peptide backbone. The geometry, hydrophobicity, and charge distribution of the binding site together determine how peptides interact with the MHC molecule. Reliable epitope prediction therefore depends on precisely predicting the affinity of MHC–peptide interactions for individual allotypes [17, 18].

The presentation of a stable immunogenic peptide–MHC class I (pMHC-I) complex depends on the fit between the peptide and the MHC groove, but this is not the only factor. Other factors affecting complex formation include protease activity and the availability of chaperones and of the antigen. The binding groove of MHC class I is closed at both ends by conserved tyrosine residues, which limits bound peptides to roughly 8–10 residues, with their C-terminal end docking into the F pocket [19, 20].
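
Because the closed groove restricts MHC-I ligands to short peptides, a protein is usually scanned as a set of overlapping fixed-length peptides before binding is predicted. The sketch below illustrates that decomposition; it is our own illustration rather than code from this study, the function name `overlapping_kmers` is hypothetical, and the 8–10 residue default simply mirrors the range quoted above (prediction servers may use a different default). A protein of length N yields N − k + 1 peptides of length k.

```python
from __future__ import annotations


def overlapping_kmers(protein: str, lengths: range = range(8, 11)) -> list[tuple[int, str]]:
    """Return (1-based start, peptide) for every overlapping peptide of each length.

    Illustrative sketch: the default 8-10 residue window mirrors the groove-size
    argument in the text, not the settings of any particular server.
    """
    protein = protein.upper()
    peptides: list[tuple[int, str]] = []
    for k in lengths:
        # A sequence of length N contains N - k + 1 windows of length k.
        for start in range(len(protein) - k + 1):
            peptides.append((start + 1, protein[start:start + k]))
    return peptides


if __name__ == "__main__":
    # Arbitrary 12-residue toy sequence: 5 octamers + 4 nonamers + 3 decamers.
    toy = "ACDEFGHIKLMN"
    print(len(overlapping_kmers(toy)))  # 12
```

Sequences shorter than k simply contribute no peptides of that length, so no special-casing is needed.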

The main objective of our study is to predict the most antigenic SARS-CoV-2 epitopes compatible with the HLA haplotypes of the Egyptian population. We chose the Spike and ORF1ab proteins because they scored robustly across several prediction tools, including predicted MHC binding, antigenicity, and docking scores with both MHC and TCR. Each score captures a different property: epitope prediction scores measure the affinity between the proposed epitopes and MHC molecules, antigenicity scores estimate the ability of the epitopes to elicit an immune response, and molecular docking scores evaluate the conformational stability of the epitopes bound to both MHC molecules and T cell receptors. Taken together, high scores indicate that the proposed epitopes are likely to be stably presented and to stimulate an immune response. The methods were selected for their high accuracy in predicting binding conformations and their suitability for protein–protein interaction analysis. For example, HDock can take a ligand as an amino acid sequence in FASTA format and model its structure by homology, instead of requiring a predicted 3D structure as input. In this study, this improved the docking results compared with supplying 3D structures directly, because the software samples different conformations of the epitopes according to their fit in the binding pocket of both MHC and TCR. Additionally, the NetMHCpan4.1 server [21] is a highly accurate epitope prediction platform: the Immune Epitope Database (IEDB) runs weekly benchmarks of epitope prediction tools, in which NetMHCpan4.1 achieved the highest prediction score.

For a more reliable characterization of the epitopes, we used additional tools. VaxiJen [22,23,24] predicts protective antigens with separate models for different sources (e.g., tumors, bacteria, and viruses). Its prediction uses an alignment-independent approach that infers antigenicity from the physicochemical properties of the peptides. PEP-FOLD3 [25,26,27] is a de novo method that predicts peptide structures from linear amino acid sequences. Its structure prediction relies on a hidden Markov model approach and can generate candidate conformations, including by folding peptides onto a specified patch of a protein. The ToxinPred server [28] was used for toxicity prediction. This in silico method uses a dataset of 1805 toxic peptides (≤35 residues) and was developed to predict and design toxic and non-toxic peptides. The AllergenFP v.1.0 server [29] is a bioinformatics tool for allergenicity prediction based on a descriptor fingerprint approach, which could also be applied to other classification problems in computational biology. Finally, the ExPASy ProtParam tool [30] computes various physical and chemical parameters of a given protein, including molecular weight, theoretical isoelectric point (pI), amino acid composition, atomic composition, extinction coefficient, estimated half-life, instability index, aliphatic index, and grand average of hydropathicity (GRAVY).
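
The alignment-independent step behind VaxiJen turns a variable-length peptide into a fixed-length vector by computing auto- and cross-covariances (ACC) of per-residue physicochemical descriptors over several lags, so peptides of different lengths become directly comparable. The sketch below implements only that transform, as we understand it from the VaxiJen publications. The descriptor table is an input the caller must supply (VaxiJen uses principal amino acid property scales that are not reproduced here), the toy table in the usage lines is hypothetical, and VaxiJen's trained classifier, which converts these features into the antigenicity score, is not included.

```python
from __future__ import annotations


def acc_transform(seq: str, descriptors: dict[str, tuple[float, ...]], max_lag: int) -> list[float]:
    """Auto- and cross-covariance (ACC) features of a peptide.

    For descriptor indices j, k and lag l (n = peptide length):
        ACC[j, k, l] = sum_{i=1..n-l} z[j][i] * z[k][i+l] / (n - l)
    j == k gives auto-covariance terms; j != k gives cross-covariance terms.
    Features are ordered by lag, then j, then k.
    """
    n = len(seq)
    if not 1 <= max_lag < n:
        raise ValueError("max_lag must satisfy 1 <= max_lag < len(seq)")
    z = [descriptors[aa] for aa in seq.upper()]  # one descriptor vector per residue
    m = len(z[0])                                # number of descriptors per residue
    features: list[float] = []
    for lag in range(1, max_lag + 1):
        for j in range(m):
            for k in range(m):
                total = sum(z[i][j] * z[i + lag][k] for i in range(n - lag))
                features.append(total / (n - lag))
    return features


if __name__ == "__main__":
    # Hypothetical two-descriptor table, used only to show the arithmetic.
    toy = {"A": (1.0, 0.0), "G": (0.0, 1.0)}
    print(acc_transform("AGA", toy, max_lag=1))  # [0.0, 0.5, 0.5, 0.0]
```

The output length is m × m × max_lag regardless of peptide length, which is what lets a single classifier score peptides of different lengths.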

In this study, several SARS-CoV-2 epitopes were identified by analyzing the complete set of viral protein sequences with the latest versions of tools for epitope prediction, antigenicity prediction, and molecular docking. These tools are among the most widely used and accurate platforms for epitope prediction [21,22,23,24]. We propose these epitopes as the most likely immunogenic peptides of SARS-CoV-2 based on their strong docking affinity with both MHC and TCR. They were selected from the Spike and ORF1ab proteins, which had the highest scores for MHC binding affinity, immunogenicity, and molecular docking. Because SARS-CoV-2 genomes and HLA haplotypes vary across populations [31,32,33,34,35,36], the epitopes were identified for the most frequent HLA alleles of the Egyptian population [37,38,39]. Our docking approach, which validates the docking affinity of the proposed epitopes against both MHC and TCR structures, can be applied to any pathogenic protein as long as a structure of the MHC co-crystallized with a TCR is available.

Material and methods

Sequence retrieval and multiple sequence alignment

Genomic sequences of Egyptian SARS-CoV-2 isolates (GISAID [40] accession IDs: EPI_ISL_9047802, EPI_ISL_9047803, EPI_ISL_9047804, EPI_ISL_9047805, EPI_ISL_430820, and EPI_ISL_430819) [41] were retrieved in FASTA format from GISAID, and the genomic sequence of the Chinese reference isolate (GenBank [42] accession ID: NC_045512.2) [43] was retrieved from GenBank. The genomes of the Egyptian isolates were translated into amino acid sequences using EMBOSS Transeq, and the amino acid sequences were aligned with ClustalW in the Molecular Evolutionary Genetics Analysis software MEGA X [44, 45].
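
EMBOSS Transeq performs the nucleotide-to-protein translation in this step. As a minimal stand-in for translating a single forward frame with the standard genetic code, the sketch below (hypothetical function name `translate`) builds the codon table from the conventional TCAG ordering. One caveat is specific to SARS-CoV-2: ORF1ab is produced by a −1 programmed ribosomal frameshift, so translating the genome in one fixed frame reproduces ORF1a only up to its stop codon, not the full ORF1ab polyprotein.

```python
from itertools import product

# Standard genetic code; codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(_BASES, repeat=3), _AMINO_ACIDS)}


def translate(nt: str, frame: int = 0) -> str:
    """Translate one forward frame ('*' marks a stop codon, 'X' an ambiguous codon)."""
    nt = nt.upper().replace("U", "T")  # accept RNA as well as DNA
    codons = (nt[i:i + 3] for i in range(frame, len(nt) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


if __name__ == "__main__":
    print(translate("ATGTTTGTTTAA"))  # MFV*
```

Codons containing ambiguity symbols such as N fall through to "X" rather than raising, which is convenient for consensus genomes with low-coverage regions.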

Identification of cytotoxic T cell epitopes and their antigenicity response

The NetMHCpan4.1 server [21] was used to predict viral epitopes binding to the most frequent HLA alleles in the Egyptian population (HLA-A*01:01, HLA-A*02:01, HLA-B*35:01, and HLA-B*41:01) [39]. Every SARS-CoV-2 protein was submitted to the server with %Rank thresholds of 0.5% for strong binders and 2% for weak binders. NetMHCpan4.1 makes its predictions with artificial neural networks trained on large sets of quantitative binding affinities and mass spectrometry-eluted ligands. The resulting epitopes were filtered to retain only strong binders together with their corresponding HLA alleles. Next, the antigenicity of every predicted epitope was evaluated with VaxiJen [22,23,24], using a threshold of 0.4 for a probable antigen. This threshold was selected as the best prediction threshold for epitope antigenicity and has previously been used to validate the antigenicity of proposed epitopes [46,47,48]. Only the crystal structures of HLA-A*02:01 and HLA-B*35:01 (PDB accession IDs 5YXN and 4PRP, respectively) were retrieved from the Protein Data Bank [49,50,51].
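
The two filtering steps described above (NetMHCpan %Rank thresholds, then the VaxiJen cutoff) can be expressed as a small filter. This is a sketch, not the authors' pipeline: the `Prediction` record and the function names are hypothetical, we assume the scores have already been parsed from the two servers' outputs, and we treat the thresholds as strict upper bounds on %Rank (a peptide at exactly 0.5% would count as a weak binder), which is our reading of the NetMHCpan convention.

```python
from __future__ import annotations

from dataclasses import dataclass

STRONG_RANK = 0.5     # %Rank threshold for strong binders (from the text)
WEAK_RANK = 2.0       # %Rank threshold for weak binders (from the text)
VAXIJEN_CUTOFF = 0.4  # VaxiJen probable-antigen threshold (from the text)


@dataclass
class Prediction:
    peptide: str
    allele: str
    el_rank: float                # NetMHCpan %Rank: lower means stronger predicted binding
    vaxijen: float | None = None  # VaxiJen score, added after the second step


def binder_class(rank: float) -> str:
    if rank < STRONG_RANK:
        return "strong"
    if rank < WEAK_RANK:
        return "weak"
    return "non-binder"


def select_candidates(preds: list[Prediction]) -> list[Prediction]:
    """Keep strong binders whose VaxiJen score reaches the probable-antigen cutoff."""
    return [
        p for p in preds
        if binder_class(p.el_rank) == "strong"
        and p.vaxijen is not None
        and p.vaxijen >= VAXIJEN_CUTOFF
    ]


if __name__ == "__main__":
    # Placeholder peptides and scores, for illustration only.
    demo = [
        Prediction("AAAAAAAAA", "HLA-A*02:01", el_rank=0.2, vaxijen=0.9),
        Prediction("CCCCCCCCC", "HLA-A*02:01", el_rank=0.2, vaxijen=0.1),
        Prediction("DDDDDDDDD", "HLA-B*35:01", el_rank=1.5, vaxijen=0.9),
    ]
    print([p.peptide for p in select_candidates(demo)])  # ['AAAAAAAAA']
```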

Homology modeling

The 3D structures of the resulting probable antigenic epitopes were predicted with PEP-FOLD3 [25,26,27] from their sequences in FASTA format. Following the PEP-FOLD3 recommendations, the structures with the lowest coarse-grained energy were selected for molecular docking with the MHC-I crystal structures.

Toxicity and allergenic response

The toxicity and allergenicity of the proposed epitopes were predicted with the ToxinPred [28] and AllergenFP v.1.0 [29] servers, respectively. Physicochemical properties, including hydropathicity, charge, half-life, instability index, theoretical isoelectric point (pI), and molecular weight, were predicted with the ExPASy ProtParam tool [30].
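
ProtParam's half-life estimate depends only on the N-terminal residue (the N-end rule), which is why a single value is reported per epitope. The sketch below reproduces that lookup for mammalian reticulocytes in vitro and yeast in vivo. The values are transcribed from the ProtParam documentation as we recall it and should be checked against the current documentation before reuse; the function name is hypothetical.

```python
from __future__ import annotations

# N-end rule half-life estimates as tabulated in the ProtParam documentation
# (mammalian reticulocytes in vitro, yeast in vivo). Transcribed for illustration;
# verify against the current ProtParam documentation before relying on them.
N_END_HALF_LIFE = {
    "A": ("4.4 h", ">20 h"), "R": ("1 h", "2 min"), "N": ("1.4 h", "3 min"),
    "D": ("1.1 h", "3 min"), "C": ("1.2 h", ">20 h"), "Q": ("0.8 h", "10 min"),
    "E": ("1 h", "30 min"), "G": ("30 h", ">20 h"), "H": ("3.5 h", "10 min"),
    "I": ("20 h", "30 min"), "L": ("5.5 h", "3 min"), "K": ("1.3 h", "3 min"),
    "M": ("30 h", ">20 h"), "F": ("1.1 h", "3 min"), "P": (">20 h", ">20 h"),
    "S": ("1.9 h", ">20 h"), "T": ("7.2 h", ">20 h"), "W": ("2.8 h", "3 min"),
    "Y": ("2.8 h", "10 min"), "V": ("100 h", ">20 h"),
}


def n_end_half_life(peptide: str) -> tuple[str, str]:
    """Return (mammalian in vitro, yeast in vivo) half-life estimates from the N-terminal residue."""
    return N_END_HALF_LIFE[peptide[0].upper()]


if __name__ == "__main__":
    print(n_end_half_life("GEYSHVVAF"))  # ('30 h', '>20 h')
```

Because only the first residue is consulted, this estimate says nothing about internal cleavage sites; it is a coarse ranking aid rather than a stability measurement.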

Molecular docking

We used the updated version of the HDock server, which supports protein docking with two methods, template-based and template-free. We tested both methods to determine which gave the highest docking scores with both MHC and TCR, and found that the template-free method produced more robust docking scores than the template-based method. We supplied the crystal structures of the MHC-I and TCR chains in PDB format and the ligands in FASTA format. For docking, we replaced the epitopes co-crystallized in the groove between the MHC and TCR of 5YXN and 4PRP (shown in brown and pink in Fig. 1a and b, respectively) with our putative SARS-CoV-2 epitopes. The PDB IDs of the MHC crystal structures (5YXN and 4PRP) were used as input to the HDock server together with their interacting chains: chain A for MHC and chains D and E for TCR.

Fig. 1

The crystal structures of 5YXN and 4PRP. a 5YXN, with the MHC molecule on the right and the TCR chains on the left. b 4PRP, with the MHC molecule on the right and the TCR chains on the left. White arrows indicate the co-crystallized epitopes.

The interactions of the candidate ligands with their receptors were visualized in PyMOL to determine the number of interacting bonds between the structures. The overall workflow is summarized in Fig. 2.
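
PyMOL's polar-contact search combines a distance cutoff with geometric criteria. The sketch below applies only the distance part, listing donor–acceptor pairs within a cutoff (3.5 Å here, a common convention that we assume rather than take from the study). The atom labels and coordinates are hypothetical; in practice they would come from the docked PDB complex.

```python
import math


def candidate_hbonds(donors, acceptors, cutoff=3.5):
    """List (donor, acceptor, distance) pairs whose distance is <= cutoff (angstroms).

    donors / acceptors: sequences of (label, (x, y, z)). Distance-only criterion:
    no hydrogen positions or angles are checked, so this can over-count H bonds.
    """
    hits = []
    for d_label, d_xyz in donors:
        for a_label, a_xyz in acceptors:
            dist = math.dist(d_xyz, a_xyz)
            if dist <= cutoff:
                hits.append((d_label, a_label, round(dist, 2)))
    return sorted(hits, key=lambda hit: hit[2])  # shortest contacts first


if __name__ == "__main__":
    peptide_donors = [("pep_N1", (0.0, 0.0, 0.0))]        # hypothetical coordinates
    receptor_acceptors = [("rec_O1", (2.9, 0.0, 0.0)),    # hypothetical coordinates
                          ("rec_O2", (0.0, 4.2, 0.0))]
    print(candidate_hbonds(peptide_donors, receptor_acceptors))
    # [('pep_N1', 'rec_O1', 2.9)]
```

Distances reported this way are comparable to the bond distances listed for the docked complexes, but an angle check is needed before calling a contact a hydrogen bond.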

Fig. 2

Flow chart of the approach used in epitope prediction of SARS-CoV-2


Variations in SARS-CoV-2 sequences

A number of synonymous mutations were detected between the genomic sequences of SARS-CoV-2 isolated in Wuhan and in Egypt. In addition, two non-synonymous mutations were identified in the Spike and ORF1ab sequences of the Egyptian isolates. The first, present in all Egyptian isolates, was a substitution of aspartic acid (D) at position 7713 with glycine (G) in the S protein. The second, present in only one Egyptian isolate, was a substitution of lysine (K) at position 2798 with arginine (R) in the ORF1ab protein (Fig. 3).
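
Reading substitutions off a multiple sequence alignment requires a numbering convention: positions can be counted along the ungapped reference sequence or along alignment columns, and the two differ wherever gaps occur. The sketch below numbers positions along the ungapped reference (the Wuhan-Hu-1 protein would play that role here). This convention and the function name are our assumptions, since the text does not state which numbering was used, and the toy sequences are hypothetical.

```python
def substitutions(ref_aln: str, query_aln: str) -> list:
    """List amino acid substitutions such as 'D4G' between two aligned sequences.

    Positions are numbered along the ungapped reference; columns with a gap in
    either sequence are skipped (insertions/deletions are not reported).
    """
    if len(ref_aln) != len(query_aln):
        raise ValueError("aligned sequences must have equal length")
    changes = []
    ref_pos = 0
    for ref_aa, q_aa in zip(ref_aln, query_aln):
        if ref_aa != "-":
            ref_pos += 1  # advance only on reference residues
        if "-" in (ref_aa, q_aa) or ref_aa == q_aa:
            continue
        changes.append(f"{ref_aa}{ref_pos}{q_aa}")
    return changes


if __name__ == "__main__":
    print(substitutions("MF-VDL", "MFAVDK"))  # ['L5K']
```

With alignment-column numbering the same change would be reported at column 6, which is why the convention should be stated alongside any reported position.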

Fig. 3

Multiple sequence alignment of both a Spike and b ORF1ab proteins of SARS-CoV-2 in Wuhan and in Egypt

Recognition of CD8+ T cell epitopes in SARS-CoV-2

Because cytotoxic T lymphocytes recognize epitopes bound to MHC-I on infected cells, we focused on identifying T cell epitopes. The NetMHCpan4.1 server predicted 406 peptides from all viral proteins when tested against the most common HLA alleles of the Egyptian population, evaluating their binding affinity with MHC-I to predict potential CTL epitopes.

Evaluation of antigenicity and allergenic response

The antigenicity of every epitope was evaluated with VaxiJen, yielding 201 peptides predicted to be probable antigens (Table 1 and Supplementary Table 1). The VaxiJen scores indicate robust predicted antigenicity for the proposed epitopes. The allergenicity of the candidate epitopes was assessed with the AllergenFP v.1.0 server (allergenicity scores are listed in Supplementary Table 1). Low allergenicity scores indicate that the proposed epitopes are unlikely to cause detrimental allergic reactions.
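
AllergenFP compares a query's binary descriptor fingerprint with fingerprints of known allergens and non-allergens using the Tanimoto coefficient and assigns the class of the most similar entry. The sketch below shows only that comparison step under this reading of the method; how AllergenFP derives its fingerprints from peptide descriptors is not reproduced, and the fingerprints and function names are hypothetical.

```python
from __future__ import annotations


def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto coefficient of two binary fingerprints given as sets of 'on' bit indices."""
    if not a and not b:
        return 1.0  # convention chosen here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def nearest_neighbour(query: set[int], reference: list[tuple[set[int], str]]) -> tuple[str, float]:
    """Return (label, similarity) of the reference fingerprint most similar to the query."""
    best_fp, best_label = max(reference, key=lambda item: tanimoto(query, item[0]))
    return best_label, tanimoto(query, best_fp)


if __name__ == "__main__":
    # Hypothetical fingerprints; real ones would be derived from peptide descriptors.
    refs = [({1, 2, 3, 5}, "allergen"), ({4, 6, 7}, "non-allergen")]
    print(nearest_neighbour({1, 2, 3}, refs))  # ('allergen', 0.75)
```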

Table 1 The candidate SARS-CoV-2 epitopes for the most frequent MHC class I alleles in the Egyptian population

Toxicity and physicochemical properties assessment

The toxicity and physicochemical properties of the proposed epitopes were evaluated to assess their quality (Table 2). All seven candidate epitopes were predicted to be non-toxic. The RDLPQGFSA and NCYFPLQSY epitopes are hydrophilic and can therefore interact readily with water [52]. GEYSHVVAF had the longest estimated half-life of all candidates: 30 h in vitro (mammalian reticulocytes) and >20 h in vivo (yeast). FCLEASFNY, TLGVLVPHV, and GEYSHVVAF had an instability index below 40, which classifies them as stable, with GEYSHVVAF showing the highest predicted stability.
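
A negative GRAVY (grand average of hydropathicity) indicates an overall hydrophilic peptide. GRAVY is the mean Kyte–Doolittle hydropathy value over the residues, so it is easy to reproduce as a cross-check of the ProtParam output. The sketch below uses the standard Kyte–Doolittle scale; the function name is hypothetical, and the printed values are computed by the sketch rather than quoted from Table 2.

```python
# Kyte-Doolittle hydropathy scale (positive = hydrophobic, negative = hydrophilic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(peptide: str) -> float:
    """Grand average of hydropathicity: mean Kyte-Doolittle value over all residues."""
    residues = peptide.upper()
    return sum(KYTE_DOOLITTLE[aa] for aa in residues) / len(residues)


if __name__ == "__main__":
    for pep in ("RDLPQGFSA", "NCYFPLQSY", "TLGVLVPHV"):
        label = "hydrophilic" if gravy(pep) < 0 else "hydrophobic"
        print(f"{pep}: GRAVY = {gravy(pep):.3f} ({label})")
```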

Table 2 Toxicity and physicochemical properties of the candidate epitopes

Molecular docking

Molecular docking evaluates the binding affinity and interaction between a proposed epitope and its target receptor. We obtained several epitopes with high docking scores across the whole viral proteome; however, the structural Spike and non-structural ORF1ab proteins had the highest docking scores among SARS-CoV-2 proteins (Supplementary Table 1). Ten docking conformations were generated for each peptide epitope (Supplementary file 1), and the top-ranked conformations, selected by docking score and by interactions with MHC-I and TCR residues, were visualized to confirm proper binding (Figs. 4 and 5). These conformations revealed the hydrogen bonds (H bonds) that stabilize the candidate epitopes with both the MHC class I molecule and the TCR chains; the H bonds, the amino acids involved, and the bond distances are listed in Table 3. Finally, three of the seven most promising predicted epitopes were shared between HLA-A*02:01 and HLA-B*35:01 (Table 1).
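
Selecting the top-ranked conformation from the ten docking models amounts to sorting by docking score. The sketch below assumes HDOCK's convention that a more negative docking score indicates a more favourable binding model and, as our own illustrative choice, breaks ties by the number of hydrogen bonds observed. The `DockedPose` record, the tie-break rule, and the scores in the usage lines are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class DockedPose:
    model: int       # model index from the docking run
    score: float     # docking score; more negative = more favourable (assumed HDOCK convention)
    hbonds: int = 0  # hydrogen bonds counted when inspecting the pose


def top_poses(poses: list[DockedPose], n: int = 1) -> list[DockedPose]:
    """Return the n best poses: lowest score first, more H bonds winning ties."""
    return sorted(poses, key=lambda p: (p.score, -p.hbonds))[:n]


if __name__ == "__main__":
    # Hypothetical scores for three of the ten models.
    poses = [DockedPose(1, -210.5, 3), DockedPose(2, -250.0, 2), DockedPose(3, -250.0, 5)]
    print([p.model for p in top_poses(poses, n=2)])  # [3, 2]
```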

Fig. 4

Molecular docking of Spike epitope (No. 14 in Table 1) with both the 4PRP MHC-I molecule and the TCR chains

Fig. 5

Molecular docking of ORF1ab epitope (No. 79 in Table 1) with both the 5YXN MHC-I molecule and the TCR chains

Table 3 The amino acids and bond distances between the proposed epitopes and both MHC and TCR


Vaccine development against a viral infection depends on finding candidate immunogenic epitopes among the viral peptides. Our study aimed to identify putative immunogenic epitopes from the whole SARS-CoV-2 proteome that are likely to bind both the MHC-I molecule and cytotoxic T cells, which form the first adaptive line of defense against viral infection. Epitopes bind to the groove of MHC class I, which is expressed on all nucleated cells. This binding forms a stable conformation that leads to antigen presentation and activation of CD8+ CTLs, which play an indispensable role in combating viral infection [15]. Binding between peptide epitopes and both the MHC and TCR is enhanced by several hydrogen bonds between them, as shown in Table 3 [53, 54]. Because of the polymorphic nature of MHC haplotypes, specific peptide conformations bind to specific MHC molecules [15]. Given this variability, we predicted candidate epitopes from all SARS-CoV-2 proteins to determine the best peptide conformation for binding to the HLA alleles with the highest frequency in the Egyptian population [39].

We performed several molecular docking trials with HDock to obtain the best docking scores, trying both the template-free (FASTA format) and template-based (PDB format) approaches. We tested both approaches using the candidate epitopes as PDB-format structures modeled by the PEP-FOLD3 server and as protein sequences in FASTA format. The template-free method yielded higher docking scores than the template-based method. Moreover, when we supplied the TCR alpha and beta chains co-crystallized with the MHC-I molecules, the docking scores and the number of hydrogen bonds increased markedly. This docking approach, in which the query ligand is docked against both the TCR and HLA molecules, stabilized binding and yielded more confident docking conformations.

We located the most favorable vaccine candidates in the Spike and ORF1ab proteins. Similar to other coronaviruses, the Spike protein is a trimeric class I transmembrane glycoprotein located on the surface of SARS-CoV-2 [55]. SARS-CoV-2 Spike protein is involved in receptor recognition, cell attachment, and fusion, making it crucial for viral entry and infectivity [56,57,58,59,60,61]. On the other hand, ORF1ab has been shown to have key roles in viral interaction with the innate immune response [62, 63], viral replication [64], and viral RNA synthesis and processing [65, 66].

Our study proposed seven immunogenic epitopes that are predicted to be non-toxic and highly antigenic, consistent with their physicochemical properties. Some epitopes are novel, whereas others were previously predicted in silico or identified experimentally [67,68,69,70]. The proposed docking approach thus recovered several antigenic epitopes that had been confirmed by other experimental and computational methods. The CD8+ epitope YLQPRTFLL has been validated experimentally and is similar to a MERS-CoV epitope for the same HLA allele [67, 68]. Our approach also re-identified a previously in silico-predicted MHC class I epitope (SIIAYTMSL), which overlaps with a SARS-CoV-2 MHC class II epitope for DRB1*04:01 and DRB1*07:01. In addition, GEYSHVVAF, NCYFPLQSY, and TLGVLVPHV were previously predicted to bind other HLA alleles [69, 70]. We, however, additionally assessed the immunogenic potential of all epitopes by docking them with both MHC-I and TCR chains. These data agree with other studies that proposed some of these epitopes as potential targets for vaccine development [71,72,73]. We also predicted novel SARS-CoV-2 immunogenic epitopes, which merit experimental validation for both therapeutic applications and vaccine development.

The HLA alleles used here are among the most frequent both in the Egyptian population and worldwide (HLA-A*01:01, 16.2%; HLA-A*02:01, 25.2%; HLA-B*35:01, 6.5%) [70]. The predicted epitopes may therefore be relevant not only to the Egyptian population but also to other populations. Despite the high docking scores and MHC binding affinity of the putative epitopes, in vitro or in vivo studies are required to confirm their immunogenicity against SARS-CoV-2.
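
How widely an allele set applies is often quantified as population coverage. A common simplified estimate, which we assume here and which the study itself does not compute, treats the listed percentages as allele frequencies, applies Hardy–Weinberg proportions within each locus, and assumes independence between loci. Under those assumptions the fraction of individuals carrying at least one listed allele at a locus with summed allele frequency f is 1 − (1 − f)², and loci are combined as 1 − Π(1 − c) over the per-locus coverages c. The function names are hypothetical.

```python
from __future__ import annotations

from math import prod


def locus_coverage(allele_freqs: list[float]) -> float:
    """Fraction of individuals with >= 1 of the given alleles at one locus (Hardy-Weinberg)."""
    f = sum(allele_freqs)
    if not 0.0 <= f <= 1.0:
        raise ValueError("summed allele frequency must lie in [0, 1]")
    return 1.0 - (1.0 - f) ** 2


def combined_coverage(per_locus: list[float]) -> float:
    """Covered if covered at any locus, assuming loci are independent."""
    return 1.0 - prod(1.0 - c for c in per_locus)


if __name__ == "__main__":
    # Treats the percentages cited in the text as allele frequencies (an assumption).
    hla_a = locus_coverage([0.162, 0.252])  # HLA-A*01:01, HLA-A*02:01
    hla_b = locus_coverage([0.065])         # HLA-B*35:01
    print(f"HLA-A: {hla_a:.3f}, HLA-B: {hla_b:.3f}, either: {combined_coverage([hla_a, hla_b]):.3f}")
```

If the cited percentages are phenotype rather than allele frequencies, the Hardy–Weinberg step should be dropped and the values combined directly.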


We identified seven SARS-CoV-2 epitopes from the Spike and ORF1ab proteins for the most common HLA alleles of the Egyptian population. Some of these epitopes were previously validated in vitro or predicted in silico, and others are novel SARS-CoV-2 epitopes with a high predicted probability of eliciting an immune response and forming stable molecular interactions. This is indicated by their high antigenicity, high docking scores, and docking stability with both MHC class I and TCR chains, stabilized by several hydrogen bonds. Importantly, our molecular docking approach is more useful when it uses MHC structures co-crystallized with their TCR chains rather than MHC crystal structures alone, as in many recent studies. This approach of using both MHC and TCR structures for epitope prediction can be extended to most microbial infections. Experimental validation should ultimately confirm the binding of these proposed epitopes to specific TCRs, their immunogenicity, and their therapeutic potential against SARS-CoV-2.

Availability of data and materials

All data generated or analyzed during this study are included in this published article and its supplementary information files.

Change history



Abbreviations

SARS: Severe acute respiratory syndrome
CTL: Cytotoxic T lymphocyte
HLA: Human leukocyte antigen
MHC: Major histocompatibility complex


  1. Chen N, Zhou M, Dong X, Qu J, Gong F, Han Y, Qiu Y, Wang J, Liu Y, Wei Y, Xia J, Yu T, Zhang X, Zhang L (2020) Epidemiological and clinical characteristics of 99 cases of 2019 novel coronavirus pneumonia in Wuhan, China: a descriptive study. Lancet (London, England) 395(10223):507–513.

  2. Zhou P, Yang X-L, Wang X-G, Hu B, Zhang L, Zhang W, Si H-R, Zhu Y, Li B, Huang C-L (2020) A pneumonia outbreak associated with a new coronavirus of probable bat origin. Nature 579(7798):270–273

  3. Gorbalenya AE, Baker SC, Baric R, de Groot RJ, Drosten C, Gulyaeva AA, Haagmans BL, Lauber C, Leontovich AM, Neuman BW (2020) Severe acute respiratory syndrome-related coronavirus: the species and its viruses–a statement of the Coronavirus Study Group

  4. Zhu N, Zhang D, Wang W, Li X, Yang B, Song J, Zhao X, Huang B, Shi W, Lu R (2020) A novel coronavirus from patients with pneumonia in China, 2019. N Engl J Med 382(8):727–733

  5. (2020) Clinical study of anti-CD147 humanized meplazumab for injection to treat with 2019-nCoV pneumonia. Clinical Trials.Gov.

  6. Lu R, Zhao X, Li J, Niu P, Yang B, Wu H, Wang W, Song H, Huang B, Zhu N (2020) Genomic characterisation and epidemiology of 2019 novel coronavirus: implications for virus origins and receptor binding. Lancet 395(10224):565–574

  7. Moss P (2022) The T cell immune response against SARS-CoV-2. Nat Immunol 23(2):186–193.

  8. Azkur AK, Akdis M, Azkur D, Sokolowska M, van de Veen W, Brüggen M, O’Mahony L, Gao Y, Nadeau K, Akdis CA (2020) Immune response to SARS-CoV-2 and mechanisms of immunopathological changes in COVID-19. Allergy 75(7):1564–1581

  9. Li CK, Wu H, Yan H, Ma S, Wang L, Zhang M, Tang X, Temperton NJ, Weiss RA, Brenchley JM (2008) T cell responses to whole SARS coronavirus in humans. J Immunol 181(8):5490–5500

  10. Ng O-W, Chia A, Tan AT, Jadi RS, Leong HN, Bertoletti A, Tan Y-J (2016) Memory T cell responses targeting the SARS coronavirus persist up to 11 years post-infection. Vaccine 34(17):2008–2014

  11. Guidotti LG, Chisari FV (2000) Cytokine-mediated control of viral infections. Virology 273(2):221–227

  12. Rakib A, Sami SA, Islam MA, Ahmed S, Faiz FB, Khanam BH, Marma KK, Rahman M, Uddin MM, Nainu F, Emran TB, Simal-Gandara J (2020) Epitope-based immunoinformatics approach on nucleocapsid protein of severe acute respiratory syndrome-coronavirus-2. Molecules 25(21):5088.

  13. Chen H-Z, Tang L-L, Yu X-L, Zhou J, Chang Y-F, Wu X (2020) Bioinformatics analysis of epitope-based vaccine design against the novel SARS-CoV-2. Infect Dis Poverty 9(1):88.

  14. Waqas M, Haider A, Sufyan M, Siraj S, Sehgal SA (2020) Determine the potential epitope based peptide vaccine against novel SARS-CoV-2 targeting structural proteins using immunoinformatics approaches. Front Mol Biosci 7

  15. Wieczorek M, Abualrous ET, Sticht J, Álvaro-Benito M, Stolzenberg S, Noé F, Freund C (2017) Major histocompatibility complex (MHC) class I and MHC class II proteins: conformational plasticity in antigen presentation. Front Immunol 8:292

  16. Apostolopoulos V, Yuriev E, Lazoura E, Yu M, Ramsland PA (2008) MHC and MHC-like molecules: structural perspectives on the design of molecular vaccines. Hum Vaccin 4(6):400–409

  17. Hunt DF, Henderson RA, Shabanowitz J, Sakaguchi K, Michel H, Sevilir N, Cox AL, Appella E, Engelhard VH (1992) Characterization of peptides bound to the class I MHC molecule HLA-A2.1 by mass spectrometry. Science 255(5049):1261–1263

  18. Falk K, Rötzschke O, Stevanović S, Jung G, Rammensee H-G (1991) Allele-specific motifs revealed by sequencing of self-peptides eluted from MHC molecules. Nature 351(6324):290–296

  19. Van Hateren A, James E, Bailey A, Phillips A, Dalchau N, Elliott T (2010) The cell biology of major histocompatibility complex class I assembly: towards a molecular understanding. Tissue Antigens 76(4):259–275

  20. Blum JS, Wearsch PA, Cresswell P (2013) Pathways of antigen processing. Annu Rev Immunol 31:443–473.

  21. Reynisson B, Alvarez B, Paul S, Peters B, Nielsen M (n.d.) NetMHCpan-4.1 and NetMHCIIpan-4.0: improved predictions of MHC antigen presentation by concurrent motif deconvolution and integration of MS MHC eluted ligand data. Nucleic Acids Res.

  22. Doytchinova IA, Flower DR (2007) VaxiJen: a server for prediction of protective antigens, tumour antigens and subunit vaccines. BMC Bioinformatics 8:4

  23. Doytchinova IA, Flower DR (2007) Identifying candidate subunit vaccines using an alignment-independent method based on principal amino acid properties. Vaccine 25:856–866

  24. Doytchinova IA, Flower DR (2008) Bioinformatic approach for identifying parasite and fungal candidate subunit vaccines. Open Vaccine J 1:22–26

  25. Lamiable A, Thévenet P, Rey J, Vavrusa M, Derreumaux P, Tufféry P (n.d.) PEP-FOLD3: faster de novo structure prediction for linear peptides in solution and in complex. Nucleic Acids Res 44(W1):W449–W454

  26. Shen Y, Maupetit J, Derreumaux P, Tufféry P (2014) Improved PEP-FOLD approach for peptide and miniprotein structure prediction. J Chem Theor Comput 10:4745–4758

  27. Thévenet P, Shen Y, Maupetit J, Guyon F, Derreumaux P, Tuffery P (2012) PEP-FOLD: an updated de novo structure prediction server for both linear and disulfide bonded cyclic peptides. Nucleic Acids Res 40:W288–W293

  28. Gupta S, Kapoor P, Chaudhary K, Gautam A, Kumar R, Raghava GPS (2013) In silico approach for predicting toxicity of peptides and proteins. PLoS One 8(9):e73957.

  29. Dimitrov I, Naneva L, Doytchinova I, Bangov I (2014) AllergenFP: allergenicity prediction by descriptor fingerprints. Bioinformatics (Oxford, England) 30(6):846–851.

  30. Gasteiger E, Hoogland C, Gattiker A, Duvaud S, Wilkins MR, Appel RD, Bairoch A (2005) Protein identification and analysis tools on the ExPASy server. In: Walker JM (ed) The proteomics protocols handbook. Humana Press, pp 571–607.

  31. Biswas S, Mudi S (2020) Genetic variation in SARS-CoV-2 may explain variable severity of COVID-19. Med Hypotheses 143:109877.

  32. Secolin R, de Araujo TK, Gonsales MC, Rocha CS, Naslavsky M, De Marco L, Bicalho MAC, Vazquez VL, Zatz M, Silva WA, Lopes-Cendes I (2021) Genetic variability in COVID-19-related genes in the Brazilian population. Human Genome Var 8(1):15.

  33. Migliorini F, Torsiello E, Spiezia F, Oliva F, Tingart M, Maffulli N (2021) Association between HLA genotypes and COVID-19 susceptibility, severity and progression: a comprehensive review of the literature. Eur J Med Res 26(1):84.

  34. Nguyen A, David JK, Maden SK, Wood MA, Weeder BR, Nellore A, Thompson RF (2021) Human leukocyte antigen susceptibility map for severe acute respiratory syndrome coronavirus 2. J Virol 94(13):e00510–e00520.

  35. Tavasolian F, Rashidi M, Hatam GR, Jeddi M, Hosseini AZ, Mosawi SH, Abdollahi E, Inman RD (2021) HLA, immune response, and susceptibility to COVID-19. Front Immunol 11:3581

  36. Langton DJ, Bourke SC, Lie BA, Reiff G, Natu S, Darlay R, Burn J, Echevarria C (2021) The influence of HLA genotype on the severity of COVID-19 infection. HLA 98(1):14–22.

  37. Hafez M, El-Shennawy FA (1986) HLA-antigens in the Egyptian population. Forensic Sci Int 31(4):241–246.

  38. Abdelhafiz AS, Ali A, Fouda MA, Sayed DM, Kamel MM, Kamal LM, Khalil MA, Bakry RM (2022) HLA-B*15 predicts survival in Egyptian patients with COVID-19. Hum Immunol 83(1):10–16.

  39. Elshakankiry NH, Mossallam GI, Madbouly A, Maiers M, El Haddad A, Kamel H (2017) P227 determination of HLA-A, -B and -DRB1 alleles and HLA-A-B haplotype frequencies in Egyptians based on family study. Hum Immunol 78:222.

  40. Elbe S, Buckland-Merrett G (2017) Data, disease and diplomacy: GISAID’s innovative contribution to global health. Glob Chall 1(1):33–46.

  41. Kandeil A, Mostafa A, El-Shesheny R, Shehata M, Roshdy WH, Ahmed SS, Gomaa M, El Taweel A, Kayed AE, Mahmoud SH (2020) Coding-complete genome sequences of two SARS-CoV-2 isolates from Egypt. Microbiol Resour Announc 9(22).

  42. Nucleotide. Bethesda: National Library of Medicine (US), National Center for Biotechnology Information; [1988] (n.d.) Severe acute respiratory syndrome coronavirus 2 isolate Wuhan-Hu-1, complete genome.

  43. Wu F, Zhao S, Yu B, Chen YM, Wang W, Song ZG, Hu Y, Tao ZW (2020) Severe acute respiratory syndrome coronavirus 2 isolate Wuhan-Hu-1, complete genome. Nature 579(7798):265–269.

  44. CLUSTAL W (2008) Improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. In: Encyclopedia of genetics, genomics, proteomics and informatics. Springer Netherlands, pp 376–377.

  45. Stecher G, Tamura K, Kumar S (2020) Molecular evolutionary genetics analysis (MEGA) for macOS. Mol Biol Evol.

  46. Doytchinova IA, Flower DR (2007) VaxiJen: a server for prediction of protective antigens, tumour antigens and subunit vaccines. BMC Bioinformatics 8:4.

  47. Bhattacharya M, Sharma AR, Patra P, Ghosh P, Sharma G, Patra BC, Lee S, Chakraborty C (2020) Development of epitope-based peptide vaccine against novel coronavirus 2019 (SARS-COV-2): Immunoinformatics approach. J Med Virol 92(6):618–631.

  48. Enayatkhani M, Hasaniazad M, Faezi S, Gouklani H, Davoodian P, Ahmadi N, Einakian MA, Karmostaji A, Ahmadi K (2021) Reverse vaccinology approach to design a novel multi-epitope vaccine candidate against COVID-19: an in silico study. J Biomol Struct Dyn 39(8):2857–2872.

  49. Berman HM, Henrick K, Nakamura H (2003) Announcing the worldwide Protein Data Bank. Nat Struct Biol 10(12):980.

  50. Burley SK, Berman HM et al (2019) RCSB protein data bank: biological macromolecular structures enabling research and education in fundamental biology, biomedicine, biotechnology and energy. Nucleic Acids Res 47:D464–D474.

  51. Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE (2000) The protein data bank. Nucleic Acids Res 28:235–242.

  52. Ali M, Pandey RK, Khatoon N, Narula A, Mishra A, Prajapati VK (2017) Exploring dengue genome to construct a multi-epitope based subunit vaccine by utilizing immunoinformatics approach to battle against dengue infection. Sci Rep 7(1):9232.

  53. Knapp B, Deane CM (2016) T-cell receptor binding affects the dynamics of the peptide/MHC-I complex. J Chem Inf Model 56(1):46–53.

  54. Ayres CM, Corcelli SA, Baker BM (2017) Peptide and peptide-dependent motions in MHC proteins: immunological implications and biophysical underpinnings. Front Immunol 8:935.

  55. Huang Y, Yang C, Xu X, Xu W, Liu S (2020) Structural and functional properties of SARS-CoV-2 spike protein: potential antivirus drug development for COVID-19. Acta Pharmacol Sin 41(9):1141–1149.

  56. Walls AC, Park Y-J, Tortorici MA, Wall A, McGuire AT, Veesler D (2020) Structure, function, and antigenicity of the SARS-CoV-2 spike glycoprotein. Cell 181(2):281–292.e6.

  57. Yan R, Zhang Y, Li Y, Xia L, Guo Y, Zhou Q (2020) Structural basis for the recognition of SARS-CoV-2 by full-length human ACE2. Science 367(6485):1444–1448.

  58. Hulswit RJG, de Haan CAM, Bosch B-J (2016) Coronavirus spike protein and tropism changes. Adv Virus Res 96:29–57.

  59. Gui M, Song W, Zhou H, Xu J, Chen S, Xiang Y, Wang X (2017) Cryo-electron microscopy structures of the SARS-CoV spike glycoprotein reveal a prerequisite conformational state for receptor binding. Cell Res 27(1):119–129.

  60. Lan J, Ge J, Yu J, Shan S, Zhou H, Fan S, Zhang Q, Shi X, Wang Q, Zhang L, Wang X (2020) Structure of the SARS-CoV-2 spike receptor-binding domain bound to the ACE2 receptor. Nature 581(7807):215–220.

  61. Wang Q, Zhang Y, Wu L, Niu S, Song C, Zhang Z, Lu G, Qiao C, Hu Y, Yuen K-Y, Wang Q, Zhou H, Yan J, Qi J (2020) Structural and functional basis of SARS-CoV-2 entry by using human ACE2. Cell 181(4):894–904.e9.

  62. Kamitani W, Narayanan K, Huang C, Lokugamage K, Ikegami T, Ito N, Kubo H, Makino S (2006) Severe acute respiratory syndrome coronavirus nsp1 protein suppresses host gene expression by promoting host mRNA degradation. Proc Natl Acad Sci 103(34):12885–12890.

  63. Law AHY, Lee DCW, Cheung BKW, Yim HCH, Lau ASY (2007) Role for nonstructural protein 1 of severe acute respiratory syndrome coronavirus in chemokine dysregulation. J Virol 81(1):416–422.

  64. Putics Á, Filipowicz W, Hall J, Gorbalenya AE, Ziebuhr J (2005) ADP-ribose-1-monophosphatase: a conserved coronavirus enzyme that is dispensable for viral replication in tissue culture. J Virol 79(20):12721–12731.

  65. Snijder EJ, Bredenbeek PJ, Dobbe JC, Thiel V, Ziebuhr J, Poon LLM, Guan Y, Rozanov M, Spaan WJM, Gorbalenya AE (2003) Unique and conserved features of genome and proteome of SARS-coronavirus, an early split-off from the coronavirus group 2 lineage. J Mol Biol 331(5):991–1004.

  66. Graham RL, Sparks JS, Eckerle LD, Sims AC, Denison MR (2008) SARS coronavirus replicase proteins in pathogenesis. Virus Res 133(1):88–100.

  67. Shomuradova AS, Vagida MS, Sheetikov SA, Zornikova KV, Kiryukhin D, Titov A, Peshkova IO, Khmelevskaya A, Dianov DV, Malasheva M, Shmelev A, Serdyuk Y, Bagaev DV, Pivnyuk A, Shcherbinin DS, Maleeva AV, Shakirova NT, Pilunov A, Malko DB et al (2020) SARS-CoV-2 epitopes are recognized by a public and diverse repertoire of human T-cell receptors. medRxiv 2020.05.20.20107813.

  68. Baruah V, Bose S (2020) Immunoinformatics-aided identification of T cell and B cell epitopes in the surface glycoprotein of 2019-nCoV. J Med Virol 92(5):495–500.

  69. Poran A, Harjanto D, Malloy M, Arieta CM, Rothenberg DA, Lenkala D, van Buuren MM, Addona TA, Rooney MS, Srinivasan L (2020) Sequence-based prediction of SARS-CoV-2 vaccine targets using a mass spectrometry-based bioinformatics predictor identifies immunogenic T cell epitopes. Genome Med 12(1):1–15.

  70. Grifoni A, Sidney J, Zhang Y, Scheuermann RH, Peters B, Sette A (2020) A sequence homology and bioinformatic approach can predict candidate targets for immune responses to SARS-CoV-2. Cell Host Microbe 27(4):671–680.

  71. Requena D, Médico A, Chacón RD, Ramírez M, Marín-Sánchez O (2020) Identification of novel candidate epitopes on SARS-CoV-2 proteins for South America: a review of HLA frequencies by country. Front Immunol 11:2008.

  72. Jain R, Jain A, Verma SK (2021) Prediction of epitope based peptides for vaccine development from complete proteome of novel corona virus (SARS-COV-2) using immunoinformatics. Int J Pept Res Ther 27(3):1729–1740.

  73. Chukwudozie OS, Gray CM, Fagbayi TA, Chukwuanukwu RC, Oyebanji VO, Bankole TT, Adewole RA, Daniel EM (2021) Immuno-informatics design of a multimeric epitope peptide based vaccine targeting SARS-CoV-2 spike glycoprotein. PLoS One 16(3):e0248061.


Not applicable.


Funding

This work was supported by ASRT grant #7479 from the Egyptian Academy of Scientific Research and Technology (ASRT), and by internal funding from Zewail City of Science and Technology (ZC 003-2019).

Author information

Contributions

NE.A designed the analysis and performed multiple sequence alignment, epitope prediction, homology modeling, and molecular docking. NE.A, NI.G, and AO.E performed homology modeling and molecular docking and wrote the manuscript; RHM and NE edited and reviewed the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Nagwa El-Badri.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The original online version of this article was revised: the authors identified an error in the HTML version of Fig. 1. The publisher apologises for this error.

Supplementary Information

Additional file 1: Figure S1.

Molecular docking of (a, b) ORF1ab and (c–e) Spike epitopes (No. 17, 12, 25, and 29 in Supplementary Table 1) with both the 5YXN MHC-I molecule and TCR chains. Figure S2. Molecular docking of (a, b) ORF1ab and (c–e) Spike epitopes (No. 24, 79, 12, and 29 in Supplementary Table 1) with both the 4PRP MHC-I molecule and TCR chains.

Additional file 2: Supplementary Table 1.

Rights and permissions

This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit

About this article

Cite this article

Awad, N., Mohamed, R.H., Ghoneim, N.I. et al. Immunoinformatics approach of epitope prediction for SARS-CoV-2. J Genet Eng Biotechnol 20, 60 (2022).

Keywords

  • SARS-CoV-2
  • MHC class I epitopes
  • ORF1ab protein
  • Spike protein
  • Immunoinformatics