Construction, expression, and in vitro assembly of virus-like particles of L1 protein of human papillomavirus type 52 in Escherichia coli BL21 DE3

Background A major discovery in human disease etiology was the recognition that cervical cancer is a consequence of infection by certain mucosatropic types of human papillomavirus (HPV). Because the HPV L1 protein can induce neutralizing antibodies, it is a key target for HPV vaccine development. This study therefore aimed to express and analyze a recombinant HPV subunit protein, L1 of HPV type 52 (L1 HPV 52), in E. coli BL21 DE3. A 1473-bp synthetic L1 HPV 52 gene, codon-optimized by ATUM and supplied in the pD451-MR plasmid, was successfully integrated into the 5643-bp pETSUMO vector. Bioinformatic analyses were also performed to predict B cell epitopes, T cell epitopes, and immunogenicity of the L1HPV52 protein. Results The pETSUMO-L1HPV52 construct was obtained with the correct ligation size, as shown by EcoRI digestion, which yielded fragments of 5953 and 1160 bp for the TA cloning pETSUMO vector and the gene of interest, respectively. The correct orientation of the pETSUMO-L1HPV52 construct was confirmed by PCR with specific primer pairs followed by sequencing, which showed 147 base pairs. SDS-PAGE analysis of L1 HPV 52 confirmed a protein band at ~55 kDa, with a total protein yield of 6.12 mg/L. Transmission electron microscopy showed the formation of L1 VLPs between 30 and 40 nm in size in assembly buffer at pH 5.4. The bioinformatic analyses identified three B cell epitopes (GFPDTSFYNPET, DYLQMASEPY, KEKFSADLDQFP) and four T cell epitopes (YLQMASEPY, PYGDSLFFF, DSLFFFLRR, MFVRHFFNR). Moreover, the immunogenicity analysis showed that, among the T cell epitopes, DSLFFFLRR had the highest affinity value for Indonesian HLAs.
Conclusion The successful formation of L1 HPV52 VLPs, together with the epitope candidates identified in the bioinformatic analyses, suggests that this approach is promising for the future development of an HPV type 52 L1 vaccine in Indonesia.


Background
Human papillomavirus (HPV) causes the most common viral infection of the human reproductive tract. HPVs are small double-stranded DNA viruses that infect epithelium. More than 200 HPV types have been identified and differentiated by their genomic sequences. Around 40 of these types infect mucosal epithelium, and epidemiological studies have found a strong association between them and cervical cancer. HPV type 16 causes approximately 50% of cervical cancers globally, and types 16 and 18 together account for up to 66% of cases. Five further high-risk types (31, 33, 45, 52, and 58) are responsible for 15% of cervical cancers and 11% of all cases associated with HPV infection [1]. In Indonesia, the most prevalent HPV types are HPV 52 (23.2%), 16 (18%), 18 (16.1%), and 39 (11.8%). HPV 52 is thus the most prevalent type in Indonesia, and another study identified it as a more frequent cause of cervical cancer in East Asian countries (Japan, South Korea, Taiwan, and China) than in other parts of the world [2]. Based on these reports, we suggest introducing an HPV 52 prophylactic vaccine in Indonesia to curb the prevailing HPV infections.

The HPV genome contains six genes in the early region (E1, E2, E4, E5, E6, E7) and two (L1 and L2) in the late region. The exterior surface of the papillomavirus virion is composed of pentameric L1 capsomeres and accommodates up to 72 molecules of the minor capsid protein L2, which is only minimally exposed. Because L1 dominates the capsid surface, it mediates the initial attachment to host cells. Moreover, recombinant L1 protein spontaneously self-assembles into a highly immunogenic structure that closely mimics the native surface of HPV but contains no genetic material [3].
Two prophylactic HPV vaccines, both composed of virus-like particles (VLPs) of the L1 capsid protein, have been licensed since 2006/2007: Cervarix®, a bivalent HPV (bHPV) 16/18 vaccine, and Gardasil® (also known as Silgard), a quadrivalent (qHPV) 6/11/16/18 vaccine [3]. A 9-valent recombinant protein subunit HPV vaccine (9vHPV, Gardasil 9), which protects against types 6, 11, 16, 18, 31, 33, 45, 52, and 58, has also been licensed.
The Indonesian government has established a national HPV vaccination program; however, the main obstacle to its implementation is that the vaccine cannot be produced domestically and must be imported. Besides concerns about vaccine supply, importing also requires a large budget. A solution to vaccine availability is therefore needed, namely domestic production. This study focused on producing L1 protein VLPs as a candidate future vaccine against HPV type 52 infection in Indonesia. VLPs can be obtained by producing L1 in various expression systems, including mammalian cells, plants, bacteria, insects, and yeast. Although eukaryotic systems produce highly effective vaccines, they have several drawbacks, including high production and purification costs. In contrast, the bacterial expression system (Escherichia coli) has long been used to produce heterologous recombinant proteins and offers many advantages, such as faster growth, low production cost, ease of gene manipulation, and easy scale-up [4]. Bacterial expression does not yield VLPs directly; instead, the L1 protein is purified and then assembled into VLPs [5,6]. To obtain the highest protein yield, an expression vector that supports efficient, high-level gene expression should be selected and validated.
Formation of inclusion bodies in bacterial hosts poses a major challenge for large-scale production [6]. SUMO (small ubiquitin-related modifier) is a fusion partner that enhances the solubility of the expressed recombinant protein [7]. This study used a codon-optimized L1 HPV 52 gene to achieve higher expression in E. coli BL21 DE3 with the Champion pET SUMO protein expression system, which simplifies purification of the native L1 protein.

Bacterial strains and plasmids
The bacterial strains and plasmids used in this study are shown in Table 1.

Construction of truncated L1 HPV type 52 in E. coli BL21 DE3
Complete HPV 52 genome sequences were retrieved from the National Center for Biotechnology Information (NCBI) database (https://www.ncbi.nlm.nih.gov/) to obtain the template for the HPV 52 L1 protein sequence. All sequences were subjected to multiple alignment analysis, and the HPV 52 L1 protein sequence with 100% similarity was selected as the template (GenBank accession no. APQ44871.1) [8]. The synthetic L1-HPV52 gene was designed as an L1 truncated by 26 amino acids at the N-terminus and codon-optimized by ATUM (DNA 2.0 Gene Design & Synthesis, Newark, California). It was integrated into the inducible plasmid pD451-MR (pD451-MR: 399524) and transformed into the E. coli BL21 DE3 expression host. …, which produce linear gene product. The pETSUMO-L1HPV52 (+) construct was transformed into the expression host E. coli BL21 (DE3) and then sequenced with the SUMO forward (5′-AGATTCTTGTACGACGGTATTAG-3′) and T7 reverse (5′-TAGTTATTGCTCAGCGGTGG-3′) primers at 1st Base Laboratories (Malaysia) to validate successful transformation into the expression host. Sequence analysis was performed with BLAST (https://blast.ncbi.nlm.nih.gov/Blast).
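As a quick sanity check on the two sequencing primers listed above, their GC content and a rough melting temperature can be computed. The sketch below uses the Wallace rule, Tm = 2(A+T) + 4(G+C) °C, which is only a crude estimate for short oligonucleotides; it is an illustrative assumption, not the method used in this study.

```python
# Rough checks for the SUMO forward and T7 reverse sequencing primers.
# Wallace rule: Tm ~ 2*(A+T) + 4*(G+C) in °C; a crude approximation, used here
# only for illustration.

PRIMERS = {
    "SUMO_forward": "AGATTCTTGTACGACGGTATTAG",
    "T7_reverse": "TAGTTATTGCTCAGCGGTGG",
}


def gc_fraction(seq: str) -> float:
    """Fraction of G or C bases in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> int:
    """Approximate melting temperature (°C) by the Wallace rule."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(f"{name}: {len(seq)} nt, GC = {gc_fraction(seq):.1%}, "
              f"Wallace Tm ~ {wallace_tm(seq)} °C")
```

For these two primers the rule gives about 64 °C and 60 °C; nearest-neighbor models, as used by most primer design tools, give more reliable values.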

Expression and purification of recombinant His-SUMO-L1-HPV52 protein
Recombinant E. coli BL21 DE3 harboring pETSUMO-L1HPV52 was grown in liquid Luria-Bertani (LB) medium supplemented with 100 μg/mL kanamycin. A 1% inoculum of the pre-culture was transferred into fresh LB medium and incubated at 37 °C for about 3 h until the OD600 reached ~0.5-0.6, corresponding to the logarithmic phase of E. coli BL21 DE3. Expression was induced with 0.5 mM IPTG supplemented with 1% (v/v) glucose, and the culture was incubated at 20 °C for about 5 h until the OD600 reached ~1.0, corresponding to the stationary phase of E. coli BL21 DE3 [7,9]. Cells were harvested by centrifugation at 6000 rpm for 10 min. The cell pellets were resuspended in 0.5% lysis buffer containing 50 mM potassium phosphate pH 7.8, 400 mM NaCl, 100 mM KCl, 10% glycerol, 0.1% Triton X-100, 5 mM imidazole, 100 μg/mL lysozyme, and 1 mM PMSF. The lysate was centrifuged at 12,000 rpm for 10 min, and the supernatant was collected as the crude protein sample and stored at -20 °C for further analysis. The His6-SUMO-tagged L1-HPV52 fusion protein was purified with Ni-NTA agarose (Thermo) under native conditions [10].
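The additions in the culture step follow the dilution relation C1V1 = C2V2, and the centrifugation speeds, reported in rpm, correspond to a relative centrifugal force that depends on the rotor radius (RCF ≈ 1.118 × 10⁻⁵ × r[cm] × rpm²). The sketch below assumes a hypothetical 100 mL culture, a 1 M IPTG stock, a 50 mg/mL kanamycin stock, and a 10 cm rotor radius; none of these values are reported in the study.

```python
# Hypothetical bench calculator for the culture step.
# ASSUMPTIONS (not from the study): 100 mL culture, 1 M IPTG stock,
# 50 mg/mL kanamycin stock, 10 cm rotor radius.


def stock_volume(stock_conc: float, final_conc: float, final_volume: float) -> float:
    """Volume of stock to add so that C1*V1 = C2*V2 (both concentrations in the same units)."""
    if final_conc > stock_conc:
        raise ValueError("final concentration cannot exceed stock concentration")
    return final_conc * final_volume / stock_conc


def rpm_to_rcf(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (x g) from rotor speed and radius in cm."""
    return 1.118e-5 * radius_cm * rpm ** 2


if __name__ == "__main__":
    culture_ml = 100.0                       # hypothetical culture volume
    inoculum_ml = 0.01 * culture_ml          # 1% pre-culture inoculum
    kan_ml = stock_volume(50_000.0, 100.0, culture_ml)  # ug/mL units: 50 mg/mL -> 100 ug/mL
    iptg_ml = stock_volume(1000.0, 0.5, culture_ml)     # mM units: 1 M -> 0.5 mM
    print(f"inoculum {inoculum_ml:.2f} mL, kanamycin {kan_ml * 1000:.0f} uL, "
          f"IPTG {iptg_ml * 1000:.0f} uL")
    for rpm in (6000, 12000):
        print(f"{rpm} rpm at r = 10 cm (assumed) ~ {rpm_to_rcf(rpm, 10.0):.0f} x g")
```

Because the g-force scales with rotor radius, converting rpm with the actual rotor makes the protocol reproducible on other centrifuges.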

SUMO cleavage and assembly of L1-HPV52
To obtain tag-free L1-HPV52, the fusion tag was cleaved at the SUMO cleavage site with SUMO protease. Ten units of SUMO protease were used to cleave the SUMO fusion tag from the recombinant His-SUMO-L1 HPV52 in 10x protease buffer without NaCl and 1x Native Binding Buffer (-salt), in a total volume of 1.5 mL, with the resin rotating overnight (about 16 h) at 4 °C. The supernatant was collected in 0.5 mL fractions, and purified L1-HPV52 (100 μg/mL) was assembled in buffer at pH 5.4 (1 M NaCl, 40 mM sodium acetate) for 30 min at 25 °C [9]. Self-assembled L1-HPV52 VLPs were observed with a transmission electron microscope (TEM, JEOL JEM 1400).
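The acetate assembly buffer can be planned with the Henderson-Hasselbalch equation, pH = pKa + log10([A⁻]/[HA]). The sketch below uses the textbook pKa of acetic acid (4.76 at 25 °C); the 1 M NaCl in the buffer shifts the effective pKa, so the computed split is an assumption-laden starting point before final pH adjustment, not the recipe used in the study.

```python
# Henderson-Hasselbalch sketch for a 40 mM sodium acetate buffer at pH 5.4.
# ASSUMPTION: pKa = 4.76 (acetic acid, 25 °C, low ionic strength); high salt shifts it.

ACETIC_ACID_PKA = 4.76


def acetate_split(total_mm: float, ph: float, pka: float = ACETIC_ACID_PKA) -> tuple[float, float]:
    """Return (acetic acid, acetate) concentrations in mM for a target pH."""
    ratio = 10 ** (ph - pka)       # [A-]/[HA]
    acid = total_mm / (1 + ratio)  # protonated form
    base = total_mm - acid         # deprotonated form (sodium acetate)
    return acid, base


if __name__ == "__main__":
    acid, base = acetate_split(40.0, 5.4)
    print(f"acetic acid ~ {acid:.1f} mM, sodium acetate ~ {base:.1f} mM")
```

With these assumptions roughly 7.5 mM acetic acid and 32.5 mM sodium acetate give pH 5.4; the final pH should still be checked with a meter after adding NaCl.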

Protein characterization
Purified His-SUMO-L1HPV52 protein was characterized by 10% SDS-PAGE [34] and by Western blot at 110 V for about 90 min [11]. Total protein concentration was determined with a bicinchoninic acid (BCA) assay kit (Thermo Fisher Scientific, USA) [12].
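A BCA measurement is read against a standard curve of known protein concentrations. The sketch below fits a straight line by ordinary least squares and back-calculates an unknown; the linear model and the standard values are hypothetical illustrations, and kit instructions may recommend a different curve fit over the full range.

```python
# Sketch: BCA standard curve (absorbance vs. concentration) and interpolation.
# The standards and readings below are HYPOTHETICAL; linear fitting is an assumption.


def fit_line(x: list[float], y: list[float]) -> tuple[float, float]:
    """Ordinary least-squares slope and intercept of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def concentration(absorbance: float, slope: float, intercept: float, dilution: float = 1.0) -> float:
    """Back-calculate sample concentration (same units as the standards), times any dilution."""
    return (absorbance - intercept) / slope * dilution


if __name__ == "__main__":
    standards = [0.0, 125.0, 250.0, 500.0, 1000.0]  # hypothetical BSA standards, ug/mL
    a562 = [0.100, 0.225, 0.350, 0.600, 1.100]      # hypothetical absorbance readings
    slope, intercept = fit_line(standards, a562)
    print(f"sample ~ {concentration(0.45, slope, intercept):.0f} ug/mL")
```

Since 1 μg/mL equals 1 mg/L, the interpolated value is already in mg per liter of sample; converting to yield per liter of culture additionally requires the lysate volume and culture volume.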

Bioinformatic studies, B cell epitope prediction, T cell epitope prediction, and immunogenicity analysis
The L1HPV52 sequencing results from both pD451-MR_L1HPV52 and pETSUMO_L1HPV52 were checked and translated using BioEdit v7.2 and the ExPASy DNA translate tool. The DNA and amino acid (aa) sequences were then compared using the Basic Local Alignment Search Tool (BLAST) (https://www.ncbi.nlm.nih.gov/).
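Translation of a sequenced insert into its protein follows the standard genetic code, read codon by codon in a chosen frame. The sketch below shows this for one forward frame using NCBI translation table 1; it is an illustration of what a tool such as ExPASy Translate does, not the tool used here, and the short example ORF is hypothetical.

```python
# Minimal DNA-to-protein translation with the standard genetic code (NCBI table 1).
# Illustrative sketch for a single forward reading frame.

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))  # '*' marks stop codons


def translate(dna: str, frame: int = 0, to_stop: bool = True) -> str:
    """Translate one forward reading frame, optionally stopping at the first stop codon."""
    seq = dna.upper().replace("U", "T")
    protein = []
    for i in range(frame, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")  # 'X' for ambiguous codons (e.g. containing N)
        if aa == "*" and to_stop:
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    # Hypothetical ORF ending in tandem TAA and TGA stop codons.
    orf = "ATGAGCGATTAATGA"
    print(translate(orf))
    print(translate(orf, to_stop=False))
```

Stopping at the first stop codon mirrors how the open reading frame of the insert is defined; translating with `to_stop=False` makes tandem stop codons visible.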
B cell epitopes were predicted with the IEDB ElliPro tool (http://tools.iedb.org/ellipro/), using the linear L1HPV52 aa sequence translated with ExPASy as input. The positions of the B cell epitopes in the monomer, pentamer, and VLP forms of L1HPV52 were predicted in Swiss-PdbViewer (SPDBv) v4.1.
The immunogenicity server (http://tools.immuneepitope.org/immunogenicity/) was used for T cell epitope and immunogenicity prediction. In this study, we used several Indonesian HLA class I and class II alleles. Swiss-PdbViewer (SPDBv) v4.1 was also used to predict the positions of the T cell epitopes in the monomer, pentamer, and VLP forms of L1HPV52.
Along with plasmid sequencing, the pD451-MR_L1-HPV52 plasmid was isolated to confirm the insertion in pD451-MR (Fig. 2B, C). Figure 2B shows the uncut pD451-MR_L1-HPV52 plasmid. Two bands appear rather than a single band at the expected position because circular plasmid DNA can exist in nicked and supercoiled forms, which migrate at different speeds; supercoiled plasmids migrate faster than nicked plasmids in agarose gel [25]. pD451-MR_L1-HPV52 was then digested with EcoRI and XbaI (Fig. 2C).
EcoRI cuts pD451-MR_L1-HPV52 at position 2543 and XbaI at position 1411, yielding two bands of 4507 bp and 933 bp. These results were used to confirm the design of specific primers for amplifying L1-HPV52. The insert itself was amplified with specific primers at 53 °C, giving a single band just below the 1500 bp marker of the DNA ladder. The PCR product matched the expected L1-HPV52 size of 1473 bp (Fig. 3), confirming that the L1-HPV52 gene was integrated with the correct size and orientation.

The pETSUMO-L1HPV52 construct was transformed into E. coli BL21 DE3 by heat shock [13] and confirmed by colony PCR. Six colonies (2, 4, 5, 8, 9, and 10) were positive, showing the 1473 bp band expected for the pETSUMO-L1HPV52 construct (Fig. 4). Positive colonies were further checked by digesting pETSUMO-L1HPV52 to verify the presence of the ligated insert and determine its orientation in the TA cloning pETSUMO vector. Plasmids from the confirmed colonies were digested with XbaI (NEB), which cuts the plasmid at a single site to generate linearized DNA. Only colony 9 was confirmed to carry the insert in the correct orientation, showing a 7113 bp band after XbaI digestion (Fig. 5).
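Expected band sizes from a restriction digest of a circular plasmid follow directly from the cut positions: n distinct sites give n fragments whose lengths sum to the plasmid length, and a single site simply linearizes the plasmid. The sketch below computes this; the cut positions in the example are hypothetical, and only the 7113 bp total comes from the text (note that the EcoRI fragments of 5953 and 1160 bp also sum to 7113 bp).

```python
# Expected fragment sizes after cutting a circular plasmid at given positions (bp).
# Cut positions used in the example are HYPOTHETICAL.


def circular_fragments(length: int, cuts: list[int]) -> list[int]:
    """Fragment lengths (largest first) after cutting a circular molecule at `cuts`."""
    sites = sorted(set(c % length for c in cuts))
    if not sites:
        raise ValueError("at least one cut site is required")
    gaps = [b - a for a, b in zip(sites, sites[1:])]
    gaps.append(length - sites[-1] + sites[0])  # fragment spanning the position-0 junction
    return sorted(gaps, reverse=True)           # one site -> [length], i.e. linearized


if __name__ == "__main__":
    print(circular_fragments(7113, [2000]))       # single site, e.g. XbaI
    print(circular_fragments(7113, [100, 1260]))  # two hypothetical EcoRI sites
```

Checking that reported band sizes add up to the construct length is a quick way to catch mislabeled or mis-sized bands.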

Expression, purification, and characterization of recombinant L1-HPV52 protein
SDS-PAGE was performed to confirm the size of the recombinant protein expressed from the pETSUMO-L1HPV52 vector in the E. coli BL21 DE3 host. Figure 6 shows the bioinformatic analysis of the three-dimensional structures of the His-SUMO-L1HPV52 and L1-HPV52 proteins. SDS-PAGE (Fig. 6A) showed a protein band at ~68 kDa that distinguished induced from uninduced samples, indicating that the protein was expressed. Immunoblotting confirmed expression of the target protein in the recombinant bacteria (Fig. 6B). The purified His-SUMO-L1HPV52 showed two bands, which we attribute to insufficient resin washing (Fig. 6C); consequently, other protein bands remained visible in both the crude and eluate samples. Total protein at each purification step was measured by BCA assay (Table 2). The protein concentration after Ni-NTA purification was low, presumably because a large amount of target protein remained bound to the resin. Only six eluate fractions were collected, following the original purification manual, and they were analyzed directly. Because the eluted protein was unstable, the elution step was omitted for further characterization of the purified L1-HPV52.
To obtain pure L1-HPV52, the polyhistidine-tagged SUMO fusion partner was removed from the purified recombinant protein with SUMO protease. SDS-PAGE and immunoblot analysis showed a clear band at 55 kDa in cleaved-product fractions 1-4, whereas no band was detected in fractions 5-6, indicating that these fractions lacked cleaved product (Fig. 7). For further validation, we compared our protein with a commercial L1 HPV52 protein (Creative Diagnostic-DAGF-234) (Fig. 8); our protein showed the same pattern in the immunoblot assay.

Virus-like particles assembly of L1-HPV52 protein
In vitro VLP assembly was conducted under defined and controllable conditions. Soluble HPV capsid protein is generally favorable for assembly. Tag-free L1-HPV52 was successfully purified, characterized, and assembled under acidic conditions (pH 5.4). Transmission electron micrographs showed L1 VLPs of homogeneous size, 30-40 nm in diameter, with a mean of 26 nm (Fig. 9).

Bioinformatic studies, assessment of B cell epitope prediction, T cell epitope prediction, and immunogenicity analysis
The bioinformatic results are shown in Figs. 10 and 11. Figure 10 shows the DNA and aa sequences of L1HPV52 cloned and expressed using pD451-MR_L1 and pETSUMO in E. coli BL21 DE3. Figure 11 shows the BLASTp results at the aa level. The B cell epitopes predicted with IEDB tools are listed in Table 3. There are 17 candidates, with average, minimum, and maximum values of 0.487, 0.218, and 0.710, respectively. Epitopes 1, 2, 3, 10, 12, and 16 lie in conserved regions (no mutations) when aligned with other L1HPV52 peptide sequences from NCBI. Of the 17 B cell epitopes, only three are conserved, of moderate length, and located on the outer surface of the L1HPV52 protein in its monomer, pentamer, and VLP forms. The positions of the B cell epitopes are shown in Fig. 12.
T cell epitope prediction identified six T cell epitopes that can bind Indonesian HLA class I alleles (Fig. 13). Epitope 1 binds HLA-B*15:02, epitopes 2 and 6 bind HLA-A*24:02, and epitopes 3, 4, and 5 bind HLA-A*33:03. Furthermore, T cell epitope 1 is also predicted as a B cell epitope (No. 10 in Table 3).
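HLA class I binding predictors typically score overlapping peptide windows of a fixed length, often 9 residues, across the whole protein; all the class I epitopes reported here are 9-mers. The sketch below only enumerates those candidate windows and does not predict binding. The 12-residue fragment is a hypothetical join of two reported epitopes (PYGDSLFFF and DSLFFFLRR), which overlap on DSLFFF; it has not been checked against the full L1HPV52 sequence.

```python
# Enumerate overlapping k-mers (candidate class I epitopes) from a protein sequence.
# This does NOT predict HLA binding; it only generates the windows a predictor would score.


def kmers(protein: str, k: int = 9) -> list[tuple[int, str]]:
    """All overlapping k-residue windows with their 1-based start positions."""
    return [(i + 1, protein[i:i + k]) for i in range(len(protein) - k + 1)]


if __name__ == "__main__":
    fragment = "PYGDSLFFFLRR"  # hypothetical join of two overlapping reported epitopes
    for start, peptide in kmers(fragment):
        print(start, peptide)
```

A protein of length n yields n - 8 nine-mers, so even the full 490-aa L1 gives only 482 windows per allele to score.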
T cell epitope prediction focused on binding to the Indonesian HLA class II allele HLA-DRB1*12:02 identified a T cell epitope spanning aa 234 to 249 (Fig. 14). Figure 15 shows that peptide DSLFFFLRR (No. 3 in Fig. 13) has the highest affinity value. A higher score indicates a greater likelihood of eliciting an immune response.

Discussion
Several strategies for combating HPV-associated cancers have been reported, but various difficulties must still be solved to express L1 protein successfully at the levels required. L1 expression in E. coli is reportedly low, and the protein tends to form inclusion bodies containing misfolded protein. Evidence suggests truncating the N-terminus to exclude strong inhibitory secondary-structure elements [17]. We therefore developed a recombinant L1-HPV52 construct with a 26-aa N-terminal deletion using the pET SUMO expression system. Cloning L1-HPV52 into an appropriate vector is crucial for efficient, high-yield protein production. The pETSUMO system uses TA cloning, which is fast and efficient [19]. Although bacterial expression often generates inclusion bodies, pETSUMO addresses this problem by enhancing the solubility of partially insoluble proteins.
In cells, SUMO is covalently conjugated to other proteins through an isopeptide (amide) bond between its C-terminal carboxyl group and the amino group of a lysine side chain [20]. Additionally, the expression system allows native protein to be produced by removing the polyhistidine-tagged SUMO fusion partner, which could otherwise affect the native conformation of the target protein [21].
In this study, the tagged recombinant protein remained bound to the resin when eluted with 250 mM imidazole. We hypothesize that the polyhistidine tag promotes oligomerization of the recombinant protein, so that a higher imidazole concentration is needed for elution [7]. Because the wash buffers contain low imidazole, the washing steps should be optimized to retain the target protein. Buffers also play a significant role in maintaining protein stability, and most proteins have been found to be more stable above their isoelectric point (pI). The pI of the His-SUMO-L1HPV52 protein was 6.8, and the elution buffer used during purification was pH 8.0. The purified protein did not remain stable in this buffer for long because it degraded; consequently, the antibody could not detect it by Western blot (data not shown). This degradation was caused by the high imidazole concentration in the elution buffer. Imidazole has been proposed to promote degradation through a catalytic reaction involving histidine residues; therefore, purified His-tagged proteins should be dialyzed against a buffer that maintains protein stability during storage [14].
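The relation between pI and buffer pH can be made concrete by computing a protein's net charge from Henderson-Hasselbalch terms for each ionizable group and locating the pH of zero charge by bisection; above the pI, as in a pH 8.0 buffer for a protein with pI 6.8, the net charge is negative. The sketch below uses an EMBOSS-style pKa set, which is an assumption; tools such as ExPASy ProtParam use different pKa values, so results will not match them exactly.

```python
# Net charge vs. pH and isoelectric point by bisection.
# ASSUMPTION: EMBOSS-style pKa values; other tools use different sets.

POSITIVE_PKA = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
NEGATIVE_PKA = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def net_charge(seq: str, ph: float) -> float:
    """Approximate net charge of a linear peptide at the given pH."""
    seq = seq.upper()
    counts = {aa: seq.count(aa) for aa in "KRHDECY"}
    counts["Nterm"] = counts["Cterm"] = 1
    positive = sum(counts[g] / (1 + 10 ** (ph - pka)) for g, pka in POSITIVE_PKA.items())
    negative = sum(counts[g] / (1 + 10 ** (pka - ph)) for g, pka in NEGATIVE_PKA.items())
    return positive - negative


def isoelectric_point(seq: str, lo: float = 0.0, hi: float = 14.0, tol: float = 1e-4) -> float:
    """pH of zero net charge; bisection works because net charge falls monotonically with pH."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(seq, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    peptide = "DSLFFFLRR"  # a reported T cell epitope, used only as sample input
    print(f"pI ~ {isoelectric_point(peptide):.2f}, "
          f"net charge at pH 8.0 ~ {net_charge(peptide, 8.0):+.2f}")
```

The same two functions applied to the full fusion sequence would show how far the pH 8.0 elution buffer sits from the protein's pI under a given pKa set.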
In addition, some His-tagged proteins are least stable in solutions with pH values close to or below their calculated pI [22]. To overcome this challenge, SUMO cleavage was performed directly on the resin-bound protein without elution. This approach yielded a soluble protein that migrated as a single 55 kDa band, and we considered it the optimal procedure for obtaining the highest yield and concentration. The cleaved product was also comparable to the commercial L1, as confirmed by immunoblot analysis. The SUMO fusion prevents protein aggregation even after cleavage because of its chaperone-like function in assisting proper folding. Growth at 30 °C works well for expressing many SUMO fusion proteins; however, if the target protein is insoluble at this temperature, lower temperatures (down to 15 °C) should be explored [15]. Consistent with these findings, expression at 20 °C in our study gave good results both before and after fusion cleavage.
To keep the native protein soluble after cleavage and during assembly into VLPs, L1-HPV52 was incubated at low pH. pH plays an important role in in vitro assembly because it affects capsid-protein charge. In addition, low temperatures are normally favorable because they reduce protein aggregation and chemical degradation. L1 VLPs are composed of 72 pentamers. Previous research on HPV L1 found that deleting ten N-terminal residues led to assembly of 12-pentamer particles rather than 72-pentamer particles [17]. Studies on norovirus-like particles showed that deleting 34 or 98 amino acids from GII.4 Sydney VP1 yielded no particles detectable by electron microscopy, whereas deleting 26 or 38 amino acids still allowed VLP assembly [18]. Our findings are consistent with these reports: the L1 protein truncated by 26 amino acids successfully assembled into VLPs.
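The subunit counts behind the two particle types can be worked out directly. Assuming every subunit is the truncated L1 with the predicted mass of 54,846.63 Da reported for this construct, and ignoring L2 (absent from L1-only VLPs), a 72-pentamer VLP and a 12-pentamer particle contain:

$$
N_{72} = 72 \times 5 = 360 \ \text{L1}, \qquad M_{72} \approx 360 \times 54.85\ \text{kDa} \approx 1.97 \times 10^{4}\ \text{kDa} \approx 19.7\ \text{MDa}
$$

$$
N_{12} = 12 \times 5 = 60 \ \text{L1}, \qquad M_{12} \approx 60 \times 54.85\ \text{kDa} \approx 3.29 \times 10^{3}\ \text{kDa} \approx 3.3\ \text{MDa}
$$

The sixfold difference in subunit number is why the smaller 12-pentamer particles are much smaller in diameter than complete 72-pentamer VLPs.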
The size and homogeneity of the observed particles are known to depend on N-terminal truncation [15]. In our study, purified HPV52 L1 VLPs showed variable particle sizes with a mean of ~26 nm, and the final VLP yield was approximately 6 mg/L. Differences in VLP size among HPV types are caused by variation in the N-terminal amino acid sequences. Evidence suggests that the first 129 nucleotides at the 5′ end contain a strong RNA inhibitory element, and that at least 10 and 30 residues, respectively, were deleted from the N- and C-termini [17]. Truncating ten N-terminal residues generated small HPV11/16 L1 VLPs ~30 nm in diameter [23], whereas a 15-amino-acid truncation generated HPV52 L1 VLPs ~55 nm in diameter [24].
Bioinformatic analysis of the sequencing results showed that the L1HPV52 gene inserted into pETSUMO has a total length of 1476 bp, with two stop codons (TAA and TGA) at its 3′ end. It encodes the HPV52 major capsid protein (L1), 490 aa in length. The molecular weight predicted with BioEdit v7.2 is 54,846.63 Da (approximately 54.8 kDa). The protein is smaller than native L1HPV52 because 117 bp (39 aa) at the 5′ end of the gene were removed. This partial deletion was based on the work of Wei M and colleagues in 2018 [24], who found that removing 15 aa from the N-terminus of HPV52 L1 increased its soluble expression in E. coli and its in vitro self-assembly.
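Both numbers in this paragraph can be reproduced from first principles: the residue count is the coding length minus the stop codons, divided by three ((1476 - 2 × 3) / 3 = 490), and the average molecular weight is the sum of average residue masses plus one water molecule. The sketch below shows both calculations; the residue masses are standard average values, and individual tools such as BioEdit may differ slightly in the last decimals.

```python
# Average molecular weight of a polypeptide and residue count of an insert.
# Residue masses: standard average residue masses (Da); tools may differ slightly.

AVG_RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01528  # added once for the free N- and C-termini


def average_mw(protein: str) -> float:
    """Average molecular weight (Da) of an unmodified linear polypeptide."""
    return sum(AVG_RESIDUE_MASS[aa] for aa in protein.upper()) + WATER


def coding_length_aa(insert_bp: int, stop_codons: int) -> int:
    """Residues encoded by an in-frame insert that ends in `stop_codons` stop codons."""
    coding_bp = insert_bp - 3 * stop_codons
    if coding_bp % 3:
        raise ValueError("coding region is not a whole number of codons")
    return coding_bp // 3


if __name__ == "__main__":
    print(coding_length_aa(1476, 2))            # residues in the pETSUMO insert
    print(117 // 3)                             # residues removed from the 5' end
    print(f"{average_mw('DSLFFFLRR'):.2f} Da")  # sample peptide from the epitope analysis
```

Running `average_mw` on the full translated 490-aa sequence gives the construct's predicted molecular weight in the same way.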
B cells play an important role in HPV-associated cancer immunotherapy and in the response to cervical epithelial neoplasms and invasive cancers caused by HPV [26]. The IEDB results show 17 B cell epitopes of L1HPV52, ranging in length from 1 to 25 aa. Five of the 17 candidates (numbers 1, 2, 10, 12, and 16) were selected because their lengths are neither too long nor too short. Other B cell epitope studies, such as those on HPV16 [27] and HPV33 and 58 [28], likewise selected epitopes of moderate length. In addition, these epitopes have no mutations (conserved regions) when aligned with 80 full-length L1HPV52 coding sequences in NCBI GenBank (data not shown).
Of these five epitopes, three are located on the outer surface of L1HPV52 in all its forms (monomer, pentamer, and VLP). These surface epitopes were chosen because B cells can recognize only epitopes exposed on the outer surface of an antigen. Engaging B cells is important because of their role in phagocytosis when an antigen enters the body. If B cells recognize these three epitopes (numbers 3, 10, and 16), they will engulf the L1HPV52 antigen and degrade it into smaller peptides. Antigen phagocytosis by B cells is required for a potent humoral response [29].
T cell epitopes are presented by human leukocyte antigen (HLA) molecules. The HLA class I region (HLA-A, -B, and -C) contains highly polymorphic genes, and this polymorphism determines which peptides each HLA molecule can bind. The HLA class II region (DP, DN, DM, DO, DQ, DR) is involved in antigen processing and presentation, whereas the class III region contains genes implicated in inflammatory responses, leukocyte maturation, and the complement cascade [30].
HLA molecules bind peptides derived from foreign proteins of pathogens that enter the human body. When an HLA molecule binds a peptide, the HLA-peptide complex is transported to the cell surface, where it is recognized by T cells, triggering an immune response. Because HLA molecules are highly selective and bind only specific peptides, it is important to predict which antigen peptides (T cell epitopes) match a given HLA protein and can therefore trigger an immune response [31].