In silico analysis of a novel causative mutation in Cadherin23 gene identified in an Omani family with hearing loss

Background Hereditary hearing loss is a heterogeneous group of complex disorders with an overall incidence of one in every 500 newborns, presenting in syndromic and non-syndromic forms. Cadherin-related 23 (CDH23) is one of the listed deafness-causing genes. It is expressed in the stereocilia of hair cells and in the photoreceptor cells of the retina. Defective CDH23 has been associated mostly with prelingual severe-to-profound sensorineural hearing loss (SNHL) in either syndromic (USH1D) or non-syndromic (DFNB12) forms. The purpose of this study was to identify causative mutations in an Omani family diagnosed with severe-to-profound sensorineural hearing loss by whole exome sequencing and to analyze the detected variant for pathogenicity using several in silico mutation prediction programs. Results A novel homozygous missense variant, c.A7436C (p.D2479A), in exon 53 of CDH23 was detected in the family, while the control samples were all negative for the detected variant. In silico mutation prediction analysis showed the novel D2479A substitution to be a deleterious and protein-destabilizing mutation at a conserved site on the CDH23 protein. Conclusion In silico mutation prediction analysis might be used as a useful molecular diagnostic tool benefiting both genetic counseling and mutation verification. The aspartic acid-to-alanine missense substitution at position 2479 might be the main disease-causing mutation that damages CDH23 function and could be used as a genetic hearing loss marker for this particular Omani family.


Background
Owing to the high rate of consanguineous marriages, several inherited diseases, including syndromic and non-syndromic deafness, have been diagnosed among the Arab population. A survey conducted in 2016 indicated that up to 49% of Omani marriages were consanguineous [1]. As a result, 70% of hearing loss cases in Oman were reported as possibly inherited forms, and until now, two genes, MYO15A and otoferlin (OTOF), have been reported to be involved in non-syndromic autosomal recessive deafness in Oman [2][3][4].
There are no exact statistical figures for syndromic or non-syndromic hearing loss in Oman. However, worldwide studies revealed that approximately 466 million people (about 5.0% of the world's population) suffer from hearing loss [5]. Earlier studies highlighted that about 30% of all deafness cases are syndromic [6]. Usher syndrome (USH) is one of the syndromic deafness forms, with an estimated prevalence of 1 in 6000 to 1 in 10,000, representing about 6% of total congenital deafness and approximately 50% of hereditary deaf-blind individuals. USH is a genetic disorder characterized by dual sensory impairment, namely sensorineural hearing loss and retinitis pigmentosa, with variable vestibular dysfunction. Clinically, it is categorized into three subclasses: USH1, USH2, and USH3. USH1 is the most severe form, characterized by congenital severe to profound deafness, vestibular dysfunction, and prepubertal onset of visual loss. It accounts for 33 to 44% of USH cases. USH2 is characterized by moderate to severe hearing loss with no vestibular dysfunction. It affects 56 to 67% of all USH patients. USH3 is less severe and is characterized by progressive loss of hearing and vestibular function. It accounts for 1 to 6% of USH cases in the general population; however, in the Finnish and Ashkenazi Jewish populations, it rises to about 40% [7][8][9][10]. So far, 13 genes have been identified as involved in USH development (https://sph.uth.edu/Retnet/sum-dis.htm). Among these genes is cadherin-related 23 (CDH23), which causes Usher syndrome type 1D (USH1D) [11,12]. Studies revealed that a defective CDH23 gene plays an important role in the development of Usher syndrome (OMIM #601067), where it accounts for up to 32% of USH1 cases [13]. More than 350 associated mutations have been reported as homozygous nonsense, frame-shift, splice-site, or missense mutations [14,15].
Defective CDH23 was also detected in autosomal recessive non-syndromic hearing loss (DFNB12; OMIM #601386), where more than 24 associated mutations have been reported as missense mutations [16][17][18][19]. In the cell membrane, CDH23 interacts with protocadherin 15 (PCDH15) to establish stereocilia organization and hair bundle formation, which reflects its importance in normal inner-ear mechanotransduction [20].
Next generation sequencing (NGS) has greatly advanced genomic DNA sequencing. A whole exome or a gene panel can be sequenced rapidly, and genomic variants can be detected in a short period. However, Sanger sequencing remains a useful technique for sequencing short DNA fragments and for confirming NGS findings.
In this study, we genetically analyzed an Omani family diagnosed clinically with severe to profound hearing loss. Mutation detection was performed by Illumina HiSeq2000 platform (Illumina Inc., San Diego, CA, USA) NGS technique. DNA of an affected family member was sequenced to identify the family-specific mutated gene loci. The Sanger sequencing technique (ABI 3130 xl) was then applied for the whole family and control samples to confirm the NGS findings. A homozygous missense mutation in exon 53 of the cadherin-related 23 gene (CDH23) was detected in all affected members but was absent in the normal family members and controls. Subsequently, in silico genetic testing was used to verify the pathogenicity of the identified mutation.

Methods
This study was conducted by the Department of Biochemistry, College of Medicine and Health Sciences, Sultan Qaboos University, and the Department of ENT, Al Nahdha Hospital, Ministry of Health, Oman, with collaboration from the Medical Genetics Unit, Polyclinic Sant'Orsola-Malpighi, Bologna, Italy.
Clinical examination: Four affected members, two males and two females, from one Omani family of consanguineous marriage (degree of parental relatedness, first cousins) were enrolled in this study.
Clinical history and audiological evaluation were performed at the ENT department, Al Nahdha Hospital. Clinical examination was conducted using standard pure tone audiometry (PTA), otoacoustic emission, and acoustic immittance tests. Blood samples from patients and their close relatives were collected in EDTA tubes. Samples from 130 male and female individuals without any hearing or visual disorders were used as normal controls.
DNA extraction and sequencing: A Qiagen kit (Qiagen, Hilden, Germany) was used to extract genomic DNA from the peripheral blood of all collected samples. DNA from an affected member was analyzed by NGS on the Illumina HiSeq2000 platform (Illumina Inc., San Diego, CA, USA). Sanger sequencing (ABI PRISM Big-Dye terminator cycle sequencing premix kit; PE Applied Biosystems, Austin, TX, USA) was used to sequence 442 base pairs including the variant site to confirm the NGS finding. The rest of the family members and 130 controls were also tested for the detected variant. Polymerase chain reaction (PCR) forward (5′TCAGTGTCAAATCTCCAGAG3′) and reverse (5′TTGGCAAAGATTTCTCCCAG3′) primers were designed to amplify and confirm the NGS-detected variant.
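As a quick sanity check on the primer pair given above, the following minimal sketch computes the length, GC content, and an approximate melting temperature for each primer. The Wallace rule (Tm = 2(A+T) + 4(G+C)) is a rule of thumb for short oligonucleotides and is an assumption made here for illustration; the study does not report how the primers were evaluated, and the function names are hypothetical.

```python
# Primer sequences as given in the Methods (5'->3').
FORWARD = "TCAGTGTCAAATCTCCAGAG"
REVERSE = "TTGGCAAAGATTTCTCCCAG"


def gc_content(seq: str) -> float:
    """Return the fraction of G and C bases in the sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> int:
    """Approximate Tm in degrees C using the Wallace rule (assumption):
    2 * (A + T) + 4 * (G + C). Intended for short oligos only."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def primer_summary(seq: str) -> dict:
    """Collect length, GC fraction, and approximate Tm for one primer."""
    return {"length": len(seq), "gc": gc_content(seq), "tm": wallace_tm(seq)}


if __name__ == "__main__":
    for name, seq in (("forward", FORWARD), ("reverse", REVERSE)):
        print(name, primer_summary(seq))
```

Under this rule of thumb, both primers are 20 nucleotides long with 45% GC content and an approximate Tm of 58 °C, so the pair is balanced.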

DNA sequencing
The genetic abnormality in the affected family members diagnosed with hearing loss was detected by whole exome next generation sequencing. A novel homozygous missense variant, g.A71800709C, c.A7436C (p.D2479A), replacing the negatively charged aspartic acid residue with the nonpolar aliphatic amino acid alanine at position 2479 in exon 53 of the CDH23 gene, was confirmed and verified by Sanger sequencing (Fig. 1). The CDH23 gene, located on chromosome 10, contains 70 exons as illustrated in the Ensembl transcript CDH23 ENST00000224721.12 (ENSG00000107736; Pfam: PF00028; UniProtKB: A0A0A0MQS6). Figure 2 illustrates the normal cDNA of the CDH23 transcript as obtained from the Ensembl genome browser.
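The cDNA and protein descriptions of the variant can be cross-checked with simple arithmetic: coding position 7436 falls at the second base of codon 2479. The minimal sketch below assumes the standard genetic code and uses hypothetical function names. Because the reference codon is not stated in the text, both aspartic acid codons (GAT and GAC) are tried.

```python
# Subset of the standard genetic code needed for this check.
CODON_TABLE = {
    "GAT": "D", "GAC": "D",                           # aspartic acid
    "GCT": "A", "GCC": "A", "GCA": "A", "GCG": "A",   # alanine
}


def codon_coordinates(cdna_pos: int) -> tuple:
    """Map a 1-based coding position to (codon number, base within codon)."""
    codon_number = (cdna_pos - 1) // 3 + 1
    base_in_codon = (cdna_pos - 1) % 3 + 1
    return codon_number, base_in_codon


def substitute(codon: str, base_in_codon: int, new_base: str) -> str:
    """Return the codon with one base (1-based index) replaced."""
    i = base_in_codon - 1
    return codon[:i] + new_base + codon[i + 1:]


if __name__ == "__main__":
    codon_number, base = codon_coordinates(7436)
    print("codon", codon_number, "base", base)
    for ref in ("GAT", "GAC"):  # either possible Asp codon (assumption)
        alt = substitute(ref, base, "C")
        print(ref, CODON_TABLE[ref], "->", alt, CODON_TABLE[alt])
```

In both cases an A-to-C change at the second codon position converts an aspartic acid codon into an alanine codon, which is consistent with c.A7436C producing p.D2479A.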
The CDH23 transcript (A0A0A0MQS6) was selected for further analysis to identify functional protein domains using the online SMART program. The program detected one signal peptide, 26 cadherin repeats (CA), also known as extracellular cadherin (EC) domains, one transmembrane region, and one low complexity region. The variant was found on domain 23 of the 26 CA (Fig. 2).

In silico mutation analysis
The detected missense point variant was evaluated for its pathogenicity using different mutation prediction programs (Table 1) and was considered to be deleterious and damaging.
The impact of the variant on protein stability was studied to evaluate its influence on protein folding. The unfolding Gibbs free energy change (DDG or ΔΔG) was calculated using the MUpro, I-Mutant 2.0, and DUET online tools. Models of the native and mutant proteins were superimposed to predict the level of similarity between the two protein structures using the template modeling score (TM-score) and root-mean-square deviation (RMSD) online tools (Table 2).
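To make the two similarity measures concrete, the following minimal sketch computes RMSD and a TM-score-style value for two sets of paired alpha-carbon coordinates that are assumed to be already superimposed. It is not the online software used in this study: real TM-score programs also search for the optimal superposition, which is omitted here. The distance scale d0 = 1.24(L - 15)^(1/3) - 1.8 follows the published TM-score definition, the 0.5 Å floor on d0 is an assumption borrowed from common implementations, and the toy coordinates are invented for illustration.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between paired 3D points (no fitting)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


def tm_score(coords_a, coords_b):
    """TM-score-style value for a fixed superposition of paired residues."""
    length = len(coords_a)
    if length != len(coords_b) or length <= 15:
        raise ValueError("need equal-length lists with more than 15 residues")
    d0 = 1.24 * (length - 15) ** (1.0 / 3.0) - 1.8
    d0 = max(d0, 0.5)  # floor on d0 (assumption, see lead-in)
    total = sum(1.0 / (1.0 + (math.dist(a, b) / d0) ** 2)
                for a, b in zip(coords_a, coords_b))
    return total / length


if __name__ == "__main__":
    # Toy coordinates (hypothetical): a straight chain and a copy shifted by 1 A.
    native = [(float(i), 0.0, 0.0) for i in range(30)]
    shifted = [(x, y + 1.0, z) for x, y, z in native]
    print("RMSD:", rmsd(native, shifted))
    print("TM-score:", tm_score(native, shifted))
```

Identical structures give an RMSD of 0 and a TM-score of 1; larger deviations raise the RMSD and lower the TM-score toward 0.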
The domain of interest was further analyzed for secondary structure prediction. Polyview-2D was used to predict the possible effects of the detected variant on the conformation of domain 23. The impact of the amino acid exchange on domain structure was evaluated by comparing the predicted secondary structures of the wild-type and mutant sequences. The mutated domain structure was predicted to consist of 292 coils, 218 strands, and 28 helices compared to 302 coils, 215 strands, and 21 helices in the wild type (Fig. 3).
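The reported counts can be compared directly. The short sketch below tabulates the wild-type and mutant counts quoted above, their differences, and the fraction in each category. It assumes the three categories are exhaustive and that the counts are directly comparable between the two predictions; the names are illustrative only.

```python
# Counts as reported in the text for the Polyview-2D predictions.
WILD_TYPE = {"coil": 302, "strand": 215, "helix": 21}
MUTANT = {"coil": 292, "strand": 218, "helix": 28}


def differences(wild: dict, mutant: dict) -> dict:
    """Mutant minus wild-type count for each category."""
    return {key: mutant[key] - wild[key] for key in wild}


def fractions(counts: dict) -> dict:
    """Share of each category relative to the total count."""
    total = sum(counts.values())
    return {key: value / total for key, value in counts.items()}


if __name__ == "__main__":
    print("totals:", sum(WILD_TYPE.values()), sum(MUTANT.values()))
    print("differences:", differences(WILD_TYPE, MUTANT))
    print("wild-type fractions:", fractions(WILD_TYPE))
    print("mutant fractions:", fractions(MUTANT))
```

Both sets of counts sum to 538, so the predicted change is a net shift of 10 out of the coil category into the strand (+3) and helix (+7) categories.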
The wild-type and mutated CDH23 sequences (A0A0A0MQS6) were analyzed with the SWISS-MODEL program to build possible protein templates and models. 5szn.1.A was selected as the template for building the CDH23 models because of its sequence identity with the wild-type and mutated types (33.96 and 34.52%, respectively). Position D2479A in the Ensembl sequence corresponds to D2484A in the 5szn-based model. The Jmol package and Ramachandran plots were used to align and validate the two 3D structures in order to predict the possible impact of the mutated amino acid on CDH23 protein structure (Figs. 4, 5, 6, and 7). The Ramachandran plot was used to calculate and visualize the phi and psi dihedral angles of each residue and thereby identify residues in energetically allowed regions. A score of ≥ 90% in the allowed regions indicates that the built model is of high quality (Fig. 8 and Table 3).
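The phi and psi angles plotted in a Ramachandran diagram are ordinary dihedral angles defined by four consecutive backbone atoms: C(i-1), N(i), Cα(i), C(i) for phi and N(i), Cα(i), C(i), N(i+1) for psi. The following minimal sketch shows one standard way to compute such an angle from Cartesian coordinates. It is not the code used by Swiss-PdbViewer or RAMPAGE, and the helper names and test points are hypothetical.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _scale(a, s):
    return tuple(x * s for x in a)


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    b1 = _scale(b1, 1.0 / math.sqrt(_dot(b1, b1)))  # unit vector along the bond
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, _scale(b1, _dot(b0, b1)))
    w = _sub(b2, _scale(b1, _dot(b2, b1)))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def phi(c_prev, n, ca, c):
    """Phi: rotation about the N-CA bond."""
    return dihedral(c_prev, n, ca, c)


def psi(n, ca, c, n_next):
    """Psi: rotation about the CA-C bond."""
    return dihedral(n, ca, c, n_next)


if __name__ == "__main__":
    # Hypothetical test points: the last atom is rotated a quarter turn.
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Computing phi and psi for every residue of a model and counting how many pairs fall in the favored regions gives the percentage used as the quality criterion above.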
The evolutionary conservation rate of the substitution was analyzed using the online NCBI protein cluster (Fig. 9) and ConSurf programs (Fig. 10).

Discussion
CDH23 encodes cadherin 23 (CDH23), a transmembrane Ca2+-dependent adhesion protein expressed in the neurosensory epithelium of the inner ear hair cells [43]. It is thought to be involved in stereocilia organization and hair bundle formation [44]. Through its adhesion property, it interacts with protocadherin 15 to form a tip-link filamentous complex, which is the main component that drives the normal mechano-transduction process in auditory and vestibular hair cells. Hence, a change in the protein structure might lead to a significant defect in its function, which, in turn, could halt the entire inner ear mechano-transduction process by switching off the perception of sound and acceleration. The impact of the defective CDH23 protein was observed in both syndromic and non-syndromic hearing loss forms [11,43]. It accounted for up to 32% of Usher syndrome type 1 cases [13]. More than 24 associated mutations have been reported as missense mutations that clearly appeared as an important cause of hearing loss in Asian populations [16][17][18][19]. Recent studies suggest that in silico mutation prediction might be used as a first-line molecular diagnostic tool serving genetic counseling, mutation verification, and variant classification [45,46]. Prediction of variant pathogenicity using bioinformatics tools has been conducted in several studies. A homozygous c.5985C>A (p.Y1995X) variant, a heterozygous p.E1006K variant, and p.D1663V were detected in the Chinese population [47,48]. The frequency of CDH23 mutations among recessively inherited cases is 5.7% in the Japanese population and 15% in the Korean population [16,19,49].
Other gene variants have also been analyzed using such programs, for example the V66M variant of human BDNF in psychiatric disorders and computational modeling of the complete HOXB13 protein to predict the functional effect of SNPs and their associated role in hereditary prostate cancer [50,51]. The American College of Medical Genetics and Genomics (ACMG) guide for the interpretation of sequence variants elaborated on the usefulness of predictive software programs for risk estimation and accurate interpretation of the potential causality of sequence variation [52]. The variant specifications (location), classification (mutation type), and pathogenicity interpretation (pathogenic, likely pathogenic, uncertain significance, likely benign, and benign) were thoroughly revised by the ACMG, and the use of specific standard terminology in describing variant identity was recommended [45,46]. In this study, we genetically analyzed an Omani family that was diagnosed clinically with severe to profound hearing loss. The analysis revealed a missense variant on
Prediction of the variant's influence on the stability of the protein structure is a crucial aspect of studying the function of the protein. The unfolding Gibbs free energy change (DDG or ΔΔG, in kcal/mol) between the native and mutant structures was calculated by subtracting the free energy change of the wild-type protein from that of the mutant protein: ΔΔG = ΔG(mutant) - ΔG(wild type). A ΔΔG value above zero predicts increased stability of the mutant protein, and a value below zero predicts decreased stability [53,54]. Structure stability was predicted using the I-Mutant 2.0, MUpro, and DUET programs. All three tools returned a negative score, indicating that variant p.D2479A might destabilize the protein structure.
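The sign convention for the stability score can be written out in a few lines. The sketch below takes ΔΔG as the mutant ΔG minus the wild-type ΔG and labels negative values as destabilizing, as described above. The numeric inputs are hypothetical placeholders, not the values reported in Table 2, and treating an exact zero as "neutral" is an assumption made here.

```python
def ddg(dg_mutant: float, dg_wild_type: float) -> float:
    """Unfolding free energy difference in kcal/mol: mutant minus wild type."""
    return dg_mutant - dg_wild_type


def classify(ddg_value: float) -> str:
    """Interpret the sign of a ddG value under the convention used above."""
    if ddg_value < 0:
        return "destabilizing"
    if ddg_value > 0:
        return "stabilizing"
    return "neutral"  # exact zero: label chosen for illustration only


if __name__ == "__main__":
    # Hypothetical free energies in kcal/mol, for illustration only.
    example = ddg(dg_mutant=4.0, dg_wild_type=5.5)
    print(example, classify(example))
```

When several predictors are used, as in this study, agreement in the sign of the score across tools matters more than the absolute values, since each tool is trained and scaled differently.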
Structural alignment and similarity are important factors in assessing generated protein models of related identity. The template modeling score (TM-score) was used to determine the topological deviation of the native from the mutant model structure, whereas RMSD was used to calculate the average distance between the alpha carbon backbones of the two models [55]. Both programs predicted a perfect match between the wild-type and mutant model structures. The sequence of amino acids determines the protein conformation, and the physical and chemical properties of the amino acids greatly affect protein function. Alanine, a strong helix-favoring residue, is nonpolar and uncharged (hydrophobic) and engages in van der Waals interactions. Aspartic acid, on the other hand, is negatively charged, polar, and able to form hydrogen bonds with other amino acids and water (hydrophilic). The substitution in this case might change the protein self-interaction. Therefore, secondary and tertiary protein structures were further analyzed to investigate the impact of the mutant variant on protein function. According to the Polyview-2D results, the mutated domain structure was predicted to consist of 292 coils, 218 strands, and 28 helices compared to 302 coils, 215 strands, and 21 helices in the wild type. Alanine is located within a β-strand segment of the mutated protein, whereas aspartic acid is located within a coiled loop of the wild-type protein [56,57]. Templates and models for both proteins were created by  [58]. The total number of repeats is 27, presented within the adherens junction region as a glycoprotein. The EC domains are involved in cell-to-cell adhesion via homophilic calcium-dependent interactions [59]. Binding of calcium to the EC domains at the linker region between consecutive EC repeats promotes the linearization, rigidity, and dimerization of CDH23 [60].
Aspartic acid residues have a high Ca2+ affinity, which may play an important role in the interactions of CDH23 molecules either with other CDH23 molecules or with other proteins. Since calcium provides rigidity to the elongated structure of cadherin molecules and enables homophilic lateral interaction, the mutation is likely to result in a decreased affinity for calcium and, in turn, impair the whole process of protein interaction [61]. The two torsion angles of a polypeptide chain, about the N-Cα bond (phi, φ) and the Cα-C bond (psi, ψ), play a role in the control of local structure folding. Therefore, applying a Ramachandran plot would predict the protein folding capability and, in turn, the quality of the three-dimensional structures. A Ramachandran plot was obtained to validate the protein structures that were created by SWISS-MODEL for both the mutant and native template models. Swiss-PdbViewer was used to create the Ramachandran plot, and the RAMPAGE program was used to calculate the percentage of residues falling within each region.
According to the program, a good protein structure model is expected to have more than 90% of its residues within the core or favored region of the plot. RAMPAGE predicted that greater than 94% of the 537 residues fell within the favored region for both the native and mutant proteins [62]. Conserved amino acids in proteins are found to be involved in various cellular processes in a biological system, including genome stability [63]. Therefore, phylogeny and multiple sequence alignment (MSA) were conducted to evaluate the stability and conservation status of aspartic acid 2484. As predicted by the ConSurf Server package and Polyview-2D, aspartic acid is highly conserved among species, with a score of 9, reflecting the importance of this amino acid position, which may play a crucial role in the integrity of protein structure and conformation. One limitation of this study is that the detected mutation was identified by next generation sequencing technology, which requires sequencing of the whole human exome. The technique is outside routine daily assays, and the running cost is high. However, the mutation confirmation assay by Sanger DNA sequencing is more economical.

Conclusion
In this study, we used various in silico mutation prediction programs to analyze a substitution variant in the CDH23 protein. The variant was identified in an Omani family diagnosed clinically with hearing loss. The analysis predicted the novel D2479A substitution to be a deleterious and protein-destabilizing mutation at a conserved site on the CDH23 protein. This mutation might lead to a major disruption in CDH23 protein structure that may disturb stereocilia organization and hair bundle formation, affecting the mechano-transduction process and, in turn, causing hearing loss. In silico mutation prediction analysis might be used as a useful molecular diagnostic tool benefiting both genetic counseling and mutation verification in the governmental and private sectors. The affected family might benefit from the outcome of this research by considering the potential risk of consanguineous marriage.