MECP2 mutations are the primary cause of RTT, a serious neurodevelopmental disorder affecting females. They are scattered throughout the whole gene and include point mutations, small indels, and large rearrangements [19]. Missense mutations causing RTT are mainly localized to the two main functional domains of the gene, the MBD and the TRD. However, some mutations outside these domains can also mediate disease progression [13]. Conversely, a few mutations with a neutral effect have been reported in the TRD, such as T228S [20], G232A, and P251L [21]. Here, we used ROC curve analysis to investigate whether different physicochemical characteristics of the normal and mutant amino acids could help in predicting the mutation effect. However, no significant results were obtained. Only mutation location is a critical determinant of variant pathogenicity. Familial investigations may provide a useful tool to rule out the pathogenicity of a specific mutation. However, studying the molecular mechanisms of missense mutations remains critical for identifying disease-causing variants. With high-throughput sequencing techniques, novel gene variations are being documented at an exponential rate. This necessitates the application of various computational algorithms to filter the detected variations prior to experimental validation and to investigate possible pathogenic mechanisms. In this study, 3D structure-based methods were applied to model the effects of certain missense mutations (D121A, R133H, S359Y, and P403S) on protein stability and interactions. The 3D structure of the whole protein is not yet available in the Protein Data Bank (PDB), so in the present study the native MeCP2 sequence, extracted from UniProt (ID P51608), was submitted to the Phyre2 server to predict the protein structure.
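The ROC curve analysis mentioned above can be illustrated with a minimal sketch. It computes the area under the ROC curve (AUC) for a single physicochemical score using the rank-sum identity; an AUC near 0.5 means the score does not separate pathogenic from neutral variants. The function name `roc_auc`, the feature name, and all numbers are hypothetical and are not taken from the study data.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via the Mann-Whitney rank-sum identity.

    labels: 1 = pathogenic, 0 = neutral. Ties between a pathogenic and a
    neutral score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one pathogenic and one neutral variant")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: an arbitrary per-variant property change and
# made-up pathogenicity labels (assumptions for illustration only).
property_change = [0.2, 1.5, 0.9, 3.1, 0.4, 2.2, 2.5, 2.8]
is_pathogenic = [1, 0, 1, 1, 0, 0, 1, 0]

print(roc_auc(property_change, is_pathogenic))  # 0.5, i.e. no discrimination
```

In practice a library routine with confidence intervals would be preferable; the pairwise loop is used here only because it makes the definition of AUC explicit.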
Both D121A and R133H are located in the MBD and were reported in patients with classical RTT. D121A is a novel mutation; however, another sequence variation at this amino acid residue, D121G, has been previously detected. R133H is a reported mutation with low frequency (0.17%). One of the most recurrent RTT mutations (R133C) also occurs at this residue, with a frequency of 4.52%. Substitution of Arg133 with Gly or Leu has also been documented, but with a very low occurrence rate (0.04 and 0.02, respectively) [4]. On RettBase, the effect of R133H is defined as unknown. The R133H mutant protein exhibited near-normal affinity for pericentromeric heterochromatin and near-normal transcriptional repressive activity [22]. Moreover, the R133H-containing MBD displayed folding stability similar to that of the wild-type MBD [23].
Several in vitro studies demonstrated that many missense mutations within the MBD can significantly reduce the affinity of MeCP2 for methylated DNA [24,25,26,27]. In particular, Arg111, which is totally conserved among members of the MBD family, plays a crucial role in protein binding to methylated DNA, and its mutation results in an MBD without any detectable affinity for DNA [25, 28]. In the current study, we concluded that Asp121 interacts with Arg111 and Arg133, orienting the side chains of these two residues and enabling their contact with DNA. D121A leads to increased conformational plasticity of Arg111 and Arg133, dramatically affecting their interaction with DNA. Hence, the current results point to an indirect pathogenic mechanism for D121A through its effect on the orientation of Arg111 and Arg133. It is noteworthy that Lei et al. also stated that Asp121 has a potential function in the rigidification of Arg111 [29].
The R133H mutation decreased the affinity of the MBD for methylated DNA (∆∆G=1.26). Consistent with this, Yang and colleagues reported that R133H decreased MBD affinity for mC over 12-fold and for C less than 2-fold [23]. The current study illustrates the direct pathogenic mechanism of R133H. Because arginine is more basic (pKa=12) than histidine (pKa=6), it remains positively charged at physiological pH and can form a salt bridge with DNA, whereas histidine is largely uncharged. This difference in binding capacity between arginine and histidine resulted in decreased MBD affinity for DNA. In this context, it was previously reported that Arg133 is the most critical residue in DNA binding, and that its mutant forms led to diminished binding affinity for methylated DNA, as measured by gel mobility shift assays and crystal structures [23, 29, 30].
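The two quantitative arguments in this paragraph can be made concrete with a short sketch. It converts a binding free energy change into a fold change in the dissociation constant and estimates the protonated fraction of a side chain from its pKa using the Henderson-Hasselbalch relation. Several points are assumptions rather than statements from the study: that ∆∆G is expressed in kcal/mol, that a positive value means weaker binding, a temperature of 298.15 K, and a pH of 7.4. The function names are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def fold_change_in_kd(ddg_kcal_per_mol: float, temp_k: float = 298.15) -> float:
    """Fold change in dissociation constant implied by a binding ddG.

    Assumes ddG = RT * ln(Kd_mutant / Kd_wildtype), so a positive ddG
    (in kcal/mol, an assumed unit) corresponds to weaker binding.
    """
    return math.exp(ddg_kcal_per_mol / (R_KCAL * temp_k))


def fraction_protonated(pka: float, ph: float = 7.4) -> float:
    """Henderson-Hasselbalch fraction of a basic side chain that is protonated."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


# ddG value from the text; the unit and sign convention are assumptions.
print(f"fold change in Kd: {fold_change_in_kd(1.26):.1f}")

# Side-chain pKa values as quoted in the text (Arg = 12, His = 6).
print(f"Arg protonated at pH 7.4: {fraction_protonated(12.0):.5f}")
print(f"His protonated at pH 7.4: {fraction_protonated(6.0):.3f}")
```

Under these assumptions arginine is almost fully charged at physiological pH while histidine is mostly neutral, which is the basis of the salt-bridge argument. The pKa of a residue inside a folded protein can differ from these reference values.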
Our post-molecular analysis of 6OGK showed that the 5′-GTG-3′ trinucleotide on the unmethylated DNA strand is the main target of Arg133 and Arg111, demonstrating the potential role of the unmethylated strand in the MBD-DNA interaction. Interestingly, a ChIP-seq analysis revealed that the percentage of native GC is a stronger determinant of MeCP2 distribution than methylated CG dinucleotides, and that MeCP2 binds methylated non-CG motifs such as mCAC, which is found in the brain [31]. In addition, crystal structures showed that the Arg111 and Arg133 residues mainly bind to GTG trinucleotides on the unmethylated DNA strand, whereas 5′-mC on the complementary strand is not essential for their interaction [29].
In contrast, both S359Y and P403S are mutations in the CTD. Missense mutations in the CTD were generally assumed to have a benign effect. However, some missense mutations in the CTD have also been defined as pathogenic or likely pathogenic in ClinVar (see Supplementary table 1). Moreover, it has been demonstrated that a Rett-like phenotype can arise in mice from a specific missense mutation (P322L) in the CTD [13]. Importantly, Pro and Leu do not differ in charge, polarity, or hydrocarbon type, which indicates that the physicochemical properties of the normal and mutant residues play a minimal role in determining mutation pathogenicity and emphasizes the need to study the molecular mechanisms of missense mutations.
S359Y was detected together with one of the most common RTT mutations, R168X, in a girl with typical RTT. In fact, more than one pathogenic mutation has already been identified in some RTT cases [4]. Therefore, it was necessary to explore the possible effect of this novel variation. However, we did not detect any pathogenic effect related to S359Y, suggesting that disease progression in that patient is mainly mediated by R168X.
The P403S variant converts a proline residue, which cannot be phosphorylated, into serine, which might create a new phosphorylation site. Importantly, it was reported that most of the phosphorylated serine residues of MeCP2 are located in its CTD [26].
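The reasoning above, that a substitution can create or remove a potential phosphorylation site, can be sketched as a simple check on the mutation label. The sketch below parses one-letter missense notation and reports whether a canonical phospho-acceptor residue (Ser, Thr, or Tyr) is gained, lost, or retained. The names `parse_missense` and `phospho_acceptor_change` are hypothetical, and the check ignores sequence context, so it indicates only that phosphorylation is chemically possible, not that any kinase recognizes the site.

```python
import re
from typing import NamedTuple

# Canonical hydroxyl-containing phospho-acceptor residues.
PHOSPHO_ACCEPTORS = {"S", "T", "Y"}

_AA = "ACDEFGHIKLMNPQRSTVWY"
_PATTERN = re.compile(rf"^([{_AA}])(\d+)([{_AA}])$")


class Missense(NamedTuple):
    ref: str
    position: int
    alt: str


def parse_missense(label: str) -> Missense:
    """Parse labels such as 'P403S'; nonsense labels like 'R168X' are rejected."""
    match = _PATTERN.match(label)
    if match is None:
        raise ValueError(f"not a one-letter missense label: {label!r}")
    return Missense(match.group(1), int(match.group(2)), match.group(3))


def phospho_acceptor_change(label: str) -> str:
    """Return 'gained', 'lost', 'retained', or 'none' for a missense label."""
    variant = parse_missense(label)
    before = variant.ref in PHOSPHO_ACCEPTORS
    after = variant.alt in PHOSPHO_ACCEPTORS
    if after and not before:
        return "gained"
    if before and not after:
        return "lost"
    if before and after:
        return "retained"
    return "none"


for label in ["D121A", "R133H", "S359Y", "P403S"]:
    print(label, phospho_acceptor_change(label))
```

For the four variants modeled in this study, only P403S introduces a phospho-acceptor at a position that previously lacked one, while S359Y replaces one acceptor residue with another.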
Mellén et al. hypothesized that post-translational modification (PTM) of MeCP2 domains might affect the DNA binding capacity of the protein and its substrate specificity [32]. In particular, mutations at activity-dependent phosphorylation sites, whether inside or outside the MBD, are expected to impair DNA binding [33]. Mice carrying the S80A mutation displayed very mild RTT-like symptoms [34]. Moreover, alanine substitutions at Ser421 and Ser424 (in the CTD) in mice were associated with a gain-of-function effect [34, 35]. It is also noteworthy that a faulty phosphorylation pattern of MeCP2, whether inside or outside the MBD, can directly interfere with neuronal plasticity [36]. Both Zhou et al. and Tao et al. found that dephosphorylation of S80 and phosphorylation at S421 by CaMKII impair neuronal activity, dendritic growth, and synaptic connection development within the cerebral cortex [34, 37]. Significantly, no MECP2 missense mutations have yet been reported at the activity-dependent phosphorylation sites. It is noteworthy that, according to data provided by the patient's guardians, a developmental delay might have started before the age of 6 months. This might suggest that this girl is a case of congenital RTT rather than classical RTT. The congenital variant is the most severe form of the disease, with onset of clinical features during the first 3 months of life.
Here, we found that both CLK2 and TTBK1 are potential co-expressed kinases that can transfer a phosphoryl group to this mutant serine residue in the cerebellum and cerebella tissues. However, we were unable to define the region containing Ser403 as a recognition or binding motif for proceeding targets. Therefore, it is more likely that the mutant protein does not behave differently; however, confirmatory experimental analysis may still be required, as bioinformatics algorithms have limited predictive power in this regard.