
CPROP
Nov 7, 2023
Abstract
Internal modifications of mRNA have emerged as a widespread and versatile regulatory mechanism that controls gene expression at the post-transcriptional level. Most of these modifications are methyl groups, making S-adenosyl-L-methionine (SAM) a central metabolic hub. Here we show that metabolic labeling with a clickable metabolic precursor of SAM, propargyl-selenohomocysteine (PSH), enables detection and identification of various methylation sites. Propargylated A, C, and G nucleosides form in detectable amounts via intracellular generation of the corresponding SAM analogue. Integration into next-generation sequencing enables mapping of N6-methyladenosine (m6A) and 5-methylcytidine (m5C) sites in mRNA with single-nucleotide precision (MePMe-seq). Analysis of the termination profiles can be used to distinguish m6A from 2′-O-methyladenosine (Am) and N1-methyladenosine (m1A) sites. MePMe-seq overcomes the reliance on antibodies for enrichment and on sequence motifs for evaluation, which limited previous methodologies. Metabolic labeling via clickable SAM facilitates the joint evaluation of methylation sites in RNA and potentially in DNA and proteins.
Introduction
Eukaryotic mRNA is canonically modified by the addition of the 5ʹ cap and bears additional modifications at internal sites. The N6-methylation of adenosine (m6A) is the most abundant and best-studied internal modification of mRNA. It has been linked to cellular differentiation, cancer progression, development and ageing 1,2,3,4,5,6,7. Most of the more than 12,000 sites are introduced by the METTL3-14 complex, whereas METTL16 is responsible for six additional validated sites in mRNA 8,9,10,11,12. Several reader proteins have been identified that mediate the effects of m6A on mRNA translation and degradation 13,14,15,16,17,18,19,20,21,22,23,24,25. m5C is another internal modification in mammalian mRNA.
The reported number of sites ranges from a few hundred to 40,000, and various writer proteins (NSUN2, NSUN6, and TRDMT1) have been identified in mammalian cells 26,27,28,29. Several reader proteins have been found, linking m5C to repair (via RAD52), export (via ALYREF), and proliferation (via YBX1 and ELAV), promoting bladder cancer 30,31,32,33,34. In total, ten different internal modifications of eukaryotic mRNA have been described and mapped 35. In addition to m6A and m5C, these comprise the altered nucleobases inosine and pseudouridine, the acetylation ac4C, the oxidation 8-oxo-G, and further methylations or derivatives thereof at either the nucleobase (m1A, m7G, hm5C) or the ribose (Nm). The prevalence of methylation as an mRNA modification mark is striking, suggesting that the responsible cofactor SAM plays a key role in their abundance and potential interconnectivity 36. Owing to the importance and abundance of m6A, multiple approaches have been developed to assign its positions on a transcriptome-wide level. Most methods rely on antibodies in combination with next-generation sequencing (NGS). While early transcriptome-wide detection methods had limited resolution 37,38,39, crosslinking and bioinformatic analyses, including searches for the DRACH motif, improved the accuracy to single-nucleotide resolution 40,41,42. Concerns regarding antibody bias in recognizing the tiny modified nucleobase as an epitope, along with the inability to distinguish between m6A and m6Am, have prompted the development of antibody-free methods. DART-seq uses a YTH reader protein for m6A recognition and introduces adjacent C-to-U mutations via a fused deaminase 43,44. MAZTER-seq relies on the methylation-sensitive ribonuclease MazF and on bioinformatic comparison of cleaved versus uncleaved reads at its ACA target sites 45,46. m6A-SEAL uses FTO-mediated oxidation of the m6A methyl group for further derivatization and enrichment 47.
m6A-REF-seq combines m6A demethylation by FTO with cleavage of ACA sites by an m6A-sensitive RNA endonuclease 48. eTAM-seq uses TadA to deaminate unmodified A to I, which is read out during reverse transcription (RT), while m6A resists deamination 49. m6A-SAC-seq relies on N6-allylation of m6A by the methyltransferase (MTase) MjDim1 followed by N1,N6-cyclization, which leads to a mismatch in RT 50. These methods rely on exogenous enzymes for modification, which bring about their own biases, such as a preference for, or even a restriction to, certain sequences. Furthermore, some of these approaches require transfection of cells. GLORI is a different, recently released methodology that enables transcriptome-wide localization and quantification of m6A with single-nucleotide precision; it also relies on deamination of A, but uses glyoxal and nitrite treatment instead of enzymes 51. In summary, there is a plethora of methods for mapping m6A sites; however, most of them cannot be used to assign other methylation sites. Metabolic labeling with methionine (Met) analogues presents an interesting alternative approach for m6A detection 52,53. After feeding cells with PSH, modified adenosines in rRNA could be detected via click chemistry and enrichment 53. m6A-label-seq determined m6A sites in mRNA by feeding allyl-selenohomocysteine, followed by a highly specific cyclization reaction of the resulting N6-allyladenosine that causes mutations in RT identified by NGS. However, an antibody was required to enrich the allyl-modified mRNA, and other modified nucleosides were not detected 52. For m5C, another abundant internal modification of mRNA, chemical conversion of C to U in bisulfite sequencing is frequently used for mapping 54,55. This treatment risks damaging RNA and causing artifacts, necessitating careful and repeated controls to obtain reliable data 56.
Antibody-based methods with and without photo-crosslinking have been developed in analogy to m6A mapping methods and are subject to the same limitations 26,57,58. The development of antibody-free methods includes progress towards nanopore sequencing and TAWO-seq, but the latter has not yet been implemented on a transcriptome-wide scale 59,60. Taken together, m6A and m5C, as well as other methylations, rely on the cofactor SAM, suggesting that methyl-based modifications could be interconnected via SAM levels and that it would be important to study them in context 36. Yet current methodology has mainly focused on specific binding and detection of the modified nucleoside rather than on the underlying, unifying process. Enrichment via antibodies or binding proteins, or specific modification by m6A-sensitive enzymes, precludes a more global look at possible links. In this work, we therefore set out to develop metabolic labeling via the SAM pathway as a methodology to detect more than one type of modified nucleoside by NGS. Such a methodology should hinge on a SAM analogue that (1) can be efficiently made in genetically unaltered mammalian cells, (2) is accepted through the promiscuous activity of several MTases, and (3) provides a universal handle for efficient antibody-free enrichment of different nucleosides, in order to (4) make modified nucleosides amenable to detection by NGS. An ideal metabolite is the propargylic SAM analogue SeAdoYn, which is readily produced in cells with unaltered genetic makeup and is recognized by most MTases 53,61. Above all, the propargyl group is bioorthogonal and reacts specifically with azides in a click reaction, making it possible to chemically enrich target RNA without the need for antibodies (Fig. 1a–e).
Fig. 1: Scheme of MePMe-seq (metabolic propargylation for methylation sequencing).
a Metabolic labeling of cells with PSH leads to methionine adenosyltransferase (MAT)-catalyzed formation of the SAM analogue and propargylation of methyltransferase (MTase) target sites. b After cell lysis, poly(A)+ RNA is isolated and fragmented. c Propargylated fragments react with biotin azide in a copper-catalyzed azide-alkyne cycloaddition (CuAAC) and are enriched on streptavidin-coated magnetic beads (SA mag. beads). d On-bead reverse transcription (RT) terminates at modified sites. e Libraries for next-generation sequencing (NGS) are prepared. Modified sites are detected as coverage drops with distinct termination profiles.
Metabolic labeling and quantification of propargylated nucleosides
Metabolic labeling with PSH exploits the natural methylation pathways and the broad scope of nucleoside modifications. Previous work showed that many MTases are promiscuous regarding the cosubstrate and also transfer so-called bioorthogonal groups to the natural methylation sites, as validated for select RNA and histone modifications 53,62,63. However, the potential to investigate more than one type of modification has not been exploited, as the level of most non-natural modifications remains low 64. We therefore sought to maximize the level of metabolic RNA propargylation while maintaining cell viability, and treated HeLa cells with different concentrations of PSH or Met (Supplementary Fig. 1a). As a proxy for the general propargylation level of RNA, we quantified Aprop (Fig. 2a) in total RNA using LC-QqQ-MS and found that the level increased with PSH concentration up to 2.5 mM (Supplementary Fig. 1b, c). Under these conditions, 2.2% of Am was substituted by Aprop in total RNA, and the levels of the natural methylations Am and m6A remained largely unaffected by metabolic PSH labeling (Am/A ~4%, m6A/A 0.3%; Supplementary Fig. 1d, e).
These values are in line with literature reports for total RNA, suggesting that general cellular methylation itself is not perturbed 56,65,66,67. Cell viability was only slightly compromised by metabolic labeling with PSH, remaining at 81% of that of untreated cells at 2.5 mM PSH (Supplementary Fig. 1f). Controls were treated identically, but with Met instead of PSH.
Fig. 2: Metabolic labeling of mRNA via PSH in HeLa cells.
PSH metabolism modifies RNA with propargyl (prop) groups at positions naturally found to be methylated. a–f Modified and unmodified nucleosides investigated by LC-QqQ-MS. Structures of (a) a nucleoside (N), a 2′-O-methylated nucleoside (Nm), a 2′-O-propargylated nucleoside (Nprop), (b) N6-methyladenosine (m6A), N6-propargyladenosine (prop6A), (c) N1-methyladenosine (m1A), N1-propargyladenosine (prop1A), (d) cytidine (C), 5-methylcytidine (m5C), 5-propargylcytidine (prop5C), (e) N3-methylcytidine (m3C), N3-propargylcytidine (prop3C), (f) N7-methylguanosine (m7G), N7-propargylguanosine (prop7G). g–j Quantification of modified nucleosides in poly(A)+ RNA from HeLa cells treated with 2.5 mM PSH (purple) or methionine (gray) as control. Relative abundance of (g) Am, m6A and m5C, (h) Aprop, prop6A and prop5C with respect to the corresponding methylated nucleoside, (i) prop6A, prop1A, prop5C, prop7G and prop3C, (j) Aprop, Cprop and Gprop. Quantification from dynamic MRM runs on LC-QqQ-MS using external synthetic standards. Not detected (ND): no signal with the correct quantifier detected. Mean values and SD from n = 3 biological replicates are shown. Statistical significance determined via unpaired two-sample two-tailed t test (n.s. P > 0.05; *P ≤ 0.05; **P ≤ 0.01; ***P ≤ 0.001). The P value for Am/A Met versus Am/A PSH is 0.30, for m6A/A Met versus m6A/A PSH it is 7.0 × 10⁻⁴, and for m5C/C Met versus m5C/C PSH it is 0.004. Source data are provided as a Source Data file.
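The caption notes that nucleosides were quantified from dynamic MRM runs using external synthetic standards. As a minimal sketch of that general idea, assuming a linear detector response per nucleoside (peak area = slope × amount + intercept) fitted by ordinary least squares to a dilution series of standards, the code below inverts the calibration line to turn sample peak areas into amounts and then forms ratios such as prop6A/A. The function names, the linear model and the units are hypothetical illustration choices, not the authors' processing software.

```python
"""Sketch: external-standard calibration for LC-QqQ-MS nucleoside quantification.

Assumptions (hypothetical, not taken from the source): the detector response is
linear in the injected amount, each nucleoside has its own calibration line
fitted by ordinary least squares, and all amounts share one unit (e.g. pmol).
"""
from dataclasses import dataclass


@dataclass(frozen=True)
class Calibration:
    slope: float
    intercept: float

    def amount(self, area: float) -> float:
        """Invert area = slope * amount + intercept for a measured peak area."""
        return (area - self.intercept) / self.slope


def fit_calibration(amounts: list[float], areas: list[float]) -> Calibration:
    """Ordinary least-squares fit of peak area against standard amount."""
    n = len(amounts)
    if n != len(areas) or n < 2:
        raise ValueError("need at least two paired calibration points")
    mean_x = sum(amounts) / n
    mean_y = sum(areas) / n
    sxx = sum((x - mean_x) ** 2 for x in amounts)
    if sxx == 0:
        raise ValueError("standard amounts must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(amounts, areas))
    slope = sxy / sxx
    return Calibration(slope=slope, intercept=mean_y - slope * mean_x)


def percent_of(modified_amount: float, parent_amount: float) -> float:
    """Relative abundance in percent, e.g. prop6A/A or prop5C/C."""
    return 100.0 * modified_amount / parent_amount


if __name__ == "__main__":
    # Hypothetical dilution series for one standard: (amount, peak area) pairs.
    cal = fit_calibration([1.0, 2.0, 4.0, 8.0], [10.0, 20.0, 40.0, 80.0])
    print(cal, cal.amount(50.0))
```

In this scheme each modified and unmodified nucleoside gets its own calibration line, and the reported ratio is formed from the two back-calculated amounts.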
Next, we focused on propargylation and methylation levels in mRNA after two rounds of poly(A) enrichment, which reportedly reduces Nm originating from rRNA by 10–20-fold but retains m6A, which is abundant in mRNA 56,65. We found a ~20-fold reduction of Am compared to total RNA (3.8% vs. 0.18%), whereas the m6A level changed much less (0.3% vs. 0.12%) (Fig. 2g, Supplementary Fig. 1d, e), suggesting successful enrichment of mRNA, albeit with residual rRNA, as expected 67. Metabolic labeling with PSH did not significantly affect the Am/A ratio in poly(A)+ RNA (0.13–0.18%, Fig. 2g); however, the m6A/A level in these samples was reduced from 0.12% to 0.06% (Fig. 2g). This suggests a stronger effect of metabolic labeling on mRNAs, which have a shorter half-life than rRNAs. Analyzing propargylation by LC-QqQ-MS, we observed a ~10-fold reduction of the Aprop level after poly(A) enrichment (0.011% vs. 0.10%), in line with the observed Am depletion (Fig. 2j, Supplementary Fig. 1b). In poly(A)+ RNA from labeled cells, prop6A was unambiguously detected, whereas it was not detectable in control cells (Fig. 2i). It replaced 3.5% of m6A (Fig. 2h) and was present at 0.0023% of all As (Fig. 2i). Taken together, these data indicate that PSH can be used for metabolic labeling and enables detection of prop6A and Aprop in poly(A)+ RNA under conditions that retain cell viability. Because metabolic labeling with PSH via the corresponding SAM analogue would be expected to also propargylate other nucleosides at various positions, we investigated whether additional propargylated nucleosides could be detected in poly(A)+ RNA. There is consensus that mRNA contains m6A and m5C in addition to various methylations at the 5′ cap, and current work suggests that m1A, m7G and m3C as well as ribose methylation (Nm) may exist, albeit with controversial reports 68.
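The percentages in this paragraph can be cross-checked with simple arithmetic. The sketch below recomputes the fold reductions after poly(A) enrichment and chains two ratios to estimate prop6A/A. It assumes that "replaced 3.5% of m6A" means prop6A/m6A = 3.5% in the PSH-labeled sample, where m6A/A is 0.06%; that reading is our interpretation rather than a definition given in the text, and the helper names are hypothetical.

```python
# Reported values from the text, in percent of the parent nucleoside.
AM_TOTAL, AM_POLYA = 3.8, 0.18          # Am/A: total RNA vs. poly(A)+ RNA
APROP_TOTAL, APROP_POLYA = 0.10, 0.011  # Aprop/A: total RNA vs. poly(A)+ RNA
M6A_POLYA_PSH = 0.06                    # m6A/A in poly(A)+ RNA from PSH-labeled cells
PROP6A_OF_M6A = 3.5                     # assumed to mean prop6A/m6A, in percent


def fold_reduction(before: float, after: float) -> float:
    """How many times smaller `after` is than `before`."""
    if after <= 0:
        raise ValueError("after must be positive")
    return before / after


def percent_of_all_bases(percent_of_parent: float, parent_percent: float) -> float:
    """Chain (X/parent) and (parent/A), both in percent, into X/A in percent."""
    return percent_of_parent * parent_percent / 100.0


if __name__ == "__main__":
    print(f"Am reduction:    {fold_reduction(AM_TOTAL, AM_POLYA):.1f}-fold")
    print(f"Aprop reduction: {fold_reduction(APROP_TOTAL, APROP_POLYA):.1f}-fold")
    print(f"prop6A/A:        {percent_of_all_bases(PROP6A_OF_M6A, M6A_POLYA_PSH):.4f}%")
```

With these inputs the script gives about 21-fold for Am, about 9-fold for Aprop and 0.0021% for prop6A/A, consistent with the ~20-fold, ~10-fold and 0.0023% quoted above once rounding of the input percentages is taken into account.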
To test for the respective propargylated nucleosides, we assembled a panel of nine propargyl-containing nucleosides that, in addition to prop6A and Aprop, contained all sugar-modified nucleosides (Nprop) as well as prop5C, prop1A, prop7G, and prop3C, for which natural methylation had been reported. Synthesis of the standards prop5C, prop1A, prop3C and prop7G (Fig. 2c–f) is described in the Supplementary Methods. Cprop, Gprop, prop5U and Uprop are commercially available. We then developed LC-QqQ-MS methods for all synthetic standards and analyzed cellular poly(A)+ RNA after metabolic labeling with PSH. To our delight, we could detect prop5C in poly(A)+ RNA from metabolically PSH-labeled HeLa cells (Fig. 2i), based on the MS/MS fragmentation used for quantification (MRM transition 282.2 → 150.1; Fig. 2i) and two additional MS/MS transitions used as qualifiers (282.1 → 121.9 and 282.1 → 80; Supplementary Fig. 2, for details see Supplementary Methods). The abundance of prop5C (prop5C/C 0.00012%) was ~20-fold lower than that of prop6A and ~100-fold lower than that of Aprop, and 1.4% of m5C was substituted by prop5C (Fig. 2h). This is consistent with the relative abundances of the natural methylated nucleosides m6A, Am and m5C 37,56,67,69,70. In addition to prop6A, Aprop and prop5C, further LC-MS analysis confirmed the formation of prop1A, prop7G, Cprop, and Gprop in poly(A)+ RNA (Fig. 2i, j). However, prop3C and Uprop could not be detected in poly(A)+ RNA after PSH feeding. These data show that metabolic labeling via the methylation pathway leads to various nucleoside modifications that can be analyzed from the same sample. As poly(A)+ RNA is known to contain residual rRNA, the LC-MS measurements of poly(A)+ RNA do not permit conclusions regarding the presence of these modifications in mRNA 56.
To the best of our knowledge, this is the first dataset that comprehensively quantifies non-natural nucleosides in RNA, and it provides insights into how well different MTases accept non-natural substrates. It also outlines the scope of this methodology by indicating which modifications can be made accessible to NGS.
Transcriptome-wide analysis of m6A from metabolic labeling with PSH
The bioorthogonal propargyl group can be reacted with azides in a copper-catalyzed azide-alkyne cycloaddition (CuAAC). To effectively enrich modified sites and to increase steric bulk that blocks reverse transcription, we installed an affinity tag using biotin azide (Fig. 1) 53. Click chemistry relies on covalent bond formation and requires the terminal alkyne as a functional group. As a result, it is not affected by interactions with nearby nucleotides, which is an advantage over methods relying on non-covalent interactions 71,72,73. Propargylated nucleosides will therefore be universally enriched by metabolic PSH labeling and click chemistry. For transcriptome-wide analysis, we isolated poly(A)+ RNA from cells after metabolic PSH labeling and performed click chemistry and enrichment (Fig. 1). The reaction with biotin azide was almost complete (up to 96%, Supplementary Fig. 4). We first analyzed m6A as the most abundant modification in mRNA. To precisely assign m6A sites, we performed RT under conditions optimized to cause precise and strong termination (80%, Supplementary Fig. 5). We prepared libraries for NGS using an adapted iCLIP2 protocol 74. The reads were preprocessed (FASTQ processing, barcode filtering and quality control), mapped to the human genome (hg38), and deduplicated based on the introduced unique molecular identifiers (UMIs). We obtained 14.8 (rep1) and 15.5 (rep2) million reads for the PSH-treated samples (see Supplementary Table 1).
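Two preprocessing ideas from this step can be sketched briefly: collapsing PCR duplicates by UMI, and converting each read's start into a candidate modified position when RT stops one nucleotide downstream of the clicked base. The sketch assumes, as in iCLIP-type libraries, that the 5′ end of each mapped read corresponds to the 3′ end of the cDNA, and uses a hypothetical `Alignment` record with 0-based, end-exclusive coordinates. Dedicated tools such as UMI-tools exist for the deduplication step; the source does not state which tool was used.

```python
from typing import Iterable, NamedTuple


class Alignment(NamedTuple):
    """Hypothetical minimal record of a mapped read (0-based, end-exclusive)."""
    chrom: str
    strand: str  # "+" or "-"
    start: int
    end: int
    umi: str


def read_5prime(aln: Alignment) -> int:
    """Genomic coordinate of the read's 5' end (assumed to be the cDNA 3' end)."""
    return aln.start if aln.strand == "+" else aln.end - 1


def deduplicate(alignments: Iterable[Alignment]) -> list[Alignment]:
    """Keep the first read for each (chrom, strand, 5' position, UMI) combination."""
    seen: set[tuple[str, str, int, str]] = set()
    unique = []
    for aln in alignments:
        key = (aln.chrom, aln.strand, read_5prime(aln), aln.umi)
        if key not in seen:
            seen.add(key)
            unique.append(aln)
    return unique


def termination_site(aln: Alignment) -> int:
    """Position one nucleotide upstream (in transcript orientation) of the read 5' end.

    If RT stops one nucleotide downstream of the clicked base, this is the
    candidate modified nucleotide.
    """
    pos = read_5prime(aln)
    return pos - 1 if aln.strand == "+" else pos + 1
```

For example, a plus-strand read starting at position 100 points to a candidate modification at position 99, while two reads sharing a 5′ end but carrying different UMIs are both retained as distinct molecules.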
Visual inspection of the coverage profiles at known and validated m6A sites showed remarkably sharp edges one nucleotide downstream of the m6A sites. This is exemplified by the six m6A positions in the hairpins of MAT2A, m6A2515 and m6A2577 in MALAT1, and m6A1216 in β-actin (Supplementary Fig. 4), which are known targets of METTL16 (MAT2A) and METTL3-14 (MALAT1, β-actin), respectively 8,75,76. These data indicate (1) that metabolic labeling results in both METTL3-14- and METTL16-mediated propargylation at the N6 position of adenosine in poly(A)+ RNA and (2) that reverse transcription of isolated mRNA terminates one nucleotide downstream of clicked sites, allowing m6A sites in mRNA to be assigned with single-nucleotide precision from NGS data. For systematic analysis of MePMe-seq data on the transcriptome-wide level, we used JACUSA2 77. This improved version of JACUSA, a software for site-specific identification of RNA editing events from replicate sequencing data, identifies read termination events by calculating the arrest rate, i.e. the fraction of reads stopping at a given position 77. The difference between the arrest rates of sample and control, \(\Delta_{\mathrm{RT\,arrest}}\), was used to filter the terminations identified by the algorithm and remove false positives. We set thresholds for sample read coverage (>35 for the high-stringency (HS) filter, >20 for the low-stringency (LS) filter) and for the arrest score (\(\Delta_{\mathrm{RT\,arrest}} > 20\) for the HS filter; \(\Delta_{\mathrm{RT\,arrest}} > 15\) for the LS filter). With high-stringency settings, this resulted in a total of 8802 (rep1) and 7124 (rep2) termination sites across all four nucleotides (Supplementary Data 1). Filtering with low stringency called more termination sites, i.e. 26,673 (rep1) and 27,869 (rep2) (Supplementary Figs. 7, 8).
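The post-calling filter described here can be illustrated with a short sketch. It is not how JACUSA2 computes its statistics; it only mimics the coverage and \(\Delta_{\mathrm{RT\,arrest}}\) thresholds quoted above. The `SiteCounts` record is hypothetical (not JACUSA2's output format), both thresholds are treated as strict inequalities, and the Δ threshold is assumed to be in percentage points of arrest rate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteCounts:
    """Hypothetical per-position read counts for one condition."""
    arrest: int   # reads whose cDNA ends at this position
    through: int  # reads that continue past this position

    @property
    def coverage(self) -> int:
        return self.arrest + self.through

    @property
    def arrest_rate(self) -> float:
        """Fraction of reads stopping at the position, in percent."""
        return 100.0 * self.arrest / self.coverage if self.coverage else 0.0


# Thresholds quoted in the text; the percentage-point unit for delta is an assumption.
FILTERS = {
    "HS": {"min_coverage": 35, "min_delta": 20.0},
    "LS": {"min_coverage": 20, "min_delta": 15.0},
}


def delta_rt_arrest(sample: SiteCounts, control: SiteCounts) -> float:
    """Sample arrest rate minus control arrest rate (percentage points)."""
    return sample.arrest_rate - control.arrest_rate


def passes(sample: SiteCounts, control: SiteCounts, stringency: str = "HS") -> bool:
    """Apply the sample-coverage and delta thresholds for the chosen filter."""
    f = FILTERS[stringency]
    return (sample.coverage > f["min_coverage"]
            and delta_rt_arrest(sample, control) > f["min_delta"])
```

A position with 30 sample reads and a 40% arrest rate against a 10% control rate would therefore fail the HS filter on coverage but pass the LS filter, which is one way the LS setting calls more sites.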
For subsequent analysis, we exclusively used the results from high-stringency filtering, which most likely underestimates the number of sites (Supplementary Fig. 12). JACUSA2 is available at https://github.com/dieterich-lab/JACUSA2. Initial inspection of these terminations revealed clustering at transcription start sites (TSS) (Supplementary Fig. 13). Accordingly, NGS coverage profiles showed strong enrichment of 5′ end fragments (i.e. ~150 nt regions) for many transcripts and steep drop-offs within this region, which were called as arrests by JACUSA2 (Supplementary Fig. 9). Early work on metabolic labeling with radioactive Met 78,79,80 identified multiple methylation sites at the 5′ cap; it is therefore reasonable to assume that metabolic PSH labeling also targets the canonical cap methylation sites, resulting in the observed clustering. To rule out effects of metabolic cap labeling, we excluded regions ≤5 nt upstream from the TSS from the analysis of internal modification sites. Using this computational pipeline and high-stringency filtering, MePMe-seq identified 5506 (rep1) and 3714 (rep2) modified internal sites across all four nucleosides in mRNA from PSH-labeled cells (Supplementary Fig. 7). The modified nucleotides were predominantly adenosine (A, 70%), followed by cytidine (C, 16%), uridine (U, 9%) and guanosine (G, 5%) (Fig. 3a).
Fig. 3: Detection of m6A sites using MePMe-seq.
a Distribution of all modified nucleotides identified in MePMe-seq. b Overlap of m6A sites identified in n = 2 independent experiments (% calculated from rep2 to rep1). c Integrative Genomics Viewer (IGV) browser coverage tracks of MePMe-seq data for the indicated AHNAK and YTHDF2 mRNAs from cells labeled with PSH (purple) or methionine as control (gray). Green bars indicate terminations identified by JACUSA2 with high-stringency filtering. Numbers (%) give the calculated arrest rate at the indicated positions.
Arrow indicates orientation of the coding strand. d Frequency of m6A sites per transcript. e Frequency of distances between neighboring m6A positions located on the same transcript, with a cutoff at 1000 nt (for a cutoff at 5000 nt, see Supplementary Fig. 14a). f Metatranscript analysis showing a density plot of the distribution of prop6A sites detected by MePMe-seq. g Consensus motif for sequences surrounding identified m6A sites (HS filtering) for all 5-mers (all), with DRACH sequences excluded (non-DRACH), or with NRACN sequences excluded (non-NRACN). A representative example of n = 2 biologically independent samples is shown (second replicate: Supplementary Fig. 14b). h Sequence motifs surrounding identified m6A sites (HS filtering), sorted by consensus motif DRACH, NRACN or non-NRACN. Arrow indicates the ACAGA motif, which is part of the METTL16 motif. i Overlap of all 4502 m6A sites identified in MePMe-seq with sites identified by MeRIP, GLORI, m6A-SAC-seq, SEAL, miCLIP, m6A CLIP, PA-m6A-CLIP, eTAM-seq, improved (imp.) m6A-seq, m6A-REF-seq, DART-seq and m6A-label-seq 64,81,82,91.
In two biological replicates, MePMe-seq identified 3841 (rep1) and 2312 (rep2) internal As as MTase target sites in mRNA from HeLa cells using high-stringency filtering (Fig. 3b, Supplementary Data 1). Of these modified As, 1651 (71% of the rep2 sites) were identified in both replicates, indicating very good reproducibility (Fig. 3b). In the coverage profiles, the sites called by JACUSA2 showed drop-offs in mRNA from labeled cells that were not observed in mRNA from control cells. This is illustrated for the AHNAK and YTHDF2 mRNAs (Fig. 3c, Supplementary Fig. 10), which are known for their high m6A content 64,81,82. The drop-offs are remarkably sharp and correspond to termination one nucleotide downstream of the modified A, in line with our in vitro evaluation (Supplementary Fig. 5). Arrest rates ranged from 34% to 60%.
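The replicate overlap and the union of 4502 sites used in Fig. 3i follow from simple set operations on site coordinates. The sketch below assumes sites are keyed by a hypothetical `(chrom, strand, position)` tuple and adds an optional positional tolerance, since the same logic also covers the ±50 nt and ±150 nt windows used in the DART-seq comparison; it is an illustration, not the authors' comparison script.

```python
from bisect import bisect_left
from collections import defaultdict

Site = tuple[str, str, int]  # hypothetical (chrom, strand, position) key


def index_sites(sites):
    """Group reference positions by (chrom, strand) and sort them for binary search."""
    index = defaultdict(list)
    for chrom, strand, pos in sites:
        index[(chrom, strand)].append(pos)
    for positions in index.values():
        positions.sort()
    return index


def has_match(index, site: Site, window: int = 0) -> bool:
    """True if a reference site on the same strand lies within +/- window nt."""
    chrom, strand, pos = site
    positions = index.get((chrom, strand), [])
    i = bisect_left(positions, pos - window)
    return i < len(positions) and positions[i] <= pos + window


def overlap_fraction(query, reference, window: int = 0) -> float:
    """Fraction of query sites that have a reference site within +/- window nt."""
    query = list(query)
    if not query:
        return 0.0
    index = index_sites(reference)
    hits = sum(has_match(index, site, window) for site in query)
    return hits / len(query)
```

With the counts reported above, an exact-match overlap gives 1651/2312 ≈ 71% relative to rep2, and the union contains 3841 + 2312 − 1651 = 4502 sites. Setting `window=50` or `window=150` corresponds to the tolerance ranges used for the DART-seq comparison.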
Together, these results demonstrate that MePMe-seq in combination with JACUSA2 analysis enables reliable calling of m6A sites with single-nucleotide precision. Next, we looked at the abundance of m6A sites in individual transcripts. MePMe-seq identified methylated A in 1834 different transcripts in rep1 (1311 in rep2; Supplementary Data 2). In 54% of these transcripts, a single methylated A was found (rep2: 61%) (Fig. 3d). All other transcripts (i.e. 46% in rep1 and 39% in rep2) contained more than one methylated A, including some with >10 m6A sites (Fig. 3d, Supplementary Data 2). Some of the highest m6A densities were found on AHNAK, PLEC and YTHDF2 mRNAs (Fig. 3c, Supplementary Figs. 8, 10), in line with previous reports 64. We were particularly interested in these clustered m6A sites, which currently pose a challenge to most m6A mapping methods. MePMe-seq identified a total of 80, 25, and 12 m6A sites for AHNAK, PLEC, and YTHDF2, respectively (56, 18, and 10 in rep2). We calculated the distance between neighboring m6As on the same transcript and found that they tend to cluster at short distances (Fig. 3e, Supplementary Fig. 14a), emphasizing the importance of precise assignment. In summary, MePMe-seq showed remarkable precision in assigning the position of m6A sites and identified m6A sites in very close proximity (<10 nt) to each other. We then examined the distribution of m6A sites by performing a metagene analysis of all modified As detected by MePMe-seq. The density plot shows enrichment at the 3′ end of the coding sequence (CDS) and around the stop codon (Fig. 3f). This result is in line with the m6A distribution reported by various methods, confirming that metabolic PSH labeling in combination with MePMe-seq identifies natural m6A sites 37,38,41,42,43,64,83. The metabolic propargylation does not seem to introduce positional bias, except for the heavily and canonically methylated 5′ cap region, which had to be excluded from analysis.
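The per-transcript counts (Fig. 3d) and the neighbor distances (Fig. 3e) can be computed from a list of called sites. The sketch assumes each site is a hypothetical `(transcript_id, position)` pair in transcript coordinates and collapses duplicate calls before counting; it illustrates the computation rather than reproducing the authors' pipeline.

```python
from collections import Counter, defaultdict
from typing import Iterable


def sites_per_transcript(sites: Iterable[tuple[str, int]]) -> Counter:
    """Count distinct modified positions per transcript."""
    return Counter(tx for tx, _ in set(sites))


def neighbor_distances(sites: Iterable[tuple[str, int]]) -> list[int]:
    """Distances (nt) between consecutive modified positions on the same transcript."""
    by_tx = defaultdict(set)
    for tx, pos in sites:
        by_tx[tx].add(pos)
    distances = []
    for positions in by_tx.values():
        ordered = sorted(positions)
        distances.extend(b - a for a, b in zip(ordered, ordered[1:]))
    return distances


def fraction_single_site(counts: Counter) -> float:
    """Share of transcripts carrying exactly one modified position."""
    if not counts:
        return 0.0
    return sum(1 for n in counts.values() if n == 1) / len(counts)
```

Keeping only distances of at most 1000 nt before plotting would mirror the cutoff used for Fig. 3e.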
Comparing the 5-mer sequences around the identified methylated internal adenosines revealed DRACH as the prevailing motif (Fig. 3g, Supplementary Fig. 14b), with an abundance of 90% (most abundant: GGACU 25%, GGACA 22%, GGACC 20%, AGACU 5%, others <5%; Fig. 3h); DRACH has previously been reported as the main consensus motif for N6-methylation of A by METTL3-14 37,38,42,43,64,83,84. Interestingly, 10% of the m6A sites identified by MePMe-seq are located in non-DRACH motifs (Fig. 3h). These comprise NRACN sequences (5%), which are closely related to the DRACH motif, and non-NRACN motifs (5% in total) (Fig. 3h). The non-NRACN motifs do not share a consensus motif, but G is preferred over other nucleotides directly downstream of A (Fig. 3g). Within the non-NRACN hits, the sequence ACAGA is the most abundant (Fig. 3h). This sequence is part of the motif targeted by METTL16 85. Of note, MePMe-seq identified all currently known methylation sites of METTL16, i.e. the six sites in the 3ʹ UTR of MAT2A mRNA as well as the site in U6 snRNA (Supplementary Figs. 11, 15). Such non-DRACH sites escape antibody-based approaches and approaches relying on bioinformatic searches for the DRACH motif, such as m6A-CLIP 41. MePMe-seq is thus able to accurately detect m6A in non-DRACH contexts and to provide unbiased data on the interconnectivity of different methylations 81,82.
Overlap with datasets from other m6A-mapping methods
m6A sites have been mapped previously using antibody-dependent and antibody-independent methods 37,38,39,41,42,43,64,83,86,87,88,89,90,91. To compare MePMe-seq results with m6A sites found in previous studies, we assembled data from the databases REPIC 82 and ATLAS 81 as well as from publications covering various methodologies 64,91.
However, in pairwise comparisons of published datasets from individual miCLIP experiments, the detected m6A sites differed substantially, even for the same cell line (Supplementary Table 2). We therefore combined the hits reported in different experiments to obtain an unbiased and more comprehensive reference dataset 88. We found that 92% of the m6A sites identified by MePMe-seq matched reported MeRIP hits (Fig. 3i). Between 81% and 85% of the MePMe-seq sites overlapped with GLORI and m6A-SAC-seq, and 51–67% with m6A sites identified using SEAL, miCLIP, m6A CLIP or m6A-SAC-seq. A lower fraction (8–15%) of the MePMe-seq sites was found by other single-nucleotide resolution techniques, i.e. PA-m6A-seq, m6A-REF-seq and m6A-label-seq (Fig. 3i). Only 18 (0.4%) of the m6A sites identified by MePMe-seq were also reported by DART-seq. This fraction increases when a positional tolerance is allowed: 5% of the m6A sites identified by MePMe-seq lie within ±50 nt of sites identified by DART-seq, and the overlap between the techniques rises to ~11% with a tolerance of ±150 nt (Supplementary Fig. 14c). Of note, the overlap of MePMe-seq sites with other datasets is higher than the range obtained when other single-nucleotide resolution methods are compared with CLIP, and higher than the overlap among those methods themselves (Supplementary Table 3, Supplementary Data 3), suggesting that m6A sites reported by MePMe-seq are highly reliable.
Independent validation of m6A sites identified by MePMe-seq
To independently validate m6A sites identified by MePMe-seq, we performed SELECT, an elongation- and ligation-based qPCR amplification method with single-nucleotide resolution 92. We evaluated eight putative m6A sites in poly(A)+ RNA, five with a DRACH motif and three with a non-DRACH motif (Fig. 4a). We compared the normalized ΔCq values from qPCRs of samples with and without FTO treatment; a ΔCq > 1 for the −FTO sample indicated the presence of m6A.
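SELECT exploits the fact that m6A hinders the elongation and ligation steps, so an m6A-containing template amplifies later (higher Cq) unless FTO has removed the methyl group. The sketch below shows the decision rule under stated assumptions: each site Cq is normalized to a reference amplicon measured in the same sample, ΔCq is taken as the mean normalized Cq of the −FTO replicates minus that of the +FTO replicates, and a site is called when ΔCq > 1. The normalization scheme and function names are our assumptions, not the protocol's exact definitions.

```python
from statistics import mean


def normalized_cq(cq_site: float, cq_reference: float) -> float:
    """Site Cq relative to a reference amplicon from the same sample (assumed scheme)."""
    return cq_site - cq_reference


def delta_cq(minus_fto: list[tuple[float, float]], plus_fto: list[tuple[float, float]]) -> float:
    """Mean normalized Cq without FTO minus mean normalized Cq with FTO.

    Each list holds (cq_site, cq_reference) pairs, one per replicate.
    """
    minus = mean(normalized_cq(site, ref) for site, ref in minus_fto)
    plus = mean(normalized_cq(site, ref) for site, ref in plus_fto)
    return minus - plus


def m6a_called(dcq: float, threshold: float = 1.0) -> bool:
    """Call m6A when the -FTO sample amplifies later by more than the threshold."""
    return dcq > threshold
```

A positive ΔCq reflects delayed amplification that FTO treatment relieves; as discussed in the text, a site that FTO cannot demethylate would give ΔCq near zero even if it is methylated.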
We found that all five chosen DRACH sites indeed contained m6A (Fig. 4a). These include an m6A site in the mRNA encoding serine/arginine repetitive matrix protein 2 (SRRM2) that escaped many methods and was only recently reported 49,51,93 (Fig. 4c). Of the three tested non-DRACH sites, those in WDR6 and CTNNB1 mRNAs were confirmed to contain m6A. These sites had been reported before via MeRIP and SEAL and only recently with the single-base resolution techniques m6A-SAC-seq, GLORI and eTAM-seq (Fig. 4c). The putative non-DRACH m6A site in FLNB, however, could not be validated by SELECT (Fig. 4a, b). As FTO has sequence and structural preferences 71, it is conceivable that this m6A site in an ACAGA sequence is a poor substrate for FTO and therefore not detectable via SELECT. To test this hypothesis, we attempted to validate a known non-DRACH m6A site located in the same sequence motif in the 3′ UTR of MAT2A, which forms a hairpin structure. Indeed, SELECT failed to detect this well-known non-DRACH m6A site, most likely because of its hairpin structure and lack of FTO-mediated demethylation (Supplementary Fig. 16).
Fig. 4: Validation of m6A sites identified in MePMe-seq via SELECT in HeLa poly(A)+ RNA.
a Normalized ΔCq values of SELECT qPCR measurements for five sites located in a DRACH motif and three sites located in a non-DRACH motif. Mean values and SD from n = 5 biological replicates are shown. Statistical significance determined via one-sample one-tailed t test (n.s. P > 0.05; *P ≤ 0.05; **P ≤ 0.01; ***P ≤ 0.001). The P values for −FTO versus +FTO samples are 2.6 × 10⁻⁵ for MALAT1, 4.8 × 10⁻³ for AHNAK, 6.9 × 10⁻³ for MARCH6, 9.8 × 10⁻⁴ for NFX1, 5.0 × 10⁻³ for SRRM2, 4.1 × 10⁻³ for WDR6, 1.1 × 10⁻⁵ for CTNNB1 and 0.39 for FLNB. b IGV browser coverage tracks of MePMe-seq data for the same sites from cells grown with PSH (purple) or methionine (gray) as control. Green bars indicate terminations identified by JACUSA2.
c Comparison with m6A-SAC-seq, GLORI, eTAM-seq, MeRIP, SEAL, miCLIP, m6A CLIP, PA-m6A-CLIP, improved (imp.) m6A-seq, m6A-REF-seq, DART-seq and m6A-label-seq datasets 64,81,82,91. Checkmark: site present; x: site not present in the dataset (data obtained from literature). Source data are provided as a Source Data file.
In summary, all five tested m6A sites in a DRACH motif were confirmed by SELECT. In addition, two of the three putative non-DRACH m6A sites detected by MePMe-seq (Fig. 4b) were confirmed by SELECT, indicating that MePMe-seq is one of the first single-nucleotide resolution methods able to detect m6A sites in non-NRACN motifs. Since SELECT relies on FTO, bias originating from the enzyme's substrate preference has to be considered. Therefore, non-DRACH sites reported by MePMe-seq may be true sites even when confirmation by SELECT is not possible.
Identification of METTL16-specific target sites by combined in vitro and metabolic modification
MePMe-seq relies on intracellular formation of the SAM analogue SeAdoYn and therefore detects m6A sites originating from different MTases. In vitro modification with a specific MTase, on the other hand, has the potential to modify precisely the target sites of this particular MTase. With this direct approach, modifications can be assigned to a specific MTase, provided that the target sites are not fully modified in cellular RNA. In mRNA, modifications are often sub-stoichiometric, which allows subsequent in vitro modification. While most m6A sites are METTL3-14-dependent, METTL16 is an emerging player in the RNA modification landscape of the human cell 94. METTL16 has been shown to bind a number of RNAs, including mRNAs and lncRNAs 9,94; however, methylation has only been confirmed for the six sites in MAT2A mRNA and for U6 snRNA 8.
To pinpoint METTL16-dependent m6A sites, we isolated poly(A)+ RNA from untreated HeLa cells and propargylated it in vitro using recombinantly produced METTL16 and SeAdoYn (Fig. 5a , Supplementary Fig. 17 ). The in vitro propargylated mRNA was then processed as described above to enrich biotinylated RNA and determine the modification sites via termination in NGS. Visual inspection of the few known METTL16 target sites, i.e. the hairpins in the 3′ UTR of MAT2A mRNA and the U6 snRNA, revealed sharp edges in the coverage profile precisely one nucleotide upstream of the targeted adenosine in all cases (Fig. 5b , Supplementary Fig. 15 , Supplementary data 4 ). These drops were found exclusively in the modified sample, not in a control sample, and matched sites found by metabolic labeling, confirming that these are METTL16-dependent target sites that are also installed in intact cells.

Fig. 5: METTL16-dependent propargylation. a Scheme illustrating METTL16-dependent labeling in combination with MePMe-seq to identify METTL16 target sites. Isolated mRNA is propargylated in vitro using METTL16 and analyzed by NGS. To eliminate false positive hits from in vitro off-target effects of METTL16, only hits also observed in MePMe-seq are classified as METTL16 targets. b IGV browser coverage tracks for MAT2A mRNA mapped by METTL16 labeling in vitro (cyan), MePMe-seq (purple) or control (gray). Arrow indicates orientation of coding strand. c Overlap of identified m6A sites in n = 2 independent METTL16-labeling experiments. d Consensus motif for sequences surrounding identified As after METTL16-dependent labeling in vitro. e Overlap of identified m6A sites in METTL16-dependent in vitro and metabolic labeling for sites present in n = 2 independent experiments with HS filtering.

Sequencing and evaluation yielded 9909 and 5812 putative METTL16 target sites in two independent replicates (Fig. 5c , Supplementary data 5 ). Of these sites, 4495 were found in both replicates (i.e.
77% of the sites in the smaller replicate), indicating good reproducibility. Within these hits, we inspected previously reported METTL16 interaction sites, such as STUB1, RBM3, MYC, NT5DC2, GNPTG, GMIP and MALAT1 9 , 95 , for which it is unclear whether they are also methylated by METTL16. Interestingly, we detected several of these sites in both replicates for MYC, RBM3, NT5DC2 and MALAT1 (Supplementary Fig. 19 , Supplementary Table 4 ), providing evidence that METTL16 is indeed able to modify them in vitro. We observed multiple METTL16-dependent sites in the cancer-associated MALAT1 lncRNA; however, A8290 was not methylated in vitro (Supplementary Fig. 20b ). This is of particular interest, as A8290 was shown to interact with METTL16 but could not be validated as a methylation target 94 , 96 . Analysis of the sequence motif adjacent to the m6A sites resulting from in vitro METTL16 labeling identified a TACAD motif (Fig. 5d ), which contains the reported METTL16 consensus motif TACA 85 .

The large number of METTL16 sites identified by in vitro labeling contrasts starkly with the small number of confirmed sites. In vitro modification of RNA has also been used in other methods 97 ; however, we wondered whether the non-natural conditions could lead to off-target modification by METTL16 in vitro. To identify METTL16 sites unambiguously, we therefore matched the data from in vitro METTL16 labeling with the data from metabolic labeling: hits identified by both approaches should be relevant METTL16 sites in cells. For this comparison, we used sites appearing in both replicates of in vitro METTL16 labeling (4495 hits) and of MePMe-seq (1651 hits), and found only four overlapping sites (Fig. 5e ). This suggests that a large fraction of the hits from in vitro METTL16 labeling results from off-target effects. The four shared hits are the previously reported METTL16 target sites in the 3′ UTR of MAT2A 85 .
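Both the replicate comparison and the assignment of METTL16 targets reduce to set operations on called sites. The Python sketch below is illustrative only: it assumes each called site can be represented by a hashable key (for example a hypothetical `(chromosome, strand, position)` tuple), that replicate overlap is expressed relative to the smaller replicate (which reproduces the 77% figure from 4495 shared sites out of 5812), and it uses placeholder IDs rather than real data. The helper name `assign_mtase_targets` is hypothetical.

```python
from typing import Hashable, Set


def overlap_fraction(rep1: Set[Hashable], rep2: Set[Hashable]) -> float:
    """Fraction of sites shared by both replicates, relative to the smaller one."""
    shared = rep1 & rep2
    return len(shared) / min(len(rep1), len(rep2))


def assign_mtase_targets(invitro_rep1: Set[Hashable], invitro_rep2: Set[Hashable],
                         metab_rep1: Set[Hashable], metab_rep2: Set[Hashable]) -> Set[Hashable]:
    """Hypothetical helper: keep sites reproducible in both in vitro replicates
    AND in both metabolic-labeling replicates."""
    reproducible_invitro = invitro_rep1 & invitro_rep2
    reproducible_metabolic = metab_rep1 & metab_rep2
    return reproducible_invitro & reproducible_metabolic


# Placeholder integer IDs sized like the METTL16 replicates in the text:
# 9909 sites in rep1, 5812 in rep2, 4495 shared.
rep1 = set(range(9909))
rep2 = set(range(9909 - 4495, 9909 - 4495 + 5812))
print(len(rep1 & rep2), round(100 * overlap_fraction(rep1, rep2)))  # prints: 4495 77

# Toy cross-method filter with made-up site IDs.
print(assign_mtase_targets({"s1", "s2", "s3"}, {"s1", "s2", "s4"},
                           {"s1", "s5"}, {"s1", "s5", "s6"}))  # prints: {'s1'}
```

Intersecting reproducible sets, rather than pooling replicates, trades sensitivity for specificity, which matches the intent of discarding in vitro off-target hits.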
Two of the reported METTL16 hits in mRNA escaped this assignment under the HS filtering conditions but were detected in either rep1 or rep2 of MePMe-seq (Supplementary data 3 ). A8290 in MALAT1, previously suspected to be a METTL16 target site, was called neither by in vitro METTL16 labeling nor by metabolic labeling (Supplementary Fig. 20b , Supplementary Table 4 ) 98 . However, several m6A sites close to this putative METTL16 target site in MALAT1 could be clearly assigned owing to the high precision of MePMe-seq. Based on the combined analysis of in vitro and metabolic labeling, we can now exclude A8290 in MALAT1 as a METTL16 target. In summary, the in vitro METTL16 modification data show that, when the enzyme is applied at high concentrations in vitro, both genuine targets and off-targets are detected within the consensus motif TACAD. As several of the METTL16-dependent in vitro sites coincide with the interaction sites identified by CRAC, METTL16 may bind these sites and, with SeAdoYn, modify them. We cannot exclude that additional proteins or RNAs act as cofactors that facilitate METTL16-dependent methylation in cells. We have shown that combining in vitro and metabolic labeling provides a reliable way to assign m6A sites to a specific MTase and to determine its target sites with single nucleotide precision.

m6A, Am and m1A can be distinguished by termination signatures

We used propargylation and click chemistry to efficiently enrich MTase target sites and to cause termination, which enables transcriptome-wide identification of m6A sites. However, Am and m1A can also cause termination. We therefore checked whether MePMe-seq hits for m6A (Fig. 6 ) also contain other modifications. To test whether our technique can distinguish between m6A and Am sites, we examined the termination signatures of prop6A and Aprop in vitro (Supplementary Figs. 5 , 6 ).
Indeed, termination at prop6A and Aprop results in distinct patterns, suggesting that these modified nucleosides can be distinguished.

Fig. 6: Termination signatures at different modified nucleosides after metabolic labeling. Two representative IGV coverage tracks of MePMe-seq data are shown for the RNA modifications (a) m6A, (b) m5C, (d) Am, (e) Um, (f) Gm, (g) Cm and (h) m1A. poly(A)+ RNA from HeLa cells labeled with PSH (purple) or methionine as control (gray) was used. Terminations identified by JACUSA2 are indicated by additional brackets. Numbers indicate reads and the calculated arrest rate (%) at that position. Arrow indicates orientation of coding strand. Box plots show bioinformatic analysis of termination signatures based on the coverage (Diff ctr-PSH) at positions –2, –1, 0, 1, 2 of frequently identified modification sites of (a) m6A in mRNA, (c) Ψ, (d) Am, (e) Um, (f) Gm and (g) Cm (Nm in rRNA). In the box plots, the center lines, medians, upper and lower quartiles, whiskers (1.5×) and outliers are shown. Source data are provided as a Source Data file.

The next step was to analyze our MePMe-seq data to separate m6A sites from Am sites on a transcriptome-wide scale. We examined termination signals near known Am sites in rRNA, making use of residual rRNA in our poly(A)+ RNA samples (Fig. 6 ). We observed a stepwise termination pattern at positions –1 and –2, which is obvious in the IGV coverage tracks for two known Am sites in the 18S rRNA (Fig. 6 ). Similarly, we used a test set of well-known, frequently identified m6A sites in mRNA (identified in ≥7 independent studies and by MePMe-seq) to determine the m6A termination signature, which showed strong and nearly exclusive termination at position –1 (Fig. 6 ). The stepwise versus precise termination signatures of Am and m6A sites obtained from transcriptome-wide analysis are consistent with the in vitro data (Supplementary Figs. 5 , 6 ).
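To make the "precise versus stepwise" distinction concrete, the following Python sketch summarizes per-site termination profiles at positions –2 to +2 by their per-position median (the center lines of the box plots) and applies a simple threshold rule. This is not the cluster analysis used in the study: the input values are made up, the profile is assumed to be a per-position termination signal such as the Diff ctr-PSH coverage difference, and the `step_ratio` threshold of 0.3 is a hypothetical choice.

```python
import statistics
from typing import Dict, List

POSITIONS = (-2, -1, 0, 1, 2)
Profile = Dict[int, float]  # position relative to the site -> termination signal


def median_profile(profiles: List[Profile]) -> Profile:
    """Per-position median across sites, i.e. the box-plot center lines."""
    return {p: statistics.median(prof[p] for prof in profiles) for p in POSITIONS}


def classify_signature(profile: Profile, step_ratio: float = 0.3) -> str:
    """Hypothetical rule, not the published clustering.

    'stepwise' if position -2 carries at least step_ratio of the -1 signal,
    'precise' if the signal is concentrated at -1, 'none' if -1 shows nothing.
    """
    if profile[-1] <= 0:
        return "none"
    if profile[-2] >= step_ratio * profile[-1]:
        return "stepwise"  # Am-like pattern
    return "precise"  # m6A-like pattern


# Made-up example profiles, not measured values.
m6a_like = {-2: 0.02, -1: 0.60, 0: 0.01, 1: 0.00, 2: 0.00}
am_like = {-2: 0.30, -1: 0.50, 0: 0.02, 1: 0.00, 2: 0.00}
print(classify_signature(m6a_like), classify_signature(am_like))  # prints: precise stepwise
```

A fixed ratio is the simplest possible decision rule; an unsupervised clustering of the full five-position profile, as described in the text, avoids choosing such a threshold by hand.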
JACUSA2 would classify both Am and m6A sites as hits (because of the –1 termination), but the more detailed signature analysis allows us to attribute the stepwise termination to Am. We performed a cluster analysis of the termination signals to identify probable Am sites in mRNA in our MePMe-seq data, which yielded two groups of termination signatures. Cluster 1 displayed the precise termination at –1 typical of m6A sites, whereas cluster 2 displayed a distinct pattern that did not correspond to the termination signature seen for Am sites (Supplementary Fig. 22 ). Most of the modified As detected in mRNA by MePMe-seq do in fact originate from m6A sites, as evidenced by the dominance of cluster 1, which contains >98% of sites in both replicates (Supplementary Figs. 22 , 23 ). The putative m6A site in FLNB RNA was also assigned to cluster 1. Nearly all of the remaining <2% of sites (cluster 2) lie close to a TSS (≤5 nt) and were removed during filtering in our MePMe-seq analysis. The remaining five sites (<0.2%) could be false positives caused by alternative TSSs. Thus, the additional pattern analysis indicates that MePMe-seq identifies m6A sites in mRNA, and we found no evidence for Am sites in mRNA. Next, we asked whether MePMe-seq would also identify m1A sites. Position N1, however, lies on the Watson–Crick face, and its methylation impedes polymerases 99 , 100 . Inspection of known m1A sites in rRNA, such as the conserved m1A1322 in human 28S rRNA 101 , 102 , confirmed that the IGV coverage tracks for methionine-fed controls exhibit strong termination at m1A sites (Fig. 6h ). RNA from PSH-labeled cells shows the same termination. As a result, JACUSA2, which examines the difference in arrest rate between PSH-labeled cells and controls, does not classify these sites as hits. Therefore, m1A sites are absent from MePMe-seq hits for modified adenosines.
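The reason m1A escapes detection can be illustrated with a difference in arrest rates. The Python sketch below assumes the common definition of arrest rate as reads stopping at a position divided by reads stopping there plus reads reading through; it does not reproduce JACUSA2's actual statistical model, and all read counts are invented. The 5 nt window in `near_tss` follows the TSS filtering criterion stated above, while the function names are hypothetical.

```python
from typing import Iterable, Tuple

Counts = Tuple[int, int]  # (reads stopping at the position, reads reading through)


def arrest_rate(counts: Counts) -> float:
    """Assumed definition: stops / (stops + read-through); 0.0 without coverage."""
    stops, readthrough = counts
    total = stops + readthrough
    return stops / total if total else 0.0


def arrest_difference(psh: Counts, control: Counts) -> float:
    """PSH-labeled minus methionine-control arrest rate at one position."""
    return arrest_rate(psh) - arrest_rate(control)


def near_tss(position: int, tss_positions: Iterable[int], max_dist: int = 5) -> bool:
    """True if a site lies within max_dist nt of any annotated TSS."""
    return any(abs(position - tss) <= max_dist for tss in tss_positions)


# Invented counts: an m1A-like site stops reverse transcription in both samples,
# whereas an m6A-like site stops it mainly after propargylation.
m1a_diff = arrest_difference(psh=(80, 20), control=(78, 22))
m6a_diff = arrest_difference(psh=(40, 60), control=(2, 98))
print(round(m1a_diff, 2), round(m6a_diff, 2))  # prints: 0.02 0.38
print(near_tss(103, [100]), near_tss(110, [100]))  # prints: True False
```

Because the m1A-like site terminates in both samples, its difference stays near zero even though its absolute arrest rate is high, which is why a PSH-versus-control comparison does not call it.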
The reports on m1A mapping remain controversial and reveal the limitations of antibody-based approaches 68 . Although JACUSA2 will not identify m1A sites as hits on a transcriptome-wide level, putative m1A sites can be validated individually by inspecting the IGV coverage tracks. We examined a putative m1A site in MALAT1 103 , 9 internal m1A sites in cytosolic mRNA, and 12 mitochondrial RNAs from previous reports 103 , 104 . The m1A sites in MALAT1 and one in the mitochondrial 16S rRNA were verified by visual inspection (Fig. 6h ). However, none of the reported m1A sites in mRNA or in the additional mitochondrial RNAs showed the termination signature in both PSH-labeled and control samples that would be expected for m1A sites (Supplementary Fig. 24 ). These findings indicate that most previously reported m1A sites are not detectable by termination, possibly because they do not exist or are modified at very low stoichiometry. As a technique that does not rely on antibodies, MePMe-seq can help validate probable m1A sites. Importantly, MePMe-seq will not report false positive m6A hits that originate from m1A.

Analysis of additional methylation sites

We then extended the analysis of termination signatures to all 2′-O-methylated nucleosides in rRNA (Fig. 6 ). A stepwise termination pattern was visible at all Nm sites. Um sites, like Am sites, produced very pronounced termination, whereas Gm and Cm sites resulted in weaker termination. As a control, we analyzed pseudouridine sites in the same rRNAs (Fig. 6c ). These showed no discernible profile in the median termination difference between PSH-treated samples and controls, confirming that the termination signatures at Nm sites are associated with ribose propargylation. The termination signatures were independently confirmed via primer extension assays using short RNAs carrying the corresponding modification at one specific position (Supplementary Fig. 6 ).
As an additional control, we compared our results with a previous report on Nm in human mRNA from NGS data at single nucleotide resolution 65 . We found five matching sites (Supplementary Table 7 ). On closer inspection, these were located in two members of an rDNA gene family, indicating that they are misaligned rRNA fragments rather than mRNA (Supplementary Table 7 ). Overall, we found no evidence that MePMe-seq detects internal Nm sites in mRNA.

MePMe-seq identifies m5C sites in mRNA

Next, we examined the second most frequent termination in our samples: cytidines. Of the terminations detected by MePMe-seq, 16% occurred one nucleotide upstream of cytidines (Fig. 3a ). In total, 1276 sites were detected: 875 in rep1, 726 in rep2, and 325 (44%) in both replicates (Fig. 7a ). These findings imply that metabolic labeling causes cytidines in mRNA to become propargylated, which can then be detected by MePMe-seq.

Fig. 7: Detection of m5C sites using MePMe-seq. a Overlap of identified m5C sites between n = 2 MePMe-seq experiments (44% calculated from rep2 to rep1). b IGV browser coverage tracks of MePMe-seq data for the indicated FURIN and PXDN mRNAs from cells labeled with PSH (purple) or methionine as control (gray). Blue and green bars indicate terminations at C and A, respectively, identified by JACUSA2. Numbers (%) denote the calculated arrest rate at that position. Arrow indicates orientation of coding strand. c Frequency of m5C sites per transcript and replicate. d Metatranscript analysis showing a density plot of the distribution of m5C sites detected by MePMe-seq. e Consensus motif for sequences surrounding identified m5C sites. Representative example of n = 2 biologically independent samples.
f Overlap of m5C sites identified in MePMe-seq per replicate (combined sites from n = 2 independent experiments with HS filtering) with sites identified by bisulfite sequencing (BS-seq), improved BS-seq, Aza-IP and miCLIP 81 . g Overlap of m5C sites identified in MePMe-seq (combined sites from n = 2 independent experiments) with all m5C sites from the ATLAS database 81 as the uncertainty region around each site is increased.

Contributions

K.H., A.O., and A.R. conceived the project. K.H. designed, optimized, and performed MePMe-seq. A.O. designed and performed in vitro labeling with METTL16. K.H. and N.A.K. prepared sequencing libraries. N.A.K. and K.H. designed, performed, and analyzed SELECT experiments. P.S., A.B., and N.V.C. designed and performed chemical syntheses of modified nucleosides. A.O., N.A.K., and N.V.C. performed chemical syntheses of PSH and SeAdoYn. P.S. developed, optimized and performed LC-QqQ-MS measurements with contributions from K.H., N.V.C., and N.A.K. S.H. performed cell culture experiments. C.D., A.O., and K.H. analyzed and evaluated NGS data and performed statistical analyses. C.D. performed JACUSA2, signature, and cluster analysis. A.R. supervised the project and contributed to the design of experiments. All authors discussed the results. A.R., K.H., and N.A.K. wrote the manuscript with contributions from all coauthors. All authors read and approved the final manuscript.