Abstract

Long-read single-molecule sequencing technologies, such as PacBio circular consensus sequencing (CCS) and nanopore sequencing, are advantageous for detecting DNA 5-methylcytosine in CpGs (5mCpGs), especially in repetitive genomic regions. However, existing methods for detecting 5mCpGs using PacBio CCS have limited accuracy and robustness. Here, we present ccsmeth, a deep-learning method to detect DNA 5mCpGs using CCS reads. To train ccsmeth, we sequence polymerase chain reaction (PCR)-treated and M.SssI methyltransferase-treated DNA of one human sample using PacBio CCS. Using long (≥10 kb) CCS reads, ccsmeth achieves 0.90 accuracy and 0.97 area under the curve (AUC) on 5mCpG detection at single-molecule resolution. At the genome-wide site level, ccsmeth achieves correlations of >0.90 with bisulfite sequencing and nanopore sequencing using only 10× read coverage. Furthermore, we develop a Nextflow pipeline, ccsmethphase, to detect haplotype-aware methylation using CCS reads, and sequence a Chinese family trio to validate it. ccsmeth and ccsmethphase can serve as robust and accurate tools for detecting DNA 5-methylcytosines.
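The site-level comparison mentioned in the abstract can be illustrated with a small sketch. It is not ccsmeth's implementation: the function names, the `(chrom, pos, is_methylated)` call format, the `min_coverage` parameter, and the toy values are all assumptions made for illustration. It aggregates per-read methylation calls into per-site methylation frequencies and computes a Pearson correlation against reference frequencies, such as those one might obtain from bisulfite sequencing.

```python
from collections import defaultdict
from math import sqrt


def site_frequencies(read_calls, min_coverage=1):
    """Aggregate per-read calls into per-site methylation frequencies.

    read_calls: iterable of (chrom, pos, is_methylated) tuples (hypothetical format).
    Sites covered by fewer than min_coverage reads are dropped.
    """
    counts = defaultdict(lambda: [0, 0])  # site -> [methylated reads, total reads]
    for chrom, pos, is_methylated in read_calls:
        entry = counts[(chrom, pos)]
        entry[0] += 1 if is_methylated else 0
        entry[1] += 1
    return {site: m / n for site, (m, n) in counts.items() if n >= min_coverage}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def correlate_with_reference(read_calls, reference, min_coverage=1):
    """Correlate site frequencies with reference frequencies on shared sites."""
    freqs = site_frequencies(read_calls, min_coverage)
    shared = sorted(set(freqs) & set(reference))
    return pearson([freqs[s] for s in shared], [reference[s] for s in shared])


# Toy data (made up): three CpG sites with per-read calls and reference frequencies.
calls = [
    ("chr1", 100, True), ("chr1", 100, True), ("chr1", 100, False),
    ("chr1", 200, False), ("chr1", 200, False),
    ("chr1", 300, True), ("chr1", 300, True),
]
reference = {("chr1", 100): 0.7, ("chr1", 200): 0.1, ("chr1", 300): 0.9}
r = correlate_with_reference(calls, reference)
```

In practice a minimum coverage filter matters, because frequencies estimated from one or two reads are noisy and tend to lower the correlation.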

Sources
Source 1 (Web): https://www.nature.com/articles/s41467-023-39784-9
Source 2 (PMC): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10329642
Source 3 (DOI): http://dx.doi.org/10.1038/s41467-023-39784-9
