A Survey of Genomic Traces Reveals a Common Sequencing Error, RNA Editing, and DNA Editing

Bibliographic Details
Main authors: Zaranek, Alexander Wait; Levanon, Erez Y.; Zecharia, Tomer; Clegg, Tom; Church, George M.
Format: Text
Language: English
Published: Public Library of Science, 2010
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2873906/
https://www.ncbi.nlm.nih.gov/pubmed/20531933
http://dx.doi.org/10.1371/journal.pgen.1000954
Collection: PubMed
Abstract: While it is widely held that an organism's genomic information should remain constant, several protein families are known to modify it. Members of the AID/APOBEC protein family can deaminate DNA. Similarly, members of the ADAR family can deaminate RNA. Characterizing the scope of these events is challenging. Here we use large genomic data sets, such as the two billion sequences in the NCBI Trace Archive, to look for clusters of mismatches of the same type, which are a hallmark of editing events caused by APOBEC3 and ADAR. We align 603,249,815 traces from the NCBI Trace Archive to their reference genomes. In clusters of mismatches of increasing size, at least one systematic sequencing error (G-to-A) dominates the results. It persists even among mismatches called with 99% accuracy and vanishes only at 99.99% accuracy or higher. The error appears to have entered about 1% of the HapMap, possibly affecting others who rely on this resource. Further investigation, using stringent quality thresholds, uncovers thousands of mismatch clusters with no apparent defects in their chromatograms. These traces provide the first reported candidates for endogenous DNA editing in human, further elucidate RNA editing in human and mouse, and reveal, for the first time, extensive RNA editing in Xenopus tropicalis. We show that the NCBI Trace Archive is a valuable resource for investigating DNA and RNA editing and sets the stage for a comprehensive mapping of editing events in large-scale genomic datasets.
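The core search the abstract describes (keep high-quality mismatches between an aligned trace and its reference, then look for clusters of mismatches of the same type, such as G-to-A) can be illustrated with a small sketch. It is not the authors' pipeline: it assumes a gapless alignment, and the function names and the parameters `min_q`, `max_gap` and `min_size` are hypothetical choices for illustration. The accuracy thresholds in the abstract map onto Phred scores via Q = -10 log10(P_error), so 99% accuracy is Q20 and 99.99% is Q40.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


def phred_to_accuracy(q: float) -> float:
    """Return the base-call accuracy implied by a Phred score q = -10*log10(P_error)."""
    return 1.0 - 10 ** (-q / 10)


@dataclass
class Cluster:
    mismatch_type: str    # e.g. "G>A": reference base G, trace base A
    positions: List[int]  # 0-based alignment columns, ascending


def find_mismatch_clusters(ref: str, read: str, quals: List[int],
                           min_q: int = 20, max_gap: int = 50,
                           min_size: int = 3) -> List[Cluster]:
    """Find runs of same-type, high-quality mismatches in a gapless alignment.

    Hypothetical parameters: a mismatch counts only if its Phred score is at
    least min_q; consecutive same-type mismatches at most max_gap columns
    apart join one run; runs with at least min_size mismatches are reported.
    """
    if not (len(ref) == len(read) == len(quals)):
        raise ValueError("ref, read and quals must have equal length")

    # Bucket high-quality mismatch positions by type; skip ambiguous N calls.
    by_type: Dict[str, List[int]] = defaultdict(list)
    for i, (r, b, q) in enumerate(zip(ref.upper(), read.upper(), quals)):
        if r != b and q >= min_q and "N" not in (r, b):
            by_type[f"{r}>{b}"].append(i)

    clusters: List[Cluster] = []
    for mtype, positions in by_type.items():
        run = [positions[0]]
        for pos in positions[1:]:
            if pos - run[-1] <= max_gap:
                run.append(pos)  # close enough: extend the current run
            else:
                if len(run) >= min_size:
                    clusters.append(Cluster(mtype, run))
                run = [pos]      # gap too large: start a new run
        if len(run) >= min_size:
            clusters.append(Cluster(mtype, run))
    return clusters


if __name__ == "__main__":
    ref = "ACGGTAGGCTGA"
    read = "ACAGTAAGCTAA"
    quals = [30] * len(ref)
    print(find_mismatch_clusters(ref, read, quals))
    # [Cluster(mismatch_type='G>A', positions=[2, 6, 10])]
    print(find_mismatch_clusters(ref, read, quals, min_q=40))
    # []
```

In this toy input every base has quality 30, so raising `min_q` from Q20 to Q40 discards the whole G-to-A cluster. On real traces the abstract reports that the systematic G-to-A error survives at 99% accuracy and disappears only at 99.99% or higher, which is why a cluster found by a search like this is a candidate for editing rather than proof of it.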
Record ID: pubmed-2873906
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: PLoS Genet
Publication date: 2010-05-20
Copyright: Zaranek et al.
License: http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
Topic: Research Article