
Detecting Alu insertions from high-throughput sequencing data

High-throughput sequencing technologies have allowed for the cataloguing of variation in personal human genomes. In this manuscript, we present alu-detect, a tool that combines read-pair and split-read information to detect novel Alus and their precise breakpoints directly from either whole-genome o...


Bibliographic Details
Main authors: David, Matei, Mustafa, Harun, Brudno, Michael
Format: Online Article Text
Language: English
Published: Oxford University Press 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3783187/
https://www.ncbi.nlm.nih.gov/pubmed/23921633
http://dx.doi.org/10.1093/nar/gkt612
author David, Matei
Mustafa, Harun
Brudno, Michael
collection PubMed
description High-throughput sequencing technologies have allowed for the cataloguing of variation in personal human genomes. In this manuscript, we present alu-detect, a tool that combines read-pair and split-read information to detect novel Alus and their precise breakpoints directly from either whole-genome or whole-exome sequencing data while also identifying insertions directly in the vicinity of existing Alus. To set the parameters of our method, we use simulation of a faux reference, which allows us to compute the precision and recall of various parameter settings using real sequencing data. Applying our method to 100 bp paired Illumina data from seven individuals, including two trios, we detected on average 1519 novel Alus per sample. Based on the faux-reference simulation, we estimate that our method has 97% precision and 85% recall. We identify 808 novel Alus not previously described in other studies. We also demonstrate the use of alu-detect to study the local sequence and global location preferences for novel Alu insertions.
format Online
Article
Text
id pubmed-3783187
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-3783187 2013-09-30 Detecting Alu insertions from high-throughput sequencing data David, Matei Mustafa, Harun Brudno, Michael Nucleic Acids Res Methods Online Oxford University Press 2013-09 2013-08-05 /pmc/articles/PMC3783187/ /pubmed/23921633 http://dx.doi.org/10.1093/nar/gkt612 Text en © The Author(s) 2013. Published by Oxford University Press. http://creativecommons.org/licenses/by/3.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Detecting Alu insertions from high-throughput sequencing data
topic Methods Online
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3783187/
https://www.ncbi.nlm.nih.gov/pubmed/23921633
http://dx.doi.org/10.1093/nar/gkt612