Detecting Alu insertions from high-throughput sequencing data
High-throughput sequencing technologies have allowed for the cataloguing of variation in personal human genomes. In this manuscript, we present alu-detect, a tool that combines read-pair and split-read information to detect novel Alus and their precise breakpoints directly from either whole-genome o...
Main authors: | David, Matei; Mustafa, Harun; Brudno, Michael |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2013 |
Subjects: | Methods Online |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3783187/ https://www.ncbi.nlm.nih.gov/pubmed/23921633 http://dx.doi.org/10.1093/nar/gkt612 |
_version_ | 1782285639426768896 |
---|---|
author | David, Matei; Mustafa, Harun; Brudno, Michael |
author_sort | David, Matei |
collection | PubMed |
description | High-throughput sequencing technologies have allowed for the cataloguing of variation in personal human genomes. In this manuscript, we present alu-detect, a tool that combines read-pair and split-read information to detect novel Alus and their precise breakpoints directly from either whole-genome or whole-exome sequencing data while also identifying insertions directly in the vicinity of existing Alus. To set the parameters of our method, we use simulation of a faux reference, which allows us to compute the precision and recall of various parameter settings using real sequencing data. Applying our method to 100 bp paired Illumina data from seven individuals, including two trios, we detected on average 1519 novel Alus per sample. Based on the faux-reference simulation, we estimate that our method has 97% precision and 85% recall. We identify 808 novel Alus not previously described in other studies. We also demonstrate the use of alu-detect to study the local sequence and global location preferences for novel Alu insertions. |
format | Online Article Text |
id | pubmed-3783187 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2013 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-37831872013-09-30 Detecting Alu insertions from high-throughput sequencing data David, Matei Mustafa, Harun Brudno, Michael Nucleic Acids Res Methods Online High-throughput sequencing technologies have allowed for the cataloguing of variation in personal human genomes. In this manuscript, we present alu-detect, a tool that combines read-pair and split-read information to detect novel Alus and their precise breakpoints directly from either whole-genome or whole-exome sequencing data while also identifying insertions directly in the vicinity of existing Alus. To set the parameters of our method, we use simulation of a faux reference, which allows us to compute the precision and recall of various parameter settings using real sequencing data. Applying our method to 100 bp paired Illumina data from seven individuals, including two trios, we detected on average 1519 novel Alus per sample. Based on the faux-reference simulation, we estimate that our method has 97% precision and 85% recall. We identify 808 novel Alus not previously described in other studies. We also demonstrate the use of alu-detect to study the local sequence and global location preferences for novel Alu insertions. Oxford University Press 2013-09 2013-08-05 /pmc/articles/PMC3783187/ /pubmed/23921633 http://dx.doi.org/10.1093/nar/gkt612 Text en © The Author(s) 2013. Published by Oxford University Press. http://creativecommons.org/licenses/by/3.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Methods Online David, Matei Mustafa, Harun Brudno, Michael Detecting Alu insertions from high-throughput sequencing data |
title | Detecting Alu insertions from high-throughput sequencing data |
title_sort | detecting alu insertions from high-throughput sequencing data |
topic | Methods Online |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3783187/ https://www.ncbi.nlm.nih.gov/pubmed/23921633 http://dx.doi.org/10.1093/nar/gkt612 |