
Assembly of non-unique insertion content using next-generation sequencing

Recent studies in genomics have highlighted the significance of sequence insertions in determining individual variation. Efforts to discover the content of these sequence insertions have been limited to short insertions and long unique insertions. Much of the inserted sequence in the typical human g...


Bibliographic Details
Main Authors: Parrish, Nathaniel; Hormozdiari, Farhad; Eskin, Eleazar
Format: Online Article Text
Language: English
Published: BioMed Central 2011
Subjects: Proceedings
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3194191/
https://www.ncbi.nlm.nih.gov/pubmed/21989261
http://dx.doi.org/10.1186/1471-2105-12-S6-S3
author Parrish, Nathaniel
Hormozdiari, Farhad
Eskin, Eleazar
collection PubMed
description Recent studies in genomics have highlighted the significance of sequence insertions in determining individual variation. Efforts to discover the content of these sequence insertions have been limited to short insertions and long unique insertions. Much of the inserted sequence in the typical human genome, however, is a mixture of repeated and unique sequence. Current methods are designed to assemble only unique sequence insertions, using reads that do not map to the reference. These methods are not able to assemble repeated sequence insertions, as the reads will map to the reference in a different locus. In this paper, we present a computational method for discovering the content of sequence insertions that are unique, repeated, or a combination of the two. Our method analyzes the read mappings and depth of coverage of paired-end reads to identify reads that originated from inserted sequence. We demonstrate the process of assembling these reads to characterize the insertion content. Our method is based on the idea of segment extension, which progressively extends segments of known content using paired-end reads. We apply our method in simulation to discover the content of inserted sequences in a modified mouse chromosome and show that our method produces reliable results at 40x coverage.
format Online
Article
Text
id pubmed-3194191
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3194191 2011-10-17 Assembly of non-unique insertion content using next-generation sequencing Parrish, Nathaniel Hormozdiari, Farhad Eskin, Eleazar BMC Bioinformatics Proceedings BioMed Central 2011-07-28 /pmc/articles/PMC3194191/ /pubmed/21989261 http://dx.doi.org/10.1186/1471-2105-12-S6-S3 Text en Copyright ©2011 Parrish et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Assembly of non-unique insertion content using next-generation sequencing
topic Proceedings
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3194191/
https://www.ncbi.nlm.nih.gov/pubmed/21989261
http://dx.doi.org/10.1186/1471-2105-12-S6-S3