Assembly of non-unique insertion content using next-generation sequencing
Recent studies in genomics have highlighted the significance of sequence insertions in determining individual variation. Efforts to discover the content of these sequence insertions have been limited to short insertions and long unique insertions. Much of the inserted sequence in the typical human g...
Main authors: | Parrish, Nathaniel; Hormozdiari, Farhad; Eskin, Eleazar |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2011 |
Subjects: | Proceedings |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3194191/ https://www.ncbi.nlm.nih.gov/pubmed/21989261 http://dx.doi.org/10.1186/1471-2105-12-S6-S3 |
_version_ | 1782213927010041856 |
---|---|
author | Parrish, Nathaniel Hormozdiari, Farhad Eskin, Eleazar |
author_facet | Parrish, Nathaniel Hormozdiari, Farhad Eskin, Eleazar |
author_sort | Parrish, Nathaniel |
collection | PubMed |
description | Recent studies in genomics have highlighted the significance of sequence insertions in determining individual variation. Efforts to discover the content of these sequence insertions have been limited to short insertions and long unique insertions. Much of the inserted sequence in the typical human genome, however, is a mixture of repeated and unique sequence. Current methods are designed to assemble only unique sequence insertions, using reads that do not map to the reference. These methods are not able to assemble repeated sequence insertions, as the reads will map to the reference in a different locus. In this paper, we present a computational method for discovering the content of sequence insertions that are unique, repeated, or a combination of the two. Our method analyzes the read mappings and depth of coverage of paired-end reads to identify reads that originated from inserted sequence. We demonstrate the process of assembling these reads to characterize the insertion content. Our method is based on the idea of segment extension, which progressively extends segments of known content using paired-end reads. We apply our method in simulation to discover the content of inserted sequences in a modified mouse chromosome and show that our method produces reliable results at 40x coverage. |
format | Online Article Text |
id | pubmed-3194191 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2011 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-31941912011-10-17 Assembly of non-unique insertion content using next-generation sequencing Parrish, Nathaniel Hormozdiari, Farhad Eskin, Eleazar BMC Bioinformatics Proceedings Recent studies in genomics have highlighted the significance of sequence insertions in determining individual variation. Efforts to discover the content of these sequence insertions have been limited to short insertions and long unique insertions. Much of the inserted sequence in the typical human genome, however, is a mixture of repeated and unique sequence. Current methods are designed to assemble only unique sequence insertions, using reads that do not map to the reference. These methods are not able to assemble repeated sequence insertions, as the reads will map to the reference in a different locus. In this paper, we present a computational method for discovering the content of sequence insertions that are unique, repeated, or a combination of the two. Our method analyzes the read mappings and depth of coverage of paired-end reads to identify reads that originated from inserted sequence. We demonstrate the process of assembling these reads to characterize the insertion content. Our method is based on the idea of segment extension, which progressively extends segments of known content using paired-end reads. We apply our method in simulation to discover the content of inserted sequences in a modified mouse chromosome and show that our method produces reliable results at 40x coverage. BioMed Central 2011-07-28 /pmc/articles/PMC3194191/ /pubmed/21989261 http://dx.doi.org/10.1186/1471-2105-12-S6-S3 Text en Copyright ©2011 Parrish et al; licensee BioMed Central Ltd. 
http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Proceedings Parrish, Nathaniel Hormozdiari, Farhad Eskin, Eleazar Assembly of non-unique insertion content using next-generation sequencing |
title | Assembly of non-unique insertion content using next-generation sequencing |
title_full | Assembly of non-unique insertion content using next-generation sequencing |
title_fullStr | Assembly of non-unique insertion content using next-generation sequencing |
title_full_unstemmed | Assembly of non-unique insertion content using next-generation sequencing |
title_short | Assembly of non-unique insertion content using next-generation sequencing |
title_sort | assembly of non-unique insertion content using next-generation sequencing |
topic | Proceedings |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3194191/ https://www.ncbi.nlm.nih.gov/pubmed/21989261 http://dx.doi.org/10.1186/1471-2105-12-S6-S3 |
work_keys_str_mv | AT parrishnathaniel assemblyofnonuniqueinsertioncontentusingnextgenerationsequencing AT hormozdiarifarhad assemblyofnonuniqueinsertioncontentusingnextgenerationsequencing AT eskineleazar assemblyofnonuniqueinsertioncontentusingnextgenerationsequencing |
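
The description field of this record mentions segment extension, in which a segment of known content is progressively extended using reads. As a rough illustration only, the sketch below shows a simplified greedy, overlap-based extension on single-end toy reads. It is not the authors' algorithm: it ignores paired-end constraints and depth of coverage, and the function name `extend_segment`, its parameters, and the toy sequence are all hypothetical.

```python
from collections import Counter


def extend_segment(seed, reads, min_overlap=4, max_steps=10_000):
    """Greedily extend `seed` to the right, one base per step.

    A read votes for the next base when a prefix of the read (at least
    `min_overlap` long) matches the current end of the segment.
    Extension stops when no read overlaps or the vote is tied.
    """
    segment = seed
    for _ in range(max_steps):
        votes = Counter()
        for read in reads:
            # Longest usable overlap leaves at least one base to vote with.
            longest = min(len(read) - 1, len(segment))
            for k in range(longest, min_overlap - 1, -1):
                if segment.endswith(read[:k]):
                    votes[read[k]] += 1  # base just past the overlap
                    break
        if not votes:
            break  # nothing overlaps the segment end any more
        ranked = votes.most_common(2)
        if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
            break  # ambiguous next base, e.g. at a repeat boundary
        segment += ranked[0][0]
    return segment


# Toy data (made up): error-free 8 bp reads tiling a 16 bp insertion.
target = "ACGTACCGGTTAGCAT"
reads = [target[i:i + 8] for i in range(len(target) - 7)]
extended = extend_segment("ACGTAC", reads)
```

Stopping on a tied vote is a deliberately conservative choice for this sketch: where two reads disagree about the next base, as can happen at the edge of a repeated sequence, the extension halts rather than guessing.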