
Identifying micro-inversions using high-throughput sequencing reads


Bibliographic Details
Main authors: He, Feifei; Li, Yang; Tang, Yu-Hang; Ma, Jian; Zhu, Huaiqiu
Format: Online article, text
Language: English
Published: BioMed Central, 2016
Subjects: Proceedings
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4895285/
https://www.ncbi.nlm.nih.gov/pubmed/26818118
http://dx.doi.org/10.1186/s12864-015-2305-7
Collection: PubMed
Abstract:
BACKGROUND: Identifying inversions of DNA segments shorter than the read length (e.g., 100 bp), defined as micro-inversions (MIs), remains challenging with next-generation sequencing reads. MIs are recognized as an important class of genomic variation and may play roles in causing genetic disease. However, current alignment methods are generally insensitive to MIs. Here we develop a novel tool, MID (Micro-Inversion Detector), to identify MIs in human genomes using next-generation sequencing reads.

RESULTS: MID's algorithm is based on a dynamic programming path-finding approach. Unlike other variant detection tools, MID can handle small MIs and multiple breakpoints within an unmapped read. Moreover, MID improves reliability on low-coverage data by integrating multiple samples. Our evaluation demonstrated that MID outperforms Gustaf, which can currently detect inversions from 30 bp to 500 bp.

CONCLUSIONS: To our knowledge, MID is the first method that can efficiently and reliably identify MIs from unmapped short next-generation sequencing reads. Because MID is reliable on low-coverage data, it is suitable for large-scale projects such as the 1000 Genomes Project (1KGP). MID identified previously unknown MIs from the 1KGP that overlap with genes and regulatory elements in the human genome. We also identified MIs in cancer cell lines from the Cancer Cell Line Encyclopedia (CCLE). We therefore expect our tool to be useful for studying MIs as a type of genetic variant in the human genome. The source code can be downloaded from: http://cqb.pku.edu.cn/ZhuLab/MID.

ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-015-2305-7) contains supplementary material, which is available to authorized users.
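The abstract describes MID's core only as a dynamic programming path-finding approach that can place multiple breakpoints within one unmapped read. The Python sketch below illustrates that general idea under stated assumptions; it is not MID's algorithm, and the names `split_align` and `revcomp`, the scores (match 2, mismatch -3, breakpoint 10), and the toy sequences are all hypothetical. Each read base is aligned without gaps either to the forward reference strand (reference index increasing) or to its complement (reference index decreasing), and any move to a new position or strand costs a fixed breakpoint penalty, so an inverted segment shows up as a '-' block between two '+' blocks.

```python
# Hypothetical sketch, NOT the MID algorithm: an ungapped dynamic-programming
# path over (read position, strand, reference position) that may switch strand
# or position at a fixed breakpoint penalty. Scores and names are assumptions.
from typing import List, Optional, Tuple

_COMP = str.maketrans("ACGT", "TGCA")
# (read_start, read_end, ref_lo, ref_hi, strand); ends are exclusive
Segment = Tuple[int, int, int, int, str]


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMP)[::-1]


def split_align(read: str, ref: str, match: int = 2, mismatch: int = -3,
                jump: int = 10) -> Tuple[int, List[Segment]]:
    """Best-scoring ungapped path of `read` through `ref` with breakpoints.

    Strand 0 ('+'): read[k] vs ref[r], and r increases along the read.
    Strand 1 ('-'): read[k] vs complement(ref[r]), and r decreases.
    Any other move to a new (strand, r) is a breakpoint costing `jump`.
    """
    n, m = len(read), len(ref)
    comp_ref = ref.translate(_COMP)
    step = (1, -1)  # reference step per read base on strand 0 and strand 1
    neg = float("-inf")

    def sub(k: int, s: int, r: int) -> int:
        base = ref[r] if s == 0 else comp_ref[r]
        return match if read[k] == base else mismatch

    score = [[[neg] * m for _ in range(2)] for _ in range(n)]
    back: List[List[List[Optional[Tuple[int, int]]]]] = [
        [[None] * m for _ in range(2)] for _ in range(n)]
    for s in range(2):
        for r in range(m):
            score[0][s][r] = sub(0, s, r)  # the path may start anywhere

    for k in range(1, n):
        # The best state at k-1 is the source of every breakpoint jump.
        best, best_state = max((score[k - 1][s][r], (s, r))
                               for s in range(2) for r in range(m))
        for s in range(2):
            for r in range(m):
                pr = r - step[s]  # same-strand predecessor on the reference
                cont = score[k - 1][s][pr] if 0 <= pr < m else neg
                if cont >= best - jump:  # continuing is free; prefer it on ties
                    score[k][s][r] = cont + sub(k, s, r)
                    back[k][s][r] = (s, pr)
                else:
                    score[k][s][r] = best - jump + sub(k, s, r)
                    back[k][s][r] = best_state

    # Trace back from the best final state.
    total, state = max((score[n - 1][s][r], (s, r))
                       for s in range(2) for r in range(m))
    path = [state]
    for k in range(n - 1, 0, -1):
        state = back[k][state[0]][state[1]]
        path.append(state)
    path.reverse()

    # Split the path wherever it is not a same-strand continuation.
    segments: List[Segment] = []
    start = 0
    for k in range(1, n + 1):
        prev_s, prev_r = path[k - 1]
        if k == n or path[k] != (prev_s, prev_r + step[prev_s]):
            s0, r_first = path[start]
            lo, hi = (r_first, prev_r + 1) if s0 == 0 else (prev_r, r_first + 1)
            segments.append((start, k, lo, hi, "+-"[s0]))
            start = k
    return total, segments


if __name__ == "__main__":
    ref = "CCACAACCACACCAAAACACCCCACCAACCAC"  # toy reference (hypothetical)
    # Toy read: ref[2:12], then ref[12:19] inverted, then ref[19:30].
    read = ref[2:12] + revcomp(ref[12:19]) + ref[19:30]
    print(split_align(read, ref))
```

On the toy read, the script prints `(36, [(0, 10, 2, 12, '+'), (10, 17, 12, 19, '-'), (17, 28, 19, 30, '+')])`: the '-' block recovers the inverted interval ref[12:19] and its two flanking breakpoints. Because every jump starts from the single best state at the previous read position, each read base costs O(m) work for a reference window of length m. A practical tool would also need gapped alignment, an error model, and a way to choose the candidate reference window, none of which is shown here.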
Record ID: pubmed-4895285
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: BMC Genomics (Proceedings)
Publication date: 2016-01-11
Copyright: © He et al. 2015. Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.