MATCHCLIP: locate precise breakpoints for copy number variation using CIGAR string by matching soft clipped reads

Copy number variations (CNVs) are associated with many complex diseases. Next generation sequencing data enable one to identify precise CNV breakpoints to better understand the underlying molecular mechanisms and to design more efficient assays. Using the CIGAR strings of the reads, we develop a method t...

Bibliographic Details
Main authors: Wu, Yinghua; Tian, Lifeng; Pirastu, Mario; Stambolian, Dwight; Li, Hongzhe
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3744852/
https://www.ncbi.nlm.nih.gov/pubmed/23967014
http://dx.doi.org/10.3389/fgene.2013.00157
_version_ 1782280654789017600
author Wu, Yinghua
Tian, Lifeng
Pirastu, Mario
Stambolian, Dwight
Li, Hongzhe
author_facet Wu, Yinghua
Tian, Lifeng
Pirastu, Mario
Stambolian, Dwight
Li, Hongzhe
author_sort Wu, Yinghua
collection PubMed
description Copy number variations (CNVs) are associated with many complex diseases. Next generation sequencing data enable one to identify precise CNV breakpoints to better understand the underlying molecular mechanisms and to design more efficient assays. Using the CIGAR strings of the reads, we develop a method that can identify the exact CNV breakpoints, and in cases when the breakpoints are in a repeated region, the method reports a range where the breakpoints can slide. Our method identifies the breakpoints of a CNV using both the positions and CIGAR strings of the reads that cover breakpoints of a CNV. A read with a long soft clipped part (denoted as S in CIGAR) at its 3′ (right) end can be used to identify the 5′ (left) side of the breakpoints, and a read with a long S part at the 5′ end can be used to identify the breakpoint at the 3′ side. To ensure both types of reads cover the same CNV, we require the overlapped common string to include both of the soft clipped parts. When a CNV starts and ends in the same repeated regions, its breakpoints are not unique, in which case our method reports the leftmost positions for the breakpoints and a range within which the breakpoints can be incremented without changing the variant sequence. We have implemented the methods in a C++ package intended for the current Illumina MiSeq and HiSeq platforms for both whole genome and exon-sequencing. Our simulation studies have shown that our method compares favorably with other similar methods in terms of true discovery rate, false positive rate and breakpoint accuracy. Our results from a real application have shown that the detected CNVs are consistent with zygosity and read depth information. The software package is available at http://statgene.med.upenn.edu/softprog.html.
format Online
Article
Text
id pubmed-3744852
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-3744852 2013-08-21 MATCHCLIP: locate precise breakpoints for copy number variation using CIGAR string by matching soft clipped reads Wu, Yinghua Tian, Lifeng Pirastu, Mario Stambolian, Dwight Li, Hongzhe Front Genet Genetics Copy number variations (CNVs) are associated with many complex diseases. Next generation sequencing data enable one to identify precise CNV breakpoints to better understand the underlying molecular mechanisms and to design more efficient assays. Using the CIGAR strings of the reads, we develop a method that can identify the exact CNV breakpoints, and in cases when the breakpoints are in a repeated region, the method reports a range where the breakpoints can slide. Our method identifies the breakpoints of a CNV using both the positions and CIGAR strings of the reads that cover breakpoints of a CNV. A read with a long soft clipped part (denoted as S in CIGAR) at its 3′ (right) end can be used to identify the 5′ (left) side of the breakpoints, and a read with a long S part at the 5′ end can be used to identify the breakpoint at the 3′ side. To ensure both types of reads cover the same CNV, we require the overlapped common string to include both of the soft clipped parts. When a CNV starts and ends in the same repeated regions, its breakpoints are not unique, in which case our method reports the leftmost positions for the breakpoints and a range within which the breakpoints can be incremented without changing the variant sequence. We have implemented the methods in a C++ package intended for the current Illumina MiSeq and HiSeq platforms for both whole genome and exon-sequencing. Our simulation studies have shown that our method compares favorably with other similar methods in terms of true discovery rate, false positive rate and breakpoint accuracy. Our results from a real application have shown that the detected CNVs are consistent with zygosity and read depth information.
The software package is available at http://statgene.med.upenn.edu/softprog.html. Frontiers Media S.A. 2013-08-16 /pmc/articles/PMC3744852/ /pubmed/23967014 http://dx.doi.org/10.3389/fgene.2013.00157 Text en Copyright © 2013 Wu, Tian, Pirastu, Stambolian and Li. http://creativecommons.org/licenses/by/3.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
spellingShingle Genetics
Wu, Yinghua
Tian, Lifeng
Pirastu, Mario
Stambolian, Dwight
Li, Hongzhe
MATCHCLIP: locate precise breakpoints for copy number variation using CIGAR string by matching soft clipped reads
title MATCHCLIP: locate precise breakpoints for copy number variation using CIGAR string by matching soft clipped reads
title_full MATCHCLIP: locate precise breakpoints for copy number variation using CIGAR string by matching soft clipped reads
title_fullStr MATCHCLIP: locate precise breakpoints for copy number variation using CIGAR string by matching soft clipped reads
title_full_unstemmed MATCHCLIP: locate precise breakpoints for copy number variation using CIGAR string by matching soft clipped reads
title_short MATCHCLIP: locate precise breakpoints for copy number variation using CIGAR string by matching soft clipped reads
title_sort matchclip: locate precise breakpoints for copy number variation using cigar string by matching soft clipped reads
topic Genetics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3744852/
https://www.ncbi.nlm.nih.gov/pubmed/23967014
http://dx.doi.org/10.3389/fgene.2013.00157
work_keys_str_mv AT wuyinghua matchcliplocateprecisebreakpointsforcopynumbervariationusingcigarstringbymatchingsoftclippedreads
AT tianlifeng matchcliplocateprecisebreakpointsforcopynumbervariationusingcigarstringbymatchingsoftclippedreads
AT pirastumario matchcliplocateprecisebreakpointsforcopynumbervariationusingcigarstringbymatchingsoftclippedreads
AT stamboliandwight matchcliplocateprecisebreakpointsforcopynumbervariationusingcigarstringbymatchingsoftclippedreads
AT lihongzhe matchcliplocateprecisebreakpointsforcopynumbervariationusingcigarstringbymatchingsoftclippedreads
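
Illustrative sketch (not part of the catalog record, and not the MATCHCLIP C++ implementation): the description above explains that a long soft clipped part (S in CIGAR) at the right end of a read points to the left-side breakpoint, a long S part at the left end points to the right-side breakpoint, and that breakpoints in a repeated region are reported at the leftmost position together with a range in which they can slide. The Python below shows one way these two ideas might look for a simple deletion. The function names, the `min_clip` threshold of 10 bases, and the restriction to deletions are assumptions made for illustration only; the step that matches the two kinds of reads through their overlapped common string is not shown.

```python
import re
from typing import List, Optional, Tuple

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


def parse_cigar(cigar: str) -> List[Tuple[int, str]]:
    """Split a CIGAR string such as '60M40S' into (length, operation) pairs."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    if "".join(f"{n}{op}" for n, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR string: {cigar!r}")
    return ops


def clip_breakpoint(pos: int, cigar: str, min_clip: int = 10) -> Optional[Tuple[str, int]]:
    """Suggest a breakpoint from one soft clipped read.

    pos is the 1-based leftmost aligned reference position (SAM POS).
    Returns ('left', p) with p the last aligned base before a long trailing
    clip, ('right', p) with p the first aligned base after a long leading
    clip, or None when no clip reaches min_clip (an assumed threshold).
    """
    core = [(n, op) for n, op in parse_cigar(cigar) if op != "H"]
    if len(core) < 2:
        return None
    lead = core[0][0] if core[0][1] == "S" else 0
    trail = core[-1][0] if core[-1][1] == "S" else 0
    ref_len = sum(n for n, op in core if op in REF_CONSUMING)
    if trail >= min_clip and trail >= lead:
        return ("left", pos + ref_len - 1)
    if lead >= min_clip:
        return ("right", pos)
    return None


def deletion_slide(ref: str, start: int, end: int) -> Tuple[int, int, int]:
    """Left-align the deletion of ref[start:end] (0-based, half-open).

    Returns (start, end, slide): the leftmost placement and the number of
    bases both breakpoints can be shifted right without changing the
    resulting sequence.
    """
    # Shifting left by one base is allowed when the base entering the deleted
    # segment equals the base leaving it.
    while start > 0 and ref[start - 1] == ref[end - 1]:
        start -= 1
        end -= 1
    slide = 0
    while end + slide < len(ref) and ref[start + slide] == ref[end + slide]:
        slide += 1
    return start, end, slide


if __name__ == "__main__":
    print(clip_breakpoint(1000, "60M40S"))    # read clipped at its right end
    print(clip_breakpoint(2000, "40S60M"))    # read clipped at its left end
    print(deletion_slide("GGACACACTT", 4, 6))  # deletion inside an AC repeat
```

In the last call the deleted "AC" lies inside an "ACACAC" repeat, so the sketch reports the leftmost placement together with the number of positions the breakpoints can move right while producing the same variant sequence.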