
Systematic Characteristic Exploration of the Chimeras Generated in Multiple Displacement Amplification through Next Generation Sequencing Data Reanalysis

Bibliographic Details
Main Authors:	Tu, Jing; Guo, Jing; Li, Junji; Gao, Shen; Yao, Bei; Lu, Zuhong
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2015
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4595205/ https://www.ncbi.nlm.nih.gov/pubmed/26440104 http://dx.doi.org/10.1371/journal.pone.0139857

author	Tu, Jing; Guo, Jing; Li, Junji; Gao, Shen; Yao, Bei; Lu, Zuhong
author_sort	Tu, Jing
collection	PubMed
description	BACKGROUND: Chimeric sequences produced by phi29 DNA polymerase, referred to as chimeras, impair the performance of multiple displacement amplification (MDA) and complicate sequence data processing. Although several articles have reported the existence of chimeric sequences, only one study has focused on the structure and generation mechanism of chimeras, and it was based on only hundreds of chimeras found in sequence data from the E. coli genome. METHOD: We mined a set of Next Generation Sequencing (NGS) reads that had been used for whole-genome haplotype assembly in a primary study. We established a bioinformatics pipeline based on a subsection alignment strategy to discover all the chimeras in the data and to visualize their structure. We then defined two statistical indexes (the chimeric distance and the overlap length), whose regular abundance distributions helped illustrate the structural characteristics of the chimeras. Finally, we analyzed the relationship between chimera type and average insert size, and from this illustrated a method for decreasing the proportion of wasted data in DNA library construction. RESULTS/CONCLUSION: 131.4 Gb of paired-end (PE) sequence data was reanalyzed for chimeras. In total, 40,259,438 read pairs (6.19%) with chimerism were discovered among 650,430,811 read pairs. The chimeric sequences consist of two or more parts that are located non-contiguously but close to one another on the chromosome. The chimeric distance between the locations of adjacent parts on the chromosome followed an approximately bimodal distribution ranging from 0 to over 5,000 nt, with a peak at about 250 to 300 nt. The overlap length of adjacent parts followed an approximately Poisson distribution with a peak at 6 nt. Moreover, a linear correlation analysis showed that unmapped chimeras, which were classified as wasted data, could be reduced by appropriately increasing the insert size. SIGNIFICANCE: This study profiled phi29 MDA chimeras using tens of millions of chimeric sequences and helps clarify the amplification mechanism of phi29 DNA polymerase. Our work also illustrates the importance of NGS data reanalysis, both for improving data utilization efficiency and for recovering additional genomic information.
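The two indexes named in the abstract can be illustrated with a small sketch. The abstract does not give their exact formulas, so the definitions below are assumptions made for illustration only: the overlap length is taken as the number of read bases covered by both adjacent parts, and the chimeric distance as the gap between the two parts on the chromosome. The `Segment` class, its field names, and the example coordinates are hypothetical; only the read-pair counts come from the record.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One aligned part of a chimeric read (hypothetical representation)."""
    read_start: int  # 0-based start within the read
    read_end: int    # exclusive end within the read
    ref_start: int   # 0-based start on the chromosome
    ref_end: int     # exclusive end on the chromosome


def overlap_length(a: Segment, b: Segment) -> int:
    """Read bases covered by both parts (assumed definition)."""
    return max(0, min(a.read_end, b.read_end) - max(a.read_start, b.read_start))


def chimeric_distance(a: Segment, b: Segment) -> int:
    """Gap on the chromosome between two parts; 0 if they touch or overlap
    (assumed definition)."""
    return max(0, max(a.ref_start, b.ref_start) - min(a.ref_end, b.ref_end))


def chimera_rate(chimeric_pairs: int, total_pairs: int) -> float:
    """Percentage of read pairs that show chimerism."""
    return 100.0 * chimeric_pairs / total_pairs


# Made-up example: a 100 nt read split into two parts that share 6 read bases.
first = Segment(read_start=0, read_end=56, ref_start=10_000, ref_end=10_056)
second = Segment(read_start=50, read_end=100, ref_start=10_330, ref_end=10_380)

print(overlap_length(first, second))     # 6
print(chimeric_distance(first, second))  # 274
# Counts reported in the abstract:
print(round(chimera_rate(40_259_438, 650_430_811), 2))  # 6.19
```

The example coordinates were chosen so that the toy read falls near the peaks reported in the abstract (6 nt overlap, 250 to 300 nt distance); they are not taken from the study's data.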
format	Online Article Text
id	pubmed-4595205
institution	National Center for Biotechnology Information
language	English
publishDate	2015
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-4595205 2015-10-09 PLoS One Research Article Public Library of Science 2015-10-06 /pmc/articles/PMC4595205/ /pubmed/26440104 http://dx.doi.org/10.1371/journal.pone.0139857 Text en © 2015 Tu et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title	Systematic Characteristic Exploration of the Chimeras Generated in Multiple Displacement Amplification through Next Generation Sequencing Data Reanalysis
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4595205/ https://www.ncbi.nlm.nih.gov/pubmed/26440104 http://dx.doi.org/10.1371/journal.pone.0139857
