
Improving PacBio Long Read Accuracy by Short Read Alignment

The recent development of third generation sequencing (TGS) generates much longer reads than second generation sequencing (SGS) and thus provides a chance to solve problems that are difficult to study through SGS alone. However, higher raw read error rates are an intrinsic drawback in most TGS techn...


Bibliographic Details
Main authors: Au, Kin Fai; Underwood, Jason G.; Lee, Lawrence; Wong, Wing Hung
Format: Online Article Text
Language: English
Published: Public Library of Science 2012
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3464235/
https://www.ncbi.nlm.nih.gov/pubmed/23056399
http://dx.doi.org/10.1371/journal.pone.0046679
_version_ 1782245390195621888
author Au, Kin Fai
Underwood, Jason G.
Lee, Lawrence
Wong, Wing Hung
author_facet Au, Kin Fai
Underwood, Jason G.
Lee, Lawrence
Wong, Wing Hung
author_sort Au, Kin Fai
collection PubMed
description The recent development of third generation sequencing (TGS) generates much longer reads than second generation sequencing (SGS) and thus provides a chance to solve problems that are difficult to study through SGS alone. However, higher raw read error rates are an intrinsic drawback in most TGS technologies. Here we present a computational method, LSC, to perform error correction of TGS long reads (LR) by SGS short reads (SR). Aiming to reduce the error rate in homopolymer runs in the main TGS platform, the PacBio® RS, LSC applies a homopolymer compression (HC) transformation strategy to increase the sensitivity of SR-LR alignment without sacrificing alignment accuracy. We applied LSC to 100,000 PacBio long reads from human brain cerebellum RNA-seq data and 64 million single-end 75 bp reads from human brain RNA-seq data. The results show LSC can correct PacBio long reads to reduce the error rate by more than 3-fold. The improved accuracy greatly benefits many downstream analyses, such as directional gene isoform detection in RNA-seq studies. Compared with another hybrid correction tool, LSC can achieve over double the sensitivity and similar specificity.
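The homopolymer compression (HC) transformation named in the description can be illustrated with a minimal Python sketch. This is not the LSC implementation: the function names `hc_compress` and `hc_expand` and the example sequences are hypothetical, and the sketch only assumes that HC means collapsing each run of identical bases to a single base while remembering the run lengths.

```python
from itertools import groupby


def hc_compress(seq):
    """Collapse each run of identical bases to a single base.

    Returns the compressed sequence and the list of original run lengths,
    so the uncompressed sequence can be restored later.
    """
    bases = []
    runs = []
    for base, group in groupby(seq):
        bases.append(base)
        runs.append(sum(1 for _ in group))
    return "".join(bases), runs


def hc_expand(compressed, runs):
    """Invert hc_compress using the stored run lengths."""
    return "".join(base * n for base, n in zip(compressed, runs))


# Hypothetical reads from the same locus that differ only in
# homopolymer run lengths (an illustrative error pattern).
long_read = "ACCCGTTTTA"
short_read = "ACCGTTTA"

lr_hc, lr_runs = hc_compress(long_read)
sr_hc, sr_runs = hc_compress(short_read)

# After compression the two reads are identical, so a short read
# aligner would see no mismatch or indel between them.
print(lr_hc, sr_hc, lr_hc == sr_hc)
```

In this toy case the two reads disagree in uncompressed space but match exactly once compressed, which is the intuition behind using HC to raise alignment sensitivity. The stored run lengths are what would allow a corrected sequence to be expanded back afterwards.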
format Online
Article
Text
id pubmed-3464235
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-34642352012-10-10 Improving PacBio Long Read Accuracy by Short Read Alignment Au, Kin Fai Underwood, Jason G. Lee, Lawrence Wong, Wing Hung PLoS One Research Article The recent development of third generation sequencing (TGS) generates much longer reads than second generation sequencing (SGS) and thus provides a chance to solve problems that are difficult to study through SGS alone. However, higher raw read error rates are an intrinsic drawback in most TGS technologies. Here we present a computational method, LSC, to perform error correction of TGS long reads (LR) by SGS short reads (SR). Aiming to reduce the error rate in homopolymer runs in the main TGS platform, the PacBio® RS, LSC applies a homopolymer compression (HC) transformation strategy to increase the sensitivity of SR-LR alignment without sacrificing alignment accuracy. We applied LSC to 100,000 PacBio long reads from human brain cerebellum RNA-seq data and 64 million single-end 75 bp reads from human brain RNA-seq data. The results show LSC can correct PacBio long reads to reduce the error rate by more than 3-fold. The improved accuracy greatly benefits many downstream analyses, such as directional gene isoform detection in RNA-seq studies. Compared with another hybrid correction tool, LSC can achieve over double the sensitivity and similar specificity. Public Library of Science 2012-10-04 /pmc/articles/PMC3464235/ /pubmed/23056399 http://dx.doi.org/10.1371/journal.pone.0046679 Text en © 2012 Au et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
spellingShingle Research Article
Au, Kin Fai
Underwood, Jason G.
Lee, Lawrence
Wong, Wing Hung
Improving PacBio Long Read Accuracy by Short Read Alignment
title Improving PacBio Long Read Accuracy by Short Read Alignment
title_full Improving PacBio Long Read Accuracy by Short Read Alignment
title_fullStr Improving PacBio Long Read Accuracy by Short Read Alignment
title_full_unstemmed Improving PacBio Long Read Accuracy by Short Read Alignment
title_short Improving PacBio Long Read Accuracy by Short Read Alignment
title_sort improving pacbio long read accuracy by short read alignment
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3464235/
https://www.ncbi.nlm.nih.gov/pubmed/23056399
http://dx.doi.org/10.1371/journal.pone.0046679
work_keys_str_mv AT aukinfai improvingpacbiolongreadaccuracybyshortreadalignment
AT underwoodjasong improvingpacbiolongreadaccuracybyshortreadalignment
AT leelawrence improvingpacbiolongreadaccuracybyshortreadalignment
AT wongwinghung improvingpacbiolongreadaccuracybyshortreadalignment