Using paired-end sequences to optimise parameters for alignment of sequence reads against related genomes
BACKGROUND: The advent of cheap high-throughput sequencing methods has facilitated low-coverage skims of a large number of organisms. To maximise the utility of the sequences, assembly into contigs and then ordering of those contigs is required. Whilst sequences can be assembled into contigs de nov...
Main authors: | Ratnakumar, Abhirami; McWilliam, Sean; Barris, Wesley; Dalrymple, Brian P |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central 2010 |
Subjects: | Research Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3091654/ https://www.ncbi.nlm.nih.gov/pubmed/20678236 http://dx.doi.org/10.1186/1471-2164-11-458 |
_version_ | 1782203296441696256 |
---|---|
author | Ratnakumar, Abhirami; McWilliam, Sean; Barris, Wesley; Dalrymple, Brian P |
author_sort | Ratnakumar, Abhirami |
collection | PubMed |
description | BACKGROUND: The advent of cheap high-throughput sequencing methods has facilitated low-coverage skims of a large number of organisms. To maximise the utility of the sequences, assembly into contigs and then ordering of those contigs is required. Whilst sequences can be assembled into contigs de novo, using assembled genomes of closely related organisms as a framework can considerably aid the process. However, the preferred search programs and parameters that will optimise the sensitivity and specificity of the alignments between the sequence reads and the framework genome(s) are not necessarily obvious. Here we demonstrate a process that uses paired-end sequence reads to choose an optimal program and alignment parameters. RESULTS: Unlike two single fragment reads, in paired-end sequence reads, such as BAC-end sequences, the two sequences in the pair have a known positional relationship in the original genome. This provides an additional level of confidence over match scores and e-values in the accuracy of the positional assignment of the reads in the comparative genome. Three commonly used sequence alignment programs (MegaBLAST, Blastz and PatternHunter) were used to align a set of ovine BAC-end sequences against the equine genome assembly. A range of different search parameters, with a particular focus on contiguous and discontiguous seeds, were used for each program. The number of reads with a hit and the number of read pairs with hits for the two end sequences in the tail-to-tail paired-end configuration were plotted relative to the theoretical maximum expected curve. Of the programs tested, MegaBLAST with short contiguous seed lengths (word size 8-11) performed best in this particular task. In addition, the data provide estimates of the false positive and false negative rates, which can be used to determine the appropriate values of additional parameters, such as score cut-off, to balance sensitivity and specificity. To determine whether the approach also worked for the alignment of shorter reads, the first 240 bases of each BAC end sequence were also aligned to the equine genome. Again, contiguous MegaBLAST performed the best in optimising the sensitivity and specificity with which sheep BAC end reads map to the equine and bovine genomes. CONCLUSIONS: Paired-end reads, such as BAC-end sequences, provide an efficient mechanism to optimise sequence alignment parameters, for example for comparative genome assemblies, by providing an objective standard to evaluate performance. |
format | Text |
id | pubmed-3091654 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2010 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-3091654 2011-05-11 Using paired-end sequences to optimise parameters for alignment of sequence reads against related genomes Ratnakumar, Abhirami; McWilliam, Sean; Barris, Wesley; Dalrymple, Brian P BMC Genomics Research Article BioMed Central 2010-08-03 /pmc/articles/PMC3091654/ /pubmed/20678236 http://dx.doi.org/10.1186/1471-2164-11-458 Text en Copyright ©2010 Ratnakumar et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Using paired-end sequences to optimise parameters for alignment of sequence reads against related genomes |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3091654/ https://www.ncbi.nlm.nih.gov/pubmed/20678236 http://dx.doi.org/10.1186/1471-2164-11-458 |
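
The evaluation described in the abstract (counting reads with a hit, and read pairs whose two end hits sit in the expected paired configuration) can be illustrated with a minimal Python sketch. This is not the authors' code: the names `Hit`, `is_concordant`, `summarise` and the `max_span` threshold are hypothetical, and the concordance rule used here (same chromosome, opposite strands, within a maximum span) is an assumed simplification of the tail-to-tail check.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Hit:
    """Best alignment of one end read on the reference genome (hypothetical record)."""
    chrom: str
    start: int
    end: int
    strand: str  # "+" or "-"


def is_concordant(a: Hit, b: Hit, max_span: int) -> bool:
    """Assumed rule: same chromosome, opposite strands, outer span within max_span."""
    if a.chrom != b.chrom or a.strand == b.strand:
        return False
    span = max(a.end, b.end) - min(a.start, b.start)
    return 0 < span <= max_span


def summarise(
    pairs: Dict[str, Tuple[Optional[Hit], Optional[Hit]]],
    max_span: int,
) -> Dict[str, float]:
    """Count reads with a hit and pairs whose two hits are concordant."""
    n_pairs = len(pairs)
    n_reads = 2 * n_pairs
    reads_with_hit = sum(
        (h1 is not None) + (h2 is not None) for h1, h2 in pairs.values()
    )
    both_hit = [
        (h1, h2) for h1, h2 in pairs.values() if h1 is not None and h2 is not None
    ]
    concordant = sum(is_concordant(h1, h2, max_span) for h1, h2 in both_hit)
    return {
        "reads_with_hit": reads_with_hit,
        "read_hit_fraction": reads_with_hit / n_reads if n_reads else 0.0,
        "pairs_both_hit": len(both_hit),
        "concordant_pairs": concordant,
        "concordant_pair_fraction": concordant / n_pairs if n_pairs else 0.0,
    }
```

Running `summarise` once per program and parameter set (for example, once per seed length) would give one point per setting on a plot of concordant pairs against reads with a hit. Pairs with two hits that fail the concordance rule give a rough indication of false positive placements; the threshold passed as `max_span` would need to reflect the insert sizes of the library in use.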