FRAGS: estimation of coding sequence substitution rates from fragmentary data

BACKGROUND: Rates of substitution in protein-coding sequences can provide important insights into evolutionary processes that are of biomedical and theoretical interest. Increased availability of coding sequence data has enabled researchers to estimate more accurately the coding sequence divergence...

Bibliographic Details
Main authors: Swart, Estienne C; Hide, Winston A; Seoighe, Cathal
Format: Text
Language: English
Published: BioMed Central 2004
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC344743/
https://www.ncbi.nlm.nih.gov/pubmed/15005802
http://dx.doi.org/10.1186/1471-2105-5-8
author Swart, Estienne C
Hide, Winston A
Seoighe, Cathal
collection PubMed
description BACKGROUND: Rates of substitution in protein-coding sequences can provide important insights into evolutionary processes that are of biomedical and theoretical interest. Increased availability of coding sequence data has enabled researchers to estimate the coding sequence divergence of pairs of organisms more accurately. However, the use of different data sources, alignment protocols and methods to estimate substitution rates leads to widely varying estimates of key parameters that define the coding sequence divergence of orthologous genes. Although complete genome sequence data are not available for all organisms, fragmentary sequence data can provide accurate estimates of substitution rates, provided that an appropriate and consistent methodology is used and that differences in the estimates obtainable from different data sources are taken into account. RESULTS: We have developed FRAGS, an application framework that uses existing, freely available software components to construct in-frame alignments and estimate coding substitution rates from fragmentary sequence data. Coding sequence substitution estimates for human and chimpanzee sequences, generated by FRAGS, reveal that methodological differences can give rise to significantly different estimates of important substitution parameters. The estimated substitution rates were also used to infer upper bounds on the amount of sequencing error in the datasets that we have analysed. CONCLUSION: We have developed a system that performs robust estimation of substitution rates for orthologous sequences from a pair of organisms. Our system can be used when fragmentary genomic or transcript data are available from one of the organisms and the other is a completely sequenced genome within the Ensembl database. As well as estimating substitution statistics, our system enables the user to manage and query alignment and substitution data.
format Text
id pubmed-344743
institution National Center for Biotechnology Information
language English
publishDate 2004
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-344743 2004-02-25 FRAGS: estimation of coding sequence substitution rates from fragmentary data Swart, Estienne C Hide, Winston A Seoighe, Cathal BMC Bioinformatics Software BioMed Central 2004-01-29 /pmc/articles/PMC344743/ /pubmed/15005802 http://dx.doi.org/10.1186/1471-2105-5-8 Text en Copyright © 2004 Swart et al; licensee BioMed Central Ltd. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose, provided this notice is preserved along with the article's original URL.
title FRAGS: estimation of coding sequence substitution rates from fragmentary data
topic Software