
Detecting long tandem duplications in genomic sequences

BACKGROUND: Detecting duplication segments within completely sequenced genomes provides valuable information to address genome evolution and in particular the important question of the emergence of novel functions. The usual approach to gene duplication detection, based on all-pairs protein gene com...


Bibliographic Details
Main authors: Audemard, Eric, Schiex, Thomas, Faraut, Thomas
Format: Online Article Text
Language: English
Published: BioMed Central 2012
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3464658/
https://www.ncbi.nlm.nih.gov/pubmed/22568762
http://dx.doi.org/10.1186/1471-2105-13-83
author Audemard, Eric
Schiex, Thomas
Faraut, Thomas
collection PubMed
description BACKGROUND: Detecting duplication segments within completely sequenced genomes provides valuable information to address genome evolution and in particular the important question of the emergence of novel functions. The usual approach to gene duplication detection, based on all-pairs protein gene comparisons, provides only a restricted view of duplication. RESULTS: In this paper, we introduce ReD Tandem, software that uses a flow-based chaining algorithm to detect tandem duplication arrays of moderate to long regions, with possibly locally weak similarities, directly at the DNA level. On the A. thaliana genome, using a reference set of tandem duplicated genes built using TAIR, we show that ReD Tandem is able to predict a large fraction of recently duplicated genes (dS < 1) and that it is also able to predict tandem duplications involving non-coding elements such as pseudogenes or RNA genes. CONCLUSIONS: ReD Tandem makes it possible to identify large tandem duplications without any annotation, leading to agnostic identification of tandem duplications. This approach nicely complements the usual protein gene based approach, which ignores duplications involving non-coding regions. It is, however, inherently restricted to relatively recent duplications. By recovering otherwise ignored events, ReD Tandem gives a more comprehensive view of existing evolutionary processes and may also help improve existing annotations.
format Online
Article
Text
id pubmed-3464658
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3464658 2012-10-05 Detecting long tandem duplications in genomic sequences Audemard, Eric Schiex, Thomas Faraut, Thomas BMC Bioinformatics Methodology Article BioMed Central 2012-05-08 /pmc/articles/PMC3464658/ /pubmed/22568762 http://dx.doi.org/10.1186/1471-2105-13-83 Text en Copyright ©2012 Audemard et al.; licensee BioMed Central Ltd.
http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Detecting long tandem duplications in genomic sequences
topic Methodology Article