LTC: a novel algorithm to improve the efficiency of contig assembly for physical mapping in complex genomes

BACKGROUND: Physical maps are the substrate of genome sequencing and map-based cloning and their construction relies on the accurate assembly of BAC clones into large contigs that are then anchored to genetic maps with molecular markers. High Information Content Fingerprinting has become the method...

Bibliographic Details
Main Authors: Frenkel, Zeev; Paux, Etienne; Mester, David; Feuillet, Catherine; Korol, Abraham
Format: Text
Language: English
Published: BioMed Central 2010
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3098104/
https://www.ncbi.nlm.nih.gov/pubmed/21118513
http://dx.doi.org/10.1186/1471-2105-11-584
_version_ 1782203918467465216
author Frenkel, Zeev
Paux, Etienne
Mester, David
Feuillet, Catherine
Korol, Abraham
author_sort Frenkel, Zeev
collection PubMed
description BACKGROUND: Physical maps are the substrate of genome sequencing and map-based cloning and their construction relies on the accurate assembly of BAC clones into large contigs that are then anchored to genetic maps with molecular markers. High Information Content Fingerprinting has become the method of choice for large and repetitive genomes such as those of maize, barley, and wheat. However, the high level of repeated DNA present in these genomes requires the application of very stringent criteria to ensure a reliable assembly with the FingerPrinted Contig (FPC) software, which often results in short contig lengths (of 3-5 clones before merging) as well as an unreliable assembly in some difficult regions. Difficulties can originate from a non-linear topological structure of clone overlaps, low power of clone ordering algorithms, and the absence of tools to identify sources of gaps in Minimal Tiling Paths (MTPs). RESULTS: To address these problems, we propose a novel approach that: (i) reduces the rate of false connections and Q-clones by using a new cutoff calculation method; (ii) obtains reliable clusters robust to the exclusion of single clone or clone overlap; (iii) explores the topological contig structure by considering contigs as networks of clones connected by significant overlaps; (iv) performs iterative clone clustering combined with ordering and order verification using re-sampling methods; and (v) uses global optimization methods for clone ordering and Band Map construction. The elements of this new analytical framework called Linear Topological Contig (LTC) were applied on datasets used previously for the construction of the physical map of wheat chromosome 3B with FPC. The performance of LTC vs. FPC was compared also on the simulated BAC libraries based on the known genome sequences for chromosome 1 of rice and chromosome 1 of maize. 
CONCLUSIONS: The results show that compared to other methods, LTC enables the construction of highly reliable and longer contigs (5-12 clones before merging), the detection of "weak" connections in contigs and their "repair", and the elongation of contigs obtained by other assembly methods.
format Text
id pubmed-3098104
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3098104 2011-07-08 LTC: a novel algorithm to improve the efficiency of contig assembly for physical mapping in complex genomes Frenkel, Zeev; Paux, Etienne; Mester, David; Feuillet, Catherine; Korol, Abraham BMC Bioinformatics Methodology Article BioMed Central 2010-11-30 /pmc/articles/PMC3098104/ /pubmed/21118513 http://dx.doi.org/10.1186/1471-2105-11-584 Text en Copyright ©2010 Frenkel et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title LTC: a novel algorithm to improve the efficiency of contig assembly for physical mapping in complex genomes
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3098104/
https://www.ncbi.nlm.nih.gov/pubmed/21118513
http://dx.doi.org/10.1186/1471-2105-11-584
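Illustrative note: the abstract describes treating a contig as a network of clones connected by significant overlaps, and looking for clusters that stay intact when any single clone or clone overlap is excluded. The sketch below is only a toy illustration of that idea, not the LTC implementation. It assumes the significant overlaps are already given as clone pairs, and it uses a brute-force connectivity check in place of whatever method the authors use. The names `build_graph`, `is_connected`, and `weak_points` are hypothetical.

```python
from collections import defaultdict


def build_graph(overlaps):
    """Build adjacency sets from (clone_a, clone_b) pairs of significant overlaps."""
    graph = defaultdict(set)
    for a, b in overlaps:
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)


def is_connected(graph, skip_node=None, skip_edge=None):
    """Return True if the network stays in one piece without the given clone or overlap."""
    nodes = [n for n in graph if n != skip_node]
    if not nodes:
        return True
    banned = frozenset(skip_edge) if skip_edge else None
    seen = {nodes[0]}
    stack = [nodes[0]]
    # Depth-first search that ignores the excluded clone or overlap.
    while stack:
        cur = stack.pop()
        for nxt in graph[cur]:
            if nxt == skip_node or nxt in seen:
                continue
            if banned is not None and frozenset((cur, nxt)) == banned:
                continue
            seen.add(nxt)
            stack.append(nxt)
    return len(seen) == len(nodes)


def weak_points(overlaps):
    """List clones and overlaps whose exclusion would split the contig network."""
    graph = build_graph(overlaps)
    weak_clones = sorted(n for n in graph if not is_connected(graph, skip_node=n))
    edges = {tuple(sorted(e)) for e in overlaps}
    weak_overlaps = sorted(e for e in edges if not is_connected(graph, skip_edge=e))
    return weak_clones, weak_overlaps


# Two well-supported clusters (A-B-C and D-E-F) joined by a single overlap C-D.
example = [("A", "B"), ("B", "C"), ("A", "C"),
           ("C", "D"),
           ("D", "E"), ("D", "F"), ("E", "F")]
print(weak_points(example))  # (['C', 'D'], [('C', 'D')])
```

In this toy network the overlap C-D is the only link between the two clusters, so it is reported as a weak connection, and clones C and D are the clones whose removal would break the contig. The brute-force check is quadratic and is chosen here for readability only.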