Optimal reference sequence selection for genome assembly using minimum description length principle

Reference assisted assembly requires the use of a reference sequence, as a model, to assist in the assembly of the novel genome. The standard method for identifying the best reference sequence for the assembly of a novel genome aims at counting the number of reads that align to the reference sequenc...

Bibliographic Details
Main Authors: Wajid, Bilal; Serpedin, Erchin; Nounou, Mohamed; Nounou, Hazem
Format: Online Article Text
Language: English
Published: BioMed Central 2012
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3608252/
https://www.ncbi.nlm.nih.gov/pubmed/23186305
http://dx.doi.org/10.1186/1687-4153-2012-18
_version_ 1782264209577345024
author Wajid, Bilal
Serpedin, Erchin
Nounou, Mohamed
Nounou, Hazem
author_sort Wajid, Bilal
collection PubMed
description Reference-assisted assembly uses a reference sequence as a model to assist in assembling a novel genome. The standard method for identifying the best reference sequence counts the number of reads that align to each candidate reference sequence and chooses the one with the highest number of aligned reads. This article explores the use of the minimum description length (MDL) principle and its two variants, the two-part MDL and Sophisticated MDL, in identifying the optimal reference sequence for genome assembly. The article compares the proposed MDL-based scheme with the standard method and concludes that “counting the number of reads of the novel genome present in the reference sequence” is not a sufficient condition. The proposed MDL scheme therefore includes the standard method of “counting the number of reads that align to the reference sequence” and additionally examines the model, that is, the reference sequence itself, in identifying the optimal reference sequence. The proposed MDL-based scheme not only provides a sufficient criterion for identifying the optimal reference sequence for genome assembly but also improves the reference sequence so that it becomes more suitable for the assembly of the novel genome.
format Online
Article
Text
id pubmed-3608252
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-36082522013-03-29 Optimal reference sequence selection for genome assembly using minimum description length principle Wajid, Bilal Serpedin, Erchin Nounou, Mohamed Nounou, Hazem EURASIP J Bioinform Syst Biol Research Reference assisted assembly requires the use of a reference sequence, as a model, to assist in the assembly of the novel genome. The standard method for identifying the best reference sequence for the assembly of a novel genome aims at counting the number of reads that align to the reference sequence, and then choosing the reference sequence which has the highest number of reads aligning to it. This article explores the use of minimum description length (MDL) principle and its two variants, the two-part MDL and Sophisticated MDL, in identifying the optimal reference sequence for genome assembly. The article compares the MDL based proposed scheme with the standard method coming to the conclusion that “counting the number of reads of the novel genome present in the reference sequence” is not a sufficient condition. Therefore, the proposed MDL scheme includes within itself the standard method of “counting the number of reads that align to the reference sequence” and also moves forward towards looking at the model, the reference sequence, as well, in identifying the optimal reference sequence. The proposed MDL based scheme not only becomes the sufficient criterion for identifying the optimal reference sequence for genome assembly but also improves the reference sequence so that it becomes more suitable for the assembly of the novel genome. BioMed Central 2012 2012-11-27 /pmc/articles/PMC3608252/ /pubmed/23186305 http://dx.doi.org/10.1186/1687-4153-2012-18 Text en Copyright ©2012 Wajid et al.; licensee Springer. 
http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Optimal reference sequence selection for genome assembly using minimum description length principle
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3608252/
https://www.ncbi.nlm.nih.gov/pubmed/23186305
http://dx.doi.org/10.1186/1687-4153-2012-18
work_keys_str_mv AT wajidbilal optimalreferencesequenceselectionforgenomeassemblyusingminimumdescriptionlengthprinciple
AT serpedinerchin optimalreferencesequenceselectionforgenomeassemblyusingminimumdescriptionlengthprinciple
AT nounoumohamed optimalreferencesequenceselectionforgenomeassemblyusingminimumdescriptionlengthprinciple
AT nounouhazem optimalreferencesequenceselectionforgenomeassemblyusingminimumdescriptionlengthprinciple
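
The standard method summarized in the description field (count the reads that align to each candidate reference, then pick the candidate with the highest count) can be sketched as follows. This is a minimal illustration, not the authors' implementation: exact substring matching stands in for a real read aligner, and the names `count_aligned_reads`, `pick_reference_by_read_count`, `ref_a`, and `ref_b` are hypothetical.

```python
from typing import Dict, List, Tuple


def count_aligned_reads(reads: List[str], reference: str) -> int:
    """Count reads that occur in the reference.

    Assumption: a read "aligns" if it appears as an exact substring.
    A real pipeline would use an aligner that tolerates mismatches.
    """
    return sum(1 for read in reads if read in reference)


def pick_reference_by_read_count(
    reads: List[str], references: Dict[str, str]
) -> Tuple[str, Dict[str, int]]:
    """Return the name of the reference with the most aligned reads,
    together with the per-reference counts."""
    counts = {
        name: count_aligned_reads(reads, sequence)
        for name, sequence in references.items()
    }
    best = max(counts, key=counts.get)
    return best, counts


# Toy data, invented purely for illustration.
reads = ["ACGT", "GTTA", "TTAC"]
references = {"ref_a": "ACGTTACG", "ref_b": "ACGTGGGG"}

best, counts = pick_reference_by_read_count(reads, references)
print(best, counts)
```

According to the description, this read count alone is not a sufficient criterion, which is why the proposed MDL scheme also takes the reference sequence itself (the model) into account.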