Optimal reference sequence selection for genome assembly using minimum description length principle
Reference assisted assembly requires the use of a reference sequence, as a model, to assist in the assembly of the novel genome. The standard method for identifying the best reference sequence for the assembly of a novel genome aims at counting the number of reads that align to the reference sequenc...
Main Authors: | Wajid, Bilal; Serpedin, Erchin; Nounou, Mohamed; Nounou, Hazem |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2012 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3608252/ https://www.ncbi.nlm.nih.gov/pubmed/23186305 http://dx.doi.org/10.1186/1687-4153-2012-18 |
_version_ | 1782264209577345024 |
---|---|
author | Wajid, Bilal Serpedin, Erchin Nounou, Mohamed Nounou, Hazem |
author_sort | Wajid, Bilal |
collection | PubMed |
description | Reference-assisted assembly uses a reference sequence as a model to assist in the assembly of a novel genome. The standard method for identifying the best reference sequence counts the number of reads that align to each candidate reference sequence and then chooses the candidate with the highest number of aligned reads. This article explores the use of the minimum description length (MDL) principle and its two variants, the two-part MDL and Sophisticated MDL, in identifying the optimal reference sequence for genome assembly. The article compares the proposed MDL-based scheme with the standard method and concludes that “counting the number of reads of the novel genome present in the reference sequence” is not a sufficient condition. The proposed MDL scheme therefore includes the standard method of “counting the number of reads that align to the reference sequence” and additionally examines the model itself, the reference sequence, when identifying the optimal reference sequence. The proposed MDL-based scheme not only provides a sufficient criterion for identifying the optimal reference sequence for genome assembly but also improves the reference sequence so that it becomes more suitable for the assembly of the novel genome. |
format | Online Article Text |
id | pubmed-3608252 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2012 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-3608252 2013-03-29 Optimal reference sequence selection for genome assembly using minimum description length principle Wajid, Bilal Serpedin, Erchin Nounou, Mohamed Nounou, Hazem EURASIP J Bioinform Syst Biol Research BioMed Central 2012 2012-11-27 /pmc/articles/PMC3608252/ /pubmed/23186305 http://dx.doi.org/10.1186/1687-4153-2012-18 Text en Copyright ©2012 Wajid et al.; licensee Springer. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Optimal reference sequence selection for genome assembly using minimum description length principle |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3608252/ https://www.ncbi.nlm.nih.gov/pubmed/23186305 http://dx.doi.org/10.1186/1687-4153-2012-18 |
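
The description in this record contrasts the standard read-counting criterion with a two-part MDL criterion. The sketch below is only an illustration of that contrast, not the article's encoding scheme: the exact-substring "alignment", the 2-bits-per-base cost, the pointer cost, and the names `count_aligned` and `two_part_mdl_bits` are all assumptions made for this example.

```python
import math

def count_aligned(reads, reference):
    """Standard criterion (simplified): reads found verbatim in the reference."""
    return sum(1 for r in reads if r in reference)

def two_part_mdl_bits(reads, reference):
    """Toy two-part code length L(model) + L(data | model), in bits.

    Assumed costs (hypothetical, not taken from the article):
    - model: 2 bits per base of the reference sequence
    - a read found in the reference: one pointer of log2(len(reference)) bits
    - a read not found: spelled out at 2 bits per base
    """
    bits_per_base = 2.0
    model_bits = bits_per_base * len(reference)
    pointer_bits = math.log2(max(len(reference), 2))
    data_bits = 0.0
    for r in reads:
        if r in reference:
            data_bits += pointer_bits
        else:
            data_bits += bits_per_base * len(r)
    return model_bits + data_bits

# Two candidate references that contain exactly the same reads;
# the second carries 200 extra bases that explain none of the data.
reads = ["ACGT", "CGTA", "TTGA"]
references = {
    "ref_short": "ACGTA",
    "ref_padded": "ACGTA" + "G" * 200,
}

counts = {name: count_aligned(reads, seq) for name, seq in references.items()}
mdl = {name: two_part_mdl_bits(reads, seq) for name, seq in references.items()}
best_by_mdl = min(mdl, key=mdl.get)
```

In this toy setup both candidates tie on the read count, so counting alone cannot separate them, while the description length penalizes the padded reference for the extra model cost.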