Efficient algorithms for analyzing segmental duplications with deletions and inversions in genomes

Bibliographic Details
Main authors: Kahn, Crystal L; Mozes, Shay; Raphael, Benjamin J
Format: Text
Language: English
Published: BioMed Central 2010
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2820476/
https://www.ncbi.nlm.nih.gov/pubmed/20047668
http://dx.doi.org/10.1186/1748-7188-5-11
Description
Summary: BACKGROUND: Segmental duplications, or low-copy repeats, are common in mammalian genomes. In the human genome, most segmental duplications are mosaics composed of multiple duplicated fragments. This complex genomic organization complicates analysis of the evolutionary history of these sequences. One model proposed to explain these mosaic patterns is repeated aggregation and subsequent duplication of genomic sequences. RESULTS: We describe a polynomial-time exact algorithm to compute duplication distance, a genomic distance defined as the number of operations in the most parsimonious way to build a target string by repeatedly copying substrings of a fixed source string. This distance models the process of repeated aggregation and duplication. We also describe extensions of this distance to include certain types of substring deletions and inversions. Finally, we describe a sequence of duplication events as a context-free grammar (CFG). CONCLUSION: These new genomic distances will permit more biologically realistic analyses of segmental duplications in genomes.
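To make the definition of duplication distance concrete, here is a minimal brute-force sketch in Python. It assumes the model is: start from an empty string and repeatedly insert a copy of any contiguous substring of the source at any position of the growing target; the distance is the fewest such insertions. The function name `duplication_distance` is hypothetical. This is not the authors' polynomial-time algorithm. It searches the reverse process (deleting contiguous blocks that are substrings of the source) breadth-first, so it is exponential and only suitable for tiny strings. Deletions and inversions are not modeled.

```python
import math
from collections import deque


def substrings(s: str) -> set:
    """All non-empty contiguous substrings of s."""
    return {s[i:j] for i in range(len(s)) for j in range(i + 1, len(s) + 1)}


def duplication_distance(source: str, target: str) -> float:
    """Hypothetical brute-force duplication distance (NOT the paper's algorithm).

    Assumed model: each operation inserts a copy of a substring of `source`
    anywhere in the string being built, starting from "". Reversing that,
    each step deletes a contiguous block of the current string that is a
    substring of `source`. BFS finds the fewest steps to reach "".
    Returns math.inf if `target` cannot be built at all.
    """
    pieces = substrings(source)
    seen = {target}
    queue = deque([(target, 0)])
    while queue:
        current, steps = queue.popleft()
        if current == "":
            return steps
        n = len(current)
        # Try deleting every contiguous block that could have been one copy.
        for i in range(n):
            for j in range(i + 1, n + 1):
                if current[i:j] in pieces:
                    nxt = current[:i] + current[j:]
                    if nxt not in seen:
                        seen.add(nxt)
                        queue.append((nxt, steps + 1))
    return math.inf


if __name__ == "__main__":
    # Copy "abcd", then insert "ef" inside it: 2 operations.
    print(duplication_distance("abcdef", "abefcd"))
```

With source `abcdef`, the target `abefcd` has distance 2 (copy `abcd`, then insert `ef` between `b` and `c`), whereas tiling the target with consecutive, non-nested substrings of the source takes 3 pieces (`ab`, `ef`, `cd`). This nesting of later copies inside earlier ones is what lets the distance model repeated aggregation followed by duplication.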