
Gentle Masking of Low-Complexity Sequences Improves Homology Search


Bibliographic Details
Main author: Frith, Martin C.
Format: Online Article Text
Language: English
Published: Public Library of Science 2011
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3242753/
https://www.ncbi.nlm.nih.gov/pubmed/22205972
http://dx.doi.org/10.1371/journal.pone.0028819
_version_ 1782219644919087104
author Frith, Martin C.
author_facet Frith, Martin C.
author_sort Frith, Martin C.
collection PubMed
description Detection of sequences that are homologous, i.e. descended from a common ancestor, is a fundamental task in computational biology. This task is confounded by low-complexity tracts (such as atatatatatat), which arise frequently and independently, causing strong similarities that are not homologies. There has been much research on identifying low-complexity tracts, but little research on how to treat them during homology search. We propose to find homologies by aligning sequences with “gentle” masking of low-complexity tracts. Gentle masking means that the match score involving a masked letter is min(0, S), where S is the unmasked score. Gentle masking slightly but noticeably improves the sensitivity of homology search (compared to “harsh” masking), without harming specificity. We show examples in three useful homology search problems: detection of NUMTs (nuclear copies of mitochondrial DNA), recruitment of metagenomic DNA reads to reference genomes, and pseudogene detection. Gentle masking is currently the best way to treat low-complexity tracts during homology search.
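The gentle-masking rule in the description can be sketched in a few lines of Python. This is only an illustration, not the author's implementation: it takes the masked-letter score to be min(0, S), as stated in the published abstract, where S is the unmasked score. The +1/-1 match/mismatch scores, the use of lowercase letters to mark masked positions, and the function names `unmasked_score`, `gentle_score`, and `ungapped_score` are all assumptions made for the example.

```python
def unmasked_score(a: str, b: str, match: int = 1, mismatch: int = -1) -> int:
    """Score S for one aligned letter pair, ignoring masking (assumed +1/-1 scheme)."""
    return match if a.upper() == b.upper() else mismatch


def gentle_score(a: str, b: str) -> int:
    """Gentle masking: a pair involving a masked letter scores min(0, S).

    Assumption: masked letters are written in lowercase (soft-masking convention).
    """
    s = unmasked_score(a, b)
    if a.islower() or b.islower():
        return min(0, s)  # masked matches contribute 0, masked mismatches keep their penalty
    return s


def ungapped_score(x: str, y: str, score_fn=gentle_score) -> int:
    """Sum of per-column scores for two equal-length, gap-free aligned sequences."""
    if len(x) != len(y):
        raise ValueError("sequences must have equal length")
    return sum(score_fn(a, b) for a, b in zip(x, y))


if __name__ == "__main__":
    # Only the two unmasked matching columns (G, T) add to the score.
    print(ungapped_score("acGT", "ACGT"))
    # The same alignment scored without any masking.
    print(ungapped_score("acGT", "ACGT", score_fn=unmasked_score))
```

Under this rule a masked tract can never raise an alignment score, so a similarity made only of low-complexity letters cannot become a hit, while an alignment that extends through a masked tract is not broken by it beyond its ordinary mismatch penalties.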
format Online
Article
Text
id pubmed-3242753
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3242753 2011-12-28
Gentle Masking of Low-Complexity Sequences Improves Homology Search
Frith, Martin C.
PLoS One
Research Article
Public Library of Science 2011-12-19
/pmc/articles/PMC3242753/
/pubmed/22205972
http://dx.doi.org/10.1371/journal.pone.0028819
Text
en
Martin C. Frith. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
spellingShingle Research Article
Frith, Martin C.
Gentle Masking of Low-Complexity Sequences Improves Homology Search
title Gentle Masking of Low-Complexity Sequences Improves Homology Search
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3242753/
https://www.ncbi.nlm.nih.gov/pubmed/22205972
http://dx.doi.org/10.1371/journal.pone.0028819
work_keys_str_mv AT frithmartinc gentlemaskingoflowcomplexitysequencesimproveshomologysearch