Probably Correct: Rescuing Repeats with Short and Long Reads

Ever since the introduction of high-throughput sequencing following the Human Genome Project, assembling short reads into a reference of sufficient quality has posed a significant problem, because a large portion of the human genome (an estimated 50–69%) is repetitive. As a result, a sizable proportion of sequenci...


Bibliographic Details
Main author:	Cechova, Monika
Format:	Online Article Text
Language:	English
Published:	MDPI 2020
Subjects:	Review
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7823596/ https://www.ncbi.nlm.nih.gov/pubmed/33396198 http://dx.doi.org/10.3390/genes12010048

collection	PubMed
description	Ever since the introduction of high-throughput sequencing following the Human Genome Project, assembling short reads into a reference of sufficient quality has posed a significant problem, because a large portion of the human genome (an estimated 50–69%) is repetitive. As a result, a sizable proportion of sequencing reads are multi-mapping, i.e., they lack a unique placement in the genome. The two key parameters that determine whether a read is multi-mapping are the read length and the genome complexity. Long reads are now able to span difficult, heterochromatic regions, including full centromeres, and to characterize chromosomes from “telomere to telomere”. Moreover, identical reads or repeat arrays can be differentiated based on their epigenetic marks, such as methylation patterns, which aids the assembly process. This is possible even though long reads still contain a modest percentage of sequencing errors, which confuse aligners and assemblers and reduce both their accuracy and speed. Here, I review the proposed and implemented solutions to the repeat resolution and multi-mapping read problems, as well as the downstream consequences of reference choice, repeat masking, and proper representation of sex chromosomes. I also consider the forthcoming challenges and solutions with regard to long reads, where we expect a shift from the problem of repeat localization within a single individual to the problem of repeat positioning within pangenomes.
id	pubmed-7823596
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
spelling	pubmed-7823596 2021-01-24 Probably Correct: Rescuing Repeats with Short and Long Reads Cechova, Monika Genes (Basel) Review MDPI 2020-12-31 /pmc/articles/PMC7823596/ /pubmed/33396198 http://dx.doi.org/10.3390/genes12010048 Text en © 2020 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
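The description states that read length and genome complexity decide whether a read is multi-mapping. The toy Python sketch below is not taken from the review. It illustrates that point by brute force under deliberately simple assumptions: error-free reads tiled at every genome position, exact matching only, and forward strand only. `exact_placements` and `multimapping_reads` are hypothetical helper names. Real aligners use indexes, tolerate mismatches, and also search the reverse complement.

```python
def exact_placements(genome: str, read: str) -> int:
    """Count every (possibly overlapping) exact, forward-strand occurrence of `read`."""
    count = 0
    start = genome.find(read)
    while start != -1:
        count += 1
        # Restart one base later so overlapping occurrences are also counted.
        start = genome.find(read, start + 1)
    return count


def multimapping_reads(genome: str, read_length: int) -> tuple[int, int]:
    """Tile one error-free read at every genome position and return
    (number of reads with more than one exact placement, total number of reads)."""
    total = len(genome) - read_length + 1
    if read_length < 1 or total < 1:
        raise ValueError("read_length must be between 1 and len(genome)")
    multi = sum(
        1
        for i in range(total)
        if exact_placements(genome, genome[i:i + read_length]) > 1
    )
    return multi, total


if __name__ == "__main__":
    # Toy genome in which the 3-mer "ACG" occurs twice (positions 0 and 4).
    genome = "ACGTACGC"
    for k in range(1, 5):
        multi, total = multimapping_reads(genome, k)
        print(f"read length {k}: {multi}/{total} reads multi-mapping")
    # Same genome length but lower complexity: a dinucleotide repeat.
    multi, total = multimapping_reads("ACACACAC", 4)
    print(f"ACACACAC, read length 4: {multi}/{total} reads multi-mapping")
```

In the first toy genome, 7 of 8 single-base reads are multi-mapping, and every 4-base read is unique. In the low-complexity `ACACACAC`, every 4-base read still has several placements.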