Identification and characterization of occult human-specific LINE-1 insertions using long-read sequencing technology

Long Interspersed Element-1 (LINE-1) retrotransposition contributes to inter- and intra-individual genetic variation and occasionally can lead to human genetic disorders. Various strategies have been developed to identify human-specific LINE-1 (L1Hs) insertions from short-read whole genome sequencin...

Bibliographic Details
Main authors: Zhou, Weichen; Emery, Sarah B; Flasch, Diane A; Wang, Yifan; Kwan, Kenneth Y; Kidd, Jeffrey M; Moran, John V; Mills, Ryan E
Format: Online Article Text
Language: English
Published: Oxford University Press 2020
Subjects: Computational Biology
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7026601/
https://www.ncbi.nlm.nih.gov/pubmed/31853540
http://dx.doi.org/10.1093/nar/gkz1173
author Zhou, Weichen
Emery, Sarah B
Flasch, Diane A
Wang, Yifan
Kwan, Kenneth Y
Kidd, Jeffrey M
Moran, John V
Mills, Ryan E
author_sort Zhou, Weichen
collection PubMed
description Long Interspersed Element-1 (LINE-1) retrotransposition contributes to inter- and intra-individual genetic variation and occasionally can lead to human genetic disorders. Various strategies have been developed to identify human-specific LINE-1 (L1Hs) insertions from short-read whole genome sequencing (WGS) data; however, they have limitations in detecting insertions in complex repetitive genomic regions. Here, we developed a computational tool (PALMER) and used it to identify 203 non-reference L1Hs insertions in the NA12878 benchmark genome. Using PacBio long-read sequencing data, we identified L1Hs insertions that were absent in previous short-read studies (90/203). Approximately 81% (73/90) of the L1Hs insertions reside within endogenous LINE-1 sequences in the reference assembly and the analysis of unique breakpoint junction sequences revealed 63% (57/90) of these L1Hs insertions could be genotyped in 1000 Genomes Project sequences. Moreover, we observed that amplification biases encountered in single-cell WGS experiments led to a wide variation in L1Hs insertion detection rates between four individual NA12878 cells; under-amplification limited detection to 32% (65/203) of insertions, whereas over-amplification increased false positive calls. In sum, these data indicate that L1Hs insertions are often missed using standard short-read sequencing approaches and long-read sequencing approaches can significantly improve the detection of L1Hs insertions present in individual genomes.
format Online
Article
Text
id pubmed-7026601
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-70266012020-02-25 Identification and characterization of occult human-specific LINE-1 insertions using long-read sequencing technology Zhou, Weichen Emery, Sarah B Flasch, Diane A Wang, Yifan Kwan, Kenneth Y Kidd, Jeffrey M Moran, John V Mills, Ryan E Nucleic Acids Res Computational Biology Long Interspersed Element-1 (LINE-1) retrotransposition contributes to inter- and intra-individual genetic variation and occasionally can lead to human genetic disorders. Various strategies have been developed to identify human-specific LINE-1 (L1Hs) insertions from short-read whole genome sequencing (WGS) data; however, they have limitations in detecting insertions in complex repetitive genomic regions. Here, we developed a computational tool (PALMER) and used it to identify 203 non-reference L1Hs insertions in the NA12878 benchmark genome. Using PacBio long-read sequencing data, we identified L1Hs insertions that were absent in previous short-read studies (90/203). Approximately 81% (73/90) of the L1Hs insertions reside within endogenous LINE-1 sequences in the reference assembly and the analysis of unique breakpoint junction sequences revealed 63% (57/90) of these L1Hs insertions could be genotyped in 1000 Genomes Project sequences. Moreover, we observed that amplification biases encountered in single-cell WGS experiments led to a wide variation in L1Hs insertion detection rates between four individual NA12878 cells; under-amplification limited detection to 32% (65/203) of insertions, whereas over-amplification increased false positive calls. In sum, these data indicate that L1Hs insertions are often missed using standard short-read sequencing approaches and long-read sequencing approaches can significantly improve the detection of L1Hs insertions present in individual genomes. Oxford University Press 2020-02-20 2019-12-19 /pmc/articles/PMC7026601/ /pubmed/31853540 http://dx.doi.org/10.1093/nar/gkz1173 Text en © The Author(s) 2019. 
Published by Oxford University Press on behalf of Nucleic Acids Research. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title Identification and characterization of occult human-specific LINE-1 insertions using long-read sequencing technology
topic Computational Biology
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7026601/
https://www.ncbi.nlm.nih.gov/pubmed/31853540
http://dx.doi.org/10.1093/nar/gkz1173