Deciphering highly similar multigene family transcripts from Iso-Seq data with IsoCon
A significant portion of genes in vertebrate genomes belongs to multigene families, with each family containing several gene copies whose presence/absence, as well as isoform structure, can be highly variable across individuals. Existing de novo techniques for assaying the sequences of such highly-s...
Main authors: | Sahlin, Kristoffer; Tomaszkiewicz, Marta; Makova, Kateryna D.; Medvedev, Paul |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK, 2018 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6214943/ https://www.ncbi.nlm.nih.gov/pubmed/30389934 http://dx.doi.org/10.1038/s41467-018-06910-x |
_version_ | 1783368041761865728 |
---|---|
author | Sahlin, Kristoffer; Tomaszkiewicz, Marta; Makova, Kateryna D.; Medvedev, Paul |
author_sort | Sahlin, Kristoffer |
collection | PubMed |
description | A significant portion of genes in vertebrate genomes belongs to multigene families, with each family containing several gene copies whose presence/absence, as well as isoform structure, can be highly variable across individuals. Existing de novo techniques for assaying the sequences of such highly-similar gene families fall short of reconstructing end-to-end transcripts with nucleotide-level precision or assigning alternatively spliced transcripts to their respective gene copies. We present IsoCon, a high-precision method using long PacBio Iso-Seq reads to tackle this challenge. We apply IsoCon to nine Y chromosome ampliconic gene families and show that it outperforms existing methods on both experimental and simulated data. IsoCon has allowed us to detect an unprecedented number of novel isoforms and has opened the door for unraveling the structure of many multigene families and gaining a deeper understanding of genome evolution and human diseases. |
format | Online Article Text |
id | pubmed-6214943 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-6214943 2018-11-05 Deciphering highly similar multigene family transcripts from Iso-Seq data with IsoCon Sahlin, Kristoffer Tomaszkiewicz, Marta Makova, Kateryna D. Medvedev, Paul Nat Commun Article A significant portion of genes in vertebrate genomes belongs to multigene families, with each family containing several gene copies whose presence/absence, as well as isoform structure, can be highly variable across individuals. Existing de novo techniques for assaying the sequences of such highly-similar gene families fall short of reconstructing end-to-end transcripts with nucleotide-level precision or assigning alternatively spliced transcripts to their respective gene copies. We present IsoCon, a high-precision method using long PacBio Iso-Seq reads to tackle this challenge. We apply IsoCon to nine Y chromosome ampliconic gene families and show that it outperforms existing methods on both experimental and simulated data. IsoCon has allowed us to detect an unprecedented number of novel isoforms and has opened the door for unraveling the structure of many multigene families and gaining a deeper understanding of genome evolution and human diseases. Nature Publishing Group UK 2018-11-02 /pmc/articles/PMC6214943/ /pubmed/30389934 http://dx.doi.org/10.1038/s41467-018-06910-x Text en © The Author(s) 2018 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. |
spellingShingle | Article Sahlin, Kristoffer Tomaszkiewicz, Marta Makova, Kateryna D. Medvedev, Paul Deciphering highly similar multigene family transcripts from Iso-Seq data with IsoCon |
title | Deciphering highly similar multigene family transcripts from Iso-Seq data with IsoCon |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6214943/ https://www.ncbi.nlm.nih.gov/pubmed/30389934 http://dx.doi.org/10.1038/s41467-018-06910-x |