
The utility of PacBio circular consensus sequencing for characterizing complex gene families in non-model organisms

BACKGROUND: Molecular characterization of highly diverse gene families can be time consuming, expensive, and difficult, especially when considering the potential for relatively large numbers of paralogs and/or pseudogenes. Here we investigate the utility of Pacific Biosciences single molecule real-t...


Bibliographic Details
Main Authors: Larsen, Peter A, Heilman, Amy M, Yoder, Anne D
Format: Online Article Text
Language: English
Published: BioMed Central 2014
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4152597/
https://www.ncbi.nlm.nih.gov/pubmed/25159659
http://dx.doi.org/10.1186/1471-2164-15-720
author Larsen, Peter A
Heilman, Amy M
Yoder, Anne D
author_sort Larsen, Peter A
collection PubMed
description BACKGROUND: Molecular characterization of highly diverse gene families can be time consuming, expensive, and difficult, especially when considering the potential for relatively large numbers of paralogs and/or pseudogenes. Here we investigate the utility of Pacific Biosciences single molecule real-time (SMRT) circular consensus sequencing (CCS) as an alternative to traditional cloning and Sanger sequencing PCR amplicons for gene family characterization. We target vomeronasal gene receptors, one of the most diverse gene families in mammals, with the goal of better understanding intra-specific V1R diversity of the gray mouse lemur (Microcebus murinus). Our study compares intragenomic variation for two V1R subfamilies found in the mouse lemur. Specifically, we compare gene copy variation within and between two individuals of M. murinus as characterized by different methods for nucleotide sequencing. By including the same individual animal from which the M. murinus draft genome was derived, we are able to cross-validate gene copy estimates from Sanger sequencing versus CCS methods. RESULTS: We generated 34,088 high quality circular consensus sequences of two diverse V1R subfamilies (here referred to as V1RI and V1RIX) from two individuals of Microcebus murinus. Using a minimum threshold of 7× coverage, we recovered approximately 90% of V1RI sequences previously identified in the draft M. murinus genome (59% being identical at all nucleotide positions). When low coverage sequences were considered (i.e. < 7× coverage) 100% of V1RI sequences identified in the draft genome were recovered. At least 13 putatively novel V1R loci were also identified using CCS technology. CONCLUSIONS: Recent upgrades to the Pacific Biosciences RS instrument have improved the CCS technology and offer an alternative to traditional sequencing approaches. Our results suggest that the Microcebus murinus V1R repertoire has been underestimated in the draft genome. 
In addition to providing an improved understanding of V1R diversity in the mouse lemur, this study demonstrates the utility of CCS technology for characterizing complex regions of the genome. We anticipate that long-read sequencing technologies such as PacBio SMRT will allow for the assembly of multigene family clusters and serve to more accurately characterize patterns of gene copy variation in large gene families, thus revealing novel micro-evolutionary patterns within non-model organisms. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/1471-2164-15-720) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-4152597
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-41525972014-09-09 The utility of PacBio circular consensus sequencing for characterizing complex gene families in non-model organisms Larsen, Peter A Heilman, Amy M Yoder, Anne D BMC Genomics Research Article BACKGROUND: Molecular characterization of highly diverse gene families can be time consuming, expensive, and difficult, especially when considering the potential for relatively large numbers of paralogs and/or pseudogenes. Here we investigate the utility of Pacific Biosciences single molecule real-time (SMRT) circular consensus sequencing (CCS) as an alternative to traditional cloning and Sanger sequencing PCR amplicons for gene family characterization. We target vomeronasal gene receptors, one of the most diverse gene families in mammals, with the goal of better understanding intra-specific V1R diversity of the gray mouse lemur (Microcebus murinus). Our study compares intragenomic variation for two V1R subfamilies found in the mouse lemur. Specifically, we compare gene copy variation within and between two individuals of M. murinus as characterized by different methods for nucleotide sequencing. By including the same individual animal from which the M. murinus draft genome was derived, we are able to cross-validate gene copy estimates from Sanger sequencing versus CCS methods. RESULTS: We generated 34,088 high quality circular consensus sequences of two diverse V1R subfamilies (here referred to as V1RI and V1RIX) from two individuals of Microcebus murinus. Using a minimum threshold of 7× coverage, we recovered approximately 90% of V1RI sequences previously identified in the draft M. murinus genome (59% being identical at all nucleotide positions). When low coverage sequences were considered (i.e. < 7× coverage) 100% of V1RI sequences identified in the draft genome were recovered. At least 13 putatively novel V1R loci were also identified using CCS technology. 
CONCLUSIONS: Recent upgrades to the Pacific Biosciences RS instrument have improved the CCS technology and offer an alternative to traditional sequencing approaches. Our results suggest that the Microcebus murinus V1R repertoire has been underestimated in the draft genome. In addition to providing an improved understanding of V1R diversity in the mouse lemur, this study demonstrates the utility of CCS technology for characterizing complex regions of the genome. We anticipate that long-read sequencing technologies such as PacBio SMRT will allow for the assembly of multigene family clusters and serve to more accurately characterize patterns of gene copy variation in large gene families, thus revealing novel micro-evolutionary patterns within non-model organisms. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/1471-2164-15-720) contains supplementary material, which is available to authorized users. BioMed Central 2014-08-26 /pmc/articles/PMC4152597/ /pubmed/25159659 http://dx.doi.org/10.1186/1471-2164-15-720 Text en © Larsen et al.; licensee BioMed Central Ltd. 2014 This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Research Article
Larsen, Peter A
Heilman, Amy M
Yoder, Anne D
The utility of PacBio circular consensus sequencing for characterizing complex gene families in non-model organisms
title The utility of PacBio circular consensus sequencing for characterizing complex gene families in non-model organisms
title_sort utility of pacbio circular consensus sequencing for characterizing complex gene families in non-model organisms
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4152597/
https://www.ncbi.nlm.nih.gov/pubmed/25159659
http://dx.doi.org/10.1186/1471-2164-15-720