A bioinformatic platform to integrate target capture and whole genome sequences of various read depths for phylogenomics

The increasing availability of short‐read whole genome sequencing (WGS) provides unprecedented opportunities to study ecological and evolutionary processes. Although loci of interest can be extracted from WGS data and combined with target sequence data, this requires suitable bioinformatic workflows...

Bibliographic Details
Main Authors: G. Ribeiro, Pedro, Torres Jiménez, María Fernanda, Andermann, Tobias, Antonelli, Alexandre, Bacon, Christine D., Matos‐Maraví, Pável
Format: Online Article Text
Language: English
Published: John Wiley and Sons Inc. 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9298010/
https://www.ncbi.nlm.nih.gov/pubmed/34674330
http://dx.doi.org/10.1111/mec.16240
author G. Ribeiro, Pedro
Torres Jiménez, María Fernanda
Andermann, Tobias
Antonelli, Alexandre
Bacon, Christine D.
Matos‐Maraví, Pável
collection PubMed
description The increasing availability of short‐read whole genome sequencing (WGS) provides unprecedented opportunities to study ecological and evolutionary processes. Although loci of interest can be extracted from WGS data and combined with target sequence data, this requires suitable bioinformatic workflows. Here, we test different assembly and locus extraction strategies and implement them into secapr, a pipeline that processes short‐read data into multilocus alignments for phylogenetics and molecular ecology analyses. We integrate the processing of data from low‐coverage WGS (<30×) and target sequence capture into a flexible framework, while optimizing de novo contig assembly and loci extraction. Specifically, we test different assembly strategies by contrasting their ability to recover loci from targeted butterfly protein‐coding genes, using four data sets: a WGS data set across different average coverages (10×, 5× and 2×) and a data set for which these loci were enriched prior to sequencing via target sequence capture. Using the resulting de novo contigs, we account for potential errors within contigs and infer phylogenetic trees to evaluate the ability of each assembly strategy to recover species relationships. We demonstrate that choosing multiple sizes of kmer simultaneously for assembly results in the highest yield of extracted loci from de novo assembled contigs, while data sets derived from sequencing read depths as low as 5× recover the expected species relationships in phylogenetic trees. By making the tested assembly approaches available in the secapr pipeline, we hope to inspire future studies to incorporate complementary data and make an informed choice on the optimal assembly strategy.
format Online
Article
Text
id pubmed-9298010
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher John Wiley and Sons Inc.
record_format MEDLINE/PubMed
spelling pubmed-9298010 2022-07-21
Mol Ecol
Methodological Approaches and Advances for WGS
John Wiley and Sons Inc. 2021-10-31 2021-12
/pmc/articles/PMC9298010/ /pubmed/34674330 http://dx.doi.org/10.1111/mec.16240
Text en
© 2021 The Authors. Molecular Ecology published by John Wiley & Sons Ltd. This is an open access article under the terms of the CC BY-NC-ND 4.0 License (https://creativecommons.org/licenses/by-nc-nd/4.0/), which permits use and distribution in any medium, provided the original work is properly cited, the use is non‐commercial and no modifications or adaptations are made.
title A bioinformatic platform to integrate target capture and whole genome sequences of various read depths for phylogenomics
topic Methodological Approaches and Advances for WGS
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9298010/
https://www.ncbi.nlm.nih.gov/pubmed/34674330
http://dx.doi.org/10.1111/mec.16240