
High-resolution species assignment of Anopheles mosquitoes using k-mer distances on targeted sequences

The ANOSPP amplicon panel is a genus-wide targeted sequencing panel to facilitate large-scale monitoring of Anopheles species diversity. Combining information from the 62 nuclear amplicons present in the ANOSPP panel allows for a more sensitive and specific species assignment than single-gene (e.g. C...


Bibliographic Details
Main authors: Boddé, Marilou, Makunin, Alex, Ayala, Diego, Bouafou, Lemonde, Diabaté, Abdoulaye, Ekpo, Uwem Friday, Kientega, Mahamadi, Le Goff, Gilbert, Makanga, Boris K, Ngangue, Marc F, Omitola, Olaitan Olamide, Rahola, Nil, Tripet, Frederic, Durbin, Richard, Lawniczak, Mara KN
Format: Online Article Text
Language: English
Published: eLife Sciences Publications, Ltd 2022
Subjects: Evolutionary Biology
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9648975/
https://www.ncbi.nlm.nih.gov/pubmed/36222650
http://dx.doi.org/10.7554/eLife.78775
author Boddé, Marilou
Makunin, Alex
Ayala, Diego
Bouafou, Lemonde
Diabaté, Abdoulaye
Ekpo, Uwem Friday
Kientega, Mahamadi
Le Goff, Gilbert
Makanga, Boris K
Ngangue, Marc F
Omitola, Olaitan Olamide
Rahola, Nil
Tripet, Frederic
Durbin, Richard
Lawniczak, Mara KN
author_sort Boddé, Marilou
collection PubMed
description The ANOSPP amplicon panel is a genus-wide targeted sequencing panel to facilitate large-scale monitoring of Anopheles species diversity. Combining information from the 62 nuclear amplicons present in the ANOSPP panel allows for a more sensitive and specific species assignment than single-gene (e.g. COI) barcoding, which is desirable in the light of permeable species boundaries. Here, we present NNoVAE, a method using Nearest Neighbours (NN) and Variational Autoencoders (VAE), which we apply to k-mers resulting from the ANOSPP amplicon sequences in order to hierarchically assign species identity. The NN step assigns a sample to a species-group by comparing the k-mers arising from each haplotype’s amplicon sequence to a reference database. The VAE step is required to distinguish between closely related species, and also has sufficient resolution to reveal population structure within species. In tests on independent samples with over 80% amplicon coverage, NNoVAE correctly classifies to species level 98% of samples within the An. gambiae complex and 89% of samples outside the complex. We apply NNoVAE to over two thousand new samples from Burkina Faso and Gabon, identifying unexpected species in Gabon. NNoVAE presents an approach that may be of value to other targeted sequencing panels, and is a method that will be used to survey Anopheles species diversity and Plasmodium transmission patterns through space and time on a large scale, with plans to analyse half a million mosquitoes in the next five years.
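The nearest-neighbour idea in the description above (comparing k-mers from a query sequence to a reference database) can be illustrated with a minimal sketch. This is not the NNoVAE implementation: the distance measure (a normalised L1 distance on k-mer counts), the value of k, the function names, and the toy sequences and group labels are all assumptions chosen for illustration, and the sketch compares one sequence per reference rather than working per amplicon and per haplotype.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping k-mer in a sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_distance(a, b):
    """Normalised L1 distance between two k-mer count tables.

    Returns 0.0 for identical tables and 1.0 when no k-mer is shared.
    (Hypothetical choice of distance, not taken from the source.)
    """
    total = sum(a.values()) + sum(b.values())
    if total == 0:
        return 0.0
    diff = sum(abs(a[key] - b[key]) for key in a.keys() | b.keys())
    return diff / total


def nearest_group(query_seq, reference, k=4):
    """Return (label, distance) for the reference closest to the query."""
    query = kmer_counts(query_seq, k)
    scored = (
        (label, kmer_distance(query, kmer_counts(ref_seq, k)))
        for label, ref_seq in reference.items()
    )
    return min(scored, key=lambda pair: pair[1])


# Toy reference database with made-up labels and sequences.
reference = {
    "group_A": "ACGTACGTACGTACGT",
    "group_B": "TTTTGGGGCCCCAAAA",
}
query = "ACGTACGTACGAACGT"  # group_A with a single substitution
label, distance = nearest_group(query, reference)
```

Here `label` is the reference group whose k-mer profile is closest to the query. A real panel would repeat this comparison for each amplicon and haplotype and combine the results before assigning a species-group.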
format Online
Article
Text
id pubmed-9648975
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher eLife Sciences Publications, Ltd
record_format MEDLINE/PubMed
spelling pubmed-9648975 2022-11-15 High-resolution species assignment of Anopheles mosquitoes using k-mer distances on targeted sequences eLife Evolutionary Biology
eLife Sciences Publications, Ltd 2022-10-12 /pmc/articles/PMC9648975/ /pubmed/36222650 http://dx.doi.org/10.7554/eLife.78775 Text en © 2022, Boddé et al https://creativecommons.org/licenses/by/4.0/ This article is distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use and redistribution provided that the original author and source are credited.
title High-resolution species assignment of Anopheles mosquitoes using k-mer distances on targeted sequences
topic Evolutionary Biology
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9648975/
https://www.ncbi.nlm.nih.gov/pubmed/36222650
http://dx.doi.org/10.7554/eLife.78775