A crowdsourced set of curated structural variants for the human genome
A high-quality benchmark for small variants encompassing 88 to 90% of the reference genome has been developed for seven Genome in a Bottle (GIAB) reference samples. However, a reliable benchmark for large indels and structural variants (SVs) is more challenging. In this study, we manually curated 123...
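Main authors: | Chapman, Lesley M.; Spies, Noah; Pai, Patrick; Lim, Chun Shen; Carroll, Andrew; Narzisi, Giuseppe; Watson, Christopher M.; Proukakis, Christos; Clarke, Wayne E.; Nariai, Naoki; Dawson, Eric; Jones, Garan; Blankenberg, Daniel; Brueffer, Christian; Xiao, Chunlin; Kolora, Sree Rohit Raj; Alexander, Noah; Wolujewicz, Paul; Ahmed, Azza E.; Smith, Graeme; Shehreen, Saadlee; Wenger, Aaron M.; Salit, Marc; Zook, Justin M. |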
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2020 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7329145/ https://www.ncbi.nlm.nih.gov/pubmed/32559231 http://dx.doi.org/10.1371/journal.pcbi.1007933 |
_version_ | 1783552859725365248 |
---|---|
author | Chapman, Lesley M. Spies, Noah Pai, Patrick Lim, Chun Shen Carroll, Andrew Narzisi, Giuseppe Watson, Christopher M. Proukakis, Christos Clarke, Wayne E. Nariai, Naoki Dawson, Eric Jones, Garan Blankenberg, Daniel Brueffer, Christian Xiao, Chunlin Kolora, Sree Rohit Raj Alexander, Noah Wolujewicz, Paul Ahmed, Azza E. Smith, Graeme Shehreen, Saadlee Wenger, Aaron M. Salit, Marc Zook, Justin M. |
author_sort | Chapman, Lesley M. |
collection | PubMed |
description | A high-quality benchmark for small variants encompassing 88 to 90% of the reference genome has been developed for seven Genome in a Bottle (GIAB) reference samples. However, a reliable benchmark for large indels and structural variants (SVs) is more challenging. In this study, we manually curated 1235 SVs, which can ultimately be used to evaluate SV callers or train machine learning models. We developed a crowdsourcing app, SVCurator, to help GIAB curators manually review large indels and SVs within the human genome and report their genotype and size accuracy. SVCurator displays images from short, long, and linked read sequencing data from the GIAB Ashkenazi Jewish Trio son [NIST RM 8391/HG002]. We asked curators to assign labels describing SV type (deletion or insertion), size accuracy, and genotype for 1235 putative insertions and deletions sampled from different size bins between 20 and 892,149 bp. ‘Expert’ curators were 93% concordant with each other, and 37 of the 61 curators had at least 78% concordance with a set of ‘expert’ curators. The curators were least concordant for complex SVs and SVs that had inaccurate breakpoints or size predictions. After filtering events with low concordance among curators, we produced high-confidence labels for 935 events. The SVCurator crowdsourced labels were 94.5% concordant with the heuristic-based draft benchmark SV callset from GIAB. We found that curators can successfully evaluate putative SVs when given evidence from multiple sequencing technologies. |
format | Online Article Text |
id | pubmed-7329145 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-7329145 2020-07-14 A crowdsourced set of curated structural variants for the human genome PLoS Comput Biol Research Article Public Library of Science 2020-06-19 /pmc/articles/PMC7329145/ /pubmed/32559231 http://dx.doi.org/10.1371/journal.pcbi.1007933 Text en https://creativecommons.org/publicdomain/zero/1.0/ This is an open access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 (https://creativecommons.org/publicdomain/zero/1.0/) public domain dedication. |
title | A crowdsourced set of curated structural variants for the human genome |
title_sort | crowdsourced set of curated structural variants for the human genome |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7329145/ https://www.ncbi.nlm.nih.gov/pubmed/32559231 http://dx.doi.org/10.1371/journal.pcbi.1007933 |
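
The description field in this record mentions concordance between curators and the filtering of events with low concordance. The record does not say how concordance was computed, so the following is only a minimal Python sketch under stated assumptions: the label data, the function names, and the 0.6 threshold are all hypothetical and are not taken from the paper or from SVCurator.

```python
from collections import Counter

# Hypothetical curator labels: event id -> {curator id -> label}.
# These values are illustrative only and are not from the study.
labels = {
    "sv1": {"c1": "DEL", "c2": "DEL", "c3": "DEL"},
    "sv2": {"c1": "INS", "c2": "DEL", "c3": "INS"},
    "sv3": {"c1": "DEL", "c2": "INS", "c3": "complex"},
}


def majority_fraction(votes):
    """Return (most common label, fraction of curators who chose it)."""
    counts = Counter(votes.values())
    label, n = counts.most_common(1)[0]
    return label, n / len(votes)


def pairwise_concordance(labels, a, b):
    """Fraction of events labelled by both curators on which they agree."""
    shared = [v for v in labels.values() if a in v and b in v]
    if not shared:
        return float("nan")
    return sum(v[a] == v[b] for v in shared) / len(shared)


def high_confidence(labels, threshold=0.6):
    """Keep events whose majority label reaches the assumed threshold."""
    kept = {}
    for event, votes in labels.items():
        label, frac = majority_fraction(votes)
        if frac >= threshold:
            kept[event] = label
    return kept


if __name__ == "__main__":
    print(high_confidence(labels))
    print(pairwise_concordance(labels, "c1", "c3"))
```

In this toy data, `sv3` is dropped because no label is shared by enough curators, which mirrors the general idea of discarding low-concordance events. The actual criteria used in the study may differ.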