
A crowdsourced set of curated structural variants for the human genome

A high-quality benchmark for small variants encompassing 88 to 90% of the reference genome has been developed for seven Genome in a Bottle (GIAB) reference samples. However, a reliable benchmark for large indels and structural variants (SVs) is more challenging. In this study, we manually curated 123...


Bibliographic Details
Main authors: Chapman, Lesley M., Spies, Noah, Pai, Patrick, Lim, Chun Shen, Carroll, Andrew, Narzisi, Giuseppe, Watson, Christopher M., Proukakis, Christos, Clarke, Wayne E., Nariai, Naoki, Dawson, Eric, Jones, Garan, Blankenberg, Daniel, Brueffer, Christian, Xiao, Chunlin, Kolora, Sree Rohit Raj, Alexander, Noah, Wolujewicz, Paul, Ahmed, Azza E., Smith, Graeme, Shehreen, Saadlee, Wenger, Aaron M., Salit, Marc, Zook, Justin M.
Format: Online Article Text
Language: English
Published: Public Library of Science 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7329145/
https://www.ncbi.nlm.nih.gov/pubmed/32559231
http://dx.doi.org/10.1371/journal.pcbi.1007933
_version_ 1783552859725365248
author Chapman, Lesley M.
Spies, Noah
Pai, Patrick
Lim, Chun Shen
Carroll, Andrew
Narzisi, Giuseppe
Watson, Christopher M.
Proukakis, Christos
Clarke, Wayne E.
Nariai, Naoki
Dawson, Eric
Jones, Garan
Blankenberg, Daniel
Brueffer, Christian
Xiao, Chunlin
Kolora, Sree Rohit Raj
Alexander, Noah
Wolujewicz, Paul
Ahmed, Azza E.
Smith, Graeme
Shehreen, Saadlee
Wenger, Aaron M.
Salit, Marc
Zook, Justin M.
author_facet Chapman, Lesley M.
Spies, Noah
Pai, Patrick
Lim, Chun Shen
Carroll, Andrew
Narzisi, Giuseppe
Watson, Christopher M.
Proukakis, Christos
Clarke, Wayne E.
Nariai, Naoki
Dawson, Eric
Jones, Garan
Blankenberg, Daniel
Brueffer, Christian
Xiao, Chunlin
Kolora, Sree Rohit Raj
Alexander, Noah
Wolujewicz, Paul
Ahmed, Azza E.
Smith, Graeme
Shehreen, Saadlee
Wenger, Aaron M.
Salit, Marc
Zook, Justin M.
author_sort Chapman, Lesley M.
collection PubMed
description A high-quality benchmark for small variants encompassing 88 to 90% of the reference genome has been developed for seven Genome in a Bottle (GIAB) reference samples. However, a reliable benchmark for large indels and structural variants (SVs) is more challenging. In this study, we manually curated 1235 SVs, which can ultimately be used to evaluate SV callers or train machine learning models. We developed a crowdsourcing app, SVCurator, to help GIAB curators manually review large indels and SVs within the human genome, and report their genotype and size accuracy. SVCurator displays images from short, long, and linked read sequencing data from the GIAB Ashkenazi Jewish Trio son [NIST RM 8391/HG002]. We asked curators to assign labels describing SV type (deletion or insertion), size accuracy, and genotype for 1235 putative insertions and deletions sampled from different size bins between 20 and 892,149 bp. ‘Expert’ curators were 93% concordant with each other, and 37 of the 61 curators had at least 78% concordance with a set of ‘expert’ curators. The curators were least concordant for complex SVs and SVs that had inaccurate breakpoints or size predictions. After filtering events with low concordance among curators, we produced high-confidence labels for 935 events. The SVCurator crowdsourced labels were 94.5% concordant with the heuristic-based draft benchmark SV callset from GIAB. We found that curators can successfully evaluate putative SVs when given evidence from multiple sequencing technologies.
format Online
Article
Text
id pubmed-7329145
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-73291452020-07-14 A crowdsourced set of curated structural variants for the human genome Chapman, Lesley M. Spies, Noah Pai, Patrick Lim, Chun Shen Carroll, Andrew Narzisi, Giuseppe Watson, Christopher M. Proukakis, Christos Clarke, Wayne E. Nariai, Naoki Dawson, Eric Jones, Garan Blankenberg, Daniel Brueffer, Christian Xiao, Chunlin Kolora, Sree Rohit Raj Alexander, Noah Wolujewicz, Paul Ahmed, Azza E. Smith, Graeme Shehreen, Saadlee Wenger, Aaron M. Salit, Marc Zook, Justin M. PLoS Comput Biol Research Article A high quality benchmark for small variants encompassing 88 to 90% of the reference genome has been developed for seven Genome in a Bottle (GIAB) reference samples. However a reliable benchmark for large indels and structural variants (SVs) is more challenging. In this study, we manually curated 1235 SVs, which can ultimately be used to evaluate SV callers or train machine learning models. We developed a crowdsourcing app—SVCurator—to help GIAB curators manually review large indels and SVs within the human genome, and report their genotype and size accuracy. SVCurator displays images from short, long, and linked read sequencing data from the GIAB Ashkenazi Jewish Trio son [NIST RM 8391/HG002]. We asked curators to assign labels describing SV type (deletion or insertion), size accuracy, and genotype for 1235 putative insertions and deletions sampled from different size bins between 20 and 892,149 bp. ‘Expert’ curators were 93% concordant with each other, and 37 of the 61 curators had at least 78% concordance with a set of ‘expert’ curators. The curators were least concordant for complex SVs and SVs that had inaccurate breakpoints or size predictions. After filtering events with low concordance among curators, we produced high confidence labels for 935 events. The SVCurator crowdsourced labels were 94.5% concordant with the heuristic-based draft benchmark SV callset from GIAB. 
We found that curators can successfully evaluate putative SVs when given evidence from multiple sequencing technologies. Public Library of Science 2020-06-19 /pmc/articles/PMC7329145/ /pubmed/32559231 http://dx.doi.org/10.1371/journal.pcbi.1007933 Text en https://creativecommons.org/publicdomain/zero/1.0/ This is an open access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 (https://creativecommons.org/publicdomain/zero/1.0/) public domain dedication.
spellingShingle Research Article
Chapman, Lesley M.
Spies, Noah
Pai, Patrick
Lim, Chun Shen
Carroll, Andrew
Narzisi, Giuseppe
Watson, Christopher M.
Proukakis, Christos
Clarke, Wayne E.
Nariai, Naoki
Dawson, Eric
Jones, Garan
Blankenberg, Daniel
Brueffer, Christian
Xiao, Chunlin
Kolora, Sree Rohit Raj
Alexander, Noah
Wolujewicz, Paul
Ahmed, Azza E.
Smith, Graeme
Shehreen, Saadlee
Wenger, Aaron M.
Salit, Marc
Zook, Justin M.
A crowdsourced set of curated structural variants for the human genome
title A crowdsourced set of curated structural variants for the human genome
title_full A crowdsourced set of curated structural variants for the human genome
title_fullStr A crowdsourced set of curated structural variants for the human genome
title_full_unstemmed A crowdsourced set of curated structural variants for the human genome
title_short A crowdsourced set of curated structural variants for the human genome
title_sort crowdsourced set of curated structural variants for the human genome
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7329145/
https://www.ncbi.nlm.nih.gov/pubmed/32559231
http://dx.doi.org/10.1371/journal.pcbi.1007933