
Detecting fabrication in large-scale molecular omics data

Fraud is a pervasive problem and can occur as fabrication, falsification, plagiarism, or theft. The scientific community is not exempt from this universal problem and several studies have recently been caught manipulating or fabricating data. Current measures to prevent and deter scientific miscondu...


Bibliographic Details
Main authors: Bradshaw, Michael S., Payne, Samuel H.
Format: Online Article Text
Language: English
Published: Public Library of Science 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8631639/
https://www.ncbi.nlm.nih.gov/pubmed/34847169
http://dx.doi.org/10.1371/journal.pone.0260395
author Bradshaw, Michael S.
Payne, Samuel H.
collection PubMed
description Fraud is a pervasive problem and can occur as fabrication, falsification, plagiarism, or theft. The scientific community is not exempt from this universal problem and several studies have recently been caught manipulating or fabricating data. Current measures to prevent and deter scientific misconduct come in the form of the peer-review process and on-site clinical trial auditors. As recent advances in high-throughput omics technologies have moved biology into the realm of big data, fraud detection methods must be updated to handle sophisticated computational fraud. In the financial sector, machine learning and digit frequencies are successfully used to detect fraud. Drawing from these sources, we develop methods of fabrication detection in biomedical research and show that machine learning can be used to detect fraud in large-scale omics experiments. Using gene copy-number data as input, machine learning models correctly predicted fraud with 58–100% accuracy. With digit frequencies as input features, the models detected fraud with 82–100% accuracy. All of the data and analysis scripts used in this project are available at https://github.com/MSBradshaw/FakeData.
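The description above mentions digit frequencies as input features but does not say how they are computed. The following is a minimal sketch of one possible way to build such features, not the authors' implementation: it assumes the leading significant digit of each value is the digit of interest, and the function name `first_digit_frequencies` is hypothetical.

```python
import math
from collections import Counter


def first_digit_frequencies(values):
    """Return the relative frequency of leading digits 1-9.

    Assumption for illustration: the "digit" is the first significant
    digit of each value. Zeros and non-finite values are skipped.
    """
    digits = []
    for v in values:
        v = abs(float(v))
        if v == 0 or not math.isfinite(v):
            continue
        # In scientific notation the first character is the leading digit.
        digits.append(int(f"{v:.15e}"[0]))
    counts = Counter(digits)
    total = len(digits)
    if total == 0:
        return [0.0] * 9
    return [counts.get(d, 0) / total for d in range(1, 10)]


# Hypothetical sample: one row of measurements becomes a 9-element feature vector.
features = first_digit_frequencies([123.4, 0.0123, 25, 0, -1.9])
```

A vector like `features` could then be passed, one per sample, to any standard classifier; which digits and which models the study actually used should be checked against the linked repository.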
format Online
Article
Text
id pubmed-8631639
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-8631639 2021-12-01 Detecting fabrication in large-scale molecular omics data Bradshaw, Michael S. Payne, Samuel H. PLoS One Research Article Public Library of Science 2021-11-30 /pmc/articles/PMC8631639/ /pubmed/34847169 http://dx.doi.org/10.1371/journal.pone.0260395 Text en © 2021 Bradshaw, Payne. This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Detecting fabrication in large-scale molecular omics data
topic Research Article