Analyzing big datasets of genomic sequences: fast and scalable collection of k-mer statistics

BACKGROUND: Distributed approaches based on the MapReduce programming paradigm have started to be proposed in the Bioinformatics domain, due to the large amount of data produced by the next-generation sequencing techniques. However, the use of MapReduce and related Big Data technologies and framewor...

Bibliographic Details
Main Authors:	Ferraro Petrillo, Umberto; Sorella, Mara; Cattaneo, Giuseppe; Giancarlo, Raffaele; Rombo, Simona E.
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2019
Subjects:	Research
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6471689/ https://www.ncbi.nlm.nih.gov/pubmed/30999863 http://dx.doi.org/10.1186/s12859-019-2694-8

Similar Items

DIAMIN: a software library for the distributed analysis of large-scale molecular interaction networks
by: Di Rocco, Lorenzo, et al.
Published: (2022)

FASTA/Q data compressors for MapReduce-Hadoop genomics: space and time savings made easy
by: Ferraro Petrillo, Umberto, et al.
Published: (2021)

Correction to: FASTA/Q data compressors for MapReduce-Hadoop genomics: space and time savings made easy
by: Ferraro Petrillo, Umberto, et al.
Published: (2022)

Understanding big data scalability
by: Isaacson, Cory
Published: (2015)

Statistical tests and identifiability conditions for pooling and analyzing multisite datasets
by: Zhou, Hao Henry, et al.
Published: (2018)

Neutralization of MERS coronavirus through a scalable nanoparticle vaccine
by: Mohsen, Mona O., et al.
Published: (2021)

Ten quick tips for bioinformatics analyses using an Apache Spark distributed computing environment
by: Chicco, Davide, et al.
Published: (2023)

iMOKA: k-mer based software to analyze large collections of sequencing data
by: Lorenzi, Claudio, et al.
Published: (2020)

Airpart: interpretable statistical models for analyzing allelic imbalance in single-cell datasets
by: Mu, Wancen, et al.
Published: (2022)

THiCweed: fast, sensitive detection of sequence features by clustering big datasets
by: Agrawal, Ankit, et al.
Published: (2018)

Scalability and Validation of Big Data Bioinformatics Software
by: Yang, Andrian, et al.
Published: (2017)

Scalable biclustering — the future of big data exploration?
by: Orzechowski, Patryk, et al.
Published: (2019)

STAT: a fast, scalable, MinHash-based k-mer tool to assess Sequence Read Archive next-generation sequence submissions
by: Katz, Kenneth S., et al.
Published: (2021)

Fast multidimensional ensemble empirical mode decomposition for the analysis of big spatio-temporal datasets
by: Wu, Zhaohua, et al.
Published: (2016)

The K-mer antibiotic resistance gene variant analyzer (KARGVA)
by: Marini, Simone, et al.
Published: (2023)

Upscaling Statistical Patterns from Reduced Storage in Social and Life Science Big Datasets
by: Garlaschi, Stefano, et al.
Published: (2020)

Analyzing Big Data in Psychology: A Split/Analyze/Meta-Analyze Approach
by: Cheung, Mike W.-L., et al.
Published: (2016)

Matrisome AnalyzeR: A suite of tools to annotate and quantify ECM molecules in big datasets across organisms
by: Petrov, Petar B., et al.
Published: (2023)

Matrisome AnalyzeR – a suite of tools to annotate and quantify ECM molecules in big datasets across organisms
by: Petrov, Petar B., et al.
Published: (2023)

Dataset for mosquito collections on Big Pine Key, Florida, USA
by: Hribar, Lawrence J.
Published: (2019)

Fast and scalable image auto-tagging
by: Frejaville, Camille
Published: (2014)

Investigation of nonlinear epidemiological models for analyzing and controlling the MERS outbreak in Korea
by: Ahn, Inkyung, et al.
Published: (2018)

SPOROS: A pipeline to analyze DISE/6mer seed toxicity
by: Bartom, Elizabeth T., et al.
Published: (2022)

Scalable methods for analyzing and visualizing phylogenetic placement of metagenomic samples
by: Czech, Lucas, et al.
Published: (2019)

Scalable Iterative Classification for Sanitizing Large-Scale Datasets
by: Li, Bo, et al.
Published: (2017)

Scalable big data architecture: a practitioner's guide to choosing relevant big data architecture
by: Azarmi, Bahaaldine
Published: (2016)

Libra: scalable k-mer–based tool for massive all-vs-all metagenome comparisons
by: Choi, Illyoung, et al.
Published: (2018)

Scalable Genome Assembly through Parallel de Bruijn Graph Construction for Multiple k-mers
by: Mahadik, Kanak, et al.
Published: (2019)

REINDEER: efficient indexing of k-mer presence and abundance in sequencing datasets
by: Marchet, Camille, et al.
Published: (2020)

Selecting Accurate Classifier Models for a MERS-CoV Dataset
by: AlMoammar, Afnan, et al.
Published: (2018)

Hands-on big data analytics with PySpark: analyze large datasets and discover techniques for testing, immunizing, and parallelizing Spark jobs
by: Lai, Rudy, et al.
Published: (2019)

Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation
by: Koren, Sergey, et al.
Published: (2017)

Correction: Scalable methods for analyzing and visualizing phylogenetic placement of metagenomic samples
Published: (2019)

Hierarchical sets: analyzing pangenome structure through scalable set visualizations
by: Pedersen, Thomas Lin
Published: (2017)

Scalable Extraction of Big Macromolecular Data in Azure Data Lake Environment
by: Mrozek, Dariusz, et al.
Published: (2019)

Node Conductance: A Scalable Node Centrality Measure on Big Networks
by: Lyu, Tianshu, et al.
Published: (2020)

Ultrafast and scalable variant annotation and prioritization with big functional genomics data
by: Huang, Dandan, et al.
Published: (2020)

KAnalyze: a fast versatile pipelined K-mer toolkit
by: Audano, Peter, et al.
Published: (2014)

BigWig and BigBed: enabling browsing of large distributed datasets
by: Kent, W. J., et al.
Published: (2010)

Statistical methods for analyzing immunosignatures
by: Brown, Justin R, et al.
Published: (2011)
