
SEXCMD: Development and validation of sex marker sequences for whole-exome/genome and RNA sequencing


Bibliographic Details
Main authors: Jeong, Seongmun; Kim, Jiwoong; Park, Won; Jeon, Hongmin; Kim, Namshin
Format: Online, Article, Text
Language: English
Published: Public Library of Science, 2017
Subjects: Research Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5590872/
https://www.ncbi.nlm.nih.gov/pubmed/28886064
http://dx.doi.org/10.1371/journal.pone.0184087
collection PubMed
description Over the last decade, a large number of nucleotide sequences have been generated by next-generation sequencing technologies and deposited in public databases. However, most of these datasets do not specify the sex of the sampled individuals because researchers typically ignore or hide this information. In many species, male and female genomes carry distinct sex chromosomes (XX/XY or ZZ/ZW), and the expression levels of many sex-related genes differ between the sexes. Herein, we describe how to develop sex marker sequences from syntenic regions of the sex chromosomes and how to use them to quickly identify the sex of the individuals being analyzed. Array-based technologies routinely use either known sex markers or the B-allele frequency of the X or Z chromosome to deduce the sex of an individual. The same strategy has been used with whole-exome/genome sequence data; however, all reads must then be aligned to a reference genome to determine the B-allele frequency of the X or Z chromosome. SEXCMD is a pipeline that extracts sex marker sequences from reference sex chromosomes and, after training on a dataset of known sex with a simple machine learning approach, rapidly identifies the sex of individuals from whole-exome/genome and RNA sequencing data. The pipeline counts the total number of hits to sex-specific marker sequences and identifies the sex of each sampled individual based on the fact that XX/ZZ samples have no Y or W chromosome hits. We have successfully validated our pipeline with mammalian (Homo sapiens; XY) and avian (Gallus gallus; ZW) genomes. Typical computation time when applying SEXCMD to human whole-exome or RNA sequencing datasets is a few minutes, and analyzing human whole-genome datasets takes about 10 minutes. Another important application of SEXCMD is as a quality control measure to avoid sample mix-ups before bioinformatics analysis. SEXCMD comprises simple Python and R scripts and is freely available at https://github.com/lovemun/SEXCMD.
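The hit-counting rule summarized in the description can be illustrated with a minimal Python sketch. It is not SEXCMD's code. It assumes per-sample counts of reads hitting the X/Z-linked markers and their syntenic Y/W-linked markers have already been obtained (for example, by aligning reads to the marker set), and the feature used here (the Y/W share of marker hits) and the single-threshold training rule are illustrative stand-ins for the "simple machine learning approach". The names `marker_fraction`, `fit_threshold`, and `call_sex` are hypothetical.

```python
from typing import Sequence


def marker_fraction(x_hits: int, y_hits: int) -> float:
    """Share of marker hits that land on Y- (or W-) linked markers.

    Hypothetical feature: homogametic samples (XX/ZZ) lack Y/W sequence,
    so their fraction should be near zero apart from mapping noise.
    """
    total = x_hits + y_hits
    if total == 0:
        raise ValueError("no marker hits; sex cannot be called")
    return y_hits / total


def fit_threshold(fractions: Sequence[float], heterogametic: Sequence[bool]) -> float:
    """Learn a cut-off from samples of known sex (illustrative training rule).

    Candidates are midpoints between consecutive distinct fractions; the one
    misclassifying the fewest labelled samples is returned.
    """
    if len(fractions) != len(heterogametic) or len(fractions) < 2:
        raise ValueError("need at least two labelled samples")
    values = sorted(set(fractions))
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])] or [values[0]]

    def errors(threshold: float) -> int:
        # Count samples whose predicted class disagrees with the known label.
        return sum((f > threshold) != label for f, label in zip(fractions, heterogametic))

    return min(candidates, key=errors)


def call_sex(x_hits: int, y_hits: int, threshold: float) -> str:
    """'XY/ZW' = heterogametic (male mammal, female bird); 'XX/ZZ' = homogametic."""
    return "XY/ZW" if marker_fraction(x_hits, y_hits) > threshold else "XX/ZZ"


if __name__ == "__main__":
    # Made-up training counts (x_hits, y_hits) with known sex labels.
    train = [(1000, 2), (950, 5), (600, 400), (500, 520)]
    labels = [False, False, True, True]
    t = fit_threshold([marker_fraction(x, y) for x, y in train], labels)
    print(call_sex(800, 300, t))  # XY/ZW
    print(call_sex(900, 1, t))    # XX/ZZ
```

A single threshold on one ratio keeps the sketch easy to inspect; with low-coverage or RNA data, where Y/W marker genes may be weakly expressed, a real pipeline would likely need more careful features and validation.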
id pubmed-5590872
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
Journal: PLoS One (Research Article), Public Library of Science, published 2017-09-08
License: © 2017 Jeong et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
topic Research Article