S136. CLASSIFYING SCHIZOPHRENIA USING PHONOLOGICAL, SEMANTIC AND SYNTACTIC FEATURES OF LANGUAGE; A COMBINATORY MACHINE LEARNING APPROACH
BACKGROUND: The diagnosis of schizophrenia is currently based on anamnesis and psychiatric examination only. Language biomarkers may be useful to provide a quantitative and reproducible risk estimate for this spectrum of disorders. While people with schizophrenia spectrum disorders may show one or m...
Main authors: | Voppel, Alban; de Boer, Janna; Slegers, Fleur; Schnack, Hugo; Sommer, Iris |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2020 |
Subjects: | Poster Session I |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7234480/ http://dx.doi.org/10.1093/schbul/sbaa031.202 |
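
The description field of this record states that semantic features were calculated with a word2vec model and a moving-window coherence approach, and that increased variance of coherence was the most distinguishing semantic feature. The record does not give the window size or the exact similarity measure, so the following is only a minimal sketch under stated assumptions: the function names, the window size of 2 and the two-dimensional toy vectors are hypothetical stand-ins for real word2vec embeddings, and mean pairwise cosine similarity is one plausible way to score a window, not necessarily the authors' method.

```python
import math
from statistics import mean, pvariance


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def window_coherence(vectors, window=3):
    """Mean pairwise cosine similarity inside each sliding window of word vectors."""
    if window < 2 or len(vectors) < window:
        raise ValueError("need a window of at least 2 and at least `window` vectors")
    scores = []
    for start in range(len(vectors) - window + 1):
        chunk = vectors[start:start + window]
        pairs = [cosine(chunk[i], chunk[j])
                 for i in range(window) for j in range(i + 1, window)]
        scores.append(mean(pairs))
    return scores


def coherence_summary(vectors, window=3):
    """Summarise a transcript by the mean and variance of its window scores."""
    scores = window_coherence(vectors, window)
    return {"mean": mean(scores), "variance": pvariance(scores)}


# Hypothetical toy "embeddings": three words on one topic, then an abrupt switch.
TOY_VECTORS = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]

summary = coherence_summary(TOY_VECTORS, window=2)
print(summary)
```

In this toy input the abrupt topic switch produces one low-coherence window among otherwise coherent ones, which is what raises the variance. With real data the vectors would come from a trained embedding model rather than being written by hand.
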
_version_ | 1783535772831318016 |
---|---|
author | Voppel, Alban; de Boer, Janna; Slegers, Fleur; Schnack, Hugo; Sommer, Iris |
author_facet | Voppel, Alban; de Boer, Janna; Slegers, Fleur; Schnack, Hugo; Sommer, Iris |
author_sort | Voppel, Alban |
collection | PubMed |
description | BACKGROUND: The diagnosis of schizophrenia is currently based on anamnesis and psychiatric examination only. Language biomarkers may be useful to provide a quantitative and reproducible risk estimate for this spectrum of disorders. While people with schizophrenia spectrum disorders may show one or more language abnormalities, such as incoherence, affective flattening, failure of reference and changes in sentence length and complexity, the clinical picture can vary widely between individuals, and language abnormalities will reflect this heterogeneity. Computational linguistics can be used to quantify these features of language. Because the symptoms present in schizophrenia spectrum subjects are heterogeneous, we expect some subjects to show semantic incoherence, while others may have more affective symptoms such as monotonous speech. Here, we combine phonological, semantic and syntactic features of semi-spontaneous language with machine learning algorithms for classification in order to develop a biomarker sensitive to the broad spectrum of schizophrenia. METHODS: Semi-spontaneous natural language samples were collected from 50 subjects with schizophrenia spectrum disorders and 50 controls matched for age, gender and parental education, using recorded neutral-topic, open-ended interviews. The audio samples were speaker coded; audio belonging to the subject was extracted and transcribed. Phonological features were extracted using OpenSMILE, semantic features were calculated using a word2vec model with a moving-window coherence approach, and syntactic features were calculated using the T-scan tool. Feature reduction was applied to each of the domains. To distinguish groups, results from machine learning classifiers trained using leave-one-out cross-validation on each of these domains were combined through a voting mechanism. RESULTS: The machine-learning classifier approach obtained 75–78% accuracy for the semantic, syntactic and phonological domains individually. The most distinguishing features in each domain were reduced timbre and intonation for the phonological domain, increased variance of coherence for the semantic domain and decreased complexity of speech for the syntactic domain. The combined approach, using a voting algorithm across the domains, achieved an accuracy of 83% and a precision score of 89%. No significant differences in age, gender or parental education were found between healthy controls and subjects with schizophrenia spectrum disorders. DISCUSSION: In this study we demonstrated that computational features derived from different linguistic domains capture aspects of symptomatic language of schizophrenia spectrum disorder subjects. Combining these features improved classification for this heterogeneous disorder, as we showed high accuracy and precision from the language parameters in distinguishing schizophrenia patients from healthy controls. These values are better than those obtained with imaging or blood analyses, while language is a more easily obtained and cheaper measure than those derived from other methods. Validation in an independent sample is required, and further differentiating features should be extracted for each domain. Our positive results in using language abnormalities to automatically detect schizophrenia show that computational linguistics is a promising method in the search for reliable markers in psychiatry. |
format | Online Article Text |
id | pubmed-7234480 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-7234480 2020-05-23 S136. CLASSIFYING SCHIZOPHRENIA USING PHONOLOGICAL, SEMANTIC AND SYNTACTIC FEATURES OF LANGUAGE; A COMBINATORY MACHINE LEARNING APPROACH Voppel, Alban; de Boer, Janna; Slegers, Fleur; Schnack, Hugo; Sommer, Iris Schizophr Bull Poster Session I Oxford University Press 2020-05 2020-05-18 /pmc/articles/PMC7234480/ http://dx.doi.org/10.1093/schbul/sbaa031.202 Text en © The Author(s) 2020. Published by Oxford University Press on behalf of the Maryland Psychiatric Research Center. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
title | S136. CLASSIFYING SCHIZOPHRENIA USING PHONOLOGICAL, SEMANTIC AND SYNTACTIC FEATURES OF LANGUAGE; A COMBINATORY MACHINE LEARNING APPROACH |
topic | Poster Session I |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7234480/ http://dx.doi.org/10.1093/schbul/sbaa031.202 |
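
The description field above reports that classifiers were trained per domain with leave-one-out cross-validation and then combined through a voting mechanism. The record does not name the classifier or the voting rule, so the sketch below rests on assumptions: a nearest-centroid classifier is swapped in as a simple stand-in, voting is a plain majority over the three domains, and the six-subject, one-feature-per-domain data set is invented purely for illustration. All function and variable names are hypothetical.

```python
from collections import Counter


def centroid(rows):
    """Column-wise mean of a list of feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def predict_nearest_centroid(train_x, train_y, sample):
    """Stand-in classifier: return the label whose class centroid is closest."""
    centroids = {}
    for label in sorted(set(train_y)):
        rows = [x for x, y in zip(train_x, train_y) if y == label]
        centroids[label] = centroid(rows)
    return min(centroids, key=lambda label: sq_dist(centroids[label], sample))


def loocv_predictions(features, labels):
    """Leave-one-out: predict each subject from a model fit on all the others."""
    preds = []
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        preds.append(predict_nearest_centroid(train_x, train_y, features[i]))
    return preds


def majority_vote(per_domain_preds):
    """Per subject, return the label chosen by most domains (odd number of domains)."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*per_domain_preds)]


def accuracy_and_precision(y_true, y_pred, positive=1):
    """Accuracy = correct / total; precision = TP / (TP + FP) for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    return correct / len(y_true), precision


# Invented toy data: 1 = patient, 0 = control; one feature per domain.
labels = [1, 1, 1, 0, 0, 0]
domains = {
    "phonological": [[0.9], [0.8], [0.2], [0.1], [0.2], [0.3]],
    "semantic": [[0.7], [0.3], [0.8], [0.2], [0.1], [0.3]],
    "syntactic": [[0.8], [0.9], [0.7], [0.6], [0.2], [0.1]],
}

per_domain = {name: loocv_predictions(x, labels) for name, x in domains.items()}
combined = majority_vote(list(per_domain.values()))
accuracy, precision = accuracy_and_precision(labels, combined)
print(per_domain, combined, accuracy, precision)
```

In this toy data each domain misclassifies a different subject, so the majority vote recovers every label even though no single domain does. That mirrors the motivation given in the abstract for combining domains in a heterogeneous disorder, but the numbers here say nothing about the study's actual results.
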