S136. CLASSIFYING SCHIZOPHRENIA USING PHONOLOGICAL, SEMANTIC AND SYNTACTIC FEATURES OF LANGUAGE; A COMBINATORY MACHINE LEARNING APPROACH

BACKGROUND: The diagnosis of schizophrenia is currently based on anamnesis and psychiatric examination only. Language biomarkers may be useful to provide a quantitative and reproducible risk estimate for this spectrum of disorders. While people with schizophrenia spectrum disorders may show one or m...

Bibliographic Details
Main authors: Voppel, Alban; de Boer, Janna; Slegers, Fleur; Schnack, Hugo; Sommer, Iris
Format: Online Article Text
Language: English
Published: Oxford University Press 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7234480/
http://dx.doi.org/10.1093/schbul/sbaa031.202
author Voppel, Alban
de Boer, Janna
Slegers, Fleur
Schnack, Hugo
Sommer, Iris
collection PubMed
description BACKGROUND: The diagnosis of schizophrenia is currently based on anamnesis and psychiatric examination only. Language biomarkers may provide a quantitative and reproducible risk estimate for this spectrum of disorders. People with schizophrenia spectrum disorders may show one or more language abnormalities, such as incoherence, affective flattening, failure of reference and changes in sentence length and complexity, but the clinical picture varies widely between individuals, and language abnormalities will reflect this heterogeneity. Computational linguistics can be used to quantify these features of language. Because the symptoms present in schizophrenia spectrum subjects are heterogeneous, we expect some subjects to show semantic incoherence, while others may have more affective symptoms such as monotonous speech. Here, we combine phonological, semantic and syntactic features of semi-spontaneous language with machine learning algorithms for classification in order to develop a biomarker sensitive to the broad spectrum of schizophrenia. METHODS: Semi-spontaneous natural language samples were collected from 50 subjects with schizophrenia spectrum disorders and 50 controls matched for age, gender and parental education, using recorded neutral-topic, open-ended interviews. The audio samples were speaker-coded; audio belonging to the subject was extracted and transcribed. Phonological features were extracted using OpenSMILE; semantic features were calculated with a word2vec model using a moving-window coherence approach; and syntactic aspects were calculated using the T-scan tool. Feature reduction was applied to each of the domains. To distinguish groups, results from machine learning classifiers trained using leave-one-out cross-validation on each of these aspects were combined, incorporating a voting mechanism.
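The moving-window coherence measure mentioned in the methods can be illustrated with a minimal sketch. The abstract does not give the window size, the similarity measure or the summary statistics, so everything here is an assumption: cosine similarity, a window of three words, and the function names `cosine`, `window_coherence` and `coherence_features` are hypothetical. Word vectors are passed in as plain lists; in practice they would come from a trained word2vec model.

```python
import math
from statistics import mean, pvariance


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def window_coherence(vectors, window=3):
    """Mean pairwise cosine similarity inside each sliding window of word vectors.

    Assumption: the window size and the use of pairwise means are illustrative
    choices, not taken from the abstract.
    """
    scores = []
    for start in range(len(vectors) - window + 1):
        chunk = vectors[start:start + window]
        sims = [
            cosine(chunk[i], chunk[j])
            for i in range(window)
            for j in range(i + 1, window)
        ]
        scores.append(mean(sims))
    return scores


def coherence_features(vectors, window=3):
    """Summarise the per-window scores as a mean and a (population) variance."""
    scores = window_coherence(vectors, window)
    if not scores:
        raise ValueError("need at least `window` word vectors")
    return {"mean": mean(scores), "variance": pvariance(scores)}
```

Under these assumptions, a transcript whose consecutive words stay on one topic yields window scores that are high and stable, while one that drifts between topics yields scores that fluctuate, which shows up in the variance returned by `coherence_features`.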
RESULTS: The machine-learning classifiers obtained 75–78% accuracy for the semantic, syntactic and phonological domains individually. The most distinguishing features in each domain were reduced timbre and intonation for the phonological domain, increased variance of coherence for the semantic domain and decreased complexity of speech for the syntactic domain. The combined approach, using a voting algorithm across the domains, achieved an accuracy of 83% and a precision of 89%. No significant differences in age, gender or parental education were found between healthy controls and subjects with schizophrenia spectrum disorders. DISCUSSION: In this study we demonstrated that computational features derived from different linguistic domains capture aspects of the symptomatic language of subjects with schizophrenia spectrum disorders. Combining these features improved classification for this heterogeneous disorder: the language parameters distinguished schizophrenia patients from healthy controls with high accuracy and precision. These values are better than those obtained with imaging or blood analyses, while language is easier and cheaper to obtain than measures derived from other methods. Validation in an independent sample is required, and further distinguishing features should be extracted in each domain. Our positive results in using language abnormalities to automatically detect schizophrenia show that computational linguistics is a promising method in the search for reliable markers in psychiatry.
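The combination step (one classifier per domain, leave-one-out cross-validation, then a vote across domains) can be sketched as follows. The abstract does not name the classifiers or the exact voting rule, so this sketch swaps in a simple nearest-centroid classifier and a strict majority vote purely for illustration; all function names are hypothetical, and labels are assumed to be 0 for controls and 1 for patients.

```python
from statistics import mean


def centroid(rows):
    """Column-wise mean of a list of feature vectors."""
    return [mean(col) for col in zip(*rows)]


def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def nearest_centroid_predict(train_x, train_y, x):
    """Stand-in classifier (an assumption, not the authors' model)."""
    c0 = centroid([r for r, y in zip(train_x, train_y) if y == 0])
    c1 = centroid([r for r, y in zip(train_x, train_y) if y == 1])
    return 1 if sq_dist(x, c1) < sq_dist(x, c0) else 0


def loo_predictions(features, labels):
    """Leave-one-out: predict each subject from a model fit on all the others."""
    preds = []
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        preds.append(nearest_centroid_predict(train_x, train_y, features[i]))
    return preds


def majority_vote(per_domain_preds):
    """Label a subject 1 when more than half of the domain classifiers say 1."""
    return [1 if 2 * sum(votes) > len(votes) else 0
            for votes in zip(*per_domain_preds)]


def accuracy(y_true, y_pred):
    """Fraction of subjects classified correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision(y_true, y_pred):
    """True positives divided by all predicted positives (0.0 if none)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    return tp / (tp + fp) if tp + fp else 0.0
```

A caller would run `loo_predictions` once per domain (phonological, semantic, syntactic), pass the three prediction lists to `majority_vote`, and score the result with `accuracy` and `precision`. A vote of this kind can only help when the domains err on different subjects, which matches the heterogeneity argument made in the abstract.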
format Online
Article
Text
id pubmed-7234480
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-7234480 2020-05-23 S136. CLASSIFYING SCHIZOPHRENIA USING PHONOLOGICAL, SEMANTIC AND SYNTACTIC FEATURES OF LANGUAGE; A COMBINATORY MACHINE LEARNING APPROACH Voppel, Alban de Boer, Janna Slegers, Fleur Schnack, Hugo Sommer, Iris Schizophr Bull Poster Session I Oxford University Press 2020-05 2020-05-18 /pmc/articles/PMC7234480/ http://dx.doi.org/10.1093/schbul/sbaa031.202 Text en © The Author(s) 2020. Published by Oxford University Press on behalf of the Maryland Psychiatric Research Center. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title S136. CLASSIFYING SCHIZOPHRENIA USING PHONOLOGICAL, SEMANTIC AND SYNTACTIC FEATURES OF LANGUAGE; A COMBINATORY MACHINE LEARNING APPROACH
topic Poster Session I
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7234480/
http://dx.doi.org/10.1093/schbul/sbaa031.202