Automatic classification of written descriptions by healthy adults: An overview of the application of natural language processing and machine learning techniques to clinical discourse analysis

Discourse production is an important aspect in the evaluation of brain-injured individuals. We believe that studies comparing the performance of brain-injured subjects with that of healthy controls must use groups with compatible education. A pioneering application of machine learning methods using...

Bibliographic Details
Main Authors: Toledo, Cíntia Matsuda, Cunha, Andre, Scarton, Carolina, Aluísio, Sandra
Format: Online Article Text
Language: English
Published: Associação de Neurologia Cognitiva e do Comportamento 2014
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5619399/
https://www.ncbi.nlm.nih.gov/pubmed/29213908
http://dx.doi.org/10.1590/S1980-57642014DN83000006
_version_ 1783267394918023168
author Toledo, Cíntia Matsuda
Cunha, Andre
Scarton, Carolina
Aluísio, Sandra
author_facet Toledo, Cíntia Matsuda
Cunha, Andre
Scarton, Carolina
Aluísio, Sandra
author_sort Toledo, Cíntia Matsuda
collection PubMed
description Discourse production is an important aspect in the evaluation of brain-injured individuals. We believe that studies comparing the performance of brain-injured subjects with that of healthy controls must use groups with compatible education. A pioneering application of machine learning methods using Brazilian Portuguese for clinical purposes is described, highlighting education as an important variable in the Brazilian scenario. OBJECTIVE: The aims were to describe how to: (i) develop machine learning classifiers using features generated by natural language processing tools to distinguish descriptions produced by healthy individuals into classes based on their years of education; and (ii) automatically identify the features that best distinguish the groups. METHODS: The approach proposed here extracts linguistic features automatically from the written descriptions with the aid of two Natural Language Processing tools: Coh-Metrix-Port and AIC. It also includes nine task-specific features (three new ones, two extracted manually, besides description time; type of scene described – simple or complex; presentation order – which type of picture was described first; and age). This study used the descriptions produced by 144 of the subjects studied in Toledo(18), a study that included 200 healthy Brazilians of both genders. RESULTS AND CONCLUSION: A Support Vector Machine (SVM) with a radial basis function (RBF) kernel is the most recommended approach for the binary classification of our data, classifying three of the four initial classes. CfsSubsetEval (CFS) is a strong candidate to replace manual feature selection methods.
format Online
Article
Text
id pubmed-5619399
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher Associação de Neurologia Cognitiva e do Comportamento
record_format MEDLINE/PubMed
spelling pubmed-56193992017-12-06 Automatic classification of written descriptions by healthy adults: An overview of the application of natural language processing and machine learning techniques to clinical discourse analysis Toledo, Cíntia Matsuda Cunha, Andre Scarton, Carolina Aluísio, Sandra Dement Neuropsychol Original Articles Discourse production is an important aspect in the evaluation of brain-injured individuals. We believe that studies comparing the performance of brain-injured subjects with that of healthy controls must use groups with compatible education. A pioneering application of machine learning methods using Brazilian Portuguese for clinical purposes is described, highlighting education as an important variable in the Brazilian scenario. OBJECTIVE: The aims were to describe how to: (i) develop machine learning classifiers using features generated by natural language processing tools to distinguish descriptions produced by healthy individuals into classes based on their years of education; and (ii) automatically identify the features that best distinguish the groups. METHODS: The approach proposed here extracts linguistic features automatically from the written descriptions with the aid of two Natural Language Processing tools: Coh-Metrix-Port and AIC. It also includes nine task-specific features (three new ones, two extracted manually, besides description time; type of scene described – simple or complex; presentation order – which type of picture was described first; and age). This study used the descriptions produced by 144 of the subjects studied in Toledo(18), a study that included 200 healthy Brazilians of both genders. RESULTS AND CONCLUSION: A Support Vector Machine (SVM) with a radial basis function (RBF) kernel is the most recommended approach for the binary classification of our data, classifying three of the four initial classes. CfsSubsetEval (CFS) is a strong candidate to replace manual feature selection methods.
Associação de Neurologia Cognitiva e do Comportamento 2014 /pmc/articles/PMC5619399/ /pubmed/29213908 http://dx.doi.org/10.1590/S1980-57642014DN83000006 Text en http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Original Articles
Toledo, Cíntia Matsuda
Cunha, Andre
Scarton, Carolina
Aluísio, Sandra
Automatic classification of written descriptions by healthy adults: An overview of the application of natural language processing and machine learning techniques to clinical discourse analysis
title Automatic classification of written descriptions by healthy adults: An overview of the application of natural language processing and machine learning techniques to clinical discourse analysis
topic Original Articles
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5619399/
https://www.ncbi.nlm.nih.gov/pubmed/29213908
http://dx.doi.org/10.1590/S1980-57642014DN83000006