Scientific and practical peer-reviewed journal «Current problems of health care and medical statistics»


Health care organization

USING TEXT MINING TO CLASSIFY ARTICLES IN A SYSTEMATIC REVIEW OF MEDICINE

D.N. Begun¹, E.V. Gavrilova¹, N.V. Mirzaeva¹, O.V. Golovko¹, N.V. Zarishnyak¹
1. Orenburg State Medical University (OrSMU), Orenburg, Russian Federation
Summary:
Introduction. Systematic reviews are a relatively new application of text mining technologies. The purpose of the study is to apply text mining methods in the RapidMiner program to pre-process and select articles for a systematic review.

Materials and methods. Publications were searched for using the keywords “nursing education” AND “teaching models” OR “teaching methods” in the Medline database through the PubMed NLM interface (www.pubmed.com) and in ScienceDirect (sciencedirect.com). Inclusion criteria: articles in English on the review topic published from 2014 to 2023. The WebHarvy parser (SysNucleus) was used to collect the data and import it into Microsoft Excel with the columns “Title”, “Author”, “Journal”, “Year”, “Output”, and “Abstract”, for a total of 8451 publications. In RapidMiner (Altair), the text was pre-processed and then clustered (K-means method) and classified by topic modeling using latent Dirichlet allocation (LDA).

Results. A total of 305 documents were removed (11 had missing values, 295 were duplicates), leaving a set of 8146 documents. The number of publications on the review topic increased about sevenfold from 2014 to 2023 (from 219 to 1601 articles). The articles were published in 1171 journals, but most of them appeared in 9 journals; the total number of authors is 6884. Clustering identified 4 clusters of publications, which indicate only the main topic of each cluster. Topic modeling identified 10 topics, which indicate not only the subject of the articles but also their type. For example, Topic_6 comprises review articles on nursing education.

Discussion. The most important step in text classification is choosing the best classifier. Latent Dirichlet allocation (LDA) treats each document as a mixture of topics and each topic as a mixture of words, and identifies hidden topics in the data corpus.

Conclusions. In our case, topic modeling with LDA gave very good results, but the model may not be as effective on a different data set.
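The verbal description of LDA in the Discussion can be written as the standard generative model. The notation below is an assumption introduced here for illustration and does not appear in the summary: D documents, K topics (K = 10 in this study), Dirichlet hyperparameters \alpha and \beta, topic proportions \theta_d for document d, and word distributions \varphi_k for topic k.

```latex
\begin{align*}
\theta_d &\sim \mathrm{Dirichlet}(\alpha), & d &= 1,\dots,D \\
\varphi_k &\sim \mathrm{Dirichlet}(\beta), & k &= 1,\dots,K \\
z_{d,n} \mid \theta_d &\sim \mathrm{Categorical}(\theta_d) \\
w_{d,n} \mid z_{d,n}, \varphi &\sim \mathrm{Categorical}(\varphi_{z_{d,n}}) \\
p(w \mid d) &= \sum_{k=1}^{K} \theta_{d,k}\,\varphi_{k,w}
\end{align*}
```

The last line is the "mixture" in the description: the probability of word w in document d is a weighted sum over topics, with the document's topic proportions as weights.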
Keywords: review article, text mining, RapidMiner, clustering, topic modeling
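The cleaning and pre-processing step reported in the summary (removing records with missing values and duplicates before clustering and topic modeling) was carried out in RapidMiner. A minimal Python sketch of the same idea is given below, under stated assumptions: the required fields, the use of the normalized title as the duplicate key, the short stop-word list, and the four toy records are all hypothetical and are not taken from the study.

```python
import re

# Hypothetical choice of required columns (names follow the columns in the summary).
REQUIRED_FIELDS = ("Title", "Abstract")

# Tiny illustrative stop-word list (an assumption, not the list used in the study).
STOP_WORDS = {"a", "an", "and", "in", "of", "on", "the", "to", "for", "with"}


def clean_records(records):
    """Drop records with missing required fields, then drop duplicate titles."""
    complete = [
        r for r in records
        if all((r.get(f) or "").strip() for f in REQUIRED_FIELDS)
    ]
    seen = set()
    unique = []
    for r in complete:
        # Assumed duplicate key: title compared ignoring case and extra whitespace.
        key = " ".join(r["Title"].lower().split())
        if key not in seen:
            seen.add(key)
            unique.append(r)
    n_missing = len(records) - len(complete)
    n_duplicates = len(complete) - len(unique)
    return unique, n_missing, n_duplicates


def tokenize(text):
    """Lower-case, keep alphabetic tokens, drop stop words and 1-2 letter tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if len(t) > 2 and t not in STOP_WORDS]


# Made-up toy records, for illustration only.
records = [
    {"Title": "Simulation in nursing education", "Abstract": "A review of teaching methods."},
    {"Title": "simulation in  nursing education", "Abstract": "A review of teaching methods."},
    {"Title": "Flipped classroom models", "Abstract": ""},
    {"Title": "Peer teaching for nurses", "Abstract": "Students teach students in the ward."},
]

cleaned, n_missing, n_duplicates = clean_records(records)
tokens = [tokenize(r["Abstract"]) for r in cleaned]
```

The token lists produced this way are the kind of input that a clustering or topic modeling step would then convert into a document-term matrix.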

Bibliographic reference:
D.N. Begun, E.V. Gavrilova, N.V. Mirzaeva, O.V. Golovko, N.V. Zarishnyak, USING TEXT MINING TO CLASSIFY ARTICLES IN A SYSTEMATIC REVIEW OF MEDICINE // Scientific journal «Current problems of health care and medical statistics». - 2024. - №1;
URL: http://healthproblem.ru/magazines?textEn=1256 (date of access: 21.11.2024).
