Abstract:
Relation extraction in natural language processing (NLP) is the task of identifying interactions between entities within a text, which helps capture context and meaning. The task is particularly important in the medical field because of the large volume of information contained in scientific articles. This paper explores several training strategies for medical relation extraction using large pre-trained language models. The findings indicate significant differences in performance between models trained on general-domain data and those specialized in the medical domain. A methodology is also proposed that combines language models for relation extraction with hyperparameter optimization techniques. The approach uses a triplet-based system, which provides a framework for organizing relationships between entities and facilitates the construction of medical knowledge graphs in Spanish. Training used a dataset constructed and validated by medical experts, focused on relationships between entities including anatomy, medications, and diseases. The final model reached 85.9% accuracy on the relation classification task, supporting the effectiveness of the proposed approach.
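As a rough illustration of how triplets can feed a knowledge graph, the sketch below assumes each extracted relation has the form (head entity, relation, tail entity). That form, the names `Triplet` and `build_graph`, the relation labels, and the example entities are all hypothetical and are not taken from the paper or its dataset.

```python
from collections import defaultdict
from typing import NamedTuple


class Triplet(NamedTuple):
    """Hypothetical triplet layout: head entity, relation label, tail entity."""
    head: str
    relation: str
    tail: str


def build_graph(triplets):
    """Group triplets into an adjacency map: head -> list of (relation, tail)."""
    graph = defaultdict(list)
    for t in triplets:
        edge = (t.relation, t.tail)
        # Skip duplicate edges so repeated extractions do not inflate the graph.
        if edge not in graph[t.head]:
            graph[t.head].append(edge)
    return dict(graph)


# Made-up Spanish examples for illustration only (not from the expert-validated dataset).
example_triplets = [
    Triplet("metformina", "treats", "diabetes tipo 2"),
    Triplet("gastritis", "located_in", "estómago"),
    Triplet("metformina", "treats", "diabetes tipo 2"),  # duplicate extraction
]

graph = build_graph(example_triplets)
```

In this sketch the relation label would come from the classifier's prediction for an entity pair; a graph library or a triple store could replace the plain dictionary without changing the idea.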
Description:
An article derived from a doctoral thesis in the Doctorado en Ciencias de la Computación (PhD in Computer Science) program at CU Texcoco. It covers the extraction of information from Spanish-language text documents and its conversion into a knowledge graph of medical texts.