Ground Truth Spanish Automatic Extractive Text Summarization Bounds

Matias Mendoza, Griselda Areli; GARCIA HERNANDEZ, RENE ARNULFO; Ledeneva, Yulia; HERNANDEZ CASTAÑEDA, ANGEL; Mihail, Aleksandrov


dc.contributor.author	Matias Mendoza, Griselda Areli
dc.contributor.author	GARCIA HERNANDEZ, RENE ARNULFO
dc.contributor.author	Ledeneva, Yulia
dc.contributor.author	HERNANDEZ CASTAÑEDA, ANGEL
dc.contributor.author	Mihail, Aleksandrov
dc.creator	Matias Mendoza, Griselda Areli; 559868
dc.creator	GARCIA HERNANDEZ, RENE ARNULFO; 202667
dc.creator	Ledeneva, Yulia;#0000-0003-0766-542X
dc.creator	HERNANDEZ CASTAÑEDA, ANGEL; 447784
dc.creator	Mihail, Aleksandrov;x1349048
dc.date.accessioned	2020-11-13T03:31:36Z
dc.date.available	2020-11-13T03:31:36Z
dc.date.issued	2020-10-12
dc.identifier.issn	2007-9737
dc.identifier.uri	http://hdl.handle.net/20.500.11799/109468
dc.description.abstract	Textual information is growing rapidly in the languages most spoken by native Internet users, such as Chinese, Spanish, English, Arabic, Hindi, Portuguese, Bengali, and Russian, among others. Automatic Text Summarization (ATS) methods, which extract essential information so that the entire text need not be read, must therefore keep improving. The most competent methods are Extractive ATS (EATS) methods, which extract essential parts of the document (sentences, phrases, or paragraphs) to compose a summary. Over the last 60 years of EATS research, the creation of standard corpora with human-generated summaries, together with evaluation methods that correlate highly with human judgments, has helped increase the number of new state-of-the-art methods. However, these methods are mainly supported for the English language, leaving aside other equally important languages such as Spanish, which is the second most spoken language by native speakers and the third most used on the Internet. A standard corpus for Spanish EATS (SAETS) is created to evaluate the state-of-the-art methods and systems for the Spanish language. The main contribution is a proposal for the configuration and evaluation of five state-of-the-art methods, five systems, and four heuristics using three evaluation methods (ROUGE, ROUGE-C, and Jensen-Shannon divergence). This is the first time that Jensen-Shannon divergence has been used to evaluate EATS. This paper presents the ground truth bounds for the Spanish language, given by four heuristics: baseline:first, baseline:random, topline, and concordance. In addition, a ranking of 30 evaluation tests of the state-of-the-art methods and systems is calculated, forming a benchmark for SAETS.	es
dc.language.iso	eng	es
dc.publisher	Computación y Sistemas	es
dc.rights	openAccess	es
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/4.0
dc.subject	Automatic text summarization	es
dc.subject	Corpus TER	es
dc.subject	Natural Language Processing	es
dc.subject	Computational Linguistics	es
dc.subject.classification	ENGINEERING AND TECHNOLOGY
dc.title	Ground Truth Spanish Automatic Extractive Text Summarization Bounds	es
dc.type	Article	es
dc.provenance	Scientific	es
dc.road	Gold	es
dc.organismo	Unidad Académica Profesional Tianguistenco	es
dc.ambito	International	es
dc.cve.CenCos	31201	es
dc.audience	students	es
dc.audience	researchers	es
dc.type.conacyt	article
dc.identificator	7

Files in this item

Name: 3484-7721-1-PB (1).pdf

Size: 672.5Kb

Format: PDF


This item appears in the following collection(s)

Conacyt [10019]
Científica [44]

Document View

Title
Ground Truth Spanish Automatic Extractive Text Summarization Bounds
Author
Matias Mendoza, Griselda Areli
GARCIA HERNANDEZ, RENE ARNULFO
Ledeneva, Yulia
HERNANDEZ CASTAÑEDA, ANGEL
Mihail, Aleksandrov
Publication date
2020-10-12
Publisher
Computación y Sistemas
Document type
Article
Keywords
Automatic text summarization
Corpus TER
Natural Language Processing
Computational Linguistics

Documents deposited in the Institutional Repository of the Universidad Autónoma del Estado de México are available in Open Access under the Creative Commons license: Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)
