
dc.contributor.author Matias Mendoza, Griselda Areli
dc.contributor.author GARCIA HERNANDEZ, RENE ARNULFO
dc.contributor.author Ledeneva, Yulia
dc.contributor.author HERNANDEZ CASTAÑEDA, ANGEL
dc.contributor.author Mihail, Aleksandrov
dc.creator Matias Mendoza, Griselda Areli; 559868
dc.creator GARCIA HERNANDEZ, RENE ARNULFO; 202667
dc.creator Ledeneva, Yulia;#0000-0003-0766-542X
dc.creator HERNANDEZ CASTAÑEDA, ANGEL; 447784
dc.creator Mihail, Aleksandrov;x1349048
dc.date.accessioned 2020-11-13T03:31:36Z
dc.date.available 2020-11-13T03:31:36Z
dc.date.issued 2020-10-12
dc.identifier.issn 2007-9737
dc.identifier.uri http://hdl.handle.net/20.500.11799/109468
dc.description.abstract Textual information is growing at an accelerating rate in the languages most spoken by native Internet users, such as Chinese, Spanish, English, Arabic, Hindi, Portuguese, Bengali, and Russian, among others. Innovation is needed in Automatic Text Summarization (ATS) methods, which extract the essential information so that the entire text need not be read. The most competent methods are Extractive ATS (EATS) methods, which extract essential parts of the document (sentences, phrases, or paragraphs) to compose a summary. Over the last 60 years of EATS research, the creation of standard corpora with human-generated summaries, together with evaluation methods that correlate highly with human judgments, has helped increase the number of new state-of-the-art methods. However, these methods mainly support the English language, leaving aside other equally important languages such as Spanish, which is the second most spoken language by native speakers and the third most used on the Internet. A standard corpus for Spanish EATS (SAETS) is created to evaluate the state-of-the-art methods and systems for the Spanish language. The main contribution is a proposal for the configuration and evaluation of five state-of-the-art methods, five systems, and four heuristics using three evaluation methods (ROUGE, ROUGE-C, and Jensen-Shannon divergence). This is the first time that Jensen-Shannon divergence has been used to evaluate AETS. This paper presents the ground truth bounds for the Spanish language, which are the heuristics baseline:first, baseline:random, topline, and concordance. In addition, a ranking of 30 evaluation tests of the state-of-the-art methods and systems is calculated, forming a benchmark for SAETS. es
dc.language.iso eng es
dc.publisher Computación y Sistemas es
dc.rights openAccess es
dc.rights.uri http://creativecommons.org/licenses/by-nc-nd/4.0
dc.subject Automatic text summarization es
dc.subject Corpus TER es
dc.subject Natural Language Processing es
dc.subject Computational Linguistics es
dc.subject.classification ENGINEERING AND TECHNOLOGY
dc.title Ground Truth Spanish Automatic Extractive Text Summarization Bounds es
dc.type Article es
dc.provenance Scientific es
dc.road Gold es
dc.organismo Unidad Académica Profesional Tianguistenco es
dc.ambito International es
dc.cve.CenCos 31201 es
dc.audience students es
dc.audience researchers es
dc.type.conacyt article
dc.identificator 7


Document View

  • Title
  • Ground Truth Spanish Automatic Extractive Text Summarization Bounds
  • Author
  • Matias Mendoza, Griselda Areli
  • GARCIA HERNANDEZ, RENE ARNULFO
  • Ledeneva, Yulia
  • HERNANDEZ CASTAÑEDA, ANGEL
  • Mihail, Aleksandrov
  • Publication date
  • 2020-10-12
  • Publisher
  • Computación y Sistemas
  • Document type
  • Article
  • Keywords
  • Automatic text summarization
  • Corpus TER
  • Natural Language Processing
  • Computational Linguistics
  • Documents deposited in the Institutional Repository of the Universidad Autónoma del Estado de México are available in Open Access under the Creative Commons license: Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)

openAccess Except where otherwise noted, this item's license is described as openAccess