Abstract:
AnaPro is software that resolves direct anaphora in Spanish,
specifically pronouns: it searches the preceding sentences for the
noun or group of words (the referent, or antecedent) that the
pronoun replaces. An example of a direct anaphor to be resolved is
the pronoun “he” in the sentence “He is sad.” Much of the work on
anaphora has been done for texts in English; we therefore focus
specifically on Spanish documents.
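The task described above can be illustrated with a minimal sketch. It is not AnaPro's method: the heuristics are not spelled out here, so the sketch assumes a common baseline instead, namely choosing the nearest preceding noun that agrees with the pronoun in gender and number. The `Token` type, the `resolve_pronoun` function, and the hand-tagged example sentences are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Token:
    text: str
    pos: str                      # "NOUN", "PRON", "VERB", ... (hand-assigned here)
    gender: Optional[str] = None  # "m" or "f"
    number: Optional[str] = None  # "sg" or "pl"


def resolve_pronoun(sentences: List[List[Token]],
                    sent_idx: int, tok_idx: int) -> Optional[Token]:
    """Return the nearest preceding noun that agrees with the pronoun, or None.

    Assumed baseline heuristic (recency plus agreement), not AnaPro's rules.
    """
    pronoun = sentences[sent_idx][tok_idx]
    # Walk backwards: first the part of the current sentence that precedes
    # the pronoun, then each earlier sentence from its end to its start.
    for s in range(sent_idx, -1, -1):
        end = tok_idx if s == sent_idx else len(sentences[s])
        for t in range(end - 1, -1, -1):
            cand = sentences[s][t]
            if (cand.pos == "NOUN"
                    and cand.gender == pronoun.gender
                    and cand.number == pronoun.number):
                return cand
    return None


# "Juan perdió la llave. Él está triste."
# (Juan lost the key. He is sad.) Proper names are tagged NOUN for simplicity.
sentences = [
    [Token("Juan", "NOUN", "m", "sg"), Token("perdió", "VERB"),
     Token("la", "DET", "f", "sg"), Token("llave", "NOUN", "f", "sg"),
     Token(".", "PUNCT")],
    [Token("Él", "PRON", "m", "sg"), Token("está", "VERB"),
     Token("triste", "ADJ"), Token(".", "PUNCT")],
]

antecedent = resolve_pronoun(sentences, 1, 0)
print(antecedent.text if antecedent else None)  # Juan
```

The feminine noun "llave" is closer to the pronoun but is skipped because it does not agree with "Él". The hand-assigned gender and number tags act as a small lexicon, so the sketch illustrates only the task, not a dictionary-free solution.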
AnaPro directly supports text analysis (understanding what a
document says), a nontrivial task since there are different writing
styles, references, idiomatic expressions, etc. The problem grows
if the analyzer is a computer, because computers lack the “common
sense” that people possess. Hence, a text must be preprocessed
before it is analyzed: assigning tags (noun, verb, ...) to each
word, finding the stems, disambiguating nouns, verbs, and
prepositions, identifying colloquial expressions, and identifying
and resolving anaphora, among other chores.
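The preprocessing chores listed above are usually chained as stages. The following is only a sketch of that staging, under loud assumptions: the tiny tag table, the crude plural-stripping stemmer, and every function name are hypothetical stand-ins, and real taggers, stemmers, and disambiguators are far more elaborate.

```python
from typing import Dict, List

# Hypothetical toy tag table; a real system would use a proper tagger.
TOY_TAGS: Dict[str, str] = {
    "juan": "NOUN", "perdió": "VERB", "las": "DET", "llaves": "NOUN",
    "él": "PRON", "está": "VERB", "triste": "ADJ", ".": "PUNCT",
}


def tokenize(text: str) -> List[str]:
    # Detach periods from words, then split on whitespace.
    return text.replace(".", " . ").split()


def tag(word: str) -> str:
    # Look the lowercased word up in the toy table; unknown words get "UNK".
    return TOY_TAGS.get(word.lower(), "UNK")


def stem(word: str) -> str:
    # Crude assumption: strip a final "s" from words longer than three letters.
    w = word.lower()
    return w[:-1] if len(w) > 3 and w.endswith("s") else w


def preprocess(text: str) -> List[Dict[str, object]]:
    """Run the toy stages in order and flag pronouns for anaphora resolution."""
    out = []
    for word in tokenize(text):
        t = tag(word)
        out.append({
            "text": word,
            "tag": t,
            "stem": stem(word),
            "needs_resolution": t == "PRON",  # handed to the anaphora stage
        })
    return out


result = preprocess("Juan perdió las llaves. Él está triste.")
for token in result:
    print(token)
```

Each stage adds one annotation per word, so a later stage such as anaphora resolution can rely on the tags produced by the earlier ones.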
AnaPro works on Spanish sentences. It is a novel procedure, since
it is automatic (no user intervenes during the resolution) and it
does not need dictionaries. It employs heuristic procedures to
discover the semantics and guide its decisions; they are rather
easy to implement and use limited knowledge. Nevertheless, its
results are good (at least 81% correct answers), although more
tests are needed to give a better idea of its accuracy.
Description:
Introduction
Anaphora is a relation of coreference between
linguistic terms. According to Webster’s dictionary: “It
is the use of a grammatical substitute (as a pronoun
or a pro-verb) to refer to the denotation of a
preceding word or group of words; also: the relation
between a grammatical substitute and its
antecedent.” Therefore, anaphora is a discourse
relation. Anaphora resolution is very important in
Natural Language Processing (NLP).
This work is part of Project OM* (Ontology Merging),
which seeks to build a large ontology by fusing
smaller ontologies extracted from textual documents.
An important part of the project is to analyze the
sentences in a document with the goal of transforming
that text into an ontology that represents its contents.
A brief description of Project OM* follows.