Beyond Named Entity Recognition
Semantic labelling for NLP tasks
Centro Cultural de Belem
LISBON, Portugal
25th May 2004
In Association with
4th INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION
LREC2004
Main conference 26-27-28 May 2004


Motivation and Aims
Although it is generally assumed that improvements in language processing
will come from integrating linguistic information with statistical
techniques, the reality is that language is very diverse, and looking for
specific patterns of words that repeat often enough to be statistically
significant tends not to be fruitful: sequences longer than three words are
generally not repeated often enough. At the same time, the identification of
named entities (names, dates, places, organizations, etc.) has proved to be a
very useful preliminary task in many natural language processing systems. We
are interested in pursuing approaches which extend this notion by identifying
and labeling other semantic information in a text, in such a way as to allow
repeatable semantic patterns to emerge. Our interest is in attacking the data
sparseness problem by exploring ways to collapse (semantically) related
phrases which are expressed by different word sequences.
Although this seems closely related to previously proposed class-based
language models (see, for example, Brown et al. 90 in Computational
Linguistics), it differs in that the empirical notion of classes used in the
previous work (e.g. classes made up of collocationally similar words) is
replaced by semantically justified sets.
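The idea of collapsing related word sequences can be illustrated with a small sketch. It is not a method proposed by the workshop: the lexicon, the label names (`<DAY>`, `<CITY>`, `<TRAVEL>`) and the helper functions are hypothetical, chosen only to show how replacing words with semantic labels may let repeated patterns emerge where the raw word trigrams never repeat.

```python
from collections import Counter

# Hypothetical toy lexicon mapping surface words to semantic labels.
# Both the entries and the label names are illustrative assumptions.
SEMANTIC_LABELS = {
    "monday": "<DAY>", "friday": "<DAY>",
    "lisbon": "<CITY>", "rome": "<CITY>",
    "flew": "<TRAVEL>", "drove": "<TRAVEL>",
}


def label_tokens(tokens):
    """Lowercase each token and replace it with its semantic label, if any."""
    return [SEMANTIC_LABELS.get(t.lower(), t.lower()) for t in tokens]


def ngram_counts(sentences, n=3):
    """Count every n-gram (as a tuple) over a list of token lists."""
    counts = Counter()
    for tokens in sentences:
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


sentences = [
    "She flew to Lisbon on Monday".split(),
    "She drove to Rome on Friday".split(),
]

# Trigram counts over lowercased words versus over semantic labels.
raw = ngram_counts([[t.lower() for t in s] for s in sentences])
labelled = ngram_counts([label_tokens(s) for s in sentences])

# Number of distinct trigrams that occur more than once in each setting.
repeated_raw = sum(1 for c in raw.values() if c > 1)
repeated_labelled = sum(1 for c in labelled.values() if c > 1)

print("repeated trigrams (words): ", repeated_raw)
print("repeated trigrams (labels):", repeated_labelled)
```

In this toy setting the two sentences share no word trigram, while after labelling every trigram occurs twice. How coarse or fine the labels should be is exactly the kind of open question the workshop raises.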
Notice how Named Entity (NE) tagging and Word Sense Disambiguation (WSD)
represent, in terms of granularity and representational complexity, two
extremes of a single general problem: semantic disambiguation. Semantic
disambiguation thus serves the purpose of improving the generalization power
of statistical models. One of the questions here is how to determine a
suitable level of clustering (for NE identification and for WSD) that would
lead to high accuracy and to improved performance of the resulting
statistical models.
Reasons for Interest
It is worth noting that several independent research efforts that have
recently focused on the statistical treatment of semantic phenomena (e.g.
WordNet navigation as a stochastic process, as studied in Light and Abney or
in Ciaramita & Johnson, 2003) correlate highly with the research program
proposed above.
The workshop will offer a forum where experience from lexical semantics and
statistical learning will be presented and fruitful discussion among
researchers in both fields will be promoted. The workshop is expected to
attract researchers and practitioners from a range of areas, as well as
developers of large-scale semantic resources who are interested in effective
methods of semantic labeling.
Topics to be addressed in the workshop include, but are not limited to:
- Methods for lexical-semantic annotation of corpora
- Methods and standards for lexical-semantic representation of dictionary information
- Lexico-semantic taxonomies
- Existing sources of classification: dictionaries, thesauri and computerized ontologies
- Corpus-driven methods for semantic disambiguation
- Feature selection for semantic disambiguation
- Lexico-semantic tagging of very large corpora
- Algorithms and methods for disambiguation of semantic phenomena
- Statistical learning models and their applications to semantic labeling
- Computational learning frameworks for Natural Language Learning
- Semi-supervised and unsupervised statistical semantic disambiguation
- Evaluation of semantic disambiguation
Workshop format
The workshop will be a half-day event with position statements from invited
speakers (half an hour each) and two hours for 4-6 presentations of
scientific papers. Submissions may present work in progress as well as more
complete work that falls within the scope defined by the topics listed above.
A final one-hour open discussion among all the workshop participants will be
moderated by the organizers. In order to stimulate an interesting general
discussion, each member of the program committee will be invited to submit a
position statement of max. 1000 words.
Submission
Participants are invited to submit an extended abstract of max. 3500 words
concerning one or more of the topics of interest. Each accepted paper
receives a slot of 25 minutes for presentation (15 minutes for the talk and
10 minutes for discussion). Each submission should show: title; author(s);
affiliation(s); and the contact author's e-mail address, postal address,
telephone and fax numbers. Submissions must be sent electronically in PDF to
the following address:
Roberto Basili
Dept. of Computer Science, Systems and Management
University of Roma Tor Vergata
e-mail: basili@info.uniroma2.it
Proceedings and Publications
Proceedings of the workshop will be printed by the LREC Local Organising
Committee. The organizers are negotiating the publication of a special issue
on “Semantic tagging/labelling for NLP tasks” with the journal Computer
Speech and Language; selected papers will appear in that issue.
Important dates
Extended abstract submission (max. 3500 words) | 16th of February 2004
Notification of acceptance | 8th of March 2004
Preliminary Program | 29th of March 2004
Submission of the final version of paper | 5th of April 2004
Workshop | 25th May 2004
Organizing Committee
Louise Guthrie - University of Sheffield, UK
Roberto Basili - University of Rome, Tor Vergata, Italy
Eva Hajicova - Charles University, Czech Republic
Fred Jelinek - Johns Hopkins University, Maryland, USA

Further Information
For any information related to the organization,
please contact:
Roberto Basili
e-mail: basili@info.uniroma2.it
Dept. of Computer Science, Systems and Management
University of Roma Tor Vergata
Via di Tor Vergata
00133 Roma (ITALY)
tel: +39 06 72597391
fax: +39 06 72597460
