NLP Term Project: Word Sense Disambiguation

Original description by 洋蔥先輩. Edited by TA.

Introduction to WSD

In English, a word may have many meanings (or senses). Consider the following two passages:

A paper round or weekend job doesn’t pay enough.
P. Dixon, Postgraduate Medical Journal (in paper No 17/Napp July 1991).

Both passages above contain the word paper, but its sense differs between them. In the first passage, paper refers to a newspaper, whereas in the second it refers to a research paper. The task of word sense disambiguation (WSD) is to identify the sense of a word in a given context.

Project Description

In this project, you will perform WSD on a set of words with multiple senses. You will be provided with a corpus of sentences in which these ambiguous words appear, and your job is to identify the correct sense of the word in each sentence. The corpus consists of two parts. The first part, the training corpus, contains sentences in which the ambiguous words are already labelled with the correct senses. You can use it to construct your WSD algorithm with a rule-based approach, a corpus-based approach, or a combination of the two.

The second part of the corpus is the testing corpus, which contains unlabelled sentences. You will run the WSD algorithm that you developed on the testing corpus. The answer key for the testing corpus will be withheld at first, because we do not want you to “optimize” your WSD algorithm to suit the testing corpus. It will be released near the due date of the project so that you can evaluate and discuss the performance of your WSD algorithm in your project report.
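As one possible starting point for the corpus-based approach, the sketch below implements a most-frequent-sense baseline: for each word it counts the senses seen in the training corpus and always predicts the most common one. This is only an illustration, not a required or recommended method. The function names train_mfs and predict, and the tiny training list, are made up for this example.

```python
from collections import Counter, defaultdict

def train_mfs(labelled):
    """Build a word -> most frequent sense table.

    labelled: iterable of (word, sense_id) pairs taken from the training corpus.
    """
    counts = defaultdict(Counter)
    for word, sense in labelled:
        counts[word][sense] += 1
    # Keep only the single most common sense for each word.
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}

def predict(model, word):
    """Return the stored sense for a word, or None if the word was never seen."""
    return model.get(word)

# Hypothetical toy data; real pairs would come from the training corpus.
training = [
    ("art", "art%1:06:00::"),
    ("art", "art%1:06:00::"),
    ("art", "art%1:04:00::"),
]
model = train_mfs(training)
print(predict(model, "art"))
```

A baseline like this ignores the context entirely, so it mainly serves as a reference score that a context-aware system should beat.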

Corpus Description

In this project, you will be provided with the following corpus files:

eng-lex-sample.training.xml : training corpus
eng-lex-sample.test.xml : testing corpus
eng-lex-sample.training.key : answer key to the training corpus
eng-lex-sample.test.key : answer key to the testing corpus (this file will be provided near the due-date of the project)

eng-lex-sample.training.xml

eng-lex-sample.training.xml is in XML format. The structure is as follows:

The section for each unique word is enclosed by the tags <lexelt></lexelt>. For example, the word art has the tag <lexelt item="art.n"> and the word authority has the tag <lexelt item="authority.n">. Each section consists of a set of instances. For example, the first instance of art is

<instance id="art.40001" docsrc="bnc_ACN_245">
<answer instance="art.40001" senseid="art%1:06:00::"/>
<context>
Their multiscreen projections of slides and film loops have featured in orbital parties, at the Astoria and Heaven, in Rifat Ozbek’s 1988/89 fashion shows, and at Energy’s recent Docklands all-dayer. From their residency at the Fridge during the first summer of love, Halo used slide and film projectors to throw up a collage of op-art patterns, film loops of dancers like E-Boy and Wumni, and unique fractals derived from video feedback. &bquo;We’re not aware of creating a visual identify for the house scene, because we’re right in there. We see a dancer at a rave, film him later that week, and project him at the next rave.&equo; [hi]Ben Lewis [/hi] Halo can be contacted on 071 738 3248. [ptr][/p] [caption] <head>Art</head>you can dance to from the creative group called Halo [/caption] [/div2] [div2] [head] </context>
</instance>

In the above example, the ID of the instance is art.40001, as stated by <instance id="art.40001" docsrc="bnc_ACN_245">. The correct sense of art in this context is art%1:06:00::, as stated by <answer instance="art.40001" senseid="art%1:06:00::"/>. For this example, your job is to identify the correct sense of art as it appears in <context>. The occurrence of art in the context is enclosed by the tags <head></head>. A context may contain more than one occurrence of the word we wish to disambiguate. In that case there will be multiple <head></head> pairs, but the word has only one sense in a given context, even if it appears several times.
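The instance structure described above can be read with a few regular expressions. The sketch below is a minimal illustration, assuming the attributes use plain ASCII double quotes and that each instance follows the layout shown in the example. It uses regular expressions instead of an XML parser because the contexts contain entities such as &bquo; that a strict parser may not accept. The name parse_instances and the shortened sample string are made up for this example.

```python
import re

# Patterns for the pieces of one <instance> element.
INSTANCE_RE = re.compile(r'<instance id="([^"]+)"[^>]*>(.*?)</instance>', re.DOTALL)
SENSE_RE = re.compile(r'<answer instance="[^"]+" senseid="([^"]+)"\s*/>')
CONTEXT_RE = re.compile(r'<context>(.*?)</context>', re.DOTALL)
HEAD_RE = re.compile(r'<head>(.*?)</head>')

def parse_instances(text):
    """Yield (instance_id, sense_ids, head_words, context) for each instance."""
    for inst_id, body in INSTANCE_RE.findall(text):
        senses = SENSE_RE.findall(body)      # empty list when there is no <answer>
        match = CONTEXT_RE.search(body)
        context = match.group(1) if match else ""
        heads = HEAD_RE.findall(context)     # every marked occurrence of the word
        plain = HEAD_RE.sub(r"\1", context).strip()  # context without <head> tags
        yield inst_id, senses, heads, plain

# Shortened version of the instance shown above.
sample = '''<instance id="art.40001" docsrc="bnc_ACN_245">
<answer instance="art.40001" senseid="art%1:06:00::"/>
<context>
[caption] <head>Art</head>you can dance to from the creative group called Halo [/caption]
</context>
</instance>'''

parsed = list(parse_instances(sample))
for inst_id, senses, heads, context in parsed:
    print(inst_id, senses, heads)
```

For instances without an <answer> tag, as in the testing corpus, the list of senses simply comes back empty.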

eng-lex-sample.test.xml

The format of eng-lex-sample.test.xml is the same as eng-lex-sample.training.xml except that entries in eng-lex-sample.test.xml do not have the <answer> tags.

eng-lex-sample.training.key

Each entry of eng-lex-sample.training.key has the following format:

word instance_id list_of_possible_senses

For example, the answer key for instance art.40082 of the word art is

art art.40082 art%1:04:00:: arts%1:09:00::

The two possible answers of instance art.40082 are art%1:04:00:: and arts%1:09:00:: . Using either answer is fine.
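A key entry in this format can be split on whitespace. The sketch below shows one way to read such a line and to write a line in the same layout; the function names parse_key_line and format_key_line are made up for this example.

```python
def parse_key_line(line):
    """Split one key entry into (word, instance_id, list_of_senses)."""
    parts = line.split()
    if len(parts) < 3:
        raise ValueError("expected: word instance_id sense [sense ...]")
    return parts[0], parts[1], parts[2:]

def format_key_line(word, instance_id, senses):
    """Write an entry in the same whitespace-separated layout."""
    return " ".join([word, instance_id] + list(senses))

entry = parse_key_line("art art.40082 art%1:04:00:: arts%1:09:00::")
print(entry)
print(format_key_line(*entry))
```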

eng-lex-sample.test.key

eng-lex-sample.test.key has the same format as eng-lex-sample.training.key.

Performance Evaluation

Your result output file needs to be in the same format as the answer keys. However, you may supply at most one sense for each instance. Once your outputs are generated, download the following two files to evaluate your performance.

score : a Python script for evaluating performance. You need to run it on a machine with Python installed.
sensemap : a data file required by the score script. You do not need to understand its contents.

To evaluate your performance, use the following command:

python score your_answer answer_key sensemap

A sample output for the command “python score dummy.answer eng-lex-sample.test.key sensemap” is

Fine-grained score for "dummy.answer" using key "eng-lex-sample.test.key":
precision: 0.642 (2780.00 correct of 4328.00 attempted)
recall: 0.642 (2780.00 correct of 4328.00 in total)
attempted: 100.000 % (4328.00 attempted of 4328.00 in total)
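The figures in the sample output follow from simple ratios: precision is correct answers over attempted instances, and recall is correct answers over all instances in the key. The sketch below reproduces that arithmetic under simplifying assumptions: each instance gets at most one answer, an answer counts as correct if it matches any sense listed in the key, and sensemap is ignored. It is not a replacement for the provided score script, whose exact rules may differ. The name fine_grained_score and the toy data are made up for this example.

```python
def fine_grained_score(answers, key):
    """Return (precision, recall, attempted_fraction).

    answers: dict mapping instance_id -> one sense (or None if not attempted)
    key:     dict mapping instance_id -> set of acceptable senses
    """
    total = len(key)
    attempted = sum(1 for inst in key if answers.get(inst))
    correct = sum(1 for inst in key if answers.get(inst) in key[inst])
    precision = correct / attempted if attempted else 0.0
    recall = correct / total if total else 0.0
    coverage = attempted / total if total else 0.0
    return precision, recall, coverage

# Hypothetical toy data: three instances in the key, two of them attempted.
key = {"a.1": {"s1"}, "a.2": {"s2", "s3"}, "a.3": {"s4"}}
answers = {"a.1": "s1", "a.2": "s3"}
precision, recall, coverage = fine_grained_score(answers, key)
print(f"precision: {precision:.3f}  recall: {recall:.3f}  attempted: {coverage * 100:.3f} %")

# The sample output above: 2780 correct of 4328 attempted is about 0.642.
sample_precision = 2780 / 4328
```

When every instance is attempted, as in the sample output, precision and recall are equal.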

Project Hints

You can use any programming language you want. You are allowed to use existing tools such as dictionaries and classifiers, but you may not use existing WSD software as your system. You may, however, compare the performance of other WSD software with that of your system. Some useful tools are

WordNet : a good dictionary
SVM-Light : an SVM package
Rainbow : a naive Bayes classifier
TreeTagger : a part-of-speech tagger

The TA will not provide any technical support for these tools. Google is your best friend.

Project Schedule and Groups

You will form groups of 3 to 4 people. The schedule for the remainder of the course is

12/7/2006 : Email your group member list to the TA
1/4/2007 : Final Exam
1/4/2007 : eng-lex-sample.test.key available for download
1/11/2007 : Project Presentation
1/11/2007 : Submit your first hand-in before 11:59pm
1/18/2007 : Project Presentation
1/18/2007 : Submit your second hand-in before 11:59pm

Each group will have to give a presentation on the project. Each presentation will last at most 20 minutes. The schedule for the presentation will be posted on the course website after all the groups are formed. The presentation order will be decided randomly to ensure fairness.

Project Hand-in

The project hand-in consists of two parts.

First Hand-in

For the first part, each group will have to hand in the following:

Source code of program
The answer output files of your system for training and testing phases
A README file explaining the content of all your submitted files

The first part has to be handed in on Jan. 11, 2007.

Second Hand-in

For the second part, each group will have to hand in the following:

Source code of program
The answer output files of your system for training and testing phases
Presentation slides
Report of your system (does not have to be formal)
Report of which group member is responsible for which part of the project
A README file explaining the content of all your submitted files

Note that your source code and output files may differ from the first hand-in if you wish to improve your system; if so, describe your enhancements in your report. The second part has to be handed in on Jan. 18, 2007. Don’t forget to include the names of your team members in the report.

Hand-in Format

Compress the hand-in files into a single zip file and email it to the TA (ecqgon@gmail.com).

Marks for Your System

You will be graded primarily on the performance of your WSD system. Don’t despair if you do not achieve the top performance, because innovation will also be taken into account.