Lex: A Software Project for Linguists

Chris Wilson

Abstract


This paper describes a project called Lex, developed initially to assist a linguist with analysis of the Hebrew Bible and now being extended to other languages.

Lex is an implementation of the Role Lexical Module (RLM) by Winther-Nielsen [NWN 08]. It integrates with the tagged Biblical Hebrew corpus of the Werkgroep Informatica (WIVU) and provides corpus navigation and display of the morphological and syntactic markup from that corpus. We summarise our use of this corpus and the steps needed to extend our approach to other languages and corpora.

We introduce the Emdros database [USP 08] used by Lex and the WIVU corpus, and explain why it was chosen for this project and the advantages that it appears to offer to corpus and computational linguists.

We present the rule-based active chart parser developed for Lex, and its extensions to support free word-order languages, including Dyirbal. We describe the parser's attribute and unification features, which allow arbitrary restrictions on rule combination in order to simulate the linguistic template structures of Role and Reference Grammar (RRG) and to generate logical structures and focus structure. Winther-Nielsen has developed grammatical rules for Biblical Hebrew using this parser, and we believe that it should be powerful enough to work with any written language, and to facilitate computational linguistics and fully automated machine translation.
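As a rough illustration of how attribute unification can restrict rule combination, the following minimal sketch unifies two flat attribute sets and rejects the combination when a shared attribute conflicts. It is not the Lex parser's implementation: the function name unify and the attribute names (cat, number, gender) are assumptions made for this example only.

```python
from typing import Dict, Optional

Features = Dict[str, str]


def unify(a: Features, b: Features) -> Optional[Features]:
    """Merge two flat attribute sets; return None if a shared attribute conflicts."""
    merged = dict(a)
    for key, value in b.items():
        if key in merged and merged[key] != value:
            return None  # conflicting values: the rule may not combine these edges
        merged[key] = value
    return merged


# Hypothetical attribute names, chosen only for illustration.
subject = {"cat": "NP", "number": "sg", "gender": "m"}
verb_agreement = {"number": "sg", "gender": "m"}
plural_agreement = {"number": "pl"}

agreeing = unify(subject, verb_agreement)    # attributes are compatible
clashing = unify(subject, plural_agreement)  # number conflicts, so None
```

In a chart parser, a check of this kind would typically run whenever an active edge is about to be extended, so that a rule applies only when the attributes of its constituents are compatible.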

We present an idea for a rule-based morphological analysis system that would work with Lex to enable parsing of other languages while avoiding the need to store each morphological word form in the lexicon. We give examples of the use of this system with Biblical Hebrew.

We present the new database-driven transliteration system in Lex, illustrated with examples from Biblical Hebrew, and describe its potential in developing and testing transliterations of other languages using a test-driven development [KB 02] approach borrowed from software engineering.
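The combination of database-held rules and test-driven development might look roughly like the sketch below, in which transliteration rules live in a table and a list of expected outputs acts as the test suite. The table layout, the function names and the three consonant mappings are assumptions for illustration, not the scheme used in Lex.

```python
import sqlite3

# In-memory database standing in for the rule store (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rule (source TEXT PRIMARY KEY, target TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO rule VALUES (?, ?)",
    [
        ("\u05d0", "\u02be"),  # alef  -> modifier letter right half ring
        ("\u05d1", "b"),       # bet   -> b
        ("\u05e8", "r"),       # resh  -> r
    ],
)


def transliterate(text: str) -> str:
    """Replace each character that has a rule; leave other characters unchanged."""
    rules = dict(conn.execute("SELECT source, target FROM rule"))
    return "".join(rules.get(ch, ch) for ch in text)


# Expected results written before the rules are refined (test-first style).
EXPECTED = [
    ("\u05d1\u05e8\u05d0", "br\u02be"),  # bet, resh, alef
    ("\u05d0\u05d1", "\u02beb"),         # alef, bet
]


def failing_cases():
    """Return (input, expected, actual) for every case that does not yet pass."""
    results = [(src, want, transliterate(src)) for src, want in EXPECTED]
    return [row for row in results if row[1] != row[2]]
```

Under this arrangement a new transliteration is developed by adding expected pairs first and then editing rows in the rule table until failing_cases() returns an empty list.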

We present our ideas for machine translation based on Role and Reference Grammar [VVLP], with optimisations for conversion of the parse tree from the source to the target language.

 

 

