A Formal Framework for Linguistic Annotation
- Steven Bird; Mark Liberman
'Linguistic annotation' covers any descriptive or analytic notations applied to raw language data. The basic data may be in the form of time functions -- audio, video and/or physiological recordings -- or it may be textual. The added notations may include transcriptions of all sorts (from phonetic features to discourse structures), part-of-speech and sense tagging, syntactic analysis, 'named entity' identification, co-reference annotation, and so on. While there are several ongoing efforts to provide formats and tools for such annotations and to publish annotated linguistic databases, the lack of widely accepted standards is becoming a critical problem. Proposed standards, to the...
COLOR-X: Object Modeling profits from Linguistics
- J. F. M. Burg; R.P. van de Riet
This paper describes a linguistically based object modeling technique for modeling Information and Communication Systems. This technique is a combination of a linguistically based, formal conceptual modeling language and a high-level graphical analysis and design method. The process of modeling Information and Communication Systems is interactively supported by a Lexicon, which delivers correct information that the analyst and designer use as a base for their final models. Our modeling technique and the supporting lexicon facilitate the modeling process and result in models that are consistent and complete. Keywords: Object Model, Linguistics, Cpl, Information and Communication Systems, Lexicon 1 INTRODUCTION This...
Integrating Different Learning Approaches into a Multilingual Spoken Language Translation System
- P. Geutner; B. Suhm; F.-D. Buø; T. Kemp; L. Mayfield; A. E. Mcnair; I. Rogina; T. Schultz; T. Sloboda; W. Ward; M. Woszczyna; A. Waibel
Building multilingual spoken language translation systems requires knowledge about both acoustic models and language models of each language to be translated. Our multilingual translation system JANUS-2 is able to translate English and German spoken input into either English, German, Spanish, Japanese or Korean output. Getting optimal acoustic and language models as well as developing adequate dictionaries for all these languages requires a lot of hand-tuning and is time-consuming and labor intensive. In this paper we will present learning techniques that improve acoustic models by automatically adapting codebook sizes, a learning algorithm that increases and adapts phonetic dictionaries for the recognition...
The Impact of Linguistics on Conceptual Models: Consistency and Understandability
- J. F. M. Burg; R.P. van de Riet
This paper describes a vision in which linguistic knowledge and theories are introduced into conceptual modeling, and it sums up the advantages achieved by this approach. We will show how the extension of conceptual modeling techniques with linguistic theories increases their expressive power, the capability to formalize well-known conceptual aspects, like object roles and constraints, and their internal consistency. Furthermore, we will explain the advantages gained from using such an extended conceptual modeling technique by describing the adjustments and improvements of the modeling process itself and the extensions to the validation and verification process of the sophisticated models. We will...
A Fuzzy Approach to Complex Linguistic Query Based Image Retrieval
- Swarup Medasani; Raghu Krishnapuram
The current trend in the rapid growth of on-line image databases has brought forth several innovative approaches to content-based image retrieval. Most current techniques retrieve images based on an example image or object shapes/features extracted from images. Retrieval based on linguistic queries has not received much attention. In this paper, we present a fuzzy connective approach to handle complex linguistic queries consisting of multiple attributes. We represent each (fuzzy) attribute in a complex query by a (multi-dimensional) membership function. The degree to which an image satisfies the attribute is obtained by finding the membership value of the feature vector corresponding...
GETESS - Searching the Web Exploiting German Texts
- Steffen Staab; Christian Braun; Ilvio Bruder; Antje Düsterhöft; Andreas Heuer; Meike Klettke; Günter Neumann; Bernd Prager; Jan Pretzel; Hans-Peter Schnurr; Rudi Studer
We present an intelligent information agent that uses semantic methods and natural language processing capabilities in order to gather tourist information from the WWW and present it to the human user in an intuitive, user-friendly way. Thereby, the information agent is designed such that as background knowledge and linguistic coverage increase, its benefits improve, while it guarantees state-of-the-art information and database retrieval capabilities as its bottom line. 1 Introduction Due to the vast amounts of information in the WWW, its users have more and more difficulties finding the information they are looking for among the many heterogeneous information resources....
Statistical Analysis of Dialogue Structure
- Ye-Yi Wang; Alex Waibel
We introduce a statistical model for dialogues. We describe a dynamic programming algorithm that can be used to bracket a dialogue into segments and label each segment with its speech act. We evaluate the performance of the model. We also use this model for language modelling and get perplexity reduction. 1 INTRODUCTION Dialogue structure provides important information for spoken language understanding. This structure comprises the current topic, discourse state, and speech act, etc. Many researchers used topic information to reduce the perplexity of a task [1, 2]. In our experiments, we also found that dialogue structure information also helps to...
Experiments With LVCSR Based Language Identification
- T. Schultz; I. Rogina; A. Waibel
Automatic language identification is an important problem in building multilingual speech recognition and understanding systems. We have developed a front-end LID module based on LVCSR to identify English, German, and Spanish language for use in spontaneous speech-to-speech translation. We studied the constitution of different levels of knowledge to identify a language, i.e. the phonetic, phonotactic, lexical, and syntactic-semantic knowledge. A comparison of LID systems using different levels of these knowledge sources is presented. We showed that the incorporation of lexical and linguistic knowledge leads to a reduction of the language identification error by up to 50%. 1. INTRODUCTION In recent...
LVCSR-Based Language Identification
- T. Schultz; I. Rogina; A. Waibel
Automatic language identification is an important problem in building multilingual speech recognition and understanding systems. Building a language identification module for four languages we studied the influence of applying different levels of knowledge sources on a large vocabulary continuous speech recognition (LVCSR) approach, i.e. the phonetic, phonotactic, lexical, and syntactic-semantic knowledge. The resulting language identification (LID) module can identify spontaneous speech input and can be used as a frontend for our multilingual speech-to-speech translation system JANUS-II. A comparison of five LID systems showed that the incorporation of lexical and linguistic knowledge reduces the language identification error for the 2-language tests...
Plan-based Event Representations for the Analysis of Tense and Aspect
- Guido Boella; Rossana Damiano
In this paper, a representation formalism based on actions and hierarchical plans is proposed to model the aspectual and temporal composition of sentences. Action verbs are interpreted as (possibly underspecified) instances of action schemata that include a plan body; the interpretation process is carried out in an incremental way: the other linguistic elements, like tense and adverbs, are evaluated by means of rules that add constraints to event representations. Pragmatic factors related to the communicative situations are accounted for by introducing defeasible rules for conversational implicatures that determine the telicity of a given description. 1 Introduction According to ,...
Associating semantic components with intersective Levin classes
- Hoa Trang Dang; Joseph Rosenzweig; Martha Palmer
This paper examines the question of differences between a traditional interlingua approach and a transfer-based approach that uses cross-linguistic semantic features to generalize its transfer lexicon entries, and concludes that the two approaches share a common interest in lexical classifications that can be distinguished by cross-linguistic semantic features. The paper goes on to discuss current approaches to English classification, Levin classes and WordNet. We present a refinement of Levin classes - Intersective Classes - that shows interesting correlations to WordNet and that makes more explicit the semantic components that serve to distinguish different classes. Tradition holds that an...
Definiteness in the Hebrew Noun Phrase
- Shuly Wintner
This paper suggests an analysis of Modern Hebrew noun phrases in the framework of HPSG. It focuses on the peculiar properties of the definite article, including the requirement for definiteness agreement among various elements in the noun phrase, definiteness inheritance in construct-state nominals, the fact that the article does not combine with constructs and the similarities between construct-state nouns and adjectives. Central to our analysis is the assumption that the Hebrew definite article is an affix, rather than a clitic or a stand-alone word. Several arguments, from all levels of linguistic representation, are provided to justify this claim. Adopting the...
Multidimensional Exploration of Online Linguistic Field Data
- Steven Bird
Advances in storage technology make it possible to house virtually unlimited quantities of recorded speech data online. Advances in character-encoding technology make it possible to create platform-independent transcriptions. Advances in web technology make it possible to publish this data for essentially no marginal cost. These developments have profound consequences for the accessibility, quality and quantity of linguistic field data. Recordings become accessible. Transcriptions become verifiable. Large corpora become manageable. In order to illustrate the potential for this mode of operation in field linguistics, I describe a piece of online fieldwork involving a tone language of Cameroon. A complex verb paradigm...
Computer based spelling remediation for dyslexic children: A comparison of rule based and mastery techniques.
- Angela J. Fawcett; Roderick I. Nicolson; Stephanie Morris
The SelfSpell programs provide a multimedia environment for dyslexic children which uses synthesised speech to augment the written text. In earlier research we established that by encouraging users to enter a rule to help them remember how to spell each word, SelfSpell was very effective in improving spelling ability. The evaluation study reported here confirmed the efficacy of the rule-based approach using a group of 11-year-old dyslexic children with severe impairments in spelling. Of particular theoretical significance, however, was the finding that use of a mastery learning technique for learning spellings was just as effective as the rule-based...
Discourse Multiple Dependencies
- Claire Gardent
It is sometimes claimed (cf. [Mann/Thompson 1988, Scha/Polanyi 1988, Webber 1991, Gardent 1991, Prust 1992]) that discourse has a tree structure which reflects the semantic structure of discourse. In this paper, I argue that this claim is problematic in cases of discourse multiple dependencies i.e. cases where one discourse segment is semantically related to two discourse segments. I develop a discourse framework which is based on [Scha/Polanyi 1988] but integrates ideas from Feature-based Tree Adjoining Grammars (FTAGs). I then show that this new framework adequately captures multiple dependencies whilst retaining the precise linguistic predictions made by the discourse grammar. In...
Linguistic Variation in Information Retrieval and Filtering
- Arampatzis Van
In this paper, a natural language approach to Information Retrieval (IR) and Information Filtering (IF) is described. Rather than keywords, noun-phrases are used for both document description and as query language, resulting in a marked improvement of retrieval precision. Recall then can be enhanced by applying normalization to the noun-phrases and some other constructions. This new approach is incorporated in the Information Filtering Project Profile. The overall structure of the Profile project is described, focusing especially on the Parsing Engine involved in the natural language processing. Effectiveness and efficiency issues are elaborated concerning the Parsing Engine. The major...
Describing Business Processes with a Guided Use Case Approach
- Selmin Nurcan; Georges Grosz; Carine Souveyet
Business Process (BP) improvement and the like require accurate descriptions of the BPs. We suggest describing BPs as use case specifications. A use case specification comprises a description of the context of the BP, the interactions between the agents involved in the BP, the interactions of these agents with an automated system supporting the BP and attached system internal requirements. Constructing such specifications remains a difficult task. Our proposal is to use textual scenarios as inputs, describing fragments of the BP, and to guide, using a set of rules, their incremental production and integration in a use case specification also...
Derivation of Fuzzy Classification Rules from Multidimensional Data
- F. Klawonn; R. Kruse
This paper describes techniques for deriving fuzzy classification rules based on special modified fuzzy clustering algorithms. The basic idea is that each fuzzy cluster induces a fuzzy classification rule. The fuzzy sets appearing in a rule associated with a fuzzy cluster are obtained by projecting the cluster to the one-dimensional coordinate spaces. In order to allow clusters of varying shape and size we derive special fuzzy clustering algorithms which search for clusters in the form of axes-parallel hyper-ellipsoids. Our method can be applied to classification tasks where the classification of the sample data is known as well as...
Complexity and the Induction of Tree Adjoining Grammars
- Robin Clark
In this paper, I will develop the formal foundations of a theory of complexity that underlies the theory of grammatical induction. The initial concern will be the learning theoretic foundations of linguistic locality. That is, I will develop a theory that will place bounds on the amount a learner can draw from an input text. These bounds will limit the amount of variation that could potentially be encoded within a parameter space. A fully developed form of the theory will place a tangible upper limit on what the learner can induce from the input text. I will first turn to a...
Guiding Scenario Authoring
- C. Ben Achour
In recent years, scenario-based requirements engineering approaches have gained in popularity. Textual scenarios are narrative descriptions of flows of actions between agents. They are often proposed to elicit, validate or document requirements. The CREWS experience has shown that the advantage of scenarios is their ease of use, and that their disadvantage lies in the lack of guidelines for authoring. In this article, we propose guidance for the authoring of scenarios. The guided scenario authoring process is divided into two main stages: the writing of scenarios, and the correcting of scenarios. To guide the writing of scenarios, we provide...