Efficient Implementation of a Semantic-based Transfer Approach
- Michael Dorna, Martin C. Emele
This article gives an overview of a new semantic-based transfer
approach developed and applied within the Verbmobil Machine Translation
project. We present the declarative transfer formalism and discuss its
implementation. The results presented in this paper have been integrated
successfully into the Verbmobil system.
The application domain of the Verbmobil Machine Translation (MT)
project [16, 22] is spontaneous spoken language in face-to-face dialogs.
The scenario is restricted to the task of arranging business
meetings, but the approach is intended to be extensible to other
topics as well. The languages involved are English, German and
Japanese. Apart from linguistic and cognitive research, this project
is also a software-engineering challenge....
Identifying Procedural Relations in Text
- Donia R. Scott, Anthony Hartley, Judy Delin, Simon Attfield
We have developed a methodology for analysing corpora of instructional text to record the mappings from aspects of the underlying task being described to surface linguistic expressions (Delin et al., 1994). We describe here an investigation of the reliability of judgements of two procedural relations, GENERATION and ENABLEMENT, from text. Our results show that such judgements can be reliably made by non-trained subjects, thus providing support for the validity of our methodology.
Resistance is Futile; Formal Linguistic Observations on Design Patterns
- Peter Van Emde Boas
Inspection of the current literature on Design Patterns shows that the Prime Directive
for this community is Pragmatics. It hardly matters what patterns are, or how
Patterns are represented formally or syntactically. What does matter is their role in
enhancing the reuse of good solutions to recurring problems.
In this article I want to show that minimal assumptions about the pragmatic use
of Patterns suffice to establish that Design Patterns form just another formal language,
which can be shown to be at least recursively enumerable. Whether the language
is recursive depends on further conditions on the actual relation which is assumed
to hold between a pattern and its...
The Role of Semantics in Spoken Dialogue Translation Systems
- Scott McGlashan
In this paper, we consider the role of semantics in spoken dialogue translation systems.
We begin by looking at some of the key properties of an existing spoken dialogue system, namely
the SUNDIAL system, which provides flight and train information over the telephone, and how
these properties affect the design methodology and functionality of spoken translation systems.
These properties include the effects of speech processing, designing the system to meet the needs
of users, and an analysis model which clearly separates the linguistic, conceptual, pragmatic
and task levels. In this model many task functionalities are dependent upon, and sometimes
realizable by, the semantic and pragmatic analysis...
A Connectionist Model for Bootstrap Learning of Syllabic Structure
- Jean Vroomen, Antal Van Den Bosch
We report on a series of experiments with simple recurrent networks (srns)
solving phoneme prediction in continuous phonemic data. The purpose of the
experiments is to investigate whether the network output could function as a
source for syllable boundary detection. We show that this is possible, using
a generalisation of the network which resembles the linguistic sonority principle.
We argue that the primary generalisation of the network, i.e., the fact
that sonority varies in a hat-shaped way across phonemic strings, ending and
starting at syllable boundaries, is an indication that sonority might be a major
cue in discovering the essential building blocks of language when confronted
with unsegmented running...
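The hat-shaped sonority profile described in the abstract can serve directly as a boundary cue: assign each phoneme a sonority value and hypothesise a syllable boundary at each local sonority minimum. The sketch below is only an illustration of that principle, not the recurrent network of the paper, and the sonority values are invented for the example.

```python
# Toy sketch of the sonority principle as a syllable-boundary cue.
# The sonority scale values below are illustrative assumptions.
SONORITY = {
    'p': 1, 't': 1, 'k': 1, 'b': 2, 'd': 2, 'g': 2,
    's': 3, 'f': 3, 'n': 4, 'm': 4, 'l': 5, 'r': 5,
    'i': 7, 'e': 7, 'a': 7, 'o': 7, 'u': 7,
}

def boundaries(phonemes):
    """Place a boundary before each local sonority minimum,
    mirroring the hat-shaped sonority profile of syllables."""
    values = [SONORITY[p] for p in phonemes]
    cuts = []
    for i in range(1, len(values) - 1):
        if values[i] < values[i - 1] and values[i] <= values[i + 1]:
            cuts.append(i)
    return cuts

# /banana/: sonority dips at each consonant that starts a new syllable
print(boundaries(list('banana')))  # → [2, 4], i.e. ba|na|na
```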
A Diagnostic Tool for German Syntax
- John Nerbonne, Klaus Netter, Abdel Kader Diagne, Ludwig Dickmann (Deutsches Forschungszentrum für Künstliche Intelligenz GmbH)
In this paper we describe an effort to construct a catalogue of syntactic data, exemplifying the
major syntactic patterns of German. The purpose of the corpus is to support the diagnosis of
errors in the syntactic components of natural language processing (NLP) systems. Two secondary
aims are the evaluation of NLP systems components and the support of theoretical and
empirical work on German syntax.
The data consist of artificially and systematically constructed expressions, including negative
(ungrammatical) examples. The data are organized into a relational database and annotated
with some basic information about the phenomena illustrated and the internal structure
of the sample sentences. The organization of...
A Simple Introduction to Maximum Entropy Models for Natural Language Processing
Many problems in natural language processing can be viewed as linguistic
classification problems, in which linguistic contexts are used to predict
linguistic classes. Maximum entropy models offer a clean way to combine
diverse pieces of contextual evidence in order to estimate the probability
of a certain linguistic class occurring with a certain linguistic context.
This report demonstrates the use of a particular maximum entropy
model on an example problem, and then proves some relevant mathematical
facts about the model in a simple and accessible manner. This report
also describes an existing procedure called Generalized Iterative Scaling,
which estimates the parameters of this particular model. The goal of this...
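The combination of contextual evidence described in this abstract, together with Generalized Iterative Scaling, can be sketched on a toy classification problem. This is a minimal illustration assuming binary indicator features; the data, feature names, and iteration count are invented, not taken from the report.

```python
import math

# Minimal conditional maximum-entropy model trained with Generalized
# Iterative Scaling (GIS) on an invented toy tagging problem.
data = [
    ({'word=the'}, 'DET'),
    ({'word=the'}, 'DET'),
    ({'word=dog'}, 'NOUN'),
    ({'word=runs'}, 'VERB'),
]
classes = sorted({y for _, y in data})
feats = sorted({(f, y) for x, _ in data for f in x for y in classes})
index = {fy: j for j, fy in enumerate(feats)}
C = max(len(x) for x, _ in data)   # GIS constant: max active features
lam = [0.0] * len(feats)           # model parameters

def scores(x):
    """Log-linear scores over classes, normalised to probabilities."""
    s = {y: math.exp(sum(lam[index[(f, y)]] for f in x if (f, y) in index))
         for y in classes}
    z = sum(s.values())
    return {y: v / z for y, v in s.items()}

emp = [0.0] * len(feats)           # empirical feature expectations
for x, y in data:
    for f in x:
        emp[index[(f, y)]] += 1

for _ in range(100):               # GIS iterations
    exp = [0.0] * len(feats)       # model feature expectations
    for x, _ in data:
        p = scores(x)
        for f in x:
            for y in classes:
                exp[index[(f, y)]] += p[y]
    lam = [l + (1.0 / C) * math.log(emp[j] / exp[j]) if emp[j] > 0 else l
           for j, l in enumerate(lam)]

p = scores({'word=dog'})
print(max(p, key=p.get))  # → NOUN
```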
Enhancing Design Methods to Support Real Design Processes
- Barbara Staudt Lerner, Stanley M. Sutton, Leon J. Osterweil
Software design methods typically focus on the activities that individual designers should perform
under ideal circumstances. They rarely, if ever, address the activities that should be performed when
things do not go according to plan, such as when a customer requests changes to the specification, or
when early design decisions must be changed. They also rarely address issues involving coordination
of multiple designers in cooperative design tasks or in competition for limited resources. We are investigating
fundamental concepts required for more complete definition of design methods, developing
linguistic mechanisms within a process programming language to support these concepts, and validating
these through the definition of a process...
Grammar Formalisms Viewed as Evolving Algebras
- David E. Johnson, Lawrence S. Moss
We consider the use of evolving algebra methods for specifying grammars for natural
languages. We are especially interested in distributed evolving algebras. We provide the
motivation for doing this, and we give a reconstruction of some classical grammatical
formalisms in directly dynamic terms. Finally, we consider some technical questions
arising from the use of direct dynamism in grammatical formalisms.
Formal work in linguistics has both produced and used important mathematical tools. It led
to formal language theory, and later developments in that field have found their way back
to linguistics. But in addition, ideas originally developed for other applications have been
incorporated into linguistic research. This paper...
Towards a Peircean model of language
- Guy Debrock, Janos Sarbo
We argue that traditional approaches to natural language suffer
from the `fallacy of misplaced concreteness'. Because `language' is
a noun, and nouns usually refer to `things', it is often assumed that
language is some `thing' with a certain immutable structure and
properties. This problem of language modelling is also witnessed
by the limited success of phrase structure-based parsers in natural
language processing. One reason for this lies in the rigidity of hierarchical
structure on the one hand, as opposed to the high flexibility
of language use on the other.
It will be argued that language is in the first place a process, and
that this assumption puts the task of...
Diagnostic Evaluation in Linguistic Word Recognition
- Julie Carson-Berndsen, Martina Pampel
This report is concerned with a new method of evaluation for the
Linguistic Word Recognition component of the Verbmobil-Project:
Architektur. A two stage model of diagnostic evaluation is presented
consisting of logical and empirical evaluation steps. Logical evaluation
is carried out according to a data model which serves as optimal
input, so that each component participating in the evaluation
process can be tested for soundness and completeness. Inconsistencies
can thus be remedied before empirical evaluation of the model is undertaken
using real data. The diagnostic evaluation method has been
operationalised within the Bielefeld Extended Evaluation Toolkit for
Lattices of Events (BEETLE).
This report is concerned with a new approach...
A Dependency-based Approach to Bounded and Unbounded Movement
This paper addresses the treatment of movement phenomena within multimodal categorial,
or type-logical, grammar systems. Multimodal approaches allow different modes of logical
behaviour to be displayed within a single system. Intuitively, this characteristic corresponds
to making available different modes of linguistic description within a single formalism. A
key benefit of taking a multimodal approach is that it allows us to choose, for any linguistic
phenomenon addressed, a level of description that encodes only the aspects of linguistic
structure that are relevant to the treatment of that phenomenon. In practice, this means
that we may lexically encode linguistic information which is relevant to one phenomenon
but not another, but...
The words of language contain a nontrivial amount of linguistic
information, such as what the word means, how it may be used in a sentence,
and how it is to be spoken and written. In this article, we consider the computational
problem of learning the linguistic structure of a novel word, as well
as that of learning the "overall" morpho-phonology of a language. Our main
result is that the computational problem of acquiring the morpho-phonology
of a language is NP-complete.
Lattice-Based Word Identification In Clare
- David M. Carter
I argue that because of spelling and typing errors and other properties of
typed text, the identification of words and word boundaries in general
requires syntactic and semantic knowledge. A lattice representation is
therefore appropriate for lexical analysis. I show how the use of such a
representation in the CLARE system allows different kinds of hypothesis
about word identity to be integrated in a uniform framework. I then describe
a quantitative evaluation of CLARE's performance on a set of sentences into
which typographic errors have been introduced. The results show that syntax
and semantics can be applied as powerful sources of constraint on the
possible corrections for misspelled words.
In many language processing systems, uncertainty
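The idea of integrating competing word hypotheses in a lattice can be sketched as a best-path search over a directed acyclic graph whose edges are scored hypotheses. This is not CLARE's implementation; the example lattice, positions, and scores are invented to illustrate the representation.

```python
import math

# Sketch of scoring a word lattice: each edge is a word hypothesis
# spanning positions in the input, with an invented log-probability
# (e.g. competing spelling corrections for a mistyped word).
edges = [
    (0, 3, 'the', math.log(0.9)),
    (3, 7, 'cat', math.log(0.6)),     # correction hypothesis for 'cta'
    (3, 7, 'cut', math.log(0.3)),     # competing correction
    (7, 12, 'sleeps', math.log(0.8)),
]

def best_path(edges, start, end):
    """Highest-scoring word sequence from start to end, by dynamic
    programming over edges in order of start position."""
    best = {start: (0.0, [])}
    for s, e, w, lp in sorted(edges):
        if s in best:
            score = best[s][0] + lp
            if e not in best or score > best[e][0]:
                best[e] = (score, best[s][1] + [w])
    return best[end][1]

print(best_path(edges, 0, 12))  # → ['the', 'cat', 'sleeps']
```

In a full system the edge scores would combine spelling-correction likelihoods with syntactic and semantic constraints, as the abstract describes.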
Text Expansion: A Question/Answer Explanation based Approach
- Yllias Chali
This paper presents the work related to my PhD thesis. It deals with an explanation-based approach to the
generation of extended versions of a text. Explanation is meant in the sense that an expansion is intended
to make understood what is merely evoked in the text. To account for this aspect, we resort
to the notion of text questionability defined by Virbel (1996), while exploiting the question/answer structure
(Nespoulous and Virbel, 1991).
The appropriate framework for representing the question/answer structure is that proposed by
Harris (1968), which allows a text to be represented in terms of syntactic and semantic structural knowledge, and
Probability-Driven Lexical Classification: A Corpus-Based Approach
- Tony C. Smith, Ian H. Witten
Successful grammatical inference from a corpus of
linguistic material rests largely on the ability to tag
the words of the corpus with appropriate lexical categories.
Static tagging methods, such as dictionary
lookup, often misclassify words associated with multiple
categories, and adaptive statistical taggers or
context-based corrective taggers may still have error
rates of 3 or 4 percent. Even a small proportion of
lexical misclassification may lead a syntax induction
mechanism to produce an extraordinarily large number
of special-case rules.
By treating grammar induction as a "bootstrapping"
problem in which it is necessary to simultaneously
discover a set of categories and a set of rules
defined over them, lexical tagging is relieved of constraints
Estimating Performance of Pipelined Spoken Language Translation Systems
- Manny Rayner, David Carter, Patti Price, Bertil Lyberg
Most spoken language translation systems developed to date rely
on a pipelined architecture, in which the main stages are speech recognition,
linguistic analysis, transfer, generation and speech synthesis.
When making projections of error rates for systems of this kind, it is
natural to assume that the error rates for the individual components
are independent, making the system accuracy the product of the component
accuracies. The paper reports experiments carried out using the SRI-SICS-Telia
Research Spoken Language Translator and a 1000-utterance sample
of unseen data. The results suggest that the naive performance
model leads to serious overestimates of system error rates, since there
are in fact strong dependencies between the components. Predicting...
Language Identification Incorporating Lexical Information
- D. Matrouf, M. Adda-Decker, L. F. Lamel, J. L. Gauvain
In this paper we explore the use of lexical information
for language identification (LID). Our reference LID system
uses language-dependent acoustic phone models and
phone-based bigram language models. For each language,
lexical information is introduced by augmenting the phone
vocabulary with the N most frequent words in the training
data. Combined phone and word bigram models are used
to provide linguistic constraints during acoustic decoding.
Experiments were carried out on a 4-language telephone
speech corpus. Using lexical information achieves a relative
error reduction of about 20% on spontaneous and read
speech compared to the reference phone-based system.
Identification rates of 92%, 96% and 99% are achieved
for spontaneous, read and task-specific speech segments...
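The decision rule behind this kind of LID system can be sketched simply: score the decoded unit sequence under each language's bigram model and pick the highest-likelihood language. The toy training sequences, unit inventory size, and add-one smoothing below are illustrative assumptions, not the paper's acoustic system.

```python
import math
from collections import defaultdict

def train_bigram(sequences, vocab_size=50):
    """Return a log-probability function for a bigram model over
    mixed phone/word units, with add-one smoothing over an assumed
    unit inventory size."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for a, b in zip(['<s>'] + seq, seq):
            counts[a][b] += 1
    def logprob(seq):
        lp = 0.0
        for a, b in zip(['<s>'] + seq, seq):
            total = sum(counts[a].values())
            lp += math.log((counts[a][b] + 1) / (total + vocab_size))
        return lp
    return logprob

# Toy per-language models: phone units plus a few frequent words,
# mimicking the augmented phone-plus-word vocabulary.
models = {
    'EN': train_bigram([['dh', 'the', 'k', 'ae', 't'],
                        ['the', 'd', 'ao', 'g']]),
    'FR': train_bigram([['l', 'le', 'sh', 'a'],
                        ['le', 'sh', 'ie', 'n']]),
}

def identify(units):
    """Pick the language whose model best explains the sequence."""
    return max(models, key=lambda lang: models[lang](units))

print(identify(['the', 'k', 'ae', 't']))  # → EN
```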
Lexical Attraction Models of Language
- Deniz Yuret
This paper presents lexical attraction models of language,
in which the only explicitly represented linguistic
knowledge is the likelihood of pairwise relations between
words. This is in contrast with models that represent
linguistic knowledge in terms of a lexicon, which
assigns categories to each word, and a grammar, which
expresses possible combinations in terms of these categories.
The word-based nature and the simplicity of
lexical attraction models make them good candidates
for experiments in language learning. I introduce an
unsupervised learning algorithm that uses lexical attraction
and gives accuracy results comparable to supervised...
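The core quantity in such word-pair models can be illustrated with pointwise mutual information over adjacent words, estimated from a toy corpus. This is only a sketch of the pairwise-attraction idea; the paper's unsupervised learning algorithm is more elaborate, and the corpus below is invented.

```python
import math
from collections import Counter

# Pointwise mutual information between adjacent word pairs, a simple
# stand-in for the pairwise lexical attraction the abstract describes.
corpus = [
    ['the', 'dog', 'barks'],
    ['the', 'cat', 'sleeps'],
    ['a', 'dog', 'barks'],
]

word_counts = Counter(w for sent in corpus for w in sent)
pair_counts = Counter((a, b) for sent in corpus
                      for a, b in zip(sent, sent[1:]))
n_words = sum(word_counts.values())
n_pairs = sum(pair_counts.values())

def attraction(a, b):
    """Pointwise mutual information of the adjacent pair (a, b)."""
    p_ab = pair_counts[(a, b)] / n_pairs
    p_a = word_counts[a] / n_words
    p_b = word_counts[b] / n_words
    return math.log2(p_ab / (p_a * p_b))

# 'dog barks' always co-occurs, while 'the' pairs with several words,
# so the former pair attracts more strongly
print(attraction('dog', 'barks') > attraction('the', 'dog'))  # → True
```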
Statistical Language Modeling For Speech Disfluencies
- Andreas Stolcke, Elizabeth Shriberg
Speech disfluencies (such as filled pauses, repetitions, restarts) are
among the characteristics distinguishing spontaneous speech from
planned or read speech. We introduce a language model that predicts
disfluencies probabilistically and uses an edited, fluent context
to predict following words. The model is based on a generalization
of the standard N-gram language model. It uses dynamic programming
to compute the probability of a word sequence, taking into
account possible hidden disfluency events. We analyze the
model's performance for various disfluency types on the Switchboard
corpus. We find that the model reduces word perplexity in the
neighborhood of disfluency events; however, overall differences
are small and have no significant impact on recognition accuracy.
We also note...
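The dynamic programming over hidden disfluency events can be sketched for the simplest case, a one-word repetition: each word either continues the surface bigram context, or is a repetition disfluency, in which case later words are predicted from the edited context with the repeat removed, and the two hypotheses are summed. All probabilities below are invented, and the paper's model covers more disfluency types than this sketch.

```python
# Toy version of an n-gram model with a hidden repetition-disfluency
# event, summed out by dynamic programming. Numbers are invented.
P_DISFL = 0.1                       # assumed prior of a repetition event

def bigram(prev, word):
    """Stand-in bigram probability (uniform toy distribution)."""
    return 0.01

def sequence_prob(words):
    # state: edited-context word -> accumulated probability mass
    states = {'<s>': 1.0}
    for w in words:
        new = {}
        for ctx, p in states.items():
            # hypothesis 1: fluent word, edited context advances to w
            fluent = p * (1 - P_DISFL) * bigram(ctx, w)
            new[w] = new.get(w, 0.0) + fluent
            # hypothesis 2: w repeats ctx as a disfluency; the edited
            # context stays ctx, so later words are predicted from it
            if w == ctx:
                new[ctx] = new.get(ctx, 0.0) + p * P_DISFL
        states = new
    return sum(states.values())

# the repeated 'i' can be explained as a disfluency, raising the
# sequence probability relative to an ordinary three-word sequence
print(sequence_prob(['i', 'i', 'want']) >
      sequence_prob(['i', 'you', 'want']))  # → True
```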