Automated Extraction Of TAGs From The Penn Treebank
- John Chen; K. Vijay-Shanker
The accuracy of statistical parsing models can be improved with the use of lexical information. Statistical parsing using lexicalized tree adjoining grammar (LTAG), a kind of lexicalized grammar, has remained relatively unexplored. We believe that this is in large part due to the absence of large corpora accurately bracketed in terms of a perspicuous yet broad-coverage LTAG. Our work attempts to alleviate this difficulty. We extract different LTAGs from the Penn Treebank. We show that certain strategies yield an improved extracted LTAG in terms of compactness, broad coverage, and supertagging accuracy. Furthermore, we perform a preliminary investigation in smoothing these...
Developing A 3d-Agent For The August Dialogue System
- Magnus Lundeberg; Jonas Beskow
In our continuing work on high-quality multimodal text-to-speech synthesis for speechreading, a new talking head has been developed with the purpose of acting as an interactive agent in a dialogue system, set up in a public exhibition area in downtown Stockholm. The new agent conforms to the same set of basic control parameters as our earlier faces, allowing us to control it using existing rules for visual speech synthesis. To add to the realism and believability of the dialogue system, the agent has been given a rich repertoire of extra-linguistic gestures and expressions, including emotional cues, turn-taking signals...
Generation Of Multiple Hypothesis In Connected Phonetic-Unit Recognition By A Modified One-Stage Dynamic Programming Algorithm
- José Mariño; Enrique Monte (Dpto. Teoría de la Señal y Comunicaciones, UPC, Spain)
One of the most popular algorithms for connected word (or subword phonetic unit) recognition is the one-stage dynamic programming algorithm. In its available formulation, this algorithm is not designed to provide multiple hypotheses; this limitation is becoming a drawback, since the need for alternative recognitions in the systems presently under research is being acknowledged. This paper introduces a modified version of the one-stage dynamic programming algorithm tailored to afford multiple hypotheses. I. INTRODUCTION Most continuous speech recognition systems currently under research perform the acoustic and linguistic analysis of the utterances to be recognized separately. In other words,...
Parrot-talk Requires Multiple Context Dimensions
- Sabine Geldof
The analysis of human-generated utterances reveals that not only the linguistic (i.e. discourse) context but also the physical context and the user profile of the hearer should be considered when aiming at 'natural' language generation (NLG) embedded in a real-life situation. We propose a framework that allows for annotating the propositional content of sentences to be generated along these three dimensions of context and illustrate this with concrete examples. The context of our research is the COMRIS project, where text is generated for output on a wearable device (parrot).
Computing Quantifier Scope
- Edward P. Stabler
This paper provides a preliminary account of this sort. Providing such an account has become more challenging in some recent derivational approaches to syntax. If some lexical item has a syntactic requirement which is met in the course of a derivation, there may be no need to assume that the requirement is in some significant way still present in the derived structure. A "checked" or fulfilled requirement that has no further role may be regarded as a deleted syntactic feature. This perspective on syntactic derivation, according to which key features of the structures are deleted in the course of a...
Conceptual Distance and Automatic Spelling Correction
- E. Agirre; X. Arregi; X. Artola; A. Díaz de Ilarraza; K. Sarasola
Text from different sources usually arrives under imperfect conditions. When an anomalous word is detected, automatic word recognisers produce a list of candidates, of which only one is correct. A variety of techniques have been devised to discriminate among the possible correction candidates. The project we are involved in tries to exploit linguistic knowledge in spelling correction. A preliminary investigation shows that syntactic discrimination is not enough. The gap could be covered by semantic techniques such as conceptual distance. Basically, we define conceptual distance between two concepts as the shortest path length in the hierarchies of the lexical knowledge base...
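The shortest-path definition of conceptual distance can be illustrated with a small sketch. The hierarchy below is a hypothetical toy example, not the project's lexical knowledge base, and the helper names (`build_graph`, `conceptual_distance`, `rank_candidates`) are assumptions introduced only for illustration. The abstract is truncated, so any refinements the authors apply to the plain path length are not modelled here.

```python
from collections import deque

# Hypothetical toy hierarchy (child -> parent); illustrative only.
HYPERNYM = {
    "dog": "mammal",
    "cat": "mammal",
    "mammal": "animal",
    "sparrow": "bird",
    "bird": "animal",
}

def build_graph(child_to_parent):
    """Turn child->parent links into an undirected adjacency map."""
    graph = {}
    for child, parent in child_to_parent.items():
        graph.setdefault(child, set()).add(parent)
        graph.setdefault(parent, set()).add(child)
    return graph

def conceptual_distance(graph, a, b):
    """Number of edges on the shortest path from a to b, or None if unconnected."""
    if a not in graph or b not in graph:
        return None
    seen = {a}
    queue = deque([(a, 0)])
    while queue:  # breadth-first search finds the shortest path first
        node, dist = queue.popleft()
        if node == b:
            return dist
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None

def rank_candidates(graph, context_word, candidates):
    """Order correction candidates by distance to a context word, closest first."""
    scored = [(conceptual_distance(graph, context_word, c), c) for c in candidates]
    # Unconnected candidates (distance None) are placed last.
    scored.sort(key=lambda p: (p[0] is None, p[0] or 0, p[1]))
    return [c for _, c in scored]

GRAPH = build_graph(HYPERNYM)
```

With this toy data, `rank_candidates(GRAPH, "dog", ["sparrow", "cat"])` prefers "cat", because it shares a direct parent with "dog".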
Perception of late peak in Japanese: Experimental Proposal
- Mafuyu Kitahara; L Jong
Introduction. Tonal alignment has been one of the most active research areas in laboratory phonology. In English, the alignment of tonal targets does not exactly fall on the center of the alleged accented syllable but varies systematically according to certain conditions, such as tonal context, word boundary location, and prosodic phrasing (Silverman & Pierrehumbert, 1990). In Dutch, tonal context also plays a role in the alignment of falling tone (Caspers & Heuven, 1993). Cross-linguistic studies, for example between English and German (Grabe, 1998), and between Dutch and Greek (Mennen, 1998) show that some aspects of tonal alignment are language...
Modelling Multivariate Data by Neuro-Fuzzy Systems
- Jianwei Zhang; Alois Knoll
This paper proposes an approach for solving multivariate modelling problems with neuro-fuzzy systems. Instead of using selected input variables, statistical indices are extracted to feed the fuzzy controller. The original input space is transformed into an eigenspace. If a sequence of training data is sampled in a local context, a small number of eigenvectors with the larger eigenvalues provide a good summary of all the original variables. Fuzzy controllers can be trained to map the input projection in the eigenspace to the outputs. Implementations with the prediction of time series validate the concept. 1. Multivariate Problems in Modelling. For efficiently...
Towards Transparent Control of Large and Complex Systems
- Jianwei Zhang; Alois Knoll
System Identification. Unlike with Markovian Decision Processes, some systems' output depends not only on the current state, but also on the previous input/output. As a training data set for nonlinear system identification, the Box-Jenkins gas furnace data [BJ70] is often studied and compared. The furnace input is the gas flow rate x(t), the output y(t) is the CO2 concentration. At least 10 candidate inputs are considered: x(t-6), x(t-5), ..., x(t-1), y(t-1), ..., y(t-4). If all of them are used, building a fuzzy controller means to solve a...
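As a rough illustration of the ten candidate inputs, the sketch below builds one row of lagged values per time step from an input series x and an output series y. The function name `lagged_candidates` and its parameters are hypothetical, and the series in the usage note are synthetic, not the Box-Jenkins data.

```python
def lagged_candidates(x, y, x_lags=(6, 5, 4, 3, 2, 1), y_lags=(1, 2, 3, 4)):
    """Return (features, target) pairs with
    features = [x(t-6), ..., x(t-1), y(t-1), ..., y(t-4)] and target = y(t)."""
    max_lag = max(max(x_lags), max(y_lags))
    n = min(len(x), len(y))
    rows = []
    # Start at the first t for which every lagged value exists.
    for t in range(max_lag, n):
        features = [x[t - k] for k in x_lags] + [y[t - k] for k in y_lags]
        rows.append((features, y[t]))
    return rows
```

For two synthetic series of length 10, the first usable time step is t = 6, so four rows are produced, each with ten candidate inputs. A feature selection step would then keep only a subset of these columns.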
Automating Feature Set Selection for Case-Based Learning of Linguistic Knowledge
- Claire Cardie
This paper addresses the issue of "algorithm vs. representation" for case-based learning of linguistic knowledge. We first present empirical evidence that the success of case-based learning methods for natural language processing tasks depends to a large degree on the feature set used to describe the training instances. Next, we present a technique for automating feature set selection for case-based learning of linguistic knowledge. Given as input a baseline case representation, the method modifies the representation in response to a number of predefined linguistic biases by adding, deleting, and weighting features appropriately. We apply the linguistic bias approach to feature set...
Discourse Effects On The Prosodic Properties Of Repetitions In Human-Computer Interaction
- Kerstin Fischer
Repetitions may occur in human-computer interaction for various reasons; in this paper the constraints on the use of repetitions and their prosodic realization in the communication with a (simulated) automatic speech processing system which is not functioning properly will be analysed. It will be shown that repeats may have certain phonetic and prosodic properties which the respective original utterances do not necessarily display; however, besides these local changes depending on the immediate sequential context, the use of linguistic strategies such as repetitions changes globally throughout the dialogue. Thus, both the occurrence of repeated utterances and their prosodic realization depend on...
Phonetic-Distance-Based Hypothesis Driven Lexical Adaptation For Transcribing Multilingual Broadcast News
- Petra Geutner; Michael Finke; Alex Waibel
High out-of-vocabulary (OOV) rates are one of the most prevalent problems for languages with rapid vocabulary growth due to a large number of inflections. In particular, when transcribing Serbo-Croatian and German broadcast news, the OOV rate is between 8.7% and 4.5%. Hypothesis Driven Lexical Adaptation (HDLA) has already been shown to decrease high OOV rates significantly by using morphology-based linguistic knowledge. This paper introduces another approach to dynamically adapt a recognition lexicon to the utterance to be recognized. Instead of morphological knowledge about word stems and inflection endings, distance measures based on Levenshtein distance are used. Results based on phoneme and grapheme...
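The distance measure named in the abstract can be sketched as follows. This is the textbook Levenshtein distance with unit costs; whether the authors weight the edit operations differently is not stated in the truncated abstract. The helper `closest_entries` and the small word list are hypothetical and only show how such a distance might select lexicon entries near a hypothesised word.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions turning a into b.
    Works on any sequences, e.g. strings (graphemes) or lists of phoneme symbols."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, item_a in enumerate(a, start=1):
        curr = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def closest_entries(word, lexicon, k):
    """Return the k lexicon entries with the smallest distance to word."""
    return sorted(lexicon, key=lambda entry: (levenshtein(word, entry), entry))[:k]
```

For example, `closest_entries("Hauses", ["Haus", "Maus", "Hausen"], 2)` returns "Hausen" (one substitution) followed by "Haus" (two deletions).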
Annotation Graphs as a Framework for Multidimensional Linguistic Data Analysis
- Steven Bird; Mark Liberman
In recent work we have presented a formal framework for linguistic annotation based on labeled acyclic digraphs. These `annotation graphs' offer a simple yet powerful method for representing complex annotation structures incorporating hierarchy and overlap. Here, we motivate and illustrate our approach using discourse-level annotations of text and speech data drawn from the CALLHOME, COCONUT, MUC-7, DAMSL and TRAINS annotation schemes. With the help of domain specialists, we have constructed a hybrid multi-level annotation for a fragment of the Boston University Radio Speech Corpus which includes the following levels: segment, word, breath, ToBI, Tilt, Treebank, coreference and named entity. We...
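A minimal sketch of the idea of a labeled acyclic digraph for annotation is given below. The class and method names (`Arc`, `AnnotationGraph`, `arcs_on`, `covered_by`) are hypothetical, and the sketch assumes every node carries a time offset, which is a simplification made here for brevity. Hierarchy appears as one arc spanning several others, and overlap as arcs on different levels that cross each other's boundaries.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Arc:
    src: int     # start node id
    dst: int     # end node id
    tier: str    # annotation level, e.g. "word" or "phrase"
    label: str   # annotation content

@dataclass
class AnnotationGraph:
    times: dict = field(default_factory=dict)  # node id -> time offset in seconds
    arcs: list = field(default_factory=list)

    def add_node(self, node, time):
        self.times[node] = time

    def add_arc(self, src, dst, tier, label):
        # Requiring time to move forward along every arc keeps the graph acyclic.
        if self.times[src] >= self.times[dst]:
            raise ValueError("arc must run forward in time")
        arc = Arc(src, dst, tier, label)
        self.arcs.append(arc)
        return arc

    def arcs_on(self, tier):
        """All arcs belonging to one annotation level."""
        return [a for a in self.arcs if a.tier == tier]

    def covered_by(self, outer, tier):
        """Arcs on `tier` whose time span lies inside the span of `outer`."""
        lo, hi = self.times[outer.src], self.times[outer.dst]
        return [a for a in self.arcs_on(tier)
                if lo <= self.times[a.src] and self.times[a.dst] <= hi]

def example_graph():
    """Toy fragment: three words, one phrase, and an overlapping gesture arc."""
    g = AnnotationGraph()
    for node, time in [(0, 0.0), (1, 0.2), (2, 0.6), (3, 1.0), (4, 0.4)]:
        g.add_node(node, time)
    g.add_arc(0, 1, "word", "the")
    g.add_arc(1, 2, "word", "cat")
    g.add_arc(2, 3, "word", "sleeps")
    g.add_arc(0, 2, "phrase", "NP")    # hierarchy: spans "the cat"
    g.add_arc(4, 3, "gesture", "nod")  # overlap: starts in the middle of "cat"
    return g
```

In the toy fragment, asking which word arcs are covered by the "NP" arc returns "the" and "cat", while the gesture arc is covered by neither the phrase nor any single word.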
Beyond N-Grams: Can Linguistic Sophistication Improve Language Modeling?
- Eric Brill; Radu Florian; John C. Henderson; Lidia Mangu
It seems obvious that a successful model of natural language would incorporate a great deal of both linguistic and world knowledge. Interestingly, state of the art language models for speech recognition are based on a very crude linguistic model, namely conditioning the probability of a word on a small fixed number of preceding words. Despite many attempts to incorporate more sophisticated information into the models, the n-gram model remains the state of the art, used in virtually all speech recognition systems. In this paper we address the question of whether there is hope in improving language modeling by incorporating more...
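To make the "crude linguistic model" concrete, the sketch below estimates a bigram model, the n = 2 case of conditioning a word on a fixed number of preceding words, by relative frequency. The function names and the two-sentence corpus in the usage note are hypothetical. No smoothing is applied, so unseen events get probability zero, which practical systems avoid.

```python
from collections import Counter

def train_bigram(sentences):
    """Count contexts and bigrams over tokenised sentences (lists of words)."""
    context_counts = Counter()
    bigram_counts = Counter()
    for sent in sentences:
        tokens = ["<s>"] + list(sent) + ["</s>"]  # sentence boundary markers
        for prev, word in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, word)] += 1
    return context_counts, bigram_counts

def bigram_prob(model, prev, word):
    """Maximum-likelihood estimate of P(word | prev); 0.0 for unseen contexts."""
    context_counts, bigram_counts = model
    if context_counts[prev] == 0:
        return 0.0
    return bigram_counts[(prev, word)] / context_counts[prev]
```

Trained on the two sentences "the cat" and "the dog", the model assigns P(cat | the) = 0.5 and P(the | <s>) = 1.0.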
Compositional Semantics for Unification-based Linguistic Formalisms
- Shuly Wintner
Contemporary linguistic formalisms have become so rigorous that it is now possible to view them as very high level declarative programming languages. Consequently, grammars for natural languages can be viewed as programs; this view enables the application of various methods and techniques that have proved useful for programming languages to the study of natural languages. This paper adapts the notion of program composition, well developed in the context of logic programming languages, to the domain of linguistic formalisms. We study alternative definitions for the semantics of such formalisms, suggesting a denotational semantics that we show to be compositional and fully abstract....
Phrase-based Information Retrieval
- Arampatzis Tsoris; A. T. Arampatzis
In this article we describe a retrieval schema which goes beyond the classical information retrieval keyword hypothesis and also takes linguistic variation into account. Guided by the failures and successes of other state-of-the-art approaches, as well as our own experience with the Irena system, our approach is based on phrases and incorporates linguistic resources and processors. In this respect, we introduce the Phrase Retrieval Hypothesis to replace the Keyword Retrieval Hypothesis. We suggest a representation of phrases suitable for indexing, and an architecture for such a retrieval system. Syntactical normalization is introduced to improve retrieval effectiveness. Morphological and lexico-semantical normalizations...
A Logical Formalism For Intergrammatical Representations
- Vincenzo Manca
This paper presents a coherent notational system of fundamental concepts for traditional grammar. Let us consider the usual concepts of linguistic analysis, e.g.: proposition, predicate, substantive, subject, attribute, apposition, complement, genitive, modifier, determiner, coordination, subordination, anaphora, deixis, sentence, quantification, ellipsis. These concepts are used in usual grammars on the basis of their informal linguistic evidence and often with criteria that depend on the particular language they refer to. Nevertheless, their theoretical relevance is based on very deep roots in Western philosophy from Plato, Aristotle and Dionysius Thrax, to the Speculative Grammar of Scholastic philosophy, and eventually to the `Grammar' and...
`Click and Listen': A Case Study of the Development of a CALL Package
- Anne King; Rob Procter
We describe the resourcing, development, use and evaluation of interactive multimedia courseware in language and linguistics teaching and learning at three institutions of higher education in Edinburgh that collaborated on the `Click and Listen' project. We also address several topics arising from our experiences with `Click and Listen', some that are of relevance to CAL generally and others that are particularly salient to CAL in the sphere of linguistic education. They are: Benefits, Added Value and Good Practice (which touch inevitably on the related areas of professional and vocational training); Functional and Cultural Integration; and Present and Future Perspectives. 1...
A Database Model for Object Dynamics
- M. P. Papazoglou; B.J. Krämer
To effectively model complex applications in which constantly changing situations can be represented, a database system must be able to support the runtime specification of structural and behavioral nuances for objects on an individual or group basis. This paper introduces the role mechanism as an extension of object-oriented databases to support unanticipated behavioral oscillations for objects that may attain many types and share a single object identity. A role refers to the ability to represent object dynamics by seamlessly integrating idiosyncratic behavior, possibly in response to external events, with pre-existing object behavior specified at instance creation time. In this...
Interfaces as locus of historical change - Workshop: Grammaticalization and Linguistic Theory
- Miriam Butt
In this paper, I take a look at a particular V-V complex predicate which occurs in both Urdu/Hindi and Bengali