A DICTIONARY CONTENT MANAGEMENT SYSTEM
- I. Alegria; X. Arregi; X. Artola; M. Astiz; L. Ruiz Miyares; Centro De Lingüística Aplicada
This article presents a new dictionary editing environment that is easily adaptable to any data representation. Its main features include user-friendliness, third-party tool integration, configuration flexibility, and Unicode support. Automatically generated entry meta-information is used to provide advanced functionality, such as context-dependent tasks, and any changes are immediately mirrored in a WYSIWYG preview. Its client-server design allows the whole system to be configured centrally, making it easier to maintain and customize.
Conveying spatial information in linguistic human-robot interaction
- Thora Tenbrink
An overview is presented of the range of variability employed by speakers with regard to the specificity of information in spatial communication, focussing on the significance of the context. This variability is described in terms of two underlying dimensions, one of which reflects a dichotomy of underdeterminacy and redundancy, while the other concerns vagueness and precision. Relevant results of several HRI experiments are used for exemplification.
of Linguistic Formalisms
- Mo Dule; Pete Whitelo
- Harald Clahsen; Claudia Felser
The ability to process the linguistic input in real time is crucial for successfully acquiring a language, and yet little is known about how language learners comprehend or produce language in real time. Against this background, we have conducted a detailed study of grammatical processing in language learners using experimental psycholinguistic techniques and comparing different populations (mature native speakers, child first language (L1) and adult second language (L2) learners) as well as different domains of language (morphology and syntax). This article presents an overview of the results from this project and of other previous studies, with the aim of explaining...
Functional segregation of cortical language areas by sentence repetition
- Ghislaine Dehaene-Lambertz; Stanislas Dehaene; Jean-Luc Anton; Aurelie Campagne; Guillaume P. Dehaene; Isabelle Denghien; Denis Le Bihan; Mariano Sigman; Christophe Pallier; Jean-Baptiste Poline
Abstract: The functional organization of the perisylvian language network was examined using a functional MRI (fMRI) adaptation paradigm with spoken sentences. In Experiment 1, a given sentence was presented every 14.4 s and repeated two, three, or four times in a row. The study of the temporal properties of the BOLD response revealed a temporal gradient along the dorsal–ventral and rostral–caudal directions: From Heschl’s gyrus, where the fastest responses were recorded, responses became increasingly slower toward the posterior part of the superior temporal gyrus and toward the temporal poles and the left inferior frontal gyrus, where the slowest responses were...
A Diagnostic Tool for German Syntax
- John Nerbonne; Klaus Netter; Abdel Kader Diagne; Judith Klein; Ludwig Dickmann
In this paper we describe an ongoing effort to construct a catalogue of syntactic data exemplifying the major syntactic patterns of German. The purpose of the corpus is to support the diagnosis of errors in the syntactic components of natural language processing (NLP) systems. Secondary aims are the evaluation of NLP syntax components and support of theoretical and empirical work on German syntax. The data consist of artificially and systematically constructed expressions, also including negative (ungrammatical) examples. The data are organized into a relational database and annotated with some basic information about the phenomena illustrated and the internal structure of...
Generating Tutorial Feedback with Affect
- Johanna D. Moore; Kaska Porayska-Pomsta; Sebastian Varges; Claus Zinn
Studies aimed at understanding what makes human tutoring effective have noted that the type of indirect guidance that characterizes human tutorial dialogue is a key factor. In this paper, we describe an approach that brings together sociolinguistic research on the basis of linguistic choice with natural language generation technology to systematically produce tutorial feedback appropriate to the given situation.
Visualization of Qualitative Locations in Geographic Information Systems
- Xiaobai Yao; Bin Jiang
ABSTRACT: A qualitative location (QL) refers to the reference of a spatial location using linguistic terms such as qualitative descriptions and qualitative spatial relations with other geo-referenced features. Qualitative locations will be increasingly popular in the future, driven by theoretical, technological, and database developments. Multiplicity and uncertainty are two innate characteristics of QLs. In other words, a QL often has multiple target locations (multiplicity), and the target locations sometimes cannot be pinpointed exactly due to the qualitative nature (uncertainty) of the qualitative descriptions and relations. The presence of these characteristics imposes research challenges on visualization of QL in geographic...
Language Use and Linguistic Isolation: Historical Data and Methodological Issues
- Paul Siegel; Elizabeth A. Martin; Rosalind Bruno; U. S. Census Bureau
Disclaimer: This report is released to inform interested parties of research and to encourage discussion. The views expressed are those of the authors and not necessarily those of the U.S. Census Bureau.
EXTRANS, AN ANSWER EXTRACTION SYSTEM
- Diego Mollá; Rolf Schwitter; Michael Hess; Rachel Fournier
Answer Extraction (AE) systems retrieve phrases in textual documents that directly answer natural language questions. AE over technical manuals requires very high recall and precision, and yet small text units must be retrieved. It is therefore important to perform linguistic analysis in detail. We present ExtrAns, an AE system over Unix manuals that uses full parsing, partial disambiguation, and anaphora resolution to generate the minimal logical forms of the documents and the query. The search procedure uses a proof algorithm of the user query over the Horn clause representation of the minimal logical forms. Remaining ambiguities in the retrieved sentences...
Acquisition of English-Chinese Transliterated Word Pairs from Parallel-Aligned Texts using a Statistical Machine Transliteration Model
- Chun-Jen Lee et al.
This paper presents a framework for extracting English and Chinese transliterated word pairs from parallel texts. The approach is based on the statistical machine transliteration model to exploit the phonetic similarities between English words and corresponding Chinese transliterations. For a given proper noun in English, the proposed method extracts the corresponding transliterated word from the aligned text in Chinese. Under the proposed approach, the parameters of the model are automatically learned from a bilingual proper name list. Experimental results show that the average rates of word and character precision are 86.0% and 94.4%, respectively. The rates can be further...
On the linguistic implications of context-bound adult-infant interactions
- Francisco Lacerda; Ellen Marklund; Lisa Lagerkvist; Lisa Gustavsson; Eeva Klintfors; Ulla Sundberg
This poster presents a study of the linguistic information potentially available in adult speech directed to 3-month-old infants. The repetitive nature of the speech directed to young infants and the ecological context of the adult-infant natural interaction setting are analyzed in the light of the “Ecological theory of language acquisition” proposed by Lacerda et al. (2004, this volume). The analysis of transcripts of adult-infant interaction sessions suggests that enough information to derive general noun associations may be available as a consequence of the particular context of the adult-infant interactions during the early stages of the language acquisition process.
COMBINED GESTURE-SPEECH ANALYSIS AND SPEECH DRIVEN GESTURE SYNTHESIS
- M. E. Sargin; O. Aran; A. Karpov; F. Ofli; Y. Yasinnik; S. Wilson; E. Erzin; Y. Yemez; A. M. Tekalp
Multimodal speech and speaker modeling and recognition are widely accepted as vital aspects of state-of-the-art human-machine interaction systems. While correlations between speech and lip motion as well as speech and facial expressions are widely studied, relatively little work has been done to investigate the correlations between speech and gesture. Detection and modeling of head, hand and arm gestures of a speaker have been studied extensively and these gestures were shown to carry linguistic information. A typical example is the head gesture while saying “yes/no”. In this study, the correlation between gestures and speech is investigated. In speech signal...
Tokenization and proper noun recognition for information retrieval
- Fco Mario Barcala; Jesús Vilares; Miguel A. Alonso; Jorge Graña; Manuel Vilares
In this paper we consider a set of natural language processing techniques that can be used to analyze large amounts of text, focusing on an advanced tokenizer which accounts for a number of complex linguistic phenomena, as well as for pre-tagging tasks such as proper noun recognition. We also show the results of several experiments performed in order to study the impact of the strategy chosen for the recognition of proper nouns.
Automatic acquisition of adjectival subcategorisation from corpora
- Jeremy Yallop; Anna Korhonen; Ted Briscoe
This paper describes a novel system for acquiring adjectival subcategorization frames (SCFs) and associated frequency information from English corpus data. The system incorporates a decision-tree classifier for 30 SCF types which tests for the presence of grammatical relations (GRs) in the output of a robust statistical parser. It uses a powerful pattern-matching language to classify GRs into frames hierarchically in a way that mirrors inheritance-based lexica. The experiments show that the system is able to detect SCF types with 70% precision and 66% recall. A new tool for linguistic annotation of SCFs in corpus data is also...
Improving pronunciation inference using n-best list, acoustics and orthography
- Gopala Krishna Anumanchipalli; Mosur Ravishankar
In this paper, we tackle the problem of pronunciation inference and Out-of-Vocabulary (OOV) enrollment in Automatic Speech Recognition (ASR) applications. We combine linguistic and acoustic information of the OOV word using its spelling and a single instance of its utterance to derive an appropriate phonetic baseform. The novelty of the approach is in its employment of an orthography-driven n-best hypothesis and rescoring strategy of the pronunciation alternatives. We make use of decision trees and heuristic tree search to construct and score the n-best hypotheses space. We use acoustic alignment likelihood and phone transition cost to leverage the empirical evidence and...
- The English; Finnish Australians
The paper discusses an application of a technique to tag a corpus containing the English of Finnish Australians automatically and to analyse the frequency vectors of part-of-speech (POS) trigrams using a permutation test. Our goal is to detect the linguistic sources of the syntactic variation between two groups, the ‘Adults’, who had received their school education in Finland, and the ‘Juveniles’, who were educated in Australia. The idea of the technique is to utilise frequency profiles of trigrams of POS categories as indicators of syntactic distance between the groups and then examine potential effects of language contact and language (‘vernacular’)...
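The idea of comparing two speaker groups through POS-trigram frequency profiles and a permutation test can be sketched roughly as follows. This is a minimal illustration, not the authors' procedure: the function names, the distance measure (sum of absolute differences in relative frequency), and the sentence-level shuffling are all assumptions made for the example.

```python
import random
from collections import Counter


def trigram_counts(tag_sequences):
    """Count POS trigrams over a list of tag sequences (one list per sentence)."""
    counts = Counter()
    for tags in tag_sequences:
        for i in range(len(tags) - 2):
            counts[tuple(tags[i:i + 3])] += 1
    return counts


def distance(group_a, group_b):
    """Assumed measure: sum of absolute differences in relative trigram frequency."""
    ca, cb = trigram_counts(group_a), trigram_counts(group_b)
    na = sum(ca.values()) or 1  # avoid division by zero for empty groups
    nb = sum(cb.values()) or 1
    return sum(abs(ca[t] / na - cb[t] / nb) for t in set(ca) | set(cb))


def permutation_test(group_a, group_b, rounds=1000, seed=0):
    """Return the observed distance and an estimated p-value.

    Sentences are pooled and reassigned to the two groups at random;
    the p-value is the share of shuffles at least as extreme as observed.
    """
    rng = random.Random(seed)
    observed = distance(group_a, group_b)
    pooled = list(group_a) + list(group_b)
    split = len(group_a)
    hits = 0
    for _ in range(rounds):
        rng.shuffle(pooled)
        if distance(pooled[:split], pooled[split:]) >= observed:
            hits += 1
    return observed, (hits + 1) / (rounds + 1)
```

Calling `permutation_test(adults, juveniles)` on two lists of tagged sentences returns the observed distance and a p-value; a small p-value would suggest that the difference between the groups' trigram profiles is unlikely to arise from a random split.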
Applying productive derivational morphology to term indexing of Spanish texts
- Jesús Vilares; David Cabrero; Miguel A. Alonso
Abstract. This paper deals with the application of natural language processing techniques to the field of information retrieval. To be precise, we propose the application of morphological families for single term conflation in order to reduce the linguistic variety of indexed documents written in Spanish. A system for automatic generation of morphological families by means of Productive Derivational Morphology is discussed. The main characteristics of this system are the use of a minimum of linguistic resources, a low computational cost, and the independence with respect to the indexing engine.
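As a rough illustration of single-term conflation by morphological family, the sketch below replaces every family member with one representative before indexing. It does not reproduce the paper's automatic family generation: the `FAMILIES` data is hand-written for the example, and the choice of representative (shortest member, ties broken alphabetically) is an assumption.

```python
def build_conflation_map(families):
    """Map each word to its family's representative.

    Assumed rule: the representative is the shortest member,
    with ties broken alphabetically.
    """
    mapping = {}
    for family in families:
        representative = min(family, key=lambda w: (len(w), w))
        for word in family:
            mapping[word] = representative
    return mapping


def conflate(tokens, mapping):
    """Replace each token by its representative; unknown tokens pass through."""
    return [mapping.get(token, token) for token in tokens]


# Hypothetical hand-built families, for illustration only.
FAMILIES = [
    {"producir", "producción", "productor", "productivo"},
    {"nación", "nacional", "nacionalizar"},
]

CONFLATION_MAP = build_conflation_map(FAMILIES)
```

Because the conflation step only rewrites the token stream, it can run ahead of any indexing engine, which matches the independence property the abstract mentions.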
Contrast in concept-to-speech generation (article submitted to Computer Speech and Language)
- Mariët Theune
In concept-to-speech systems, spoken output is generated on the basis of a text that has been produced by the system itself. In such systems, linguistic information from the text generation component may be exploited to achieve a higher prosodic quality of the speech output than can be obtained in a plain text-to-speech system. In this paper we discuss how information from natural language generation can be used to compute prosody in a concept-to-speech system, focusing on the automatic marking of contrastive accents on the basis of information about the preceding discourse. We discuss and compare some formal approaches to this...
KNOWLEDGE EXTRACTION FROM TEXTS BY SINTESI
In this paper we present SINTESI, a system for knowledge extraction from Italian inputs, currently under development in our research centre. It is used on short descriptive diagnostic texts, in order to summarise their technical content and to build a knowledge base on faults. These texts often involve complex linguistic constructions such as conjunctions, negations, ellipsis and anaphora. The presence of extragrammaticalities and of implicit knowledge is also frequent, especially because of the use of a sublanguage. SINTESI extracts the diagnostic information by performing a full text analysis; it is based on a semantics-driven approach integrated by...