A Diagnostic Tool for German Syntax
- John Nerbonne; Klaus Netter; Abdel Kader Diagne; Judith Klein; Ludwig Dickmann
In this paper we describe an ongoing effort to construct a catalogue of syntactic data exemplifying the major syntactic patterns of German. The purpose of the corpus is to support the diagnosis of errors in the syntactic components of natural language processing (NLP) systems. Secondary aims are the evaluation of NLP syntax components and support of theoretical and empirical work on German syntax. The data consist of artificially and systematically constructed expressions, including negative (ungrammatical) examples. The data are organized into a relational database and annotated with some basic information about the phenomena illustrated and the internal structure of...
Generating Tutorial Feedback with Affect
- Johanna D. Moore; Kaska Porayska-pomsta; Sebastian Varges; Claus Zinn
Studies aimed at understanding what makes human tutoring effective have noted that the type of indirect guidance that characterizes human tutorial dialogue is a key factor. In this paper, we describe an approach that brings together sociolinguistic research on the basis of linguistic choice with natural language generation technology to systematically produce tutorial feedback appropriate to the given situation.
Visualization of Qualitative Locations in Geographic Information Systems
- Xiaobai Yao; Bin Jiang
A qualitative location (QL) refers to the reference of a spatial location using linguistic terms such as qualitative descriptions and qualitative spatial relations with other geo-referenced features. Qualitative locations will become increasingly popular in the future, driven by theoretical, technological, and database developments. Multiplicity and uncertainty are two innate characteristics of QLs. In other words, a QL often has multiple target locations (multiplicity), and the target locations sometimes cannot be pinpointed exactly due to the qualitative nature (uncertainty) of the qualitative descriptions and relations. The presence of these characteristics imposes research challenges on visualization of QL in geographic...
Language Use and Linguistic Isolation: Historical Data and Methodological Issues
- Paul Siegel; Elizabeth A. Martin; Rosalind Bruno; U. S. Census Bureau
Disclaimer: This report is released to inform interested parties of research and to encourage discussion. The views expressed are those of the authors and not necessarily those of the U.S. Census Bureau.
EXTRANS, AN ANSWER EXTRACTION SYSTEM
- Diego Mollá; Rolf Schwitter; Michael Hess; Rachel Fournier
Answer Extraction (AE) systems retrieve phrases in textual documents that directly answer natural language questions. AE over technical manuals requires very high recall and precision, and yet small text units must be retrieved. It is therefore important to perform linguistic analysis in detail. We present ExtrAns, an AE system over Unix manuals that uses full parsing, partial disambiguation, and anaphora resolution to generate the minimal logical forms of the documents and the query. The search procedure uses a proof algorithm of the user query over the Horn clause representation of the minimal logical forms. Remaining ambiguities in the retrieved sentences...
Acquisition of English-Chinese Transliterated Word Pairs from Parallel-Aligned Texts using a Statistical Machine Transliteration Model
- Chun-jen Lee, et al.
This paper presents a framework for extracting English and Chinese transliterated word pairs from parallel texts. The approach is based on the statistical machine transliteration model to exploit the phonetic similarities between English words and corresponding Chinese transliterations. For a given proper noun in English, the proposed method extracts the corresponding transliterated word from the aligned text in Chinese. Under the proposed approach, the parameters of the model are automatically learned from a bilingual proper name list. Experimental results show that the average rates of word and character precision are 86.0% and 94.4%, respectively. The rates can be further...
On the linguistic implications of context-bound adult-infant interactions
- Francisco Lacerda; Ellen Marklund; Lisa Lagerkvist; Lisa Gustavsson; Eeva Klintfors; Ulla Sundberg
This poster presents a study of the linguistic information potentially available in adult speech directed to 3-month-old infants. The repetitive nature of the speech directed to young infants and the ecological context of the adult-infant natural interaction setting are analyzed in the light of the “Ecological theory of language acquisition” proposed by Lacerda et al. (2004, this volume). The analysis of transcripts of adult-infant interaction sessions suggests that enough information to derive general noun associations may be available as a consequence of the particular context of the adult-infant interactions during the early stages of the language acquisition process.
COMBINED GESTURE-SPEECH ANALYSIS AND SPEECH DRIVEN GESTURE SYNTHESIS
- M. E. Sargin; O. Aran; A. Karpov; F. Ofli; Y. Yasinnik; S. Wilson; E. Erzin; Y. Yemez; A. M. Tekalp
Multimodal speech and speaker modeling and recognition are widely accepted as vital aspects of state-of-the-art human-machine interaction systems. While correlations between speech and lip motion as well as speech and facial expressions are widely studied, relatively little work has been done to investigate the correlations between speech and gesture. Detection and modeling of head, hand and arm gestures of a speaker have been studied extensively and these gestures were shown to carry linguistic information. A typical example is the head gesture while saying “yes/no”. In this study, correlation between gestures and speech is investigated. In speech signal...
Tokenization and proper noun recognition for information retrieval
- Fco Mario Barcala; Jesús Vilares; Miguel A. Alonso; Jorge Graña; Manuel Vilares
In this paper we consider a set of natural language processing techniques that can be used to analyze large amounts of texts, focusing on the advanced tokenizer which accounts for a number of complex linguistic phenomena, as well as for pre-tagging tasks such as proper noun recognition. We also show the results of several experiments performed in order to study the impact of the strategy chosen for the recognition of proper nouns.
Automatic acquisition of adjectival subcategorisation from corpora
- Jeremy Yallop; Anna Korhonen; Ted Briscoe
This paper describes a novel system for acquiring adjectival subcategorization frames (SCFs) and associated frequency information from English corpus data. The system incorporates a decision-tree classifier for 30 SCF types which tests for the presence of grammatical relations (GRs) in the output of a robust statistical parser. It uses a powerful pattern-matching language to classify GRs into frames hierarchically in a way that mirrors inheritance-based lexica. The experiments show that the system is able to detect SCF types with 70% precision and 66% recall. A new tool for linguistic annotation of SCFs in corpus data is also...
Acoustic-based improving pronunciation inference using n-best list, acoustics and orthography
- Gopala Krishna Anumanchipalli; Mosur Ravishankar
In this paper, we tackle the problem of pronunciation inference and Out-of-Vocabulary (OOV) enrollment in Automatic Speech Recognition (ASR) applications. We combine linguistic and acoustic information of the OOV word using its spelling and a single instance of its utterance to derive an appropriate phonetic baseform. The novelty of the approach is in its employment of an orthography-driven n-best hypothesis and rescoring strategy of the pronunciation alternatives. We make use of decision trees and heuristic tree search to construct and score the n-best hypotheses space. We use acoustic alignment likelihood and phone transition cost to leverage the empirical evidence and...
- The English; Finnish Australians
The paper discusses an application of a technique to tag a corpus containing the English of Finnish Australians automatically and to analyse the frequency vectors of part-of-speech (POS) trigrams using a permutation test. Our goal is to detect the linguistic sources of the syntactic variation between two groups, the ‘Adults’, who had received their school education in Finland, and the ‘Juveniles’, who were educated in Australia. The idea of the technique is to utilise frequency profiles of trigrams of POS categories as indicators of syntactic distance between the groups and then examine potential effects of language contact and language (‘vernacular’)...
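The idea of comparing two speaker groups by their POS-trigram frequency profiles and a permutation test can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`pos_trigrams`, `profile_distance`, `permutation_test`), the distance measure (sum of absolute differences in relative frequency) and the toy tag sequences are all assumptions made for illustration, and the input is assumed to be already POS-tagged.

```python
import random
from collections import Counter


def pos_trigrams(tags):
    """Count the POS trigrams in one tagged utterance (a list of tag strings)."""
    return Counter(zip(tags, tags[1:], tags[2:]))


def profile_distance(group_a, group_b):
    """Assumed distance: sum of absolute differences in relative trigram frequency."""
    counts_a, counts_b = Counter(), Counter()
    for tags in group_a:
        counts_a.update(pos_trigrams(tags))
    for tags in group_b:
        counts_b.update(pos_trigrams(tags))
    total_a = sum(counts_a.values()) or 1  # avoid division by zero
    total_b = sum(counts_b.values()) or 1
    trigrams = set(counts_a) | set(counts_b)
    return sum(abs(counts_a[t] / total_a - counts_b[t] / total_b) for t in trigrams)


def permutation_test(group_a, group_b, rounds=1000, seed=0):
    """Return the observed distance and a permutation p-value.

    Utterances are pooled and repeatedly reassigned at random to two groups
    of the original sizes; the p-value is the share of reassignments whose
    distance is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = profile_distance(group_a, group_b)
    pooled = list(group_a) + list(group_b)
    size_a = len(group_a)
    hits = 0
    for _ in range(rounds):
        rng.shuffle(pooled)
        if profile_distance(pooled[:size_a], pooled[size_a:]) >= observed:
            hits += 1
    return observed, (hits + 1) / (rounds + 1)


# Toy data (hypothetical tag sequences, not taken from the corpus).
adults = [["DT", "NN", "VB"]] * 5
juveniles = [["PRP", "VB", "RB"]] * 5
distance, p_value = permutation_test(adults, juveniles)
```

A small p-value indicates that the difference between the two groups' trigram profiles is unlikely to arise from a random split of the pooled utterances; individual trigrams could be examined in the same way to locate the sources of variation.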
Applying productive derivational morphology to term indexing of Spanish texts
- Jesús Vilares; David Cabrero; Miguel A. Alonso
This paper deals with the application of natural language processing techniques to the field of information retrieval. To be precise, we propose the application of morphological families for single term conflation in order to reduce the linguistic variety of indexed documents written in Spanish. A system for automatic generation of morphological families by means of Productive Derivational Morphology is discussed. The main characteristics of this system are the use of a minimum of linguistic resources, a low computational cost, and the independence with respect to the indexing engine.
Contrast in concept-to-speech generation (article submitted to Computer Speech and Language)
- Mariët Theune
In concept-to-speech systems, spoken output is generated on the basis of a text that has been produced by the system itself. In such systems, linguistic information from the text generation component may be exploited to achieve a higher prosodic quality of the speech output than can be obtained in a plain text-to-speech system. In this paper we discuss how information from natural language generation can be used to compute prosody in a concept-to-speech system, focusing on the automatic marking of contrastive accents on the basis of information about the preceding discourse. We discuss and compare some formal approaches to this...
KNOWLEDGE EXTRACTION FROM TEXTS BY SINTESI
In this paper we present SINTESI, a system for knowledge extraction from Italian inputs, currently under development in our research centre. It is used on short descriptive diagnostic texts, in order to summarise their technical content and to build a knowledge base on faults. Often in these texts complex linguistic constructions like conjunctions, negations, ellipsis and anaphorae are involved. The presence of extragrammaticalities and of implicit knowledge is also frequent, especially because of the use of a sublanguage. SINTESI extracts the diagnostic information by performing a full text analysis; it is based on a semantics-driven approach integrated by...
Learning Hebrew roots: Machine learning with linguistic constraints
- Ezra Daya
The morphology of Semitic languages is unique in the sense that the major word-formation mechanism is an inherently non-concatenative process of interdigitation, whereby two morphemes, a root and a pattern, are interwoven. Identifying the root of a given word in a Semitic language is an important task, in some cases a crucial part of morphological analysis. It is also a non-trivial task, which many humans find challenging. We present a machine learning approach to the problem of extracting roots of Hebrew words. Given the large number of potential roots (thousands), we address the problem as one of combining several classifiers,...
Words that Fascinate the Listener: Predicting Affective Ratings of On-Line Lectures
- Felix Weninger; Pascal Staudt; Björn Schuller
In a large-scale study on 843 transcripts of Technology, Entertainment and Design (TED) talks, we address the relation between word usage and categorical affective ratings of lectures by a large group of internet users. Users rated the lectures by assigning one or more predefined tags which relate to the affective state evoked in the audience (e.g., ‘fascinating’, ‘funny’, ‘courageous’, ‘unconvincing’ or ‘long-winded’). By automatic classification experiments, we demonstrate the usefulness of linguistic features for predicting these subjective ratings. Extensive test runs are conducted to assess the influence of the classifier and feature selection, and individual linguistic features...
Making Requests: A Pragmatic Study of Chinese Mother-Child Dyads
- Nan Meng
The purpose of this study is to investigate different types of requests made by either mother or child in their daily interactions within the family environment. The study also aims at addressing how different grammatical structures are used in contextualized situations as well as how a certain pragmatic intent is expressed within a specific context. Data were collected from mother-child interactions in different daily routine activities which were audio-recorded within the Chinese family environment. The descriptive analysis system is employed to describe the flow of mother-child dyads. The results show that mothers tend to make both direct and indirect requests, using a...
Neural correlates of infant accent discrimination: an fNIRS study
- Alejandrina Cristia; Yasuyo Minagawa-Kawai; Natalia Egorova; Judit Gervain; Dominique Cabrol
The present study investigated the neural correlates of infant discrimination of very similar linguistic varieties (Quebecois and Parisian French) using functional Near InfraRed Spectroscopy. In line with previous behavioral and electrophysiological data, there was no evidence that 3-month-olds discriminated the two regional accents, whereas 5-month-olds did, with the locus of discrimination in left anterior perisylvian regions. These neuroimaging results suggest that a developing language network relying crucially on left perisylvian cortices sustains infants’ discrimination of similar linguistic varieties within this early period of infancy.
SPEAKING LIKE A LOVE ENTREPRENEUR: LANGUAGE CHOICES AND IDEOLOGIES OF SOCIAL MOBILITY AMONG DAUGHTERS OF PEASANTS IN THAILAND’S TOURIST SITES
- Hugo Yu-hsiu Lee
This article addresses an important set of issues: how language choices between dominant, standard and international codes are conditioned by increasingly globalized industries and the social models that accompany them and are created by them. In particular, it examines how gender ideologies related to languages and social actions shape love industry workers’ orientations toward foreign/second language learning and societal language uses. All of these aforementioned issues and data have the potential to make a contribution to the literature (published research materials) on language, discourse and society. This article examines the social meanings of language choices, shifts and the ideologies of differentiation...