Automatic and Data Driven Pitch Contour Manipulation with Functional Data Analysis
- Michele Gubian; Francesco Cangemi; Lou Boves
Creating stimuli for perceptual experiments in intonation research involves manipulating pitch contours extracted from spoken utterances. Difficulties arise when changes in contour shape need to be applied globally and smoothly across the whole pitch curve. Moreover, it is hard to relate a gradual modification of some contour trait to its perceptual counterpart. In this paper we propose a novel approach to stimulus manipulation that is based on an extension of Principal Component Analysis (PCA). Starting from a corpus of pitch curves, a parametric description of the principal variation in the curve set is obtained. This makes it possible to locate...
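The underlying idea can be sketched numerically with ordinary PCA on sampled curves (a minimal illustration, not the authors' FDA implementation; the curve data below is synthetic):

```python
import numpy as np

# Toy sketch of PCA-based contour manipulation: each row is one pitch
# contour sampled at 50 time points (synthetic data, for illustration only).
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 50)
curves = np.array([np.sin(np.pi * t) * (1 + 0.3 * rng.standard_normal())
                   + 0.2 * rng.standard_normal() for _ in range(40)])

mean = curves.mean(axis=0)
# Principal components of the centred curve set via SVD.
U, s, Vt = np.linalg.svd(curves - mean, full_matrices=False)

# Resynthesise a stimulus by moving along the first principal component:
# varying `alpha` produces a smooth, global change in contour shape.
alpha = 2.0
stimulus = mean + alpha * s[0] / np.sqrt(len(curves)) * Vt[0]
print(stimulus.shape)
```

Because each principal component is a whole-curve shape of variation, a single scalar (`alpha` here) parameterizes a smooth global manipulation of the contour.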
Quantitative Modeling of the Neural Representation of Nouns and Phrases
- Kai-min Kevin Chang
Recent advances in functional Magnetic Resonance Imaging (fMRI) offer a significant new approach to studying semantic representations in humans by making it possible to directly observe brain activity while people comprehend words and sentences. In the proposed work, we used fMRI to study the cortical systems that underpin semantic representation while people comprehended linguistic concepts like concrete objects, adjective-noun phrases, or noun-noun concept combinations. The thesis of this research is that the distributed pattern of neural activity encodes the meanings of linguistic concepts, and that intermediate semantic representations can be used to model how the brain composes the meaning of words or...
Generation of output style variation in the SAMMIE dialogue system (2008)
- Ivana Kruijff-korbayová; Ciprian Gerstenberger; Olga Kukina
A dialogue system can present itself and/or address the user as an active agent by means of linguistic constructions in personal style, or suppress agentivity by using impersonal style. We describe how we generate and control personal and impersonal style variation in the output of SAMMIE, a multimodal in-car dialogue system for an MP3 player. We carried out an experiment to compare subjective evaluation judgments and input style alignment behavior of users interacting with versions of the system generating output in personal vs. impersonal style. Although our results are consistent with earlier findings obtained with simulated systems, the effects are...
The emergence of a language in an evolving population of neural networks
- Angelo Cangelosi; Domenico Parisi
The evolution of language implies the parallel evolution of an ability to respond appropriately to signals (language understanding) and an ability to produce the appropriate signals in the appropriate circumstances (language production). When linguistic signals are produced to inform other individuals, individuals that respond appropriately to these signals may increase their reproductive chances, but it is less clear what the reproductive advantage is for the language producers. We present simulations in which populations of neural networks living in an environment evolve a simple language with an informative function. Signals are produced to help other individuals categorize edible and poisonous...
Identifying Synonymous Expressions from a Bilingual Corpus for Example-Based Machine Translation
- Mitsuo Shimohata; Eiichiro Sumita
Example-based machine translation (EBMT) is based on a bilingual corpus. In EBMT, sentences similar to an input sentence are retrieved from a bilingual corpus, and output is then generated from the translations of those similar sentences. A similarity measure between the input sentence and each sentence in the bilingual corpus is therefore important for EBMT. If retrieval misses some similar sentences, translation quality drops. In this paper, we describe a method to acquire synonymous expressions from a bilingual corpus and utilize them to expand the retrieval of similar sentences. Synonymous expressions are acquired from differences in synonymous sentences....
Corpora and Discourse: A Three-Pronged Approach to Analyzing Linguistic Data
- Monika Bednarek
The three-pronged framework for the analysis of discourse described in this paper was first suggested in Bednarek (2008a, b) but was not developed further there. In this paper I outline it in more detail, focusing in particular on those aspects of the framework that involve corpus methodology. In summary, the three-pronged approach involves a. large-scale computerized corpus
A new methodology of extraction, optimization and application of crisp and fuzzy logical rules
- Włodzisław Duch; Rafał Adamczak; Krzysztof Grabczewski
A new methodology for the extraction, optimization, and application of sets of logical rules is described. Neural networks are used for initial rule extraction, local or global minimization procedures for optimization, and Gaussian uncertainties of measurements are assumed during application of the logical rules. Algorithms for extracting logical rules from data with real-valued features require the determination of linguistic variables or membership functions. Context-dependent membership functions for crisp and fuzzy linguistic variables are introduced, and methods for their determination are described. Several neural and machine learning methods of logical rule extraction that generate initial rules are described, based on constrained multilayer perceptrons, networks with...
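The effect of assuming Gaussian measurement uncertainty when applying a crisp rule can be sketched as follows: a crisp condition x > θ then holds with a probability given by the Gaussian CDF, softening the rule near the threshold (an illustrative sketch of the idea, not the paper's implementation):

```python
import math

def rule_truth(x, theta, sigma):
    """Probability that the crisp condition x > theta holds, assuming the
    measured value x carries Gaussian uncertainty with std. dev. sigma."""
    return 0.5 * (1.0 + math.erf((x - theta) / (sigma * math.sqrt(2.0))))

# Far above the threshold the rule is almost certainly true;
# exactly at the threshold it holds with probability 0.5.
print(round(rule_truth(5.0, 2.0, 0.5), 3))
print(rule_truth(2.0, 2.0, 0.5))
```

In this way a set of crisp rules behaves like fuzzy rules with membership functions determined by the measurement noise, rather than chosen by hand.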
Graphical query for linguistic treebanks (2007)
- Steven Bird; Haejoong Lee
Databases of hierarchically annotated text occupy a central place in linguistic research and language technology development. We describe a new approach to tree query which we call “Query by Annotation”. Users express a query by annotating a tree, and the annotation is compiled into an expression in a path language. The result trees are overlaid with the original query, permitting the user to see why they match. Since queries and results are annotated trees, users can easily refine and resubmit their queries. The approach to Query by Annotation is motivated and exemplified using databases of linguistic trees, or treebanks.
A Development Environment for Large-scale Multi-lingual Parsing Systems
- Hisami Suzuki
We describe the development environment available to linguistic developers in our lab in writing large-scale grammars for multiple languages. The environment consists of the tools that assist writing linguistic rules and running regression testing against large corpora, both of which are indispensable for realistic development of large-scale parsing systems. We also emphasize the importance of parser efficiency as an integral part of efficient parser development. The tools and methods described in this paper are actively used in the daily development of broad-coverage natural language understanding systems in seven languages (Chinese, English, French, German, Japanese, Korean and Spanish).
Positive, Neutral and Negative Mass-Charges in General Relativity (Progress in Physics, Vol. 3, July 2006)
- Larissa Borissova; Florentin Smarandache
As shown, any four-dimensional proper vector has two observable projections onto the time line, attributed to our world and the mirror world (for a mass-bearing particle, the projections are attributed to positive and negative mass-charges). As predicted, there should be a class of neutrally mass-charged particles that inhabit neither our world nor the mirror world. Inside the space-time area (membrane) the space rotates at the light speed, and all particles likewise move at the light speed. So, the predicted particles of the neutrally mass-charged class should appear as light-like vortices. 1 Problem statement. As known, neutrosophy is a new...
Hierarchical bipartite spectral graph partitioning to cluster dialect varieties and determine their most important linguistic features
- Martijn Wieling; John Nerbonne
Connectionist and statistical approaches to language acquisition: A distributional perspective
- Martin Redington; Nick Chater
We propose that one important role for connectionist research in language acquisition is analysing what linguistic information is present in the child’s input. Recent connectionist and statistical work analysing the properties of real language corpora suggests that a priori objections against the utility of distributional information for the child are misguided. We illustrate our argument with examples of connectionist and statistical corpus-based research on phonology, segmentation, morphology, word classes, phrase structure, and lexical semantics. We discuss how this research relates to other empirical and theoretical approaches to the study of language acquisition.
Keyword extraction for metadata annotation of Learning Objects
- Lothar Lemnitzer; Paola Monachesi
One of the functionalities developed within the LT4eL project is the possibility to annotate learning objects semi-automatically with keywords that describe them. To this end, a keyword extractor has been created which can deal with documents in 8 languages. The approach employed is based on a linguistic processing step, which is followed by a filtering step over candidate keywords and their subsequent ranking based on frequency criteria. Two tests have been carried out to provide a rough evaluation of the performance of the tool and to measure inter-annotator agreement, in order to determine the complexity of the task and...
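The filtering-and-ranking stage can be illustrated with a generic frequency-based (tf-idf) sketch; the actual LT4eL extractor precedes ranking with language-specific linguistic processing, and the scoring details below are assumptions, not the tool's implementation:

```python
from collections import Counter
import math

# Generic frequency-based keyword ranking sketch (tf-idf style scoring):
# terms frequent in this document but rare in the corpus score highest.
def rank_keywords(doc_tokens, doc_freq, n_docs, top_k=3):
    tf = Counter(doc_tokens)
    scores = {w: c * math.log(n_docs / (1 + doc_freq.get(w, 0)))
              for w, c in tf.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

# Hypothetical candidate tokens and corpus document frequencies.
doc = ["corpus", "annotation", "corpus", "keyword", "keyword", "keyword"]
df = {"corpus": 50, "annotation": 10, "keyword": 2}
print(rank_keywords(doc, df, n_docs=100))  # → ['keyword', 'annotation', 'corpus']
```

Note how "corpus", despite being the second most frequent token in the document, is demoted because it is common across the corpus.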
SOME FURTHER DIALECTOMETRICAL STEPS
- John Nerbonne; Jelena Prokić; Martijn Wieling; Charlotte Gooskens
This article surveys recent developments in dialectometric research in which the authors have been involved, in particular techniques for measuring large numbers of pronunciations (in phonetic transcription) of comparable words at various sites. Edit distance (also known as Levenshtein distance) has been deployed for this purpose, and refinements and analytic techniques for it continue to be developed. The focus here is on (i) an empirical approach, using an information-theoretical measure of mutual information, for deriving the appropriate segment distances to serve within measures of sequence distance; (ii) a heuristic technique for simultaneously aligning large sets of comparable pronunciations, a necessary step...
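The unit-cost edit distance at the heart of this line of work can be sketched as follows (the refinements the article discusses replace the unit substitution cost with data-driven segment distances):

```python
def levenshtein(a, b):
    """Edit distance between two segment sequences, with unit costs
    for insertion, deletion, and substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

# Two hypothetical pronunciations of the same word as phone sequences:
# one substitution (ɪ → e) plus one insertion (ə) gives distance 2.
print(levenshtein(["m", "ɪ", "l", "k"], ["m", "e", "l", "ə", "k"]))  # → 2
```

Operating on lists of phonetic segments rather than raw strings keeps multi-character transcription symbols intact, which matters when comparing pronunciations across sites.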
Intelligence, Romanian Academy,
- Dan Cristea; Ionut Cristian Pistol; Corina Forăscu
This paper briefly describes the concept, initial implementation and usage of the ALPE system for natural language processing. A hierarchy connecting annotation schemas, processing tools and resources is used as working environment for the system, which can perform various complex NL processing tasks. ALPE will be used to build linguistic processing chains involving the annotation formats and tools developed in the LT4eL project. The particularities and advantages of such an endeavor are the main topics of this paper.
The representation of complex telic predicates in WordNets: the case of lexical-conceptual structure deficitary verbs
- Palmira Marrafa
Abstract. This paper has a twofold aim: (i) to point out that telicity is both a lexical and a compositional semantic feature; (ii) to propose a straightforward solution for representing lexical telicity in wordnet-like computational lexica. The approach presented here subsumes the basic idea that the lexicon is not a repository of idiosyncrasies; rather, it is organized according to a few general (universal or parametric) constraints. In this context, despite the fact that the paper is mainly concerned with Portuguese, cross-linguistic generalizations can be captured on the basis of a contrastive examination of data. The analysis focuses on the behavior of complex...
Knowledge Representation Issues and Implementation of Lexical Data Bases
- F. Sáenz; A. Vaquero
Abstract. We propose to apply classical development methodologies to the design and implementation of Lexical Databases (LDBs), which embody conceptual and linguistic knowledge. We represent the conceptual knowledge as an ontology, and the linguistic knowledge, which depends on each language, in lexicons. Our approach is based on a single language-independent ontology. In addition, we study some conceptual and linguistic requirements, in particular meaning classifications in the ontology, focusing on taxonomies. We have followed a classical software development methodology for implementing lexical information systems in order to obtain robust, maintainable, and integrable relational databases (RDBs) for storing the conceptual and linguistic knowledge.
Scaling Up Whole-Book Recognition (10th International Conference on Document Analysis and Recognition, 2009)
- Pingping Xiu; Henry S. Baird
We describe the results of large-scale experiments with algorithms for unsupervised improvement of recognition of book-images using fully automatic mutual-entropy-based model adaptation. Each experiment is initialized with an imperfect iconic model derived from errorful OCR results, and a more or less perfect linguistic model, after which our fully automatic adaptation algorithm corrects the iconic model to achieve improved accuracy, guided only by evidence within the test set. Mutual-entropy scores measure disagreements between the two models and identify candidates for iconic model correction. Previously published experiments have shown that word error rates fall monotonically with passage length. Here we show similar...
Linguistic selection of language strategies, a case study for color
- Joris Bleys; Luc Steels
Abstract. Language evolution takes place at two levels: the level of language strategies, which are ways in which a particular subarea of meaning and function is structured and expressed, and the level of concrete linguistic choices for the meanings, words, or grammatical constructions that instantiate a particular language strategy. It is now reasonably well understood how a shared language strategy enables a population of agents to self-organise a shared language system. But the origins and evolution of strategies have so far been explored less. This paper proposes that linguistic selection, i.e. selection driven by communicative success and cognitive effort, is...
Hermeto: A NL-UNL Enconverting Environment
- Ronaldo Martins; Ricardo Hasegawa; M. Graças V. Nunes; Núcleo Interinstitucional De Lingüística Computacional
Abstract. This paper aims at presenting and describing HERMETO, a computational environment for fully-automatic, both syntactic and semantic, natural language analysis. HERMETO converts a list structure into a network structure, and can be used to enconvert from any natural language into the Universal Networking Language (UNL). As a language-independent platform, HERMETO should be parameterized for each language, in a way very close to the one required by the UNL Center’s EnConverter. However, HERMETO brings together three special distinctive features: 1) it takes rather high-level syntactic and semantic grammars; 2) its dictionaries support attribute-value pair assignments; and 3) its user-friendly interface...