Pronoun Resolution and The Influence of Syntactic and Semantic Information on Discourse Prominence
- Ralph Rose
Abstract. Beginning with the observation that syntactic and semantic information often coincide (i.e., subjects are often agents, objects often patients), this study investigates the possibility that preference to resolve a sentence-initial pronoun to a syntactically prominent antecedent might actually be better explained in terms of preference for resolving to a semantically prominent antecedent. The study takes Discourse Prominence Theory (Gordon and Hendrick [11, 12]) as an underlying framework. Results of three psycholinguistic experiments using a self-paced reading task show that both syntactic and semantic information guide readers’ pronoun resolution preferences. This suggests a revised understanding of Discourse Prominence Theory in...
Interpretation of Optimal Signals
- Michael Franke
According to the optimal assertions approach of Benz and van Rooij (2007), conversational implicatures can be calculated based on the assumption that a given signal was optimal, i.e. that it was the sender’s best choice if she assumes, purely hypothetically, a particular naive receiver interpretation behavior. This paper embeds the optimal assertions approach in a general signaling game setting and derives the notion of an optimal signal via a series of iterated best responses (cf. Jäger, 2007). Subsequently, we will compare three different ways of interpreting such optimal signals. It turns out that under a natural assumption of expressibility (i)...
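The idea of a signal being optimal against a naive receiver can be illustrated with a toy example. The sketch below is not taken from the paper: the two states, the two messages, the literal semantics in `TRUE_IN`, the restriction to true messages, and the function names are all assumptions chosen for illustration, and `interpret` shows only one conceivable way of reading an optimal signal.

```python
from fractions import Fraction

# Hypothetical toy game: two states and two messages (not from the paper).
STATES = ["some_not_all", "all"]
# Assumed literal semantics: the states in which each message is true.
TRUE_IN = {"some": {"some_not_all", "all"}, "all": {"all"}}

def naive_receiver(message):
    """Literal interpretation: uniform belief over states where the message is true."""
    states = TRUE_IN[message]
    return {s: (Fraction(1, len(states)) if s in states else Fraction(0))
            for s in STATES}

def optimal_signals(state):
    """True messages that maximize the naive receiver's belief in `state`."""
    candidates = [m for m in TRUE_IN if state in TRUE_IN[m]]
    best = max(naive_receiver(m)[state] for m in candidates)
    return [m for m in candidates if naive_receiver(m)[state] == best]

def interpret(message):
    """One possible reading: the states in which `message` is an optimal signal."""
    return [s for s in STATES if message in optimal_signals(s)]
```

In this toy setting, `interpret("some")` excludes the state `"all"`, because a sender in that state would have preferred the message `"all"`; this mirrors the familiar scalar implicature.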
Intonation and Interpretation: Phonetics and Phonology
Intonational meaning is located in two components of language, the phonetic implementation and the intonational grammar. The phonetic implementation is widely used for the expression of universal meanings that derive from ‘biological codes’, meaning dimensions based on aspects of the production process of pitch variation. Three codes are identified, Ohala’s Frequency Code, the Effort Code and the Production Code. In each case, ‘informational’ meanings (which relate to the message) are identified, while for the first two codes also ‘affective’ meanings (relating to the state of the speaker) are discussed. Speech communities will vary in the extent to which...
Amalia Abstract MAchine for LInguistic Applications User's Guide
- Shuly Wintner; Evgeniy Gabrilovich; Nissim Francez
2.1 Type Specification
Verb Sense and Subcategorization: Using Joint Inference to Improve Performance on Complementary Tasks
- Galen Andrew; Trond Grenager; Christopher Manning
We propose a general model for joint inference in correlated natural language processing tasks when fully annotated training data is not available, and apply this model to the dual tasks of word sense disambiguation and verb subcategorization frame determination. The model uses the EM algorithm to simultaneously complete partially annotated training sets and learn a generative probabilistic model over multiple annotations. When applied to the word sense and verb subcategorization frame determination tasks, the model learns sharp joint probability distributions which correspond to linguistic intuitions about the correlations of the variables. Use of the joint model leads to error reductions...
Center for Hebrew Computational Linguistics and by the
- Ofer Biller; Michael Elhadad; Yael Netzer
We present an authoring system for logical forms encoded as conceptual graphs (CG). The system belongs to the family of WYSIWYM (What You See Is What You Mean) text generation systems: logical forms are entered interactively and the corresponding linguistic realization of the expressions is generated in several languages. The system maintains a model of the discourse context corresponding to the authored documents. The system helps users author documents formulated in the CG format. In a first stage, a domain-specific ontology is acquired by learning from example texts in the domain. The ontology acquisition module builds a typed hierarchy of...
Evaluating lexical resources for a semantic tagger
- Scott S. L. Piao; Paul Rayson; Dawn Archer; Tony McEnery
Semantic lexical resources play an important part in both linguistic study and natural language engineering. In Lancaster, a large semantic lexical resource has been built over the past 14 years, which provides a knowledge base for the USAS semantic tagger. Capturing semantic lexicological theory and empirical lexical usage information extracted from corpora, the Lancaster semantic lexicon provides a valuable resource for the corpus research and NLP community. In this paper, we evaluate the lexical coverage of the semantic lexicon both in terms of genres and time periods. We conducted the evaluation on test corpora including the BNC sampler, the METER...
Web-based Multi-Criteria Group Decision Support
Abstract. Organizational decisions are often made in groups where group members may be distributed geographically in different locations. Furthermore, a decision-making process, in practice, frequently involves various uncertain factors including linguistic expressions of decision makers’ preferences and opinions. This study first proposes a rational-political group decision-making model which identifies three uncertain factors involved in a group decision-making process: decision makers’ roles in a group reaching a satisfactory solution, preferences for alternatives and judgments for assessment-criteria. Based on the model, a linguistic term oriented multi-criteria group decision-making method is developed. The method uses general fuzzy numbers to deal with...
- François Barbançon; Y Warnow; Steven N. Evans; Donald Ringe; Luay Nakhleh
experimental study comparing linguistic phylogenetic
Learning to generate naturalistic utterances using reviews in spoken dialogue systems
- Ryuichiro Higashinaka
Spoken language generation for dialogue systems requires a dictionary of mappings between semantic representations of concepts the system wants to express and realizations of those concepts. Dictionary creation is a costly process; it is currently done by hand for each dialogue domain. We propose a novel unsupervised method for learning such mappings from user reviews in the target domain, and test it on restaurant reviews. We test the hypothesis that user reviews that provide individual ratings for distinguished attributes of the domain entity make it possible to map review sentences to their semantic representation with high precision. Experimental analyses show...
MINECoP: An Integrated Visualization Tool for Corpus Mining
- Asanee Kawtrakul; Patcharee Varasai; Sutee Sudprasert; Prachya Boonkwan; Dusadee Thamavijit
To build a good language model for cost-effective solutions to the practical problems of developing NLP applications, we need to learn from observed data of naturally occurring text. This paper presents the design of a packaged tool called MINECoP for annotating and mining linguistic phenomena. Users can learn language behaviors by specifying query patterns for their specific needs, observing the problems, and deducing the know-how needed to design a language model from naturally occurring text.
Analysis and synthesis of intonation using the Tilt model
- Paul Taylor
This paper introduces the tilt intonational model and describes how this model can be used to automatically analyse and synthesize intonation. In the model, intonation is represented as a linear sequence of events, which can be pitch accents or boundary tones. Each event is characterised by continuous parameters representing amplitude, duration and tilt (a measure of the shape of the event). The paper describes an event detector, in effect an intonational recognition system, which produces a transcription of an utterance’s intonation. The features and parameters of the event detector are discussed and performance figures are shown on a variety of...
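To make the three event parameters concrete, the following sketch computes amplitude, duration and tilt from the rise and fall portions of a single event. The abstract does not state the formulas; the code assumes a commonly cited formulation in which tilt is the average of an amplitude-based and a duration-based rise/fall ratio. The function name and argument names are hypothetical.

```python
def tilt_parameters(a_rise, a_fall, d_rise, d_fall):
    """Return (amplitude, duration, tilt) for one rise/fall intonational event.

    a_rise, a_fall: F0 excursions of the rise and fall parts (e.g. in Hz).
    d_rise, d_fall: durations of the rise and fall parts (e.g. in seconds).
    Assumed formulation, not quoted from the abstract.
    """
    amplitude = abs(a_rise) + abs(a_fall)
    duration = d_rise + d_fall
    # Each ratio ranges from -1 (pure fall) to +1 (pure rise).
    tilt_amp = (abs(a_rise) - abs(a_fall)) / amplitude if amplitude else 0.0
    tilt_dur = (d_rise - d_fall) / duration if duration else 0.0
    tilt = 0.5 * (tilt_amp + tilt_dur)
    return amplitude, duration, tilt
```

Under this formulation a pure rise yields a tilt of +1, a pure fall yields -1, and a symmetric rise-fall yields 0.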
High-accuracy annotation and parsing of CHILDES transcripts
- Kenji Sagae; Alon Lavie
Corpora of child language are essential for psycholinguistic research. Linguistic annotation of the corpora provides researchers with better means for exploring the development of grammatical constructions and their usage. We describe an ongoing project that aims to annotate the English section of the CHILDES database with grammatical relations in the form of labeled dependency structures. To date, we have produced a corpus of over 65,000 words with manually curated gold-standard grammatical relation annotations. Using this corpus, we have developed a highly accurate data-driven parser for English CHILDES data. The parser and the manually annotated data are freely available for research...
Preservation of interpolation features by fibring
- Walter Carnielli; João Rasga; Cristina Sernadas
Fibring is a metalogical constructor that makes it possible to combine different logics by operating on their deductive systems under certain natural restrictions, for example that the two given logics are presented by deductive systems of the same type. Under such circumstances, fibring will produce a new deductive system by means of the free use of inference rules from both deductive systems, provided the rules are schematic, in the sense of using variables that are open for application to formulas with new linguistic symbols (from the point of view of each logic component). Fibring is a generalization of fusion, a less...
Local speaking rate and perceived quantity
- Diana Krull; Hartmut Traunmüller; Wim A. Van
In an earlier experiment, we showed that the local speaking rate affects the perception of quantity of Estonian listeners. In order to see if this effect is language dependent, we presented the same stimuli to Finns and a subset to Norwegians, whose languages have a different and smaller functional load of quantity distinctions. The results obtained with Estonian and Finnish listeners are compatible with a model of speech perception in which variations in speaking rate are reflected in the pace of an “inner clock” by which listeners measure segment durations. More ‘absolute’ and narrow-scoped results with...
SGML Documents: Where does Quality Go?
- José Carlos Ramalho; Jorge Gustavo Rocha; José João Almeida; Pedro Rangel Henriques
Quality control in electronic publications should be one of the major concerns of everyone who is managing a project. Big projects, like digital libraries, try to gather information from a series of different sources: libraries, museums, universities, and other scientific or cultural organizations. Collecting and treating information from several different sources raises very interesting problems, one being the assurance of quality. Quality in electronic publications can be reflected in several forms, from the visual aspects of the interface, to linguistic and literary aspects, to the correctness of data. With SGML we can solve part of the problem, structural/syntactic correctness. SGML...
Phonological analysis in typed feature systems
- Steven Bird; Ewan Klein
Research on constraint-based grammar frameworks has focussed on syntax and semantics largely to the exclusion of phonology. Likewise, current developments in phonology have generally ignored the technical and linguistic innovations available in these frameworks. In this paper we suggest some strategies for reuniting phonology and the rest of grammar in the context of a uniform constraint formalism. We explain why this is a desirable goal, and we present some conservative extensions to current practice in computational linguistics and in non-linear phonology which we believe are necessary and sufficient for achieving this goal. We begin by exploring the application of typed...
Abstract 1- Grammar Specification and Locality Capturing Non-local dependencies
Minimalist Inquiry (Chomsky 1995-2001) proposes a model of linguistic competence able to build phrase structures from bottom-to-top in a derivational way (namely starting from the inner verbal shell, up to the matrix one, adding piecemeal arguments and functional elements). Including structure building operations (such as merge, move and the idea of derivation by phase) as part of the grammar has been a necessary move toward a cognitively/computationally/formally more plausible model. On the other hand, this forces us to consider the competence from a procedural perspective rather than simply from a static knowledge-representation point of view. Even though much emphasis is...
UNITEX-PB, a set of flexible language resources for Brazilian Portuguese
- Marcelo C. M. Muniz; Maria Graças V. Nunes; Eric Laporte
Abstract. This work documents the project and development of various computational linguistic resources that support the Brazilian Portuguese language according to the formal methodology used by the corpus processing system called UNITEX. The delivered resources include computational lexicons, libraries to access compressed lexicons, and additional tools to validate those resources.
GETESS: searching the web exploiting German texts
- Steffen Staab; Christian Braun; Ilvio Bruder; Antje Düsterhöft; Andreas Heuer; Meike Klettke; Günter Neumann; Bernd Prager; Jan Pretzel; Hans-Peter Schnurr; Rudi Studer; Burkhard Wrenger
Abstract. We present an intelligent information agent that uses semantic methods and natural language processing capabilities in order to gather tourist information from the WWW and present it to the human user in an intuitive, user-friendly way. Thereby, the information agent is designed such that as background knowledge and linguistic coverage increase, its benefits improve, while it guarantees state-of-the-art information and database retrieval capabilities as its bottom line.