  1. Automated Generation of Non-Verbal Behavior for Virtual Embodied Characters

    Werner Breitfuss
    In this paper we introduce a system that automatically adds different types of non-verbal behavior to a given dialogue script between two virtual embodied agents. It allows us to transform a dialogue in text format into an agent behavior script enriched by eye gaze and conversational gesture behavior. The agents’ gaze behavior is informed by theories of human face-to-face gaze behavior. Gestures are generated based on the analysis of linguistic and contextual information of the input text. The resulting annotated dialogue script is then transformed into the Multimodal Presentation Markup Language for 3D agents (MPML3D), which controls the multi-modal...

  2. Enhancing first-pass attachment prediction

    Fabrizio Costa; Paolo Frasconi; Vincenzo Lombardo; Patrick Sturt; Giovanni Soda
    This paper explores the convergence between cognitive modeling and engineering solutions to the parsing problem in NLP. Natural language presents many sources of ambiguity, and several theories of human parsing claim that ambiguity is resolved by using past (linguistic) experience. In this paper we analyze and refine a connectionist paradigm (Recursive Neural Networks) capable of processing acyclic graphs to perform supervised learning on syntactic trees extracted from a large corpus of parsed sentences. Following a widely accepted hypothesis in psycholinguistics, we assume an incremental parsing process (one word at a time) that keeps a connected partial parse tree at...


    James S. Williams; Jugal K. Kalita
    In this paper, we discuss how recent theoretical linguistic research focusing on the Minimalist Program (MP) (Cho95, Mar95, Zwa94) can be used to guide the parsing of a useful range of natural language sentences and the building of a logical representation in a principles-based manner. We discuss the components of the MP and give an example derivation. We then propose parsing algorithms that recreate the derivation structure starting with a lexicon and the surface form of a sentence. Given the approximated derivation structure, MP principles are applied to generate a logical form, which leads to linguistically based algorithms for determining possible meanings...

  4. An Open-Source Shallow-Transfer Machine Translation Engine for the Romance Languages of Spain

    Antonio M. Corbí-Bellot; Mikel L. Forcada; Sergio Ortiz-Rojas; Juan Antonio Pérez; Gema Ramírez-Sánchez; Felipe Sánchez-Martínez; Iñaki Alegria; Kepa Sarasola
    We present the current status of development of an open-source shallow-transfer machine translation engine for the Romance languages of Spain (the main ones being Spanish, Catalan and Galician) as part of a larger government-funded project that includes non-Romance languages such as Basque and involves both universities and linguistic technology companies. The machine translation architecture uses finite-state transducers for lexical processing, hidden Markov models for part-of-speech tagging, and finite-state based chunking for structural transfer, and is largely based upon that of systems already developed by the Transducens group at the Universitat d'Alacant, such as interNOSTRUM (Spanish–Catalan) and Traductor Universia (Spanish–Portuguese)....

  5. Design and development of a system for the detection of agreement errors in Basque

    Arantza Díaz de Ilarraza; Koldo Gojenola; Maite Oronoz
    This paper presents the design and development of a system for the detection and correction of syntactic errors in free texts. The system is composed of three main modules: a) a robust syntactic analyser, b) a compiler that translates error processing rules, and c) a module that coordinates the results of the analyser, applying different combinations of the already compiled error rules. The use of the syntactic analyser (a) and the rule processor (b) is independent and not necessarily sequential. The specification language used for the description of the error detection/correction rules is abstract, general, declarative, and based...

  6. Joint learning improves semantic role labeling

    Kristina Toutanova
    Despite much recent progress on accurate semantic role labeling, previous work has largely used independent classifiers, possibly combined with separate label sequence models via Viterbi decoding. This stands in stark contrast to the linguistic observation that a core argument frame is a joint structure, with strong dependencies between arguments. We show how to build a joint model of argument frames, incorporating novel features that model these interactions into discriminative log-linear models. This system achieves an error reduction of 22% on all arguments and 32% on core arguments over a state-of-the-art independent classifier for gold-standard parse trees on...

  7. Utterance Segmentation Using Combined Approach Based on Bi-directional N-gram and Maximum Entropy

    Ding Liu
    This paper proposes a new approach to segmenting utterances into sentences using a linguistic model based on Maximum-entropy-weighted Bidirectional N-grams. The usual N-gram algorithm searches for sentence boundaries in a text from left to right only. Thus a candidate sentence boundary in the text is evaluated mainly with respect to its left context, without fully considering its right context. As a result, utterances are often divided into incomplete sentences or fragments. In order to make use of both the right and left contexts of candidate sentence boundaries, we propose a new linguistic modeling approach based on Maximum-entropy-weighted Bidirectional...

  8. Rejection of empathy in negotiation

    Bilyana Martinovski; David Traum; Stacy Marsella
    Trust is a crucial quality in the development of individuals and societies, and empathy plays a key role in the formation of trust. Trust and empathy are of growing importance in studies of negotiation. However, empathy can be rejected, which complicates its role in negotiation. This paper presents a linguistic analysis of empathy by focusing on rejection of empathy in negotiation. Some of the rejections are due to failed recognition of the rejector’s needs and desires, whereas others serve mainly strategic functions, gaining momentum in the negotiation. In both cases, rejection of empathy is a phase in the negotiation not...

  9. Universität Konstanz, PARC, and PARC

    Miriam Butt; Tracy Holloway King; John T. Maxwell III (editors: Miriam Butt; Tracy Holloway King)
    This paper continues the discussion of the RESTRICTION OPERATOR (Kaplan and Wedekind, 1993) and whether it can provide a linguistically adequate solution to the problem posed by syntactic complex predicate formation. The solution introduced here has been implemented as part of an ongoing project aimed at the development of a computational grammar for Urdu and can be shown to model the linguistic facts of syntactic complex predicate formation as described by Alsina (1996) and Butt (1995). This also allows for a straightforward extension to related phenomena in other languages such as German, Japanese, Norwegian, and French.

  10. Large linguistically-processed Web corpora for multiple languages

    Marco Baroni
    The Web contains vast amounts of linguistic data. One key issue for linguists and language technologists is how to access it. Commercial search engines give highly compromised access. An alternative is to crawl the Web ourselves, which also allows us to remove duplicates and near-duplicates, navigational material, and a range of other kinds of non-linguistic matter. We can also tokenize, lemmatise and part-of-speech tag the corpus, and load the data into a corpus query tool which supports sophisticated linguistic queries. We have now done this for German and Italian, with corpus sizes of over 1 billion words in each case....

  11. The grammar matrix: An open-source starter-kit for the rapid development of cross-linguistically consistent broad-coverage precision grammars

    Emily M. Bender; Dan Flickinger; Stephan Oepen
    The grammar matrix is an open-source starter-kit for the development of broad-coverage HPSGs. By using a type hierarchy to represent cross-linguistic generalizations and providing compatibility with other open-source tools for grammar engineering, evaluation, parsing and generation, it facilitates not only quick start-up but also rapid growth towards the wide coverage necessary for robust natural language processing and the precision parses and semantic representations necessary for natural language understanding.


    Krista Lagus; Mathias Creutz; Sami Virpioja
    We study properties of morphemes by analyzing their use in a large Finnish text corpus using Independent Component Analysis (ICA). As a result, we obtain emergent linguistic representations for the morphemes. On a coarse level, main syntactic categories are observed. On a more detailed level, the components depict potential thematic roles of the morphemes. An interesting question is whether these discovered lower-dimensional representations could be directly utilized in language processing applications.

  13. The acquisition of stress: a data-oriented approach

    Walter Daelemans; Gert Durieux; Steven Gillis
    A data-oriented (empiricist) alternative to the currently pervasive (nativist) Principles and Parameters approach to the acquisition of stress assignment is investigated. A similarity-based algorithm, viz. an augmented version of Instance-Based Learning, is used to learn the system of main stress assignment in Dutch. In this nontrivial task a comprehensive lexicon of Dutch monomorphemes is used instead of the idealized and highly simplified description of the empirical data used in previous approaches. It is demonstrated that a similarity-based learning method is effective in learning the complex stress system of Dutch. The task is accomplished without the a priori knowledge assumed to...

  14. The influence of gender on behaviors and outcomes in a retail buyer-seller negotiation simulation

    Joyce Neu; John L. Graham; Mary C. Gilly
    Successful negotiations between retail buyers and manufacturer representatives are an important ingredient in retailer success. Women are well-represented in retail industries, raising the question of how gender affects buyer-seller negotiations. In an investigation of this question, more than 100 businesspeople participated in a buyer-seller negotiation simulation. All participants completed a questionnaire, and 29 negotiations were tape-recorded. Gender differences were discovered in both negotiation performance and behavior. For example, men achieved higher individual profits. Men were also found to use more questions, self-disclosures, conversational repairs, interruptions, and first-person plural pronouns (“we”). The linguistic and practical salience of the discovered...

  15. General Terms Design

    Suzan Verberne
    Our research aims at developing a system for answering why-questions (why-QA). More specifically, we focus on the role that linguistic information and analysis can play in the process of why-QA. In the present paper, we evaluate an...

  16. RevisionBank: A Resource for Revision-based Multi-document Summarization and Evaluation

    Jahna Otterbacher; Dragomir Radev
    Multi-document summaries produced via sentence extraction often suffer from a number of cohesion problems, including dangling anaphora, sudden shifts in topic and incorrect or awkward chronological ordering. Therefore, the development of an automated revision process to correct such problems is a research area of current interest. We present the RevisionBank, a corpus of 240 extractive, multi-document summaries that have been manually revised to promote cohesion. The summaries were revised by six linguistics students using a constrained set of revision operations that we previously developed. In the current paper, we describe the process of developing a taxonomy of cohesion problems and...

  17. The UNL Initiative: An Overview

    I. Boguslavsky; J. Cardeñosa; C. Gallardo; L. Iraola
    We present a description of the UNL initiative, which is based on the Universal Networking Language (UNL). This language was conceived to support multilingual communication on the Internet across linguistic barriers. The initiative was launched by the Institute of Advanced Studies of the United Nations University in 1996. The initial consortium was formed to support 15 languages. Eight years later, this initial consortium had changed, many components and resources had been developed, and the UNL language itself had evolved to support different types of applications, from multilingual generation to “knowledge repositories” or...

  18. The target paper of this commentary—“The Equilibrium Point Hypothesis and Its

    Elliot Saltzman; Dani Byrd
    The target paper of this commentary, “The Equilibrium Point Hypothesis and Its ... Motor Control” by Perrier ... (POL), presents a sophisticated application of the λ-version of the Equilibrium Point Hypothesis to issues of linguistic concern in the control and coordination of speech articulators. This is done in terms of an elegant model of jaw biomechanics and neuromotor control. Their synthesis of the EP Hypothesis and jaw model will continue to encourage the consideration of biomechanics in the interpretation of articulatory kinematics. Our reflections, outlined below, generally concern the appropriateness of micro- versus macro-levels of abstraction when considering the manifestation of linguistic intentions in speech motor control. Specifically, we address 1) POL’s conception of planning and invariance;...

  19. Using argumentation in text generation

    Michael Elhadad
    Text generation is a field of artificial intelligence aiming at modelling the process of natural language production. Text generation is best characterized as the process of making choices between alternative linguistic realizations under the constraints specified in the input to a text generator. Depending on the practical application, the input can take different forms: streams of numbers in report generation, traces...

  20. SYSTRAN’s Chinese Word Segmentation

    Jin Yang; Jean Senellart; Remi Zajac
    SYSTRAN’s Chinese word segmentation is an important component of its Chinese-English machine translation system. The Chinese word segmentation module uses a rule-based approach, based on a large dictionary and fine-grained linguistic rules. It works on general-purpose texts from different Chinese-speaking regions, with comparable performance. SYSTRAN participated in the four open tracks in the First International Chinese Word Segmentation Bakeoff. This paper gives a general description of the segmentation module, as well as the results and analysis of its performance in the Bakeoff.

