UNESCO Nomenclature > (57) Linguistics

Showing resources 38,041 - 38,060 of 52,684

38041. Type Shifting with Semantic Features: A Unified Perspective - Yoad Winter
this paper is to define a simple notion of type mismatch, which will rather closely follow Partee and Rooth's original proposal but will be expressed within more familiar terms of categorial semantics. After introducing this implementation of traditional type mismatch, it will be argued that in fact, it covers only one possible kind of trigger for type shifting principles. Partee and Rooth's notion of mismatch is "external" in that the type of an expression is changed only when it combines with another type to which it cannot compose using the "normal" compositional mechanism. It will be argued that, within an appropriate type system, another notion of...

38042. Annotation and Automatic Recognition of Spontaneously - Vidar Markhus,Bojana Gajic,Jacques Svarverud,Lars Erik Solbraa,Magne H. Johnsen
In this paper we present a new research database of spontaneously dictated Norwegian speech, called MOBELspon, together with an experimental evaluation using standard automatic speech recognition (ASR) techniques. MOBELspon contains about 150 minutes of spontaneous dictation and 48 minutes of read speech of rheumatism health care records. The speakers are 10 medical students of both genders, coming from different parts of Norway and talking with their own dialect. MOBELspon contains a high degree of spontaneous speech features like disfluencies and para-linguistic speaker generated noise sounds. To model these features we propose some new special annotation symbols.

38043. Management of Metadata in Linguistic Fieldwork: - David Penton,Steven Bird,Gillian Wigglesworth,Patrick Mcconvell
Many linguistic research projects collect large amounts of multimodal data in digital formats. Despite the plethora of data collection applications available, it is often difficult for researchers to identify and integrate applications which enable the management of collections of multimodal data in addition to facilitating the actual collection process itself. In research projects that involve substantial data analysis, data management becomes a critical issue. Whilst best practice recommendations in regard to data formats themselves are propagated through projects such as EMELD, HRELP and DOBES, there is little corresponding information available regarding best practice for field metadata management beyond the provision of standards by entities such...

38044. An Architecture for Word Learning using Bidirectional Multimodal - Keith Bonawitz,Anthony Kim,Seth Tardiff
Learning of new words is assisted by contextual information. This context can come in several forms, including observations in nonlinguistic semantic domains, as well as the linguistic context in which the new word was presented.

38045. COGNITIVE SCIENCE Vol 23 (4) 1999, pp. 543--568 ISSN 0364-0213 Copyright 1999 Cognitive Science Society, Inc. All rights of reproduction in any form reserved. A Connectionist Approach to Word - David C. Plaut
INTRODUCTION Many researchers assume that the most appropriate way to express the systematic aspects of language is in terms of a set of rules. For instance, there is a systematic relationship between the written and spoken forms of most English words (e.g., GAVE → /geIV/), and this relationship can be expressed in terms of a fairly concise set of grapheme-phoneme correspondence (GPC) rules (e.g., G → /g/, A_E → /eI/, V → /v/). In addition to being able to generate accurate pronunciations of so-called regular words, such rules also provide a straightforward account of how skilled readers apply their knowledge to novel items---for...
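As a rough illustration of what applying grapheme-phoneme correspondence rules means, the sketch below encodes only the three correspondences quoted in the abstract above (G, A_E, V). The function name `pronounce`, the rule table, and the handling of the split A_E pattern are assumptions made for this example, not the paper's rule set or model.

```python
# Hypothetical toy rule table: only the three correspondences quoted in the abstract.
GPC_RULES = {"G": "g", "A_E": "eI", "V": "v"}


def pronounce(word):
    """Apply the toy GPC rules to a word; return None if some letter has no rule."""
    letters = list(word.upper())
    phonemes = []
    i = 0
    while i < len(letters):
        # Split pattern: vowel + one consonant + word-final E (the A_E in GAVE).
        if i + 2 == len(letters) - 1 and letters[i + 2] == "E":
            key = letters[i] + "_E"
            if key in GPC_RULES and letters[i + 1] in GPC_RULES:
                phonemes.append(GPC_RULES[key])
                phonemes.append(GPC_RULES[letters[i + 1]])
                break  # the final E is silent and has been consumed
        if letters[i] not in GPC_RULES:
            return None  # no rule covers this letter in the toy table
        phonemes.append(GPC_RULES[letters[i]])
        i += 1
    return "/" + "".join(phonemes) + "/"


print(pronounce("GAVE"))  # regular word covered by the rules
print(pronounce("VAVE"))  # novel item built from the same graphemes
```

The same rules that pronounce GAVE also produce an output for the nonword VAVE, which is the sense in which rules "account for" novel items; any word containing a letter outside the toy table simply yields None.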

38046. Annotating Discontinuous Structures in XML: the Multiword Case - Emanuele Pianta,Luisa Bentivogli
In this paper, we address the issue of how to annotate discontinuous elements in XML. We will take discontinuous multiwords as a case study to investigate different annotation possibilities, in the framework of the linguistic annotation of the MEANING Italian Corpus.

38047. An Automated Learner for Phonology and Morphology - Adam Albright,Bruce P. Hayes
This document is a summary, for ourselves and those who are curious, of the current state of our Phonological Learner. The Learner is the centerpiece of our current research project; it is a computer program whose purpose is to learn morphophonemic systems from input data, and to serve as a tool for modeling phonological and morphological knowledge in humans. 2. Rationale Linguists of many persuasions take a realist view of linguistic theory as it relates to learning: language learners, in infancy and childhood, are assumed to come equipped with whatever principles of linguistic structure and of language learning are biologically determined in our species. They encounter and process...

38048. Implicit Linguistic Structure in Connected Speech - Francisco Lacerda,Lisa Gustafsson,Nina Svrd
This paper sketches a model of early emergence of basic linguistic structure using general-purpose similarity measures, without a priori linguistic knowledge, to structure natural speech signals. The model attempts to mimic a first language learning situation. Crude auditory representations of the acoustic signal are continuously stored in memory and processed to detect similarities between portions of the representation patterns. The similarity measures are purely auditory and allow only for moderate time warping and frequency shifts, picking up any best matches among the stored patterns and between these and incoming speech. An example of the model performance is presented.

38049. Shallow Parsing with PoS Taggers and Linguistic Features - Pos Taggers And,Centre For,Edilml Jameshammerton,Miles Osborn,Susan Armstron An
Three data-driven publicly available part-of-speech taggers are applied to shallow parsing of Swedish texts. The phrase structure is represented by nine types of phrases in a hierarchical structure containing labels for every constituent type the token belongs to in the parse tree. The encoding is based on the concatenation of the phrase tags on the path from lowest to higher nodes. Various linguistic features are used in learning; the taggers are trained on the basis of lexical information only, part-of-speech only, and a combination of both, to predict the phrase structure of the token with or without part-of-speech. Special attention is directed to the taggers' sensitivity to different types of linguistic information included in learning, as well as the taggers' sensitivity to the size and the various types of training data sets. The method can be easily transferred to other languages. Keywords: Chunking, Shallow parsing, Part-of-speech taggers, Hidden Markov models, Maximum entropy learning, Transformation-based learning 1. Introduction Machine learning techniques in the last decade have permeated several areas of natural language processing (NLP). The reason is that a vast number...
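As a rough illustration of the encoding this abstract describes, in which each token's label is built from the phrase tags on the path from its lowest node up to the higher nodes, the sketch below walks a small tree and joins the tags. The tree representation (nested `(label, children)` tuples), the `_` separator, the English example phrase, and the tag names PP and NP are all assumptions for this example; the paper's own tagset and data format are not given in the abstract.

```python
def path_labels(tree, ancestors=()):
    """Yield (token, label) pairs for a tree of (phrase_tag, children) tuples.

    Each label joins the phrase tags from the token's lowest enclosing
    phrase up to the highest one, so a tagger can predict it per token.
    """
    tag, children = tree
    path = (tag,) + ancestors  # lowest phrase first, higher phrases after
    for child in children:
        if isinstance(child, str):
            yield child, "_".join(path)  # leaf token: emit its path label
        else:
            yield from path_labels(child, path)  # nested phrase: recurse


# Hypothetical example: a prepositional phrase containing a noun phrase.
example = ("PP", ["in", ("NP", ["the", "house"])])
for token, label in path_labels(example):
    print(token, label)
```

Flattening the hierarchy into one label per token is what lets an ordinary sequence tagger be reused for shallow parsing, at the cost of a larger label set.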

38050. Unknown
Recently, statistical machine translation models have begun to take advantage of higher level linguistic structures such as syntactic dependencies. Underlying these models is an assumption about the directness of translational correspondence between sentences in the two languages; however, the extent to which this assumption is valid and useful is not well understood. In this paper, we present an empirical study that quantifies the degree to which syntactic dependencies are preserved when parses are projected directly from English to Chinese. Our results show that although the direct correspondence assumption is often too restrictive, a small set of principled, elementary linguistic transformations can boost the quality of the projected Chinese parses by 76% relative to the unimproved baseline.

38051. Tree Adjoining Grammars: Formalisms, Linguistic Analysis - Anne Abeillé,Owen Rambow (editors),Geoffrey K. Pullum
rammar in Joshi, Levy, and Takahashi [1975]) is known today as a tree adjoining grammar (TAG). The research program on TAGs that Aravind Joshi has led since 1975 is perhaps the most interesting and significant research program in formal language theory of the last 40 years. General linguists have clearly underrated it, though computational linguists have in general kept more closely in touch with it. The TALs are a mathematically natural class with closure and decision properties very similar to those of the CFLs, including a polynomial-time recognition problem. Several independent but equivalent characterizations of the class have been discovered: Vijay-shanker and Weir (1994) present a weak equivalence result...

38052. Manuscript revised and resubmitted to Computer Speech and Language; Version 2, November 6, 1998 - Y. Marchand,M. J. Adamson
The automatic derivation of word pronunciations from input text is a central task for any text-to-speech system. For general English text at least, this is often thought to be a solved problem, with manually-derived linguistic rules assumed capable of handling `novel' words missing from the system dictionary. Data-driven methods, based on machine learning of the regularities implicit in a large pronouncing dictionary, have received considerable attention recently but are generally thought to perform less well. However, these tentative beliefs are at best uncertain without powerful methods for comparing text-to-phoneme subsystems. This paper contributes to the development of such methods by comparing the performance of four representative approaches to...

38053. B.C.M. Wondergem, M. van Uden, P. van Bommel, Th.P. van - Th. P. Van,B. C. M. Wondergem,M. Van Uden,P. Van Bommel,Th. P. Van Der Weide
Searching information from a large and dynamic information space causes several problems, concerning, for instance, dynamic and vague information needs, too broad queries, and correctness and sensibility of descriptors. These problems may be attacked by navigational query formulation strategies which are available for stratified architectures. However, stratified architectures cannot be easily constructed for large and dynamic information spaces. In this article, we show how navigational query formulation and exploration can be employed on the WWW by using linguistic (as opposed to statistical) refinements. Grounded in the theory of navigational networks for index expressions, we introduce our tool, the INdex Navigator (INN), for searching and navigating the WWW.

38054. Unknown - Stephan Kepser
this paper we showed that linguistic treebanks can be queried with a very powerful query language, namely monadic second-order logic, in time linear in the size of the treebanks. We thus give an argument that, at least on a theoretical level, the question of the choice of a query language for treebanks can be settled. We hardly expect a need to arise for an even more powerful query language. And the fact that a large part of the costly computations can be done in an offline preprocessing step, performed only once, leads us to believe that the described approach is practically feasible.

38055. Unknown - Suzanne Stevenson
ee grammars were motivated by the difficulties in building robust, large-scale systems using the explicit representation of linguistic knowledge. Large corpus annotation efforts and the creation of tree-banks (text corpora annotated with syntactic structures) enabled researchers to develop and automatically train probabilistic models of syntactic disambiguation (Marcus, Santorini, and Marcinkiewicz 1993). In an attempt to take advantage of the insights gained in the area of statistical speech processing, computational linguists initially adopted very simplified statistical models of grammar and parsing, abandoning the more sophisticated lexicalised feature-based formalisms (Magerman and Marcus 1991; Magerman and Weir 1992; Resnik 1992; Schabes 1992). However, it soon became apparent that the success of probabilistic context-free...

38056. Parametric Models of Linguistic Count Data - Martin Jansche
It is well known that occurrence counts of words in documents are often modeled poorly by standard distributions like the binomial or Poisson. Observed counts vary more than simple models predict, prompting the use of overdispersed models like Gamma-Poisson or Beta-binomial mixtures as robust alternatives. Another deficiency of standard models is due to the fact that most words never occur in a given document, resulting in large amounts of zero counts. We propose using zero-inflated models for dealing with this, and evaluate competing models on a Naive Bayes text classification task. Simple zero-inflated models can account for practically relevant variation, and can be easier to work with than overdispersed models.
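To make the idea of zero inflation concrete, the sketch below implements the standard textbook zero-inflated Poisson probability mass function: with probability `pi` a count is a structural zero, and otherwise it is drawn from a Poisson distribution with rate `lam`. The function names and parameter values are assumptions for this example; the abstract does not say which parameterization or which zero-inflated families the paper evaluates.

```python
import math


def poisson_pmf(k, lam):
    """Probability of observing count k under Poisson(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def zip_pmf(k, pi, lam):
    """Zero-inflated Poisson: a structural zero with probability pi, else Poisson(lam)."""
    base = (1.0 - pi) * poisson_pmf(k, lam)
    return pi + base if k == 0 else base


# Hypothetical parameters: a word absent from 90% of documents,
# occurring about twice on average in the documents that do use it.
print(zip_pmf(0, 0.9, 2.0), poisson_pmf(0, 2.0))
```

With these illustrative parameters the zero-inflated model puts most of its mass on a zero count, whereas a plain Poisson with the same rate gives a zero count only a small probability, which is the mismatch with sparse word counts that the abstract points to.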

38057. Falsifying serial and parallel parsing models: Empirical - Richard L. Lewis
serial or depth-first parsing)? This note discusses four important classes of serial and parallel models: simple limited parallel, ranked limited parallel, deterministic serial with reanalysis, and probabilistic serial with reanalysis. It is argued that existing evidence is compatible only with probabilistic serial--reanalysis models, or ranked parallel models augmented with a reanalysis component. A new class of linguistic structures is introduced on which the behavior of serial and parallel parsers diverges most radically: multiple local ambiguities are stacked to increase the number of viable alternatives in the ambiguous region from two to eight structures. This paradigm may provide the strongest test yet for parallel models.

38058. A Multilingual Database of Idioms - Aline Villavicencio,Timothy Baldwin,Benjamin Waldron
This paper presents a possible architecture for a multilingual database of idioms. We discuss the challenges that idioms present to the creation of such a database and propose a possible encoding that maximises the amount of information that can be stored for different languages. Such a resource provides important information for linguistic, computational linguistic and psycholinguistic use, and allows for the comparison of different phenomena in different languages. This can provide the basis for a better understanding of regularities in idioms across languages.

38059. Talking Technology: Language and Literacy in the Primary School Examined Through Children's Encounters with Mechanisms - Eric Parkinson
This article embraces an examination of certain dedicated terms within technology education that children may encounter as part of their primary school experience. Four language-related issues are explored. The first of these concerns the difficulty that may be experienced in defining certain technological terms. The second concerns the ways in which primary school children use their own versions of terminology to describe specific artifacts and functions. The third issue concerns the role of some manufacturers and publishers in employing inappropriate terminology within educational products. The final issue revolves around the psycho-social development of language in young children and the contribution this may make to the acquisition of appropriate technical...

38060. Emotion Detection In Task-Oriented Spoken Dialogs - Laurence Devillers,Lori Lamel,Ioana Vasilescu
Detecting emotions in the context of automated call center services can be helpful for following the evolution of the human-computer dialogs, enabling dynamic modification of the dialog strategies and influencing the final outcome. The emotion detection work reported here is part of a larger study aiming to model user behavior in real interactions. We make use of a corpus of real agent-client spoken dialogs in which the manifestation of emotion is quite complex, and it is common to have shaded emotions since the interlocutors attempt to control the expression of their internal attitude. Our aims are to define appropriate emotions for call center services, to annotate the dialogs and to validate...


