Parallel Hidden Markov Models for American Sign Language Recognition
- Christian Vogler, Dimitris Metaxas
The major challenge that faces American Sign Language
(ASL) recognition now is to develop methods that will scale
well with increasing vocabulary size. Unlike in spoken languages,
phonemes can occur simultaneously in ASL. The
number of possible combinations of phonemes after enforcing
linguistic constraints is approximately 5.5 × 10^8. Gesture
recognition, which is less constrained than ASL recognition,
suffers from the same problem.
Thus, it is not feasible to train conventional hidden
Markov models (HMMs) for large-scale ASL applications.
Factorial HMMs and coupled HMMs are two extensions to
HMMs that explicitly attempt to model several processes occurring
in parallel. Unfortunately, they still require consideration
of the combinations at training time.
In this paper we...
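The scaling problem can be made concrete with a back-of-the-envelope count (the channel and state numbers below are illustrative, not the paper's): modeling C parallel channels in one conventional HMM multiplies per-channel state counts, while one small HMM per channel only adds them.

```python
# Growth of the joint state space for C parallel phonological channels,
# each modeled with K states. A conventional HMM must enumerate every
# combination (K**C states); parallel/factorial HMMs keep C separate
# chains of K states each (C * K states), at the cost of extra
# coupling assumptions at decoding time.
def joint_states(channels: int, states_per_channel: int) -> int:
    """States one conventional HMM needs to cover all channel combinations."""
    return states_per_channel ** channels

def factored_states(channels: int, states_per_channel: int) -> int:
    """Total states when each channel keeps its own small HMM."""
    return channels * states_per_channel

# e.g. 3 channels with 10 states each:
print(joint_states(3, 10))     # 1000 combined states
print(factored_states(3, 10))  # 30 states
```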
Elementary Principles of HPSG
- Georgia M. Green
This chapter describes the theoretical foundations and descriptive mechanisms of Head-Driven
Phrase Structure Grammar (HPSG), as well as proposed treatments for a number of familiar
grammatical phenomena. The anticipated reader has some familiarity with syntactic phenomena
and the function of a theory of syntax, but not necessarily any expertise with modern theories of
phrase-structure grammar. Section 1 describes the character of HPSG grammars, and the elements
and axioms of the system. Section 2 describes how linguistic entities are modelled, and how
grammars describe the modelled entities. The third section describes the ontology of feature-structure
descriptions in HPSG, and Section 4 deals with the expression of constraints, especially...
Phonological Rules For Overlapping Articulatory Features In Speech Recognition
- Jiping Sun, Li Deng
In this paper, we
report our recent development of an overlapping-feature based phonological model which represents
long-span contextual dependency in speech acoustics. In this model, high-level linguistic constraints
are incorporated in automatic construction of the feature overlapping patterns and the associated
hidden Markov model (HMM) states which represent acoustic variability. The main linguistic
information explored includes morpheme and syllable boundaries, syllable constituent categories,
and word stress. A consistent computational framework developed for the construction of the
feature-based model and the major components of the model are described. The experimental
results on the use of the feature-based model as the HMM state topology for speech recognition are...
Representations of Dialogue State for Domain and Task Independent Meta-Dialogue
- David R. Traum, Carl F. Andersen
We propose a representation of local dialogue
context motivated by the need to react appropriately
to meta-dialogue, such as various
sorts of corrections to the sequence of an instruction
and response action. Such context includes
at least the following aspects: the words
and linguistic structures uttered, the domain
correlates of those linguistic structures, and
plans and actions in response. Each of these is
needed as part of the context in order to be able
to correctly interpret the range of corrections.
Partitioning knowledge of dialogue structure in
this way may lead to an ability to represent
generic dialogue structure (e.g., in the form of
axioms), which can be particularized to the domain,
topic and content...
What is a word, What is a sentence? Problems of Tokenization
- Gregory Grefenstette, Pasi Tapanainen
Any linguistic treatment of freely occurring text must answer the question of what
is considered a token. In artificial languages, what counts as a token
can be precisely and unambiguously defined. Natural languages, on the other hand, display
such a rich variety that there are many ways to decide what will be considered a
unit for a computational approach to text. Here we will discuss tokenization as a problem
for computational lexicography. Our discussion will cover the aspects of what is usually
considered preprocessing of text in order to prepare it for some automated treatment. We
present the roles of...
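A minimal sketch (ours, not the paper's) of why even English tokenization forces such unit decisions; the regular expression below is deliberately naive:

```python
import re

# A naive tokenizer: maximal runs of word characters (optionally with an
# internal apostrophe), or single punctuation marks. Real lexicographic
# tokenizers must also decide about abbreviations ("Mr."), numbers with
# grouping commas ("1,000"), hyphenation, and multiword units.
TOKEN = re.compile(r"\w+(?:'\w+)?|[^\w\s]")

def tokenize(text: str) -> list[str]:
    return TOKEN.findall(text)

print(tokenize("Mr. Smith can't pay $1,000."))
# ['Mr', '.', 'Smith', "can't", 'pay', '$', '1', ',', '000', '.']
# The naive pattern breaks "Mr." and "1,000" apart: exactly the kind of
# tokenization decision that must be made explicitly.
```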
Inequality Without Irreflexivity
This paper presents the axiomatization --- without the rule of irreflexivity
--- of the modal logic of inequality as well as a method for proving
its completeness. This method uses the technique of subordination frames.
Introduced by von Wright and furthered by Segerberg, the modal
logic with inequality, also called the logic of elsewhere,
is superior in expressive power to the modal logic without inequality.
Its linguistic basis is the propositional calculus enlarged
with the modal operators L and [≠]. Its semantic basis is a relational
structure of the form F = (W, R) where...
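For orientation, the standard Kripke-style truth conditions for these two operators over such a frame are (our notation; L is necessity relative to R, while the inequality modality quantifies over all points distinct from the current one):

```latex
% Truth at a point w of a model M based on the frame F = (W, R):
\begin{align*}
  M, w \models L\varphi      &\iff \forall v \in W\ (w\,R\,v \Rightarrow M, v \models \varphi)\\
  M, w \models [\neq]\varphi &\iff \forall v \in W\ (v \neq w \Rightarrow M, v \models \varphi)
\end{align*}
```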
Choosing A Distance Metric For Automatic Word Categorization
- Emin Erkan Korkmaz, Gokturk Ucoluk
This paper analyzes the functionality of different
distance metrics that can be used in
a bottom-up unsupervised algorithm for automatic
word categorization. The proposed
method uses a modified greedy-type algorithm.
The formulations of fuzzy theory are also used
to calculate the degree of membership for the
elements in the linguistic clusters formed. The
unigram and the bigram statistics of a corpus
of about two million words are used. Empirical
comparisons are made in order to support
the discussion of which type of distance
metric would be most suitable for
measuring the similarity between linguistic elements.
Statistical natural language processing is a challenging
area in the field of computational natural language
learning. Researchers of...
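As a toy illustration of what such a metric compares (the corpus, the context profiles, and the choice of cosine below are our invented sketch, not the paper's setup), one can collect bigram-based context profiles and measure their similarity:

```python
import math
from collections import Counter

# A "context profile" for a word: counts of the words that immediately
# follow it. Real experiments would collect these from a large corpus;
# here a twelve-word toy corpus stands in.
def profile(corpus: list[str], target: str) -> Counter:
    """Counts of the words that immediately follow `target`."""
    return Counter(b for a, b in zip(corpus, corpus[1:]) if a == target)

def cosine(p: Counter, q: Counter) -> float:
    """Cosine similarity between two count vectors (0.0 if either is empty)."""
    keys = set(p) | set(q)
    dot = sum(p[k] * q[k] for k in keys)
    norm = math.sqrt(sum(v * v for v in p.values())) * \
           math.sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0

corpus = "the cat sat on the mat the dog sat on the rug".split()
print(cosine(profile(corpus, "cat"), profile(corpus, "dog")))  # 1.0: identical contexts
```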
Linguistic Issues in Grace (Evaluation of Part-of-Speech Tagging for French)
GRACE is the first large-scale evaluation program of taggers for French. This experiment made it possible to compare the
Part-of-Speech tag assignments of several taggers on a common corpus of literary and journalistic texts. The evaluation relied on the
acceptance by the participants of a reference formalism for morpho-syntactic description (the reference tagset) used by an expert to tag
the evaluation corpus, and by the participants to provide a description (mapping table) of their own tagset. The global strategy was to
make the reference tagging and tokenization as fine-grained as possible. The reference tags were decomposed into Parts of Speech (main
category) and lists...
Adaptation of Statistical Language Models for Automatic Speech Recognition
- Philip R. Clarkson
Statistical language models encode linguistic information in such a way as to be
useful to systems which process human language. Such systems include those
for optical character recognition and machine translation. Currently, however,
the most common application of language modelling is in automatic speech
recognition, and it is this that forms the focus of this thesis.
Most current speech recognition systems are dedicated to one specific task
(for example, the recognition of broadcast news), and thus use a language
model which has been trained on text which is appropriate to that task. If,
however, one wants to perform recognition on more general language, then
creating an appropriate language model...
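One of the simplest adaptation schemes in this spirit is linear interpolation of a general model with a task-specific one. The sketch below uses toy unigram probabilities and an illustrative mixing weight; it is not claimed to be the thesis's actual method:

```python
# Linear interpolation of a general and a task-specific unigram model.
# A word's adapted probability is a weighted mix of its probability
# under each component model; lam controls how much the task data counts.
def interpolate(p_general: dict, p_task: dict, lam: float) -> dict:
    vocab = set(p_general) | set(p_task)
    return {w: lam * p_task.get(w, 0.0) + (1 - lam) * p_general.get(w, 0.0)
            for w in vocab}

general = {"the": 0.5, "stock": 0.1, "game": 0.4}
task = {"the": 0.4, "stock": 0.5, "game": 0.1}   # e.g. financial-news text
adapted = interpolate(general, task, 0.3)
print(round(adapted["stock"], 2))  # 0.22
```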
Machine Learning And Language Acquisition: A Model Of Child's Learning Of Turkish Morphophonology
- Deniz Zeyrek, Cem Bozsahin, Umit Turan
M.S., Cognitive Science
Supervisor: Assoc. Prof. Dr. Cem Bozsahin
Co-Supervisor: Assoc. Prof. Dr. Deniz Zeyrek
July 1999, 90 pages
Every normal child who is exposed to linguistic input in an interactional environment acquires
the complex structure of the language at a very early age, in a very short time and without any
explicit training. This fascinating character of language acquisition has led many researchers to
study aspects of language acquisition. The present study is on the morphological analysis
of Turkish and on learning the morphophonology of Turkish by using the non-monotonic
setting of Inductive Logic Programming,...
Fuzzy Data Analysis: Challenges and Perspectives
- Rudolf Kruse, Christian Borgelt, Detlef Nauck
In meeting the challenges that resulted from the
explosion of collected, stored, and transferred data,
Knowledge Discovery in Databases or Data Mining has
emerged as a new research area. However, the approaches
studied in this area have mainly been oriented
at highly structured and precise data. In addition,
the goal to obtain understandable results is often
neglected. Therefore we suggest concentrating on Information
Mining, i.e., the analysis of heterogeneous information
sources with the prominent aim of producing
comprehensible results. Since the aim of fuzzy technology
has always been to model linguistic information and
to achieve understandable solutions, we expect it to play
an important role in information mining.
1. Introduction: A View of...
Word Segmentation Based on Estimation of Words from Examples
- Juntae Yoon, Woonjae Lee, Key-Sun Choi
From a cognitive point of view, words can be recognized based on learned data which
can be obtained from linguistic materials; that is, people learn words from the many
examples they encounter. We propose a word segmentation algorithm based on
estimated knowledge for words acquired from both local texts being processed and
POS-tagged corpus. In order to show the feasibility of our model, we apply it to
guessing unknown words in cases of morphological analysis failure.
We continuously learn words by seeing and hearing examples, and acquire new ones based on
learned knowledge and new examples. We can think of recognition and segmentation of words...
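A toy stand-in for segmentation against learned word knowledge is greedy longest match over a fixed word list (the lexicon and input below are invented; the paper estimates words statistically rather than assuming a closed list):

```python
# Greedy longest-match segmentation over a known word list. At each
# position we take the longest lexicon entry starting there, falling
# back to a single character when nothing matches.
def segment(text: str, lexicon: set) -> list:
    words, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):
            if text[i:j] in lexicon or j == i + 1:
                words.append(text[i:j])
                i = j
                break
    return words

lexicon = {"word", "segmentation", "is", "hard"}
print(segment("wordsegmentationishard", lexicon))
# ['word', 'segmentation', 'is', 'hard']
```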
Unbounded Negative Concord in Polish: A Lexicalist HPSG Approach
In this paper, we deal with Negative Concord (NC) in Polish. We show
that Polish NC is a kind of unbounded dependency construction (UDC), although
it differs in many respects from the `standard' UDCs such as, e.g.,
wh-extraction or topicalization. Our analysis of NC is couched in the theoretical
framework of HPSG; more precisely, we adopt a lexicalist approach to
UDCs proposed by Sag (1996a, 1996b). Moreover, we argue that Polish NC
facts would be difficult to model by a purely semantic account.
The aim of this paper is twofold. First, on the basis of facts rarely (if ever) considered
in the linguistic literature, we argue for...
Distributional Clustering Of English Words
- Fernando Pereira, Naftali Tishby, Lillian Lee
We describe and experimentally evaluate a method for
automatically clustering words according to their distribution
in particular syntactic contexts. Deterministic
annealing is used to find lowest distortion sets of
clusters. As the annealing parameter increases, existing
clusters become unstable and subdivide, yielding a
hierarchical "soft" clustering of the data. Clusters are
used as the basis for class models of word co-occurrence,
and the models evaluated with respect to held-out test data.
Methods for automatically classifying words according
to their contexts of use have both scientific and practical
interest. The scientific questions arise in connection
to distributional views of linguistic (particularly
lexical) structure and also in relation to the question
of lexical acquisition both from psychological...
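The flavor of the soft assignment can be sketched as follows: a word's cluster membership decays exponentially with the KL divergence from the cluster centroid, sharpening as the inverse temperature grows. The distributions and temperatures below are toy values, not the paper's (which derives the assignment from a free-energy formulation):

```python
import math

def kl(p: dict, q: dict) -> float:
    """Kullback-Leibler divergence D(p || q) over a shared support."""
    return sum(pv * math.log(pv / q[k]) for k, pv in p.items() if pv > 0)

def memberships(word: dict, centroids: list, beta: float) -> list:
    """Soft cluster memberships: exp(-beta * KL), normalized over clusters."""
    weights = [math.exp(-beta * kl(word, c)) for c in centroids]
    total = sum(weights)
    return [w / total for w in weights]

word = {"eat": 0.7, "drink": 0.3}
centroids = [{"eat": 0.6, "drink": 0.4}, {"eat": 0.1, "drink": 0.9}]
for beta in (0.5, 5.0):  # low beta: soft assignment; high beta: nearly hard
    print([round(m, 3) for m in memberships(word, centroids, beta)])
```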
CJK DOCP Recommendation: CJK multilingual linguistic markup CJKDOCP Text Corpus Exchange Formats Date: 1995-12-06 Author : Jing-Shin Chang (SIGMT, ROCLING) Revision : 1.2.3 File: CJKDOCP.xf.1.2.3.doc
- Jing-shin Chang
Because corpus tagging is time-consuming for the corpus contributors, we hope that the corpus users can share the load of corpus tagging if some of the
tagging information is not available in the original corpora. We also encourage the release of any derived works from the
original works that contain extra information after further processing.
0.2. Simple Encoding Convention Used for Corpus Tagging
An "element" (e.g., a paragraph, a sentence, or a word) is, in general, enclosed by a pair of "start tag" and "end tag" in
an SGML text. For example, a book title "Advanced Unix Programming" can be tagged as:
<title>Advanced Unix Programming</title>
where "<title>" is the "start tag"...
Parallel Term Indexing for a Document Retrieval System
- Scott L. Alexander, Nancy J. McCracken
A parallel system for finding and indexing key terms in a document is described.
With the constantly increasing amount and availability of information, a reliable and
efficient method of document retrieval is quickly becoming a necessity for research.
DR-LINK is one such document retrieving tool that is able to return a list of relevant
documents to a user's query. One way that this is accomplished is by comparing the key
terms in the query to the key terms in the documents. This process is called indexing. It
is shown that the efficiency of indexing these terms is increased by developing a parallel
implementation of the existing code....
A Primitive Calculus for Module Systems
- Davide Ancona
We present a simple and powerful calculus of modules supporting mutual recursion and higher-order features. The calculus makes it possible to encode a large variety of existing mechanisms for combining software components, including parameterized modules, extension with overriding of object-oriented programming, mixin modules, and extra-linguistic mechanisms like those provided by a linker. As usual, we first present an untyped version of our calculus and then a type system which is proved sound w.r.t. the reduction semantics; moreover, we give a translation of other primitive calculi. Introduction: Considerable effort has recently been invested in studying theoretical foundations and developing new forms...
Recent Advances in Transcribing Television and Radio Broadcasts
- Jean-Luc Gauvain, Lori Lamel, Gilles Adda, Michele Jardino
Transcription of broadcast news shows (radio and television)
is a major step in developing automatic tools for indexation and
retrieval of the vast amounts of information generated on a daily
basis. Broadcast shows are challenging to transcribe as they consist
of a continuous data stream with segments of different linguistic
and acoustic natures. Transcribing such data requires addressing
two main problems: those related to the varied acoustic
properties of the signal, and those related to the linguistic properties
of the speech. Prior to word transcription, the data is partitioned
into homogeneous acoustic segments. Non-speech segments
are identified and rejected, and the speech segments are
clustered and labeled according to bandwidth and...
Partial Proof Trees as Building Blocks for a Categorial Grammar
- Aravind K. Joshi, Seth Kulick
We describe a categorial system (PPTS) based on partial proof trees (PPTs) as the building
blocks of the system. The PPTs are obtained by unfolding the arguments of the type that would
be associated with a lexical item in a simple categorial grammar. The PPTs are the basic types in
the system and a derivation proceeds by combining PPTs together. We describe the construction
of the finite set of basic PPTs and the operations for combining them. PPTS can be viewed as
a categorial system incorporating some of the key insights of lexicalized tree adjoining grammar,
namely the notion of an extended domain of locality and...
A Modular Connectionist Parser for Resolution of Pronominal Anaphoric References in Multiple Sentences
- Itamar Leite De Oliveira, Raul Sidnei Wazlawick
In this work a connectionist model used in the resolution
of a well-known linguistic phenomenon, pronominal
anaphoric reference, is presented. The model is composed
of two neural networks: a simple recurrent neural
network (parser) and a feedforward neural network
(segmenter). These networks are trained and tested
simultaneously. With this model it is possible to solve
anaphoric references with text segments of arbitrary size,
that is to say, with any number of sentences.
Anaphoric reference is a linguistic phenomenon in
which a pronoun or a noun phrase (NP) in a sentence
refers to a person or an object previously
mentioned in the text. The problem then is to know...