Collection resources

Universidade da Coruña. UDCDspace (405 resources)
UDCDspace is the digital repository of the Universidade da Coruña, a system that provides stable and secure preservation of the digital documents produced by the UDC's scientific and institutional activity, and facilitates their accessibility on the Internet.

Showing resources 1 - 10 of 10

1. Nuevos Algoritmos Tabulares para el Análisis de LIG [New Tabular Algorithms for LIG Parsing] - Alonso Pardo, Miguel Ángel; Graña Gil, Jorge; Vilares Ferro, Manuel
Starting from a CYK-type algorithm, a series of new tabular algorithms for parsing Linear Indexed Grammars is developed, including bottom-up algorithms and Earley-type algorithms with and without the valid prefix property, creating a continuous evolutionary path in which each algorithm can be obtained by simple transformations of the preceding one. The new algorithms make it possible to establish a parallel with the algorithms available for Tree Adjoining Grammars.

2. Practical NLP-Based Text Indexing - Vilares Ferro, Jesús; Barcala Rodríguez, Francisco Mario; Alonso Pardo, Miguel Ángel; Graña Gil, Jorge; Vilares Ferro, Manuel

3. Instrumentation of Synchronous Reactive Models for Performance Engineering - Valderruten Vidal, Alberto; Vilares Ferro, Manuel; Graña Gil, Jorge

4. Regional Finite-State Error Repair - Vilares Ferro, Manuel; Otero Pombo, Juan; Graña Gil, Jorge
We describe an algorithm to deal with error repair over finite-state architectures. Such a technique is of interest in spelling correction as well as approximate string matching in a variety of applications related to natural language processing, such as information extraction/recovery or answer searching, where error-tolerant recognition allows misspelled input words to be integrated in the computational process. Our proposal relies on a regional least-cost repair strategy, dynamically gathering all relevant information in the context of the error location. The system guarantees asymptotic equivalence with global repair strategies.
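
As a rough illustration of error-tolerant recognition over a dictionary, the sketch below walks a trie while maintaining one row of the Levenshtein table per visited node, pruning branches whose best cost already exceeds a bound. This is the classic global edit-distance search, not the regional least-cost strategy the abstract describes; the names `build_trie`, `fuzzy_lookup`, and `max_cost` are assumptions made for this example.

```python
END = ""  # end-of-word marker; never collides with a real one-character edge label


def build_trie(words):
    """Build a nested-dict trie from an iterable of words."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def fuzzy_lookup(trie, word, max_cost):
    """Return (dictionary_word, cost) pairs within max_cost edits of word."""
    results = []

    def walk(node, prefix, prev_row):
        # prev_row[j] = edit distance between prefix and word[:j]
        if END in node and prev_row[-1] <= max_cost:
            results.append((prefix, prev_row[-1]))
        for ch, child in node.items():
            if ch == END:
                continue
            row = [prev_row[0] + 1]
            for j in range(1, len(word) + 1):
                row.append(min(
                    row[j - 1] + 1,                        # skip a char of word
                    prev_row[j] + 1,                       # skip a char of the dictionary path
                    prev_row[j - 1] + (word[j - 1] != ch)  # match or substitute
                ))
            if min(row) <= max_cost:  # prune hopeless branches
                walk(child, prefix + ch, row)

    walk(trie, "", list(range(len(word) + 1)))
    return sorted(results, key=lambda item: (item[1], item[0]))


if __name__ == "__main__":
    lexicon = build_trie(["repair", "repeat", "report", "regional"])
    print(fuzzy_lookup(lexicon, "repiar", 2))
```

Pruning on the minimum of the current row is what keeps the search from visiting the whole dictionary; a regional strategy would instead restrict the repair to the context of the error location.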

5. Compilation of Constraint-based Contextual Rules for Part-of-Speech Tagging into Finite State Transducers - Graña Gil, Jorge; Andrade Sanchez, Gloria; Vilares Ferro, Jesús
With the aim of removing the residual errors made by pure stochastic disambiguation models, we put forward a hybrid system in which linguist users introduce high-level contextual rules to be applied in combination with a tagger based on a Hidden Markov Model. The design of these rules is inspired by the Constraint Grammars formalism. In the present work, we review this formalism in order to propose a more intuitive syntax and semantics for rules, and we develop a strategy to compile the rules in the form of Finite State Transducers, thus guaranteeing an efficient execution framework.

6. Compilation Methods of Minimal Acyclic Finite-State Automata for Large Dictionaries - Graña Gil, Jorge; Barcala Rodríguez, Francisco Mario; Alonso Pardo, Miguel Ángel
We present a reflection on the evolution of the different methods for constructing minimal deterministic acyclic finite-state automata from a finite set of words. We outline the most important methods, including the traditional ones (which consist of the combination of two phases: insertion of words and minimization of the partial automaton) and the incremental algorithms (which add new words one by one and minimize the resulting automaton on-the-fly, being much faster and having significantly lower memory requirements). We analyze their main features in order to provide some improvements for incremental constructions, and a general architecture that is needed to implement...
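
The incremental idea mentioned above (add words one by one and minimize on the fly) can be sketched as follows. This is a minimal version of the well-known construction for lexicographically sorted input, in which the suffix of the previous word is minimized before the next word is added; it is not necessarily the architecture the authors propose, and the names `State`, `IncrementalDafsa`, and `count_states` are assumptions for this example.

```python
import itertools


class State:
    _ids = itertools.count()

    def __init__(self):
        self.uid = next(State._ids)  # stable identity used in signatures
        self.final = False
        self.edges = {}              # char -> State

    def signature(self):
        # Two states are equivalent if they agree on finality and on
        # (label, target) pairs; targets are already canonical states.
        return (self.final,
                tuple(sorted((c, t.uid) for c, t in self.edges.items())))


class IncrementalDafsa:
    def __init__(self):
        self.root = State()
        self.register = {}    # signature -> canonical state
        self.unchecked = []   # (parent, char, child) along the last word's path
        self.previous = ""

    def insert(self, word):
        if word < self.previous:
            raise ValueError("words must be inserted in sorted order")
        common = 0
        for a, b in zip(word, self.previous):
            if a != b:
                break
            common += 1
        self._minimize(common)  # freeze the part of the old path not shared
        node = self.unchecked[-1][2] if self.unchecked else self.root
        for ch in word[common:]:
            nxt = State()
            node.edges[ch] = nxt
            self.unchecked.append((node, ch, nxt))
            node = nxt
        node.final = True
        self.previous = word

    def finish(self):
        self._minimize(0)

    def _minimize(self, down_to):
        while len(self.unchecked) > down_to:
            parent, ch, child = self.unchecked.pop()
            sig = child.signature()
            if sig in self.register:
                parent.edges[ch] = self.register[sig]  # reuse equivalent state
            else:
                self.register[sig] = child

    def contains(self, word):
        node = self.root
        for ch in word:
            node = node.edges.get(ch)
            if node is None:
                return False
        return node.final

    def count_states(self):
        seen, stack = set(), [self.root]
        while stack:
            state = stack.pop()
            if state.uid not in seen:
                seen.add(state.uid)
                stack.extend(state.edges.values())
        return len(seen)
```

For example, inserting `tap`, `taps`, `top`, `tops` in that order and calling `finish()` yields an automaton with 5 states, whereas the corresponding trie would need 8.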

7. A Common Solution for Tokenization and Part-of-Speech Tagging: One-Pass Viterbi Algorithm vs. Iterative Approaches - Graña Gil, Jorge; Alonso Pardo, Miguel Ángel; Vilares Ferro, Manuel
Current taggers assume that input texts are already tokenized, i.e. correctly segmented into tokens or high-level information units that identify each individual component of the texts. This working hypothesis is unrealistic, due to the heterogeneous nature of the application texts and their sources. The greatest troubles arise when this segmentation is ambiguous. The choice of the correct segmentation alternative depends on the context, which is precisely what taggers study. In this work, we develop a tagger able not only to decide the tag to be assigned to every token, but also to decide whether some of them form or...

8. Formal Methods of Tokenization for Part-of-Speech Tagging - Graña Gil, Jorge; Barcala Rodríguez, Francisco Mario; Vilares Ferro, Manuel
One of the most important prior tasks for robust part-of-speech tagging is the correct tokenization or segmentation of the texts. This task can involve processes which are much more complex than the simple identification of the different sentences in the text and each of their individual components, but it is often obviated in many current applications. Nevertheless, this preprocessing step is an indispensable task in practice, and it is particularly difficult to tackle it with scientific precision without falling repeatedly into the analysis of the specific casuistry of every phenomenon detected. In this work, we have developed a scheme of...

9. Regional Versus Global Finite-State Error Repair - Vilares Ferro, Manuel; Otero Pombo, Juan; Graña Gil, Jorge
We focus on the domain of a regional least-cost strategy in order to illustrate the viability of non-global repair models over finite-state architectures. Our interest is justified by the difficulty, shared by all repair proposals, of determining how far to validate. A short validation may fail to gather sufficient information, and in a long one most of the effort can be wasted. The goal is to prove that our approach can provide, in practice, a performance and quality comparable to that attained by global criteria, with a significant saving in time and space. To the best of our knowledge, this...

10. Stochastic Parsing and Parallelism - Barcala Rodríguez, Francisco Mario; Sacristán Agulló, Oscar; Graña Gil, Jorge
CYK-like parsing algorithms are inherently parallel: there are a lot of cells in the chart that can be calculated simultaneously. In this work, we present a study on the appropriate techniques of parallelism to obtain an optimal performance of the extended CYK algorithm, a stochastic parsing algorithm that preserves the same level of expressiveness as the one in the original grammar, and improves further tasks of robust parsing. We consider two methods of parallelization: distributed memory and shared memory. The excellent performance obtained with the second one turns this algorithm into an alternative that could compete with other parsing techniques...
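
To illustrate why chart cells can be computed simultaneously, the sketch below implements a plain CYK recognizer for a grammar in Chomsky normal form and fills all cells of the same span length through a thread pool, since each such cell only reads cells of shorter spans. This is ordinary (non-stochastic) CYK rather than the extended algorithm studied in the paper, and the function name, the grammar encoding, and the toy grammar are assumptions for this example. In CPython the threads mainly demonstrate the independence of the cells; they are not expected to give a real speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def cyk_recognize(words, lexical, binary, start="S", workers=4):
    """lexical: word -> set of nonterminals; binary: (B, C) -> set of heads A."""
    n = len(words)
    if n == 0:
        return False
    # chart[(i, length)] = nonterminals deriving words[i:i + length]
    chart = {(i, 1): set(lexical.get(w, ())) for i, w in enumerate(words)}

    def fill(cell):
        i, length = cell
        found = set()
        for split in range(1, length):
            left = chart[(i, split)]
            right = chart[(i + split, length - split)]
            for (b, c), heads in binary.items():
                if b in left and c in right:
                    found |= heads
        return cell, found

    with ThreadPoolExecutor(max_workers=workers) as pool:
        for length in range(2, n + 1):
            # Cells on the same diagonal depend only on shorter spans,
            # so they can be evaluated independently of each other.
            cells = [(i, length) for i in range(n - length + 1)]
            results = list(pool.map(fill, cells))
            for cell, found in results:  # write back once the diagonal is done
                chart[cell] = found
    return start in chart[(0, n)]


if __name__ == "__main__":
    lexical = {"she": {"NP"}, "eats": {"V"}, "fish": {"NP"}}
    binary = {("NP", "VP"): {"S"}, ("V", "NP"): {"VP"}}
    print(cyk_recognize("she eats fish".split(), lexical, binary))
```

Writing results back only after a whole diagonal is finished keeps the shared chart read-only while workers are running, which is the simplest way to avoid synchronization in a shared-memory setting.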