Monday, November 30, 2015




Spell-checking in Spanish: the case of diacritic accents


Resource details

Belongs to: RECERCAT

Description: This article presents the problem of diacritic restoration (or diacritization) in the context of spell-checking, with a focus on an orthographically rich language such as Spanish. We argue that, despite the large volume of work published on diacritization, currently available spell-checking tools have still not found a proper solution to the problem in those cases where both forms of a word are listed in the checker’s dictionary. This is the case, for instance, when a word form exists with and without diacritics, such as continuo ‘continuous’ and continuó ‘he/she/it continued’, or when different diacritics make other word distinctions, as in continúo ‘I continue’. We propose a very simple solution based on a word bigram model derived from correctly typed Spanish texts and evaluate the ability of this model to restore diacritics in artificial as well as real errors. The case of diacritics is only meant to be an example of the possible applications of this idea, yet we believe that the same method could be applied to other kinds of orthographic or even grammatical errors. Moreover, given that no explicit linguistic knowledge is required, the proposed model can be used with other languages provided that a large normative corpus is available.
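The description names a word bigram model but gives no implementation details, so the following is only a minimal sketch of the general idea, not the authors' system. It assumes a toy four-sentence corpus, whitespace tokenization, and a simple score that adds the counts of the bigrams formed with the left and right neighbours; the names `strip_diacritics`, `train`, and `restore` are hypothetical.

```python
import unicodedata
from collections import Counter, defaultdict

# Assumption: only the acute accent and the diaeresis are treated as
# restorable diacritics, so that "ñ" remains a distinct letter.
STRIPPED_MARKS = {"\u0301", "\u0308"}


def strip_diacritics(word):
    """Remove acute accents and diaereses, e.g. 'continuó' -> 'continuo'."""
    decomposed = unicodedata.normalize("NFD", word)
    kept = "".join(ch for ch in decomposed if ch not in STRIPPED_MARKS)
    return unicodedata.normalize("NFC", kept)


def tokenize(sentence):
    """Lowercase, normalize to NFC, and split on whitespace (a simplification)."""
    return unicodedata.normalize("NFC", sentence.lower()).split()


def train(sentences):
    """Count word bigrams and group word forms that differ only in diacritics."""
    bigrams = Counter()
    variants = defaultdict(set)
    for sentence in sentences:
        tokens = tokenize(sentence)
        for token in tokens:
            variants[strip_diacritics(token)].add(token)
        bigrams.update(zip(tokens, tokens[1:]))
    return bigrams, variants


def restore(sentence, bigrams, variants):
    """Replace each token with the diacritic variant best supported by its neighbours."""
    tokens = tokenize(sentence)
    restored = []
    for i, token in enumerate(tokens):
        prev_word = restored[-1] if restored else None
        next_word = tokens[i + 1] if i + 1 < len(tokens) else None

        def score(candidate):
            left = bigrams[(prev_word, candidate)] if prev_word else 0
            right = bigrams[(candidate, next_word)] if next_word else 0
            return left + right

        best, best_score = token, score(token)
        # The typed form is kept unless a variant scores strictly higher.
        for candidate in sorted(variants.get(strip_diacritics(token), ())):
            if score(candidate) > best_score:
                best, best_score = candidate, score(candidate)
        restored.append(best)
    return " ".join(restored)


# Toy "normative corpus" (invented for illustration only).
CORPUS = [
    "él continuó con su trabajo",
    "yo continúo con mi trabajo",
    "un movimiento continuo y lento",
    "ella continuó el relato",
]

bigrams, variants = train(CORPUS)
print(restore("él continuo con su trabajo", bigrams, variants))  # él continuó con su trabajo
print(restore("yo continuo con mi trabajo", bigrams, variants))  # yo continúo con mi trabajo
```

Raw counts are enough to separate the three forms in this toy corpus; a realistic model would presumably need a much larger corpus and some form of smoothing or thresholding, which this sketch omits.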

Author(s): Atserias, Jordi - Fuentes Fort, Maria - Nazar, Rogelio - Renau, Irene

Id.: 55205488

Language: English

Version: 1.0

Status: Final

Keywords: UPC subject areas - Computer science - Artificial intelligence - Natural language

Resource type: Conference lecture

Interactivity type: Expositive

Interactivity level: very low

Audience: Student - Teacher - Author

Structure: Atomic

Cost: no

Copyright: yes

Open Access

Technical requirements: Browser: Any

Contribution date: 06-May-2012



Other resources by the same author(s)

  1. Word Sense Discrimination Using Statistic Analysis of Texts: For years, computer programs have been working to obtain information about certain entities such as...
  2. Annotation of collocations in a learner corpus for building a learning environment: Collocations in the sense of idiosyncratic lexical co-occurrences are one of the main barriers and c...
  3. Bilingual terminology acquisition from unrelated corpora: This paper presents a simple yet effective technique for the extraction of term equivalents in diffe...
  4. Sobre las construcciones pronominales y su tratamiento en algunos diccionarios monolingües de cuatro lenguas románicas: Some non-native student errors in Spanish show the difficulty of pronominal constructions in Spanish...
  5. Automatic Acquisition of Sense Examples using ExRetriever: A current research line for word sense disambiguation (WSD) focuses on the use of supervised machine...

