Resource details


For near-duplicate document detection, the bag-of-words comparison approaches used in the information retrieval community are not sufficiently accurate. This work presents a novel approach for the setting in which instance-level constraints are given for documents and those documents must be retrieved for a new query document. The framework incorporates the instance-level constraints and clusters documents into groups using a novel clustering approach, Grouped Latent Dirichlet Allocation (gLDA). A distance metric is then learned for each cluster using the large margin nearest neighbor algorithm, and finally documents are ranked for a new, unknown document using the learned distance metrics. Experimental results on various datasets demonstrate that our clustering method (gLDA with side constraints) performs better than other clustering methods and that the overall approach outperforms other near-duplicate detection algorithms.
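The final ranking step described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the cluster assignments, the centroids, and the per-cluster matrices `L` are hand-set placeholders standing in for the output of gLDA and of large margin nearest neighbor training, and routing the query to the nearest centroid is an assumption made only for this example. The one property taken from large margin nearest neighbor is the form of the learned distance, a Euclidean distance measured after a linear map `L`.

```python
from math import sqrt


def apply_metric(L, v):
    """Apply the linear map L (a list of rows) to the vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in L]


def learned_distance(L, x, y):
    """Distance ||L(x - y)||, the form of metric that LMNN learns."""
    diff = [a - b for a, b in zip(x, y)]
    return sqrt(sum(c * c for c in apply_metric(L, diff)))


def euclidean(x, y):
    """Plain Euclidean distance, used here only to route the query."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def rank_near_duplicates(query, clusters):
    """Pick a cluster for the query, then rank its documents by that
    cluster's own metric. Nearest-centroid routing is an assumption."""
    name = min(clusters, key=lambda n: euclidean(query, clusters[n]["centroid"]))
    cluster = clusters[name]
    scored = sorted(
        (learned_distance(cluster["L"], query, vec), doc_id)
        for doc_id, vec in cluster["docs"].items()
    )
    return name, [doc_id for _, doc_id in scored]


# Hypothetical toy data: 2-d document features, two clusters,
# and one hand-set metric per cluster (not learned).
CLUSTERS = {
    "a": {
        "centroid": [0.8, 0.2],
        "L": [[2.0, 0.0], [0.0, 0.5]],
        "docs": {"d1": [1.0, 0.0], "d2": [0.9, 0.1], "d3": [0.5, 0.5]},
    },
    "b": {
        "centroid": [0.05, 0.95],
        "L": [[1.0, 0.0], [0.0, 1.0]],
        "docs": {"d4": [0.0, 1.0], "d5": [0.1, 0.9]},
    },
}

QUERY = [0.98, 0.02]
chosen, ranking = rank_near_duplicates(QUERY, CLUSTERS)
print(chosen, ranking)
```

Keeping one metric per cluster, rather than a single global one, lets each group of documents weight its features differently, which is the motivation the abstract gives for learning the metrics after clustering.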

Belongs to

ETD at Indian Institute of Science  


Patel, Vishal

ID: 54390552

Language: English (United States)

Version: 1.0

Status: Final

Keywords: Document Clustering - Artificial Intelligence

Resource type: Thesis

Interactivity type: Expositive

Interactivity level: very low

Audience: Student - Teacher - Author

Structure: Atomic

Cost: no

Copyright: yes

Technical requirements: Browser: Any

Relation: [References] G23536

Contribution date: 10-Aug-2011



Other resources by the same author(s)

  1. The Effect of Winning an Oscar Award on Survival: Correcting for Healthy Performer Survivor Bias With a Rank Preserving Structural Accelerated Failure Time Model We study the causal effect of winning an Oscar Award on an actor or actress’s survival. Does the inc...
  2. Image-driven modeling of the proliferation and necrosis of glioblastoma multiforme Background: The heterogeneity of response to treatment in patients with glioblastoma multiforme sugg...
  3. The impact of an educational program on HCV patient outcomes using boceprevir in community practices (OPTIMAL trial)
  4. Delayed presentation of a loose body in undisplaced paediatric talar neck fracture Fractures of the talus are rare in children. A high index of suspicion is needed to avoid missing su...
  5. Muscle insulin sensitivity and glucose metabolism are controlled by the intrinsic muscle clock Circadian rhythms control metabolism and energy homeostasis, but the role of the skeletal muscle clo...

Other resources from the same collection

  1. Variants of Hegselmann-Krause Model The Hegselmann-Krause system (HK system for short) is one of the most popular models for the dynamic...
  2. A GPU Accelerated Tensor Spectral Method for Subspace Clustering In this thesis we consider the problem of clustering the data lying in a union of subspaces using sp...
  3. Resource Allocation for Sequential Decision Making Under Uncertainaty : Studies in Vehicular Traffic Control, Service Systems, Sensor Networks and Mechanism Design A fundamental question in a sequential decision making setting under uncertainty is “how to allocate...
  4. Semantic Analysis of Web Pages for Task-based Personal Web Interactions Mobile widgets now form a new paradigm of simplified web. Probably, the best experience of the Web i...
  5. Power Efficient Last Level Cache for Chip Multiprocessors The number of processor cores and on-chip cache size has been increasing on chip multiprocessors (CM...
