Resource details


For near-duplicate document detection, the bag-of-words comparison approaches used in the information retrieval community are not sufficiently accurate. This work presents a novel approach for the setting in which instance-level constraints are given for the documents and, given a new query document, its near-duplicates must be retrieved. The framework incorporates the instance-level constraints and clusters the documents into groups using a novel clustering approach, Grouped Latent Dirichlet Allocation (gLDA). A distance metric is then learned for each cluster using the large margin nearest neighbor algorithm, and finally the documents are ranked for a new, unknown query document using the learned distance metrics. Experimental results on various datasets demonstrate that our clustering method (gLDA with side constraints) performs better than other clustering methods and that the overall approach outperforms other near-duplicate detection algorithms.
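The final ranking step described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: it assumes that the clustering (gLDA in the thesis) and the per-cluster metric learning (large margin nearest neighbor in the thesis) have already been run, and it takes their outputs as plain inputs. The function names, the toy vectors, the nearest-centroid rule for assigning the query to a cluster, and the hand-written metric matrices are all hypothetical and chosen only for illustration.

```python
def mahalanobis_sq(x, y, M):
    """Squared distance (x - y)^T M (x - y), with M given as nested lists."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    return sum(d[i] * M[i][j] * d[j] for i in range(n) for j in range(n))


def nearest_cluster(query, centroids):
    """Assumption: the query joins the cluster with the closest centroid
    (plain squared Euclidean distance). The abstract does not specify this rule."""
    return min(
        range(len(centroids)),
        key=lambda k: sum((q - c) ** 2 for q, c in zip(query, centroids[k])),
    )


def rank_candidates(query, docs, labels, centroids, metrics):
    """Rank the documents of the query's cluster by that cluster's metric."""
    k = nearest_cluster(query, centroids)
    M = metrics[k]
    scored = [
        (mahalanobis_sq(query, vec, M), doc_id)
        for doc_id, vec in docs.items()
        if labels[doc_id] == k
    ]
    return [doc_id for _, doc_id in sorted(scored)]


# Toy data (hypothetical): 2-dimensional document vectors and cluster labels.
docs = {"a": [1.0, 0.0], "b": [0.9, 0.3], "c": [0.0, 1.0], "d": [0.7, 0.0]}
labels = {"a": 0, "b": 0, "c": 1, "d": 0}
centroids = [[0.9, 0.1], [0.0, 1.0]]

# One metric per cluster, standing in for matrices a metric learner would return.
# Cluster 0 weights the second feature ten times more than the first.
metrics = [
    [[1.0, 0.0], [0.0, 10.0]],
    [[1.0, 0.0], [0.0, 1.0]],
]

query = [1.0, 0.1]
print(rank_candidates(query, docs, labels, centroids, metrics))
```

With these toy values the cluster-specific metric ranks document d ahead of document b, whereas plain Euclidean distance would rank b ahead of d. The sketch shows only that a per-cluster metric can change the ranking within a cluster, not how the metrics are obtained.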

Belongs to

ETD at Indian Institute of Science  


Patel, Vishal

Id.: 54390552

Language: English (United States)

Version: 1.0

Status: Final

Keywords: Document Clustering - Artificial Intelligence

Resource type: Thesis

Interactivity type: Expositive

Interactivity level: very low

Audience: Student - Teacher - Author

Structure: Atomic

Cost: no

Copyright: yes

Technical requirements: Browser: Any

Relation: [References] G23536

Contribution date: 10-Aug-2011



Other resources by the same author(s)

  1. MCUR1 Is a Scaffold Factor for the MCU Complex Function and Promotes Mitochondrial Bioenergetics Mitochondrial Ca2+ Uniporter (MCU)-dependent mitochondrial Ca2+ uptake is the primary mechanism for ...
  2. Determination of galantamine hydrobromide in bulk drug and pharmaceutical dosage form by spectrofluorimetry Aim: To develop a simple, accurate, sensitive, rapid and precise method for the determination of gal...
  3. PETALS: Proteomic Evaluation and Topological Analysis of a mutated Locus' Signaling Colon cancer is driven by mutations in a number of genes, the m...

  4. Treatment of severe falciparum malaria: quinine versus artesunate Background: Malaria is the most important disease of human being. More than 40% of the world’s ...
  5. REVIEW ON QUALITY SAFETY AND LEGISLATION FOR HERBAL PRODUCTS In the last few decades, there has been exponential growth in the field of herbal medicine. The grow...

Other resources from the same collection

  1. MIST : Migrate The Storage Too We address the problem of migration of local storage of desktop users to remote sites. Assuming a ne...
  2. Computational And Combinatorial Problems On Some Geometric Proximity Graphs In this thesis, we focus on the study of computational and combinatorial problems on various geometr...
  3. Automatic Data Allocation, Buffer Management And Data Movement For Multi-GPU Machines Multi-GPU machines are being increasingly used in high performance computing. These machines are bei...
  4. Tiling Stencil Computations To Maximize Parallelism Stencil computations are iterative kernels often used to simulate the change in a discretized spatia...
  5. Model-Checking Infinite-State Systems For Information Flow Security Properties Information flow properties are a way of specifying security properties of systems, dating back to th...
