Valuing Semantic Similarity

Abdoulahi Boubacar, Zhendong Niu

Abstract


Similarity is widely used in domains such as DNA sequence analysis, knowledge representation, natural language processing, data mining, information retrieval, and information flow. Computing the semantic similarity between two entities is a non-trivial task, and there are many ways to define it. Some proposed measures combine statistical information with lexical similarity. A measure that performs well in one domain is difficult to apply accurately in another, and a similarity measure may perform better with one language than with another. A word should be similar not only to itself but also, in a given context, to some of its synonyms and to some words that share its root. Our approach is designed to perform query matching and to compute semantic relatedness using word occurrences. It performs better than classical measures such as TF-IDF and cosine similarity. Although it is not a metric, the proposed similarity measure can be used for a wide range of content analysis tasks based on semantic distance, and its efficacy has been demonstrated. The measure is not corpus dependent, so it can directly establish the semantic relatedness of two entities.
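For readers unfamiliar with the classical baseline named above, the following is a minimal sketch of TF-IDF weighting combined with cosine similarity. It is not the measure proposed in this paper, which the abstract does not specify. The function names (`tokenize`, `tfidf_vectors`, `cosine`), the whitespace tokenizer, the smoothed IDF formula, and the three sample texts are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (a dict term -> weight) per document.

    Uses raw term counts and a smoothed IDF, log((1 + n) / (1 + df)) + 1,
    which is one common variant and an assumption of this sketch.
    """
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))  # count each term once per document
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: tf[t] * idf[t] for t in tf})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical sample texts, chosen only to illustrate the behaviour.
docs = [
    "semantic similarity of words",
    "semantic relatedness of words",
    "dna sequence analysis",
]
vecs = tfidf_vectors(docs)
sim_close = cosine(vecs[0], vecs[1])  # shares three of four terms
sim_far = cosine(vecs[0], vecs[2])    # shares no terms
```

Because this baseline only counts shared tokens, two texts with no word in common score zero even when they use synonyms, and the IDF weights depend on the document collection. Both points correspond to limitations the abstract raises about classical measures.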

 

DOI: http://dx.doi.org/10.11591/telkomnika.v12i8.6034 


Keywords


Semantic similarity, Semantic Relatedness, Information retrieval



Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License