Abstract
Characterizing the evolving behavior of document vectors helps identify similarity between text documents. Because document vectors contain terms together with their importance (weight) in each document, discovering associations and disassociations between terms is important. This paper introduces a characterization of the evolving behavior of document vectors to identify similar and dissimilar segments within them. The approach is particularly suitable when document vectors contain similar patterns of term occurrences even though those patterns may lie far apart from each other in terms of distance. The main objective of this paper is to capture the evolving structure of context vectors (document vectors of contextually related terms) in order to discover similarity between them. Context vectors reduce the size of document vectors by 6% to 12.57%. The approach is evaluated by clustering documents from standard datasets with the Unweighted Pair Group Method with Arithmetic Mean (UPGMA), which yields clusters with better entropy and purity. A Mann–Whitney–Wilcoxon U test shows that this quality improvement is statistically significant.
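The evaluation step described in the abstract (UPGMA clustering scored by entropy and purity) can be illustrated with a minimal sketch. It is not the paper's context-vector method. It assumes plain term-weight vectors compared by cosine distance, a fixed target number of clusters k, and entropy computed as the size-weighted average of per-cluster class entropy in log base 2 (some works normalize this by the log of the number of classes). The function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length term-weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0  # treat an all-zero vector as maximally distant
    return 1.0 - dot / (norm_u * norm_v)


def upgma(vectors, k):
    """Agglomerative clustering with average linkage (UPGMA) down to k clusters.

    Returns a list of clusters, each a list of document indices.
    Naive O(n^3) loop, fine for small illustrations only."""
    n = len(vectors)
    dist = [[cosine_distance(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # UPGMA: unweighted mean of all pairwise distances between members
                total = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
                avg = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or avg < best[0]:
                    best = (avg, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters


def purity(clusters, labels):
    """Fraction of documents that belong to their cluster's majority class (higher is better)."""
    n = sum(len(c) for c in clusters)
    return sum(max(Counter(labels[i] for i in c).values()) for c in clusters) / n


def entropy(clusters, labels):
    """Size-weighted average of per-cluster class entropy in bits (lower is better)."""
    n = sum(len(c) for c in clusters)
    total = 0.0
    for c in clusters:
        counts = Counter(labels[i] for i in c)
        h = -sum((m / len(c)) * math.log2(m / len(c)) for m in counts.values())
        total += len(c) / n * h
    return total


# Hypothetical toy data: term weights over a 4-term vocabulary.
docs = [
    [3, 1, 0, 0],
    [2, 2, 0, 0],
    [0, 0, 4, 1],
    [0, 1, 3, 2],
]
labels = ["sports", "sports", "finance", "finance"]
clusters = upgma(docs, k=2)
print(clusters, purity(clusters, labels), entropy(clusters, labels))
```

On this toy input the two topical groups are recovered, giving a purity of 1.0 and an entropy of 0.0. For real datasets, SciPy's `scipy.cluster.hierarchy.linkage` with `method="average"` implements the same UPGMA linkage far more efficiently than this naive loop.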
http://ift.tt/2ihOiSm
http://ift.tt/2hDXgdg