Tag search result for 'discrimination'
G. Salton, C.S. Yang, and C.T. Yu
A Theory of Term Importance in Automatic Text Analysis
updated at: 2007/06/14 10:56:34
Carolyn J. Crouch and Bokyung Yang
Experiments in Automatic Statistical Thesaurus Construction
Proceedings of the 15th annual international ACM SIGIR conference on Research and development in information retrieval.
pp. 77--88
1992
A well constructed thesaurus has long been recognized as a valuable tool in the effective operation of an information retrieval system. This paper reports the results of experiments designed to determine the validity of an approach to the automatic construction of global thesauri (described originally by Crouch in [1] and [2]) based on a clustering of the document collection. The authors validate the approach by showing that the use of thesauri generated by this method results in substantial improvements in retrieval effectiveness in four test collections. The term discrimination value theory, used in the thesaurus generation algorithm to determine a term's membership in a particular thesaurus class, is found not to be useful in distinguishing between thesaurus classes (i.e., in differentiating a "good" from an "indifferent" or "poor" thesaurus class). In conclusion, the authors suggest an alternate approach to automatic thesaurus construction which greatly simplifies the work of producing viable thesaurus classes. Experimental results show that the alternate approach described herein in some cases produces thesauri which are comparable in retrieval effectiveness to those produced by the first method at much lower cost.
The discrimination value of a term is defined as a
measure of the change in space separation which occurs
when a given term is assigned to the document collection.
A good discriminator is a term which, when assigned to a
document, decreases the space density (rendering the
documents less similar to each other). A poor
discriminator, then, increases space density. By computing
the density of the document space before and after the
assignment of each term, the discrimination value of the
term can be determined.
Empirical results have shown that document frequency and
discrimination value are well correlated.
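
As a rough illustration of the before-and-after density comparison described above, the following sketch computes a discrimination value for each term in a tiny collection. The note does not say how density or similarity is measured, so several choices here are assumptions: density is taken as the average pairwise cosine similarity, the discrimination value is taken as the density without the term minus the density with it (so a positive value marks a good discriminator), and the function names and toy document vectors are hypothetical.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length term-weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty document is treated as dissimilar to everything
    return dot / (norm_u * norm_v)


def space_density(docs):
    """Assumed density measure: mean cosine similarity over all document pairs."""
    pairs = list(combinations(docs, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


def discrimination_value(docs, k):
    """Density without term k minus density with term k.

    Positive: assigning the term spreads the documents apart (good discriminator).
    Negative: assigning the term packs them together (poor discriminator).
    """
    without_k = [[w for j, w in enumerate(d) if j != k] for d in docs]
    return space_density(without_k) - space_density(docs)


# Toy collection (hypothetical): term 0 occurs in every document,
# terms 1 to 3 each occur in exactly one document.
docs = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 1],
]

dv = [discrimination_value(docs, k) for k in range(4)]
for k, value in enumerate(dv):
    label = "good" if value > 0 else "poor"
    print(f"term {k}: DV = {value:+.3f} ({label} discriminator)")
```

In this toy collection the term that occurs in every document comes out as a poor discriminator, while the terms that occur in a single document come out as good ones, which is in line with the link between document frequency and discrimination value noted above. Recomputing all pairwise similarities for every term is expensive on a real collection; this sketch favors clarity over efficiency.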
updated at: 2007/06/12 15:36:38