Tag search results for 'coefficient'
James R. Curran and Marc Moens.
Improvements in Automatic Thesaurus Extraction.
In Proceedings of the Workshop of the ACL Special Interest Group on the Lexicon (SIGLEX),
pp. 59-66,
2002.
Abstract: The use of semantic resources is common in modern NLP systems, but methods to extract lexical semantics have only recently begun to perform well enough for practical use. We evaluate existing and new similarity metrics for thesaurus extraction, and experiment with the tradeoff between extraction performance and efficiency. We propose an approximation algorithm, based on canonical attributes and coarse- and fine-grained matching, that reduces the time complexity and execution time of thesaurus extraction with only a marginal performance penalty.
thesaurus extraction systems -> differ in the definition of "context"
used a statistical shallow parser
frequency cutoff speeds up the calculation, but doesn't decrease the performance
misc. topics: weights, measures, cutoff frequency, speed-up by canonical vectors
canonical vectors: subj+dobj+iobj, TTestLog + maximum frequency cutoff
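The notes above can be illustrated with a minimal sketch of context-vector similarity with a frequency cutoff. Everything here is an assumption for illustration: the `(headword, relation, word)` triple format, the function names, the cutoff on total context count per headword, and the set-based Jaccard measure. It is not the weighting (TTestLog) or the canonical-vector scheme from the paper.

```python
from collections import Counter

def build_vectors(triples):
    """Map each headword to a Counter of (relation, word) context attributes."""
    vectors = {}
    for head, relation, word in triples:
        vectors.setdefault(head, Counter())[(relation, word)] += 1
    return vectors

def apply_cutoff(vectors, min_count):
    """Keep only headwords whose total context count reaches min_count (assumed cutoff rule)."""
    return {h: v for h, v in vectors.items() if sum(v.values()) >= min_count}

def jaccard(a, b):
    """Set-based Jaccard similarity over the attributes of two context vectors."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0

# Toy data, invented for this example.
triples = [
    ("dog", "subj", "bark"), ("dog", "dobj", "feed"), ("dog", "dobj", "walk"),
    ("cat", "dobj", "feed"), ("cat", "subj", "purr"), ("cat", "dobj", "walk"),
    ("yak", "dobj", "feed"),
]
vectors = apply_cutoff(build_vectors(triples), min_count=2)
similarity = jaccard(vectors["dog"], vectors["cat"])
```

Dropping rare headwords before the pairwise comparison shrinks the number of vector pairs to score, which is where the speed-up comes from in a sketch like this.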
updated at: 2007/07/07 17:25:42
Young Mee Chung and Jae Yun Lee.
A corpus-based approach to comparative evaluation of statistical term association measures.
Journal of the American Society for Information Science and Technology.
volume 52, issue 4, pages 283--296,
2001.
Abstract: Statistical association measures have been widely applied in information retrieval research, usually employing a clustering of documents or terms on the basis of their relationships. Applications of the association measures for term clustering include automatic thesaurus construction and query expansion. This research evaluates the similarity of six association measures by comparing the relationship and behavior they demonstrate in various analyses of a test corpus. Analysis techniques include comparisons of highly ranked term pairs and term clusters, analyses of the correlation among the association measures using Pearson's correlation coefficient and MDS mapping, and an analysis of the impact of a term frequency on the association values by means of z-score. The major findings of the study are as follows: First, the most similar association measures are mutual information and Yule's coefficient of colligation Y, whereas cosine and Jaccard coefficients, as well as χ² statistic and likelihood ratio, demonstrate quite similar behavior for terms with high frequency. Second, among all the measures, the χ² statistic is the least affected by the frequency of terms. Third, although cosine and Jaccard coefficients tend to emphasize high frequency terms, mutual information and Yule's Y seem to overestimate rare terms.
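For reference, the six measures named in the abstract can be sketched from a 2x2 co-occurrence table. These are the common textbook definitions, assumed here for illustration; the paper's exact formulations may differ. The cell names `a`, `b`, `c`, `d` (both terms, first term only, second term only, neither) and the function name are hypothetical.

```python
import math

def association_measures(a, b, c, d):
    """Textbook association measures from a 2x2 table with all cells > 0.

    a: units containing both terms, b: first term only,
    c: second term only, d: neither term.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d   # marginals for the first term
    col1, col2 = a + c, b + d   # marginals for the second term

    cosine = a / math.sqrt(row1 * col1)
    jaccard = a / (a + b + c)
    # Pointwise mutual information of the co-occurrence cell, in bits.
    mutual_information = math.log2(a * n / (row1 * col1))
    yule_y = (math.sqrt(a * d) - math.sqrt(b * c)) / (math.sqrt(a * d) + math.sqrt(b * c))
    chi_square = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)

    # Log-likelihood ratio G^2 = 2 * sum(observed * ln(observed / expected)).
    observed = [a, b, c, d]
    expected = [row1 * col1 / n, row1 * col2 / n, row2 * col1 / n, row2 * col2 / n]
    likelihood_ratio = 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected))

    return {
        "cosine": cosine,
        "jaccard": jaccard,
        "mutual_information": mutual_information,
        "yule_y": yule_y,
        "chi_square": chi_square,
        "likelihood_ratio": likelihood_ratio,
    }

# Toy counts, invented for this example.
scores = association_measures(a=10, b=10, c=10, d=70)
```

The sketch assumes every cell is nonzero; a real corpus would need smoothing or special-casing for empty cells, since the logarithms and Yule's Y are undefined there.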
updated at: 2007/06/12 22:02:28