by matton
James R. Curran and Marc Moens. Improvements in Automatic Thesaurus Extraction. In Proceedings of the Workshop of the ACL Special Interest Group on the Lexicon (SIGLEX), pp. 59-66, 2002.
Abstract: The use of semantic resources is common in modern NLP systems, but methods to extract lexical semantics have only recently begun to perform well enough for practical use. We evaluate existing and new similarity metrics for thesaurus extraction, and experiment with the tradeoff between extraction performance and efficiency. We propose an approximation algorithm, based on canonical attributes and coarse- and fine-grained matching, that reduces the time complexity and execution time of thesaurus extraction with only a marginal performance penalty.
[automatic thesaurus construction][context extraction][similarity measure][vector space model][jaccard coefficient] [weight function][cosine][dice][precision recall][frequency cutoff][canonical attribute][canonical vector][maximum cutoff]
thesaurus extraction systems -> differ in the definition of "context"
used a statistical shallow parser
frequency cutoff speeds up the calculation, but doesn't decrease the performance
misc. topics: weights, measures, cutoff frequency, speed-up by canonical vectors
canonical vectors: subj+dobj+iobj, TTestLog + maximum frequency cutoff
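A rough sketch of the ideas in the notes above, not the paper's implementation. Assumptions: the toy counts in `CONTEXTS` are made up; attributes are hypothetical (relation, word) pairs such as a shallow parser might emit; the cutoff is applied per attribute count; raw counts stand in for the TTestLog weight; and the coarse step keeps a candidate only if it shares at least one canonical attribute with the target. All function names are illustrative.

```python
import math
from collections import Counter

# Toy data (invented): headword -> counts of (relation, word) context attributes.
CONTEXTS = {
    "dog": Counter({("subj", "bark"): 4, ("dobj", "walk"): 3, ("dobj", "feed"): 2}),
    "cat": Counter({("subj", "purr"): 2, ("dobj", "feed"): 4, ("dobj", "walk"): 3}),
    "car": Counter({("dobj", "drive"): 5, ("dobj", "park"): 2, ("subj", "purr"): 1}),
}

def apply_cutoff(vec, min_count):
    """Minimum frequency cutoff (assumed per attribute): drop rare attributes."""
    return {attr: c for attr, c in vec.items() if c >= min_count}

def jaccard(a, b):
    """Set-based Jaccard coefficient over the attributes of two vectors."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0

def dice(a, b):
    """Set-based Dice coefficient over the attributes of two vectors."""
    sa, sb = set(a), set(b)
    total = len(sa) + len(sb)
    return 2 * len(sa & sb) / total if total else 0.0

def cosine(a, b):
    """Cosine of the angle between two weighted attribute vectors."""
    dot = sum(w * b[attr] for attr, w in a.items() if attr in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def canonical(vec, k=2):
    """Top-k attributes by weight; raw count is a stand-in for TTestLog here."""
    return set(sorted(vec, key=lambda attr: (-vec[attr], attr))[:k])

def rank_synonyms(head, contexts, measure, k=2):
    """Coarse filter on shared canonical attributes, then fine-grained scoring."""
    target = contexts[head]
    target_canon = canonical(target, k)
    scored = []
    for other, vec in contexts.items():
        if other == head:
            continue
        if not (target_canon & canonical(vec, k)):
            continue  # coarse step: skip the full comparison entirely
        scored.append((measure(target, vec), other))
    # Highest score first; ties broken alphabetically.
    return [word for _, word in sorted(scored, key=lambda p: (-p[0], p[1]))]

if __name__ == "__main__":
    print(rank_synonyms("dog", CONTEXTS, jaccard))
    print(cosine(CONTEXTS["dog"], CONTEXTS["cat"]))
```

The speed-up comes from the coarse step: most candidates are rejected after comparing a few canonical attributes, so the full measure runs on only a small fraction of the vocabulary.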