xcit'ed

- paper management system


by matton

Tag search results for 'maximum'

James R. Curran and Marc Moens.
Improvements in Automatic Thesaurus Extraction.
In Proceedings of the Workshop of the ACL Special Interest Group on the Lexicon (SIGLEX),
pp. 59-66,
2002.
Abstract: The use of semantic resources is common in modern NLP systems, but methods to extract lexical semantics have only recently begun to perform well enough for practical use. We evaluate existing and new similarity metrics for thesaurus extraction, and experiment with the tradeoff between extraction performance and efficiency. We propose an approximation algorithm, based on canonical attributes and coarse- and fine-grained matching, that reduces the time complexity and execution time of thesaurus extraction with only a marginal performance penalty.
thesaurus extraction systems -> differ in the definition of "context"
used a statistical shallow parser
a frequency cutoff speeds up the calculation without decreasing performance
misc. topics: weights, measures, cutoff frequency, speed-up by canonical vectors

canonical vectors: subj+dobj+iobj, TTestLog + maximum frequency cutoff
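
The coarse- and fine-grained matching idea from the notes above can be sketched roughly as follows. This is a simplified illustration, not the paper's algorithm: it assumes raw co-occurrence counts as weights (the paper uses a t-test-based weight), takes the top-k attributes as the canonical vector, and uses a weighted Jaccard measure for the fine-grained step. The names `canonical`, `weighted_jaccard`, `nearest`, and the toy data are all hypothetical.

```python
from collections import Counter


def canonical(vec, k):
    """Return the k highest-weighted attributes of a context vector."""
    return {attr for attr, _ in vec.most_common(k)}


def weighted_jaccard(a, b):
    """Fine-grained similarity: sum of min weights over sum of max weights."""
    keys = set(a) | set(b)
    num = sum(min(a[x], b[x]) for x in keys)
    den = sum(max(a[x], b[x]) for x in keys)
    return num / den if den else 0.0


def nearest(target, vectors, k=2, min_freq=2):
    """Rank words by similarity to target, skipping rare words and words
    whose canonical attributes do not overlap the target's."""
    tvec = vectors[target]
    tcanon = canonical(tvec, k)
    scored = []
    for word, vec in vectors.items():
        if word == target or sum(vec.values()) < min_freq:
            continue  # frequency cutoff: drop the target itself and rare words
        if not (tcanon & canonical(vec, k)):
            continue  # coarse match failed, so skip the full comparison
        scored.append((weighted_jaccard(tvec, vec), word))
    return [w for _, w in sorted(scored, reverse=True)]


# Toy context vectors keyed by (relation, word) attributes; made-up counts.
vectors = {
    "car": Counter({("dobj", "drive"): 5, ("subj", "stop"): 3, ("dobj", "park"): 1}),
    "truck": Counter({("dobj", "drive"): 4, ("dobj", "load"): 2}),
    "idea": Counter({("dobj", "have"): 6, ("subj", "occur"): 2}),
    "van": Counter({("dobj", "drive"): 1}),
}
print(nearest("car", vectors))
```

Here "idea" is rejected by the coarse match and "van" by the frequency cutoff, so only "truck" reaches the full similarity computation.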
updated at: 2007/07/07 17:25:42
Chris Ding and Hanchuan Peng.
Minimum Redundancy Feature Selection from Microarray Gene Expression Data,
Proceedings of the IEEE Computer Society Conference on Bioinformatics,
pp. 523-528, 2003.
Motivation. Selecting a small subset out of the thousands of genes in microarray data is important for accurate classification of phenotypes. Widely used methods typically rank genes according to their differential expressions among phenotypes and pick the top-ranked genes. We observe that feature sets so obtained have certain redundancy and study methods to minimize it. Results. We propose a minimum redundancy – maximum relevance (MRMR) feature selection framework. Genes selected via MRMR provide a more balanced coverage of the space and capture broader characteristics of phenotypes. They lead to significantly improved class predictions in extensive experiments on 5 gene expression data sets: NCI, Lymphoma, Lung, Leukemia and Colon. Improvements are observed consistently among 4 classification methods: Naïve Bayes, Linear discriminant analysis, Logistic regression and Support vector machines. Supplementary: The top 60 MRMR genes for each of the datasets are listed at http://www.nersc.gov/~cding/MRMR/
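
One way to read the minimum redundancy / maximum relevance idea is as a greedy search over discretized features: start with the feature most informative about the class, then repeatedly add the feature whose mutual information with the class, minus its mean mutual information with the features already chosen, is largest. The sketch below assumes this difference-based score and discrete inputs; the function names and the toy data are hypothetical, and it is not claimed to be the authors' exact procedure.

```python
from collections import Counter
from math import log


def mutual_info(x, y):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum((c / n) * log(c * n / (px[a] * py[b])) for (a, b), c in pxy.items())


def mrmr(features, labels, k):
    """Greedily pick up to k feature names by relevance minus mean redundancy."""
    relevance = {name: mutual_info(col, labels) for name, col in features.items()}
    selected = [max(relevance, key=relevance.get)]
    while len(selected) < min(k, len(features)):

        def score(name):
            # Mean mutual information with the features chosen so far.
            redundancy = sum(
                mutual_info(features[name], features[s]) for s in selected
            ) / len(selected)
            return relevance[name] - redundancy

        rest = [name for name in features if name not in selected]
        selected.append(max(rest, key=score))
    return selected


# Toy data: g2 is nearly a copy of g1, g3 is weaker but less redundant.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "g1": [0, 0, 0, 1, 1, 1, 1, 1],
    "g2": [0, 0, 1, 1, 1, 1, 1, 1],
    "g3": [0, 1, 0, 0, 1, 1, 0, 1],
}
print(mrmr(features, labels, 2))
```

On this toy data a plain relevance ranking would pick g1 and g2, while the redundancy penalty makes the sketch prefer g3 as the second feature.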
updated at: 2007/03/06 13:30:45