July 14, 2005
Document Clustering Methods: An Overview
Posted at July 14, 2005 07:40 PM.
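An almost endless number of methods for clustering documents have been proposed, but if you trace them back to their origins, they apparently mostly lead to the following.
(If you think "this one should be on the list too" or "for that method, this other paper is better," please point it out in the comments.)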
- Naive Bayes
David D. Lewis and Marc Ringuette. A comparison of two learning algorithms for text categorization. In Proceedings of SDAIR-94, 3rd Annual Symposium on Document Analysis and Information Retrieval.
http://citeseer.ist.psu.edu/rd/85006017%2C18549%2C1%2C0.25%2CDownload/http://citeseer.ist.psu.edu/cache/papers/cs/508/http:zSzzSzwww.cs.cmu.eduzSzafszSzcs.cmu.eduzSzuserzSzmnrzSzwwwzSzpaperszSzcateg.pdf/lewis94comparison.pdf
- K-Nearest Neighbor
Y. Yang and X. Liu. A re-examination of text categorization methods. In 22nd Annual International SIGIR, pages 42-49.
http://www.cs.rit.edu/~dmrg/dm_winter/reading/re_examTCMethods.pdf
- Decision Tree
Rajeev Rastogi and Kyuseok Shim. PUBLIC: A decision tree classifier that integrates building and pruning. Data Mining and Knowledge Discovery, 4(4):315-344, 2000.
http://citeseer.ist.psu.edu/rd/85006017%2C16809%2C1%2C0.25%2CDownload/http://citeseer.ist.psu.edu/cache/papers/cs/2741/http:zSzzSzwww.bell-labs.comzSzprojectszSzserendipzSzpublic.pdf/rastogi98public.pdf
- Support Vector Machine
Thorsten Joachims. Text categorization with support vector machines: learning with many relevant features. In Proceedings of ECML-98, 10th European Conference on Machine Learning.
http://citeseer.ist.psu.edu/rd/65647369%2C553162%2C1%2C0.25%2CDownload/http://citeseer.ist.psu.edu/cache/papers/cs/26885/http:zSzzSzranger.uta.eduzSz%7EalpzSzixzSzreadingszSzSVMsforTextCategorization.pdf/joachims97text.pdf
- Neural Network
Erik D. Wiener, Jan O. Pedersen, and Andreas S. Weigend. A neural network approach to topic spotting. In Proceedings of SDAIR-95, 4th Annual Symposium on Document Analysis and Information Retrieval.
http://citeseer.ist.psu.edu/rd/85006017%2C84047%2C1%2C0.25%2CDownload/http://citeseer.ist.psu.edu/cache/papers/cs/320/http:zSzzSzwww.stern.nyu.eduzSz%7EaweigendzSzResearchzSzPaperszSzBEFORENYUzSztopic-spotting.pdf/wiener95neural.pdf
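For a rough feel of what the first method on the list does, here is a minimal sketch of a multinomial Naive Bayes text classifier with add-one smoothing. It is not taken from any of the papers above; the function names (train, classify), the whitespace tokenizer, and the toy documents are all assumptions made up for illustration.

```python
import math
from collections import Counter, defaultdict


def train(docs):
    """Count documents per class and words per class.

    docs is a list of (label, text) pairs (hypothetical toy format).
    """
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for label, text in docs:
        words = text.lower().split()  # naive whitespace tokenizer (assumption)
        class_counts[label] += 1
        word_counts[label].update(words)
        vocab.update(words)
    return class_counts, word_counts, vocab


def classify(model, text):
    """Return the label with the highest log posterior for text."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label, n_docs in class_counts.items():
        # log prior: fraction of training documents with this label
        score = math.log(n_docs / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # skip words never seen in training
            # add-one (Laplace) smoothed log likelihood of the word
            count = word_counts[label][word]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training data, invented for this sketch.
TOY_DOCS = [
    ("sports", "the team won the match"),
    ("sports", "a great goal in the final match"),
    ("tech", "the new compiler speeds up the code"),
    ("tech", "a bug in the parser code"),
]

model = train(TOY_DOCS)
print(classify(model, "the match was great"))
```

Working in log space avoids underflow when many small probabilities are multiplied, and the add-one smoothing keeps a word that never appeared in a class from zeroing out that class entirely.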
Start by reading these five papers, then use something like CiteSeer to follow the papers that cite them, and you will become a doctor of document clustering. Probably.
