Rethinking "A refinement..."
The paper "Hierarchically classifying documents using very few words" gives a better explanation of why refinement works without overfitting. It proposes a hierarchical classification method. The procedure resembles that of "A refinement approach to handling model misfit in text categorization" (a binary classifier), but it is more complex and more manual (note that it is not a binary classifier). The hierarchy is constructed using mutual information and feature selection. The main idea is as follows:
"...The flattened classifier loses the intuition that topics that are close to each other in the hierarchy have a lot more in common with each other, in general, than topics that are very apart. Therefore, even when it is difficult to find the precise topic of a document, it may be easy to decide whether it is about "agriculture" or about "computers".
...
The key insight is that each of these subtasks is significantly simpler than the original task..."
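To make the quoted idea concrete, here is a minimal sketch of a two-level hierarchical classifier: one classifier picks the coarse topic, then a second classifier, trained only on that topic's documents, picks the subtopic. Everything in it is an assumption for illustration: the function names, the toy documents, and the use of a smoothed word-count scorer. In particular it skips the mutual-information feature selection that the paper performs at each node, so it is not the paper's method, only the routing idea.

```python
import math
from collections import Counter, defaultdict

def train(docs, labels):
    """Collect word counts per label; docs are lists of tokens."""
    counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        counts[label].update(doc)
    vocab = {w for doc in docs for w in doc}
    return counts, vocab

def classify(model, doc):
    """Return the label with the highest Laplace-smoothed log-likelihood."""
    counts, vocab = model
    def score(label):
        total = sum(counts[label].values()) + len(vocab)
        return sum(math.log((counts[label][w] + 1) / total)
                   for w in doc if w in vocab)
    return max(sorted(counts), key=score)

def train_hierarchy(docs, paths):
    """paths are (topic, subtopic) pairs: one root model plus one model per topic."""
    root = train(docs, [topic for topic, _ in paths])
    children = {}
    for topic in {t for t, _ in paths}:
        subset = [(d, sub) for d, (t, sub) in zip(docs, paths) if t == topic]
        children[topic] = train([d for d, _ in subset], [s for _, s in subset])
    return root, children

def classify_hierarchy(hierarchy, doc):
    """Route the document through the root, then through the chosen topic's model."""
    root, children = hierarchy
    topic = classify(root, doc)
    return topic, classify(children[topic], doc)

# Hypothetical toy data, only to show the routing.
docs = [["cow", "milk", "farm"], ["wheat", "crop", "farm"],
        ["cpu", "chip", "code"], ["python", "code", "bug"]]
paths = [("agriculture", "dairy"), ("agriculture", "grain"),
         ("computers", "hardware"), ("computers", "software")]
hierarchy = train_hierarchy(docs, paths)
print(classify_hierarchy(hierarchy, ["milk", "farm"]))
print(classify_hierarchy(hierarchy, ["code", "bug"]))
```

Each child model only sees the vocabulary of its own topic, so the subtask it solves is smaller than the flat four-way problem, which is the point the quote makes.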
In "A refinement...", the corresponding procedure is implicit: no mutual information is used to decide which features each node contains, as it would be in a decision tree; rather, like boosting, the method operates on misclassified examples. The effect should be the same: confusing, noisy, and irrelevant examples (or words) are removed by selecting the misclassified examples, with no need to consider the correctly classified ones. For binary classification, though, this explanation is problematic, because there is only one category. I think the explanation should be this: the noise in binary classification comes not from semantically noisy words but from data skew. The words in the training examples are not uniformly distributed, so the term P(w|c) is not normalized. Keeping the misclassified examples in mind can alleviate this.
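The refinement idea discussed above can be sketched roughly as follows. This is my own reading, not the paper's code: train a naive Bayes classifier, split the training set by its predictions, and retrain a new classifier on any partition that still contains misclassified examples. The names `train_nb`, `predict_nb`, `train_refined`, `predict_refined`, the `depth` limit, and the toy data are all assumptions made for illustration.

```python
import math
from collections import Counter

def train_nb(docs, labels):
    """Multinomial naive Bayes with Laplace smoothing; docs are token lists."""
    vocab = {w for doc in docs for w in doc}
    model = {}
    for c in set(labels):
        words = Counter(w for doc, y in zip(docs, labels) if y == c for w in doc)
        total = sum(words.values()) + len(vocab)
        model[c] = (
            math.log(labels.count(c) / len(labels)),               # log P(c)
            {w: math.log((words[w] + 1) / total) for w in vocab},  # log P(w|c)
        )
    return model

def predict_nb(model, doc):
    def score(c):
        log_prior, log_pw = model[c]
        return log_prior + sum(log_pw[w] for w in doc if w in log_pw)
    return max(sorted(model), key=score)

def train_refined(docs, labels, depth=2):
    """Retrain on each predicted partition that still holds misclassified examples."""
    node = {"model": train_nb(docs, labels), "children": {}}
    if depth == 0:
        return node
    preds = [predict_nb(node["model"], d) for d in docs]
    for side in set(preds):
        part = [(d, y) for d, y, p in zip(docs, labels, preds) if p == side]
        has_errors = any(y != side for _, y in part)
        if has_errors and len(part) < len(docs):
            node["children"][side] = train_refined(
                [d for d, _ in part], [y for _, y in part], depth - 1)
    return node

def predict_refined(node, doc):
    """Follow the base prediction down to the most refined classifier."""
    side = predict_nb(node["model"], doc)
    child = node["children"].get(side)
    return side if child is None else predict_refined(child, doc)

# Hypothetical skewed data: the positive class is dominated by the word "x",
# so the estimate of P(w|c) leaves little mass for the odd positive document.
docs = [["x"] * 6] * 3 + [["a", "y", "y"]] + [["a", "b"]] * 4
labels = [1, 1, 1, 1, 0, 0, 0, 0]
tree = train_refined(docs, labels)
print(predict_nb(tree["model"], ["a", "y", "y"]))  # base classifier alone
print(predict_refined(tree, ["a", "y", "y"]))      # after refinement
```

In this toy set the base classifier mislabels the positive document `["a", "y", "y"]` because "x" dominates the positive word counts; the classifier retrained on the predicted-negative partition sees that document as the only positive and recovers it. That matches the data-skew reading above, but it is a constructed case, not evidence about the paper's experiments.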
The next problem is overfitting. By the explanation above, it is inevitable, because the word distribution the classifier reflects is just the distribution of the training examples. Maybe the experiments in "A refinement..." are biased, especially on the second data collection, Usenet.