Notes-2012-Fast Online Training with Frequency-Adaptive Learning Rates for CWS and New
Fast Online Training with Frequency-Adaptive Learning Rates for Chinese Word Segmentation and New Word Detection
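Authors: Xu Sun, Houfeng Wang, Wenjie Li (The Hong Kong Polytechnic University; Peking University)
Source: Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, pages 253–262, Jeju, Republic of Korea, 8-14 July 2012.
High-dimensional features & an improved online training algorithm with faster convergence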
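Introduction
The main problem in Chinese word segmentation is segmentation ambiguity, and new words are one of its major causes. Typical new words are named entities, for example organization names, place names, and person names.
CRF, ME: treating Chinese word segmentation as a sequence labeling task is already the standard approach (Xue, 2003; Peng et al., 2004; Tseng et al., 2005; Asahara et al., 2005; Zhao et al., 2010). To reach higher accuracy, more complex statistical models have been applied to segmentation, for example semi-Markov assumptions or latent variables (Andrew, 2006; Sun et al., 2009b).
Perceptron: semi-Markov perceptron methods, or voting systems based on multiple semi-Markov perceptron segmenters (Zhang and Clark, 2007; Sun, 2010).
Training a CRF with ordinary features is already time-consuming, and adding high-dimensional features makes it slower still. Perceptron models train faster than CRFs, but they output only the predicted labels, not probabilities.
New word detection is also an important task in Chinese word segmentation. The main methods include (J. Nie and Jin, 1995; Chen and Bai, 1998; Wu and Jiang, 2000; Peng et al., 2004; Chen and Ma, 2002; Zhou, 2005; Goh et al., 2003; Fu and Luke, 2004; Wu et al., 2011).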
Corpora
SIGHAN 2005: MSR, CU, PKU
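Main text
Algorithm
The model is still a CRF; what changes is the training algorithm, ADF, which improves convergence. The commonly used online method is SGD, and ADF builds on it. Other SGD-based improvements include stochastic meta descent (Vishwanathan et al., 2006) and periodic step-size adaptation online learning (Hsu et al., 2009). The principle: high-frequency features get a low learning rate and low-frequency features get a high one. High-frequency features are already adequately trained, and the higher rate on low-frequency features speeds up convergence.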
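The principle above can be illustrated with a minimal Python sketch. It assumes every feature carries its own learning rate, and that after each window of training samples a rate is multiplied by a decay factor that shrinks as the feature's frequency in that window grows. The function names, the `alpha`/`beta` parameters, their default values, and the linear interpolation between them are illustrative assumptions, not details taken from these notes.

```python
def adf_decay_rates(rates, counts, window, alpha=0.995, beta=0.6):
    """Decay per-feature learning rates; frequent features decay more.

    rates  : current learning rate of each feature
    counts : how often each feature fired in the last `window` samples
    alpha  : decay factor for a feature that never fired (assumed value)
    beta   : decay factor for a feature that fired in every sample (assumed value)
    """
    new_rates = []
    for rate, count in zip(rates, counts):
        # Relative frequency in the window, clamped to [0, 1].
        u = min(1.0, count / window)
        # Interpolate linearly: u = 0 gives alpha, u = 1 gives beta.
        eta = alpha - u * (alpha - beta)
        new_rates.append(rate * eta)
    return new_rates


def sgd_step(weights, grad, rates):
    """One gradient-descent step with a separate learning rate per feature."""
    return [w - r * g for w, g, r in zip(weights, grad, rates)]


if __name__ == "__main__":
    # Feature 0 fired in all 10 samples of the window, feature 1 never fired.
    rates = adf_decay_rates([0.1, 0.1], counts=[10, 0], window=10)
    weights = sgd_step([1.0, 1.0], grad=[1.0, 1.0], rates=rates)
    print(rates, weights)
```

With equal gradients, the rarely seen feature keeps the larger rate and therefore moves further per step, which is the behaviour the note describes.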
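Features
Compared with earlier papers, the CRF feature set adds dictionary features. The dictionary is initially built from the training corpus. When the CRF later segments the test corpus it produces some new words, and those new words that exceed a given threshold are added to the dictionary.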
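The dictionary update can be sketched as follows. The notes do not say what quantity is compared with the threshold; this sketch assumes it is the number of times a new word appears in the segmenter's output. The function name `update_lexicon` and its parameters are hypothetical.

```python
from collections import Counter


def update_lexicon(lexicon, segmented_sentences, threshold):
    """Return the lexicon extended with sufficiently frequent new words.

    lexicon             : set of known words (e.g. built from the training corpus)
    segmented_sentences : list of sentences, each a list of words from the segmenter
    threshold           : a new word is kept only if its count is greater than this
                          (using a count here is an assumption)
    """
    counts = Counter(
        word
        for sentence in segmented_sentences
        for word in sentence
        if word not in lexicon
    )
    return lexicon | {word for word, count in counts.items() if count > threshold}


if __name__ == "__main__":
    lexicon = {"北京"}
    output = [["北京", "奥运"], ["奥运", "火炬"]]
    print(update_lexicon(lexicon, output, threshold=1))
```

Returning a new set rather than mutating the input keeps the training-corpus dictionary available for comparison.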
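The added dictionary features are:
Whether a string of up to 6 characters extending left from x0 (x0 included) is a word. Whether a string of up to 6 characters extending right from x0 (x0 included) is a word. Whether a string of up to 6 characters extending left from x0 (x0 excluded) is a word. Whether a string of up to 6 characters extending right from x0 (x0 excluded) is a word.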
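The four dictionary features listed above can be sketched in Python. This is one plausible reading, assuming each feature is a single boolean that is true if any candidate string of length 1 to 6 is in the dictionary. The function name, the feature keys, and the boolean encoding are illustrative assumptions.

```python
def dictionary_features(chars, i, lexicon, max_len=6):
    """Compute four boolean dictionary features for the character at index i.

    chars   : the sentence as a string
    i       : index of the current character x0
    lexicon : set of known words
    max_len : longest candidate string considered (6 in the notes)
    """
    n = len(chars)

    def has_word(spans):
        # A span (s, e) is valid only if it lies inside the sentence.
        return any(0 <= s and e <= n and chars[s:e] in lexicon for s, e in spans)

    lengths = range(1, max_len + 1)
    return {
        # Strings ending at x0, x0 included.
        "left_incl": has_word((i - k + 1, i + 1) for k in lengths),
        # Strings starting at x0, x0 included.
        "right_incl": has_word((i, i + k) for k in lengths),
        # Strings ending just before x0, x0 excluded.
        "left_excl": has_word((i - k, i) for k in lengths),
        # Strings starting just after x0, x0 excluded.
        "right_excl": has_word((i + 1, i + 1 + k) for k in lengths),
    }


if __name__ == "__main__":
    lexicon = {"北京", "天安门"}
    print(dictionary_features("我爱北京天安门", 3, lexicon))
```

For the character "京" at index 3, "北京" ends at x0 and "天安门" starts right after it, so only `left_incl` and `right_excl` are true.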
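Results
Segmentation results: the best score is 97.4 on MSR, 94.8 on CU, and 95.4 on PKU.
Training time: ADF reaches in roughly 10 iterations what SGD reaches in 50, saving about 2/3 to 3/4 of the training time.
Finally, the paper compares ADF + dictionary features + a single pass (new words are not added over repeated iterations) against several systems evaluated on SIGHAN 2005: Best05 (Tseng et al., 2005), CRF + rule-system (Zhang et al., 2006), Semi-Markov perceptron (Zhang and Clark, 2007), Semi-Markov CRF (Gao et al., 2007), and Latent-variable CRF (Sun et al., 2009b). It achieves the highest scores on the MSR and PKU corpora.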