A Collection of Open-Source Libraries for Conditional Random Fields (CRF)


http://flexcrfs.sourceforge.net/
FlexCRFs: Flexible Conditional Random Fields

FlexCRFs is a conditional random field toolkit for segmenting and labeling sequence data, written in C/C++ using the STL. It was implemented based on the theoretical model presented in (Lafferty et al. 2001) and (Sha and Pereira 2003). The toolkit uses L-BFGS (Liu and Nocedal 1989), an advanced convex optimization procedure, to train CRF models. FlexCRFs was designed to deal with hundreds of thousands of data sequences and millions of features. FlexCRFs supports both first-order and second-order Markov CRFs. We have tested FlexCRFs on Linux (Red Hat, Fedora, Ubuntu), Sun Solaris, and MS Windows with MS Visual C++.

PCRFs is a parallel version of FlexCRFs that allows us to train conditional random fields on massively parallel processing systems supporting Message Passing Interface (MPI). PCRFs helps to train conditional random fields on large-scale datasets containing up to millions of data sequences. We have tested PCRFs on large parallel systems, such as Cray XT3, SGI Altix, and IBM SP.

http://crfpp.googlecode.com/svn/trunk/doc/index.html
CRF++: Yet Another CRF toolkit

CRF++ is a simple, customizable, and open source implementation of Conditional Random Fields (CRFs) for segmenting/labeling sequential data. CRF++ is designed for general-purpose use and can be applied to a variety of NLP tasks, such as Named Entity Recognition, Information Extraction and Text Chunking.

Last Update: 2013-02-13

Initial Release: 2005-05-28

http://www.chokkan.org/software/crfsuite/

CRFsuite

A fast implementation of Conditional Random Fields (CRFs)
Introduction

CRFsuite is an implementation of Conditional Random Fields (CRFs) [Lafferty 01][Sha 03][Sutton] for labeling sequential data. Among the various implementations of CRFs, this software provides the following features.

Fast training and tagging. The primary mission of this software is to train and use CRF models as fast as possible. See the benchmark result for more information.
Simple data format for training and tagging. The data format is similar to those used in other machine learning tools: each line consists of a label and the attributes (features) of an item, and consecutive lines represent a sequence of items (an empty line denotes the end of an item sequence). This means that users can design an arbitrary number of features for each item, which is impossible in CRF++.
State-of-the-art training methods. CRFsuite implements:
      Limited-memory BFGS (L-BFGS) [Nocedal 80]
      Orthant-Wise Limited-memory Quasi-Newton (OWL-QN) method [Andrew 07]
      Stochastic Gradient Descent (SGD) [Shalev-Shwartz 07]
      Averaged Perceptron [Collins 02]
      Passive Aggressive [Crammer 06]
      Adaptive Regularization Of Weight Vector (AROW) [Mejer 10]
Forward/backward algorithm using the scaling method [Rabiner 90]. The scaling method seems faster than computing the forward/backward scores in the logarithm domain.
Linear-chain (first-order Markov) CRF.
Performance evaluation on training. CRFsuite can output precision, recall, and F1 scores of the model evaluated on test data.
An efficient file format for storing/accessing CRF models using Constant Quark Database (CQDB). It takes little time to start up a tagger, since the only preparation is reading the entire model file into a memory block. Retrieving the weight of a feature is also very quick.
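
As a rough illustration of the data format described in the feature list above, the following Python sketch serializes labeled sequences into text with one item per line and an empty line after each sequence. The function name to_crfsuite_format, the toy labels, and the toy attributes are hypothetical; the tab separator between fields is an assumption made here, so check the CRFsuite documentation before relying on it.

```python
def to_crfsuite_format(sequences):
    """Serialize sequences of (label, attributes) items into text.

    Each item becomes one line: the label first, then its attributes.
    An empty line terminates each sequence.
    """
    lines = []
    for sequence in sequences:
        for label, attributes in sequence:
            # Assumption: fields are separated by TAB characters.
            lines.append("\t".join([label, *attributes]))
        lines.append("")  # empty line marks the end of the sequence
    return "\n".join(lines) + "\n"


# Hypothetical toy data: items may carry different numbers of attributes.
toy_sequences = [
    [("B-NP", ["w=He", "pos=PRP", "cap"]),
     ("B-VP", ["w=reckons", "pos=VBZ"])],
    [("O", ["w=."])],
]

text = to_crfsuite_format(toy_sequences)
print(text)
```

Note that the two items in the first sequence have three and two attributes respectively, which is the flexibility the feature list refers to.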

C++/SWIG API. CRFsuite provides an easy-to-use API for the C++ language (crfsuite.hpp). CRFsuite also provides a SWIG interface for various languages (e.g., Python) on top of the C++ API. See the API Documentation for more information.
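
The scaling method mentioned in the feature list above can be sketched in plain Python. This is a minimal illustration of the general technique for a linear-chain model, not CRFsuite's actual code: each forward vector is normalized to sum to 1, the normalizers are kept, and the log partition function is recovered as the sum of their logarithms. All function names and the toy potentials are hypothetical.

```python
import math
from itertools import product


def forward_scaled(state, trans):
    """Scaled forward pass.

    state[t][j]: positive potential of label j at position t.
    trans[i][j]: positive potential of moving from label i to label j.
    Returns (alphas, scales, log_z); every alphas[t] sums to 1.
    """
    n = len(state[0])
    alphas, scales, prev = [], [], None
    for t, pot in enumerate(state):
        if t == 0:
            row = list(pot)
        else:
            row = [pot[j] * sum(prev[i] * trans[i][j] for i in range(n))
                   for j in range(n)]
        total = sum(row)  # scaling factor for this position
        prev = [v / total for v in row]
        alphas.append(prev)
        scales.append(total)
    log_z = sum(math.log(c) for c in scales)
    return alphas, scales, log_z


def backward_scaled(state, trans, scales):
    """Scaled backward pass that reuses the forward scaling factors."""
    n, length = len(state[0]), len(state)
    betas = [[1.0] * n for _ in range(length)]
    for t in range(length - 2, -1, -1):
        betas[t] = [
            sum(trans[i][j] * state[t + 1][j] * betas[t + 1][j]
                for j in range(n)) / scales[t + 1]
            for i in range(n)
        ]
    return betas


def brute_force_log_z(state, trans):
    """Reference value: enumerate every label path (toy sizes only)."""
    n, length = len(state[0]), len(state)
    z = 0.0
    for path in product(range(n), repeat=length):
        w = state[0][path[0]]
        for t in range(1, length):
            w *= trans[path[t - 1]][path[t]] * state[t][path[t]]
        z += w
    return math.log(z)


# Hypothetical toy potentials: 3 positions, 2 labels.
state_potentials = [[1.0, 2.0], [0.5, 1.5], [2.0, 1.0]]
trans_potentials = [[1.0, 0.5], [2.0, 1.0]]

alphas, scales, log_z = forward_scaled(state_potentials, trans_potentials)
betas = backward_scaled(state_potentials, trans_potentials, scales)
# Per-position label marginals are the elementwise product of the scaled vectors.
gamma = [[a * b for a, b in zip(ar, br)] for ar, br in zip(alphas, betas)]

print(log_z, brute_force_log_z(state_potentials, trans_potentials))
```

Because the scaled vectors stay in a bounded range, the inner loops use only multiplications and additions, whereas a log-domain implementation needs a log-sum-exp at every step.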

Last Update: 2011-08-11

Wapiti - A simple and fast discriminative sequence labelling toolkit

http://wapiti.limsi.fr/

Wapiti is a very fast toolkit for segmenting and labeling sequences with discriminative models. It is based on maxent models, maximum entropy Markov models, and linear-chain CRFs, and offers various optimization and regularization methods to improve both the computational complexity and the prediction performance of standard models. Wapiti has been ranked first on the sequence tagging task for more than a year on the MLcomp website.

HCRF library (including CRF and LDCRF)

http://hcrf.sourceforge.net/

Libraries written in other programming languages:

Java:

CRF Project: http://crf.sourceforge.net/

MALLET: http://mallet.cs.umass.edu/

GRMM: http://mallet.cs.umass.edu/grmm/

Python:

SGD: http://leon.bottou.org/projects/sgd

Scala:

FACTORIE: http://factorie.cs.umass.edu/

Welcome to my CSDN blog: http://blog.csdn.net/anshan1984/