Using SVMlight on Windows
First, download the version of SVMlight that matches your system from http://svmlight.joachims.org/.
This article uses Windows as the example; the download provides two executables, svm_classify.exe and svm_learn.exe.
Example data sets are also available on the website for trying the tools out.
The input data file may use any extension (txt, log, dat, and so on).
The data format of an SVM input file is as follows (SVMlight and LibSVM use the same format):
[label] [index1]:[value1] [index2]:[value2] …
[label] [index1]:[value1] [index2]:[value2] …
label
(also called the class) is the category you want to predict, usually an integer.
index
is an ordered index, usually consecutive integers.
value
is the data used for training, usually real numbers.
Every line has this structure. It means: here is a row of data value1, value2, …, valueN (whose order is fixed by the indices), and the classification result for this row is label.
You may wonder why the input is a row of values value1, value2, … at all. This goes back to how an SVM works. One (informal) way to think about it: the name is Support "Vector" Machine, so each training example is a vector, i.e. a row x1, x2, x3, … Those numbers are the values, and the n in x[n] is given by the index. These components are also called "attributes".
In practice, the data usually has many "features" (or "attributes"), so each input example is a whole group of them. Taking the earlier point-partitioning example: every point has an X and a Y coordinate, so it has two attributes. Suppose we have two points, (0,3) and (5,8), belonging to label (class) 1 and 2 respectively. They would be written as:
1 1:0 2:3
2 1:5 2:8
Likewise, a three-dimensional coordinate corresponds to three attributes. The biggest advantage of this file format is that it supports sparse data (a sparse matrix): some examples may have missing attributes, and those entries can simply be left out of the line.
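Producing this format is mechanical. Below is a minimal Python sketch (the function name is my own, not part of SVMlight) that turns a dense feature vector into an SVMlight-style line, omitting zero-valued attributes to exploit the sparse format:

```python
def to_svmlight_line(label, values):
    """Format one example as: label index1:value1 index2:value2 ...

    Indices are 1-based and strictly increasing, as SVMlight expects;
    zero-valued features are skipped, demonstrating the sparse format.
    """
    pairs = ["%d:%g" % (i, v) for i, v in enumerate(values, start=1) if v != 0]
    return " ".join([str(label)] + pairs)

# The two points from the example above: (0,3) in class 1 and (5,8) in class 2.
print(to_svmlight_line(1, [0, 3]))   # the zero x-coordinate is dropped: "1 2:3"
print(to_svmlight_line(2, [5, 8]))   # "2 1:5 2:8"
```

Note that dropping the zero yields "1 2:3" rather than the fully written-out "1 1:0 2:3" shown above; both are valid input.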
At the command line, run:
svm_learn [options] example_file model_file
The available options are:
General options:
        -?          - this help
        -v [0..3]   - verbosity level (default 1)
Learning options:
        -z {c,r,p}  - select between classification (c), regression (r), and preference ranking (p) (see [Joachims, 2002c]) (default classification)
        -c float    - C: trade-off between training error and margin (default [avg. x*x]^-1)
        -w [0..]    - epsilon width of tube for regression (default 0.1)
        -j float    - cost: cost-factor by which training errors on positive examples outweigh errors on negative examples (default 1) (see [Morik et al., 1999])
        -b [0,1]    - use biased hyperplane (i.e. x*w+b>0) instead of unbiased hyperplane (i.e. x*w>0) (default 1)
        -i [0,1]    - remove inconsistent training examples and retrain (default 0)
Performance estimation options:
        -x [0,1]    - compute leave-one-out estimates (default 0) (see [5])
        -o ]0..2]   - value of rho for XiAlpha-estimator and for pruning leave-one-out computation (default 1.0) (see [Joachims, 2002a])
        -k [0..100] - search depth for extended XiAlpha-estimator (default 0)
Transduction options (see [Joachims, 1999c], [Joachims, 2002a]):
        -p [0..1]   - fraction of unlabeled examples to be classified into the positive class (default is the ratio of positive and negative examples in the training data)
Kernel options:
        -t int      - type of kernel function:
                      0: linear (default)
                      1: polynomial (s a*b+c)^d
                      2: radial basis function exp(-gamma ||a-b||^2)
                      3: sigmoid tanh(s a*b + c)
                      4: user-defined kernel from kernel.h
        -d int      - parameter d in polynomial kernel
        -g float    - parameter gamma in rbf kernel
        -s float    - parameter s in sigmoid/poly kernel
        -r float    - parameter c in sigmoid/poly kernel
        -u string   - parameter of user-defined kernel
Optimization options (see [Joachims, 1999a], [Joachims, 2002a]):
        -q [2..]    - maximum size of QP-subproblems (default 10)
        -n [2..q]   - number of new variables entering the working set in each iteration (default n = q). Set n < q to prevent zig-zagging.
        -m [5..]    - size of cache for kernel evaluations in MB (default 40); the larger the faster
        -e float    - eps: allow that error for termination criterion [y [w*x+b] - 1] >= eps (default 0.001)
        -h [5..]    - number of iterations a variable needs to be optimal before considered for shrinking (default 100)
        -f [0,1]    - do final optimality check for variables removed by shrinking; although this test is usually positive, there is no guarantee that the optimum was found if the test is omitted (default 1)
        -y string   - if given, reads alphas from the named file and uses them as starting point (default 'disabled')
        -# int      - terminate optimization if no progress after this number of iterations (default 100000)
Output options:
        -l char     - file to write predicted labels of unlabeled examples into after transductive learning
        -a char     - write all alphas to this file after learning (in the same order as in the training set)
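Putting the format and the options together, a typical end-to-end run looks like the sketch below. The file names (train.dat, model, predictions) and the kernel parameters are arbitrary choices for illustration, not defaults:

```shell
# Write a tiny four-example training file in the format described above
printf '%s\n' \
  '+1 1:0.8 2:1.0' \
  '+1 1:0.9 2:0.7' \
  '-1 1:-0.7 2:-1.1' \
  '-1 1:-0.5 2:-0.9' > train.dat

# Train with an RBF kernel (-t 2), gamma 0.5, C 1.0, then classify a test
# file; uncomment once svm_learn/svm_classify are on the PATH:
#   svm_learn -t 2 -g 0.5 -c 1.0 train.dat model
#   svm_classify test.dat model predictions
```

svm_classify writes one real number per test example to the predictions file; for classification, its sign is the predicted class.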
A more detailed description of the parameters and how they link to the respective algorithms is given in the appendix of [Joachims, 2002a].
The input file example_file contains the training examples. The first lines may contain comments and are ignored if they start with #. Each of the following lines represents one training example and is of the following format:
<line>    .=. <target> <feature>:<value> <feature>:<value> ... <feature>:<value> # <info>
<target>  .=. +1 | -1 | 0 | <float>
<feature> .=. <integer> | "qid"
<value>   .=. <float>
<info>    .=. <string>
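For sanity-checking data files, the grammar above is easy to parse. Here is a small Python sketch (the function name is my own) that splits one training line into the target, the feature/value pairs, and the optional # comment:

```python
def parse_line(line):
    """Parse one SVMlight training line into (target, pairs, info).

    pairs is a list of (feature, value) tuples, where feature is an int
    (or the literal string "qid") and value a float; info is the text
    after the '#' comment marker, or '' if there is none.
    """
    body, _, info = line.partition("#")
    tokens = body.split()
    target = float(tokens[0])
    pairs = []
    for tok in tokens[1:]:
        feat, val = tok.split(":")
        pairs.append((feat if feat == "qid" else int(feat), float(val)))
    return target, pairs, info.strip()

print(parse_line("1 1:0 2:3 # first example"))
# (1.0, [(1, 0.0), (2, 3.0)], 'first example')
```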
Original source: http://blog.sina.com.cn/s/blog_53c2bcbb01008zi5.html