Factorization Machines
Recommended tool: pdftotext (to convert the paper's PDF to plain text)
sudo apt-get install poppler-utils   # pdftotext ships with poppler-utils
pdftotext xxx.pdf a.txt
Factorization Machines
Steffen Rendle
Department of Reasoning for Intelligence
The Institute of Scientific and Industrial Research
Osaka University, Japan
rendle@ar.sanken.osaka-u.ac.jp
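For reference, the second-order FM model equation from the paper above, together with the field-aware variant (FFM) that LibFFM implements; the FFM form is from Juan et al.'s FFM paper, stated here from memory:

FM (degree 2):

    \hat{y}(\mathbf{x}) = w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle \mathbf{v}_i, \mathbf{v}_j \rangle \, x_i x_j

FFM:

    \phi(\mathbf{w}, \mathbf{x}) = \sum_{j_1=1}^{n} \sum_{j_2=j_1+1}^{n} \langle \mathbf{w}_{j_1, f_{j_2}}, \mathbf{w}_{j_2, f_{j_1}} \rangle \, x_{j_1} x_{j_2}

where f_j denotes the field of feature j: in FFM each feature keeps one latent vector per field, instead of the single latent vector per feature used by FM.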
LibFFM
Download: http://www.csie.ntu.edu.tw/~r01922136/libffm/
Install: make
Train: ./ffm-train bigdata.tr.txt model
Predict: ./ffm-predict bigdata.te.txt model output
bigdata.tr.txt data format
The data format of LIBFFM is: <label> <field1>:<index1>:<value1> <field2>:<index2>:<value2> ...
`field' and `index' are non-negative integers. See the output of `head -1 bigdata.tr.txt' below: the label is 1 and the line has 17 field:index:value entries, one per field.
1 0:0:0.3651 2:1163:0.3651 3:8672:0.3651 4:2183:0.3651 5:2332:0.3651 6:185:0.3651 7:2569:0.3651 8:8131:0.3651 9:5483:0.3651 10:215:0.3651 11:1520:0.3651 12:1232:0.3651 13:2738:0.3651 14:2935:0.3651 15:5428:0.3651 17:2434:0.50000 16:7755:0.50000
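To make the format concrete, here is a minimal, hypothetical C++ parser for one such line. The names FfmEntry and parse_ffm_line are our own illustration, not part of LIBFFM:

#include <cstdio>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

struct FfmEntry { int field; int index; float value; };

// Parse "<label> <field>:<index>:<value> ..." into a label and entries.
static bool parse_ffm_line(const std::string &line,
                           float &label, std::vector<FfmEntry> &entries)
{
    std::istringstream iss(line);
    if (!(iss >> label)) return false;  // first token is the label
    std::string tok;
    while (iss >> tok) {                // remaining tokens are f:j:v triples
        FfmEntry e;
        if (std::sscanf(tok.c_str(), "%d:%d:%f",
                        &e.field, &e.index, &e.value) != 3)
            return false;
        entries.push_back(e);
    }
    return true;
}

int main()
{
    std::string line = "1 0:0:0.3651 2:1163:0.3651";
    float label;
    std::vector<FfmEntry> entries;
    if (parse_ffm_line(line, label, entries))
        std::cout << "label=" << label << ", nnz=" << entries.size() << "\n";
}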
Code walkthrough:
// ffm-train.cpp
int train(Option opt)
{
    ffm_problem *tr = ffm_read_problem(opt.tr_path.c_str());
    ffm_problem *va = nullptr;
    va = ffm_read_problem(opt.va_path.c_str());
    if(opt.do_cv)
    {
        ffm_cross_validation(tr, opt.nr_folds, opt.param);
    }
    else
    {
        ffm_model *model = ffm_train_with_validation(tr, va, opt.param);
        ffm_int status = ffm_save_model(model, opt.model_path.c_str());
        ffm_destroy_model(&model);
    }
    ffm_destroy_problem(&tr);
    ffm_destroy_problem(&va);
}

// The real work happens in ffm_train_with_validation.
// ffm.cpp
ffm_model* ffm_train_with_validation(...)
{
    ...
    shared_ptr<ffm_model> model = train(tr, order, param, va);
    ...
}

// The internal train() allocates the model with a custom deleter.
train(ffm_problem *tr, vector<ffm_int> &order, ffm_parameter param,
      ffm_problem *va = nullptr)
{
    shared_ptr<ffm_model> model = shared_ptr<ffm_model>(
        init_model(tr->n, tr->m, param),
        [] (ffm_model *ptr) { ffm_destroy_model(&ptr); });
}
ffm-train
Command Line Usage
==================

usage: ffm-train [options] training_set_file [model_file]

options:
-l <lambda>: set regularization parameter (default 0)
-k <factor>: set number of latent factors (default 4)
-t <iteration>: set number of iterations (default 15)
-r <eta>: set learning rate (default 0.1)
-s <nr_threads>: set number of threads (default 1)
-p <path>: set path to the validation set
-v <fold>: set the number of folds for cross-validation
--quiet: quiet mode (no output)
--no-norm: disable instance-wise normalization
--no-rand: disable random update
--on-disk: perform on-disk training (a temporary file <training_set_file>.bin will be generated)

By default we do instance-wise normalization, i.e., we normalize the 2-norm of each instance to 1. Use `--no-norm' to disable this behavior.

By default, the algorithm randomly selects an instance for each update in the inner loop. On some datasets you may want to update in the original order; do so by using `--no-rand' together with `-s 1'.

If you do not have enough memory, use `--on-disk' to do disk-level training. Two restrictions apply in this mode:

1. Random update is not allowed, so use `--no-rand' if you want to do on-disk training.
2. Cross-validation is not yet supported.

A binary file `training_set_file.bin' will be generated to store the data in binary format.
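To make the instance-wise normalization concrete, here is a small standalone sketch (our own illustration, not LIBFFM code) that scales one instance's values so its 2-norm becomes 1:

#include <cmath>
#include <iostream>
#include <vector>

// Scale one instance's values so its 2-norm is 1, which is what
// ffm-train does by default (disable with --no-norm).
void normalize_instance(std::vector<float> &values)
{
    double sq = 0.0;
    for (float v : values) sq += (double)v * v;
    if (sq == 0.0) return;               // leave an all-zero instance alone
    float scale = 1.0f / (float)std::sqrt(sq);
    for (float &v : values) v *= scale;
}

int main()
{
    std::vector<float> x = {3.0f, 4.0f}; // 2-norm = 5
    normalize_instance(x);               // becomes {0.6, 0.8}
    std::cout << x[0] << " " << x[1] << "\n";
}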
ffm-predict
usage: ffm-predict test_file model_file output_file
Example
> ffm-train bigdata.tr.txt model
train a model using the default parameters

> ffm-train -l 0.001 -k 16 -t 30 -r 0.05 -s 4 bigdata.tr.txt model
train a model using the following parameters:
    regularization cost = 0.001
    latent factors = 16
    iterations = 30
    learning rate = 0.05
    threads = 4

> ffm-train -p bigdata.te.txt bigdata.tr.txt model
use bigdata.te.txt as a validation set

> ffm-train -v 5 bigdata.tr.txt
do five-fold cross-validation

> ffm-train --quiet bigdata.tr.txt
do not print messages to the screen

> ffm-predict bigdata.te.txt model output
do prediction

> ffm-train --no-rand --on-disk bigdata.tr.txt
perform on-disk training
Library Usage

These structures and functions are declared in the header file `ffm.h'. You need to #include "ffm.h" in your C/C++ source files and link your program with ffm.cpp. See ffm-train.cpp and ffm-predict.cpp for examples showing how to use them.
There are four public data structures in LIBFFM.

- struct ffm_node
  {
      ffm_int f;    // field index
      ffm_int j;    // column index
      ffm_float v;  // value
  };

  Each `ffm_node' represents a non-zero element in a sparse matrix.

- struct ffm_problem
  {
      ffm_int n;     // number of features
      ffm_int l;     // number of instances
      ffm_int m;     // number of fields
      ffm_node *X;   // non-zero elements
      ffm_long *P;   // row pointers
      ffm_float *Y;  // labels
  };

- struct ffm_parameter
  {
      ffm_float eta;
      ffm_float lambda;
      ffm_int nr_iters;
      ffm_int k;
      ffm_int nr_threads;
      bool quiet;
      bool normalization;
      bool random;
  };

  `ffm_parameter' represents the parameters used for training. The meaning of each variable is:

      variable        meaning                           default
      ============================================================
      eta             learning rate                     0.1
      lambda          regularization cost               0
      nr_iters        number of iterations              15
      k               number of latent factors          4
      nr_threads      number of threads used            1
      quiet           no outputs to stdout              false
      normalization   instance-wise normalization       true
      random          randomly select instance in SG    true

  To obtain a parameter object with default values, use the function `ffm_get_default_param'.

- struct ffm_model
  {
      ffm_int n;           // number of features
      ffm_int m;           // number of fields
      ffm_int k;           // number of latent factors
      ffm_float *W;        // store model values
      bool normalization;  // do instance-wise normalization
  };
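A hedged sketch of filling an ffm_problem by hand from the structs above. We assume `P' follows the usual CSR convention (l+1 row pointers, with P[i]..P[i+1] delimiting instance i's nodes in X); check ffm_read_problem in ffm.cpp to confirm the exact layout before relying on it:

#include "ffm.h"  // LIBFFM public header

// Build a tiny two-instance, two-field problem in memory.
ffm_problem make_toy_problem()
{
    static ffm_node X[4] = {
        {0, 0, 0.5f}, {1, 3, 1.0f},    // instance 0: two non-zeros
        {0, 1, 1.0f}, {1, 4, 0.25f}    // instance 1: two non-zeros
    };
    static ffm_long P[3]  = {0, 2, 4}; // row pointers (assumed CSR-style)
    static ffm_float Y[2] = {1, -1};   // labels

    ffm_problem prob;
    prob.n = 5;   // number of features (max column index + 1)
    prob.m = 2;   // number of fields
    prob.l = 2;   // number of instances
    prob.X = X;
    prob.P = P;
    prob.Y = Y;
    return prob;
}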
Functions

- ffm_parameter ffm_get_default_param();

  Get default parameters.

- ffm_int ffm_save_model(struct ffm_model const *model, char const *path);

  Save a model. Returns 0 on success and 1 on failure.

- struct ffm_model* ffm_load_model(char const *path);

  Load a model. If the model could not be loaded, a nullptr is returned.

- void ffm_destroy_model(struct ffm_model **model);

  Destroy a model.

- struct ffm_model* ffm_train(struct ffm_problem const *prob, ffm_parameter param);

  Train a model.

- struct ffm_model* ffm_train_with_validation(struct ffm_problem const *Tr, struct ffm_problem const *Va, ffm_parameter param);

  Train a model with training set `Tr' and validation set `Va'. The logloss on the validation set is printed at each iteration.

- ffm_float ffm_cross_validation(struct ffm_problem const *prob, ffm_int nr_folds, ffm_parameter param);

  Do cross-validation with `nr_folds' folds.

- ffm_float ffm_predict(ffm_node *begin, ffm_node *end, ffm_model *model);

  Do prediction. `begin' and `end' are pointers specifying the beginning and ending positions of the instance to be predicted.
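Putting the public functions together, a minimal, hypothetical end-to-end program; make_toy_problem() is the helper sketched above, not part of LIBFFM:

#include "ffm.h"
#include <cstdio>

int main()
{
    ffm_problem tr = make_toy_problem();

    // Start from the library defaults, then tweak a couple of knobs.
    ffm_parameter param = ffm_get_default_param();
    param.k = 8;          // latent factors
    param.nr_iters = 30;  // iterations

    ffm_model *model = ffm_train(&tr, param);
    ffm_save_model(model, "toy.model");

    // Predict the first instance: [P[0], P[1]) delimits its nodes in X.
    ffm_float p = ffm_predict(tr.X + tr.P[0], tr.X + tr.P[1], model);
    std::printf("prediction = %f\n", p);

    ffm_destroy_model(&model);
    return 0;
}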