Tuning hyperparameters with grid search in Python


Reference post: http://blog.csdn.net/abcjennifer/article/details/23884761

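The goal is to answer questions such as:

How many words should the vectorizer keep?

Preprocessing filters out words whose document frequency exceeds max_df; what should max_df be set to?

Should TfidfTransformer use tf only, or tf together with idf?

How many iterations should the classifier run, and how should the learning rate be set?
...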

This post tunes the parameters of a stochastic gradient descent classifier and an SVM (RBF kernel) for classifying CNKI journal articles.

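One thing to note: the keys of the parameter grid must match the names the pipeline exposes. You can list those names with print(sorted(pipeline.get_params().keys())). Each name has the form <step name>__<parameter name>.

from sklearn import svm
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.pipeline import Pipeline

# The step names ('vect', 'tfidf', 'clf') become the prefixes of the grid keys.
pipeline = Pipeline([
    ('vect', CountVectorizer()),
    ('tfidf', TfidfTransformer()),
    ('clf', svm.SVC()),
])

# List every tunable name exposed by the pipeline.
print(sorted(pipeline.get_params().keys()))

# Each key is '<step name>__<parameter name>'.
parameters = {
    'clf__C': [0.1, 1, 10],
    'clf__gamma': [1, 0.1, 0.01],
}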
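To make the name matching concrete, here is a minimal sketch of running the SVM grid end to end. The toy corpus and labels are made up and only stand in for the CNKI articles. Importing GridSearchCV from sklearn.model_selection assumes a reasonably recent scikit-learn, and cv=3 is chosen to mirror the "3 folds" in the logs. It is not the original script.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Hypothetical toy corpus: 24 short documents in two classes.
docs = [
    "neural network training loss", "gradient descent learning rate",
    "deep learning model layers", "convolution network image features",
    "stock market price index", "bank interest rate policy",
    "inflation economy growth trade", "market investment bond yield",
] * 3
labels = ([0] * 4 + [1] * 4) * 3

pipeline = Pipeline([
    ("vect", CountVectorizer()),
    ("tfidf", TfidfTransformer()),
    ("clf", SVC(kernel="rbf")),
])

parameters = {
    "clf__C": [0.1, 1, 10],
    "clf__gamma": [1, 0.1, 0.01],
}

# Check that every grid key is a name the pipeline actually exposes.
valid_names = set(pipeline.get_params().keys())
unknown = [name for name in parameters if name not in valid_names]
assert not unknown, unknown

# 3-fold cross-validated search over the 3 x 3 grid.
grid_search = GridSearchCV(pipeline, parameters, cv=3)
grid_search.fit(docs, labels)

print("Best score: %0.3f" % grid_search.best_score_)
for name in sorted(parameters):
    print("    %s: %r" % (name, grid_search.best_params_[name]))
```

A misspelled key such as clf_C (single underscore) is caught by the check before any fitting starts.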
The results for stochastic gradient descent are as follows:

*************************Feature Extraction*************************
Performing grid search...
('pipeline:', ['vect', 'tfidf', 'clf'])
parameters:
{'clf__n_iter': (10, 50), 'clf__alpha': (1e-05, 1e-06), 'tfidf__use_idf': (True, False), 'vect__max_features': (None, 5000, 10000), 'vect__max_df': (0.5, 0.75)}
Fitting 3 folds for each of 48 candidates, totalling 144 fits
[Parallel(n_jobs=1)]: Done  49 tasks       | elapsed:  1.0min
[Parallel(n_jobs=1)]: Done 144 out of 144 | elapsed:  3.1min finished
done in 188.100s
()
Best score: 0.848
    clf__alpha: 1e-05
    clf__n_iter: 50
    tfidf__use_idf: True
    vect__max_df: 0.5
    vect__max_features: None
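The "48 candidates, totalling 144 fits" line in the log above follows directly from the grid: the search tries every combination of values, once per fold. The sketch below reproduces that arithmetic with the standard library only. The grid is copied from the log, and the variable names are my own.

```python
from itertools import product

# Parameter grid copied from the SGD log above.
parameters = {
    "vect__max_df": (0.5, 0.75),
    "vect__max_features": (None, 5000, 10000),
    "tfidf__use_idf": (True, False),
    "clf__alpha": (1e-05, 1e-06),
    "clf__n_iter": (10, 50),
}
n_folds = 3

# One candidate per combination of values (the Cartesian product).
names = sorted(parameters)
candidates = [
    dict(zip(names, values))
    for values in product(*(parameters[name] for name in names))
]

n_candidates = len(candidates)    # 2 * 3 * 2 * 2 * 2
n_fits = n_candidates * n_folds   # each candidate is fitted once per fold
print(n_candidates, n_fits)       # prints: 48 144
```

Adding one more value to any parameter multiplies the number of fits, so the grid is best kept small. Note also that, as far as I know, recent scikit-learn releases call the SGD iteration parameter max_iter instead of n_iter, so the key clf__n_iter only applies to older versions.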
The SVM results are as follows:

*************************Feature Extraction*************************
['clf', 'clf__C', 'clf__cache_size', 'clf__class_weight', 'clf__coef0', 'clf__decision_function_shape', 'clf__degree', 'clf__gamma', 'clf__kernel', 'clf__max_iter', 'clf__probability', 'clf__random_state', 'clf__shrinking', 'clf__tol', 'clf__verbose', 'steps', 'tfidf', 'tfidf__norm', 'tfidf__smooth_idf', 'tfidf__sublinear_tf', 'tfidf__use_idf', 'vect', 'vect__analyzer', 'vect__binary', 'vect__decode_error', 'vect__dtype', 'vect__encoding', 'vect__input', 'vect__lowercase', 'vect__max_df', 'vect__max_features', 'vect__min_df', 'vect__ngram_range', 'vect__preprocessor', 'vect__stop_words', 'vect__strip_accents', 'vect__token_pattern', 'vect__tokenizer', 'vect__vocabulary']
Fitting 3 folds for each of 9 candidates, totalling 27 fits
[Parallel(n_jobs=1)]: Done  27 out of  27 | elapsed:  9.6min finished
The best parameters are {'clf__gamma': 1, 'clf__C': 10} with a score of 0.85
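The long name listing in the SVM output is easier to read once it is grouped by pipeline step. The sketch below splits each name at the first double underscore. The helper group_by_step is a hypothetical name of my own, and the list is only a small excerpt of the names printed above.

```python
from collections import defaultdict

# A few names copied from the get_params() listing above.
names = [
    "clf", "clf__C", "clf__gamma", "steps", "tfidf", "tfidf__use_idf",
    "vect", "vect__max_df", "vect__max_features",
]


def group_by_step(param_names):
    """Group 'step__param' names by step; names without '__' are skipped."""
    grouped = defaultdict(list)
    for name in param_names:
        step, sep, param = name.partition("__")
        if sep:  # only names that contain the '__' separator
            grouped[step].append(param)
    return dict(grouped)


grouped = group_by_step(names)
for step, params in grouped.items():
    print(step, params)
```

Entries such as 'clf' or 'steps' have no separator. They refer to whole steps or to the pipeline itself, which is why the helper skips them.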