Comparing Bayesian Optimization and Grid Search


Overview

Grid search is commonly used for hyperparameter tuning in machine learning.
However, such an exhaustive approach takes a long time to compute.
Bayesian optimization is an alternative that selects good parameters automatically and efficiently.
In this post, grid search and Bayesian optimization are applied to an SVM classification task and compared.

Results

Method                  Time (s)   Trials   F1-score (test set)
Bayesian optimization   12.902     20       0.99
Grid search             41.947     132      0.98

(The 132 grid-search trials correspond to 12 values of C × 10 values of gamma for the RBF kernel, plus 12 values of C for the linear kernel.)

Conclusion

Bayesian optimization took less time to reach comparable predictive accuracy, so its efficiency was confirmed in practice.

Future Work

Data

A clean sample dataset (digits) provided by scikit-learn was used here; I would like to try real-world data as well.
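
For reference, a quick way to inspect this dataset (the shapes in the comments follow the scikit-learn documentation for load_digits):

from sklearn import datasets

digits = datasets.load_digits()
print(digits.images.shape)  # (1797, 8, 8): 8x8 grayscale images of handwritten digits
print(digits.data.shape)    # (1797, 64): the same images flattened into 64 features
print(digits.target.shape)  # (1797,): class labels 0 through 9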

Algorithm

Bayesian optimization should show its real value with algorithms that require a great deal of computation time, such as deep learning. I would like to try that in practice.

Library Selection

GPyOpt was chosen this time simply because it looked easy to use from Python. However, there are several other candidates, so there is room for comparison (a rough skopt sketch follows the list below):
skopt
BayesianOptimization
Spearmint
MOE
etc…
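
For example, roughly the same RBF-SVM tuning could be written with skopt's gp_minimize. This is only a sketch for comparison, assuming scikit-optimize is installed; it was not part of this experiment, and the search ranges simply mirror those used in bayes_opt.py below.

from skopt import gp_minimize
from skopt.space import Real
from sklearn import datasets
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

digits = datasets.load_digits()
X, y = digits.data, digits.target

def objective(params):
    C, gamma = params
    model = SVC(kernel='rbf', C=C, gamma=gamma)
    # gp_minimize minimizes, so negate the cross-validated F1 score.
    return -cross_val_score(model, X, y, cv=5, scoring='f1_macro').mean()

result = gp_minimize(
    objective,
    [Real(1, 1100, name='C'), Real(5e-5, 1.5e-3, name='gamma')],
    n_calls=20,
    random_state=0,
)
print(result.x, -result.fun)  # best [C, gamma] and the corresponding F1 score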

Notes

Categorical Variables

Bayesian optimization can handle not only continuous variables but also discrete ones.
For categorical variables, preprocessing such as converting them to dummy variables seems reasonable.
Note: in this experiment, the categorical choice (the kernel) was simply handled by case analysis, running one optimization per kernel.
Reference: GPyOpt: mixing different types of variables
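
As a reference, here is a minimal sketch of declaring a discrete variable in a GPyOpt domain. The variable names, ranges, and dummy objective are illustrative only and are not part of this experiment; in bayes_opt.py below the kernel choice is handled by case analysis instead.

import GPyOpt
import numpy as np

# Illustrative domain mixing a continuous and a discrete variable.
mixed_domain = [
    {'name': 'C', 'type': 'continuous', 'domain': (1, 1100)},
    {'name': 'degree', 'type': 'discrete', 'domain': (2, 3, 4)},
]

def objective(params):
    # GPyOpt passes a 2D array: one row per candidate point, one column per variable.
    scores = np.zeros((params.shape[0], 1))
    for i, (c, degree) in enumerate(params):
        scores[i] = (c - 500) ** 2 + degree  # dummy objective for illustration
    return scores

opt = GPyOpt.methods.BayesianOptimization(f=objective, domain=mixed_domain)
opt.run_optimization(max_iter=5)
print(opt.X[np.argmin(opt.Y)])  # candidate with the lowest dummy objective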

Environment

After installing miniconda, the environment was set up with the following commands.

conda create -n myenv --file conda.req
source activate myenv
pip install -r pip.req

conda.req

matplotlib=2.0.2
numpy=1.12.1
python=3.5.3
scikit-learn=0.18.2
scipy=0.19.1

pip.req

GPy==1.7.7
GPyOpt==1.0.3

Execution Results

time python bayes_opt.py
# Tuning hyper-parameters for f1
(... output omitted ...)
20 experiments were performed.
Best parameters set found on development set:
{'kernel': 'rbf', 'gamma': 0.0011320544666204339, 'C': 604.8637015078541}
Detailed classification report:
The model is trained on the full development set.
The scores are computed on the full evaluation set.
precision    recall  f1-score   support
0       1.00      1.00      1.00        89
1       0.97      1.00      0.98        90
2       1.00      0.98      0.99        92
3       1.00      1.00      1.00        93
4       1.00      1.00      1.00        76
5       0.99      0.98      0.99       108
6       0.99      1.00      0.99        89
7       0.99      1.00      0.99        78
8       1.00      0.98      0.99        92
9       0.99      0.99      0.99        92
avg / total       0.99      0.99      0.99       899
real    0m12.902s
user    0m0.000s
sys     0m0.031s

time python grid_search.py
# Tuning hyper-parameters for f1
(... output omitted ...)
132 experiments were performed.
Best parameters set found on development set:
{'kernel': 'rbf', 'gamma': 0.00016666666666666666, 'C': 100}
Detailed classification report:
The model is trained on the full development set.
The scores are computed on the full evaluation set.
precision    recall  f1-score   support
0       1.00      1.00      1.00        89
1       0.95      1.00      0.97        90
2       0.99      0.99      0.99        92
3       0.97      0.99      0.98        93
4       1.00      1.00      1.00        76
5       0.96      0.97      0.97       108
6       0.99      0.99      0.99        89
7       0.99      1.00      0.99        78
8       1.00      0.91      0.95        92
9       0.97      0.96      0.96        92
avg / total       0.98      0.98      0.98       899
real    0m41.947s
user    0m0.000s
sys     0m0.031s

Code

bayes_opt.py

import GPyOpt
import numpy as np
from numpy.random import seed
from sklearn import datasets
from sklearn.model_selection import cross_val_predict
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score
from sklearn.metrics import classification_report
from sklearn.svm import SVC

seed(0)

# Load the digits dataset and split it in half for train/test.
digits = datasets.load_digits()
n_samples = len(digits.images)
X = digits.images.reshape((n_samples, -1))
y = digits.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0)

# Search space per kernel; one Bayesian optimization run is done per kernel.
kernel2domains = {
    'rbf': [
        {'name': 'C', 'type': 'continuous', 'domain': (1, 1100)},
        {'name': 'gamma', 'type': 'continuous', 'domain': (0.00005, 0.0015)}
    ],
    'linear': [
        {'name': 'C', 'type': 'continuous', 'domain': (1, 1100)},
    ],
}


def _get_model(param, kernel):
    _param = _refine_param(param, kernel)
    model = SVC(**_param)
    return model


def _refine_param(param, kernel):
    assert kernel in ['rbf', 'linear']
    if kernel == 'rbf':
        ret = {'kernel': kernel, 'C': param[0], 'gamma': param[1]}
    else:
        ret = {'kernel': kernel, 'C': param[0]}
    return ret


def _optimize(params, kernel):
    # GPyOpt minimizes, so return the negated cross-validated F1 score.
    scores = np.zeros((params.shape[0], 1))
    for i, param in enumerate(params):
        model = _get_model(param, kernel)
        y_pred = cross_val_predict(model, X_train, y_train, cv=5)
        scores[i] -= f1_score(y_train, y_pred, average='macro')
    return scores


print("# Tuning hyper-parameters for f1")
print()

bests_per_kernel = []
for k, d in kernel2domains.items():
    f = lambda x: _optimize(x, k)
    opt = GPyOpt.methods.BayesianOptimization(f=f, domain=d)
    opt.run_optimization(max_iter=15)
    idx = np.argmin(opt.Y)
    x_best = opt.X[idx]
    best_score = opt.Y[idx]
    bests_per_kernel.append((x_best, best_score, opt, k))

# Pick the kernel whose best (most negative) score is lowest.
x_best, _, optimizer, kernel = min(bests_per_kernel, key=lambda x: x[1])

print("Grid scores on development set:")
print()
for param, score in zip(optimizer.X, optimizer.Y):
    _score = -score
    _param = _refine_param(param, kernel)
    print("%0.3f for %r" % (_score, _param))
print()
print("%d experiments were performed." % len(optimizer.X))
print()
print("Best parameters set found on development set:")
print()
print(_refine_param(x_best, kernel))
print()
print("Detailed classification report:")
print()
print("The model is trained on the full development set.")
print("The scores are computed on the full evaluation set.")
print()

# Retrain with the best parameters on the full training set and evaluate on the test set.
clf = _get_model(x_best, kernel)
clf.fit(X_train, y_train)
y_true, y_pred = y_test, clf.predict(X_test)
print(classification_report(y_true, y_pred))
print()

grid_search.py

from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import classification_report
from sklearn.svm import SVC

# Load the digits dataset and split it in half for train/test.
digits = datasets.load_digits()
n_samples = len(digits.images)
X = digits.images.reshape((n_samples, -1))
y = digits.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0)

# Parameter grid: 12 values of C and 10 values of gamma.
C = [1, 10, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]
_gamma = [1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000]
gamma = [1 / i for i in _gamma]
tuned_parameters = [{'kernel': ['rbf'], 'gamma': gamma, 'C': C},
                    {'kernel': ['linear'], 'C': C}]

print("# Tuning hyper-parameters for f1")
print()
clf = GridSearchCV(SVC(C=1), tuned_parameters, cv=5, scoring='f1_macro')
clf.fit(X_train, y_train)

print("Grid scores on development set:")
print()
means = clf.cv_results_['mean_test_score']
stds = clf.cv_results_['std_test_score']
for mean, std, params in zip(means, stds, clf.cv_results_['params']):
    print("%0.3f (+/-%0.03f) for %r"
          % (mean, std * 2, params))
print()
print("%d experiments were performed." % len(means))
print()
print("Best parameters set found on development set:")
print()
print(clf.best_params_)
print()
print("Detailed classification report:")
print()
print("The model is trained on the full development set.")
print("The scores are computed on the full evaluation set.")
print()

# GridSearchCV refits on the full training set; evaluate on the held-out test set.
y_true, y_pred = y_test, clf.predict(X_test)
print(classification_report(y_true, y_pred))
print()

