Ex 3: Compare Stochastic learning strategies for MLPClassifier
This example plots the loss curves obtained under different training strategies (optimizers), including SGD and Adam, so their convergence behaviour can be compared.

1. Stochastic Gradient Descent (SGD):

Stochastic Gradient Descent (SGD) is a refinement of Gradient Descent (GD). GD feeds in the entire training dataset and updates the weights only once per pass, based on the accumulated loss, so convergence is slow. SGD instead randomly draws a single training sample and updates the weights according to that sample's loss.
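
To make the difference concrete, here is a minimal NumPy sketch of one update under each scheme, using a toy linear model with squared loss (the data and variable names are illustrative, not part of this example):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))            # toy inputs
y = X @ np.array([1.0, -2.0, 0.5])       # toy linear targets
w = np.zeros(3)
lr = 0.01

# GD: one weight update per pass over the whole dataset
grad = X.T @ (X @ w - y) / len(X)        # gradient of the mean squared loss
w_gd = w - lr * grad

# SGD: one weight update per randomly drawn sample
i = rng.integers(len(X))
grad_i = X[i] * (X[i] @ w - y[i])        # gradient on a single sample
w_sgd = w - lr * grad_i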

2. Momentum:

Momentum is a technique devised to keep GD-style methods from getting trapped in local minima; it borrows the concept of momentum from physics to lower the probability of ending up in a local minimum.
Consider the blue point in Figure 1: when a GD-style method falls into a local minimum, the gradient there is zero, so the algorithm treats that point as the minimum. To counteract this, a fraction of the previous weight update is added to the current update; as the red arrow shows, this gives the method a chance to climb over the local minimum.
Figure 1: conceptual illustration of momentum
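
As a minimal sketch of the idea (illustrative values, not scikit-learn's internals): each step keeps a velocity term that carries part of the previous update, so the weights can keep moving even where the gradient is zero:

import numpy as np

gamma, lr = 0.9, 0.01                      # momentum coefficient and learning rate

def momentum_step(w, grad, velocity):
    # carry a fraction of the previous update into the current one
    velocity = gamma * velocity - lr * grad
    return w + velocity, velocity

w = np.zeros(3)
grad = np.zeros(3)                         # gradient is zero at a local minimum
velocity = np.array([0.1, 0.0, 0.0])       # leftover motion from earlier steps
w, velocity = momentum_step(w, grad, velocity)   # w still moves: [0.09, 0., 0.]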

3. Nesterov Momentum:

Nesterov Momentum is another variant of momentum, likewise intended to reduce the chance of getting stuck in a local minimum; the difference between the two methods is illustrated in the figure below:
Figure 2: the left panel shows classical momentum (1. compute the gradient, 2. add momentum, 3. update the weights); the right panel shows Nesterov momentum (1. add momentum first, 2. compute the gradient, 3. update the weights).
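
The ordering difference in Figure 2 can be written out directly; a minimal sketch, where grad_fn and the toy objective are illustrative:

import numpy as np

gamma, lr = 0.9, 0.01

def classical_momentum_step(w, velocity, grad_fn):
    grad = grad_fn(w)                        # 1. compute the gradient at w
    velocity = gamma * velocity - lr * grad  # 2. add momentum
    return w + velocity, velocity            # 3. update the weights

def nesterov_step(w, velocity, grad_fn):
    lookahead = w + gamma * velocity         # 1. add momentum first
    grad = grad_fn(lookahead)                # 2. gradient at the look-ahead point
    velocity = gamma * velocity - lr * grad
    return w + velocity, velocity            # 3. update the weights

grad_fn = lambda w: 2 * w                    # gradient of ||w||^2, a toy objective
w, v = np.ones(2), np.zeros(2)
w, v = nesterov_step(w, v, grad_fn)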

4. Adaptive Moment Estimation (Adam):

Adam is a method that adapts its learning rate on its own: based on the gradients it computes, it adjusts the learning rate of each parameter individually. All of the optimization methods above require setting a learning_rate_init value. The results of this example compare four datasets: the iris dataset, the digits dataset, and the circles and moons datasets generated with sklearn.datasets.
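
A minimal sketch of one Adam update (standard constants; illustrative, not scikit-learn's internal implementation): running averages of the gradient and the squared gradient give each parameter its own effective step size:

import numpy as np

lr, beta1, beta2, eps = 0.01, 0.9, 0.999, 1e-8   # typical Adam constants

def adam_step(w, grad, m, v, t):
    m = beta1 * m + (1 - beta1) * grad           # running mean of the gradient
    v = beta2 * v + (1 - beta2) * grad ** 2      # running mean of the squared gradient
    m_hat = m / (1 - beta1 ** t)                 # bias correction for the first steps
    v_hat = v / (1 - beta2 ** t)
    # parameters with large v_hat (steep or noisy directions) take smaller steps
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v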

(1) Import the libraries

print(__doc__)
import matplotlib.pyplot as plt
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import MinMaxScaler
from sklearn import datasets

(2) Set the model parameters

# different learning rate schedules and momentum parameters
params = [{'solver': 'sgd', 'learning_rate': 'constant', 'momentum': 0,
           'learning_rate_init': 0.2},
          {'solver': 'sgd', 'learning_rate': 'constant', 'momentum': .9,
           'nesterovs_momentum': False, 'learning_rate_init': 0.2},
          {'solver': 'sgd', 'learning_rate': 'constant', 'momentum': .9,
           'nesterovs_momentum': True, 'learning_rate_init': 0.2},
          {'solver': 'sgd', 'learning_rate': 'invscaling', 'momentum': 0,
           'learning_rate_init': 0.2},
          {'solver': 'sgd', 'learning_rate': 'invscaling', 'momentum': .9,
           'nesterovs_momentum': True, 'learning_rate_init': 0.2},
          {'solver': 'sgd', 'learning_rate': 'invscaling', 'momentum': .9,
           'nesterovs_momentum': False, 'learning_rate_init': 0.2},
          {'solver': 'adam', 'learning_rate_init': 0.01}]

labels = ["constant learning-rate", "constant with momentum",
          "constant with Nesterov's momentum",
          "inv-scaling learning-rate", "inv-scaling with momentum",
          "inv-scaling with Nesterov's momentum", "adam"]

plot_args = [{'c': 'red', 'linestyle': '-'},
             {'c': 'green', 'linestyle': '-'},
             {'c': 'blue', 'linestyle': '-'},
             {'c': 'red', 'linestyle': '--'},
             {'c': 'green', 'linestyle': '--'},
             {'c': 'blue', 'linestyle': '--'},
             {'c': 'black', 'linestyle': '-'}]
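
Each dictionary in params is later unpacked into the MLPClassifier constructor with **param, so the last entry, for instance, is equivalent to:

from sklearn.neural_network import MLPClassifier

clf = MLPClassifier(solver='adam', learning_rate_init=0.01)  # same as MLPClassifier(**params[-1])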

(3) Plot the loss curves

def plot_on_dataset(X, y, ax, name):
    # for each dataset, plot learning for each learning strategy
    print("\nlearning on dataset %s" % name)
    ax.set_title(name)
    X = MinMaxScaler().fit_transform(X)
    mlps = []
    if name == "digits":
        # digits is larger but converges fairly quickly
        max_iter = 15
    else:
        max_iter = 400

    for label, param in zip(labels, params):
        print("training: %s" % label)
        mlp = MLPClassifier(verbose=0, random_state=0,
                            max_iter=max_iter, **param)
        mlp.fit(X, y)
        mlps.append(mlp)
        print("Training set score: %f" % mlp.score(X, y))
        print("Training set loss: %f" % mlp.loss_)
    for mlp, label, args in zip(mlps, labels, plot_args):
        ax.plot(mlp.loss_curve_, label=label, **args)


fig, axes = plt.subplots(2, 2, figsize=(15, 10))

# load / generate some toy datasets
iris = datasets.load_iris()
digits = datasets.load_digits()
data_sets = [(iris.data, iris.target),
             (digits.data, digits.target),
             datasets.make_circles(noise=0.2, factor=0.5, random_state=1),
             datasets.make_moons(noise=0.3, random_state=0)]

for ax, data, name in zip(axes.ravel(), data_sets, ['iris', 'digits',
                                                    'circles', 'moons']):
    plot_on_dataset(*data, ax=ax, name=name)

fig.legend(ax.get_lines(), labels=labels, ncol=3, loc="upper center")
plt.show()
Figure 3: comparison of the loss curves of the different learning strategies on the four datasets

(4) Complete code

print(__doc__)
import matplotlib.pyplot as plt
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import MinMaxScaler
from sklearn import datasets

# different learning rate schedules and momentum parameters
params = [{'solver': 'sgd', 'learning_rate': 'constant', 'momentum': 0,
           'learning_rate_init': 0.2},
          {'solver': 'sgd', 'learning_rate': 'constant', 'momentum': .9,
           'nesterovs_momentum': False, 'learning_rate_init': 0.2},
          {'solver': 'sgd', 'learning_rate': 'constant', 'momentum': .9,
           'nesterovs_momentum': True, 'learning_rate_init': 0.2},
          {'solver': 'sgd', 'learning_rate': 'invscaling', 'momentum': 0,
           'learning_rate_init': 0.2},
          {'solver': 'sgd', 'learning_rate': 'invscaling', 'momentum': .9,
           'nesterovs_momentum': True, 'learning_rate_init': 0.2},
          {'solver': 'sgd', 'learning_rate': 'invscaling', 'momentum': .9,
           'nesterovs_momentum': False, 'learning_rate_init': 0.2},
          {'solver': 'adam', 'learning_rate_init': 0.01}]

labels = ["constant learning-rate", "constant with momentum",
          "constant with Nesterov's momentum",
          "inv-scaling learning-rate", "inv-scaling with momentum",
          "inv-scaling with Nesterov's momentum", "adam"]

plot_args = [{'c': 'red', 'linestyle': '-'},
             {'c': 'green', 'linestyle': '-'},
             {'c': 'blue', 'linestyle': '-'},
             {'c': 'red', 'linestyle': '--'},
             {'c': 'green', 'linestyle': '--'},
             {'c': 'blue', 'linestyle': '--'},
             {'c': 'black', 'linestyle': '-'}]


def plot_on_dataset(X, y, ax, name):
    # for each dataset, plot learning for each learning strategy
    print("\nlearning on dataset %s" % name)
    ax.set_title(name)
    X = MinMaxScaler().fit_transform(X)
    mlps = []
    if name == "digits":
        # digits is larger but converges fairly quickly
        max_iter = 15
    else:
        max_iter = 400

    for label, param in zip(labels, params):
        print("training: %s" % label)
        mlp = MLPClassifier(verbose=0, random_state=0,
                            max_iter=max_iter, **param)
        mlp.fit(X, y)
        mlps.append(mlp)
        print("Training set score: %f" % mlp.score(X, y))
        print("Training set loss: %f" % mlp.loss_)
    for mlp, label, args in zip(mlps, labels, plot_args):
        ax.plot(mlp.loss_curve_, label=label, **args)


fig, axes = plt.subplots(2, 2, figsize=(15, 10))
# load / generate some toy datasets
iris = datasets.load_iris()
digits = datasets.load_digits()
data_sets = [(iris.data, iris.target),
             (digits.data, digits.target),
             datasets.make_circles(noise=0.2, factor=0.5, random_state=1),
             datasets.make_moons(noise=0.3, random_state=0)]

for ax, data, name in zip(axes.ravel(), data_sets, ['iris', 'digits',
                                                    'circles', 'moons']):
    plot_on_dataset(*data, ax=ax, name=name)

fig.legend(ax.get_lines(), labels=labels, ncol=3, loc="upper center")
plt.show()