Decision boundary of label propagation versus SVM on the Iris dataset

This example compares the decision boundaries generated by Label Propagation and SVM (support vector machine) on the iris dataset.
Label Propagation is a form of semi-supervised learning.

## (一) Importing libraries

* numpy : numerical arrays
* matplotlib.pyplot : plotting
* sklearn.datasets : the iris dataset
* sklearn.svm : the support vector machine
* sklearn.semi_supervised.LabelSpreading : the label propagation algorithm

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn import svm
from sklearn.semi_supervised import LabelSpreading
```
## (二) Loading the dataset

* numpy.random.RandomState(seed) : random number generator
* datasets.load_iris() : loads the dataset; iris is a dict-like object
* X takes the first two features of each iris sample, the sepal length and width
* y holds the class of each iris sample
* np.copy() : copies the data so the original array is not modified
* y_30[rng.rand(len(y)) < 0.3] = -1 draws 150 values in [0, 1) (len(y) is 150) and sets the entries where the draw is below 0.3 to -1, marking them as unlabeled

```python
rng = np.random.RandomState(0)
iris = datasets.load_iris()
X = iris.data[:, :2]
y = iris.target

h = .02  # step size for the mesh

y_30 = np.copy(y)
y_30[rng.rand(len(y)) < 0.3] = -1
y_50 = np.copy(y)
y_50[rng.rand(len(y)) < 0.5] = -1
```
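As a quick sanity check (a small sketch, not part of the original example), the masking step can be run on a dummy array to confirm that roughly 30% of the entries become -1:

```python
import numpy as np

rng = np.random.RandomState(0)
y = np.zeros(150, dtype=int)        # stand-in for iris.target (150 samples)
y_30 = np.copy(y)
y_30[rng.rand(len(y)) < 0.3] = -1   # each entry is hidden with probability 0.3

n_unlabeled = int(np.sum(y_30 == -1))
print(n_unlabeled)                  # typically around 45 of the 150 entries
```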
* LabelSpreading().fit() : fits a label spreading model to the data
* svm.SVC().fit() : fits an SVC (support vector classification) model
* Label spreading is run with different proportions of labeled and unlabeled data, and the results are compared against SVM

```python
# we create an instance of SVM and fit our data. We do not scale our
# data since we want to plot the support vectors
ls30 = (LabelSpreading().fit(X, y_30), y_30)
ls50 = (LabelSpreading().fit(X, y_50), y_50)
ls100 = (LabelSpreading().fit(X, y), y)
rbf_svc = (svm.SVC(kernel='rbf', gamma=.5).fit(X, y), y)
```
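To see which labels the model actually infers for the masked points, the fitted estimator's transduction_ attribute can be inspected. The following self-contained sketch (not part of the original example) rebuilds the 30% mask and checks how many hidden labels are recovered; exact numbers depend on the seed:

```python
import numpy as np
from sklearn import datasets
from sklearn.semi_supervised import LabelSpreading

rng = np.random.RandomState(0)
iris = datasets.load_iris()
X = iris.data[:, :2]
y = iris.target

y_30 = np.copy(y)
unlabeled = rng.rand(len(y)) < 0.3
y_30[unlabeled] = -1

model = LabelSpreading().fit(X, y_30)
# transduction_ holds the label assigned to every training sample,
# including those that were passed in as -1 (unlabeled)
inferred = model.transduction_[unlabeled]
print(np.mean(inferred == y[unlabeled]))  # fraction of masked labels recovered
```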
## (三) Plotting the comparison

* min(), max() : determine the plotting range for x and y
* np.meshgrid() : returns coordinate matrices from coordinate vectors
* The mesh extends one unit beyond the minimum and maximum of x and y, sampled at intervals of h = 0.02

```python
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                     np.arange(y_min, y_max, h))
```
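A minimal illustration of what np.meshgrid returns, using small hypothetical ranges rather than the actual iris bounds:

```python
import numpy as np

xs = np.arange(0.0, 1.0, 0.5)   # [0.0, 0.5]
ys = np.arange(0.0, 1.5, 0.5)   # [0.0, 0.5, 1.0]
xx, yy = np.meshgrid(xs, ys)

print(xx.shape)                  # (3, 2): one row per y value, one column per x value
# np.c_[xx.ravel(), yy.ravel()] lists every (x, y) grid point,
# which is exactly the input that predict() receives in the plotting loop
grid = np.c_[xx.ravel(), yy.ravel()]
print(grid.shape)                # (6, 2)
```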
Titles for the four subplots:

```python
titles = ['Label Spreading 30% data',
          'Label Spreading 50% data',
          'Label Spreading 100% data',
          'SVC with rbf kernel']
```
* To draw the figure, color_map (a dict) assigns a distinct color to each of the four labels, with the unlabeled class (-1) drawn in white
* The loop below then plots every point

```python
color_map = {-1: (1, 1, 1), 0: (0, 0, .9), 1: (1, 0, 0), 2: (.8, .6, 0)}

for i, (clf, y_train) in enumerate((ls30, ls50, ls100, rbf_svc)):
    # Plot the decision boundary. For that, we will assign a color to each
    # point in the mesh [x_min, x_max]x[y_min, y_max].
    plt.subplot(2, 2, i + 1)
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])

    # Put the result into a color plot
    Z = Z.reshape(xx.shape)
    plt.contourf(xx, yy, Z, cmap=plt.cm.Paired)
    plt.axis('off')

    # Plot also the training points
    colors = [color_map[y] for y in y_train]
    plt.scatter(X[:, 0], X[:, 1], c=colors, edgecolors='black')

    plt.title(titles[i])

plt.suptitle("Unlabeled points are colored white", y=0.1)
plt.show()
```
The figure shows that even when only a fraction of the data is labeled, Label Propagation still learns a good decision boundary.
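This claim can be checked numerically. The sketch below (not part of the original example) measures how well label spreading recovers all labels under the same masking scheme; exact accuracies depend on the random seed, and the first two iris features alone cannot fully separate versicolor from virginica:

```python
import numpy as np
from sklearn import datasets
from sklearn.semi_supervised import LabelSpreading

rng = np.random.RandomState(0)
iris = datasets.load_iris()
X = iris.data[:, :2]
y = iris.target

accuracies = {}
for frac in (0.3, 0.5):
    y_masked = np.copy(y)
    y_masked[rng.rand(len(y)) < frac] = -1   # hide roughly frac of the labels
    clf = LabelSpreading().fit(X, y_masked)
    accuracies[frac] = float(np.mean(clf.predict(X) == y))

print(accuracies)  # accuracy stays reasonable even with many labels hidden
```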
## (四) Full code

```python
print(__doc__)

# Authors: Clay Woolam <[email protected]>
# License: BSD

import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn import svm
from sklearn.semi_supervised import LabelSpreading

rng = np.random.RandomState(0)

iris = datasets.load_iris()

X = iris.data[:, :2]
y = iris.target

# step size in the mesh
h = .02

y_30 = np.copy(y)
y_30[rng.rand(len(y)) < 0.3] = -1
y_50 = np.copy(y)
y_50[rng.rand(len(y)) < 0.5] = -1
# we create an instance of SVM and fit our data. We do not scale our
# data since we want to plot the support vectors
ls30 = (LabelSpreading().fit(X, y_30), y_30)
ls50 = (LabelSpreading().fit(X, y_50), y_50)
ls100 = (LabelSpreading().fit(X, y), y)
rbf_svc = (svm.SVC(kernel='rbf', gamma=.5).fit(X, y), y)

# create a mesh to plot in
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                     np.arange(y_min, y_max, h))

# title for the plots
titles = ['Label Spreading 30% data',
          'Label Spreading 50% data',
          'Label Spreading 100% data',
          'SVC with rbf kernel']

color_map = {-1: (1, 1, 1), 0: (0, 0, .9), 1: (1, 0, 0), 2: (.8, .6, 0)}

for i, (clf, y_train) in enumerate((ls30, ls50, ls100, rbf_svc)):
    # Plot the decision boundary. For that, we will assign a color to each
    # point in the mesh [x_min, x_max]x[y_min, y_max].
    plt.subplot(2, 2, i + 1)
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])

    # Put the result into a color plot
    Z = Z.reshape(xx.shape)
    plt.contourf(xx, yy, Z, cmap=plt.cm.Paired)
    plt.axis('off')

    # Plot also the training points
    colors = [color_map[y] for y in y_train]
    plt.scatter(X[:, 0], X[:, 1], c=colors, edgecolors='black')

    plt.title(titles[i])

plt.suptitle("Unlabeled points are colored white", y=0.1)
plt.show()
```