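This example compares the decision boundaries produced on the iris dataset by Label Propagation and by an SVM (support vector machine).
Label Propagation is a semi-supervised learning method. The code in this example uses scikit-learn's `LabelSpreading`, which is a variant of it.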
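## (1) Import libraries

* `numpy`: creates and manipulates array data
* `matplotlib.pyplot`: draws the figures
* `from sklearn import datasets`: provides the datasets
* `from sklearn import svm`: provides the support vector machine
* `from sklearn.semi_supervised import LabelSpreading`: provides the label spreading algorithm
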
```python
import numpy as np
import matplotlib.pyplot as plt
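from sklearn import datasets
from sklearn import svm
from sklearn.semi_supervised import LabelSpreading
```

## (2) Load the dataset

* `numpy.random.RandomState(seed=None)`: creates a random number generator
* `datasets.load_iris()`: loads the dataset; `iris` is a dict-like object
* `X` holds the first two features of the iris data: sepal length and sepal width
* `y` holds the class of each iris sample
* `np.copy()`: copies the labels so that the original array is not modified
* `y_30[rng.rand(len(y)) < 0.3] = -1`: draws 150 random values in the range 0 to 1 (`len(y)` is 150) and sets the label to -1 wherever the drawn value is below 0.3. A label of -1 marks the sample as unlabeled.

```python
rng = np.random.RandomState(0)

iris = datasets.load_iris()

X = iris.data[:, :2]
y = iris.target

h = .02  # step size of the mesh

y_30 = np.copy(y)
y_30[rng.rand(len(y)) < 0.3] = -1
y_50 = np.copy(y)
y_50[rng.rand(len(y)) < 0.5] = -1
```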
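The masking step can be checked in isolation. The following is a minimal sketch that uses a synthetic label vector (three classes of 50 samples each, an assumption standing in for `iris.target`) so that it needs only NumPy. The names `mask` and `n_unlabeled` are illustrative and do not appear in the original code.

```python
import numpy as np

rng = np.random.RandomState(0)

# Synthetic stand-in for iris.target: 50 samples for each of 3 classes.
y = np.repeat([0, 1, 2], 50)

# One uniform draw in [0, 1) per sample; True where the draw is below 0.3.
mask = rng.rand(len(y)) < 0.3

# Work on a copy so the original labels stay intact.
y_30 = np.copy(y)
y_30[mask] = -1  # -1 marks a sample as unlabeled

n_unlabeled = int((y_30 == -1).sum())
print(f"{n_unlabeled} of {len(y)} labels hidden ({n_unlabeled / len(y):.0%})")
```

Because each draw is independent, the hidden fraction is only approximately 30%, and the remaining samples (roughly 70%) keep their labels.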
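* `LabelSpreading().fit()`: runs label spreading and fits it to the dataset
* `svm.SVC().fit()`: runs SVC (Support Vector Classification) and fits it to the dataset

Label spreading is run with different proportions of labeled and unlabeled data, and the results are compared with those of the SVM.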
```python
# we create an instance of SVM and fit our data. We do not scale our
# data since we want to plot the support vectors
ls30 = (LabelSpreading().fit(X, y_30), y_30)
ls50 = (LabelSpreading().fit(X, y_50), y_50)
ls100 = (LabelSpreading().fit(X, y), y)
rbf_svc = (svm.SVC(kernel='rbf', gamma=.5).fit(X, y), y)
```
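After fitting, the labels inferred for the training samples can be inspected. The sketch below assumes scikit-learn's `LabelSpreading` with its default parameters and reads the fitted `transduction_` attribute. The names `hidden`, `inferred`, and `acc_hidden` are illustrative, and no particular accuracy value is claimed.

```python
import numpy as np
from sklearn import datasets
from sklearn.semi_supervised import LabelSpreading

rng = np.random.RandomState(0)
iris = datasets.load_iris()
X = iris.data[:, :2]  # sepal length and sepal width
y = iris.target

# Hide the labels where the random draw is below 0.3.
hidden = rng.rand(len(y)) < 0.3
y_30 = np.copy(y)
y_30[hidden] = -1

model = LabelSpreading().fit(X, y_30)

# transduction_ holds the label assigned to every training sample,
# including the ones that were passed in as -1.
inferred = model.transduction_

# Fraction of hidden samples whose inferred label matches the true label.
acc_hidden = float(np.mean(inferred[hidden] == y[hidden]))
print(f"accuracy on hidden labels: {acc_hidden:.2f}")
```

Comparing `inferred` with the true labels is possible here only because the example hides labels that are actually known. With genuinely unlabeled data there would be no such reference.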
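## (3) Plot the comparison

* `min()`, `max()`: determine the range of x and y
* `np.meshgrid()`: returns coordinate matrices from the given coordinate vectors

The grid spans the minimum and maximum of x and y, extended by 1 on each side, with a step of `h = 0.02`.
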
```python
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                     np.arange(y_min, y_max, h))
```
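To see what `np.meshgrid()` returns, it helps to try it on very short coordinate vectors. The sketch below uses made-up vectors `xs` and `ys` (not from the original example) in place of the full 0.02-step ranges, and also shows how `np.c_` pairs the flattened matrices into one row per grid point.

```python
import numpy as np

# Two short coordinate vectors instead of the full 0.02-step ranges.
xs = np.arange(0.0, 3.0, 1.0)    # 3 x-coordinates: 0, 1, 2
ys = np.arange(10.0, 12.0, 1.0)  # 2 y-coordinates: 10, 11

xx, yy = np.meshgrid(xs, ys)
print(xx.shape, yy.shape)

# Flatten both matrices and pair them column-wise:
# each row of grid_points is one (x, y) location on the grid.
grid_points = np.c_[xx.ravel(), yy.ravel()]
print(grid_points)
```

Both `xx` and `yy` have shape `(2, 3)`: rows follow the y-coordinates and columns follow the x-coordinates, and `grid_points` has one row for each of the 6 grid locations.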
Define the title of each subplot:

```python
titles = ['Label Spreading 30% data',
          'Label Spreading 50% data',
          'Label Spreading 100% data',
          'SVC with rbf kernel']
```
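To draw the figure, define `color_map`, a dict that assigns a different color to each of the four labels (-1 for unlabeled points, plus the three iris classes).
Finally, the code below draws the decision regions and all the points.
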
```python
color_map = {-1: (1, 1, 1), 0: (0, 0, .9), 1: (1, 0, 0), 2: (.8, .6, 0)}

for i, (clf, y_train) in enumerate((ls30, ls50, ls100, rbf_svc)):
    # Plot the decision boundary. For that, we will assign a color to each
    # point in the mesh [x_min, x_max]x[y_min, y_max].
    plt.subplot(2, 2, i + 1)
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])

    # Put the result into a color plot
    Z = Z.reshape(xx.shape)
    plt.contourf(xx, yy, Z, cmap=plt.cm.Paired)
    plt.axis('off')

    # Plot also the training points
    colors = [color_map[y] for y in y_train]
    plt.scatter(X[:, 0], X[:, 1], c=colors, edgecolors='black')

    plt.title(titles[i])

plt.suptitle("Unlabeled points are colored white", y=0.1)
plt.show()
```

The result shows that Label Propagation can learn a good decision boundary even when only part of the data is labeled.
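
The grid-prediction step used in the plotting loop can be tried on a small scale. This is a minimal sketch in which a hypothetical `nearest_center_predict` function (not part of scikit-learn) stands in for `clf.predict`, so that the example needs only NumPy. The two centers are made-up values.

```python
import numpy as np


def nearest_center_predict(points, centers):
    """Hypothetical stand-in for clf.predict: index of the closest center."""
    # Pairwise squared distances, shape (n_points, n_centers).
    d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


centers = np.array([[0.0, 0.0], [4.0, 0.0]])  # made-up class centers

xx, yy = np.meshgrid(np.arange(-1.0, 6.0, 1.0),
                     np.arange(-1.0, 2.0, 1.0))

# Predict one class per grid point, then restore the grid layout so that
# Z[i, j] is the class at (xx[i, j], yy[i, j]), as plt.contourf expects.
Z = nearest_center_predict(np.c_[xx.ravel(), yy.ravel()], centers)
Z = Z.reshape(xx.shape)
print(Z)
```

The reshape is what lets `plt.contourf(xx, yy, Z)` color each region: the classifier sees a flat list of points, while the plot needs a matrix with the same shape as `xx` and `yy`.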

## (4) Complete code

```python
print(__doc__)

# Authors: Clay Woolam <clay@woolam.org>
# License: BSD

import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn import svm
from sklearn.semi_supervised import LabelSpreading

rng = np.random.RandomState(0)

iris = datasets.load_iris()

X = iris.data[:, :2]
y = iris.target

# step size in the mesh
h = .02

y_30 = np.copy(y)
y_30[rng.rand(len(y)) < 0.3] = -1
y_50 = np.copy(y)
y_50[rng.rand(len(y)) < 0.5] = -1

# we create an instance of SVM and fit our data. We do not scale our
# data since we want to plot the support vectors
ls30 = (LabelSpreading().fit(X, y_30), y_30)
ls50 = (LabelSpreading().fit(X, y_50), y_50)
ls100 = (LabelSpreading().fit(X, y), y)
rbf_svc = (svm.SVC(kernel='rbf', gamma=.5).fit(X, y), y)

# create a mesh to plot in
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                     np.arange(y_min, y_max, h))

# title for the plots
titles = ['Label Spreading 30% data',
          'Label Spreading 50% data',
          'Label Spreading 100% data',
          'SVC with rbf kernel']

color_map = {-1: (1, 1, 1), 0: (0, 0, .9), 1: (1, 0, 0), 2: (.8, .6, 0)}

for i, (clf, y_train) in enumerate((ls30, ls50, ls100, rbf_svc)):
    # Plot the decision boundary. For that, we will assign a color to each
    # point in the mesh [x_min, x_max]x[y_min, y_max].
    plt.subplot(2, 2, i + 1)
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])

    # Put the result into a color plot
    Z = Z.reshape(xx.shape)
    plt.contourf(xx, yy, Z, cmap=plt.cm.Paired)
    plt.axis('off')

    # Plot also the training points
    colors = [color_map[y] for y in y_train]
    plt.scatter(X[:, 0], X[:, 1], c=colors, edgecolors='black')

    plt.title(titles[i])

plt.suptitle("Unlabeled points are colored white", y=0.1)
plt.show()
```