Ex 2: Plot randomly generated classification dataset

This example demonstrates the dataset generators datasets.make_classification, datasets.make_blobs, and datasets.make_gaussian_quantiles, and plots the data each one produces.

(1) Make classification

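make_classification randomly generates an n-class classification problem. You can vary the number of informative features (n_informative) and the number of clusters per class (n_clusters_per_class) to produce datasets of different shapes.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification

plt.title("One informative feature, one cluster per class", fontsize='small')
X1, Y1 = make_classification(n_features=2, n_redundant=0, n_informative=1,
                             n_clusters_per_class=1)
plt.scatter(X1[:, 0], X1[:, 1], marker='o', c=Y1,
            s=25, edgecolor='k')
plt.show()
```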
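Different numbers of informative features and clusters per class produce visibly different datasets. Panels 321 to 324 of the complete code below compare four such settings.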
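A minimal sketch, assuming a recent scikit-learn, that reruns the four make_classification settings from the complete code and reports the array shape and class sizes. The arguments random_state=0 and flip_y=0 are additions for reproducibility. flip_y=0 turns off the default random label flipping, so the classes stay exactly balanced. The configs dictionary and its labels are illustrative names.

The sketch also exercises two constraints that explain the parameter choices. With n_features=2, the default n_redundant=2 leaves no room, so n_redundant is set to 0. In addition, n_classes * n_clusters_per_class may not exceed 2 ** n_informative, which is why the one-informative-feature panel uses a single cluster per class.

```python
import numpy as np
from sklearn.datasets import make_classification

# The four make_classification settings used in panels 321-324 of the complete code.
# random_state and flip_y=0 are illustrative additions (not in the original).
configs = {
    "one informative, one cluster/class": dict(n_informative=1, n_clusters_per_class=1),
    "two informative, one cluster/class": dict(n_informative=2, n_clusters_per_class=1),
    # n_clusters_per_class is left at its default of 2
    "two informative, two clusters/class": dict(n_informative=2),
    "three classes, one cluster/class": dict(n_informative=2, n_clusters_per_class=1,
                                             n_classes=3),
}

results = {}
for name, params in configs.items():
    # n_redundant=0: the default n_redundant=2 would push
    # n_informative + n_redundant past n_features=2 and raise a ValueError.
    X, y = make_classification(n_features=2, n_redundant=0, flip_y=0,
                               random_state=0, **params)
    results[name] = (X.shape, np.bincount(y))
    print(f"{name}: X.shape={X.shape}, samples per class={np.bincount(y)}")

# n_classes * n_clusters_per_class must not exceed 2 ** n_informative,
# so one informative feature cannot hold 2 classes x 2 clusters each.
try:
    make_classification(n_features=2, n_redundant=0, n_informative=1,
                        n_clusters_per_class=2, random_state=0)
    constraint_error = None
except ValueError as exc:
    constraint_error = exc
    print("ValueError:", exc)
```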

(2) Make blobs

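make_blobs generates isotropic Gaussian blobs, that is, clusters whose spread is the same in every direction.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs

plt.title("Three blobs", fontsize='small')
X1, Y1 = make_blobs(n_features=2, centers=3)
plt.scatter(X1[:, 0], X1[:, 1], marker='o', c=Y1,
            s=25, edgecolor='k')
plt.show()
```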
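To make "isotropic" concrete, here is a small sketch that fixes the blob centers and spread and then measures each blob. The centers array, n_samples=300, cluster_std=0.5, and random_state=0 are hypothetical values chosen for illustration. The original snippet instead passes centers=3, which lets make_blobs pick three random centers inside its default center_box of (-10, 10).

```python
import numpy as np
from sklearn.datasets import make_blobs

# Hypothetical, hand-picked centers (one row per blob) and spread.
centers = np.array([[-5.0, 0.0], [0.0, 5.0], [5.0, 0.0]])
X, y = make_blobs(n_samples=300, centers=centers,
                  cluster_std=0.5, random_state=0)

# Each blob is an isotropic Gaussian: its sample mean lands near its center,
# and its standard deviation is about cluster_std along every axis.
est_means = np.array([X[y == k].mean(axis=0) for k in range(len(centers))])
est_stds = np.array([X[y == k].std(axis=0) for k in range(len(centers))])
print("samples per blob:", np.bincount(y))
print("estimated means:\n", est_means.round(2))
print("estimated per-axis std:\n", est_stds.round(2))
```

When centers is an array, make_blobs takes the number of features from its second dimension, so n_features is not passed here.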

(3) Make gaussian quantiles

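make_gaussian_quantiles draws samples from an isotropic Gaussian and labels them by quantile. The resulting classes are nested concentric shells around the mean, and each shell holds roughly the same number of samples.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_gaussian_quantiles
X1, Y1 = make_gaussian_quantiles(n_features=2, n_classes=3)
plt.scatter(X1[:, 0], X1[:, 1], marker='o', c=Y1,
            s=25, edgecolor='k')
plt.show()
```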
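The shell structure can be checked numerically. With two features and the default mean (the origin) and identity covariance, the squared distance from the mean follows a chi-squared distribution with 2 degrees of freedom. Its CDF is 1 - exp(-r²/2), so the boundary between class k-1 and class k should lie near r = sqrt(-2 ln(1 - k/3)), which gives about 0.90 and 1.48.

The sketch below compares those radii with the data. It raises n_samples to 3000 so the empirical boundaries are stable. Both that value and random_state=0 are illustrative choices.

```python
import numpy as np
from sklearn.datasets import make_gaussian_quantiles

n_classes = 3
X, y = make_gaussian_quantiles(n_samples=3000, n_features=2,
                               n_classes=n_classes, random_state=0)

# Distance of each sample from the default mean (the origin).
r = np.linalg.norm(X, axis=1)

# Nested shells: every radius in class k is <= every radius in class k + 1.
nested = all(r[y == k].max() <= r[y == k + 1].min() for k in range(n_classes - 1))

# r**2 ~ chi-squared(2) with CDF 1 - exp(-r**2 / 2); invert it at k / n_classes.
theory = np.array([np.sqrt(-2 * np.log(1 - k / n_classes))
                   for k in range(1, n_classes)])
# Empirical boundary: the largest radius inside each inner class.
empirical = np.array([r[y == k].max() for k in range(n_classes - 1)])

print("samples per class:", np.bincount(y))
print("nested shells:", nested)
print("theoretical boundaries:", theory.round(3))
print("empirical boundaries:  ", empirical.round(3))
```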

(4) Complete code

Python source code: plot_random_dataset.py
```python
print(__doc__)

import matplotlib.pyplot as plt

from sklearn.datasets import make_classification
from sklearn.datasets import make_blobs
from sklearn.datasets import make_gaussian_quantiles

plt.figure(figsize=(8, 8))
plt.subplots_adjust(bottom=.05, top=.9, left=.05, right=.95)

plt.subplot(321)
plt.title("One informative feature, one cluster per class", fontsize='small')
X1, Y1 = make_classification(n_features=2, n_redundant=0, n_informative=1,
                             n_clusters_per_class=1)
plt.scatter(X1[:, 0], X1[:, 1], marker='o', c=Y1,
            s=25, edgecolor='k')

plt.subplot(322)
plt.title("Two informative features, one cluster per class", fontsize='small')
X1, Y1 = make_classification(n_features=2, n_redundant=0, n_informative=2,
                             n_clusters_per_class=1)
plt.scatter(X1[:, 0], X1[:, 1], marker='o', c=Y1,
            s=25, edgecolor='k')

plt.subplot(323)
plt.title("Two informative features, two clusters per class",
          fontsize='small')
X2, Y2 = make_classification(n_features=2, n_redundant=0, n_informative=2)
plt.scatter(X2[:, 0], X2[:, 1], marker='o', c=Y2,
            s=25, edgecolor='k')

plt.subplot(324)
plt.title("Multi-class, two informative features, one cluster",
          fontsize='small')
X1, Y1 = make_classification(n_features=2, n_redundant=0, n_informative=2,
                             n_clusters_per_class=1, n_classes=3)
plt.scatter(X1[:, 0], X1[:, 1], marker='o', c=Y1,
            s=25, edgecolor='k')

plt.subplot(325)
plt.title("Three blobs", fontsize='small')
X1, Y1 = make_blobs(n_features=2, centers=3)
plt.scatter(X1[:, 0], X1[:, 1], marker='o', c=Y1,
            s=25, edgecolor='k')

plt.subplot(326)
plt.title("Gaussian divided into three quantiles", fontsize='small')
X1, Y1 = make_gaussian_quantiles(n_features=2, n_classes=3)
plt.scatter(X1[:, 0], X1[:, 1], marker='o', c=Y1,
            s=25, edgecolor='k')

plt.show()
```
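The six panels repeat the same title, generate, and scatter steps, so they could also be driven by a table. The following is a sketch, assuming matplotlib and scikit-learn are installed. The panels list and the make_data loop variable are a hypothetical restructuring of the complete code above. random_state=0 is added so every run draws the same data. The Agg backend and the output file name plot_random_dataset.png are assumptions that let the sketch run without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render to a file, no window
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs, make_classification, make_gaussian_quantiles

# (title, generator function, keyword arguments) for each panel, in subplot order.
panels = [
    ("One informative feature, one cluster per class", make_classification,
     dict(n_features=2, n_redundant=0, n_informative=1, n_clusters_per_class=1)),
    ("Two informative features, one cluster per class", make_classification,
     dict(n_features=2, n_redundant=0, n_informative=2, n_clusters_per_class=1)),
    ("Two informative features, two clusters per class", make_classification,
     dict(n_features=2, n_redundant=0, n_informative=2)),
    ("Multi-class, two informative features, one cluster", make_classification,
     dict(n_features=2, n_redundant=0, n_informative=2,
          n_clusters_per_class=1, n_classes=3)),
    ("Three blobs", make_blobs, dict(n_features=2, centers=3)),
    ("Gaussian divided into three quantiles", make_gaussian_quantiles,
     dict(n_features=2, n_classes=3)),
]

fig, axes = plt.subplots(3, 2, figsize=(8, 8))
fig.subplots_adjust(bottom=.05, top=.9, left=.05, right=.95)

# axes.ravel() walks the 3x2 grid row by row, matching subplots 321..326.
for ax, (title, make_data, params) in zip(axes.ravel(), panels):
    X, y = make_data(random_state=0, **params)  # fixed seed for reproducibility
    ax.set_title(title, fontsize='small')
    ax.scatter(X[:, 0], X[:, 1], marker='o', c=y, s=25, edgecolor='k')

fig.savefig("plot_random_dataset.png")  # hypothetical output file name
```

With this layout, adding a seventh panel only needs one more entry in panels and a larger grid in plt.subplots.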