Linux Tutorial: sklearn Quick Start

Author: VEPHP  Date: 2017-10-04

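Introduction

sklearn ships with a few standard datasets: iris and digits for classification, and the boston house-price dataset for regression (load_boston was removed in scikit-learn 1.2).

Importing the datasets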

from sklearn import datasets

All of the bundled data lives in the datasets module

iris = datasets.load_iris()
digits = datasets.load_digits()

Each loaded dataset is a dict-like object that holds the data together with its metadata. The samples are stored in .data and the labels in .target.

type(iris.data)

numpy.ndarray
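
Because the loaded object is dict-like, its fields can be read either as keys or as attributes. The following is a minimal sketch of that behavior; the variable names (field_names, same_object, first_labels) are illustrative and not part of the original article.

```python
from sklearn import datasets

iris = datasets.load_iris()

# A Bunch behaves like a dict: list the available fields.
field_names = sorted(iris.keys())
print(field_names)

# Key access and attribute access return the same array object.
same_object = iris["data"] is iris.data
print(same_object)

# target holds integer class ids; target_names maps them to strings.
first_labels = iris.target_names[iris.target[:3]]
print(list(first_labels))
```

Attribute access is usually the more convenient form, while key access helps when the field name is held in a variable.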

.data holds the feature values

print("iris.data.dtype: ", iris.data.dtype)
print("iris.data.shape: ", iris.data.shape)
print("iris.data.ndim: ", iris.data.ndim)
print("--------------------------------")
print(iris.data[0:5])

iris.data.dtype:  float64
iris.data.shape:  (150, 4)
iris.data.ndim:  2
--------------------------------
[[ 5.1  3.5  1.4  0.2]
 [ 4.9  3.   1.4  0.2]
 [ 4.7  3.2  1.3  0.2]
 [ 4.6  3.1  1.5  0.2]
 [ 5.   3.6  1.4  0.2]]

.target holds the labels

print("iris.target.dtype: ", iris.target.dtype)
print("iris.target.shape: ", iris.target.shape)
print("iris.target.ndim: ", iris.target.ndim)
print("--------------------------------")
print(iris.target[0:5])

iris.target.dtype:  int64
iris.target.shape:  (150,)
iris.target.ndim:  1
--------------------------------
[0 0 0 0 0]

type(digits)

sklearn.datasets.base.Bunch

print("digits.data.dtype: ", digits.data.dtype)
print("digits.data.shape: ", digits.data.shape)
print("digits.data.ndim: ", digits.data.ndim)
print("--------------------------------")
print(digits.data[0:5])

digits.data.dtype:  float64
digits.data.shape:  (1797, 64)
digits.data.ndim:  2
--------------------------------
[[  0.   0.   5.  13.   9.   1.   0.   0.   0.   0.  13.  15.  10.  15.
    5.   0.   0.   3.  15.   2.   0.  11.   8.   0.   0.   4.  12.   0.
    0.   8.   8.   0.   0.   5.   8.   0.   0.   9.   8.   0.   0.   4.
   11.   0.   1.  12.   7.   0.   0.   2.  14.   5.  10.  12.   0.   0.
    0.   0.   6.  13.  10.   0.   0.   0.]
 [  0.   0.   0.  12.  13.   5.   0.   0.   0.   0.   0.  11.  16.   9.
    0.   0.   0.   0.   3.  15.  16.   6.   0.   0.   0.   7.  15.  16.
   16.   2.   0.   0.   0.   0.   1.  16.  16.   3.   0.   0.   0.   0.
    1.  16.  16.   6.   0.   0.   0.   0.   1.  16.  16.   6.   0.   0.
    0.   0.   0.  11.  16.  10.   0.   0.]
 [  0.   0.   0.   4.  15.  12.   0.   0.   0.   0.   3.  16.  15.  14.
    0.   0.   0.   0.   8.  13.   8.  16.   0.   0.   0.   0.   1.   6.
   15.  11.   0.   0.   0.   1.   8.  13.  15.   1.   0.   0.   0.   9.
   16.  16.   5.   0.   0.   0.   0.   3.  13.  16.  16.  11.   5.   0.
    0.   0.   0.   3.  11.  16.   9.   0.]
 [  0.   0.   7.  15.  13.   1.   0.   0.   0.   8.  13.   6.  15.   4.
    0.   0.   0.   2.   1.  13.  13.   0.   0.   0.   0.   0.   2.  15.
   11.   1.   0.   0.   0.   0.   0.   1.  12.  12.   1.   0.   0.   0.
    0.   0.   1.  10.   8.   0.   0.   0.   8.   4.   5.  14.   9.   0.
    0.   0.   7.  13.  13.   9.   0.   0.]
 [  0.   0.   0.   1.  11.   0.   0.   0.   0.   0.   0.   7.   8.   0.
    0.   0.   0.   0.   1.  13.   6.   2.   2.   0.   0.   0.   7.  15.
    0.   9.   8.   0.   0.   5.  16.  10.   0.  16.   6.   0.   0.   4.
   15.  16.  13.  16.   1.   0.   0.   0.   0.   3.  15.  10.   0.   0.
    0.   0.   0.   2.  16.   4.   0.   0.]]

print("digits.target.dtype: ", digits.target.dtype)
print("digits.target.shape: ", digits.target.shape)
print("digits.target.ndim: ", digits.target.ndim)
print("--------------------------------")
print(digits.target[0:5])

digits.target.dtype:  int64
digits.target.shape:  (1797,)
digits.target.ndim:  1
--------------------------------
[0 1 2 3 4]

digits is a dataset of handwritten digits. Its images attribute gives each sample as an 8x8 matrix.

digits.images[1]

array([[  0.,   0.,   0.,  12.,  13.,   5.,   0.,   0.],
       [  0.,   0.,   0.,  11.,  16.,   9.,   0.,   0.],
       [  0.,   0.,   3.,  15.,  16.,   6.,   0.,   0.],
       [  0.,   7.,  15.,  16.,  16.,   2.,   0.,   0.],
       [  0.,   0.,   1.,  16.,  16.,   3.,   0.,   0.],
       [  0.,   0.,   1.,  16.,  16.,   6.,   0.,   0.],
       [  0.,   0.,   1.,  16.,  16.,   6.,   0.,   0.],
       [  0.,   0.,   0.,  11.,  16.,  10.,   0.,   0.]])
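
The matrix above contains the same 64 values as the second row of digits.data shown earlier. A short sketch that checks this relationship (the names flat and rows_match are illustrative):

```python
import numpy as np
from sklearn import datasets

digits = datasets.load_digits()

# images has one 8x8 matrix per sample; data has one 64-value row per sample.
print(digits.images.shape, digits.data.shape)

# Flattening an image row by row gives the matching row of data.
flat = digits.images[1].reshape(-1)
rows_match = np.array_equal(flat, digits.data[1])
print(rows_match)
```

In other words, .data is the form an estimator expects (one row per sample), while .images is convenient for displaying a sample.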

Learning and predicting

In scikit-learn, a classification model has two main methods: fit(X, y) and predict(T)

Here we use an SVM as an example to see how they are used.

from sklearn import svm
clf = svm.SVC(gamma=0.001,C=100.)

Choosing the model parameters
In this example the parameters are set by hand. Alternatively, grid search and cross validation can be used to choose them.
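
As a rough illustration of the grid-search alternative, the sketch below uses GridSearchCV from sklearn.model_selection. The candidate values in param_grid, the 500-sample subset, and the choice of 3 folds are assumptions made to keep the example quick; they do not come from the original article.

```python
from sklearn import datasets, svm
from sklearn.model_selection import GridSearchCV

digits = datasets.load_digits()

# Use a subset to keep the search quick; the grid values are illustrative.
X, y = digits.data[:500], digits.target[:500]
param_grid = {"gamma": [0.0001, 0.001], "C": [1.0, 100.0]}

# Each parameter combination is scored with 3-fold cross validation.
search = GridSearchCV(svm.SVC(), param_grid, cv=3)
search.fit(X, y)

print(search.best_params_)
print(search.best_score_)
```

After fitting, search.best_estimator_ holds a model refitted on the whole subset with the best combination, and it can be used like any other classifier.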

Our model is now clf, which is a classifier. Before it can classify anything, it has to learn from data. This is done by passing the training set to fit. Here the last record of the digits dataset serves as the test dataset and the preceding 1796 samples serve as the training dataset.

clf.fit(digits.data[:-1],digits.target[:-1])

SVC(C=100.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape=None, degree=3, gamma=0.001, kernel='rbf',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False)

Now use the trained model to predict the label of the last sample

print("prediction: ", clf.predict(digits.data[-1:]))
print("actual: ", digits.target[-1:])

prediction:  [8]
actual:  [8]
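
A single held-out sample only shows that fit and predict work; it says little about accuracy. The sketch below holds out a larger share of the data with train_test_split. The 25% split and the fixed random_state are assumptions chosen for illustration.

```python
from sklearn import datasets, svm
from sklearn.model_selection import train_test_split

digits = datasets.load_digits()

# Hold out 25% of the samples; random_state is fixed only for repeatability.
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0
)

clf = svm.SVC(gamma=0.001, C=100.0)
clf.fit(X_train, y_train)

# score() returns the mean accuracy on the given samples.
accuracy = clf.score(X_test, y_test)
print(accuracy)
```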

Saving the model

Save the model with pickle

from sklearn import svm
from sklearn import datasets
clf = svm.SVC()
iris = datasets.load_iris()
X, y = iris.data,iris.target
clf.fit(X,y)

SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape=None, degree=3, gamma='auto', kernel='rbf',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False)

Save the model above

import pickle
s = pickle.dumps(clf)

Load the saved model

clf2 = pickle.loads(s)
print("prediction: ", clf2.predict(X[0:1]))
print("actual: ", y[0:1])

prediction:  [0]
actual:  [0]

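Alternatively, joblib can be used in place of pickle (joblib.dump & joblib.load). joblib is more efficient on large data, but it can only save to disk, not to a string object.

Save the model with joblib

# Older scikit-learn releases bundled this as sklearn.externals.joblib.
import joblib
joblib.dump(clf,"filename.pkl")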

['filename.pkl',
 'filename.pkl_01.npy',
 'filename.pkl_02.npy',
 'filename.pkl_03.npy',
 'filename.pkl_04.npy',
 'filename.pkl_05.npy',
 'filename.pkl_06.npy',
 'filename.pkl_07.npy',
 'filename.pkl_08.npy',
 'filename.pkl_09.npy',
 'filename.pkl_10.npy',
 'filename.pkl_11.npy']

Load the model saved by joblib

clf3 = joblib.load("filename.pkl")
print("prediction: ", clf3.predict(X[0:1]))
print("actual: ", y[0:1])

prediction:  [0]
actual:  [0]

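Note:

joblib returns a list of file names because, in the joblib version used here, each numpy array in the model is saved to its own file. All of these files must stay in the same directory for the model to load successfully. Recent joblib releases write a single file by default.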
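
The round trip can be checked end to end. This sketch assumes the standalone joblib package is installed (it is a dependency of current scikit-learn) and writes to a temporary directory; the file name model.joblib is hypothetical.

```python
import os
import tempfile

import joblib
from sklearn import datasets, svm

iris = datasets.load_iris()
X, y = iris.data, iris.target
clf = svm.SVC().fit(X, y)

# Write into a temporary directory so no stray files are left behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "model.joblib")  # hypothetical file name
    written = joblib.dump(clf, model_path)
    restored = joblib.load(model_path)

# The restored model should predict exactly like the original.
same_predictions = (restored.predict(X) == clf.predict(X)).all()
print(written)
print(same_predictions)
```

As with pickle, a file should only be loaded if it comes from a trusted source, since loading can execute arbitrary code.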

Conventions

sklearn follows a few rules that keep its behavior predictable.

Type casting

Unless otherwise specified, input is cast to float64

import numpy as np
from sklearn import random_projection
rng = np.random.RandomState(0)
X = rng.rand(10,2000)
X = np.array(X,dtype='float32')
X.dtype

dtype('float32')

transformer = random_projection.GaussianRandomProjection()
X_new = transformer.fit_transform(X)
X_new.dtype

dtype('float64')

X was originally float32. In the scikit-learn release used here, fit_transform(X) cast it to float64.

Regression targets are cast to float64; classification targets keep their data type.

from sklearn import datasets
from sklearn.svm import SVC
iris = datasets.load_iris()
clf = SVC()
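# fit on integer labels
clf.fit(iris.data, iris.target)
print("Integer labels:", list(clf.predict(iris.data[:3])))
# fit on string labels
clf.fit(iris.data, iris.target_names[iris.target])
print("String labels:", list(clf.predict(iris.data[:3])))

Integer labels: [0, 0, 0]
String labels: ['setosa', 'setosa', 'setosa']

Both calls are classification, since SVC is a classifier. The first fit uses iris.target, an integer array, so predict returns integers. The second fit uses iris.target_names, so predict returns strings.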
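
To see the regression half of the rule as well, the sketch below fits a regressor and a classifier on the same integer targets and compares the dtype of their predictions. LinearRegression and the synthetic data are stand-ins chosen for illustration; they are not used in the original article.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVC

rng = np.random.RandomState(0)
X = rng.rand(20, 3)
y_int = np.arange(20) % 2  # integer targets, alternating 0 and 1

# A regressor predicts continuous values, so the output is floating point.
reg_pred = LinearRegression().fit(X, y_int).predict(X[:3])
print(reg_pred.dtype)

# A classifier returns labels of the same kind it was fitted on.
clf_pred = SVC().fit(X, y_int).predict(X[:3])
print(clf_pred.dtype)
```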

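Refitting and updating hyperparameters

The hyperparameters of a model can still be updated after it has been trained, through the set_params method (for example sklearn.pipeline.Pipeline.set_params). Calling fit more than once overwrites whatever was learned by the previous fit.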

import numpy as np
from sklearn.svm import SVC
rng = np.random.RandomState(0)
X = rng.rand(100, 10)
y = rng.binomial(1, 0.5, 100)
X_test = rng.rand(5, 10)

clf = SVC()
clf.set_params(kernel="linear").fit(X,y)

SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape=None, degree=3, gamma='auto', kernel='linear',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False)

clf.predict(X_test)

array([1, 0, 1, 1, 0])

clf.set_params(kernel='rbf').fit(X, y)

SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape=None, degree=3, gamma='auto', kernel='rbf',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False)

clf.predict(X_test)

array([0, 0, 0, 1, 0])
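
The effect of set_params can also be inspected directly with get_params. The following sketch reuses the same synthetic data; the helper variable names are illustrative. It relies on the fact that SVC only exposes coef_ when the kernel is linear.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(0)
X = rng.rand(100, 10)
y = rng.binomial(1, 0.5, 100)

clf = SVC()
kernel_default = clf.get_params()["kernel"]

# set_params returns the estimator itself, so calls can be chained.
clf.set_params(kernel="linear").fit(X, y)
kernel_after = clf.get_params()["kernel"]
# coef_ is only defined for the linear kernel.
has_coef_linear = hasattr(clf, "coef_")

# Refitting with another kernel discards the previous fit.
clf.set_params(kernel="rbf").fit(X, y)
has_coef_rbf = hasattr(clf, "coef_")
print(kernel_default, kernel_after, clf.get_params()["kernel"])
```

Note that set_params alone does not retrain anything; the new value only takes effect at the next call to fit.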
