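Cross-validation evaluates a predictive model by partitioning the original sample into a training set, used to fit the model, and a test set, used to evaluate it.

Cross-validation in scikit-learn is very helpful for choosing the right model and model parameters. With it, we can see directly how different models or parameter values affect the accuracy of the results.

We will use the well-known iris dataset and a KNN classifier.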

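1. Using knn.score() to check accuracy

Basically, the accuracy reported by knn.score() reflects only a single train/test split, so it depends on which samples happen to land in the test set.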

Splitting the data into training and test sets

# We use the well-known iris dataset with a KNN classifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# load the dataset
iris = load_iris()
X = iris.data
y = iris.target

# split into training and test sets, with random_state=4 for reproducibility
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=4)

# build a KNN model with n_neighbors=5
knn = KNeighborsClassifier(n_neighbors=5)

# train the model
knn.fit(X_train, y_train)

# predict labels for X_test
y_pred = knn.predict(X_test)

# print the accuracy on the test set
print('accuracy: ', knn.score(X_test, y_test))
# accuracy: 0.973684210526
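To illustrate why a single split can be misleading, here is a minimal sketch that repeats the same evaluation with several different `random_state` values. The seed range and the variable names (`split_scores`, `model`) are arbitrary choices for illustration and are not part of the original walkthrough.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Repeat the single-split evaluation with ten arbitrary seeds.
split_scores = []
for seed in range(10):
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=seed)
    model = KNeighborsClassifier(n_neighbors=5)
    model.fit(X_tr, y_tr)
    split_scores.append(model.score(X_te, y_te))

# The spread between these two numbers is the noise of a single split.
print("min accuracy:", min(split_scores))
print("max accuracy:", max(split_scores))
```

If the minimum and maximum differ noticeably, the score from any one split says as much about the split as about the model, which is the motivation for cross-validation in the next section.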

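2. Cross-validation for classification

In k-fold cross-validation, the original sample is randomly partitioned into k equal-sized subsamples. Each subsample serves once as the validation set while the remaining k-1 subsamples are used for training, and the k scores are then averaged.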

# import the k-fold scoring helper; it lives in sklearn.model_selection
# (the old sklearn.cross_validation module is no longer available in current releases)
from sklearn.model_selection import cross_val_score

# use the same model as before
knn = KNeighborsClassifier(n_neighbors=5)

# X and y are split into 5 folds automatically; accuracy is still the scoring metric
scores = cross_val_score(knn, X, y, cv=5, scoring='accuracy')

# print the scores of all 5 folds
print(scores)
# [ 0.96666667 1. 0.93333333 0.96666667 1. ]

# average the five scores to get a more reliable accuracy estimate
print(scores.mean())
# 0.973333333333
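To make the fold mechanics concrete, the following sketch performs the same 5-fold evaluation by hand. It assumes that, for a classifier and an integer `cv`, `cross_val_score` uses stratified, unshuffled folds, which is my understanding of the documented default; the names `manual_scores` and `library_scores` are illustrative only.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Assumption: this mirrors the splitter cross_val_score picks for cv=5 on a classifier.
skf = StratifiedKFold(n_splits=5)

manual_scores = []
for train_idx, test_idx in skf.split(X, y):
    # Fit a fresh model on four folds and score it on the held-out fold.
    model = KNeighborsClassifier(n_neighbors=5)
    model.fit(X[train_idx], y[train_idx])
    manual_scores.append(model.score(X[test_idx], y[test_idx]))
manual_scores = np.array(manual_scores)

# Same evaluation through the library helper, for comparison.
library_scores = cross_val_score(
    KNeighborsClassifier(n_neighbors=5), X, y, cv=5, scoring="accuracy"
)

print(manual_scores, manual_scores.mean())
print(library_scores, library_scores.mean())
```

Writing the loop out is mainly useful when you need per-fold access to the fitted models or predictions; otherwise `cross_val_score` is the shorter route.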

We can try different numbers of neighbors to see which K works best.

import matplotlib.pyplot as plt
# the next line is a Jupyter/IPython magic; omit it in a plain Python script
%matplotlib inline

# try k from 1 to 30
k_range = range(1, 31)
k_scores = []

# for each k, fit the model and record the mean cross-validated accuracy
for k in k_range:
    knn = KNeighborsClassifier(n_neighbors=k)
    scores = cross_val_score(knn, X, y, cv=5, scoring='accuracy')
    k_scores.append(scores.mean())

# plot accuracy against k
plt.plot(k_range, k_scores)
plt.xlabel('Value of K for KNN')
plt.ylabel('Cross-Validated Accuracy')
plt.show()

We can see that the best K lies between 6 and 13; beyond 13 the accuracy drops because the model starts to underfit.
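Reading the best K off a plot can also be done programmatically. The sketch below uses `GridSearchCV` as one possible way to run the same search; this is a swapped-in convenience, not the method used in the walkthrough, and the names `param_grid`, `best_k` and `best_score` are illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Same search space as the loop in the text: k from 1 to 30, 5-fold accuracy.
param_grid = {"n_neighbors": list(range(1, 31))}
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5, scoring="accuracy")
search.fit(X, y)

best_k = search.best_params_["n_neighbors"]
best_score = search.best_score_
print("best k:", best_k)
print("mean cross-validated accuracy at best k:", best_score)
```

When several values of K tie on accuracy, the search reports only one of them, so the plot remains useful for seeing the whole plateau.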

3. Scoring with neg_mean_squared_error (suited to regression)

import matplotlib.pyplot as plt

k_range = range(1, 31)
k_scores = []

for k in k_range:
    knn = KNeighborsClassifier(n_neighbors=k)
    # the scorer returns the negated MSE, so flip the sign to get the loss
    loss = -cross_val_score(knn, X, y, cv=5, scoring='neg_mean_squared_error')
    k_scores.append(loss.mean())

plt.plot(k_range, k_scores)
plt.xlabel('Value of K for KNN')
plt.ylabel('Cross-Validated MSE')
plt.show()

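Because this plot shows MSE, a loss, we look for the minimum rather than the maximum. It again falls between 6 and 13, the same result as in section 2.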
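Since mean squared error is primarily a regression metric, here is a hedged sketch of the same scoring applied to an actual regression task. The diabetes dataset and `KNeighborsRegressor` are my own choices for illustration and do not appear in the original walkthrough.

```python
from sklearn.datasets import load_diabetes
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor

# Assumption: the bundled diabetes dataset stands in for "a regression problem".
X_reg, y_reg = load_diabetes(return_X_y=True)

reg = KNeighborsRegressor(n_neighbors=5)

# scikit-learn scorers follow "higher is better", so MSE is reported negated.
neg_scores = cross_val_score(reg, X_reg, y_reg, cv=5, scoring="neg_mean_squared_error")

# Flip the sign to recover the usual, non-negative MSE per fold.
mse_scores = -neg_scores
mse_mean = mse_scores.mean()
print("MSE per fold:", mse_scores)
print("mean MSE:", mse_mean)
```

The sign convention is the part worth remembering: when comparing models by this scorer directly, the larger (closer to zero) value is the better one.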

4. Applying the naive Bayes algorithm to the classification problem on the same dataset

from sklearn import metrics
from sklearn.naive_bayes import GaussianNB

gnb = GaussianNB()

# fit on the full dataset and predict on the same data
y_pred = gnb.fit(iris.data, iris.target).predict(iris.data)

print("Number of mislabeled points : %d" % (iris.target != y_pred).sum())
# Number of mislabeled points : 6

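Above we used a simple count of mislabeled points to derive a score: 6 mislabeled out of 150, that is 144 correct out of 150 = 0.96 (clearly we want this as close to 1 as possible). Note that the model is scored here on the same data it was trained on, so this is a training accuracy.

We can score a binary classification by plotting the receiver operating characteristic (ROC) curve and computing the area under the curve (AUC). Again, the goal is an AUC as close to 1 as possible. Because iris has three classes, the code below treats class 2 as the positive class and the other two as negative.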
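For comparison with the KNN sections, the same classifier can be scored with 5-fold cross-validation instead of on its own training data. This is a minimal sketch; the names `nb_scores` and `nb_mean` are illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)

# Score Gaussian naive Bayes on held-out folds rather than on the training data.
nb_scores = cross_val_score(GaussianNB(), X, y, cv=5, scoring="accuracy")
nb_mean = nb_scores.mean()

print("accuracy per fold:", nb_scores)
print("mean cross-validated accuracy:", nb_mean)
```

A held-out estimate like this is the fairer number to set beside the cross-validated KNN accuracy from section 2.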

# Find the false positive and true positive rates, with class 2 as the positive label.
# Note: y_pred holds hard class labels (0, 1, 2), which roc_curve treats here as scores.
from sklearn import metrics

fpr, tpr, thresholds = metrics.roc_curve(iris.target, y_pred, pos_label=2)

# compute and print the area under the ROC curve
roc_auc = metrics.auc(fpr, tpr)
print(roc_auc)

plt.plot(fpr, tpr)
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver operating characteristic example')
plt.show()
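ROC curves are usually built from continuous scores rather than hard labels. The sketch below is one possible variant under my own assumptions: it holds out a test set (stratified, with `random_state=4` reused purely for convenience), uses `predict_proba` for the scores, and also reports a one-vs-rest multiclass AUC. None of these choices come from the original code.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import auc, roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)

# Assumption: evaluate on a held-out, stratified test set instead of the training data.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=4, stratify=y)

gnb = GaussianNB().fit(X_tr, y_tr)
proba = gnb.predict_proba(X_te)  # one probability column per class

# Binary view: class 2 against the rest, scored by the predicted probability of class 2.
fpr, tpr, _ = roc_curve(y_te, proba[:, 2], pos_label=2)
auc_class2 = auc(fpr, tpr)

# Multiclass view: average of the one-vs-rest AUCs over all three classes.
auc_ovr = roc_auc_score(y_te, proba, multi_class="ovr")

print("AUC, class 2 vs rest:", auc_class2)
print("AUC, one-vs-rest average:", auc_ovr)
```

Using probabilities gives the curve many thresholds to sweep over, whereas hard labels yield only a few points on the curve.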

tracy
