A Detailed Guide to Scikit-learn's svm.SVR: The Support Vector Regressor

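Introduction

Scikit-learn is one of the most widely used machine learning libraries in Python. Its sklearn.svm.SVR class is an implementation of support vector regression (SVR) for regression problems.

SVR applies the support vector machine (SVM) approach to regression: a kernel function implicitly maps the data into a higher-dimensional feature space, where the algorithm looks for the optimal regression hyperplane. It is particularly suited to nonlinear regression problems with small, complex datasets.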
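
To make the point about nonlinear problems concrete, here is a minimal sketch that fits a linear-kernel and an RBF-kernel SVR to a small noisy sine curve. The synthetic data (80 points, noise level 0.1) is an illustrative assumption and does not come from this article.

```python
import numpy as np
from sklearn.svm import SVR

# Small synthetic nonlinear dataset: a noisy sine curve (illustrative assumption).
rng = np.random.RandomState(0)
X = np.sort(5 * rng.rand(80, 1), axis=0)
y = np.sin(X).ravel() + 0.1 * rng.randn(80)

# Same settings except for the kernel.
linear_svr = SVR(kernel="linear", C=1.0).fit(X, y)
rbf_svr = SVR(kernel="rbf", C=1.0).fit(X, y)

# score() returns R^2; here it is computed on the training data itself.
linear_r2 = linear_svr.score(X, y)
rbf_r2 = rbf_svr.score(X, y)
print(f"linear kernel R^2: {linear_r2:.3f}")
print(f"rbf kernel R^2:    {rbf_r2:.3f}")
```

On data like this the RBF kernel should follow the curve much more closely than the linear kernel, which can only fit a straight line.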

Usage

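The main tunable parameters of sklearn.svm.SVR:

C: float (default=1.0): regularization parameter. The strength of the regularization is inversely proportional to C, so a larger C penalizes training errors more heavily and fits the training data more closely. Must be strictly positive.
epsilon: float (default=0.1): half-width of the epsilon-tube. Training points whose prediction lies within epsilon of the true value incur no penalty.
kernel: {'linear', 'poly', 'rbf', 'sigmoid', 'precomputed'} or callable (default='rbf'): the kernel type. Different kernels produce different SVR models; 'rbf' is the default.
degree: int (default=3): degree of the polynomial kernel ('poly'); ignored by all other kernels.
gamma: {'scale', 'auto'} or float (default='scale'): kernel coefficient for 'rbf', 'poly' and 'sigmoid'. 'scale' uses 1 / (n_features * X.var()) and 'auto' uses 1 / n_features.
coef0: float (default=0.0): independent term in the kernel function; only significant for 'poly' and 'sigmoid'.
shrinking: bool (default=True): whether to use the shrinking heuristic.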
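
As a quick check of the parameters listed above, the following sketch constructs an SVR with each of them written out explicitly and reads them back with get_params(). The values shown are the documented defaults in recent scikit-learn releases; older versions may differ (for example in the default gamma).

```python
from sklearn.svm import SVR

# Spell out the listed parameters explicitly, using their default values.
model = SVR(C=1.0, kernel="rbf", degree=3, gamma="scale", coef0=0.0, shrinking=True)

# get_params() returns every constructor parameter as a dict.
params = model.get_params()
for name in ("C", "kernel", "degree", "gamma", "coef0", "shrinking"):
    print(name, "=", params[name])
```

Because every value above matches the default, this model is configured identically to a plain SVR().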

Examples:

Here we apply SVR to two datasets: a toy dataset and a real-world dataset.

First, import the required libraries and datasets:

from sklearn.svm import SVR
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

# Toy dataset
X, y = make_regression(n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

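# Real-world dataset (Boston housing)
# Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this import only works with scikit-learn < 1.2.
from sklearn.datasets import load_boston
boston = load_boston()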
X2 = boston.data
y2 = boston.target
X2_train, X2_test, y2_train, y2_test = train_test_split(X2, y2, random_state=0)

Next, we fit the data with different parameter settings and score the predictions on the test set.

Using the default parameters

clf = SVR()
clf.fit(X_train, y_train)
score = clf.score(X_test, y_test)
print(score)

The output is 0.41.
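
For a regressor, score returns the coefficient of determination R² of the predictions, where 1.0 is a perfect fit and values near 0 are no better than predicting the mean. The sketch below repeats the toy setup and checks that clf.score agrees with sklearn.metrics.r2_score; it prints whatever value your installed version produces rather than assuming a particular number.

```python
from sklearn.datasets import make_regression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

# Same toy data and split as in the article.
X, y = make_regression(n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = SVR().fit(X_train, y_train)

# Two equivalent ways to obtain the test-set R^2.
score_from_method = clf.score(X_test, y_test)
score_from_metric = r2_score(y_test, clf.predict(X_test))
print(score_from_method, score_from_metric)
```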

Using a polynomial kernel

clf2 = SVR(kernel='poly', degree=2, gamma='scale', C=1.0, epsilon=0.2)
clf2.fit(X_train, y_train)
score2 = clf2.score(X_test, y_test)
print(score2)

The output is 0.16.

Using the real-world dataset
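
One plausible contributor to the low score is the kernel itself. scikit-learn's polynomial kernel is (gamma * <x, x'> + coef0)^degree, so with degree=2 and the default coef0=0.0 it contains only second-order terms, while make_regression generates targets that are a linear function of the features. The sketch below compares coef0=0.0 with coef0=1.0, which adds the lower-order terms. Note that C=100.0 is an illustrative assumption rather than a setting from this article; it is raised so that the comparison is not dominated by the strong regularization of C=1.0 on large-valued targets.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

# Same toy data and split as in the article.
X, y = make_regression(n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

scores = {}
for coef0 in (0.0, 1.0):
    # C=100.0 is an illustrative choice, not taken from the article.
    model = SVR(kernel="poly", degree=2, gamma="scale",
                coef0=coef0, C=100.0, epsilon=0.2)
    model.fit(X_train, y_train)
    scores[coef0] = model.score(X_test, y_test)
    print(f"coef0={coef0}: test R^2 = {scores[coef0]:.3f}")
```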

clf3 = SVR(kernel='linear', C=1.0, epsilon=0.1)
clf3.fit(X2_train, y2_train)
score3 = clf3.score(X2_test, y2_test)
print(score3)

The output is 0.65.

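Here we tried three configurations: the default parameters and a polynomial kernel on the toy dataset, and a linear kernel on the real-world dataset. The scores differ considerably between configurations, so when using SVR you need to choose a kernel and hyperparameters that suit the data in order to get a good fit.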
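
Alongside the kernel and hyperparameters, the scale of the input features also influences an SVR fit, because C and the kernel act on raw feature values. The sketch below compares a linear SVR with and without a StandardScaler step. It uses load_diabetes, a dataset bundled with scikit-learn, as a stand-in real-world dataset; that choice is an assumption for illustration and is not the dataset used in this article.

```python
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Stand-in real-world dataset (assumption: not the one used in the article).
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Linear SVR on the features as provided.
raw_model = SVR(kernel="linear", C=1.0, epsilon=0.1).fit(X_train, y_train)

# Same SVR, but with features standardized inside a pipeline so the
# scaler is fitted on the training split only.
scaled_model = make_pipeline(
    StandardScaler(), SVR(kernel="linear", C=1.0, epsilon=0.1)
).fit(X_train, y_train)

raw_r2 = raw_model.score(X_test, y_test)
scaled_r2 = scaled_model.score(X_test, y_test)
print(f"without scaling: R^2 = {raw_r2:.3f}")
print(f"with scaling:    R^2 = {scaled_r2:.3f}")
```

Putting the scaler inside the pipeline keeps test data from influencing the scaling statistics.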

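Note that SVR can also overfit. To further improve the fit and the generalization ability of the model, techniques such as cross-validation can be used for more careful hyperparameter tuning.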
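
A minimal sketch of cross-validated tuning with GridSearchCV on the toy dataset follows. The grid values are illustrative assumptions rather than recommended settings; in practice the ranges should be adapted to the scale of the targets and features.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVR

# Same toy data and split as in the article.
X, y = make_regression(n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Illustrative grid (assumed values, not taken from the article).
param_grid = {
    "C": [1, 10, 100],
    "epsilon": [0.1, 1.0],
    "gamma": ["scale", 0.1],
}

# 5-fold cross-validation on the training split, scored by R^2.
search = GridSearchCV(SVR(kernel="rbf"), param_grid, cv=5, scoring="r2")
search.fit(X_train, y_train)

print("best parameters:", search.best_params_)
print("mean cross-validated R^2:", search.best_score_)

# The held-out test split is only used once, after tuning.
test_r2 = search.score(X_test, y_test)
print("test R^2:", test_r2)
```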
