How to Calculate Studentized Residuals in Python


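A studentized residual is an ordinary regression residual divided by an estimate of its standard deviation. This makes residuals comparable across observations with different leverage, which is why studentized residuals are widely used to spot outliers. Computing them typically involves the following steps: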

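  1. First, install the Python statistics library statsmodels. The leading ! runs the command from a Jupyter notebook cell; in a terminal, run pip install statsmodels instead.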
!pip install statsmodels
  2. Import the required libraries.
import numpy as np
import pandas as pd
from statsmodels.regression.linear_model import OLS
from statsmodels.tools.tools import add_constant
from statsmodels.stats.outliers_influence import OLSInfluence

  3. Prepare the data. We use a simple example to illustrate the process: the dataset below contains exam scores and heights for 100 students, and we will use it to compute the studentized residuals.

# 100 observations per column; pd.DataFrame requires both lists to have the same length
data = {'score': [58, 79, 66, 91, 87, 72, 82, 76, 92, 67, 55, 79, 85, 81, 80, 65, 82, 75, 70, 62, 88, 80, 77, 90, 73, 71, 82, 89, 94, 88, 100, 62, 86, 75, 84, 55, 76, 83, 74, 87, 86, 76, 71, 74, 83, 84, 82, 76, 63, 83, 85, 86, 94, 81, 89, 70, 71, 95, 69, 73, 74, 77, 57, 87, 80, 75, 94, 81, 89, 79, 85, 96, 78, 88, 59, 63, 71, 87, 91, 78, 63, 80, 97, 89, 82, 78, 80, 82, 77, 74, 99, 79, 85, 67, 82, 94, 59, 66, 91, 67],
        'height': [175, 184, 170, 185, 183, 171, 185, 181, 184, 169, 163, 179, 183, 181, 174, 170, 175, 178, 168, 174, 183, 181, 170, 181, 167, 168, 179, 184, 182, 184, 184, 165, 176, 172, 180, 161, 176, 173, 168, 179, 176, 172, 168, 175, 176, 182, 173, 169, 182, 167, 178, 179, 182, 187, 179, 180, 172, 167, 182, 166, 169, 171, 178, 162, 183, 179, 172, 189, 174, 185, 178, 182, 188, 175, 183, 167, 164, 172, 183, 190, 175, 166, 177, 188, 185, 181, 174, 178, 178, 175, 175, 190, 177, 177, 169, 174, 182, 166, 168, 181],
        }

df = pd.DataFrame(data=data)
df.head()

Output:

   score  height
0     58     175
1     79     184
2     66     170
3     91     185
4     87     183
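  4. Fit the regression model (ordinary least squares of score on height; add_constant adds the intercept column).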
X = add_constant(df['height'])
y = df['score']
model = OLS(y, X).fit()
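  5. Compute the studentized residuals. resid_studentized_internal returns the internally studentized residuals; use resid_studentized_external for the externally studentized (leave-one-out) version.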
influence = OLSInfluence(model)
studentized_residuals = influence.resid_studentized_internal
df['studentized_residuals'] = studentized_residuals
df.head()

Output: the first five rows of df, now with a studentized_residuals column added next to score and height.
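For reference, the standard definitions behind these two attributes are sketched below; to our understanding this is what statsmodels computes, but treat it as a summary rather than the library's documentation. Here $e_i = y_i - \hat{y}_i$ is the i-th residual, $h_{ii}$ is the i-th diagonal element of the hat matrix $H = X(X^\top X)^{-1}X^\top$, $n$ is the number of observations and $p$ the number of columns of $X$ (including the intercept). resid_studentized_internal corresponds to $r_i$, and resid_studentized_external to $t_i$, which uses the error variance estimated with observation $i$ left out:

```latex
\hat\sigma^2 = \frac{1}{n-p}\sum_{j=1}^{n} e_j^2, \qquad
r_i = \frac{e_i}{\hat\sigma\sqrt{1-h_{ii}}}

\hat\sigma_{(i)}^2 = \frac{(n-p)\,\hat\sigma^2 - e_i^2/(1-h_{ii})}{n-p-1}, \qquad
t_i = \frac{e_i}{\hat\sigma_{(i)}\sqrt{1-h_{ii}}}
```

Substituting the first pair into the second gives $t_i = r_i\sqrt{(n-p-1)/(n-p-r_i^2)}$, so the external version can be obtained directly from the internal one.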

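Another example uses the Boston housing dataset from scikit-learn. Note that load_boston was removed in scikit-learn 1.2, so this example only runs with scikit-learn versions earlier than 1.2.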

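# Requires scikit-learn < 1.2 (load_boston was removed in 1.2).
# pd, OLS, add_constant and OLSInfluence come from the imports in step 2.
from sklearn.datasets import load_boston

# Load the dataset
boston = load_boston()

# Convert the dataset to a pandas DataFrame
df = pd.DataFrame(boston['data'], columns=boston['feature_names'])
df['target'] = boston['target']

# Fit the regression model with statsmodels: OLSInfluence expects a fitted
# statsmodels results object, not a scikit-learn LinearRegression
X = add_constant(df[boston['feature_names']])
y = df['target']
model = OLS(y, X).fit()

# Compute the studentized residuals
influence = OLSInfluence(model)
studentized_residuals = influence.resid_studentized_internal
df['studentized_residuals'] = studentized_residuals

# Show the result
df.head()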

Output: the first five rows of df, showing the 13 feature columns (CRIM, ZN, INDUS, CHAS, NOX, RM, AGE, DIS, RAD, TAX, PTRATIO, B, LSTAT), the target column and the new studentized_residuals column.
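Both examples delegate the computation to OLSInfluence. To see what it does under the hood, here is a minimal from-scratch sketch in NumPy: each residual is divided by sigma * sqrt(1 - h_ii), with sigma estimated either from all observations (internal) or with observation i left out (external). The helper name studentized_residuals and the synthetic data are hypothetical, chosen only for illustration; the sketch assumes the design matrix has full column rank, and it cross-checks its results against statsmodels.

```python
import numpy as np
from statsmodels.regression.linear_model import OLS
from statsmodels.stats.outliers_influence import OLSInfluence


def studentized_residuals(X, y):
    """Hypothetical helper: internal and external studentized residuals of an OLS fit.

    X is the n x p design matrix (include a column of ones for an intercept)
    and is assumed to have full column rank.
    """
    n, p = X.shape
    # Least-squares coefficients and ordinary residuals e = y - X @ beta
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta
    # Leverages h_ii = diag(X (X^T X)^-1 X^T), computed via a reduced QR of X
    Q, _ = np.linalg.qr(X)
    h = np.sum(Q**2, axis=1)
    # Error variance estimated from all n observations (n - p degrees of freedom)
    sigma2 = e @ e / (n - p)
    internal = e / np.sqrt(sigma2 * (1 - h))
    # Leave-one-out error variance: drop observation i's share of the residual sum of squares
    sigma2_loo = ((n - p) * sigma2 - e**2 / (1 - h)) / (n - p - 1)
    external = e / np.sqrt(sigma2_loo * (1 - h))
    return internal, external


# Hypothetical synthetic data: y = 2 + 0.5 * x + noise
rng = np.random.default_rng(0)
x = rng.normal(size=50)
X = np.column_stack([np.ones_like(x), x])
y = 2 + 0.5 * x + rng.normal(scale=0.3, size=x.size)

internal, external = studentized_residuals(X, y)

# Cross-check against statsmodels; each line prints True when the results agree
infl = OLSInfluence(OLS(y, X).fit())
print(np.allclose(internal, infl.resid_studentized_internal))
print(np.allclose(external, infl.resid_studentized_external))
```

The leverages come from a QR decomposition rather than an explicit matrix inverse, which avoids forming (X^T X)^-1 and is numerically more stable.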

This completes the process of computing studentized residuals in Python.
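A common next step is to use studentized residuals to screen for outliers. The sketch below uses hypothetical synthetic data with one planted outlier at row 10. It takes the externally studentized residuals from results.get_influence(), which returns the same kind of OLSInfluence object used above, and flags rows with |t| > 3. That cutoff is a rule of thumb assumed here (2 and 3 are both common choices), not something the steps above prescribe.

```python
import numpy as np
from statsmodels.regression.linear_model import OLS
from statsmodels.tools.tools import add_constant

# Hypothetical synthetic data: a linear trend plus one planted outlier at index 10
rng = np.random.default_rng(42)
x = np.linspace(0, 10, 40)
y = 1.0 + 2.0 * x + rng.normal(scale=1.0, size=x.size)
y[10] += 15.0

results = OLS(y, add_constant(x)).fit()

# get_influence() returns an OLSInfluence object for the fitted model
t = results.get_influence().resid_studentized_external

# Rule-of-thumb cutoff (an assumption; 2 or 3 are common choices)
threshold = 3.0
flagged = np.flatnonzero(np.abs(t) > threshold)
print("possible outliers at rows:", flagged)
```

The external version is used here because the left-out variance estimate is not inflated by the suspect point itself, so a genuine outlier stands out more clearly than with the internal version.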