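Statsmodels multiple regression analysis

1 Overview

Multiple regression analysis in Python with Statsmodels, fitted by ordinary least squares (sm.OLS).

2 Example 1: Ad retention rate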

import pandas as pd
import statsmodels.api as sm

# Sample data: 10 observations of radio ads, TV ads, and ad retention
df = pd.DataFrame({
    'radio_ads': [3,4,9,4,5,5,2,6,5,3],
    'tv_ads':    [1,3,4,1,4,1,4,2,4,2],
    'retention': [5,1,6,2,8,3,4,9,7,4],
})

# Predictors (X) and response (y)
X = df[['radio_ads','tv_ads']]
y = df['retention']

# add_constant prepends an intercept column named 'const'
X = sm.add_constant(X)
model = sm.OLS(y, X)
result = model.fit()

print(result.params)
print("R²=", result.rsquared)

const        1.366972
radio_ads    0.472477
tv_ads       0.522936
dtype: float64
R²= 0.2516683990901012

→ Regression equation [math]\displaystyle{ y = 0.472477 x_1 + 0.522936 x_2 + 1.366972 }[/math]

→ Coefficient of determination [math]\displaystyle{ R^2 = 0.2516683990901012 }[/math]
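
As a cross-check on the fitted values in this example, the same coefficients can be reproduced without Statsmodels by solving the least-squares problem directly. The following is a minimal sketch using NumPy's np.linalg.lstsq; the variable names (A, beta, r2) are illustrative and not part of the original example.

```python
import numpy as np

radio_ads = np.array([3, 4, 9, 4, 5, 5, 2, 6, 5, 3], dtype=float)
tv_ads = np.array([1, 3, 4, 1, 4, 1, 4, 2, 4, 2], dtype=float)
retention = np.array([5, 1, 6, 2, 8, 3, 4, 9, 7, 4], dtype=float)

# Design matrix with a leading column of ones for the intercept
A = np.column_stack([np.ones_like(radio_ads), radio_ads, tv_ads])

# Least-squares solution of A @ beta ≈ retention
beta, *_ = np.linalg.lstsq(A, retention, rcond=None)

# R² = 1 - SS_res / SS_tot
residuals = retention - A @ beta
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((retention - retention.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot

print(beta)  # order: intercept, radio_ads, tv_ads
print(r2)
```

The entries of beta should agree with const, radio_ads, and tv_ads in result.params, and r2 with result.rsquared, up to floating-point rounding.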

3 Example 2: Bakery sales

import pandas as pd
df = pd.read_csv('https://raw.githubusercontent.com/jmnote/zdata/master/multiple-regression/bakery-sales.csv')
df

X = df[['floor_space','distance_to_station']]
y = df['sales']

import statsmodels.api as sm
X = sm.add_constant(X)
model = sm.OLS(y, X)
result = model.fit()

print( result.params )
print( "R²=", result.rsquared )

→ Regression equation [math]\displaystyle{ y = 41.513478 x_1 - 0.340883 x_2 + 65.323916 }[/math]

→ Coefficient of determination [math]\displaystyle{ R^2 = 0.945235852681711 }[/math]
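
To illustrate how the bakery regression equation is used, the sketch below plugs values into it. The coefficients are the ones reported for this example; the function name predict_sales and the input values (floor_space 10, distance_to_station 80) are hypothetical and chosen only for illustration.

```python
# Coefficients taken from the bakery regression equation
INTERCEPT = 65.323916
COEF_FLOOR_SPACE = 41.513478
COEF_DISTANCE = -0.340883

def predict_sales(floor_space, distance_to_station):
    """Evaluate y = 41.513478*x1 - 0.340883*x2 + 65.323916."""
    return (COEF_FLOOR_SPACE * floor_space
            + COEF_DISTANCE * distance_to_station
            + INTERCEPT)

# Hypothetical store: floor_space=10, distance_to_station=80
predicted = predict_sales(10, 80)
print(round(predicted, 3))  # → 453.188
```

With a fitted Statsmodels result, result.predict can be given new rows (including the constant column) to obtain the same kind of prediction from the unrounded coefficients.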

4 Example 3: Boston

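# Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this example needs scikit-learn < 1.2.
from sklearn.datasets import load_boston
import pandas as pd
import statsmodels.api as sm

boston = load_boston()
X = pd.DataFrame(boston.data, columns=boston.feature_names)
y = boston.target

# Add the intercept column, fit by OLS, and print the full summary table
X = sm.add_constant(X)
model = sm.OLS(y, X)
result = model.fit()
print(result.summary())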


5 See also