Sklearn logistic regression

1 Overview

Logistic regression analysis with sklearn (scikit-learn)

2 Example 1: Study hours and probability of passing

Python
import pandas as pd
df = pd.DataFrame({
'hours': [0.50,0.75,1.00,1.25,1.50,1.75,1.75,2.00,2.25,2.50,2.75,3.00,3.25,3.50,4.00,4.25,4.50,4.75,5.00,5.50],
'pass': [0,0,0,0,0,0,1,0,1,0,1,0,1,0,1,1,1,1,1,1],
})

X = df[['hours']]
Y = df['pass']

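from sklearn.linear_model import LogisticRegression
# A very large C makes the default L2 penalty negligible,
# so the fit approximates plain maximum-likelihood estimation.
reg = LogisticRegression(C=100000).fit(X, Y)
print( reg.coef_ )
print( reg.intercept_ )
# score() returns the mean classification accuracy, not R²
print( "accuracy=", reg.score(X, Y) )
[[1.50463927]]
[-4.07770207]
accuracy= 0.8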
→ Regression equation: [math]\displaystyle{ y = \dfrac{1}{1 + \exp(-(1.50463927 x_1 - 4.07770207))} }[/math], where x_1 is hours and y is the predicted probability of passing
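
As a rough illustration of how the fitted equation can be used, the sketch below evaluates it by hand for a few study times and finds the point where the predicted probability crosses 0.5. The coefficient and intercept are copied from the output above; the names `pass_probability` and `boundary_hours` are hypothetical helpers introduced only for this example.

```python
import math

# Values copied from the fitted model output in Example 1.
COEF = 1.50463927
INTERCEPT = -4.07770207

def pass_probability(hours: float) -> float:
    """Predicted probability of passing after `hours` of study."""
    z = COEF * hours + INTERCEPT          # linear predictor (log-odds)
    return 1.0 / (1.0 + math.exp(-z))     # logistic (sigmoid) function

# The probability is exactly 0.5 where the linear predictor is zero.
boundary_hours = -INTERCEPT / COEF

for h in (1, 2, 3, 4, 5):
    print(f"{h} h -> {pass_probability(h):.2f}")
print(f"p = 0.5 at about {boundary_hours:.2f} h")
```

Under these coefficients, the model predicts a pass probability of roughly 0.26 for 2 hours and roughly 0.87 for 4 hours, with the 0.5 threshold near 2.7 hours.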

3 Example 2: Probability of selling the special

Python
import pandas as pd
df = pd.read_csv('https://raw.githubusercontent.com/jmnote/zdata/master/logistic-regression/special-sales.csv')
print( df )

Y = df['special_sales']
X = df[['busy_day','high_temperature']]

from sklearn.linear_model import LogisticRegression
# A very large C makes the default L2 penalty negligible,
# so the fit approximates plain maximum-likelihood estimation.
reg = LogisticRegression(C=100000).fit(X, Y)
print( reg.coef_ )
print( reg.intercept_ )
# score() returns the mean classification accuracy, not R²
print( "accuracy=", reg.score(X, Y) )
          date weekday  busy_day  high_temperature  special_sales
0   2002-08-05     Mon         0                28              1
1   2002-08-06     Tue         0                24              0
2   2002-08-07     Wed         1                26              0
3   2002-08-08     Thu         0                24              0
4   2002-08-09     Fri         0                23              0
5   2002-08-10     Sat         1                28              1
6   2002-08-11     Sun         1                24              0
7   2002-08-12     Mon         0                26              1
8   2002-08-13     Tue         0                25              0
9   2002-08-14     Wed         1                28              1
10  2002-08-15     Thu         0                21              0
11  2002-08-16     Fri         0                22              0
12  2002-08-17     Sat         1                27              1
13  2002-08-18     Sun         1                26              1
14  2002-08-19     Mon         0                26              0
15  2002-08-20     Tue         0                21              0
16  2002-08-21     Wed         1                21              1
17  2002-08-22     Thu         0                27              0
18  2002-08-23     Fri         0                23              0
19  2002-08-24     Sat         1                22              0
20  2002-08-25     Sun         1                24              1
[[2.44261279 0.54450301]]
[-15.20342824]
accuracy= 0.8095238095238095
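→ Regression equation: [math]\displaystyle{ y = \dfrac{1}{1 + \exp(-(2.44261279 x_1 + 0.54450301 x_2 - 15.20342824))} }[/math], where x_1 is busy_day, x_2 is high_temperature, and y is the predicted probability of a special sale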
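
One way to read the two coefficients is through predicted probabilities and odds ratios. The sketch below plugs the values from the output above into the equation and exponentiates each coefficient; `sale_probability` and the two `*_odds_ratio` names are hypothetical, introduced only for illustration.

```python
import math

# Values copied from the fitted model output in Example 2.
COEF_BUSY_DAY = 2.44261279
COEF_HIGH_TEMP = 0.54450301
INTERCEPT = -15.20342824

def sale_probability(busy_day: int, high_temperature: float) -> float:
    """Predicted probability that the special sells on a given day."""
    z = (COEF_BUSY_DAY * busy_day
         + COEF_HIGH_TEMP * high_temperature
         + INTERCEPT)
    return 1.0 / (1.0 + math.exp(-z))

# exp(coefficient) is the factor by which the odds p / (1 - p) change
# when that feature increases by one unit and the other is held fixed.
busy_day_odds_ratio = math.exp(COEF_BUSY_DAY)
per_degree_odds_ratio = math.exp(COEF_HIGH_TEMP)

print(f"busy day, 28 degrees:     {sale_probability(1, 28):.2f}")
print(f"non-busy day, 24 degrees: {sale_probability(0, 24):.2f}")
print(f"odds ratio for busy_day:         {busy_day_odds_ratio:.1f}")
print(f"odds ratio per temperature unit: {per_degree_odds_ratio:.2f}")
```

With these coefficients, a busy day multiplies the odds of a sale by roughly 11.5, and each additional unit of high_temperature multiplies them by roughly 1.7.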

4 Example 3: Breast cancer diagnosis

Python
from sklearn.datasets import load_breast_cancer
breast_cancer = load_breast_cancer()
X = breast_cancer.data
y = breast_cancer.target

from sklearn.linear_model import LogisticRegression
model = LogisticRegression(solver='newton-cg')
result = model.fit(X, y)

for f, w in zip(breast_cancer.feature_names, result.coef_[0]):
  print(f"{f:<25} {w}")
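
The example above fits on the full dataset with unscaled features, so the printed weights are not directly comparable across features and no held-out accuracy is reported. A possible variation, sketched below, standardizes the features and evaluates on a test split. The split ratio, `random_state`, and the use of `StandardScaler` are assumptions of this sketch, not part of the original example.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# Hold out 25% of the samples, keeping the class ratio (assumed settings).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Standardize the features, then fit with the same solver as above.
model = make_pipeline(StandardScaler(), LogisticRegression(solver='newton-cg'))
model.fit(X_train, y_train)

# score() on a classifier pipeline returns mean accuracy.
test_accuracy = model.score(X_test, y_test)
print("test accuracy =", test_accuracy)

# Class probabilities for the first three held-out samples.
proba = model.predict_proba(X_test[:3])
print(proba)
```

After standardization, the magnitudes of `model[-1].coef_[0]` give a rough sense of each feature's relative influence, although correlated features can still make individual weights hard to interpret.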

5 See also