1 Overview
- One Hot Encoding in R
- R one-hot encoding (Korean: R 원핫 인코딩)
2 Automatic
- Automatically creates one column for each unique value
2.1 unique + ifelse ★
- No additional libraries required ★
- The column prefix (here, subject_) can be chosen freely
df = read.table( header=TRUE, stringsAsFactors=FALSE, text="
group student exam_passed subject
A 1 Y Math
A 1 N Science
A 1 Y Japanese
A 2 N Math
A 2 Y Science
B 1 Y Japanese
C 2 N Math
")
# Add one 0/1 column per unique subject, named subject_<value>
for(v in unique(df$subject)) df[paste0("subject_",v)] <- ifelse(df$subject==v,1,0)
print( df )
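The loop above can be wrapped in a small helper so that the prefix becomes a parameter. This is a minimal base R sketch: the function name add_one_hot and its arguments are hypothetical (not part of any package), and the sample data is made up for illustration.

```r
# Hypothetical helper: adds one 0/1 column per unique value of df[[col]].
# `prefix` defaults to "<col>_", matching the naming used on this page.
add_one_hot <- function(df, col, prefix = paste0(col, "_")) {
  for (v in unique(df[[col]])) {
    df[[paste0(prefix, v)]] <- ifelse(df[[col]] == v, 1, 0)
  }
  df
}

# Illustrative data only
df <- data.frame(
  student = c(1, 1, 2),
  subject = c("Math", "Science", "Math"),
  stringsAsFactors = FALSE
)

# Use a custom prefix instead of the default "subject_"
out <- add_one_hot(df, "subject", prefix = "subj.")
print(names(out))
```

Returning a modified copy rather than changing df in place keeps the original data frame available for comparison.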
2.2 predict + dummyVars
df = read.table( header=TRUE, stringsAsFactors=FALSE, text="
group student exam_passed subject
A 1 Y Math
A 1 N Science
A 1 Y Japanese
A 2 N Math
A 2 Y Science
B 1 Y Japanese
C 2 N Math
")
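library(caret)  # provides dummyVars()
# Build the encoder from the formula, apply it with predict(), and append the result
df = cbind(df, predict(dummyVars(~subject,df), df))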
print( df )
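If caret is not available, a similar automatic expansion can be sketched with model.matrix() from the stats package that ships with R. Note that this is a different technique from dummyVars, shown only as a library-free alternative; the sample data is made up for illustration.

```r
# Illustrative data only
df <- data.frame(
  student = c(1, 1, 2),
  subject = c("Math", "Science", "Japanese"),
  stringsAsFactors = FALSE
)

# `- 1` drops the intercept so every level gets its own indicator column
# (otherwise the first level would be absorbed into the intercept).
dummies <- model.matrix(~ subject - 1, data = df)

# Append the indicator columns to the original data frame
out <- cbind(df, dummies)
print(out)
```

The generated column names are the variable name followed directly by the level (for example subjectMath), so rename them afterwards if a separator such as subject_ is wanted.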
3 Manual
- Creates only the columns you specify
3.1 mutate + ifelse
df = read.table( header=TRUE, stringsAsFactors=FALSE, text="
group student exam_passed subject
A 1 Y Math
A 1 N Science
A 1 Y Japanese
A 2 N Math
A 2 Y Science
B 1 Y Japanese
C 2 N Math
")
library(dplyr, warn.conflicts=FALSE)  # provides mutate() and %>%; tidyr is not needed here
df = df %>% mutate(
subject_Math = ifelse(subject=='Math', 1, 0),
subject_Science = ifelse(subject=='Science', 1, 0),
subject_Japanese = ifelse(subject=='Japanese', 1, 0),
)
print( df )
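The same manual approach also works without dplyr. This is a minimal base R sketch, assuming integer 0/1 columns are acceptable (the mutate version above produces numeric 1/0); the sample data is made up for illustration.

```r
# Illustrative data only
df <- data.frame(
  student = c(1, 1, 2),
  subject = c("Math", "Science", "Japanese"),
  stringsAsFactors = FALSE
)

# Only the listed values get a column; "Japanese" is deliberately left out.
# as.integer() turns the logical comparison into 1L/0L.
df$subject_Math    <- as.integer(df$subject == "Math")
df$subject_Science <- as.integer(df$subject == "Science")
print(df)
```

Listing the columns by hand is useful when only some values matter, or when the set of output columns must stay fixed regardless of which values appear in the data.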
4 See also
Editor: Jmnote