R one-hot encoding

1 Overview

One-hot encoding in R: turning a categorical column into one 0/1 indicator column per distinct value.

2 Automatic

  • Automatically creates a column for every unique value

2.1 unique + ifelse ★

  • No additional libraries required ★
  • Any prefix (here subject_) can be chosen freely
df = read.table( header=TRUE, stringsAsFactors=FALSE, text="
group   student exam_passed subject 
A       1       Y           Math
A       1       N           Science
A       1       Y           Japanese
A       2       N           Math
A       2       Y           Science
B       1       Y           Japanese
C       2       N           Math
")
# Add one 0/1 column per distinct subject, named subject_<value>
for (v in unique(df$subject)) df[paste0("subject_", v)] <- ifelse(df$subject == v, 1, 0)
print(df)
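
The loop above can be wrapped in a small reusable function. The following is only a minimal sketch: the helper name one_hot and its arguments col and prefix are hypothetical (they are not from this page or from any library), and it assumes the column has no missing values.

```r
# Hypothetical helper: one-hot encode a single column with unique + ifelse.
# Assumes df[[col]] contains no NA values.
one_hot <- function(df, col, prefix = paste0(col, "_")) {
  for (v in unique(df[[col]])) {
    # One 0/1 column per distinct value, named <prefix><value>
    df[[paste0(prefix, v)]] <- ifelse(df[[col]] == v, 1, 0)
  }
  df
}

df <- data.frame(
  student = c(1, 1, 2),
  subject = c("Math", "Science", "Math"),
  stringsAsFactors = FALSE
)
encoded <- one_hot(df, "subject")
print(encoded)
```

Passing a different prefix (for example prefix = "is_") changes only the generated column names, which is the main flexibility this approach offers.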

2.2 predict + dummyVars

df = read.table( header=TRUE, stringsAsFactors=FALSE, text="
group   student exam_passed subject 
A       1       Y           Math
A       1       N           Science
A       1       Y           Japanese
A       2       N           Math
A       2       Y           Science
B       1       Y           Japanese
C       2       N           Math
")
library(caret)
# dummyVars() builds the encoder from the formula; predict() produces the indicator columns
df = cbind(df, predict(dummyVars(~subject, df), df))
print(df)
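
If installing caret is not an option, a similar automatic result can probably be obtained with base R's model.matrix. This is a different technique from dummyVars, shown here only as a sketch with made-up sample data; the generated column names follow model.matrix conventions (variable name immediately followed by the level), not a subject_ prefix.

```r
# Sample data invented for this sketch
df <- data.frame(
  student = c(1, 1, 2),
  subject = c("Math", "Science", "Japanese"),
  stringsAsFactors = FALSE
)

# "- 1" drops the intercept so that every level gets its own indicator column
dummies <- model.matrix(~ subject - 1, data = df)

# Attach the indicator columns to the original data frame
encoded <- cbind(df, dummies)
print(encoded)
```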

3 Manual

  • Creates only the columns you name explicitly

3.1 mutate + ifelse

df = read.table( header=TRUE, stringsAsFactors=FALSE, text="
group   student exam_passed subject 
A       1       Y           Math
A       1       N           Science
A       1       Y           Japanese
A       2       N           Math
A       2       Y           Science
B       1       Y           Japanese
C       2       N           Math
")
library(dplyr, warn.conflicts = FALSE)
# Name each indicator column explicitly; categories not listed here get no column
df = df %>% mutate(
  subject_Math = ifelse(subject == 'Math', 1, 0),
  subject_Science = ifelse(subject == 'Science', 1, 0),
  subject_Japanese = ifelse(subject == 'Japanese', 1, 0)
)
print(df)
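
Because the manual approach only creates the columns that were typed out, it is easy to miss a category. One possible sanity check, sketched below in base R with made-up sample data and hypothetical variable names, is to confirm that every row has exactly one indicator set and that no category is left without a column.

```r
# Sample data invented for this sketch, with manually created indicator columns
df <- data.frame(
  subject = c("Math", "Science", "Japanese", "Math"),
  stringsAsFactors = FALSE
)
df$subject_Math     <- ifelse(df$subject == "Math", 1, 0)
df$subject_Science  <- ifelse(df$subject == "Science", 1, 0)
df$subject_Japanese <- ifelse(df$subject == "Japanese", 1, 0)

# Collect the indicator columns by their shared prefix
indicator_cols <- grep("^subject_", names(df), value = TRUE)

# Each row should have exactly one indicator equal to 1
all_covered <- all(rowSums(df[indicator_cols]) == 1)

# Categories present in the data but without a matching column
uncovered <- setdiff(unique(df$subject), sub("^subject_", "", indicator_cols))

print(all_covered)
print(uncovered)
```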

4 See also
