R one-hot encoding

Revision by Jmnote as of 19:16, 5 May 2020 (Tue)

1 Overview

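One Hot Encoding in R
R one-hot encoding

One-hot encoding turns a categorical column into one 0/1 indicator column per distinct value: a row gets 1 in the column for its own value and 0 in all the others.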

2 Automatic

  • Automatically creates one column for every unique value

2.1 Basic ★

df = read.table( header=TRUE, stringsAsFactors=FALSE, text="
group   student exam_passed subject 
A       1       Y           Math
A       1       N           Science
A       1       Y           Japanese
A       2       N           Math
A       2       Y           Science
B       1       Y           Japanese
C       2       N           Math
")
# Add one 0/1 column per unique value of subject: subject_Math, subject_Science, subject_Japanese
for(v in unique(df$subject)) df[paste0("subject_",v)] <- ifelse(df$subject==v,1,0)
print( df )
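The loop above handles a single column. A minimal sketch of the same idea applied to several columns at once follows; `one_hot_cols` is a hypothetical helper name, not a function from any package, and `sort()` is an added choice so the new columns appear in alphabetical order rather than in order of first appearance.

```r
# Hypothetical helper: runs the "one column per unique value" loop over several columns
one_hot_cols <- function(df, cols) {
  for (col in cols) {
    # sort() gives a deterministic (alphabetical) column order
    for (v in sort(unique(df[[col]]))) {
      # 1 where the row has value v, 0 elsewhere
      df[[paste0(col, "_", v)]] <- ifelse(df[[col]] == v, 1, 0)
    }
  }
  df
}

df <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
group   student exam_passed subject
A       1       Y           Math
A       1       N           Science
A       1       Y           Japanese
A       2       N           Math
A       2       Y           Science
B       1       Y           Japanese
C       2       N           Math
")

# Encode both group and subject in one call
out <- one_hot_cols(df, c("group", "subject"))
print(out)
```

The result keeps the original four columns and appends group_A, group_B, group_C, subject_Japanese, subject_Math and subject_Science.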

2.2 caret

df = read.table( header=TRUE, stringsAsFactors=FALSE, text="
group   student exam_passed subject 
A       1       Y           Math
A       1       N           Science
A       1       Y           Japanese
A       2       N           Math
A       2       Y           Science
B       1       Y           Japanese
C       2       N           Math
")
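library(caret)  # requires the caret package: install.packages("caret")
# dummyVars() builds the encoding from the formula ~subject; predict() applies it to df
# and returns a 0/1 matrix with one column per subject value, which cbind() appends to df
df = cbind(df, predict(dummyVars(~subject,df), df))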
print( df )
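For comparison, base R's `model.matrix()` can build the same kind of indicator matrix without loading caret. This is a sketch, not a description of caret's internals. The intermediate names `subject_f` and `oh` are illustrative, and renaming the columns to the `subject_<value>` style of 2.1 is an added choice; model.matrix's own names would be `subject_fJapanese` and so on.

```r
df <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
group   student exam_passed subject
A       1       Y           Math
A       1       N           Science
A       1       Y           Japanese
A       2       N           Math
A       2       Y           Science
B       1       Y           Japanese
C       2       N           Math
")

# Explicit factor: its levels (alphabetical by default) fix the column order
subject_f <- factor(df$subject)

# "- 1" removes the intercept, so every level gets its own 0/1 column
oh <- model.matrix(~ subject_f - 1)

# Rename to the subject_<value> style used in 2.1
colnames(oh) <- paste0("subject_", levels(subject_f))

df <- cbind(df, oh)
print(df)
```

Without `- 1`, model.matrix keeps an intercept and drops one level (treatment contrasts), which gives dummy coding rather than full one-hot coding.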

3 Manual

  • Creates only the columns you specify

3.1 mutate + ifelse

df = read.table( header=TRUE, stringsAsFactors=FALSE, text="
group   student exam_passed subject 
A       1       Y           Math
A       1       N           Science
A       1       Y           Japanese
A       2       N           Math
A       2       Y           Science
B       1       Y           Japanese
C       2       N           Math
") 
library(dplyr, warn.conflicts=FALSE)
# One ifelse() per value you want as its own column
df = df %>% mutate(
  subject_Math = ifelse(subject=='Math', 1, 0),
  subject_Science = ifelse(subject=='Science', 1, 0),
  subject_Japanese = ifelse(subject=='Japanese', 1, 0)
)
print( df )
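Because the manual approach fixes the list of columns in advance, new data always gets the same columns. A listed value that does not occur still gets an all-zero column, and a value that was never listed gets 0 in every column. A minimal base-R sketch of that behaviour, without dplyr, follows. `encode_subject`, `subjects` and `new_df` are hypothetical names, and `as.integer()` stands in for `ifelse(..., 1, 0)`, returning integer rather than double values.

```r
# Values to encode, listed by hand as in the mutate() example
subjects <- c("Math", "Science", "Japanese")

# Hypothetical helper: adds subject_<value> for each listed value only
encode_subject <- function(df, values) {
  for (v in values) {
    # as.integer(TRUE/FALSE) gives 1L/0L
    df[[paste0("subject_", v)]] <- as.integer(df$subject == v)
  }
  df
}

# Hypothetical new data: Science and Japanese are absent, History was never listed
new_df <- data.frame(student = c(3, 4),
                     subject = c("Math", "History"),
                     stringsAsFactors = FALSE)

encoded <- encode_subject(new_df, subjects)
print(encoded)
# All three subject_ columns exist; the History row is 0 in every one of them
```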

4 See also
