"R 원핫 인코딩"의 두 판 사이의 차이

Latest revision as of 19:22, 5 May 2020 (Tue)

1 Overview

One Hot Encoding in R
R one-hot encoding
R 원핫 인코딩 (R one-hot encoding)

2 Automatic

Automatically create one column for every unique value

2.1 unique + ifelse ★

No additional libraries required ★
A prefix (here, subject_) can be chosen freely

df = read.table( header=TRUE, stringsAsFactors=FALSE, text="
group   student exam_passed subject 
A       1       Y           Math
A       1       N           Science
A       1       Y           Japanese
A       2       N           Math
A       2       Y           Science
B       1       Y           Japanese
C       2       N           Math
")
# Add one 0/1 column per unique subject value, named subject_<value>
for(v in unique(df$subject)) df[paste0("subject_",v)] <- ifelse(df$subject==v,1,0)
print( df )
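
The same loop can be wrapped in a reusable function. The sketch below is only an illustration: `one_hot()` and its `prefix` argument are hypothetical names, not part of base R or of the original page, and the small data frame is made up for the example.

```r
# Hypothetical helper (not a base R function): one-hot encode one column.
one_hot <- function(data, column, prefix = paste0(column, "_")) {
  for (v in unique(data[[column]])) {
    # as.integer() turns TRUE/FALSE into 1/0
    data[[paste0(prefix, v)]] <- as.integer(data[[column]] == v)
  }
  data
}

# Made-up sample data for illustration
df <- data.frame(
  student = c(1, 1, 2),
  subject = c("Math", "Science", "Math"),
  stringsAsFactors = FALSE
)
encoded <- one_hot(df, "subject")
print(encoded)
```

Passing a different `prefix` changes the names of the generated columns without touching the rest of the code.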

2.2 predict + dummyVars

df = read.table( header=TRUE, stringsAsFactors=FALSE, text="
group   student exam_passed subject 
A       1       Y           Math
A       1       N           Science
A       1       Y           Japanese
A       2       N           Math
A       2       Y           Science
B       1       Y           Japanese
C       2       N           Math
")
library(caret)
# dummyVars() builds the encoder; predict() returns the indicator columns
df = cbind(df, predict(dummyVars(~subject,df), df))
print( df )
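
For comparison, when caret is not installed, a similar result can probably be had from base R's `model.matrix()`. This is a swapped-in technique, not the method shown on this page, and the sample data is made up for the example.

```r
# Made-up sample data for illustration
df <- data.frame(
  student = c(1, 1, 2),
  subject = c("Math", "Science", "Math"),
  stringsAsFactors = FALSE
)

# "- 1" drops the intercept so every level gets its own indicator column
dummies <- model.matrix(~ subject - 1, data = df)

# Attach the indicator columns to the original data frame
encoded <- cbind(df, dummies)
print(encoded)
```

The generated column names are the variable name followed directly by the level (for example `subjectMath`), so rename them if a `subject_` style prefix is wanted.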

3 Manual

Create the columns by naming each one explicitly

3.1 mutate + ifelse

df = read.table( header=TRUE, stringsAsFactors=FALSE, text="
group   student exam_passed subject 
A       1       Y           Math
A       1       N           Science
A       1       Y           Japanese
A       2       N           Math
A       2       Y           Science
B       1       Y           Japanese
C       2       N           Math
") 
library(dplyr, warn.conflicts=FALSE)
# One indicator column per subject, each listed by hand
df = df %>% mutate(
  subject_Math = ifelse(subject=='Math', 1, 0),
  subject_Science = ifelse(subject=='Science', 1, 0),
  subject_Japanese = ifelse(subject=='Japanese', 1, 0)
)
print( df )
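
Because the manual approach lists each column by hand, a value can be left out by mistake. The following is a minimal sketch of a sanity check, written in base R with made-up sample data; the variable names `indicator_cols`, `row_totals` and `missing` are chosen only for this example.

```r
# Made-up sample data for illustration
df <- data.frame(
  subject = c("Math", "Science", "Japanese", "Math"),
  stringsAsFactors = FALSE
)

# Manual encoding, one hand-written column per value
df$subject_Math     <- ifelse(df$subject == "Math", 1, 0)
df$subject_Science  <- ifelse(df$subject == "Science", 1, 0)
df$subject_Japanese <- ifelse(df$subject == "Japanese", 1, 0)

# Every row should have exactly one indicator set to 1
indicator_cols <- grep("^subject_", names(df), value = TRUE)
row_totals <- rowSums(df[indicator_cols])
stopifnot(all(row_totals == 1))

# Values present in the data that have no indicator column
missing <- setdiff(paste0("subject_", unique(df$subject)), indicator_cols)
print(missing)
```

If a subject were forgotten, its rows would sum to 0 and `stopifnot()` would stop with an error.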

4 See also

One-hot encoding
R unique()
R dummy_cols()
Adding dummy variables in R
Category: R data preprocessing