ar <- read.csv("AR.csv", header = TRUE, fileEncoding = "UTF-8-BOM")
Load the data to be used.
ar$Personal_status___sex[ar$Personal_status___sex == '1'] <- 'divorced male'
ar$Personal_status___sex[ar$Personal_status___sex == '2'] <- 'divorced female'
ar$Personal_status___sex[ar$Personal_status___sex == '3'] <- 'single male'
ar$Personal_status___sex[ar$Personal_status___sex == '4'] <- 'married male'
ar$Personal_status___sex[ar$Personal_status___sex == '5'] <- 'single female'
ar$Age <- ifelse(ar$Age < 20, 'age 10-20',
          ifelse(ar$Age < 30, 'age 20-30',
          ifelse(ar$Age < 40, 'age 30-40',
          ifelse(ar$Age < 50, 'age 40-50',
          ifelse(ar$Age < 60, 'age 50-60',
          ifelse(ar$Age < 70, 'age 60-70', 'age 70-80'))))))
ar$Personal_status___sex <- as.factor(ar$Personal_status___sex)
ar$Age <- as.factor(ar$Age)
ar$Credit_status <- as.factor(ar$Credit_status)
Recode the numeric codes, bin the ages, and convert every variable to a factor so that all of them are categorical.
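The nested `ifelse` calls for age can also be written with base R's `cut`. This is a minimal sketch, assuming the ages are numeric; the sample vector `age` is made up for illustration, and the labels mirror the ones used above.

```r
# Hypothetical sample ages, for illustration only (not taken from AR.csv)
age <- c(19, 20, 35, 59, 70, 75)

# Left-closed bins [a, b), matching the `<` comparisons in the ifelse chain
age_group <- cut(
  age,
  breaks = c(-Inf, 20, 30, 40, 50, 60, 70, Inf),
  labels = c("age 10-20", "age 20-30", "age 30-40", "age 40-50",
             "age 50-60", "age 60-70", "age 70-80"),
  right = FALSE
)

# cut() already returns a factor, so no separate as.factor() call is needed
print(table(age_group))
```

Because `cut` returns a factor directly, this form would also replace the later `as.factor(ar$Age)` step.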
library(arules)
rules <- apriori(ar[, 2:4], parameter = list(supp = 0.2, conf = 0.2))
inspect(rules)
The `apriori` function in the arules package generates the rules.
The `parameter` argument constrains which rules are generated.
Here the minimum support is set to 0.2 and the minimum confidence to 0.2.
rules <- subset(rules, subset = lift > 1) # lift > 1
rules.sorted <- sort(rules, by="lift")
inspect(rules.sorted)
     lhs                              rhs                           support confidence lift     count
[1]  {Age=[38,75]}                 => {Personal_status___sex=[3,4]} 0.245   0.7122093  1.112827 245
[2]  {Personal_status___sex=[3,4]} => {Age=[38,75]}                 0.245   0.3828125  1.112827 245
[3]  {Credit_status=N}             => {Age=[38,75]}                 0.256   0.3657143  1.063123 256
[4]  {Age=[38,75]}                 => {Credit_status=N}             0.256   0.7441860  1.063123 256
[5]  {Age=[28,38)}                 => {Personal_status___sex=[3,4]} 0.248   0.6794521  1.061644 248
[6]  {Personal_status___sex=[3,4]} => {Age=[28,38)}                 0.248   0.3875000  1.061644 248
[7]  {Personal_status___sex=[3,4]} => {Credit_status=N}             0.469   0.7328125  1.046875 469
[8]  {Credit_status=N}             => {Personal_status___sex=[3,4]} 0.469   0.6700000  1.046875 469
[9]  {Age=[28,38)}                 => {Credit_status=N}             0.260   0.7123288  1.017613 260
[10] {Credit_status=N}             => {Age=[28,38)}                 0.260   0.3714286  1.017613 260
The `subset` function keeps only the rules that satisfy lift > 1.
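The `sort` function then orders them by lift from highest to lowest, and `inspect` prints the result. The item labels in this output, such as `Age=[38,75]`, are interval labels rather than the recoded categories above, so the output appears to come from a run on the original numeric columns.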
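To make the columns of this output concrete, the values for rule [1] can be recomputed by hand. This is a minimal sketch in base R. The total of 1000 rows follows from count / support (245 / 0.245). The two single-item counts (344 and 640) are not printed above; they are back-calculated from the reported confidence values of rules [1] and [2], so treat them as derived assumptions.

```r
n      <- 1000  # total rows: count / support = 245 / 0.245
n_both <- 245   # rows with Age=[38,75] AND Personal_status___sex=[3,4]
n_lhs  <- 344   # rows with Age=[38,75] (derived: 245 / 0.7122093)
n_rhs  <- 640   # rows with Personal_status___sex=[3,4] (derived: 245 / 0.3828125)

support    <- n_both / n                # share of all rows containing both items
confidence <- n_both / n_lhs            # share of lhs rows that also contain rhs
lift       <- confidence / (n_rhs / n)  # confidence relative to the rhs base rate

print(round(c(support = support, confidence = confidence, lift = lift), 6))
```

A lift above 1 means the two items occur together more often than they would if they were independent. For rule [1] the lift is only slightly above 1, so the association is weak.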