This post uses the German credit data.
It is a credit-scoring dataset from Germany, and the file is attached to this post.
Fitting the regression model
# Load the data
german <- read.csv("German_credit.csv")
# regression
reg <- lm(Credit_amount ~ Duration_in_month + Installment_rate + Present_residence + Age +
Num_of_existing_credits + Num_of_people_liable, data=german)
summary(reg)
------------------------------------------------------------------------------------------------
Call:
lm(formula = Credit_amount ~ Duration_in_month + Installment_rate +
Present_residence + Age + Num_of_existing_credits + Num_of_people_liable,
data = german)
Residuals:
Min 1Q Median 3Q Max
-6018.2 -1155.3 -258.5 596.5 12213.1
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1723.536 371.278 4.642 3.91e-06 ***
Duration_in_month 152.628 5.285 28.880 < 2e-16 ***
Installment_rate -819.884 57.181 -14.338 < 2e-16 ***
Present_residence 4.097 59.804 0.069 0.94539
Age 17.695 5.879 3.010 0.00268 **
Num_of_existing_credits 120.145 111.718 1.075 0.28244
Num_of_people_liable -12.850 177.805 -0.072 0.94240
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 2005 on 993 degrees of freedom
Multiple R-squared: 0.4985, Adjusted R-squared: 0.4955
F-statistic: 164.5 on 6 and 993 DF, p-value: < 2.2e-16
The R-squared is 0.4985, which is not high: the model explains about half of the variance in Credit_amount.
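The R-squared value can also be read directly from the summary object instead of the printed output. The following is a minimal sketch on synthetic data; the data frame `toy` and its columns are made up for illustration and are not the German credit data.

```r
# Synthetic stand-in data: names and values are illustrative assumptions only.
set.seed(1)
n <- 200
toy <- data.frame(duration = runif(n, 6, 60), age = runif(n, 20, 70))
toy$amount <- 500 + 150 * toy$duration + 15 * toy$age + rnorm(n, sd = 2000)

fit <- lm(amount ~ duration + age, data = toy)
s <- summary(fit)

r2     <- s$r.squared       # "Multiple R-squared" in the printed summary
adj_r2 <- s$adj.r.squared   # "Adjusted R-squared" in the printed summary

# R-squared recomputed from its definition: 1 - SSE / SST
sse <- sum(residuals(fit)^2)
sst <- sum((toy$amount - mean(toy$amount))^2)
r2_manual <- 1 - sse / sst

print(c(r2 = r2, adj_r2 = adj_r2, r2_manual = r2_manual))
```

Applied to the model in this post, `summary(reg)$r.squared` should return the 0.4985 shown in the output above.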
coefficients(reg) # model coefficients
(Intercept) Duration_in_month Installment_rate Present_residence Age
1723.535670 152.628496 -819.883557 4.097184 17.694955
Num_of_existing_credits Num_of_people_liable
120.144648 -12.850370
confint(reg, level=0.95) # CIs for model parameters
2.5 % 97.5 %
(Intercept) 994.955456 2452.11588
Duration_in_month 142.257446 162.99955
Installment_rate -932.093138 -707.67398
Present_residence -113.258894 121.45326
Age 6.158893 29.23102
Num_of_existing_credits -99.085688 339.37498
Num_of_people_liable -361.766588 336.06585
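These intervals are the estimate plus or minus a t quantile times the standard error. For Duration_in_month, with 993 residual degrees of freedom the 97.5% t quantile is about 1.962, so 152.628 ± 1.962 × 5.285 gives roughly (142.26, 163.00), matching the row above. The sketch below reproduces `confint()` by hand on synthetic data; `toy`, `x1`, `x2` and `y` are hypothetical names used only for this example.

```r
# Synthetic stand-in data: names and values are illustrative assumptions only.
set.seed(2)
n <- 150
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
toy$y <- 1 + 2 * toy$x1 - 0.5 * toy$x2 + rnorm(n)

fit <- lm(y ~ x1 + x2, data = toy)

est   <- coef(fit)                          # point estimates
se    <- sqrt(diag(vcov(fit)))              # standard errors
tcrit <- qt(0.975, df = df.residual(fit))   # two-sided 95% critical value

manual_ci <- cbind(est - tcrit * se, est + tcrit * se)
colnames(manual_ci) <- c("2.5 %", "97.5 %")

print(manual_ci)
print(confint(fit, level = 0.95))   # should agree with manual_ci
```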
fitted(reg) # predicted values
1 2 3 4 5 6 7 8 9 10
789.16204 2889.09874 7401.27100 6308.55048 3988.48452 6313.20648 3118.38575 3235.87453 2992.33774 3288.74561
11 12 13 14 15 16 17 18 19 20
5464.87244 3578.15078 1030.98804 1884.85299 3506.17500 1997.68797 2301.65845 2571.93010 173.55866 2078.34552
21 22 23 24 25 26 27 28 29 30
1049.40157 2767.80213 4340.10489 3597.56342 1407.83882 970.86836 8897.94936 2380.25548 361.92265 3669.19362
31 32 33 34 35 36 37 38 39 40
1666.88724 3720.49213 6151.01358 986.06597 4752.41355 2341.89569 3516.49646 886.49077 2735.72768 5791.10344
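The listing above shows the first 40 fitted values. A fitted value is the design matrix row times the coefficient vector, and it equals the observed response minus the residual. The sketch below checks both identities and also predicts for a new row with `predict()`; the data and the column names are synthetic assumptions, not the German credit data.

```r
# Synthetic stand-in data: names and values are illustrative assumptions only.
set.seed(4)
n <- 100
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
toy$y <- 3 + 1.5 * toy$x1 + 0.8 * toy$x2 + rnorm(n)

fit <- lm(y ~ x1 + x2, data = toy)

fv <- fitted(fit)
print(head(fv, 10))   # first ten fitted values only

# Identity 1: fitted values are X %*% beta
manual_fv <- drop(model.matrix(fit) %*% coef(fit))

# Identity 2: fitted values are observed response minus residuals
from_resid <- toy$y - residuals(fit)

# Prediction for a new, hypothetical observation
new_obs <- data.frame(x1 = 0.5, x2 = -1)
pred <- predict(fit, newdata = new_obs)
print(pred)
```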
vcov(reg) # covariance matrix for model parameters
(Intercept) Duration_in_month Installment_rate Present_residence Age
(Intercept) 137847.6034 -551.580885 -9028.90853 -5784.32707 -716.715937
Duration_in_month -551.5809 27.931225 -22.46787 -13.70629 1.469673
Installment_rate -9028.9085 -22.467873 3269.67324 -108.65531 -18.886469
Present_residence -5784.3271 -13.706295 -108.65531 3576.47876 -89.361277
Age -716.7159 1.469673 -18.88647 -89.36128 34.558874
Num_of_existing_credits -11388.5992 4.674207 -123.89043 -341.04164 -77.840634
Num_of_people_liable -32466.3742 12.734440 807.43943 -108.17191 -106.086847
Num_of_existing_credits Num_of_people_liable
(Intercept) -11388.599222 -32466.37418
Duration_in_month 4.674207 12.73444
Installment_rate -123.890429 807.43943
Present_residence -341.041637 -108.17191
Age -77.840634 -106.08685
Num_of_existing_credits 12480.896228 -1873.77822
Num_of_people_liable -1873.778225 31614.53407
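The diagonal of this matrix holds the squared standard errors from the coefficient table. For example, the Duration_in_month entry is 27.931225, and its square root is about 5.285, the Std. Error reported in the summary. The sketch below checks this relationship on synthetic data and rebuilds the matrix from the residual variance and the design matrix; all names in it are hypothetical.

```r
# Synthetic stand-in data: names and values are illustrative assumptions only.
set.seed(3)
n <- 120
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
toy$y <- 2 - 1 * toy$x1 + 0.3 * toy$x2 + rnorm(n, sd = 2)

fit <- lm(y ~ x1 + x2, data = toy)

V <- vcov(fit)

# Standard errors are the square roots of the diagonal of vcov()
se_from_vcov    <- sqrt(diag(V))
se_from_summary <- coef(summary(fit))[, "Std. Error"]

# The same matrix from its OLS formula: sigma^2 * (X'X)^{-1}
X        <- model.matrix(fit)
sigma2   <- summary(fit)$sigma^2
V_manual <- sigma2 * solve(crossprod(X))

print(rbind(se_from_vcov, se_from_summary))
```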
Diagnosing the regression model
# diagnostic plots
layout(matrix(c(1,2,3,4),2,2)) # optional 4 graphs/page
plot(reg)
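Plots 1 and 2 (top row, Residuals vs Fitted and Scale-Location): visual check of homoscedasticity. A visible pattern means the assumption is violated. -> In this data the spread of the residuals increases as the fitted values grow.
Plot 3 (bottom left, Normal Q-Q): visual check of the normality of the errors. Points on a straight line indicate a normal distribution. -> In this data the points do not follow the line, so normality is not satisfied.
Plot 4 (bottom right, Residuals vs Leverage): influence of individual observations on the fitted model. -> A few observations look like outliers (766, 973, 972).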
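The visual checks above can be complemented with numeric ones. The sketch below is one possible approach using only base R, not something the original analysis ran: a Shapiro-Wilk test for normality of the residuals, a studentized Breusch-Pagan (Koenker) statistic computed by hand from an auxiliary regression, and Cook's distance for influence. The data are synthetic, generated so that the error variance grows with the predictor; `toy`, `x` and `y` are hypothetical names.

```r
# Synthetic stand-in data: names and values are illustrative assumptions only.
set.seed(5)
n <- 300
toy <- data.frame(x = runif(n, 1, 10))
toy$y <- 100 + 50 * toy$x + rnorm(n, sd = 20 * toy$x)  # error variance grows with x

fit <- lm(y ~ x, data = toy)
res <- residuals(fit)

# Normality: Shapiro-Wilk test (a small p-value suggests non-normal residuals)
sw <- shapiro.test(res)

# Heteroscedasticity: Koenker's studentized Breusch-Pagan statistic,
# n * R^2 from regressing squared residuals on the predictors,
# compared with a chi-square with one df per predictor.
toy$res2 <- res^2
aux      <- lm(res2 ~ x, data = toy)
bp_stat  <- n * summary(aux)$r.squared
bp_p     <- pchisq(bp_stat, df = 1, lower.tail = FALSE)

# Influence: the three observations with the largest Cook's distance
cd   <- cooks.distance(fit)
top3 <- head(sort(cd, decreasing = TRUE), 3)

print(c(shapiro_p = sw$p.value, bp_stat = bp_stat, bp_p = bp_p))
print(top3)
```

For the model in this post, the auxiliary regression would use the same six predictors as `reg`, with `df = 6` in `pchisq()`.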