[R] Regression Analysis


This post uses the German credit data.

It is a credit-scoring dataset from Germany; the file is attached to this post.

Fitting the regression model

# Load the data
german <- read.csv("German_credit.csv")

# regression
reg <- lm(Credit_amount ~ Duration_in_month + Installment_rate + Present_residence + Age +
            Num_of_existing_credits + Num_of_people_liable, data=german)
summary(reg)
------------------------------------------------------------------------------------------------
Call:
lm(formula = Credit_amount ~ Duration_in_month + Installment_rate + 
    Present_residence + Age + Num_of_existing_credits + Num_of_people_liable, 
    data = german)

Residuals:
    Min      1Q  Median      3Q     Max 
-6018.2 -1155.3  -258.5   596.5 12213.1 

Coefficients:
                        Estimate Std. Error t value Pr(>|t|)    
(Intercept)             1723.536    371.278   4.642 3.91e-06 ***
Duration_in_month        152.628      5.285  28.880  < 2e-16 ***
Installment_rate        -819.884     57.181 -14.338  < 2e-16 ***
Present_residence          4.097     59.804   0.069  0.94539    
Age                       17.695      5.879   3.010  0.00268 ** 
Num_of_existing_credits  120.145    111.718   1.075  0.28244    
Num_of_people_liable     -12.850    177.805  -0.072  0.94240    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2005 on 993 degrees of freedom
Multiple R-squared:  0.4985,	Adjusted R-squared:  0.4955 
F-statistic: 164.5 on 6 and 993 DF,  p-value: < 2.2e-16

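The R-squared is about 0.499 (0.4985), which is not particularly high: the model explains roughly half of the variance in Credit_amount.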
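To make the R-squared figure less of a black box, here is a minimal sketch that recomputes it by hand. Since the credit file may not be at hand, the built-in mtcars data stands in for it (that substitution, and the names fit, r2_manual and adj_r2_manual, are assumptions for illustration only). The same formulas reproduce the output above: 1 - (1 - 0.4985) * 999 / 993 gives roughly 0.4955.

```r
# Illustrative only: mtcars ships with R and stands in for the credit data.
fit <- lm(mpg ~ wt + hp, data = mtcars)
s <- summary(fit)

# R-squared by hand: 1 - RSS / TSS
rss <- sum(residuals(fit)^2)                    # residual sum of squares
tss <- sum((mtcars$mpg - mean(mtcars$mpg))^2)   # total sum of squares
r2_manual <- 1 - rss / tss

# Adjusted R-squared penalizes the number of predictors p
n <- nrow(mtcars)
p <- length(coef(fit)) - 1                      # predictors, excluding the intercept
adj_r2_manual <- 1 - (1 - r2_manual) * (n - 1) / (n - p - 1)

print(c(manual = r2_manual, summary = s$r.squared))
print(c(manual = adj_r2_manual, summary = s$adj.r.squared))
```

Both printed pairs should agree, which confirms that summary() is reporting exactly these two quantities.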

coefficients(reg) # model coefficients

            (Intercept)       Duration_in_month        Installment_rate       Present_residence                     Age 
            1723.535670              152.628496             -819.883557                4.097184               17.694955 
Num_of_existing_credits    Num_of_people_liable 
             120.144648              -12.850370 


confint(reg, level=0.95) # CIs for model parameters

                              2.5 %     97.5 %
(Intercept)              994.955456 2452.11588
Duration_in_month        142.257446  162.99955
Installment_rate        -932.093138 -707.67398
Present_residence       -113.258894  121.45326
Age                        6.158893   29.23102
Num_of_existing_credits  -99.085688  339.37498
Num_of_people_liable    -361.766588  336.06585
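In the intervals above, Present_residence, Num_of_existing_credits and Num_of_people_liable all include zero, which matches their non-significant p-values in the summary. The sketch below shows where such intervals come from: estimate plus or minus a t quantile times the standard error. It uses the built-in mtcars data as a stand-in (an assumption, as are the names est, se and ci_manual).

```r
# Illustrative only: mtcars stands in for the credit data.
fit <- lm(mpg ~ wt + hp, data = mtcars)

est   <- coef(fit)                              # point estimates
se    <- sqrt(diag(vcov(fit)))                  # standard errors
tcrit <- qt(0.975, df = df.residual(fit))       # two-sided 95% t quantile

# estimate +/- t * SE, laid out like confint()
ci_manual <- cbind(est - tcrit * se, est + tcrit * se)
colnames(ci_manual) <- c("2.5 %", "97.5 %")

print(ci_manual)
print(confint(fit, level = 0.95))
```

The two printed matrices should match. The same arithmetic applied to Duration_in_month above (152.628 with standard error 5.285 on 993 degrees of freedom) gives roughly 142.26 to 163.00.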


fitted(reg) # predicted values


          1           2           3           4           5           6           7           8           9          10 
  789.16204  2889.09874  7401.27100  6308.55048  3988.48452  6313.20648  3118.38575  3235.87453  2992.33774  3288.74561 
         11          12          13          14          15          16          17          18          19          20 
 5464.87244  3578.15078  1030.98804  1884.85299  3506.17500  1997.68797  2301.65845  2571.93010   173.55866  2078.34552 
         21          22          23          24          25          26          27          28          29          30 
 1049.40157  2767.80213  4340.10489  3597.56342  1407.83882   970.86836  8897.94936  2380.25548   361.92265  3669.19362 
         31          32          33          34          35          36          37          38          39          40 
 1666.88724  3720.49213  6151.01358   986.06597  4752.41355  2341.89569  3516.49646   886.49077  2735.72768  5791.10344 
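fitted() returns one value per row of the training data, so the listing above is only the beginning of the output. As a rough sketch of what those numbers are, the example below rebuilds them from the design matrix and the coefficients, again on the built-in mtcars data as a stand-in (an assumption; X and manual_fit are illustrative names).

```r
# Illustrative only: mtcars stands in for the credit data.
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Fitted values are the design matrix times the coefficient vector
X <- model.matrix(fit)
manual_fit <- drop(X %*% coef(fit))

# Show only the first few instead of printing every row
print(head(fitted(fit), 5))

# Residuals are observed minus fitted
manual_res <- mtcars$mpg - fitted(fit)

print(all.equal(manual_fit, fitted(fit)))
print(all.equal(manual_res, residuals(fit)))
```

Wrapping the call in head() is usually enough for a quick look when the data has many rows.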


vcov(reg) # covariance matrix for model parameters
                        (Intercept) Duration_in_month Installment_rate Present_residence         Age
(Intercept)             137847.6034       -551.580885      -9028.90853       -5784.32707 -716.715937
Duration_in_month         -551.5809         27.931225        -22.46787         -13.70629    1.469673
Installment_rate         -9028.9085        -22.467873       3269.67324        -108.65531  -18.886469
Present_residence        -5784.3271        -13.706295       -108.65531        3576.47876  -89.361277
Age                       -716.7159          1.469673        -18.88647         -89.36128   34.558874
Num_of_existing_credits -11388.5992          4.674207       -123.89043        -341.04164  -77.840634
Num_of_people_liable    -32466.3742         12.734440        807.43943        -108.17191 -106.086847
                        Num_of_existing_credits Num_of_people_liable
(Intercept)                       -11388.599222         -32466.37418
Duration_in_month                      4.674207             12.73444
Installment_rate                    -123.890429            807.43943
Present_residence                   -341.041637           -108.17191
Age                                  -77.840634           -106.08685
Num_of_existing_credits            12480.896228          -1873.77822
Num_of_people_liable               -1873.778225          31614.53407
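The diagonal of this matrix holds the squared standard errors: for example, sqrt(27.931225) is 5.285, the Std. Error reported for Duration_in_month in the summary. A small sketch of that relationship follows, using the built-in mtcars data as a stand-in (an assumption; V, se_manual and se_summary are illustrative names). cov2cor() is a base R helper that rescales the matrix to correlations, which are often easier to read.

```r
# Illustrative only: mtcars stands in for the credit data.
fit <- lm(mpg ~ wt + hp, data = mtcars)

V <- vcov(fit)

# Standard errors are the square roots of the diagonal
se_manual  <- sqrt(diag(V))
se_summary <- coef(summary(fit))[, "Std. Error"]
print(cbind(se_manual, se_summary))

# Correlations between coefficient estimates
print(cov2cor(V))
```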

Regression diagnostics

# diagnostic plots
layout(matrix(c(1,2,3,4),2,2)) # optional 4 graphs/page
plot(reg)
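One detail worth knowing about the layout call: matrix(c(1,2,3,4),2,2) fills column by column, so the default plots end up with Residuals vs Fitted and Scale-Location in the top row and Normal Q-Q and Residuals vs Leverage in the bottom row. The sketch below is one possible way to make that order explicit, drawing one panel at a time on a row-wise grid. It uses the built-in mtcars data as a stand-in, and panel_order is an illustrative name (both are assumptions).

```r
# Illustrative only: mtcars stands in for the credit data.
fit <- lm(mpg ~ wt + hp, data = mtcars)

# plot.lm numbers its panels: 1 = Residuals vs Fitted, 2 = Normal Q-Q,
# 3 = Scale-Location, 5 = Residuals vs Leverage
panel_order <- c(1, 3, 2, 5)

op <- par(mfrow = c(2, 2))       # 2 x 2 grid, filled row by row
for (w in panel_order) {
  plot(fit, which = w)           # draw one diagnostic panel at a time
}
par(op)                          # restore the previous graphics settings
```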

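Plots 1 and 2 (top row: Residuals vs Fitted and Scale-Location): visual check of constant variance (a visible pattern means the assumption is violated) -> in this data the spread of the residuals grows as the fitted values increase.

Plot 3 (bottom left: Normal Q-Q): visual check of the normality of the errors (points along a straight line indicate a normal distribution) -> the points here depart from the line, so normality is not satisfied.

Plot 4 (bottom right: Residuals vs Leverage): how strongly each observation influences the fit -> a few observations stand out as possible outliers (766, 973, 972).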
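The plots are read by eye, so it can help to back them with numbers. The sketch below uses only base R: a Shapiro-Wilk test for normality, a hand-rolled studentized (Koenker) Breusch-Pagan statistic for non-constant variance, and Cook's distance for influence. These checks are an addition to the visual diagnostics, not something from the original analysis, and the built-in mtcars data stands in for the credit data (an assumption, as are all variable names).

```r
# Illustrative only: mtcars stands in for the credit data.
fit <- lm(mpg ~ wt + hp, data = mtcars)
res <- residuals(fit)

# Normality of residuals: Shapiro-Wilk (small p-value suggests non-normality)
sw <- shapiro.test(res)

# Constant variance: studentized Breusch-Pagan by hand.
# Regress squared residuals on the predictors; n * R^2 is approximately
# chi-squared with (number of predictors) degrees of freedom.
aux_data <- transform(mtcars, res2 = res^2)
aux      <- lm(res2 ~ wt + hp, data = aux_data)
bp_stat  <- nrow(aux_data) * summary(aux)$r.squared
bp_df    <- length(coef(aux)) - 1
bp_p     <- pchisq(bp_stat, df = bp_df, lower.tail = FALSE)

# Influence: the three largest Cook's distances
cd   <- cooks.distance(fit)
top3 <- head(sort(cd, decreasing = TRUE), 3)

print(sw)
print(c(statistic = bp_stat, df = bp_df, p.value = bp_p))
print(top3)
```

A small p-value in either test points the same way as the patterns described above; the names attached to top3 identify the rows to inspect first.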


분석벌레의 공부방
