  • Data
  • Matrix Approach
    • Calculating the various components
      • Create the outcome vector and design matrix
      • \mathbf{X}^\top \mathbf{X}
      • \mathbf{X}^\top \mathbf{Y}
      • \mathbf{Y}^\top \mathbf{Y}
      • (\mathbf{X}^\top \mathbf{X})^{-1}
      • Calculate \hat{\boldsymbol{\beta}}
      • Calculate the MSE (i.e., the variance of Y|X)
      • Calculate the covariance matrix for our betas
    • Testing a single covariate: \hat{\beta}_1 (weight)
      • Set up a vector to extract \hat{\beta}_1
      • Extract \hat{\beta}_1
      • Calculate Var(\hat{\beta}_1)
      • Calculate the t-statistic
      • Calculate the p-value for our two-sided t-test
      • Putting it all together
  • Compare results to linear regression output

Data

In this document we will implement a multiple linear regression for the Rosner data set, which examines systolic blood pressure (SBP) based on weight (oz) and age (days) for newborns:

SBP Weight Age
89 135 3
90 120 4
83 100 3
77 105 2
92 130 4
98 125 5
82 125 2
85 105 3
96 120 5
95 90 4
80 120 2
79 95 3
86 120 3
97 150 4
92 160 3
88 125 3

We will first implement the multiple linear regression model using matrices, then check our result using lm.
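
For reference, the quantities computed below are the standard least squares results in matrix form (stated without derivation):

\hat{\boldsymbol{\beta}} = (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top \mathbf{Y}, \qquad \hat{\sigma}^2_{Y|X} = \frac{\mathbf{Y}^\top \mathbf{Y} - \hat{\boldsymbol{\beta}}^\top \mathbf{X}^\top \mathbf{Y}}{n - p - 1}, \qquad \widehat{\mathrm{Var}}(\hat{\boldsymbol{\beta}}) = \hat{\sigma}^2_{Y|X} \, (\mathbf{X}^\top \mathbf{X})^{-1}

where n = 16 observations and p = 2 predictors.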

Matrix Approach

Calculating the various components

Create the outcome vector and design matrix

Y <- matrix( c(89,90,83,77,92,98,82,85,96,95,80,79,86,97,92,88), ncol=1 )
X <- matrix( c( rep(1, length(Y)),
                c(135,120,100,105,130,125,125,105,120,90,120,95,120,150,160,125),
                c(3,4,3,2,4,5,2,3,5,4,2,3,3,4,3,3)), 
             ncol=3, byrow=F)
cbind(Y,X)
##       [,1] [,2] [,3] [,4]
##  [1,]   89    1  135    3
##  [2,]   90    1  120    4
##  [3,]   83    1  100    3
##  [4,]   77    1  105    2
##  [5,]   92    1  130    4
##  [6,]   98    1  125    5
##  [7,]   82    1  125    2
##  [8,]   85    1  105    3
##  [9,]   96    1  120    5
## [10,]   95    1   90    4
## [11,]   80    1  120    2
## [12,]   79    1   95    3
## [13,]   86    1  120    3
## [14,]   97    1  150    4
## [15,]   92    1  160    3
## [16,]   88    1  125    3
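
As an optional cross-check (not part of the original workflow), the same design matrix can be built with model.matrix() from a small data frame; birth_df below is a hypothetical helper name:

birth_df <- data.frame(wgt = X[,2], age = X[,3])        # hypothetical helper data frame
X_check  <- model.matrix(~ wgt + age, data = birth_df)  # intercept column is added automatically
max(abs(X - X_check))                                   # should be 0 if the two matrices agree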

\mathbf{X}^\top \mathbf{X}

XtX <- t(X) %*% X
XtX
##      [,1]   [,2] [,3]
## [1,]   16   1925   53
## [2,] 1925 236875 6405
## [3,]   53   6405  189

\mathbf{X}^\top \mathbf{Y}

XtY <- t(X) %*% Y
XtY
##        [,1]
## [1,]   1409
## [2,] 170350
## [3,]   4750

\mathbf{Y}^\top \mathbf{Y}

YtY <- t(Y) %*% Y
YtY
##        [,1]
## [1,] 124751

(\mathbf{X}^\top \mathbf{X})^{-1}

XtX_inv <- solve(XtX)
XtX_inv
##             [,1]          [,2]          [,3]
## [1,]  3.34152652 -0.0217335058 -0.2005174644
## [2,] -0.02173351  0.0001918187 -0.0004059419
## [3,] -0.20051746 -0.0004059419  0.0752776910
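
As a quick sanity check, multiplying the inverse back into \mathbf{X}^\top \mathbf{X} should recover the identity matrix (rounded here to suppress floating-point noise):

round(XtX_inv %*% XtX, digits = 10) # should be the 3x3 identity matrix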

Calculate \hat{\boldsymbol{\beta}}

beta_vec <- XtX_inv %*% t(X) %*% Y
beta_vec
##            [,1]
## [1,] 53.4501940
## [2,]  0.1255833
## [3,]  5.8877191
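
Equivalently (and somewhat more stable numerically), the normal equations can be solved directly without forming the inverse; a brief sketch:

beta_check <- solve(XtX, XtY) # solves (X'X) beta = X'Y directly
beta_check                    # should match beta_vec above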

Calculate the MSE (i.e., the variance of Y|X)

mse <- ( YtY - t(beta_vec) %*% t(X) %*% Y ) / ( length(Y) - 2 - 1 ) #denominator is n-p-1, where n=16, p=2 (2 different predictors)
mse
##          [,1]
## [1,] 6.146297
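
The same value can be obtained from the residuals, which may make the "sum of squared errors divided by n-p-1" interpretation clearer; a small sketch:

resid_vec <- Y - X %*% beta_vec    # residuals e = Y - X %*% beta_hat
sse <- sum(resid_vec^2)            # sum of squared errors
sse / (length(Y) - 2 - 1)          # SSE/(n-p-1); should match mse above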

Calculate the covariance matrix for our betas

Sigma_mat <- XtX_inv * as.numeric( mse ) # note: convert sigma^2(Y|X) from a 1x1 matrix to a numeric; otherwise R throws a "non-conformable arrays" error
Sigma_mat
##            [,1]         [,2]        [,3]
## [1,] 20.5380142 -0.133580580 -1.23243988
## [2,] -0.1335806  0.001178975 -0.00249504
## [3,] -1.2324399 -0.002495040  0.46267904
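
The square roots of the diagonal entries of this matrix are the coefficient standard errors, which should match the "Std. Error" column from lm() at the end of this document:

se_betas <- sqrt(diag(Sigma_mat)) # standard errors for (intercept, weight, age)
se_betas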

Testing a single covariate: \hat{\beta}_1 (weight)

Set up a vector to extract \hat{\beta}_1

c_vec <- matrix(c(0,1,0), nrow=1) # row vector that picks out the weight coefficient, beta_1

Extract \hat{\beta}_1

beta_1 <- c_vec %*% beta_vec
beta_1
##           [,1]
## [1,] 0.1255833

Calculate Var(\hat{\beta}_1)

var_beta_1 <- c_vec %*% XtX_inv %*% t(c_vec) * as.numeric(mse)
var_beta_1
##             [,1]
## [1,] 0.001178975

Calculate the t-statistic

t_stat <- beta_1 / sqrt( var_beta_1 )
t_stat
##          [,1]
## [1,] 3.657459

Calculate the p-value for our two-sided t-test

pval <- 2 * ( 1 - pt( abs(t_stat), df = length(Y) - 2 - 1 ) ) # df = n-p-1; abs() makes the two-sided test correct even for negative t-statistics
pval
##             [,1]
## [1,] 0.002895789
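
The same extract / variance / t / p-value sequence works for any single linear combination of the coefficients. Below is a small, hypothetical helper (not part of the original code) that wraps those steps, applied here to \hat{\beta}_2 (age) as an example:

test_contrast <- function(c_vec, beta_vec, XtX_inv, mse, df) {
  est <- as.numeric( c_vec %*% beta_vec )                                       # c' beta_hat
  se  <- sqrt( as.numeric( c_vec %*% XtX_inv %*% t(c_vec) ) * as.numeric(mse) ) # sqrt( c' (X'X)^-1 c * MSE )
  t   <- est / se
  p   <- 2 * ( 1 - pt( abs(t), df = df ) )
  c(estimate = est, se = se, t = t, p = p)
}
test_contrast( matrix(c(0,0,1), nrow=1), beta_vec, XtX_inv, mse, df = length(Y) - 2 - 1 )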

Putting it all together

We can also calculate a 95% confidence interval from the estimated variance and then tie everything together in a brief, but complete, summary:

lci <- beta_1 - qt(0.975,13)*sqrt(var_beta_1) # df = n-p-1 = 13
uci <- beta_1 + qt(0.975,13)*sqrt(var_beta_1)
c(lci, uci)
## [1] 0.05140441 0.19976212
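
Applying the same construction to all three coefficients at once should reproduce the confint() output shown in the next section; a brief sketch:

ci_all <- cbind( beta_vec - qt(0.975, df = 13) * sqrt(diag(Sigma_mat)),
                 beta_vec + qt(0.975, df = 13) * sqrt(diag(Sigma_mat)) )
ci_all # rows: intercept, weight, age; columns: lower and upper 95% limits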

There is a significant increase in SBP (mmHg) for each one ounce increase in weight (p=0.0029). On average, adjusting for age, a one ounce increase in weight is associated with a 0.1255833 mmHg increase in SBP (95% CI: 0.0514044, 0.1997621).

Compare results to linear regression output

birth <- data.frame(
  sbp = c(89,90,83,77,92,98,82,85,96,95,80,79,86,97,92,88),
  wgt = c(135,120,100,105,130,125,125,105,120,90,120,95,120,150,160,125),
  age = c(3,4,3,2,4,5,2,3,5,4,2,3,3,4,3,3)
)
mod1 <- lm(sbp ~ wgt + age, data=birth)
summary(mod1)
## 
## Call:
## lm(formula = sbp ~ wgt + age, data = birth)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.0438 -1.3481 -0.2395  0.9688  6.6964 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 53.45019    4.53189  11.794 2.57e-08 ***
## wgt          0.12558    0.03434   3.657   0.0029 ** 
## age          5.88772    0.68021   8.656 9.34e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.479 on 13 degrees of freedom
## Multiple R-squared:  0.8809, Adjusted R-squared:  0.8626 
## F-statistic: 48.08 on 2 and 13 DF,  p-value: 9.844e-07
confint(mod1)
##                   2.5 %     97.5 %
## (Intercept) 43.65964398 63.2407441
## wgt          0.05140441  0.1997621
## age          4.41822526  7.3572130
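
As a final comparison (an extra check, not in the original output), the quantities built by hand correspond directly to standard lm() extractors:

vcov(mod1)            # should match Sigma_mat
summary(mod1)$sigma^2 # residual variance; should match mse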