Multiple linear regression arises when there is more than one possible predictor variable.

Including more than one independent variable in the regression model extends the simple linear regression model to a multiple linear regression model.

Advantages:

The relationship between the response variable and several predictors can be modelled simultaneously.

Disadvantages:

Model building and interpretation become more difficult due to the complexity of the model.

Multiple linear regression with two predictors:

Y = β0 + β1X1 + β2X2 + ε

where Y is the dependent variable,

X1, X2, …, Xk are the predictors (independent variables),

ε is the random error term, and

β0, β1, …, βk are the unknown regression coefficients.
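As a sketch, the coefficients can be estimated by ordinary least squares. A minimal numpy example with made-up data (the variable names and values are hypothetical, chosen so the true coefficients are recovered exactly):

```python
import numpy as np

# Hypothetical data: 6 observations of a response y and two predictors x1, x2
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
# y constructed (for illustration) as 1 + 2*x1 - 0.5*x2, with no noise
y = 1.0 + 2.0 * x1 - 0.5 * x2

# Design matrix with a leading column of ones for the intercept beta0
X = np.column_stack([np.ones_like(x1), x1, x2])

# Least-squares estimates of (beta0, beta1, beta2)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)  # close to [1.0, 2.0, -0.5]
```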

Example: oil consumption

Y = oil consumption (per month)

X1 = outdoor temperature

X2 = size of house (in square meters)

Model:

Y = β0 + β1X1 + β2X2 + ε

Now β1 is the expected change in Y (oil consumption) for a one-unit increase in X1 (outdoor temperature) when all other predictors are held constant, i.e. in this case the size of the house is unchanged.

β1 is estimated as −27.2, i.e. oil consumption is expected to decrease by 27.2 units for each 1 °C increase in outdoor temperature.

Assumptions:

The random error term ε is normally distributed with mean zero, i.e. E(ε) = 0.

ε has (unknown) variance σε², i.e. all random errors have the same variance.

Adjusted R^2

R^2_adj = 1 − [SSE/(n−k−1)] / [SST/(n−1)], where n is the number of observations and k the number of predictors.
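A small helper for the adjusted R^2 formula (the data values below are made up for illustration):

```python
import numpy as np

def adjusted_r2(y, y_hat, k):
    """Adjusted R^2 for a model with k predictors and n observations."""
    n = len(y)
    sse = np.sum((y - y_hat) ** 2)       # residual (error) sum of squares
    sst = np.sum((y - np.mean(y)) ** 2)  # total sum of squares
    return 1.0 - (sse / (n - k - 1)) / (sst / (n - 1))

# Toy example with made-up observed and fitted values
y = np.array([3.0, 5.0, 7.0, 9.0, 11.0])
y_hat = np.array([3.2, 4.8, 7.1, 8.9, 11.0])
print(adjusted_r2(y, y_hat, k=2))  # 0.995
```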

As for simple linear regression:

plots of residuals against ŷ (the fitted values)

plots of residuals against xi

normal probability plot of residuals

plots of residuals in observation order

Cook’s distance

Studentized residuals

Standardized residuals

DFFITS
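Several of these diagnostics can be computed directly from the hat matrix. A minimal numpy sketch with hypothetical data (the standardized residuals here are the internally studentized ones):

```python
import numpy as np

# Hypothetical regression data (n = 6, two predictors plus intercept)
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y  = np.array([2.1, 4.0, 5.2, 7.9, 9.1, 12.2])
X = np.column_stack([np.ones_like(x1), x1, x2])
n, p = X.shape                 # p = number of parameters (incl. intercept)

# Hat matrix H = X (X'X)^{-1} X'; its diagonal gives the leverages h_i
H = X @ np.linalg.inv(X.T @ X) @ X.T
h = np.diag(H)

e = y - H @ y                  # raw residuals
mse = e @ e / (n - p)          # estimate of the error variance

# Standardized (internally studentized) residuals
r = e / np.sqrt(mse * (1 - h))

# Cook's distance for each observation
cooks_d = r**2 * h / (p * (1 - h))
print(r)
print(cooks_d)
```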

**Collinearity:**

Can only occur in multiple regression.

Occurs when predictors explain the same variation in the response variable.

**Oil consumption continued:**

One predictor measuring house size in cm^2 and another predictor measuring it in m^2: the two are perfectly collinear, since one is just a constant multiple (10,000) of the other.

Variance inflation factor

VIF_i = 1/(1 − R_i^2), where R_i^2 is the coefficient of determination from regressing predictor i on the remaining predictors.
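The VIF can be computed by regressing each predictor on all the others; a sketch (the predictor data below is simulated, with x2 almost a copy of x1 to show the inflation):

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of predictor matrix X
    (no intercept column): VIF_i = 1 / (1 - R_i^2), where R_i^2 comes
    from regressing predictor i on all the other predictors."""
    n, k = X.shape
    out = []
    for i in range(k):
        xi = X[:, i]
        others = np.delete(X, i, axis=1)
        Z = np.column_stack([np.ones(n), others])  # add an intercept
        beta, *_ = np.linalg.lstsq(Z, xi, rcond=None)
        resid = xi - Z @ beta
        r2 = 1.0 - resid @ resid / np.sum((xi - xi.mean()) ** 2)
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

# Two nearly collinear predictors give very large VIFs
rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
x2 = x1 + 0.01 * rng.normal(size=50)   # almost a copy of x1
print(vif(np.column_stack([x1, x2])))  # both VIFs are very large
```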

Condition Index for collinearity:

between 10 and 30 => weak collinearity

between 30 and 100 => moderate collinearity

above 100 => strong collinearity
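One common way to compute condition indices (conventions vary; this sketch scales each column to unit length and takes ratios of singular values, with simulated data):

```python
import numpy as np

def condition_indices(X):
    """Condition indices of a predictor matrix: each column is scaled
    to unit length, then the indices are s_max / s_j for the singular
    values s_j of the scaled matrix."""
    Xs = X / np.linalg.norm(X, axis=0)   # scale columns to unit length
    s = np.linalg.svd(Xs, compute_uv=False)
    return s.max() / s

rng = np.random.default_rng(1)
a = rng.normal(size=40)
b = rng.normal(size=40)
well = np.column_stack([a, b])            # unrelated predictors
ill = np.column_stack([a, a + 1e-4 * b])  # nearly collinear predictors
print(condition_indices(well).max())      # small: little collinearity
print(condition_indices(ill).max())       # very large: strong collinearity
```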

Oil consumption example, continued:

Assume that we would like to use outdoor temperature X1 and house size X2 as predictors. Additionally, we want to use a third predictor:

X3 = 1 if the house has extra-thick walls, 0 otherwise (a dummy/indicator variable).

Model:

Y = β0 + β1X1 + β2X2 + β3X3 + ε

Model Selection Strategies:

Models ranked using R^2, adjusted R^2, or Mallows' Cp.

Stepwise selection methods:

Backward, forward, stepwise selection

R^2 Selection

In a data set with 7 possible predictors, there would be 2^7-1=127 possible regression models.

For every model size (k = 1, 2, …, p), look at, say, the m best models chosen by R^2.
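The count of candidate models is easy to verify by enumerating all non-empty predictor subsets (the predictor names below are placeholders):

```python
from itertools import combinations

predictors = ["x1", "x2", "x3", "x4", "x5", "x6", "x7"]

# All non-empty subsets of the 7 predictors, one per candidate model
models = [c for k in range(1, len(predictors) + 1)
          for c in combinations(predictors, k)]
print(len(models))  # 2**7 - 1 = 127
```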

Mallows' Cp:

Large Cp => biased model; for a model with little bias, Cp ≈ p.

Cp = SSE_p / MSE_full − (n − 2p)

where MSE_p = mean squared error for a model with p parameters (so SSE_p = (n − p)·MSE_p),

MSE_full = mean squared error for the full model, and

n = number of observations.
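A small sketch of the Cp computation, using the common form Cp = SSE_p/MSE_full − (n − 2p) and made-up numbers:

```python
def mallows_cp(sse_p, mse_full, n, p):
    """Mallows' Cp for a candidate model with p parameters:
    Cp = SSE_p / MSE_full - (n - 2p)."""
    return sse_p / mse_full - (n - 2 * p)

# Hypothetical numbers: full-model MSE = 2.0, candidate model with
# p = 3 parameters, SSE_p = 24.0, and n = 15 observations
print(mallows_cp(sse_p=24.0, mse_full=2.0, n=15, p=3))  # 12 - 9 = 3.0
```

Here Cp = 3 equals p = 3, which under this rule of thumb suggests little bias in the candidate model.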