Transmission Effects For Efficient Fuel Usage In Cars

Executive Summary

In this report, we have the effects of manual or automatic transmission for efficient fuel usage in cars. The data we have used comes from the 1974 edition of our Motor Trend Magazine. We have predicted that the weight of the car was a significant confounder in our analysis, and the choice of manual or automatic depends on it.

Initial Data and Pre-Processing

We transform all relevant variables to their corresponding factors and the transmission (am) data to its actual value (0 = automatic, 1 = manual)

mtcars$cyl <- factor(mtcars$cyl)
mtcars$gear <- factor(mtcars$gear)
mtcars$vs <- factor(mtcars$vs)
mtcars$carb <- factor(mtcars$carb)
mtcars$am[mtcars$am =="0"] <- "automatic"
mtcars$am[mtcars$am =="1"] <- "manual"
mtcars$am <- as.factor(mtcars$am)
str(mtcars)

Exploratory Data Analysis

From the exploratory analysis (Figure 1, Appendix) done through scatterplot of all the variables in the dataset we can observe that there is a significant correlation between mpg and the other variables of interest like cyl, disp, hp, draft, wt, vs and am. Note that we are also interested in exploring the relation between the mpg and its effects of car transmission type, we explore from box-and-whisker plot that there is a steady increase in mpg when the transmission for the car used is manual.

Regression Analysis

We build several linear regression models based on factorized variables we preprocessed in the processing step above and try to find out the best model and compare it with the base model using anova. After model selection, we also perform analysis of residuals.

Best Regression Model

In order to choose the best model, we use the stepwise selection (forward, backward, both) using the stepAIC( ) function from the MASS package.

library(MASS)
fit <- lm(mpg~.,data=mtcars)
bestmodel <- stepAIC(fit, direction="both")

The following code chunk shows the best regression model that is obtained by the interactions of several variables. The best model consists of the variables, cyl, wt and hp as confounders and am as the independent variable. As we can see the R-square coefficient of 84% is the value we can obtain by fitting the best model.

 summary(bestmodel)$r.squared
## [1] 0.8659

ANOVA - Analysis of variances

We now derive from the anova for different cases involving a. transmission, b. all variables and c. best fit variable combination of predictors cyl, hp, wt and am.

## Analysis of Variance Table
## 
## Model 1: mpg ~ cyl + disp + hp + drat + wt + qsec + vs + am + gear + carb
## Model 2: mpg ~ am
## Model 3: mpg ~ cyl + hp + wt + am
##   Res.Df RSS  Df Sum of Sq     F  Pr(>F)    
## 1     15 120                                
## 2     30 721 -15      -600  4.99  0.0018 ** 
## 3     26 151   4       570 17.75 1.5e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Note that the smaller values for the p-value in the anova above indicates that the confounding variables cyl, wt and hp are statistically insignificant; and we reject the hypothesis that these variables do not contribute to the accuracy of our model.

Residual Analysis

Please see the residual plots for our chosen regression model. We can conclude the following from the plot:

Diagnostics

We want to identify the impact of an observation on the regression coefficients, and one approach is to consider how much the regression coefficient values change if the observation was not considered.

leverage<-hatvalues(bestmodel)
head(sort(leverage,decreasing=TRUE),3)
##       Maserati Bora Lincoln Continental       Toyota Corona 
##              0.4714              0.2937              0.2778

Please see the appendix section for the cooks distance plot for our chosen regression model.

From the cooks distance plot above we can confirm our analysis was correct, as the same cars are mentioned in the residual plots.

Inference

From the t-test for mpg as the outcome and am as predictor, we clearly see that the manual and automatic transmissions are significatively different.

t.test(mpg ~ am, data = mtcars)$statistic 
##      t 
## -3.767

Conclusions

From our multiple regression analysis above, we conclude the following:

Appendix

Scatterplot matrix of mtcars

 pairs(mpg~ ., data=mtcars)
plot of chunk unnamed-chunk-8

Boxplot of miles per gallon by transmission type

boxplot(mpg ~ am, data = mtcars, col = "red", ylab = "miles per gallon")
plot of chunk unnamed-chunk-9

Residual plot for our regression model

plot(bestmodel)
plot of chunk unnamed-chunk-10
plot of chunk unnamed-chunk-10
plot of chunk unnamed-chunk-10
plot of chunk unnamed-chunk-10

Cooks distance plot for regression model

plot(bestmodel, which=c(4,6))
plot of chunk unnamed-chunk-11
plot of chunk unnamed-chunk-11