Instructions:

The purpose of this mini project is for you to demonstrate that you can build all types of visuals. In each section, I will ask you to build a specific type of visual. I may give you reading material or links to help explain what I want you to do. Make sure you follow my instructions. I expect you to do this assignment on your own without the help of another human or the use of AI tools like ChatGPT. If you get help from another student or use something like ChatGPT on this assignment, you will receive a 0 and be reported.

For this assignment, the dataset we will use is Automobile from the UC Irvine Machine Learning Repository. Read the documentation about the dataset on the website to understand what these different variables are measuring. In the code below, I loaded the data into an object called cars and printed out a preview of the data using the str() function.

# Load the raw file; it has no header row, so column names are assigned below
cars <- read.csv(file = "imports-85.data", header = FALSE)
names(cars) <- c("Symboling", "Normalized_Losses", "Make", "Fuel_Type", "Aspiration", "Number_Doors",
                 "Body", "Drive", "Engine_Location", "Wheel_Base", "Length", "Width", "Height", "Weight", "Engine_Type",
                 "Cylinders", "Engine_Size", "Fuel", "Bore", "Stroke", "Comp_Ratio", "HP", "RPM",
                 "City_MPG", "Hwy_MPG", "Price")

str(cars)
## 'data.frame':    205 obs. of  26 variables:
##  $ Symboling        : int  3 3 1 2 2 2 1 1 1 0 ...
##  $ Normalized_Losses: chr  "?" "?" "?" "164" ...
##  $ Make             : chr  "alfa-romero" "alfa-romero" "alfa-romero" "audi" ...
##  $ Fuel_Type        : chr  "gas" "gas" "gas" "gas" ...
##  $ Aspiration       : chr  "std" "std" "std" "std" ...
##  $ Number_Doors     : chr  "two" "two" "two" "four" ...
##  $ Body             : chr  "convertible" "convertible" "hatchback" "sedan" ...
##  $ Drive            : chr  "rwd" "rwd" "rwd" "fwd" ...
##  $ Engine_Location  : chr  "front" "front" "front" "front" ...
##  $ Wheel_Base       : num  88.6 88.6 94.5 99.8 99.4 ...
##  $ Length           : num  169 169 171 177 177 ...
##  $ Width            : num  64.1 64.1 65.5 66.2 66.4 66.3 71.4 71.4 71.4 67.9 ...
##  $ Height           : num  48.8 48.8 52.4 54.3 54.3 53.1 55.7 55.7 55.9 52 ...
##  $ Weight           : int  2548 2548 2823 2337 2824 2507 2844 2954 3086 3053 ...
##  $ Engine_Type      : chr  "dohc" "dohc" "ohcv" "ohc" ...
##  $ Cylinders        : chr  "four" "four" "six" "four" ...
##  $ Engine_Size      : int  130 130 152 109 136 136 136 136 131 131 ...
##  $ Fuel             : chr  "mpfi" "mpfi" "mpfi" "mpfi" ...
##  $ Bore             : chr  "3.47" "3.47" "2.68" "3.19" ...
##  $ Stroke           : chr  "2.68" "2.68" "3.47" "3.40" ...
##  $ Comp_Ratio       : num  9 9 9 10 8 8.5 8.5 8.5 8.3 7 ...
##  $ HP               : chr  "111" "111" "154" "102" ...
##  $ RPM              : chr  "5000" "5000" "5000" "5500" ...
##  $ City_MPG         : int  21 21 19 24 18 19 19 19 17 16 ...
##  $ Hwy_MPG          : int  27 27 26 30 22 25 25 25 20 22 ...
##  $ Price            : chr  "13495" "16500" "16500" "13950" ...
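
The preview above lists columns such as Normalized_Losses, Bore, Stroke, HP, RPM, and Price as chr even though they hold numbers. The "?" entries visible in Normalized_Losses suggest that the file marks missing values with "?". A minimal sketch of one way to handle this, assuming that is the cause, is to pass na.strings = "?" to read.csv(). The inline two-row sample and the names sample_text and sample_cars are hypothetical and exist only so the snippet runs without the data file.

```r
# Hypothetical two-row sample standing in for the real file
sample_text <- "3,?,alfa-romero,13495\n2,164,audi,?"

# na.strings = "?" turns every "?" into NA, so numeric columns parse as numbers
sample_cars <- read.csv(text = sample_text, header = FALSE, na.strings = "?")
names(sample_cars) <- c("Symboling", "Normalized_Losses", "Make", "Price")

# Normalized_Losses and Price are now numeric, with NA where "?" appeared
str(sample_cars)
```

None of the plots in this assignment strictly require the conversion, so treat it as optional background.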

Bar Plot for Fuel Type

I want a basic bar plot for the Fuel_Type variable that shows how many observations are in the data for each fuel type. I want the bars to be in descending order. This means that the first bar should be the tallest bar and the bars should decrease in height from left to right. I want the color of the bars to be “maroon”. I want the label on the x-axis to be “Fuel Type”. I want you to choose the “classic” theme built into ggplot2.

Helpful Links: Link 1
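
As a general illustration of the ordering idea (not the answer to this section), the sketch below orders bars by frequency on a small made-up data set. The data frame toy, its column fruit, and the fill color are all hypothetical. It relies on the fact that ggplot2 draws the bars of a factor in the order of the factor's levels.

```r
library(ggplot2)

# Hypothetical toy data, unrelated to the assignment's data set
toy <- data.frame(fruit = c("pear", "apple", "kiwi", "apple", "pear", "apple"))

# Count each value and sort the counts from largest to smallest
freq <- sort(table(toy$fruit), decreasing = TRUE)

# Re-level the factor so the most frequent value comes first
toy$fruit <- factor(toy$fruit, levels = names(freq))

p <- ggplot(toy, aes(x = fruit)) +
  geom_bar(fill = "steelblue") +  # one bar per level, height = row count
  labs(x = "Fruit") +             # custom x-axis label
  theme_classic()
```

Printing p draws the plot. Other approaches, such as helper functions from the forcats package, would work as well.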

#

Stacked Bar Plot for Body and Drive

I want you to create a bar plot for the Body variable, but I want each bar to be broken up based on the Drive variable. There should be a legend that shows the three levels for Drive (4wd, fwd, rwd). I want you to replace the default ggplot2 colors with three different colors of your choice, one for each level of Drive. I want you to choose the “bw” theme built into ggplot2.

Helpful Links: Link 1, Link 2
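
For orientation only, here is a minimal sketch of a stacked bar plot with manually chosen fill colors on made-up data. The data frame toy, its columns shape and size, and the two colors are hypothetical. Mapping a second categorical variable to fill makes geom_bar() stack the groups by default, and scale_fill_manual() overrides the default palette.

```r
library(ggplot2)

# Hypothetical toy data, unrelated to the assignment's data set
toy <- data.frame(
  shape = c("circle", "circle", "square", "square", "square", "triangle"),
  size  = c("small", "large", "small", "small", "large", "large")
)

p <- ggplot(toy, aes(x = shape, fill = size)) +
  geom_bar() +  # default position is "stack": one bar per shape, split by size
  scale_fill_manual(values = c(small = "darkorange", large = "navy")) +
  theme_bw()
```

Naming the elements of the values vector ties each color to a specific level rather than relying on level order.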

#

Side-by-Side Bar Plot for Body and Drive

I want you to copy and paste your code from the previous graphic here. Then, modify the code so that instead of all the categories in Drive being stacked on top of each other, I see a separate bar for each level of Drive. Basically, the Body variable should still be on the x-axis, but there should be separate, differently colored bars within each Body category. Use the same colors and theme from the previous graphic.
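
The difference between stacked and side-by-side bars comes down to the position argument of geom_bar(). The sketch below shows this on made-up data; the data frame toy and its columns shape and size are hypothetical and stand in for any two categorical variables.

```r
library(ggplot2)

# Hypothetical toy data, unrelated to the assignment's data set
toy <- data.frame(
  shape = c("circle", "circle", "square", "square", "square", "triangle"),
  size  = c("small", "large", "small", "small", "large", "large")
)

# position = "dodge" places the fill groups next to each other
# instead of stacking them on top of each other
p <- ggplot(toy, aes(x = shape, fill = size)) +
  geom_bar(position = "dodge")
```

With position = "dodge", a category that contains only one fill group gets a single bar that spans the full width of its slot.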

#

Pie Chart for Cylinders

I want you to create a pie chart using functions from ggplot2 only. This means that I don’t want you using the pie() function. If you run the table() function on the Cylinders variable, you will find out how many cars there are in each of the 7 different levels. Since there are not many cars with 2, 3, 8, or 12 cylinders, we will exclude these levels from the pie chart.

You will need to start by creating a data.frame (or tibble) that has three rows and two variables. One variable should be named “Cylinders” and have the values “four”,“five”, and “six”. The other variable should be named “Count” and should have the respective count of cars that have the three different cylinder-types. You are creating a frequency table.

Using the data.frame (or tibble) that you created, you can create a pie chart that is partitioned into three sections. There should be a legend with three colors for the three values of “Cylinders”. I also want you to add text to the pie chart that shows the audience exactly how many cars are in each of the different sections.

Helpful Links: Link 1, Link 2
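
ggplot2 has no dedicated pie geom, so a common approach is to draw a single stacked bar and bend it into a circle with coord_polar(). The sketch below illustrates that approach, along with building a small frequency table, on made-up data. The vector pets, the data frame freq, and its column names are hypothetical.

```r
library(ggplot2)

# Hypothetical raw values, unrelated to the assignment's data set
pets <- c("cat", "dog", "cat", "fish", "dog", "cat", "bird")

# table() counts each value; as.data.frame() turns the counts into rows
freq <- as.data.frame(table(pets), stringsAsFactors = FALSE)
names(freq) <- c("Pet", "Count")

# Keep only the categories of interest
freq <- freq[freq$Pet %in% c("cat", "dog", "fish"), ]

p <- ggplot(freq, aes(x = "", y = Count, fill = Pet)) +
  geom_col(width = 1) +       # a single stacked bar
  coord_polar(theta = "y") +  # wrap the bar into a circle
  geom_text(aes(label = Count),
            position = position_stack(vjust = 0.5)) +  # label centered in each slice
  theme_void()
```

A frequency table with only a few rows can also be typed in by hand with data.frame().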

#Cylinder.tabulate = table(cars$Cylinders)

2D Density Plot of Length and Width

Sometimes we want to see the bivariate distribution of two variables. A 2D density plot in R is useful for visualizing the bivariate distribution of two continuous numeric variables. I want you to create a 2D density plot for the bivariate distribution of Length and Width. I want to see both the area and the contours in this plot. I want Length on the x-axis and Width on the y-axis.

Helpful Links: Link 1
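
One possible way to show both filled areas and contour lines is to layer two ggplot2 geoms, as sketched below on simulated data. The data frame pts and its columns u and v are hypothetical, and this is only one of several ways to build such a plot.

```r
library(ggplot2)

# Simulated points, unrelated to the assignment's data set
set.seed(1)
pts <- data.frame(u = rnorm(200), v = rnorm(200))

p <- ggplot(pts, aes(x = u, y = v)) +
  geom_density_2d_filled(alpha = 0.8) +  # filled density bands (the "area")
  geom_density_2d(color = "black")       # contour lines drawn on top
```

Both layers estimate the same two-dimensional kernel density, so the lines follow the edges of the filled bands.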

#

Heatmap of Average Highway MPG

Create a data frame (or tibble) that calculates the average Hwy_MPG for different combinations of Body and Number_Doors. I recommend using group_by() and summarize() to calculate the averages for each combination. In your summary table, call the variable “avg_hwy_mpg”. Use the round() function in R after calculating your averages to round to 2 decimal places.

After you have a data frame that contains the 8 different combinations of Body and Number_Doors along with the avg_hwy_mpg for each combination, you can create a heatmap using geom_tile(). In this heatmap, I want Body along the x-axis, and I want Number_Doors along the y-axis. The shading of the tiles should be based on avg_hwy_mpg. I want to see black lines around each box in the grid. I want you to use a color gradient from “red” to “gray” instead of the default. Groups with lower averages should be closer to red than gray. I want you to add values to each of the tiles so the audience can see the exact average highway miles per gallon for each group. This helps if someone has trouble seeing the colors. I want the values to be “white” and I want the size of the font to be “8”.

Helpful Links: Link 1
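
The general pattern of summarizing by two grouping variables and then drawing one tile per group can be sketched as follows on made-up data. The data frame toy, its columns region, season, and temp, and the colors used are hypothetical.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data, unrelated to the assignment's data set
toy <- data.frame(
  region = c("north", "north", "north", "south", "south", "south"),
  season = c("winter", "winter", "summer", "winter", "summer", "summer"),
  temp   = c(1.5, 2.5, 20.123, 10, 30.456, 31)
)

# One row per region/season combination, with the rounded group mean
avg <- toy %>%
  group_by(region, season) %>%
  summarize(avg_temp = round(mean(temp), 2), .groups = "drop")

p <- ggplot(avg, aes(x = region, y = season, fill = avg_temp)) +
  geom_tile(color = "black") +                          # black border around each tile
  scale_fill_gradient(low = "blue", high = "orange") +  # custom two-color gradient
  geom_text(aes(label = avg_temp), color = "white", size = 5)
```

In scale_fill_gradient(), low is the color for the smallest values and high is the color for the largest.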

#

Rubric

Task                                        Points
Bar Plot for Fuel Type                      5
Stacked Bar Plot for Body and Drive         5
Side-by-Side Bar Plot for Body and Drive    2
Pie Chart for Cylinders                     4
2D Density Plot of Length and Width         3
Heatmap of Average Highway MPG              8