The purpose of this mini project is for you to demonstrate that you can build all types of visuals. In each section, I will ask you to build a specific type of visual. I may give you reading material or links to help explain what I want you to do. Make sure you follow my instructions. I expect you to do this assignment on your own without the help of another human or the use of AI tools like ChatGPT. If you get help from another student or use something like ChatGPT on this assignment, you will receive a 0 and be reported.
For this assignment, the dataset we will use is Automobile
from the UC Irvine Machine
Learning Repository. Read the documentation about the dataset on the
website to understand what these different variables are measuring. In
the code below, I loaded the data into an object called
cars and printed out a preview of the data using the
str() function.
# The raw file has no header row, so the column names are assigned manually.
cars <- read.csv(file = "imports-85.data", header = FALSE)
names(cars) <- c("Symboling", "Normalized_Losses", "Make", "Fuel_Type", "Aspiration", "Number_Doors",
"Body", "Drive", "Engine_Location", "Wheel_Base", "Length", "Width", "Height", "Weight", "Engine_Type",
"Cylinders", "Engine_Size", "Fuel", "Bore", "Stroke", "Comp_Ratio", "HP", "RPM",
"City_MPG", "Hwy_MPG", "Price")
str(cars)
## 'data.frame': 205 obs. of 26 variables:
## $ Symboling : int 3 3 1 2 2 2 1 1 1 0 ...
## $ Normalized_Losses: chr "?" "?" "?" "164" ...
## $ Make : chr "alfa-romero" "alfa-romero" "alfa-romero" "audi" ...
## $ Fuel_Type : chr "gas" "gas" "gas" "gas" ...
## $ Aspiration : chr "std" "std" "std" "std" ...
## $ Number_Doors : chr "two" "two" "two" "four" ...
## $ Body : chr "convertible" "convertible" "hatchback" "sedan" ...
## $ Drive : chr "rwd" "rwd" "rwd" "fwd" ...
## $ Engine_Location : chr "front" "front" "front" "front" ...
## $ Wheel_Base : num 88.6 88.6 94.5 99.8 99.4 ...
## $ Length : num 169 169 171 177 177 ...
## $ Width : num 64.1 64.1 65.5 66.2 66.4 66.3 71.4 71.4 71.4 67.9 ...
## $ Height : num 48.8 48.8 52.4 54.3 54.3 53.1 55.7 55.7 55.9 52 ...
## $ Weight : int 2548 2548 2823 2337 2824 2507 2844 2954 3086 3053 ...
## $ Engine_Type : chr "dohc" "dohc" "ohcv" "ohc" ...
## $ Cylinders : chr "four" "four" "six" "four" ...
## $ Engine_Size : int 130 130 152 109 136 136 136 136 131 131 ...
## $ Fuel : chr "mpfi" "mpfi" "mpfi" "mpfi" ...
## $ Bore : chr "3.47" "3.47" "2.68" "3.19" ...
## $ Stroke : chr "2.68" "2.68" "3.47" "3.40" ...
## $ Comp_Ratio : num 9 9 9 10 8 8.5 8.5 8.5 8.3 7 ...
## $ HP : chr "111" "111" "154" "102" ...
## $ RPM : chr "5000" "5000" "5000" "5500" ...
## $ City_MPG : int 21 21 19 24 18 19 19 19 17 16 ...
## $ Hwy_MPG : int 27 27 26 30 22 25 25 25 20 22 ...
## $ Price : chr "13495" "16500" "16500" "13950" ...
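Notice in the preview that several numeric-looking variables (for example Normalized_Losses and Price) were read in as chr. The preview suggests this is because the raw file uses "?" where a value is missing. The sketch below shows one possible way to handle that at import time. It uses a small made-up CSV string (`toy_csv` and `toy` are hypothetical names, not part of the assignment) and the `na.strings` argument of read.csv(). Treat it as an illustration under those assumptions, not a required step.

```r
# Toy example only: three made-up rows with "?" standing in for missing values.
toy_csv <- "3,?,alfa-romero,13495\n1,164,audi,?\n2,158,audi,17450"

# na.strings = "?" tells read.csv() to treat "?" as NA,
# so the affected columns are parsed as numbers instead of text.
toy <- read.csv(text = toy_csv, header = FALSE, na.strings = "?")
names(toy) <- c("Symboling", "Normalized_Losses", "Make", "Price")

str(toy)
```

The graphics in this assignment do not depend on this step, since none of the variables plotted below are among the chr columns that contain "?".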
I want a basic bar plot of the Fuel_Type variable that shows how many observations the data contain for each fuel type. I want the bars to be in descending order: the first bar should be the tallest, and the bars should decrease in height from left to right. I want the color of the bars to be “maroon”. I want the label on the x-axis to be “Fuel Type”. I want you to choose the “classic” theme built into ggplot.
Helpful Links: Link 1
#
I want you to create a bar plot for the Body variable, but I want each bar to be broken up based on the Drive variable. There should be a legend that shows the three levels of Drive (4wd, fwd, rwd). I want you to replace the default ggplot colors with three different colors of your choice, one for each level of Drive. I want you to choose the “bw” theme built into ggplot.
#
I want you to copy and paste your code from the previous graphic here. Then, modify the code so that instead of all the categories in Drive being stacked on top of each other, there is a separate bar for each level of Drive. The Body variable should still be on the x-axis, but within each Body category there should be separate, differently colored bars. Use the same colors and theme as in the previous graphic.
#
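To illustrate the difference between stacked and side-by-side bars, here is a minimal sketch on a made-up data frame. The names `toy`, `pet`, `home`, and `home_colors` are hypothetical and have nothing to do with the cars data. It assumes ggplot2 is installed and uses only geom_bar(), scale_fill_manual(), and theme_bw().

```r
library(ggplot2)

# Toy data: one row per observation, two categorical variables.
toy <- data.frame(
  pet  = c("cat", "cat", "cat", "dog", "dog", "bird"),
  home = c("city", "city", "farm", "city", "farm", "farm")
)

# Manually chosen fill colors, one per level of `home`.
home_colors <- c(city = "steelblue", farm = "darkorange")

# Default position is "stack": one bar per pet, split by home.
p_stacked <- ggplot(toy, aes(x = pet, fill = home)) +
  geom_bar() +
  scale_fill_manual(values = home_colors) +
  theme_bw()

# position = "dodge" places the bars for each home next to each other.
p_dodged <- ggplot(toy, aes(x = pet, fill = home)) +
  geom_bar(position = "dodge") +
  scale_fill_manual(values = home_colors) +
  theme_bw()

p_stacked
p_dodged
```

Only the position argument changes between the two plots. The colors and theme are shared.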
I want you to create a pie chart using functions from
ggplot2 only. This means that I don’t want you to use the
pie() function. If you run the table() function on the
Cylinders variable, you will find out how many cars
there are in each of the 7 different levels. Since there are not many
cars with two, three, eight, or twelve cylinders, we will exclude these
levels from the pie chart.
You will need to start by creating a data.frame (or
tibble) that has three rows and two variables. One variable
should be named “Cylinders” and have the values “four”,“five”, and
“six”. The other variable should be named “Count” and should have the
respective count of cars that have the three different cylinder-types.
You are creating a frequency table.
Using the data.frame (or tibble) that you
created, you can create a pie chart that is partitioned into three
sections. There should be a legend with three colors for the three
values of “Cylinders”. I also want you to add text to the pie chart that
shows the audience exactly how many cars are in each of the different
sections.
#Cylinder.tabulate = table(cars$Cylinders)
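As a general illustration of the technique, ggplot2 has no dedicated pie geom, so a common approach is to draw a single stacked bar and wrap it in polar coordinates. The sketch below does this for a made-up frequency table. The names `toy_counts` and `Flavor` and the counts are hypothetical, not values from the cars data. It is one possible approach, not the only acceptable one.

```r
library(ggplot2)

# Toy frequency table: one row per category with its count.
toy_counts <- data.frame(
  Flavor = c("vanilla", "chocolate", "mint"),
  Count  = c(12, 7, 3)
)

p_pie <- ggplot(toy_counts, aes(x = "", y = Count, fill = Flavor)) +
  geom_col(width = 1) +                          # one stacked bar
  coord_polar(theta = "y") +                     # bend the bar into a circle
  geom_text(aes(label = Count),
            position = position_stack(vjust = 0.5)) +  # label centered in each slice
  theme_void()                                   # drop axes and grid

p_pie
```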
Sometimes we want to see the bivariate distribution of two variables. A 2d density plot in R is useful for visualizing the bivariate distribution of two continuous numeric variables. I want you to create a 2d density plot for the bivariate distribution of Length and Width. I want to see both the filled area and the contours in this plot. I want Length on the x-axis and Width on the y-axis.
Helpful Links: Link 1
#
Create a data frame (or tibble) that calculates the average
Hwy_MPG for different combinations of
Body and Number_Doors. I recommend
using group_by() and summarize() to calculate
the averages for each combination. In your summary table, call the
variable “avg_hwy_mpg”. Use the round() function in R after
calculating your averages to round to 2 decimal places.
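For reference, here is a minimal sketch of the group_by() and summarize() pattern on a made-up data frame. The names `toy`, `shape`, `size`, `score`, and `avg_score` are hypothetical. It assumes the dplyr package is installed.

```r
library(dplyr)

# Toy data: two grouping variables and one numeric variable.
toy <- data.frame(
  shape = c("circle", "circle", "square", "square", "square"),
  size  = c("small", "large", "small", "small", "large"),
  score = c(1.234, 2.5, 3.111, 4.224, 5)
)

# One row per (shape, size) combination, with the mean rounded to 2 decimals.
toy_summary <- toy %>%
  group_by(shape, size) %>%
  summarize(avg_score = round(mean(score), 2), .groups = "drop")

toy_summary
```

The `.groups = "drop"` argument returns an ungrouped table, which avoids surprises when the summary is passed on to a plot.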
After you have a data frame that contains the 8 different
combinations of Body and Number_Doors
along with the avg_hwy_mpg for each combination, you
can create a heatmap using geom_tile(). In this heatmap, I
want Body along the x-axis, and I want
Number_Doors along the y-axis. The shading of the tiles
should be based on avg_hwy_mpg. I want to see black
lines around each box in the grid. I want you to use a color gradient
from “red” to “gray” instead of the default. Groups with lower averages
should be closer to red than gray. I want you to add values to each of
the tiles so the audience can see the exact average highway miles per
gallon for each group. This helps if someone has trouble distinguishing
the colors. I want the values to be “white” and I want the size of the
font to be “8”.
Helpful Links: Link 1
#
| Task | Points |
|---|---|
| Bar Plot for Fuel Type | 5 Points |
| Stacked Bar Plot for Body and Drive | 5 Points |
| Side-by-Side Bar Plot for Body and Drive | 2 Points |
| Pie Chart for Cylinders | 4 Points |
| 2D Density Plot of Length and Width | 3 Points |
| Heatmap of Average Highway MPG | 8 Points |