The main purpose of this lab is to practice control structures in R:
• if and else: testing a condition and acting on it
• for: execute a loop a fixed number of times
• while: execute a loop while a condition is true
• repeat: execute an infinite loop (must break out of it to stop)
• break: break the execution of a loop
• next: skip an iteration of a loop
You will need to modify the code chunks so that the code works within
each chunk (usually this means modifying anything in ALL CAPS). You
will also need to modify the code outside the code chunk. When you get
the desired result for each step, change Eval=F to Eval=T and knit the
document to HTML to make sure it works.
After you complete the lab, you should submit your HTML file of what you
have completed to Sakai before the deadline.
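As a quick refresher on the less familiar control structures named at the start of this lab, here is a minimal sketch that combines repeat, next, and break. The variable names (odds, i) are made up for illustration and are not part of the exercises.

```r
# Collect the first three odd numbers with an otherwise infinite loop.
odds <- c()
i <- 0
repeat {
  i <- i + 1
  if (i %% 2 == 0) {
    next   # skip the rest of this iteration for even numbers
  }
  odds <- c(odds, i)
  if (length(odds) == 3) {
    break  # the only way out of a repeat loop
  }
}
print(odds)
```

Without the break statement the repeat loop would never stop, which is why a stopping condition should always be written first.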
Write code that creates a vector x that contains 100 random
observations from the standard normal distribution (this is the normal
distribution with the mean equal to 0 and the variance equal to 1).
Print out only the first five random observations in this vector.
#
Write code that replaces the observations in the vector x that are
greater than or equal to 0 with a string of characters "non-negative"
and the observations that are smaller than 0 with a string of
characters "negative". Hint: try the ifelse() function. Print out the
first five values in this new version of x.
#
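To illustrate how the ifelse() function from the hint works, here is a small sketch on a made-up vector. The names scores and labels, and the cutoff of 60, are assumptions for illustration only.

```r
# ifelse() tests every element and picks one of two values for each.
scores <- c(55, 72, 90, 48)                      # hypothetical toy data
labels <- ifelse(scores >= 60, "pass", "fail")   # vectorized if/else
print(labels)
```

Because ifelse() is vectorized, no loop is needed, and the result has the same length as the condition.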
Write a for-loop to count how many observations in the vector x are
non-negative and how many observations are negative. (There are many
easier ways to solve this problem. Use a for-loop or get 0 points.) Use
the cat() function to print out a sentence that states how many
non-negative and negative observations there are. For example, “The
number of non-negative observations is 32”.
#
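The counting pattern asked for here (a counter updated inside a for-loop, then reported with cat()) can be sketched on a different problem. The vector vals and the counters n_even and n_odd are hypothetical names.

```r
# Count even and odd values with explicit counters.
vals <- c(3, 8, 5, 12, 7)   # hypothetical toy data
n_even <- 0
n_odd <- 0
for (v in vals) {
  if (v %% 2 == 0) {
    n_even <- n_even + 1
  } else {
    n_odd <- n_odd + 1
  }
}
# cat() pastes its arguments together, separated by spaces.
cat("The number of even values is", n_even, "\n")
cat("The number of odd values is", n_odd, "\n")
```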
Create a \(100000\) by \(10\) matrix A with the numbers
\(1:1000000\). The first row of this matrix should be the numbers 1 to
10. The second row of this matrix should be the numbers 11 to 20.
Create a for-loop that calculates the sum for each row of the matrix,
save the results to a vector sum_row, and print out the first five
values of sum_row.
A = matrix(1:1000000, COMPLETE) # DO NOT CHANGE
Verify that your results are consistent with what you obtain with the
built-in rowSums function.
sum_row_rowSums = as.integer(rowSums(A))
sum_row_rowSums[1:5]
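The same idea can be tried on a matrix small enough to check by hand. This is only a sketch: M and row_totals are hypothetical names, and the 2 by 3 shape is chosen for illustration.

```r
# Fill a small matrix row by row, then sum each row in a loop.
M <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)
row_totals <- integer(nrow(M))       # preallocate the result vector
for (i in seq_len(nrow(M))) {
  row_totals[i] <- sum(M[i, ])       # sum of the i-th row
}
print(row_totals)
# Compare against the built-in function.
identical(row_totals, as.integer(rowSums(M)))
```

Preallocating the result vector avoids growing it on every iteration, which matters once the matrix has many rows.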
Another common loop structure is the while loop, which functions much
like a for loop, but will only run as long as a test condition is TRUE.
Modify your for loop from the previous exercise and make it into a
while loop. Use the identical() function to check if the results from
the for loop are the same as the results from the while loop.
#
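A for loop handles its index automatically, whereas a while loop needs the index to be initialized and incremented by hand. The sketch below shows both on a toy computation with hypothetical names (squares_for, squares_while).

```r
n <- 5
# for loop: the index i is managed by the loop itself.
squares_for <- numeric(n)
for (i in seq_len(n)) {
  squares_for[i] <- i^2
}
# while loop: initialize i, test it, and increment it manually.
squares_while <- numeric(n)
i <- 1
while (i <= n) {
  squares_while[i] <- i^2
  i <- i + 1   # forgetting this line would create an infinite loop
}
identical(squares_for, squares_while)
```

Note that identical() also compares types, so an integer vector and a double vector with the same values are not identical.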
Write a for loop to compute the mean of every column in mtcars and save
the results to a vector col_mean. Ignore missing values when taking the
mean.
#
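Ignoring missing values in mean() is done with the na.rm argument. The sketch below loops over the columns of a made-up data frame; toy and toy_mean are hypothetical names, not objects from the lab.

```r
# A toy data frame with one missing value per column.
toy <- data.frame(a = c(1, 2, NA), b = c(10, NA, 30))
toy_mean <- numeric(ncol(toy))
names(toy_mean) <- names(toy)
for (j in seq_along(toy)) {
  # toy[[j]] extracts the j-th column as a vector.
  toy_mean[j] <- mean(toy[[j]], na.rm = TRUE)
}
print(toy_mean)
```

With na.rm = FALSE (the default), both means would be NA.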
Compute the number of unique values in each column of iris and print
the results during a loop. Use the cat() function to print out the
values in a sentence with the corresponding name of the variable. For
example, “The number of unique values for Sepal.Length is 35”.
names(iris) #DO NOT CHANGE
## [1] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" "Species"
In this lab, you will build predictive models for board game ratings. The dataset below was scraped from boardgamegeek.com and contains information on the top 4,999 board games. Below, you will see a preview of the data.
bgg<-read.csv("bgg.csv")
bgg2=bgg[,c(4:13,15:20)]
head(bgg2)
## names min_players max_players
## 1 Gloomhaven 1 4
## 2 Pandemic Legacy: Season 1 2 4
## 3 Through the Ages: A New Story of Civilization 2 4
## 4 Terraforming Mars 1 5
## 5 Twilight Struggle 2 2
## 6 Star Wars: Rebellion 2 4
## avg_time min_time max_time year avg_rating geek_rating num_votes age
## 1 120 60 120 2017 8.98893 8.61858 15376 12
## 2 60 60 60 2015 8.66140 8.50163 26063 13
## 3 240 180 240 2015 8.60673 8.30183 12352 14
## 4 120 120 120 2016 8.38461 8.19914 26004 12
## 5 180 120 180 2005 8.33954 8.19787 31301 13
## 6 240 180 240 2016 8.47439 8.16545 13336 14
## mechanic
## 1 Action / Movement Programming, Co-operative Play, Grid Movement, Hand Management, Modular Board, Role Playing, Simultaneous Action Selection, Storytelling, Variable Player Powers
## 2 Action Point Allowance System, Co-operative Play, Hand Management, Point to Point Movement, Set Collection, Trading, Variable Player Powers
## 3 Action Point Allowance System, Auction/Bidding, Card Drafting
## 4 Card Drafting, Hand Management, Set Collection, Tile Placement, Variable Player Powers
## 5 Area Control / Area Influence, Campaign / Battle Card Driven, Dice Rolling, Hand Management, Simultaneous Action Selection
## 6 Area Control / Area Influence, Area Movement, Dice Rolling, Hand Management, Partnerships, Variable Player Powers
## owned
## 1 25928
## 2 41605
## 3 15848
## 4 33340
## 5 42952
## 6 20682
## category
## 1 Adventure, Exploration, Fantasy, Fighting, Miniatures
## 2 Environmental, Medical
## 3 Card Game, Civilization, Economic
## 4 Economic, Environmental, Industry / Manufacturing, Science Fiction, Territory Building
## 5 Modern Warfare, Political, Wargame
## 6 Fighting, Miniatures, Movies / TV / Radio theme, Science Fiction, Wargame
## designer weight
## 1 Isaac Childres 3.7543
## 2 Rob Daviau, Matt Leacock 2.8210
## 3 Vlaada Chvátil 4.3678
## 4 Jacob Fryxelius 3.2456
## 5 Ananda Gupta, Jason Matthews 3.5518
## 6 Corey Konieczka 3.6311
There are 16 variables and we want to create some more. Create a new dataframe called \(bgg3\) where you use the mutate function to create the following variables:
head(bgg3)
Question: In complete sentences, what is the purpose of adding 1 for the log transformed variables?
YOUR ANSWER IN COMPLETE SENTENCES
Question: In complete sentences, what is the purpose of adding 1 in the creation of the year variable?
YOUR ANSWER IN COMPLETE SENTENCES
We hypothesize that the geek rating increases when the number of votes increases and/or the ownership increases. Create four scatter plots showing the association between geek_rating and each of the following variables:
Question: In complete sentences, describe how the relationship changes when you take the log of the independent variable.
YOUR ANSWER IN COMPLETE SENTENCES
Randomly sample approximately 80% of the data in bgg3 for a training
dataset; the remainder will act as a test set. Call the training
dataset train.bgg and the testing dataset test.bgg.
set.seed(COMPLETE)
bgg4= bgg3 %>%
mutate(Set=sample(COMPLETE))
train.bgg<-filter(bgg4,Set=="Train")
test.bgg<-filter(bgg4,Set=="Test")
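One possible way to label rows for an approximate 80/20 split is to draw the labels with sample() and unequal probabilities. The sketch below uses base R on a made-up data frame; toy_df, the seed value, and the column contents are assumptions for illustration, not the lab's intended answer.

```r
set.seed(1)                          # arbitrary seed, for reproducibility
toy_df <- data.frame(id = 1:10)      # hypothetical toy data
# Draw one label per row; each row is "Train" with probability 0.8.
toy_df$Set <- sample(c("Train", "Test"), size = nrow(toy_df),
                     replace = TRUE, prob = c(0.8, 0.2))
toy_train <- toy_df[toy_df$Set == "Train", ]
toy_test  <- toy_df[toy_df$Set == "Test", ]
table(toy_df$Set)
```

Because each label is drawn independently, the split is only approximately 80/20, which matches the wording of the exercise.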
Now, we want to fit models to the training dataset. Use the lm()
function to create 3 model objects in R called lm1, lm2, lm3 based on
the following linear models, respectively:
lm1 = lm(COMPLETE,data=train.bgg)
lm2 = lm(COMPLETE,data=train.bgg)
lm3 = lm(COMPLETE,data=train.bgg)
Add predictions and residuals for all 3 models to the test set.
Create a new data frame called test.bgg2 and give all your predictions
and residuals different names. Use the str() function to show these
variables were created.
str(test.bgg2)
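Out-of-sample predictions come from predict() with the newdata argument, and a residual is the observed value minus the prediction. The sketch below shows this in base R on the built-in mtcars data; the model mpg ~ wt, the row split, and the names fit and holdout are assumptions chosen for illustration.

```r
# Fit on the first 24 rows, predict on the remaining 8.
fit <- lm(mpg ~ wt, data = mtcars[1:24, ])
holdout <- mtcars[25:32, ]
holdout$pred  <- predict(fit, newdata = holdout)   # out-of-sample predictions
holdout$resid <- holdout$mpg - holdout$pred        # observed minus predicted
str(holdout[, c("mpg", "pred", "resid")])
```

With several models, giving each pair of columns a distinct name keeps the predictions from overwriting one another.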
Create a function called MAE.func() that returns the mean absolute
error based on a vector of the residuals and test your function on the
vector called test.
Solution 1:
test=c(-5,-2,0,3,5)
MAE.func(test)
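For reference, the mean absolute error is the average of the absolute values of the residuals. In the notation below, \(e_i\) denotes the \(i\)-th residual and \(n\) the number of residuals; these symbols are introduced here for illustration. The second line works the arithmetic for the test vector above.

```latex
\[
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \lvert e_i \rvert
\]
\[
\mathrm{MAE}(-5, -2, 0, 3, 5) = \frac{5 + 2 + 0 + 3 + 5}{5} = \frac{15}{5} = 3
\]
```

A correct MAE.func() should therefore return 3 for this vector.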
Use your function on test.bgg2 to calculate the out-of-sample MAE of
all three models based on the associated residuals. Make sure you
display the mean absolute error from these different models in your
output.
Question: Which model does the best job at predicting the geek rating of these board games?
YOUR ANSWER IN COMPLETE SENTENCES
For the third model only, use 10-fold cross-validation and measure the out-of-sample mean absolute error. Print out the final cross-validated mean absolute error.
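The mechanics of k-fold cross-validation can be sketched in base R as follows. This is a minimal illustration on the built-in mtcars data with an assumed model (mpg ~ wt) and an arbitrary seed, not the lab's third model; the names fold_id, fold_mae, and cv_mae are hypothetical.

```r
set.seed(42)                                   # arbitrary seed
k <- 10
n <- nrow(mtcars)
# Assign each row to one of k folds, in random order.
fold_id <- sample(rep(1:k, length.out = n))
fold_mae <- numeric(k)
for (f in 1:k) {
  train_rows <- mtcars[fold_id != f, ]         # fit on the other k - 1 folds
  test_rows  <- mtcars[fold_id == f, ]         # evaluate on the held-out fold
  fit  <- lm(mpg ~ wt, data = train_rows)
  pred <- predict(fit, newdata = test_rows)
  fold_mae[f] <- mean(abs(test_rows$mpg - pred))
}
cv_mae <- mean(fold_mae)                       # average MAE across folds
print(cv_mae)
```

This sketch averages the per-fold errors. An alternative is to pool all held-out residuals and take a single mean, which weights every observation equally when fold sizes differ.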
Question: What is the absolute difference between the out-of-sample mean absolute error measured using a test set and the mean absolute error measured using cross-validation? When you type your answer in complete sentences, use inline R code to calculate the absolute difference and input it directly into your sentence.
YOUR ANSWER IN COMPLETE SENTENCES