Introduction

Today, we will work with daily water temperature and air temperature data observed for 31 rivers in Spain. The goal of this tutorial is to identify the best model for predicting the maximum water temperature from the maximum air temperature. In the preview below, W is the daily maximum water temperature, A is the daily maximum air temperature, and L identifies the river location. The data set contains almost a full year of daily observations for each of the 31 rivers.

## # A tibble: 6 × 8
##   JULIAN_DAY  YEAR     L     W     A  TIME MONTH   DAY
##        <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1          1  2003   103  14.2  21.2     1     1     1
## 2          2  2003   103  14.4  16.8     2     1     2
## 3          3  2003   103  14.4  15.4     3     1     3
## 4          4  2003   103  10.9  10.8     4     1     4
## 5          5  2003   103  10.8  11.7     5     1     5
## 6          6  2003   103  10.7  12.4     6     1     6

Part 1: Examining the Relationship

Chunk 1: Overall Relationship

Chunk 2: Location-Specific Relationship

Chunk 3: Split Data into Train and Test Sets

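# Replace INTEGER with a whole number of your choice so the split is reproducible
set.seed(INTEGER)
# Hold out 3 randomly chosen river locations for testing
TEST.LOCATIONS=sample(x=unique(DATA$L),size=3,replace=FALSE)

# TRAIN keeps every location that was not held out; TEST keeps only the held-out locations
TRAIN = anti_join(DATA,tibble(L=TEST.LOCATIONS),by="L")
TEST = semi_join(DATA,tibble(L=TEST.LOCATIONS),by="L")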
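
To see what the two filtering joins do, here is a minimal sketch on a small made-up data set. The object names (toy, toy_train, toy_test) and all values are hypothetical stand-ins for DATA, TRAIN, and TEST, not the river data. The point is that splitting by location, rather than by row, keeps every observation from a held-out river out of the training set.

```r
library(dplyr)
library(tibble)

# Toy stand-in for DATA (hypothetical values): 5 locations, 4 days each
set.seed(1)
toy <- tibble(
  L = rep(c(103, 105, 107, 109, 111), each = 4),
  A = runif(20, min = 5, max = 30)
) %>%
  mutate(W = 3 + 0.7 * A + rnorm(20))

# Hold out 2 of the 5 toy locations
toy_test_locations <- sample(x = unique(toy$L), size = 2, replace = FALSE)

# anti_join drops rows whose L matches a held-out location;
# semi_join keeps only those rows
toy_train <- anti_join(toy, tibble(L = toy_test_locations), by = "L")
toy_test  <- semi_join(toy, tibble(L = toy_test_locations), by = "L")

# Sanity checks: no shared locations, and no rows lost or duplicated
length(intersect(toy_train$L, toy_test$L)) == 0
nrow(toy_train) + nrow(toy_test) == nrow(toy)
```

The same two checks can be run on TRAIN and TEST after the real split.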

Chunk 4: Plots of Relationship for Train and Test Data

Part 2: Linear Regression Model

Chunk 1: Fitting Linear Model to Train Data
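
The later chunks refer to a model object named linmod. A reasonable assumption is that it is a simple linear regression of W on A fitted to the training data. The sketch below shows that kind of fit on made-up data. The names toy_train and toy_linmod and the simulated values are hypothetical, not taken from the river data.

```r
# Toy stand-in for TRAIN (hypothetical values, not the river data)
set.seed(2)
toy_train <- data.frame(A = seq(5, 30, length.out = 50))
toy_train$W <- 4 + 0.6 * toy_train$A + rnorm(50, sd = 1)

# Simple linear regression of water temperature on air temperature
toy_linmod <- lm(W ~ A, data = toy_train)

coef(toy_linmod)               # estimated intercept and slope
summary(toy_linmod)$r.squared  # share of variance in W explained by A

# On the real data, the analogous call would presumably be:
# linmod = lm(W ~ A, data = TRAIN)
```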

Chunk 2: Getting Predictions from Linear Model

TRAIN2 = TRAIN %>% add_predictions(linmod,var="linpred")
TEST2 = TEST %>% add_predictions(linmod,var="linpred")

Chunk 3: Getting Residuals from Linear Model

TRAIN3 = TRAIN2 %>% add_residuals(linmod,var="linres")
TEST3 = TEST2 %>% add_residuals(linmod,var="linres")

Part 3: Polynomial Regression Model

Chunk 1: Fitting Polynomial Regression Models

poly2mod=lm(W~A+I(A^2),data=TRAIN)
poly3mod=lm(W~A+I(A^2)+I(A^3),data=TRAIN)
poly4mod=lm(W~A+I(A^2)+I(A^3)+I(A^4),data=TRAIN)
anova(linmod,poly2mod,poly3mod,poly4mod,test="Chisq")

Chunk 2: Getting Predictions from Polynomial Models

TRAIN4 =TRAIN3 %>% 
  add_predictions(poly2mod,var="poly2pred") %>%
  add_predictions(poly3mod,var="poly3pred") %>%
  add_predictions(poly4mod,var="poly4pred")
  
TEST4 =TEST3 %>% 
  add_predictions(poly2mod,var="poly2pred") %>%
  add_predictions(poly3mod,var="poly3pred") %>%
  add_predictions(poly4mod,var="poly4pred")

Chunk 3: Getting Residuals from Polynomial Models

TRAIN5 =TRAIN4 %>% 
  add_residuals(poly2mod,var="poly2res") %>%
  add_residuals(poly3mod,var="poly3res") %>%
  add_residuals(poly4mod,var="poly4res")
  
TEST5 =TEST4 %>% 
  add_residuals(poly2mod,var="poly2res") %>%
  add_residuals(poly3mod,var="poly3res") %>%
  add_residuals(poly4mod,var="poly4res")
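
Since the goal is to pick the best predictive model, one way to use the residuals is to summarize them with a single error measure per model on both the training and the test data. The sketch below uses root mean squared error (RMSE). That choice is an assumption, as the tutorial does not name a metric. It is computed on made-up data with base R, and the names make_toy, toy_models, rmse_on, and toy_rmse are hypothetical.

```r
# Toy stand-in for TRAIN and TEST (hypothetical values, not the river data)
set.seed(3)
make_toy <- function(n) {
  A <- runif(n, min = 5, max = 35)
  W <- 2 + 0.9 * A - 0.01 * A^2 + rnorm(n, sd = 1)
  data.frame(A = A, W = W)
}
toy_train <- make_toy(200)
toy_test  <- make_toy(100)

# The same four model forms used in this tutorial, fitted to the toy training data
toy_models <- list(
  lin   = lm(W ~ A, data = toy_train),
  poly2 = lm(W ~ A + I(A^2), data = toy_train),
  poly3 = lm(W ~ A + I(A^2) + I(A^3), data = toy_train),
  poly4 = lm(W ~ A + I(A^2) + I(A^3) + I(A^4), data = toy_train)
)

# RMSE, where each residual is observed W minus predicted W
rmse_on <- function(model, data) {
  res <- data$W - predict(model, newdata = data)
  sqrt(mean(res^2))
}

toy_rmse <- data.frame(
  model = names(toy_models),
  train = sapply(toy_models, rmse_on, data = toy_train),
  test  = sapply(toy_models, rmse_on, data = toy_test)
)
toy_rmse
```

Training RMSE can only stay the same or fall as polynomial terms are added, so the test column is the one to compare. A model whose test RMSE is clearly higher than its training RMSE is likely overfitting.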