Overview

This course surveys the application of statistics and mathematics to the sports industry. We explore the history of analytics across various sports, examine the advantages of sports analytics for both on-field performance and off-field business decisions, and review current research to encourage creative thinking about future developments.

The course will be organized by sport to ensure a comprehensive exploration of sports analytics. We will start with in-depth studies in Baseball, Basketball, American Football, Hockey, and Soccer. Later in the course, we will branch out according to the diverse interests of the class.

The statistical programming language R will be used to generate visualizations and perform basic modeling.

  • Head Coach: Mario Giacomazzo

  • Personal Trainer: Kendall Thomas

  • Syllabus: Section 1

  • Game Days: TTh, 12:30PM - 1:45PM, Gardner 105

  • Office Hours:

    • Press Conference (Dr. Mario): W, 9AM - 11AM & 12PM - 2PM, Hanes 134
    • Training Day (Kendall): M, 1PM - 3PM, Hanes B-30
  • University Approved Absences: Online Form

Playbook

Required

  • Mathletics (2nd Edition), Winston et al., 2022, ISBN: 9780691177625 (MATH)

  • Handbook of Statistical Methods and Analyses in Sports, Albert, Glickman, et al., 2017, ISBN: 9781498737364 (HSMAS)

Optional

  • Analytic Methods in Sports, Severini, 2015, ISBN: 9781482237016 (AMS)

  • Sports Analytics: A Guide for Coaches, Managers, and Other Decision Makers, Alamar, 2013, ISBN: 9780231162920 (SPAN)

  • Practicing Sabermetrics: Putting the Science of Baseball Statistics to Work, Costa et al., 2009, ISBN: 9780786441778 (PS)

  • Analyzing Baseball Data with R, Marchi and Albert, 2014, ISBN: 9781466570238 (ABDR)

  • Sports Math: An Introductory Course in the Mathematics of Sports Science and Sports Analytics, Minton, 2017, ISBN: 9781498706261 (SM)

  • Introduction to NFL Analytics with R, Congelio, 2023, ISBN: 9781032427751 (INAR)

Film Sessions

First Half

Date Lecture Material Reading
JAN 11 Syllabus Survey
JAN 16 Sports Analytics I Slides Ch. 1 SPAN, Web 1, Web 2, Web 3, Web 4, Web 5, Web 6, Web 7, Web 8
JAN 18 Sports Analytics II Slides Ch. 1-7 SPAN, Web 1
JAN 23 Sports Analytics III Slides Web 1, Web 2, Web 3, Web 4, Web 5, Web 6, Web 7
Sports Analytics IV Slides Web 1, Web 2, Web 3, Article 1
JAN 25 Sports Analytics IV (Cont.) Slides Web 1, Web 2, Web 3, Article 1
Baseball I Slides Web 1, Web 2, Article 1
JAN 30 Baseball I (Cont.) Slides Web 1, Web 2, Article 1
Baseball II Slides Ch.1-2 MATH, Ch.7 SM
FEB 1 Baseball II (Cont.) Slides Ch.1-2 MATH, Ch.7 SM
Baseball III Slides Ch. 3 MATH
FEB 6 Baseball III (Cont.) Slides Ch. 3 MATH
Playoff Round 1 Example
FEB 8 Baseball IV Slides Ch. 4-5 MATH, Web 1, Web 2
FEB 13 Well-Being No Class
FEB 15 Baseball IV (Cont.) Slides Ch. 4-5 MATH, Web 1, Web 2
FEB 20 Baseball V Slides Ch. 6 MATH, Web
FEB 22 Baseball VI Slides Ch. 8-9 MATH, Web 1, Web 2, Web 3, Web 4, Article 1
FEB 27 Baseball VII Slides Ch. 11,16 MATH, Article 1, Article 2
FEB 29 Baseball VII (Cont.) Slides Ch. 11,16 MATH, Article 1, Article 2

Second Half

Date Lecture Material Reading
FEB 29 Basketball I Slides Ch. 28-29 MATH, Web 1, Web 2, Web 3
MAR 5 Basketball I (Cont.) Slides Ch. 28-29 MATH, Web 1, Web 2, Web 3
Playoffs Round 2
MAR 7 Playoffs Round 2 (Cont.)
Basketball II Slides Ch. 30 MATH, Web 1, Web 2, Article 1
MAR 12 Spring Break No Class
MAR 14 Spring Break No Class
MAR 19 Basketball II (Cont.) Slides Ch. 30 MATH, Web 1, Web 2, Article 1
MAR 21 Basketball II (Cont.) Slides Ch. 30 MATH, Web 1, Web 2, Article 1
Basketball III Slides Ch. 32-33 MATH, Web 1, Web 2, Web 3
Basketball IV Slides Ch. 31,34 MATH, Web 1
MAR 28 Well-Being No Class
APR 2 Work on Playoff Rd. 2 No Class
APR 4 Basketball IV (Cont.) Slides Ch. 31,34 MATH, Web 1
Basketball V Slides Ch. 35 MATH, Article 1
APR 9 Work on Playoff Rd. 2 No Class
APR 11 Basketball VI Slides Ch. 38 MATH, Web 1, Web 2, Article 1, Article 2, Article 3
APR 16 Football I Slides Ch. 18 MATH
Football II Slides Ch. 19 MATH, Web 1, Web 2, Web 3, Article 1
APR 18 Football III Slides Ch. 20 MATH, Web 1, Article 1, Article 2, Article 3
APR 23 Football III (Cont.) Slides Ch. 20 MATH, Web 1, Article 1, Article 2, Article 3
Football IV Slides Ch. 21 MATH, Web 1, Article 1, Article 2
APR 25 Football V Slides Ch. 22 MATH, Web 1, Web 2, Web 3
Football VI Slides Ch. 25-26 MATH, Web 1, Web 2, Article 1, Article 2
APR 30 Work on Championship No Class

Fundamentals of Sports Analytics

In this class, your performance will be graded using six different assessments: attendance (5%), practice (5%), gameday speeches (10%), regular season (20%), playoffs (40%), and championship (20%). For the gameday speeches, playoffs, and the championship, you will be randomly assigned to a team. This class is a team sport.

Attendance

Attendance will be taken every class using the UNC Check-in App. You will need to install the app on your mobile device and bring the device to every class. Starting at the beginning of class, you will have 15 minutes to check in using the mobile app. Instructions for installing and using the UNC Check-in App are available at https://unccheckin.unc.edu/. You need to attend at least 70% of the lectures to get credit for attendance; otherwise, you will receive a 0 for your attendance grade. If you need to miss class for a reason permitted by the university and you don’t want to be penalized, you will need to get a university approved absence at https://uaao.unc.edu/. If you cannot get a university approved absence and don’t want to be penalized, you must email your instructor the reason along with documentation of it. The reason should line up with UNC’s definition of a university approved absence. For example, a job interview would not be approved by the university or by me.


Your best ability is availability.

~ George Kittle et al.


Practice

These assignments require you to use R or Python to perform data science tasks. Data cleaning, data visualization, and data modeling skills are required for this class and will not be taught in this class. These assignments are designed to ensure you have the prerequisite skills necessary for completing group projects or, at minimum, the determination to acquire them. There will be at least one of these assignments during the semester, and you will be required to work alone on them. If you get any help from your classmates or other human beings, you will get a 0 and be reported to the university. I want you to complete these assignments on your own because you will be in group projects, and your team will need you to have these skills to perform at a high level. Late assignments incur a 25% penalty for each day late, so make sure they are submitted on Canvas before the due date.


Practice makes perfect.

~ Bruce Lee et al.


Gameday Speech

Gameday speeches are to be done in teams. Biweekly, I will give you two refereed journal articles to read and summarize in a 7-slide gameday speech. Which article you read will depend on whether your group number is odd or even.

  • Slide 1: Title of the article, the names of the author(s), and the name(s) of the presenters.

  • Slide 2-3: Summarize the overarching themes of the article(s). Discuss the research goal(s) of the paper in at least 6 bullet points.

  • Slide 4-6: Talk about the methodology. What did the author(s) do to answer their research goal? Discuss the methodology in at least 8 bullet points.

  • Slide 7: What did you like? What did you find confusing? What did you find problematic? What could the author(s) have done better? Give at least 2 positive opinions of the paper and at least 1 negative critique.

On gameday, I will use a random number generator to pick 2 groups (1 odd and 1 even) to present in 5 to 8 minutes. All groups will be graded based on the criteria, but only 2 groups will present. This will be followed by an in-class discussion.

The presentation should be submitted on Canvas before class starts on the due date. Each group should have its own presentation, but it needs to be submitted as a PDF by every member of the group. Also, each group member can assess the contribution value of the other members of the team on a scale from 0 (Bad) to 3 (Excellent). Fill out the appropriate Google Form in the section below called Team Sport to turn in your ratings of your group members. Your value will be determined by the average score of the other members in your group.

Gameday speeches are worth a total of 21 points, with minor exceptions:

  • The two groups that present will have the chance to get 0 to 3 bonus points based on presentation quality.

  • If your group presents and you are not present in class, you will receive a 3 point penalty and not get the bonus points awarded to your group for the presentation.

Consider the following rubric and notice the bonus points for the lucky presenting groups:

Criteria 0 1 2 3
Slide 1 Missing All Components Missing 2 Components Missing 1 Component Followed Directions
Slides 2-3 1-2 Bullet Points 2-3 Bullet Points 4-5 Bullet Points At least 6 Bullet Points
Slides 4-6 1-3 Bullet Points 4-6 Bullet Points 5-7 Bullet Points At least 8 Bullet Points
Slide 7 Missing All Three Missing Two Missing One 2 Positive Opinions and 1 Critique
Spelling/Grammar >5 Errors 3-5 Errors 1-2 Errors No Errors
Value Bad Okay Good Excellent
Submitted Late Not a PDF On Time
Bonus (Presenting Groups) Not Prepared Prepared (Reading Slides) Semi-Prepared (Mediocre Creativity) Well-Prepared and Creative

And just when you think they are about to break apart, Ducks fly together.

~ Gordon Bombay


Reg. Season

The regular season consists of biweekly quizzes on the material presented in class over the previous two weeks. This includes what was taught in lecture, what is in the reading, and what was presented during gameday speeches. You will be given at least 15 minutes to complete the quiz.


There’s two times of year for me: Football season, and waiting for football season.

~ Darius Rucker


Playoff Rd. 1

The first round will be a data gathering and summary report. Imagine you are an analyst working for the coach or athlete and want to discover insights that would bring a competitive edge. In a world saturated with data, the way to ensure your analysis is unique and creative is to get the data yourself. To successfully conduct your study, you should follow the steps below:

  1. Decide on What Area of Sports You Would Like to Analyze
    • This Can Be Any Traditional Sport (Baseball, NASCAR, Soccer, Volleyball, Tennis, etc.)
    • This Can Be A Less Traditional Sport (Cornhole, Climbing, Drone Racing, etc.)
    • This Can Be A Video Game that Is Played Competitively
    • This Can Be A Board Game that Is Played Competitively
  2. Critically Think About What Questions/Opinions That You or Society Have About the Sport
    • What Statistics or Metrics Can Be Used to Describe Success
    • What Information Contributes to Success
  3. Plan the Study After Discussing the Sport With Your Team
    • Choose the Level at Which to Gather Data (Team, Player, Game, Play/Event, Time, etc.)
    • Choosing Levels at Higher Resolutions Makes it Easier to Gather Enough Data
    • Identify at Least 5 Variables You Want to Gather at the Specified Level
    • Variables Can Be All Categorical, All Numeric, or Mixture
    • You Can Have More Than 5 Variables, but At Least 5 Need to Be Gathered Through Observing the Sport Being Played
    • Focus on Variables that are Not Already Being Constantly Tracked to Add Creativity to Your Project
  4. Watch or Play the Sport Enough Times to Get a Minimum of 50 Observations per Person in Group
    • Groups of 5 Need a Sample Size of 250
    • Groups of 6 Need a Sample Size of 300
    • Depends on the Level at Which You’re Gathering Data
    • Example: If the Level is Game, You Would Need to Watch or Play the Sport at Least 50 Times
    • Example: If the Level is Play/Event, You Would Need to Watch Enough to Gather Data for 50 Plays or Events
    • Example: If the Level is Time (minutes), You Would have to Gather Information for A Minimum of 50 Minutes
    • If Your Group is Tracking Something Complex and Desires Fewer Observations Per Person, Have a Discussion with Dr. Mario for Approval
  5. As a Team, Track the Data in a Spreadsheet
    • All Data Must Be Recorded in the Same Electronic Spreadsheet (Excel, CSV, etc.)
    • Each Row Should Be a Different Observation According to the Level Selected
    • Can Be Recorded on Paper Separately, But Must Be Compiled Electronically Into a Single Spreadsheet
    • Each Group Member is Expected to Participate in the Data Collection; if a Group Member Refuses or is Unable to Participate in the Data Collection by February 22, I Will Remove that Group Member and Require that Individual to Collect 250 Observations and Write Their Own Paper
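
The sample-size rule in step 4 (50 observations per group member) and the variable minimum in step 3 can be checked before submission. The following is a minimal sketch in Python, assuming the compiled spreadsheet has been exported to CSV. The function name `check_dataset`, the column names, and the example rows are hypothetical and only illustrate the check.

```python
import csv
import io

OBS_PER_PERSON = 50   # minimum observations per group member (step 4)
MIN_VARIABLES = 5     # minimum number of variables (step 3)


def check_dataset(csv_text, group_size):
    """Return (n_rows, n_columns, required_rows, ok) for a compiled dataset."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    n_rows = len(rows)
    n_columns = len(rows[0]) if rows else 0
    required_rows = OBS_PER_PERSON * group_size
    ok = n_rows >= required_rows and n_columns >= MIN_VARIABLES
    return n_rows, n_columns, required_rows, ok


# Hypothetical play-level data: one row per serve in a volleyball match.
header = "server,serve_type,zone,speed_mph,point_won\n"
example = header + "A,jump,5,52.1,1\n" * 250

print(check_dataset(example, group_size=5))  # a group of 5 needs 250 rows
print(check_dataset(example, group_size=6))  # a group of 6 needs 300 rows
```

The check only counts rows and columns; whether each variable was actually gathered through observation still has to be judged by the group.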

Your study should be summarized in a paper with at least 4 pages (a minimum of 2 pages worth of writing). The paper and dataset should be submitted on Canvas before 11:59PM on the due date. Submit a PDF for the paper and a CSV for the dataset. Each group should have their own paper and data, but both need to be submitted by every member of the group. Also, each group member can assess the contribution value of the other members of the team on a scale from 0 (Bad) to 3 (Excellent). Fill out the appropriate Google Form in the section below called Team Sport to turn in your ratings of your group members. Your value will be determined by the average score of the other members in your group.

On the first page, you should title your paper and give the names of the team members who contributed. The content of the paper should be organized in the following 3 subsections:

1) Introduction to the Data

In this section, you should discuss three things.

First, you should briefly discuss the sport/game your team selected and the overall purpose of analyzing data for this sport/game. For example, what did you hypothesize or hope to find from studying the sport/game? Provide citations to any articles or websites that provided you inspiration for your ideas. You should look into what research has been done in this sport that is connected to what you plan on doing with the data you collect.

Second, you should give a preview of your data showing at least the first 5 rows of your dataset. There should be at least 5 columns, one for each of the variables you measured on each observation. Give your variables clear names so anyone could understand what you measured, and make the table aesthetically pleasing (rounding, colors, etc.). It is okay to abbreviate variable names (e.g., R/G) if you identify the abbreviations (e.g., Runs (R) and Games (G)) somewhere in the document. At least 5 variables need to be collected through observation. You can collect variables like height and birthplace of athletes, but these are not going to be variables you observe through watching the sport.

Third, you should briefly describe your data. When did you gather your data? Where did you gather your data? How did you gather your data (Watch or Play Game)? What does each observation (i.e. row in data) represent and how many observations do you have? Describe the five variables and how they were measured. I want a clear description of how the data was collected, what role each member on your team played in the gathering of the data, and information about the contents of the raw data.

2) Summary of the Data

In this section, I want a table(s) and two figures summarizing the data. Each table and each figure you give should have a couple sentences describing the information summarized.

The table(s) should be the same style as the table in the introduction (not different colors, fonts, rounding, etc.). The table(s) should efficiently summarize the five variables using popular statistics. If the variable is numeric (continuous or discrete), you should give the minimum, maximum, mean, and standard deviation, at least. If the variable is categorical, you should give the possible values with frequencies (counts) and relative frequencies (%). If you have a combination of categorical and numeric, I advise doing separate tables which will count as a single table. All tables should be formatted similarly.
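
The statistics named above can be computed with a few lines of code. This is a minimal sketch in Python using only the standard library; the helper names and the example observations are hypothetical, and the standard deviation shown is the sample standard deviation.

```python
import statistics
from collections import Counter


def summarize_numeric(values):
    """Minimum, maximum, mean, and sample standard deviation."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),
    }


def summarize_categorical(values):
    """Count and relative frequency (%) for each observed category."""
    counts = Counter(values)
    n = len(values)
    return {level: (count, 100 * count / n) for level, count in counts.items()}


# Hypothetical observations: serve speed (numeric) and serve type (categorical).
speed_mph = [48.0, 52.0, 50.0, 54.0, 46.0]
serve_type = ["jump", "float", "jump", "jump", "float"]

numeric_summary = summarize_numeric(speed_mph)
categorical_summary = summarize_categorical(serve_type)
print(numeric_summary)
print(categorical_summary)
```

Keeping numeric and categorical summaries in separate helpers mirrors the advice to present them as separate tables.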

Then, you should have two figures summarizing relationships between the variables you selected. In each figure, at least 2 variables should be summarized. Appropriate axis names and scales should be used. Fonts should be large enough to read. These figures should be chosen with a purpose to later describe insights you learned from analyzing the data.

Following all instructions gets you two points. The last point is reserved for creativity, design, and over-achievement. This will be determined by comparing what you do to what the other teams do. I reward you for taking risks that lead to better results than I require. Examples for the table(s) include calculating confidence intervals for numeric variables, creating contingency tables, or summarizing numeric variables for different subgroups defined by categorical variables. Examples for the figures include tile, 3D, or map plots showing relationships across multiple variables. Also, I don’t mind if you give more than two figures or put multiple figures together in a grid. If you create a ton of figures, I will grade your two worst figures, so make sure each figure is valuable and worth discussing.

3) Insights from the Data

The content in the previous two sections should be used to support two insights. An insight is a deeper understanding you gained about the sport from gathering and summarizing data. The insights should be connected to the purpose you outlined in the “Introduction to the Data” section and defended from the tables/figures in the “Summary of the Data” section. I recommend giving your figures and tables numbers so you can reference them in this section (e.g., Table 1, Figure 2). Write at least one paragraph for each insight. What were the two most interesting things you learned from the data? If you don’t reference statistics or figures you created, you will lose at least two points. To get full credit, I advise supplementing your insights with p-values from appropriate hypothesis tests (t-test, ANOVA, regression, difference in proportions, independence tests, etc.) or with confidence intervals (means, proportions, etc.). Excellent insights should not be obvious but should lead to innovations that would give a decision maker or athlete the competitive edge. Tests for statistical significance are the strongest way to defend your arguments. This section should be written so someone with a technical/statistical background would understand your methodology and someone without a technical/statistical background would understand your conclusions.
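
As one illustration of the tests listed above, a difference in proportions can be examined with a two-sided z-test and a confidence interval. This is a minimal sketch in Python using the standard library; the function names and the counts are hypothetical, and both calculations rely on the normal approximation, which is only reasonable for fairly large samples.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Two-sided z-test for a difference in proportions (pooled standard error)."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


def proportion_ci(successes, n, level=0.95):
    """Normal-approximation (Wald) confidence interval for one proportion."""
    p = successes / n
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = z_crit * math.sqrt(p * (1 - p) / n)
    return p - margin, p + margin


# Hypothetical counts: points won on 100 jump serves and on 100 float serves.
z, p_value = two_proportion_z_test(60, 100, 45, 100)
low, high = proportion_ci(60, 100)
print(f"z = {z:.3f}, p-value = {p_value:.4f}")
print(f"95% CI for jump-serve win rate: ({low:.3f}, {high:.3f})")
```

A small p-value here would support an insight such as "jump serves win points more often than float serves," while the interval conveys how precisely the win rate is estimated.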

Also, in this section, after discussing your insights, you should write at least one paragraph critiquing what your group would have done differently. Would you have gathered another variable which could potentially be a confounding variable? Would you have measured a particular variable differently? What other information that you did not gather would have made this study better or have been of use? No study is done perfectly, and there are always aspects that could be improved. After I read your entire paper, I will take off a point for every single thing that I think you should have done differently that you did not highlight in this section. If there is a criticism that I incorrectly identify because you are extremely vague in the description of your study or in the summary of your insights, then you are at fault. Be very clear in describing all aspects of the data and method of collection in the first 2 sections.

Rubric

The first playoff round is worth a total of 45 points based on the following rubric:

Criteria 0 1 2 3
Title Page Instructions Not Followed Missing Entire Element Missing a Team Member Title+Team Members
Introduction: Purpose Not Innovative or Clearly Described Not Innovative and/or Not Clearly Described Somewhat Innovative and Clearly Described Innovative and Clearly Described
Introduction: Preview Not Attempted Missing Rows or Columns 5 Rows and 5 Columns Outstanding
Introduction: Description Not Attempted Missing Key Information Answers Questions But Poorly Written Addresses All Questions Well
Summary: Table Missing More Than 1 Statistic Missing 1 Statistic Followed Instructions Excellent
Summary: Figure 1 Not Attempted Not 2 Variables or Messy 2 Variables and Clear Excellent
Summary: Figure 2 Not Attempted Not 2 Variables or Messy 2 Variables and Clear Excellent
Summary: Table/Figure Descriptions Not Attempted Missing Important Descriptions Some Descriptions Unclear Everything Clearly Described
Insights: First Insight Not Attempted Mediocre/Weak Reference Summary Section Excellent
Insights: Second Insight Not Attempted Mediocre/Weak Reference Summary Section Excellent
Insights: Done Differently Missing 3 Obvious Things Missing 2 Obvious Things Missing 1 Obvious Thing Got Everything
Instructions Followed Organized Incorrectly Organized Correctly without Section Headings Organized Correctly with Section Headings
Spelling/Grammar >5 Errors 3-5 Errors 1-2 Errors No Errors
Value Bad Okay Good Excellent
Submitted Late or Inadequate Data Not a PDF On Time

I’m not looking for home runs, I’m looking for the playoffs.

~ Sammy Sosa


Playoff Rd. 2

The second round will be a predictive modeling project for actual NBA games. The primary goal of this project is to design models for prediction of three variables – \(Spread\), \(Total\), and \(OREB\). Below you can find clear definitions of these three outcome variables:

  • \(Spread = \textrm{Home Points} - \textrm{Away Points}\)

  • \(Total = \textrm{Home Points} + \textrm{Away Points}\)

  • \(OREB = \textrm{Home OREB} + \textrm{Away OREB}\)
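
The three definitions above translate directly into code. This is a minimal sketch in Python; the function name `game_outcomes` and the box-score numbers are hypothetical. Note the sign convention: a positive \(Spread\) means the home team won.

```python
def game_outcomes(home_points, away_points, home_oreb, away_oreb):
    """Compute Spread, Total, and OREB for a single game."""
    return {
        "Spread": home_points - away_points,  # home minus away
        "Total": home_points + away_points,
        "OREB": home_oreb + away_oreb,
    }


# Hypothetical box score: home team wins 112-105 with 11 and 9 offensive rebounds.
outcomes = game_outcomes(112, 105, 11, 9)
print(outcomes)
```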

It is imperative that you follow these specifications. Your group will be making predictions of the three variables for all NBA games between April 9 and April 14, inclusive. Your predictions should be saved in the dataset called Predictions. Here you will find missing values where future predictions will be placed. This completed file should be submitted along with a paper summarizing your methodology. You will be graded not only on your methodology but also on your predictive accuracy. The variables \(Spread\), \(Total\), and \(OREB\) will all be evaluated by mean absolute error (MAE). For each of the variables, the top 6 groups will get 3 points, the middle 6 groups will get 2 points, and the bottom 6 groups will get 1 point. All three variables are numeric. If you fail to get predictions or the predictions are not numeric values, you will get 0 points.
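
For reference, the MAE of \(n\) predictions \(\hat{y}_i\) against actual values \(y_i\) is \(\frac{1}{n}\sum_{i=1}^{n} |y_i - \hat{y}_i|\), so lower is better. This is a minimal sketch in Python; the function name and the example values are hypothetical.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical Spread values for four games and one group's predictions.
actual_spread = [7, -3, 12, -8]
predicted_spread = [4.5, 1.0, 9.0, -10.0]

spread_mae = mean_absolute_error(actual_spread, predicted_spread)
print(spread_mae)  # absolute errors 2.5, 4.0, 3.0, 2.0 average to 2.875
```

Because MAE weights every point of error equally, one badly missed game does not dominate the score the way it would under squared error.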

To build adequate predictive models, historical data is needed for training and testing. Kendall Thomas used the nba_api package in Python to scrape 2023 box scores. Kendall has provided the Python code and data on GitHub. You can also find historical data (2003-2022) from Nathan Lauga on Kaggle. Nathan Lauga scraped this data directly from NBA.com. I downloaded these datasets last year to my GitHub. These datasets represent your starting point. In this project, you are required to engineer new variables and use outside data. This is highly recommended to gain a competitive edge in the sports betting market.

For the engineering of new variables, consider creating differences and ratios between the stats for the home and away teams. Also, it may be useful to create variables that represent past information such as moving averages or lagged variables. You should be able to explain and defend the variables you create.
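
The ideas above (differences, ratios, and moving averages of past games) can be sketched in a few lines of Python. The function names and numbers below are hypothetical. The moving average is deliberately lagged: the value for a game uses only earlier games, since statistics from the game being predicted are unknown at prediction time.

```python
def lagged_moving_average(values, window):
    """Average of the previous `window` values for each position.

    Position i uses values[i - window:i] only, so the current game is never
    included. Positions without a full window of history get None.
    """
    result = []
    for i in range(len(values)):
        if i < window:
            result.append(None)
        else:
            past = values[i - window:i]
            result.append(sum(past) / window)
    return result


def matchup_features(home_stat, away_stat):
    """Difference and ratio between the home and away values of one statistic."""
    return {"diff": home_stat - away_stat, "ratio": home_stat / away_stat}


# Hypothetical points scored by one team in six consecutive games.
points = [100, 110, 120, 90, 105, 115]
rolling = lagged_moving_average(points, window=3)
features = matchup_features(home_stat=110.0, away_stat=100.0)
print(rolling)
print(features)
```

The window length is a choice the group would need to explain and defend, like any other engineered variable.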

For the use of outside data, research other variables that could be important for prediction of the three variables. You must find data from other online sources that are not part of Nathan Lauga’s Kaggle Data. This could be injury data, advanced metrics on players, play-by-play data, or recent data. I am grading you on your creativity so utilizing recent data for your outside data requirement would be helpful for predicting, but would definitely be less creative than other options. Below is a list of potential options:

Your study should be summarized in a paper of at least 5 pages. The paper and predictions should be submitted on Canvas before 7:00PM on the due date. Each group should have their own paper and predictions, but both need to be submitted by every member of the group. Also, each group member needs to assess the contribution value of the other members of the team on a scale from 0 (Bad) to 3 (Excellent). Use the Google form provided on the course website.

This project is extremely demanding. I recommend that you split the responsibilities into clearly defined jobs and hold to this recommended timeline.

Start  End   Task
3/7    3/26  Build and Clean Dataset
3/26   3/28  Start Writing Section 1 of Paper
3/28   4/2   Each Group Member Builds and Evaluates One Model for Each Variable
4/2    4/4   Start Writing Sections 2-4 of Paper
4/4    4/9   Implement Models to Make Predictions and Finish Paper
4/9    4/9   Edit and Submit Paper + Predictions

On the first page, you should title your paper and give the names of the team members who contributed. The content of the paper should be organized in the following 4 subsections:

1) Data Information

In this section, you should outline in chronological order how your group built the dataset that your group used to fit, evaluate, and implement the predictive models you will discuss in future sections. Every step to build and clean the dataset should be written so that someone could read it, follow the steps, and get to the same data your group used.

Examples of some questions that need to be answered if applicable:

  • What datasets did you use and where did you get these datasets? What are the sources? Did you use an API and what data did you acquire from using the API?
  • How did you clean the datasets individually? How did you merge the datasets?
  • How did you handle missing data or outliers? Did you ignore certain games for some reason?
  • Did you split the data into train sets and test sets?
  • Did you acquire any current data for generating predictions for the current year?
  • Did you create any variables or aggregate the data for the purpose of generating predictions for the current year?

In this section, you should discuss any variables your group engineered and defend your reasons for engineering those variables. You should be able to mathematically represent your metrics as formulas and/or provide written descriptions. You should be able to explain why you think the variable you created would help in predicting any of the three outcome variables. Feel free to make citations to whoever you want to credit for leading you to your idea. The variables you engineer should be creative and well-defended.

Also, you should discuss all outside data you utilized to hopefully improve prediction. You are required to utilize data that is not currently contained in any of the starting datasets (Kendall or Nathan Lauga). You need to explain where you got the outside data and why you are including it. I want to know why your group thought that the outside data you are utilizing would be helpful for predicting any of the three outcome variables. The outside data utilized should be creative and well-defended.

2) Methodology for \(Spread\)

You should clearly describe your group’s best predictive model for \(Spread\) and the steps you took to get there. Discuss what variables were useful and useless for predicting \(Spread\). Since \(Spread\) is a numeric variable, I highly recommend a basic linear regression as a baseline with stepwise algorithms or regularization for variable selection. To ensure you are seeking the best model for prediction, I highly advise considering many different types of models (neural nets, regression trees, time series, etc.), utilizing cross-validation/out-of-sample testing, and adding interaction/polynomial terms. In this part, you should chronologically write about everything your group did to find the “best” model. Challenge yourselves to a thorough investigation from multiple angles and organize your process professionally for an audience with basic understanding in statistics and the sport. You are not required to present tables or figures, but these can be used to defend why the model you are calling the “best” is actually best. Talk about all of the models your group considered, but put some extra attention on describing your “best” model since this is the model that your group used to generate predictions. For example, if your “best” model is a linear regression model, you can show the coefficients in a table or write it out mathematically. In more advanced machine learning methods, I should know every value of every hyperparameter/tuning parameter and how your group chose those values. Finally, make sure that you explain how your group used the “best” model to generate predictions for future games where everything is unknown. Just giving a description of your best model is not enough if you don’t explain exactly how that model was used to make real predictions.
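
To make the baseline and out-of-sample testing concrete, here is a minimal sketch in Python of a one-predictor linear regression evaluated on held-out games. Everything in it is hypothetical: the predictor (home minus away average point differential entering the game), the data, and the function names. A real analysis would use many predictors, variable selection, and cross-validation rather than a single split.

```python
def fit_simple_ols(x, y):
    """Least-squares intercept and slope for one predictor."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical games in chronological order.
x = [-6.0, -4.0, -2.0, 0.0, 2.0, 4.0, 6.0, 8.0]    # predictor known before tip-off
y = [-5.0, -6.0, 1.0, 2.0, 3.0, 8.0, 7.0, 12.0]    # observed Spread

# Chronological split: fit on earlier games, test on later games.
split = 6
x_train, y_train = x[:split], y[:split]
x_test, y_test = x[split:], y[split:]

intercept, slope = fit_simple_ols(x_train, y_train)
model_pred = [intercept + slope * xi for xi in x_test]
naive_pred = [sum(y_train) / len(y_train)] * len(y_test)  # always predict the mean

model_mae = mae(y_test, model_pred)
naive_mae = mae(y_test, naive_pred)
print(f"slope = {slope:.3f}, model MAE = {model_mae:.3f}, naive MAE = {naive_mae:.3f}")
```

Comparing against a naive prediction shows whether a model adds anything, and splitting by time avoids testing on games that occurred before the training games.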

3) Methodology for \(Total\)

You should clearly describe your group’s best predictive model for \(Total\) and the steps you took to get there. Discuss what variables were useful and useless for predicting \(Total\). Since \(Total\) is a numeric variable but could be highly skewed, I highly recommend nonlinear transformations. To ensure you are seeking the best model for prediction, I highly advise considering many different types of models (neural nets, regression trees, time series, etc.), utilizing cross-validation/out-of-sample testing, and adding interaction/polynomial terms. In this part, you should chronologically write about everything your group did to find the “best” model. Challenge yourselves to a thorough investigation from multiple angles and organize your process professionally for an audience with basic understanding in statistics and the sport. You are not required to present tables or figures, but these can be used to defend why the model you are calling the “best” is actually best. Talk about all of the models your group considered, but put some extra attention on describing your “best” model since this is the model that your group used to generate predictions. For example, if your “best” model is a linear regression model, you can show the coefficients in a table or write it out mathematically. In more advanced machine learning methods, I should know every value of every hyperparameter/tuning parameter and how your group chose those values. Finally, make sure that you explain how your group used the “best” model to generate predictions for future games where everything is unknown. Just giving a description of your best model is not enough if you don’t explain exactly how that model was used to make real predictions.

4) Methodology for \(OREB\)

You should clearly describe your group’s best predictive model for \(OREB\) and the steps you took to get there. Discuss what variables were useful and useless for predicting \(OREB\). Since \(OREB\) is a discrete numeric variable, I highly recommend nonlinear transformations or considering a generalized linear model like Poisson regression. To ensure you are seeking the best model for prediction, I highly advise considering many different types of models (neural nets, regression trees, time series, etc.), utilizing cross-validation/out-of-sample testing, and adding interaction/polynomial terms. In this part, you should chronologically write about everything your group did to find the “best” model. Challenge yourselves to a thorough investigation from multiple angles and organize your process professionally for an audience with basic understanding in statistics and the sport. You are not required to present tables or figures, but these can be used to defend why the model you are calling the “best” is actually best. Talk about all of the models your group considered, but put some extra attention on describing your “best” model since this is the model that your group used to generate predictions. For example, if your “best” model is a linear regression model, you can show the coefficients in a table or write it out mathematically. In more advanced machine learning methods, I should know every value of every hyperparameter/tuning parameter and how your group chose those values. Finally, make sure that you explain how your group used the “best” model to generate predictions for future games where everything is unknown. Just giving a description of your best model is not enough if you don’t explain exactly how that model was used to make real predictions.
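
For intuition about Poisson regression, the sketch below fits a one-predictor model \(\log(\mu) = \beta_0 + \beta_1 x\) by Newton's method in plain Python. The predictor (combined missed shots, in tens), the data, and the function names are hypothetical. In practice an established routine, such as `glm` with `family = poisson` in R, would be used instead of hand-written code.

```python
import math


def fit_poisson_regression(x, y, iterations=25):
    """Maximum-likelihood fit of log(mu) = b0 + b1 * x via Newton's method."""
    b0 = math.log(sum(y) / len(y))  # start at the intercept-only fit
    b1 = 0.0
    for _ in range(iterations):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector (gradient of the log-likelihood).
        u0 = sum(yi - mi for yi, mi in zip(y, mu))
        u1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Information matrix entries (negative Hessian).
        i00 = sum(mu)
        i01 = sum(xi * mi for xi, mi in zip(x, mu))
        i11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = i00 * i11 - i01 * i01
        # Newton update: solve the 2x2 system explicitly.
        b0 += (i11 * u0 - i01 * u1) / det
        b1 += (i00 * u1 - i01 * u0) / det
    return b0, b1


def predict_count(b0, b1, x_new):
    """Predicted mean count; always positive because of the log link."""
    return math.exp(b0 + b1 * x_new)


# Hypothetical games: x = combined missed shots (tens), y = combined OREB.
x = [8.0, 8.5, 9.0, 9.5, 10.0, 10.5]
y = [17, 19, 20, 22, 23, 26]

b0, b1 = fit_poisson_regression(x, y)
fitted = [predict_count(b0, b1, xi) for xi in x]
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")
print(f"prediction at x = 9.0: {predict_count(b0, b1, 9.0):.2f}")
```

The log link keeps predictions positive, and the fitted means sum to the observed total whenever the model includes an intercept, which is a quick sanity check on any Poisson fit.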

Rubric

The second playoff round is worth a total of 48 points based on the following rubric:

Criteria | 0 | 1 | 2 | 3
Title Page | Instructions Not Followed | Missing Entire Element | Missing a Team Member | Title+Team Members
Data: Cleaning Summary | Unclear | Slightly Unclear | Slightly Clear | Clear
Data: New Variable | Poor | Not Innovative But Defended | Innovative But Not Well Defended | Innovative and Defended
Data: Outside Data | Poor | Not Innovative But Defended | Innovative But Not Well Defended | Innovative and Defended
Spread: Methodology | Poor | Lazy But Well Explained | Clear, Thorough, but Lacking Innovation | Clear, Innovative, and Thorough
Spread: Best Model | Poor Description | Adequate Description | Good Description | Excellent Description
Spread: Prediction | None | Bottom 5 | Middle 6 | Top 6
Total: Methodology | Poor | Lazy But Well Explained | Clear, Thorough, but Lacking Innovation | Clear, Innovative, and Thorough
Total: Best Model | Poor Description | Adequate Description | Good Description | Excellent Description
Total: Prediction | None | Bottom 5 | Middle 6 | Top 6
OREB: Methodology | Poor | Lazy But Well Explained | Clear, Thorough, but Lacking Innovation | Clear, Innovative, and Thorough
OREB: Best Model | Poor Description | Adequate Description | Good Description | Excellent Description
OREB: Prediction | None | Bottom 5 | Middle 6 | Top 6
Spelling | >5 Errors | 3-5 Errors | 1-2 Errors | No Errors
Value | Bad | Okay | Good | Excellent
Submitted | Late | | Not a PDF | On Time

People judge you by the way you play in the playoffs.

~ Jaromir Jagr


Championship

The Championship will consist of a 7- to 12-page research paper with a minimum of 10 citations from 10 different sources, studying a sport of interest other than baseball, basketball, football, hockey, or soccer. The page minimum and maximum include the title page, sections, and references.

Like the playoffs, this final assignment will be done in a team. Each group member needs to submit the paper as a PDF on Canvas by 7PM on Monday, May 6.

Also, instead of having a formal exam, you will be required to meet with me through Zoom. Each group will have a 7-minute window to join me on Zoom and introduce me to your paper. Use this Zoom link https://unc.zoom.us/j/97382130073 and share your camera when you join. You will lose points if you don’t attend the meeting (without university approval), if you attend the meeting but are unable to share your camera, or if you are late to the meeting by 1 minute or more. I recommend that your entire group share one computer/camera for this meeting to ensure no technical difficulties arise.

Group Meeting Time
1 4:00PM - 4:07PM
2 4:10PM - 4:17PM
3 4:20PM - 4:27PM
4 4:30PM - 4:37PM
5 4:40PM - 4:47PM
6 4:50PM - 4:57PM
7 5:00PM - 5:07PM
8 5:10PM - 5:17PM
9 5:20PM - 5:27PM
10 5:30PM - 5:37PM
11 5:40PM - 5:47PM
12 5:50PM - 5:57PM
13 6:00PM - 6:07PM
14 6:10PM - 6:17PM
15 6:20PM - 6:27PM
16 6:30PM - 6:37PM
17 6:40PM - 6:47PM
18 6:50PM - 6:57PM

Since each group member has to submit a paper, make sure that group members who are not involved in the project do not have access to the paper. If a student fails to submit a paper, the student will receive a 0% until I receive a paper written either by the group or by the student. Additional penalties will be applied, at my discretion, on top of the 3-point penalty.

Also, each group member can assess the contribution value of the other members of the team on a scale from 0 (Bad) to 3 (Excellent). Use the survey link on the website for the Championship.

On the first page, you should title your paper and give the names of the team members. The content of the paper should be organized in the following 4 subsections:

1) Introduction

Describe the sport selected by writing about how it is played, when it is played, and where it is most popular. Also, discuss the capital invested in the sport and the expected growth, both in popularity and future investment. From this information, give a factual defense of why your team chose this sport. Citations are highly recommended here to support your numbers and information.

2) Literature Review

Discuss how analytics have been used historically in the sport, what types of metrics are used to evaluate performance on and/or off the field, and who the best athletes/teams/organizations are based on those metrics. Many obscure sports do not have an extensive history of complex analytics, but all sports track information in order to evaluate performance. Talk about any implementation of gambling and what types of bets exist.

In this part, I want to know about all the documented analytical challenges pertaining to the sport you selected, as well as the solutions that people have come up with. Make sure you appropriately cite all books, articles, and webpages you use.

When I read your paper, I will do an extensive web search on your sport. If there is any key research that you did not discuss or any research that you failed to cite, I will take off points.

3) Future Work

Think critically about ways in which sports analytics can be used to improve the sport. Think about ways in which analytics from other, more popular sports could be applicable in the sport you selected. Try to creatively design metrics that could be useful for performance evaluation. This section is the most important and should test your ability to innovate in areas that have been largely ignored but require innovation for growth. What information does the sport have or not have that could help the organizations, managers, or athletes that are financially invested in the sport you selected?

In this part, I want four innovative ideas that your group came up with that would solve key problems existing within the sport. For each innovation, you should be comprehensive in your description. What suggestions would you give the decision maker to implement your innovations? In this section, I want to know your ideas and how you would implement the ideas, illustrating that your ideas are practical. For each innovation, I will determine if your ideas are creative, well-described, and practical. All of these need to be ideas related to sports analytics.

4) Conclusion

Summarize your paper and discuss what appreciation you gained for this sport after your in-depth analysis.

Rubric

The championship is worth a total of 42 points based on the following rubric:

Criteria | 0 | 1 | 2 | 3
Title Page | Instructions Not Followed | Missing Entire Element | Missing a Team Member | Title+Team Members
Intro: Description | Poor | Adequate | Good | Excellent
Intro: Finance/Growth | Poor | Adequate | Good | Excellent
Intro: Defense | Weak | Average | Strong | Compelling
Literature: Breadth | Poor | Missing Key Research | Good | Comprehensive
Literature: Citations | None | Missing | Inconsistent | Well-Cited
Future: Innovation 1 | 0/3 Success | 1/3 Success | 2/3 Success | Creative, Described, Practical
Future: Innovation 2 | 0/3 Success | 1/3 Success | 2/3 Success | Creative, Described, Practical
Future: Innovation 3 | 0/3 Success | 1/3 Success | 2/3 Success | Creative, Described, Practical
Future: Innovation 4 | 0/3 Success | 1/3 Success | 2/3 Success | Creative, Described, Practical
Conclusion: Summary/Appreciation | Poor | Adequate | Good | Excellent
Spelling | >5 Errors | 3-5 Errors | 1-2 Errors | No Errors
Value | Bad | Okay | Good | Excellent
Meeting/Submission | Late Submission | Missed Meeting/Did Not Share Camera | Not a PDF/Late Arrival to Meeting | On Time Submission and On Time at Meeting

Talent wins games, but teamwork and intelligence wins championships.

~ Jordan

No matter how good one individual is, it takes a whole team to win a championship.

~ King James


Assignment Tracker

Any assignments requiring a deliverable will be submitted via Canvas or Gradescope as a PDF.

Date (Time) Practice (PR) Gameday Speech (GS) Reg. Season (RS) Playoff (P) Champ (C)
JAN 25 (12:30PM) RS1
FEB 1 (12:30PM) GS1 (Odd: Ch. 3 HSMAS)
GS1 (Even: Ch. 4 HSMAS)
FEB 2 (5:00PM) PR1(.zip)
FEB 8 (12:30PM) RS2
FEB 15 (12:30PM) GS2 (Odd: Ch. 10 HSMAS)
GS2 (Even: Ch. 12 HSMAS)
FEB 22 (12:30PM) RS3
FEB 29 (12:30PM) GS3 (Odd: Ch. 6 HSMAS)
GS3 (Even: Ch. 8 HSMAS)
MAR 7 (11:59PM) P1
MAR 7 (12:30PM) RS4
MAR 19 (5:00PM) PR2(.zip)
MAR 21 (12:30PM) GS4 (Odd: Ch. 13 HSMAS)
GS4 (Even: Ch. 15 HSMAS)
APR 4 (12:30PM) RS5
APR 9 (7:00PM) P2
APR 11 (12:30PM) GS5 (Odd: Ch. 17 HSMAS)
GS5 (Even: Ch. 21 HSMAS)
APR 18 (12:30PM) RS6
MAY 6 (7:00PM) C

Team Sport

Many of the assessments in this course will be done in teams of 5 or 6 randomly chosen playas. For each team-based assignment, you will be given a different team. This will force you to interact with the majority of the class throughout the semester. After each team-based assignment, you will grade the contribution of your teammates on a scale from 0 to 3, and this will contribute to your overall grade for the given assignment. A decent portion of your final grade will be influenced by this. The following link lists all teams alphabetically: All Teams

In the table below, you can find your group for each of the specific team-based assignments. Make sure you fill out the value survey before the assignment is due. Do not assess your own value in the survey.

Due Date Assessment Value Survey
FEB 1 GS1 Value for GS1
FEB 15 GS2 Value for GS2
FEB 29 GS3 Value for GS3
MAR 5 P1 Value for P1
MAR 21 GS4 Value for GS4
APR 9 P2 Value for P2
APR 11 GS5 Value for GS5
MAY 6 C Value for C

Results from Playoffs Round 2

The predictions of all 18 teams can be accessed from the hyperlinks below. These three files contain predictions, actual values, calculations of MAE, and ranking. Kendall put all of this together for us, and I am giving it to you for transparency. If you find any mistakes that cause the grades to change, I will adjust them.

The table below shows each group’s MAE for \(Spread\), \(Total\), and \(OREB\), respectively. Also, you will see your group ranking for each of the three variables. It may be helpful to sort the table for each variable to see where your group is currently ranked.
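If you want to check the calculations yourself, here is a minimal sketch of how an MAE and a ranking could be computed, where \(\text{MAE} = \frac{1}{n}\sum_{i=1}^{n} |\hat{y}_i - y_i|\). It is written in Python with only the standard library. The function names and every number below are made up for illustration and are not the actual class results. The tie-breaking shown (order of appearance) is an assumption, since the handling of ties is not described here.

```python
def mae(predictions, actuals):
    """Mean absolute error between predicted and actual values."""
    return sum(abs(p - a) for p, a in zip(predictions, actuals)) / len(actuals)


def rank_groups(mae_by_group):
    """Rank groups from lowest MAE (rank 1) to highest.

    Ties keep their order of appearance; this is an assumption.
    """
    ordered = sorted(mae_by_group, key=mae_by_group.get)
    return {group: rank for rank, group in enumerate(ordered, start=1)}


# Made-up example: actual spreads for four games and three groups' predictions.
actual_spread = [5, -3, 10, -7]
predicted = {
    1: [4, -1, 8, -6],
    2: [0, 0, 0, 0],
    3: [6, -4, 12, -2],
}

mae_by_group = {g: mae(p, actual_spread) for g, p in predicted.items()}
ranks = rank_groups(mae_by_group)

for g in sorted(ranks, key=ranks.get):
    print(f"Group {g}: MAE = {mae_by_group[g]:.2f}, rank = {ranks[g]}")
```

A lower MAE means the group's predictions were closer to the actual values on average, so rank 1 goes to the smallest MAE.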

This page was last updated on 2024-04-22 17:42:20 Eastern Time by Super Mario.