Survey the application of statistics and mathematics to the sports industry by exploring the history of analytics across various sports, understanding the advantages of sports analytics for both on-field performance and off-field business decisions, and examining current research to encourage creative thinking about future developments.
The course will be organized by sport to ensure a comprehensive exploration of sports analytics. We will start with in-depth studies in Baseball, Basketball, American Football, Hockey, and Soccer. Later in the course, we will branch out according to the diverse interests of the class.
The statistical programming language R will be used to generate visualizations and perform basic modeling.
Head Coach: Mario Giacomazzo
Personal Trainer: Mark Cahill
Syllabus: Section 1
Game Days: MWF, 9:05AM - 9:55AM, Gardner 105
Office Hours:
University Approved Absences: Online Form
Handbook of Statistical Methods and Analyses in Sports, Albert, Glickman, et al., 2017, ISBN: 9781498737364 (HSMAS)
Sports Analytics: A Guide for Coaches, Managers, and Other Decision Makers, Alamar, 2013, ISBN: 9780231162920 (SPAN)
Sports Analytics: A Guide for Coaches, Managers, and Other Decision Makers (2nd Edition), Alamar, 2024, ISBN: 9780231205207 (SPAN2)
Analytic Methods in Sports, Severini, 2015, ISBN: 9781482237016 (AMS)
Practicing Sabermetrics: Putting the Science of Baseball Statistics to Work, Costa et al., 2009, ISBN: 9780786441778 (PS)
Analyzing Baseball Data with R, Marchi and Albert, 2014, ISBN: 9781466570238 (ABDR)
Sports Math: An Introductory Course in the Mathematics of Sports Science and Sports Analytics, Minton, 2017, ISBN: 9781498706261 (SM)
Introduction to NFL Analytics with R, Congelio, 2023, ISBN: 9781032427751 (INAR)
Date | Lecture | Material | Reading |
---|---|---|---|
JAN 8 | Syllabus | Survey | |
JAN 10 | Sports Analytics I | Slides | Ch. 1 SPAN, Web 1, Web 2, Web 3, Web 4, Web 5, Web 6, Web 7, Web 8, Web 9 |
JAN 13 | Sports Analytics II | Slides | Ch. 1-7 SPAN, Web 1, Web 2 |
JAN 15 | Sports Analytics II (Cont.) | Slides | Ch. 1-7 SPAN, Web 1, Web 2 |
JAN 17 | Sports Analytics II (Cont.) | Slides | Ch. 1-7 SPAN, Web 1, Web 2 |
JAN 20 | MLK | No Class | |
JAN 22 | Snow Day | Game Day Speeches Moved | |
JAN 24 | Sports Analytics III | Slides | Web 1, Web 2, Web 3, Web 4, Web 5, Web 6, Web 7, Web 8 |
JAN 27 | Sports Analytics III (Cont.) | Slides | Web 1, Web 2, Web 3, Web 4, Web 5, Web 6, Web 7, Web 8 |
JAN 29 | Sports Analytics IV | Slides | Web 1, Web 2, Web 3, Article 1 |
JAN 31 | Baseball I | Slides | Web 1, Web 2, Article 1 |
FEB 3 | Baseball I (Cont.) | Slides | Web 1, Web 2, Article 1 |
FEB 3 | Baseball II | Slides | Ch. 1-2 MATH, Ch. 7 SM |
FEB 5 | Baseball II (Cont.) | Slides | Ch. 1-2 MATH, Ch. 7 SM |
FEB 7 | Baseball III | Slides | Ch. 3 MATH |
FEB 10 | Well-being Day | No Class | |
FEB 12 | Discussion of Playoff | | |
FEB 14 | Baseball IV | Slides | Ch. 4-5 MATH, Web 1, Web 2 |
FEB 17 | Baseball IV (Cont.) | Slides | Ch. 4-5 MATH, Web 1, Web 2 |
FEB 17 | Baseball V | Slides | Ch. 6 MATH, Web |
FEB 19 | Snow Day | Game Day Speeches Moved | |
FEB 21 | Snow Day | Game Day Speeches Moved | |
FEB 24 | Baseball V (Cont.) | Slides | Ch. 6 MATH, Web |
FEB 26 | Baseball VI | Slides | Ch. 7-9 MATH, Web 1, Web 2, Web 3, Web 4, Article 1 |
FEB 28 | Baseball VI (Cont.) | Slides | Ch. 7-9 MATH, Web 1, Web 2, Web 3, Web 4, Article 1 |
MAR 3 | Baseball VII | Slides | Ch. 11, 16 MATH, Article 1, Article 2 |
Date | Lecture | Material | Reading |
---|---|---|---|
MAR 5 | Basketball I | Slides | Ch. 28-29 MATH, Web 1, Web 2, Web 3 |
MAR 7 | Basketball I (Cont.) | Slides | Ch. 28-29 MATH, Web 1, Web 2, Web 3 |
MAR 10 | Spring Break | No Class | |
MAR 12 | Spring Break | No Class | |
MAR 14 | Spring Break | No Class | |
MAR 17 | Championship | Example | |
MAR 19 | Basketball II | Slides | Ch. 30 MATH, Web 1, Web 2, Article 1 |
MAR 21 | Basketball II (Cont.) | Slides | Ch. 30 MATH, Web 1, Web 2, Article 1 |
MAR 21 | Basketball III | Slides | Ch. 32-33 MATH, Web 1, Web 2, Web 3 |
MAR 24 | Basketball III (Cont.) | Slides | Ch. 32-33 MATH, Web 1, Web 2, Web 3 |
MAR 26 | Basketball IV | Slides | Ch. 31,34 MATH, Web 1 |
MAR 28 | Basketball IV (Cont.) | Slides | Ch. 31,34 MATH, Web 1 |
MAR 31 | Basketball V | Slides | Ch. 35 MATH, Article 1 |
MAR 31 | Basketball VI | Slides | Ch. 38 MATH, Web 1, Web 2, Article 1, Article 2, Article 3 |
APR 2 | Basketball VI (Cont.) | Slides | Ch. 38 MATH, Web 1, Web 2, Article 1, Article 2, Article 3 |
APR 4 | Meet with Groups for Championship | | |
APR 7 | Football I | Slides | Ch. 18 MATH |
APR 9 | Football II | Slides | Ch. 19 MATH, Web 1, Web 2, Web 3, Article 1 |
APR 11 | Football II (Cont.) | Slides | Ch. 19 MATH, Web 1, Web 2, Web 3, Article 1 |
APR 14 | Football III | Slides | Ch. 20 MATH, Web 1, Article 1, Article 2, Article 3 |
APR 16 | Football IV | Slides | Ch. 21 MATH, Web 1, Article 1, Article 2 |
APR 18 | Well-being Day | No Class | |
APR 21 | Football V | Slides | Ch. 22 MATH, Web 1, Web 2, Web 3 |
APR 21 | Football VI | Slides | Ch. 25-26 MATH, Web 1, Web 2, Article 1, Article 2 |
APR 23 | Last Lecture | | |
APR 25 | Work on Championship | | |
APR 28 | Work on Championship | | |
In this class, your performance will be graded using six different assessments: attendance (5%), practice (5%), gameday speeches (20%), regular season (20%), playoffs (20%), and championship (30%). For the gameday speeches, playoffs, and the championship, you will be randomly assigned to a team. This class is a team sport.
Attendance will be taken every class using the UNC Check-in App. You will need to install the app on your mobile device and bring the device to every class. Starting at the beginning of class, you will have 15 minutes to check in using the app. Instructions for installing and using the UNC Check-in App are available at https://unccheckin.unc.edu/. You need to attend at least 70% of the lectures to get credit for attendance; otherwise, you will receive a 0 for your attendance grade. If you need to miss class for a reason permitted by the university and you don’t want to be penalized, you will need to get a university approved absence at https://uaao.unc.edu/submit-a-request/. If you cannot get a university approved absence and don’t want to be penalized, you must notify your instructor of the reason and provide documentation by email. The reason should line up with UNC’s definition of a university approved absence. For example, a job interview would not be approved by the university or by me.
Your best ability is availability.
~ George Kittle et al.
There will be at least one assignment designed to test proficiency in data science. This class involves group projects that require strong data science skills. These assignments will require you to perform data science tasks with data and must be done individually without the help of any other students in the course. The work required for these assignments will not be taught in the course but will come from material and topics taught in STOR 320 and/or STOR 455 and/or other provided material. These assignments will be submitted to either Canvas or Gradescope by the due date. You will receive a 25% penalty for less than one day late, 50% penalty for less than two days late, and 75% penalty for less than three days late. After three days, practice assignments will not be accepted. Expect these assignments early in the semester.
Practice makes perfect.
~ Bruce Lee et al.
Gameday speeches are to be done in teams. Biweekly, your team will find an article from a refereed journal to read and summarize in an 8-slide gameday speech. You need to pick an article that none of the group members have read (for example, for a previous gameday speech) and that hasn’t already been presented in class.
Slide 1: Title of the article, the names of the author(s), source of the article, and the name(s) of the presenters. If you have a team member who did not participate in reading the article or making the slides, DO NOT put their name on this slide. They will receive an automatic 0.
Slides 2-3: Summarize the overarching themes of the article(s). Discuss the research goal(s) of the paper in at least 6 bullet points.
Slides 4-6: Talk about the methodology. What did the author(s) do to answer their research goal? Discuss the methodology in at least 8 bullet points.
Slide 7: What did you like? What did you find confusing? Provide three positive critiques of the paper. Focus on the methodology used in the article.
Slide 8: What did you not like or think could have been improved? Give three negative critiques of the paper. Focus on the methodology used in the article.
On gameday, I will use a random number generator to pick 3 or 4 groups to present for 5 to 8 minutes each. All groups will be graded based on the criteria, but only the selected groups will present.
The presentation should be submitted on Canvas before class starts on the due date. Each group should have its own presentation, but it needs to be submitted as a PDF by every member of the group. Also, each group member can assess the contribution value of the other members of the team on a scale from 0 (Bad) to 3 (Excellent). Fill out the appropriate Google Form in the section below called Team Sport to turn in your ratings of your group members. Your value will be determined by the average score of the other members in your group.
Gameday speeches are worth a total of 24 points with minor exceptions:
The 3 or 4 groups that present will have the chance to get 0 to 3 bonus points based on presentation quality.
If your group presents and you are not present in class, you will receive a 3 point penalty and not get the bonus points awarded to your group for the presentation.
Consider the following rubric and notice the bonus points for the lucky presenting groups:
Criteria | 0 | 1 | 2 | 3 |
---|---|---|---|---|
Slide 1 | Missing All Components | Missing 2 Components | Missing 1 Component | Followed Directions |
Slides 2-3 | 1-2 Bullet Points | 2-3 Bullet Points | 4-5 Bullet Points | At least 6 Bullet Points |
Slides 4-6 | 1-3 Bullet Points | 4-6 Bullet Points | 5-7 Bullet Points | At least 8 Bullet Points |
Slide 7 | Missing All Three | Missing Two | Missing One | 3 Positive Critiques |
Slide 8 | Missing All Three | Missing Two | Missing One | 3 Negative Critiques |
Spelling/Grammar | >5 Errors | 3-5 Errors | 1-2 Errors | No Errors |
Value | Bad | Okay | Good | Excellent |
Submitted | Late | Not a PDF | On Time | |
Bonus (Presenting Groups) | Not Prepared | Prepared (Reading Slides) | Semi-Prepared (Mediocre Creativity) | Well-Prepared and Creative |
And just when you think they are about to break apart, Ducks fly together.
~ Gordon Bombay
The regular season consists of biweekly quizzes on the material presented in class over the previous lectures. Coach Mario will tell you which specific material to study. This can include what was taught in lecture, what was in the reading, and what was presented during gameday speeches. You will be given at least 15 minutes to complete the quiz. These quizzes are taken in Canvas on your laptop, live in class. You can use the slides, your notes, and the internet. The only thing I prohibit is the use of generative AI. You will also need a calculator, R, Python, etc. to perform some basic calculations to answer some questions.
There’s two times of year for me: Football season, and waiting for football season.
~ Darius Rucker
The playoff will be a predictive modeling project for actual NBA games. The primary goal of this project is to design models for prediction of three variables – \(Spread\), \(Total\), and \(OREB\). Below you can find clear definitions of these three outcome variables:
\(Spread = \textrm{Home Points} - \textrm{Away Points}\)
\(Total = \textrm{Home Points} + \textrm{Away Points}\)
\(OREB = \textrm{Home OREB} + \textrm{Away OREB}\)
It is imperative that you follow these specifications. Your group will be making predictions of the three variables for all NBA games between Mar 7 and Mar 21, inclusive. Your predictions should be saved in the dataset called Predictions. Here you will find missing values where future predictions will be placed. This completed file should be submitted along with a paper summarizing your methodology. You will be graded not only on your methodology, but also on your predictive accuracy. The variables \(Spread\), \(Total\), and \(OREB\) will all be evaluated by mean absolute error (MAE). For each of the variables, the top 5 groups will get 3 points, the middle 5 groups will get 2 points, and the bottom 4 groups will get 1 point. All three variables are numeric. If you fail to submit predictions on time or the predictions are not numeric values, you will get 0 points.
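As a rough illustration of the definitions and the MAE criterion above, here is a minimal sketch. It is written in Python only for illustration (the course itself uses R, and the same steps translate directly). The function names `outcomes` and `mae`, the box scores, and the predicted values are all made up for this example and are not part of the starting data.

```python
from statistics import mean

def outcomes(home_pts, away_pts, home_oreb, away_oreb):
    """Return the three playoff targets for one game, following the definitions above."""
    return {
        "Spread": home_pts - away_pts,
        "Total": home_pts + away_pts,
        "OREB": home_oreb + away_oreb,
    }

def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return mean(abs(a - p) for a, p in zip(actual, predicted))

# Made-up box scores: (home_pts, away_pts, home_oreb, away_oreb)
games = [(112, 105, 11, 9), (98, 120, 8, 14), (101, 104, 10, 10)]

actual_spread = [outcomes(*g)["Spread"] for g in games]
predicted_spread = [5.0, -10.0, 1.5]  # hypothetical model output

spread_mae = mae(actual_spread, predicted_spread)
print(actual_spread, round(spread_mae, 3))
```

Note that \(Spread\) is signed (home minus away), so a prediction with the wrong sign is penalized by the full distance between the two values.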
I am providing you with starting data courtesy of Vitalii Korolyk, who posted several useful NBA datasets under the username NocturneBear on GitHub. You can download the data from Vitalii’s GitHub or you can download it from our website using this link.
For the engineering of new variables, consider creating differences and ratios between the stats for the home and away teams. Also, it may be useful to create variables that represent past information such as moving averages or lagged variables. You should be able to explain and defend the variables you create.
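The ideas in the paragraph above (differences, ratios, moving averages, lagged variables) can be sketched as follows. This is a Python illustration under stated assumptions: the helper `lagged_moving_average`, the window of 3 games, and the point totals are hypothetical and chosen only to show the mechanics.

```python
def lagged_moving_average(values, window):
    """For game i, average the previous `window` values (games i-window .. i-1).

    Returns None until enough past games exist. The game's own value is never
    included, so the feature only uses information known before tip-off.
    """
    out = []
    for i in range(len(values)):
        if i < window:
            out.append(None)
        else:
            past = values[i - window:i]
            out.append(sum(past) / window)
    return out

# Made-up points scored in each team's previous games, oldest first
home_pts_history = [110, 102, 118, 95, 121]
away_pts_history = [99, 104, 108, 111, 100]

home_ma3 = lagged_moving_average(home_pts_history, 3)
away_ma3 = lagged_moving_average(away_pts_history, 3)

# Difference and ratio features for the most recent game, built from lagged averages
diff_ma3 = home_ma3[-1] - away_ma3[-1]
ratio_ma3 = home_ma3[-1] / away_ma3[-1]
print(home_ma3, round(diff_ma3, 3), round(ratio_ma3, 3))
```

Lagging matters here because the predictions are for future games: a moving average that includes the current game's points would not be available at prediction time.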
For the use of outside data, research other variables that could be important for prediction of the three variables. You must find data from other online sources that are not part of the starting data. This could be injury data, advanced metrics on players, play-by-play data, or more recent data that you web scraped. I am grading you on your creativity, so using recent data to meet the outside data requirement would help your predictions but would definitely be less creative than other options. Below is a list of potential options:
Your study should be summarized in a paper of at least 5 pages. The paper and predictions should be submitted on Canvas before 7:00PM on the due date (3/7/2025). Each group should have their own paper and predictions, but both need to be submitted by every member of the group. Also, each group member needs to assess the contribution value of the other members of the team on a scale from 0 (Bad) to 6 (Excellent). Use the Google form provided on the course website.
This project is extremely demanding. I believe that you should split the responsibilities into clearly defined jobs and hold to this recommended timeline. If your group fails to submit the predictions on time, your entire group will get 0’s for all of the predictions. If your group fails to complete the paper on time, then you will get a penalty dependent on how late your group turns it in. If I discover that the methodology in your paper doesn’t lead to the predictions that you submit, or that your predictions were generated using any form of AI where the work is not your own and the methodology cannot be explained, I will report this as an academic violation and require you to show me your code and prove to me that your predictions are valid and come from your methodology. At a minimum, I will do whatever I can to ensure your group gets a 0 on this assignment.
Start | End | Task |
---|---|---|
NOW | 2/20 | Build and Clean Dataset |
2/20 | 2/21 | Start Writing Section 1 of Paper |
2/21 | 2/26 | Each Group Member Builds and Evaluates One Model for Each Variable |
2/26 | 2/28 | Start Writing Sections 2-4 of Paper |
2/28 | 3/7 | Implement Models to Make Predictions and Finish Paper |
3/7 | 3/7 | Edit and Submit Paper + Predictions |
On the first page, you should title your paper and give the names of the team members who contributed. If someone didn’t do any work, don’t put their name on it so I can give them a 0. The content of the paper should be organized in the following 4 subsections:
In this section, you should outline in chronological order how your group built the dataset that your group used to fit, evaluate, and implement the predictive models you will discuss in future sections. Every step to build and clean the dataset should be written so that someone could read it, follow the steps, and get to the same data your group used.
Examples of some questions that need to be answered if applicable:
In this section, you should discuss any variables your group engineered and defend your reasons for engineering those variables. You should be able to mathematically represent your metrics as formulas and/or provide written descriptions. You should be able to explain why you think the variable you created would help in predicting any of the three outcome variables. Feel free to make citations to whoever you want to credit for leading you to your idea. The variables you engineer should be creative and well-defended.
Also, you should discuss all outside data you utilized to hopefully improve prediction. You are required to utilize data that is not currently contained in any of the starting datasets (Kendall or Nathan Lauga). You need to explain where you got the outside data and why you are including it. I want to know why your group thought that the outside data you are utilizing would be helpful for predicting any of the three outcome variables. The outside data utilized should be creative and well-defended.
You should clearly describe your group’s best predictive model for \(Spread\) and the steps you took to get there. Discuss what variables were useful and useless for predicting \(Spread\). Since \(Spread\) is a numeric variable, I highly recommend a basic linear regression as a baseline with stepwise algorithms or regularization for variable selection. To ensure you are seeking the best model for prediction, I highly advise considering many different types of models (neural nets, regression trees, time series, etc.), utilizing cross-validation/out-of-sample testing, and adding interaction/polynomial terms. In this part, you should chronologically write about everything your group did to find the “best” model. Challenge yourselves to a thorough investigation from multiple angles and organize your process professionally for an audience with basic understanding in statistics and the sport. You are not required to present tables or figures, but these can be used to defend why the model you are calling the “best” is actually best. Talk about all of the models your group considered, but put some extra attention on describing your “best” model since this is the model that your group used to generate predictions. For example, if your “best” model is a linear regression model, you can show the coefficients in a table or write it out mathematically. In more advanced machine learning methods, I should know every value of every hyperparameter/tuning parameter and how your group chose those values. Finally, make sure that you explain how your group used the “best” model to generate predictions for future games where everything is unknown. Just giving a description of your best model is not enough if you don’t explain exactly how that model was used to make real predictions.
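To make the recommended baseline concrete, here is a minimal sketch of a one-predictor linear regression evaluated by k-fold cross-validation with MAE. It is written in Python for illustration only; the predictor (a hypothetical pre-game rating difference), the data, and the function names are assumptions, and a real analysis would use many predictors and a modeling package.

```python
from statistics import mean

def fit_simple_ols(x, y):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

def kfold_mae(x, y, k):
    """Average out-of-sample MAE over k folds (fold j holds out rows j, j+k, ...)."""
    fold_errors = []
    for j in range(k):
        train = [i for i in range(len(x)) if i % k != j]
        test = [i for i in range(len(x)) if i % k == j]
        b0, b1 = fit_simple_ols([x[i] for i in train], [y[i] for i in train])
        fold_errors.append(mean(abs(y[i] - (b0 + b1 * x[i])) for i in test))
    return mean(fold_errors)

# Made-up data: pre-game rating difference (home minus away) and observed Spread
rating_diff = [-6.0, -3.5, -1.0, 0.5, 2.0, 4.5, 7.0, 9.0]
spread = [-8, -2, 1, 3, 2, 9, 12, 15]

b0, b1 = fit_simple_ols(rating_diff, spread)
cv_mae = kfold_mae(rating_diff, spread, k=4)
print(round(b0, 3), round(b1, 3), round(cv_mae, 3))
```

The interleaved folds are a simplification. Because games are ordered in time, a split that trains on earlier games and tests on later ones may mirror the actual prediction task more closely.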
You should clearly describe your group’s best predictive model for \(Total\) and the steps you took to get there. Discuss what variables were useful and useless for predicting \(Total\). Since \(Total\) is a numeric variable but could be highly skewed, I highly recommend nonlinear transformations. To ensure you are seeking the best model for prediction, I highly advise considering many different types of models (neural nets, regression trees, time series, etc.), utilizing cross-validation/out-of-sample testing, and adding interaction/polynomial terms. In this part, you should chronologically write about everything your group did to find the “best” model. Challenge yourselves to a thorough investigation from multiple angles and organize your process professionally for an audience with basic understanding in statistics and the sport. You are not required to present tables or figures, but these can be used to defend why the model you are calling the “best” is actually best. Talk about all of the models your group considered, but put some extra attention on describing your “best” model since this is the model that your group used to generate predictions. For example, if your “best” model is a linear regression model, you can show the coefficients in a table or write it out mathematically. In more advanced machine learning methods, I should know every value of every hyperparameter/tuning parameter and how your group chose those values. Finally, make sure that you explain how your group used the “best” model to generate predictions for future games where everything is unknown. Just giving a description of your best model is not enough if you don’t explain exactly how that model was used to make real predictions.
You should clearly describe your group’s best predictive model for \(OREB\) and the steps you took to get there. Discuss what variables were useful and useless for predicting \(OREB\). Since \(OREB\) is a discrete numeric variable, I highly recommend nonlinear transformations or considering a generalized linear model like Poisson regression. To ensure you are seeking the best model for prediction, I highly advise considering many different types of models (neural nets, regression trees, time series, etc.), utilizing cross-validation/out-of-sample testing, and adding interaction/polynomial terms. In this part, you should chronologically write about everything your group did to find the “best” model. Challenge yourselves to a thorough investigation from multiple angles and organize your process professionally for an audience with basic understanding in statistics and the sport. You are not required to present tables or figures, but these can be used to defend why the model you are calling the “best” is actually best. Talk about all of the models your group considered, but put some extra attention on describing your “best” model since this is the model that your group used to generate predictions. For example, if your “best” model is a linear regression model, you can show the coefficients in a table or write it out mathematically. In more advanced machine learning methods, I should know every value of every hyperparameter/tuning parameter and how your group chose those values. Finally, make sure that you explain how your group used the “best” model to generate predictions for future games where everything is unknown. Just giving a description of your best model is not enough if you don’t explain exactly how that model was used to make real predictions.
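For readers unfamiliar with Poisson regression, the sketch below fits the model \(\log(\mu) = b_0 + b_1 x\) by Newton-Raphson on a single predictor. It is a Python illustration under assumptions: the predictor (a hypothetical standardized count of missed shots), the counts, and the function name `fit_poisson` are invented for this example, and in practice you would rely on a tested GLM routine rather than hand-written code.

```python
import math

def fit_poisson(x, y, iterations=25):
    """Newton-Raphson fit of log(mu) = b0 + b1 * x for count outcomes y."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0  # start at the intercept-only fit
    for _ in range(iterations):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score (gradient of the log-likelihood)
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # Information matrix entries (negative Hessian)
        h00 = sum(mu)
        h01 = sum(mi * xi for xi, mi in zip(x, mu))
        h11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system explicitly
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Made-up data: standardized missed shots and combined offensive rebounds per game
missed_std = [-1.0, -0.5, 0.0, 0.5, 1.0, 1.5]
oreb = [17, 19, 22, 21, 25, 27]

p0, p1 = fit_poisson(missed_std, oreb)
fitted = [math.exp(p0 + p1 * xi) for xi in missed_std]
print(round(p0, 3), round(p1, 3))
```

Predictions from this model are always positive, which suits a count such as \(OREB\); they are generally not integers, which is acceptable because the predictions only need to be numeric.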
The second playoff round is worth a total of 48 points based on the following rubric:
Criteria | 0 | 1 | 2 | 3 |
---|---|---|---|---|
Title Page | Instructions Not Followed | Missing Entire Element | Missing a Team Member | Title+Team Members |
Data: Cleaning Summary | Unclear | Slightly Unclear | Slightly Clear | Clear |
Data: New Variable | Poor | Not Innovative But Defended | Innovative But Not Well Defended | Innovative and Defended |
Data: Outside Data | Poor | Not Innovative But Defended | Innovative But Not Well Defended | Innovative and Defended |
Spread: Methodology | Poor | Lazy But Well Explained | Clear, Thorough, but Lacking Innovation | Clear, Innovative, and Thorough |
Spread: Best Model | Poor Description | Adequate Description | Good Description | Excellent Description |
Spread: Prediction | None | Bottom 5 | Middle 6 | Top 6 |
Total: Methodology | Poor | Lazy But Well Explained | Clear, Thorough, but Lacking Innovation | Clear, Innovative, and Thorough |
Total: Best Model | Poor Description | Adequate Description | Good Description | Excellent Description |
Total: Prediction | None | Bottom 5 | Middle 6 | Top 6 |
OREB: Methodology | Poor | Lazy But Well Explained | Clear, Thorough, but Lacking Innovation | Clear, Innovative, and Thorough |
OREB: Best Model | Poor Description | Adequate Description | Good Description | Excellent Description |
OREB: Prediction | None | Bottom 5 | Middle 6 | Top 6 |
Spelling | >5 Errors | 3-5 Errors | 1-2 Errors | No Errors |
Submitted | Late | Not a PDF | On Time | |
Criteria | 0-1 | 2-3 | 4-5 | 6 |
---|---|---|---|---|
Value | Bad | Okay | Good | Excellent |
People judge you by the way you play in the playoffs.
~ Jaromir Jagr
I’m not looking for home runs, I’m looking for the playoffs.
~ Sammy Sosa
Imagine you are an analyst working for the coach or athlete and want to discover insights that would bring a competitive edge. In a world saturated with data, the way to ensure your analysis is unique and creative is to get the data yourself. To successfully conduct your study, you should follow the steps below:
To ensure that every group stays on a reasonable pace, I will require each member of your group to submit the exact portion of the dataset that they collected. I am expecting 50 or more observations per person. This needs to be submitted to Canvas before 11:59PM on April 11. For each member of the group, I am expecting the same number of variables and exact same variable names to show that your group worked together and coordinated efforts. Also, each group member can assess the contribution value of the other members of the team on a scale from 0 (Bad) to 3 (Excellent). Fill out the appropriate Google Form in the section below called Team Sport to turn in your ratings of your group members. Your value will be determined by the average score of the other members in your group.
Your study should be summarized in a paper with at least 5 pages (a minimum of 2 pages worth of writing). The paper should be submitted on Canvas before 11:59PM on the last day of class, April 28. Submit the paper as a PDF only. Each group should have their own paper, but the paper needs to be submitted by every member of the group.
On the first page, you should title your paper, write your group number, and give the names of the team members who contributed. The content of the paper should be organized in the following 3 subsections:
In this section, you should discuss three things.
First, you should briefly discuss the sport/game your team selected and the overall purpose of analyzing data for this sport/game. For example, what did you hypothesize or hope to find from studying the sport/game? Provide citations to any articles or websites that provided you inspiration for your ideas. You should look into what research has been done in this sport that is connected to what you plan on doing with the data you collect.
Second, you should give a preview of your data showing at least 5 rows of your dataset. Also, show a minimum of 5 columns. I would recommend trying to show as much of your data as possible. Give your variables clear names so anyone could understand what you measured, and make the table aesthetically pleasing (rounding, colors, etc.). It is okay to abbreviate variable names (e.g., R/G) if you identify the abbreviations (e.g., Runs (R) and Games (G)) somewhere in the document. At least 5 variables need to be collected through observation. You can collect variables like height and birthplace of athletes, but these are not going to be variables you observe through watching the sport.
Third, you should briefly describe your data. When did you gather your data? Where did you gather your data? How did you gather your data (Watch or Play Game)? What does each observation (i.e. row in data) represent and how many observations do you have? Describe the five variables and how they were measured. I want a clear description of how the data was collected by the group.
In this section, I want a table(s) and two figures summarizing the data. Each table and each figure you give should have a couple sentences describing the information summarized.
The table(s) should be the same style as the table in the introduction (not different colors, fonts, rounding, etc.). The table(s) should efficiently summarize the five variables using popular statistics. If the variable is numeric (continuous or discrete), you should give the minimum, maximum, mean, and standard deviation, at least. If the variable is categorical, you should give the possible values with frequencies (counts) and relative frequencies (%). If you have a combination of categorical and numeric, I advise doing separate tables which will count as a single table. All tables should be formatted similarly.
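The minimum statistics requested above can be computed as in the following sketch, shown in Python for illustration. The variable names `shot_distance` and `shot_result`, the data, and the helper functions are hypothetical; `stdev` here is the sample standard deviation.

```python
from collections import Counter
from statistics import mean, stdev

def summarize_numeric(values):
    """Minimum, maximum, mean, and sample standard deviation of a numeric variable."""
    return {"min": min(values), "max": max(values),
            "mean": mean(values), "sd": stdev(values)}

def summarize_categorical(values):
    """Map each level to (count, relative frequency in percent)."""
    counts = Counter(values)
    n = len(values)
    return {level: (count, 100 * count / n) for level, count in counts.items()}

# Made-up observations collected by watching a game
shot_distance = [2, 15, 24, 8, 22, 3]                       # feet
shot_result = ["Make", "Miss", "Miss", "Make", "Make", "Miss"]

numeric_summary = summarize_numeric(shot_distance)
categorical_summary = summarize_categorical(shot_result)
print(numeric_summary)
print(categorical_summary)
```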
Then, you should have two figures summarizing relationships between the variables you selected. In each figure, at least 2 variables should be summarized. Appropriate axis names and scales should be used. Fonts should be large enough to read. These figures should be chosen so that they can be cited later in your paper. I recommend labeling each image and table that you create (e.g., Fig 1) so that you can easily reference them later.
If you create any visuals with new metrics that your group designed, make sure you clearly define the metric using appropriate mathematical syntax prior to showing the visual with the new metric.
Following all instructions gets you two points. The last point is reserved for creativity, design, and over-achievement. This will be determined by comparing what you do to what the other teams do. I reward you for taking risks that lead to better results than I require. Examples for the table(s) include calculating confidence intervals for numeric variables, creating contingency tables, or summarizing numeric variables for different subgroups from categorical variables. Examples for the figures include tile, 3D, or map plots showing relationships across multiple variables. Also, I don’t mind if you give more than two figures or put multiple figures together in a grid. If you create a ton of figures, I will grade your two worst figures, so make sure each figure is valuable and worth discussing.
The content in the previous two sections should be used to support two insights. An insight is a deeper understanding you gained about the sport from gathering and summarizing data. The insights should be connected to the purpose you outlined in the “Introduction to the Data” section and defended from the tables/figures in the “Summary of the Data” section. I recommend giving your figures and tables numbers so you can reference them in this section (e.g., Table 1, Figure 2). Write at least one paragraph for each insight. What were the two most interesting things you learned from the data? If you don’t reference statistics or figures you created, you will lose at least two points. To get full credit, I advise supplementing your insights with p-values from appropriate hypothesis tests (t-test, ANOVA, regression, difference in proportions, independence tests, etc.) or providing confidence intervals (means, proportions, etc.). Excellent insights should not be obvious but should lead to innovations that would give a decision maker or athlete the competitive edge. Tests for statistical significance are the strongest way to defend your arguments. This section should be written so someone with a technical/statistical background would understand your methodology and someone without a technical/statistical background would understand your conclusions.
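As one example of the kind of statistical support mentioned above, the sketch below runs a two-sided difference-in-proportions z-test and a normal-approximation confidence interval for a single proportion. It is a Python illustration; the free-throw counts are made up, and the function names are hypothetical. Both procedures rely on large-sample approximations, so they may be unreliable with very small counts.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Two-sided z-test of H0: p_a == p_b, using the pooled standard error."""
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

def proportion_ci(successes, n, level=0.95):
    """Wald confidence interval for one proportion (normal approximation)."""
    p = successes / n
    z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)
    half_width = z_star * sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width

# Made-up counts: free throws made out of attempts, at home vs. away
z, p_value = two_proportion_z_test(42, 60, 30, 60)
ci_low, ci_high = proportion_ci(42, 60)
print(round(z, 3), round(p_value, 4), round(ci_low, 3), round(ci_high, 3))
```

When citing a result like this in the paper, report the test used, the sample sizes, and the p-value or interval, then state the conclusion in plain language for a non-technical reader.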
Also, in this section, after discussing your insights, you should write, at least, one paragraph where you critique what your group would have done differently. Would you have gathered another variable which could potentially be a confounding variable? Would you have measured a particular variable differently? What other information that you did not gather would have made this study better or have been of use? No study is done perfect and there are always aspects that could be improved. After I read your entire paper, I will take off a point for every single thing that I think you should have done differently that you did not highlight in this section. If there is a criticism that I incorrectly identify because you are extremely vague in the description of your study or in the summary of your insights, then you are at fault. Be very clear in describing all aspects of the data and method of collection in the first 2 sections.
Also, each group member can assess the contribution value of the other members of the team twice, on a scale from 0 (Bad) to 3 (Excellent). You will fill it out once on April 11 and again on the last day of class. Fill out the appropriate Google Form in the section below called Team Sport to turn in your ratings of your group members. Your value will be determined by the average score given to you by the other members of your group.
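The averaging described above can be sketched as follows. This is only an illustration in Python (the course itself uses R): the function name `contribution_values`, the data layout, and the student names are hypothetical, and the actual calculation used for grading may differ in its details.

```python
from statistics import mean


def contribution_values(ratings):
    """Average the 0-3 scores each member received from teammates.

    ratings[rater][teammate] = score given by `rater` to `teammate`.
    Self-ratings, if present, are ignored.
    """
    received = {}
    for rater, given in ratings.items():
        for teammate, score in given.items():
            if teammate == rater:
                continue  # members do not assess themselves
            if not 0 <= score <= 3:
                raise ValueError(f"score {score} is outside the 0-3 scale")
            received.setdefault(teammate, []).append(score)
    return {member: mean(scores) for member, scores in received.items()}


# Hypothetical three-person group.
ratings = {
    "Ana": {"Ben": 3, "Cal": 2},
    "Ben": {"Ana": 3, "Cal": 1},
    "Cal": {"Ana": 2, "Ben": 3},
}
values = contribution_values(ratings)
print(values)
```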
Finally, each group will present their work using a slideshow presentation during our scheduled final exam period, which is April 30 at 12PM in Gardner 105. Presentation slides should be submitted on Canvas prior to 12PM on April 30 by each group member. On April 30, each of you will have 5 to 7 minutes to present your work to the class. Your presentation should be persuasive, as if you are talking to a group of coaches or players. I recommend starting with the purpose of your project and talking briefly about data collection. All group members involved should be utilized as evenly as possible in the presentation. It should be clear that your group has practiced your presentation several times beforehand. I am expecting a high level of eye contact, smooth transitions, slides with few words, no verbatim reading, etc. I want your slides to look absolutely stunning, as if you were presenting this in a job interview.
If there is a group member who is clearly incompetent, lazy, etc., your group should notify me as soon as possible so I can help your group adapt to the situation. For example, I wouldn’t want a useless member of your group being able to participate in a presentation that they didn’t prepare or practice with the rest of the team. Your grade will be negatively impacted if someone in your group is allowed to present and they perform poorly. Also, I wouldn’t want a group member submitting a paper where their only contribution is making the title page.
The first playoff round is worth a total of 69 points based on the following rubric:
Criteria | 0 | 1 | 2 | 3 |
---|---|---|---|---|
Data | Inadequate Consistency | Some Inconsistencies | Consistency Across Group | |
Data | \(n<30\) and/or \(<4\) Observed Vars | \(30\le n<40\) and/or \(4\) Observed Vars | \(40\le n<50\) & \(\ge 5\) Observed Vars | \(n\ge 50\) & \(\ge 5\) Observed Vars |
Data | Not Submitted or Late | non-CSV | Submitted | |
Data Value | Bad | Okay | Good | Excellent |
Title Page | Instructions Not Followed | Major Mistake | Minor Mistake | Followed Instructions |
Introduction: Purpose | Not Innovative or Clearly Described | Not Innovative and/or Not Clearly Described | Somewhat Innovative and Clearly Described | Innovative and Clearly Described |
Introduction: Preview | Not Attempted | Missing Rows or Columns | 5 Rows and 5 Columns | Outstanding |
Introduction: Description | Not Attempted | Missing Key Information | Answers Questions But Poorly Written | Addresses All Questions Well |
Summary: Table | Missing More Than 1 Statistic | Missing 1 Statistic | Followed Instructions | Excellent |
Summary: Figure 1 | Not Attempted | Not 2 Variables or Messy | 2 Variables and Clear | Excellent |
Summary: Figure 2 | Not Attempted | Not 2 Variables or Messy | 2 Variables and Clear | Excellent |
Summary: Table/Figure Descriptions | Not Attempted | Missing Important Descriptions | Some Descriptions Unclear | Everything Clearly Described |
Insights: First Insight | Not Attempted | Mediocre/Weak | No Reference to Summary Section | Excellent |
Insights: Second Insight | Not Attempted | Mediocre/Weak | No Reference to Summary Section | Excellent |
Insights: Done Differently | Missing 3 Obvious Things | Missing 2 Obvious Things | Missing 1 Obvious Thing | Got Everything |
Instructions Followed | Organized Incorrectly | Organized Correctly without Section Headings | Organized Correctly with Section Headings | |
Spelling/Grammar | >5 Errors | 3-5 Errors | 1-2 Errors | No Errors |
Paper | Not Submitted or Late | Not a PDF | On Time | |
Paper Value | Bad | Okay | Good | Excellent |
Presentation Motivation | No Good Explanation | Weak Defense of Work | Okay Explanation and Defense | Exciting and Persuasive |
Presentation Slides | Bad | Okay | Good | Excellent |
Presentation Professionalism | Bad | Some Reading of Slides | Clearly Prepared and Well-Distributed | |
Presentation Time | Poorly Planned | Barely Missed Time | Between 5 and 7 Minutes | |
Presentation | Not Submitted or Late | Not a PDF | On Time | |
Important: After the final grades are submitted, I plan on modifying either this website or my personal website to make all of your datasets public. I don’t plan on publicizing your papers, although I recommend you do this. All of your names will be attached as authors of the dataset that your group has built, so anyone who uses it will need to cite you. Furthermore, you can make your data public on the ScoreNetwork by following the steps here https://data.scorenetwork.org/submit-data.html. This network would increase the exposure of your dataset across the world. If you would like to opt out, let me know after the semester ends and the grades are finalized.
Talent wins games, but teamwork and intelligence wins championships.
~ Jordan
No matter how good one individual is, it takes a whole team to win a championship.
~ King James
Any assignments requiring a deliverable will be submitted via Canvas as a PDF.
Date (Time) | Practice (PR) | Gameday Speech (GS) | Reg. Season (RS) | Playoff (P) | Champ (C) |
---|---|---|---|---|---|
JAN 24 (9AM) | | GS1 | | | |
JAN 24 (11:59PM) | PR1(.zip) | | | | |
JAN 29 (9AM) | | | RS1 | | |
FEB 5 (9AM) | | GS2 | | | |
FEB 7 (11:59PM) | PR2(.zip) | | | | |
FEB 12 (9AM) | | | RS2 | | |
FEB 26 (9AM) | | GS3 | | | |
MAR 5 (9AM) | | | RS3 | | |
MAR 7 (7PM) | | | | P | |
MAR 19 (9AM) | | GS4 | | | |
MAR 26 (9AM) | | | RS4 | | |
APR 2 (9AM) | | GS5 | | | |
APR 9 (9AM) | | | RS5 | | |
APR 11 (11:59PM) | | | | | C (Data) |
APR 16 (9AM) | | GS6 | | | |
APR 23 (9AM) | | | RS6 | | |
APR 28 (11:59PM) | | | | | C (Paper) |
APR 30 (12PM) | | | | | C (Presentation) |
Many of the assessments in this course will be done in teams of 3 to 6 playas randomly chosen. For each team-based assignment, you will be given a different team. This will force you to interact with the majority of the class throughout the semester. After each team-based assignment, you will grade the contribution of your teammates on a scale from 0 to 3, and this will contribute to your overall grade for the given assignment. A decent portion of your final grade will be influenced by this. The following link contains all teams alphabetically: All Teams
In the table below, you can find your group for each of the specific team-based assignments. Make sure you fill out the value survey before the time the assignment is due on the due date. Do not assess the value of yourself in the survey. You don’t need to fill out the form if you are giving everyone in your group a perfect score.
Due Date | Assessment | Value Survey |
---|---|---|
JAN 22 | GS1 | Value for GS1 |
FEB 5 | GS2 | Value for GS2 |
FEB 26 | GS3 | Value for GS3 |
MAR 7 | P | Value for Playoff |
MAR 19 | GS4 | Value for GS4 |
APR 2 | GS5 | Value for GS5 |
APR 11 | C (Data) | Value for Championship 1 |
APR 16 | GS6 | Value for GS6 |
APR 30 | C (Paper) | Value for Championship 2 |
The predictions of all 14 teams can be accessed from the hyperlinks below. These three files contain predictions, actual values, calculations of MAE, and ranking. If you find any mistakes that cause the grades to change, I will adjust them.
The table below shows each group’s MAE for \(Spread\), \(Total\), and \(OREB\), respectively. Also, you will see your group ranking for each of the three variables. It may be helpful to sort the table for each variable to see where your group is currently ranked.
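For reference, the mean absolute error (MAE) and the ranking it produces can be sketched as below. This is a minimal illustration in Python (the course itself uses R), not the calculation in the linked files: the group names, predictions, and actual values are hypothetical, and the tie-handling rule (tied groups share the better rank) is an assumption.

```python
def mean_absolute_error(predicted, actual):
    """Average of |prediction - actual| over all games."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


def rank_groups(mae_by_group):
    """Rank 1 is the smallest MAE; tied groups share the better rank (assumed)."""
    ordered = sorted(mae_by_group.values())
    return {group: ordered.index(mae) + 1 for group, mae in mae_by_group.items()}


# Hypothetical actual spreads for four games and three groups' predictions.
actual_spread = [5, -3, 12, 7]
predictions = {
    "Group A": [4, -1, 10, 9],
    "Group B": [8, -6, 15, 2],
    "Group C": [5, -3, 9, 7],
}

mae = {g: mean_absolute_error(p, actual_spread) for g, p in predictions.items()}
ranks = rank_groups(mae)
for group in sorted(mae, key=mae.get):
    print(group, mae[group], ranks[group])
```

The same two functions would be applied separately to \(Spread\), \(Total\), and \(OREB\) to produce one MAE and one rank per variable.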
This page was last updated on 2025-04-28 17:06:12.284476 Eastern Time by Super Mario.