Predicting Injuries in MLB Pitchers

I’ve made it halfway through bootcamp and completed my third and favorite project so far! For the past few weeks we’ve been learning about SQL databases, classification models such as Logistic Regression and Support Vector Machines, and visualization tools such as Tableau, Bokeh, and Flask. I put these new skills to use over the past 2 weeks in my project to classify injured pitchers. This post will describe my process and analysis for this project. All of my code and project presentation slides can be found on my Github, and my Flask app for this project can be found at mlb.kari.codes.

Challenge:

For this project, my challenge was to predict MLB pitcher injuries using binary classification. To do this, I gathered data from several sites, including Baseball-Reference.com and MLB.com for pitching stats by season, Spotrac.com for Disabled List data per season, and Kaggle for 2015–2018 pitch-by-pitch data. My goal was to use aggregated data from previous seasons to predict whether a pitcher would be injured in the following season. The requirements for this project were to store our data in a PostgreSQL database, to use classification models, and to visualize our data in a Flask app or create graphs in Tableau, Bokeh, or Plotly.
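To make the prediction target concrete, here is a minimal sketch of how per-season rows could be paired with an "injured in the following season" label. It is not the project's actual code: the field names (`pitcher`, `season`, `innings`), the `label_next_season` helper, and the sample rows are all hypothetical. Rows from the last observed season are dropped because their outcome is not yet known.

```python
def label_next_season(season_rows, injured_pairs, last_observed_season):
    """Attach a binary target: was the pitcher injured in the FOLLOWING season?

    season_rows: list of dicts with hypothetical keys "pitcher" and "season".
    injured_pairs: set of (pitcher, season) tuples taken from injury-list data.
    last_observed_season: final season with injury data; rows whose target
        season lies beyond it are skipped, since their label is unknown.
    """
    labeled = []
    for row in season_rows:
        target_season = row["season"] + 1
        if target_season > last_observed_season:
            continue  # no outcome available yet for this row
        is_injured = (row["pitcher"], target_season) in injured_pairs
        labeled.append({**row, "injured_next_season": int(is_injured)})
    return labeled


# Made-up example rows, for illustration only.
season_rows = [
    {"pitcher": "A", "season": 2015, "innings": 180.0},
    {"pitcher": "A", "season": 2016, "innings": 95.0},
    {"pitcher": "B", "season": 2015, "innings": 60.0},
]
injured_pairs = {("A", 2016)}

labeled = label_next_season(season_rows, injured_pairs, last_observed_season=2016)
for row in labeled:
    print(row)
```

Building the label this way keeps every feature strictly earlier in time than the outcome it is used to predict, which avoids leaking information from the injury season into the model.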

Data Exploration:

I gathered data from the 2013–2018 seasons for over 1500 Major League Baseball pitchers. To get a feel for my data, I began by looking at features that were most intuitively predictive of injury and compared them in subsets of injured and healthy pitchers as follows:

I first looked at age, and while the mean age in both injured and healthy players was around 27, the data was skewed a bit differently in the two groups. The most common age in injured players was 29, while healthy players had a much lower mode at 25. Similarly, average pitching velocity in injured players was higher than in healthy players, as expected. The next feature I considered was Tommy John surgery. This is a very common surgery in pitchers where a ligament in the arm gets torn and is replaced with a healthy tendon extracted from the arm or leg. I assumed that pitchers with past surgeries were more likely to get injured again, and the data confirmed this idea. A significant 30% of injured pitchers had a past Tommy John surgery, while healthy pitchers were at about 17%.
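A comparison like the one above (similar means, different modes) can be sketched with Python's standard `statistics` module. The rows and the `summarize_by_group` helper below are made up for illustration and are not the project's data or code; the ages were chosen so that both groups share a mean but differ in mode.

```python
from statistics import mean, multimode


def summarize_by_group(rows, feature):
    """Return the mean and mode(s) of `feature` for injured vs. healthy rows."""
    groups = {}
    for row in rows:
        key = "injured" if row["injured"] else "healthy"
        groups.setdefault(key, []).append(row[feature])
    return {
        key: {"mean": mean(values), "modes": multimode(values)}
        for key, values in groups.items()
    }


# Hypothetical sample: same mean age in both groups, different most common age.
rows = [
    {"age": 29, "injured": 1},
    {"age": 29, "injured": 1},
    {"age": 23, "injured": 1},
    {"age": 25, "injured": 0},
    {"age": 25, "injured": 0},
    {"age": 31, "injured": 0},
]

summary = summarize_by_group(rows, "age")
print(summary)
```

Looking at the mode alongside the mean is what exposes the difference here, since two groups can share an average while being concentrated at different ages.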

I then looked at average win-loss record in the two groups, which surprisingly was the feature with the highest correlation to injury in my dataset. The subset of injured pitchers was winning an average of 43% of games compared to 36% for healthy players. It makes sense that pitchers with more wins will get more playing time, which can lead to more injuries, as shown in the higher average innings pitched per game in injured players.
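One common way to rank features against a binary injury label is the Pearson correlation (with a 0/1 target this is the point-biserial correlation). The sketch below assumes that measure; the post does not say which correlation was used, and the feature names, the `rank_features` helper, and the sample rows are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0  # correlation is undefined for a constant column; report 0
    return cov / (spread_x * spread_y)


def rank_features(rows, target, features):
    """Sort features by absolute correlation with the target, strongest first."""
    ys = [row[target] for row in rows]
    scores = {f: pearson([row[f] for row in rows], ys) for f in features}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Made-up rows for illustration only.
rows = [
    {"win_pct": 0.50, "age": 27, "injured": 1},
    {"win_pct": 0.45, "age": 30, "injured": 1},
    {"win_pct": 0.40, "age": 25, "injured": 1},
    {"win_pct": 0.35, "age": 28, "injured": 0},
    {"win_pct": 0.30, "age": 26, "injured": 0},
    {"win_pct": 0.38, "age": 29, "injured": 0},
]

ranked = rank_features(rows, "injured", ["win_pct", "age"])
for name, score in ranked:
    print(f"{name}: {score:+.3f}")
```

A high correlation only says that the feature and the label move together in the sample; as noted above, the link here plausibly runs through playing time rather than through winning itself.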

The feature I was most interested in exploring for this project was a pitcher’s repertoire and whether certain pitches are more predictive of injury. Looking at feature correlations, I found that Sinker and Cutter pitches had the highest positive correlation to injury. I decided to explore these pitches in more depth and looked at the percentage of combined Sinker and Cutter pitches thrown by individual pitchers each year. I noticed a trend of injuries occurring in years where the sinker/cutter pitch percentages were at their highest. Below is a sample plot of four top MLB pitchers with recent injuries. The red points on the plots represent years in which the players were injured. You can see that they often correspond with years in which the sinker/cutter percentages were at a peak for each of the pitchers.
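The yearly sinker/cutter percentage can be computed from pitch-by-pitch records roughly as follows. This is a sketch under assumptions, not the project's code: it assumes each pitch is a `(pitcher, season, pitch_type)` tuple and that sinkers and cutters are coded `"SI"` and `"FC"`, which should be checked against the actual dataset. The sample pitches are made up.

```python
from collections import Counter

# Assumed pitch-type codes for sinker and cutter; verify against the dataset.
SINKER_CUTTER = {"SI", "FC"}


def sinker_cutter_pct(pitches):
    """Percentage of sinker/cutter pitches for each (pitcher, season) pair."""
    totals = Counter()
    matches = Counter()
    for pitcher, season, pitch_type in pitches:
        key = (pitcher, season)
        totals[key] += 1
        if pitch_type in SINKER_CUTTER:
            matches[key] += 1
    # Counter returns 0 for keys with no sinker/cutter pitches.
    return {key: 100.0 * matches[key] / totals[key] for key in totals}


# Hypothetical pitch log for one pitcher across two seasons.
pitches = [
    ("A", 2015, "FF"), ("A", 2015, "SI"), ("A", 2015, "FC"), ("A", 2015, "SL"),
    ("A", 2016, "FF"), ("A", 2016, "SI"), ("A", 2016, "SI"), ("A", 2016, "FC"),
]

pct = sinker_cutter_pct(pitches)
seasons = [season for (pitcher, season) in pct if pitcher == "A"]
peak_season = max(seasons, key=lambda season: pct[("A", season)])
print(pct, peak_season)
```

With the peak season identified per pitcher, it can be compared against the seasons in which that pitcher appeared on the injury list, which is the comparison the plot makes visually.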