← Back

Pybaseball Data in a Spreadsheet

2023-07-31 // Nick End, Founder

Introduction

For data scientists, data analysts, and engineers who know python, the pybaseball package is a great way to access all the baseball statistics a fan could dream of. Unfortunately, pybaseball is less accessible for the average fan who is looking to crunch some numbers on their favorite players, look up historical stats for their home team, or as is most common, build an analysis to help pick and manage a fantasy baseball team. In the following post, we show you how to easily run some python command to access data in pybaseball, pull it straight into a highly performant spreadsheet, and perform your own analysis.

One of the challenges of using pybaseball has generally been the need to use a development tool, like VS Code or notebooks, like Jupyter and Google Colab, to access the pybaseball package with python, then export the data and open it with a spreadsheet. The tedious process made it less appealing for spreadsheet users to make use of pybaseball. In this post we will present a solution using Row Zero, a blazingly fast spreadsheet with a built in python development environment. Row Zero enables users to write simple python scripts to get data into the spreadsheet and then complete the analysis using everyday spreadsheet skills.

Index

  1. What is Pybaseball?
  2. What is Row Zero?
  3. Getting started with Pybaseball
  4. Pybaseball Individual Player Batting Stats
  5. Pybaseball Individual Player Pitching Stats
  6. Pybaseball Season Data
  7. Pybaseball Team Data
  8. Summary

What is Pybaseball?

Pybaseball is a popular python package written and maintained by James Ledoux as well as other contributors. The package scrapes Baseball Reference, Baseball Savant, and Fan Graphs to aggregate data and make it easy to run Python commands to analyze the data.

The data available through the pybaseball python package is queryable at the individual pitch or hit level, as well as historical player or team data aggregated across seasons. Information on the pybaseball python package can be found on github. The python commands for pybaseball are well documented in the pybaseball package documentation.

What is Row Zero?

To start we quickly review Row Zero. To open a free Row Zero workbook, visit https://rowzero.io/new. Row Zero presents a spreadsheet with all the same functions and features as Microsoft Excel and Google Sheets but with 100x-1000x more speed. The code window where pybaseball commands can be evaluated is accessible on the right-hand side of the screen as highlighted in the image below.

Row Zero Screenshot

Getting Started with Pybaseball

To get started with pybaseball, there are three main steps:

  1. Import necessary python packages
  2. Write code to pull desired states
  3. Enter the name of the data table in the Row Zero spreadsheet to see stats.

We will now walk through 4 examples to help show how baseball stats can be pulled into a spreadsheet for analysis.

1. Pybaseball Individual Player Batting Stats in a Spreadsheet

First import datetime and timedelta packages that will be useful when specifying the time periods over which the stats will be pulled. Next pull in 2 different packages from pybaseball: playerid_lookup and statcast_batter. With these packages imported, write a script that specifies the player name, time period for stats desired, and which stats provider is desired. The python script creates a data set called 'statcast_batter_df,' that will pull in the data for the player and timeframe specified in the code. Press 'shift + enter' or the 'run' button to execute the code window. Then, in any spreadsheet cell type '=statcast_batter_df' and the data table of stats will be populated.

from datetime import datetime, timedelta
from pybaseball import playerid_lookup
from pybaseball import statcast_batter

##############  Player Data  ######################

# Display statcast data for a given batter

batter_first_name = "Shohei"
batter_last_name = "Ohtani"
start_date = '2023-03-30' # Opening day, 2023 season
end_date = (datetime.now() - timedelta(1)).strftime('%Y-%m-%d') # This makes the end_date yesterday's date

batter_id_df = playerid_lookup(batter_last_name, batter_first_name)
batter_mlb_id = batter_id_df['key_mlbam'][0]

statcast_batter_df = statcast_batter(start_date, end_date, batter_mlb_id)

Batting Data Table Creation Shohei Ohtani stats row zero

2. Pybaseball Individual Player Pitching Stats in a Spreadsheet

Now we create another example pulling pitching stats for an individual player into a workbook. To do this, we start out importing similar packages from pybaseball, executing them, and then pulling the stats into a data table in the Row Zero spreadsheet.

from datetime import datetime, timedelta
from pybaseball import playerid_lookup
from pybaseball import statcast_pitcher

##############  Player Data  ######################

# Display statcast data for a given pitcher

pitcher_first_name = "Sandy"
pitcher_last_name = "Alcantara"
start_date = '2023-03-30' # Opening day, 2023 season
end_date = (datetime.now() - timedelta(1)).strftime('%Y-%m-%d') # This makes the end_date yesterday's date

pitcher_id_df = playerid_lookup(pitcher_last_name, pitcher_first_name)
pitcher_mlb_id = pitcher_id_df['key_mlbam'][0]

statcast_pitcher_df = statcast_pitcher(start_date, end_date, pitcher_mlb_id)

Pitching Data Table Creation Sandy Alcantara stats row zero

3. Pybaseball Season Data in a Spreadsheet

To pull season data into a Row Zero spreadsheet, the commands are similar. First import the necessary packages into the code window. In this example, packages are imported from pybaseball that imports data from either Fangraph (batting_stats and pitching_stats) or Baseball Reference (batting_stats_bref and Pitching_stats_bref). Then define functions that pull in data for a given season (2023) and sorts the data by “Name.” Once the code window is compiled, type '=baseball_reference_batting_season_df' or any one of the other functions into the spreadsheet to see the data.

from pybaseball import batting_stats
from pybaseball import batting_stats_bref
from pybaseball import pitching_stats
from pybaseball import pitching_stats_bref


##############  Season Data  ######################

# Look at a single season worth of batting or pitching data from Fangraphs and/or Baseball Reference. 
# Dataframes will display in the Batting Season Data and Pitching Season Data tabs.

# Edit the season below to look at a different season
season = 2023

# Fangraph data is displayed in tabs in the spreadsheet
fangraph_batting_season_df = batting_stats(season,qual=0).sort_values("Name")
fangraph_pitching_season_df = pitching_stats(season,qual=0).sort_values("Name")

# Simply type "=<dataframe name>" in a spreadsheet tab to pull in baseball reference data
baseball_reference_batting_season_df = batting_stats_bref(season).sort_values("Name")
baseball_reference_pitching_season_df = pitching_stats_bref(season).sort_values("Name")

Baseball Reference 2023 Spreadsheet

4. Pybaseball Team Data in a Spreadsheet

In order to get specific team data into a spreadsheet, another set of commands are used. 3 different packages are imported: batting_stats, pitching_stats, and schedule_and_record. A variable called 'team' is set and assigned to the 3 letter city acronym for the team of interest. Then a set of data frames (data tables) are created with the various team stats included. In this case, tables are created for batting, pitching, and schedule/record stats. Run the code below in the code window and paste the names of the data tables into the spreadsheet (e.g. '=team_batting_df').

from pybaseball import batting_stats
from pybaseball import pitching_stats
from pybaseball import schedule_and_record

##############  Team Data  ######################

# Change the team below using the 3-letter city acronym of the team (e.g. "STL" for St. Louis)

team = 'SEA'

team_batting_df = batting_stats(2023, qual = 1, ind = 1)
team_batting_df = team_batting_df.loc[team_batting_df['Team'] == team]
team_pitching_df =  pitching_stats(2023, qual = 1, ind = 1)
team_pitching_df = team_pitching_df.loc[team_pitching_df['Team'] == team]
team_schedule_df = schedule_and_record(2023, team)

Pybaseball Team Stats

Summary

Pybaseball is one of the best free tools for evaluating baseball stats. Now with a free Row Zero account and a few python scripts, it's possible to easily import baseball stats from Pybaseball into a spreadsheet for analysis. Simply write the import scripts in Row Zero's code window and then type '=name_of_variable' in the spreadsheet. Once data is imported, all normal spreadsheet functions can be used to analyze baseball stats.