The Code That Cracked the Effect

Python Powerhouse

This Python script turns Statcast data into a chart comparing Juan Soto’s hitting with Aaron Judge (2024, Yankees) and without him (2025, Mets). It uses pandas for data wrangling and matplotlib for plotting, and the resulting chart is our picture of the Soto–Judge Effect.


import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# Load Statcast data for Soto’s 2024 (Yankees) and 2025 (Mets) seasons
soto_2024 = pd.read_csv("../data/soto-2024.csv")
soto_2025 = pd.read_csv("../data/soto-2025.csv")

# Convert dates to datetime for consistent plotting
soto_2024["Date"] = pd.to_datetime(soto_2024["Date"], errors="coerce")
soto_2025["Date"] = pd.to_datetime(soto_2025["Date"], errors="coerce")

# Normalize years to align 2024 and 2025 on the same x-axis
soto_2024["PlotDate"] = soto_2024["Date"].apply(lambda d: d.replace(year=2000) if pd.notnull(d) else pd.NaT)
soto_2025["PlotDate"] = soto_2025["Date"].apply(lambda d: d.replace(year=2000) if pd.notnull(d) else pd.NaT)

# Sort by date for proper rolling average calculation
soto_2024 = soto_2024.sort_values("PlotDate")
soto_2025 = soto_2025.sort_values("PlotDate")

# Calculate 10-game rolling average of hits
soto_2024["rolling_avg"] = soto_2024["H"].rolling(10).mean()
soto_2025["rolling_avg"] = soto_2025["H"].rolling(10).mean()

# Create the plot
plt.figure(figsize=(10, 6))
plt.plot(soto_2024["PlotDate"], soto_2024["rolling_avg"], label="Soto Late 2024", linewidth=2)
plt.plot(soto_2025["PlotDate"], soto_2025["rolling_avg"], label="Soto Early 2025", linewidth=2)

# Format x-axis with month and day
plt.gca().xaxis.set_major_formatter(mdates.DateFormatter('%b %d'))

# Add titles, labels, and legend
plt.title("Juan Soto Rolling Hits: With vs. Without Judge")
plt.xlabel("Game Date")
plt.ylabel("Hits (10-game rolling avg)")
plt.legend()
plt.xticks(rotation=45)
plt.grid(True, which='major', linestyle='--', linewidth=0.5)
plt.tight_layout()

# Save the visualization
plt.savefig("../images/soto-judge-effect.jpeg")
    

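This script loads both of Soto’s seasons, maps their dates onto a shared calendar year, computes a 10-game rolling average of hits, and plots the two lines on one chart. Because the window needs 10 games, each line begins no earlier than the tenth game in its file. The result is a side-by-side view of Soto’s hitting with and without Judge, which is the pattern this piece reads as lineup protection.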
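The per-season steps can also be sketched as one reusable helper. This is a minimal sketch, not the script's actual structure. The function name `prepare_season` and the sample numbers are hypothetical, and only the `Date` and `H` column names come from the script above. It additionally drops rows whose date fails to parse, which the original script does not do.

```python
import pandas as pd


def prepare_season(games: pd.DataFrame, window: int = 10) -> pd.DataFrame:
    """Return a copy with a year-normalized PlotDate and a rolling hits average.

    Hypothetical helper: assumes one row per game with "Date" and "H" columns.
    """
    out = games.copy()
    out["Date"] = pd.to_datetime(out["Date"], errors="coerce")
    # Drop rows whose date failed to parse, then put games in calendar order.
    out = out.dropna(subset=["Date"]).sort_values("Date").reset_index(drop=True)
    # Map every date onto the year 2000 (a leap year, so Feb 29 stays valid).
    out["PlotDate"] = out["Date"].apply(lambda d: d.replace(year=2000))
    # The first (window - 1) rows are NaN because the window is not yet full.
    out["rolling_avg"] = out["H"].rolling(window).mean()
    return out


# Made-up sample: 12 games plus one row with an unparseable date.
sample = pd.DataFrame({
    "Date": [d.strftime("%Y-%m-%d")
             for d in pd.date_range("2024-09-01", periods=12, freq="D")]
            + ["not a date"],
    "H": [1, 0, 2, 1, 1, 0, 3, 1, 2, 1, 0, 2, 4],
})

demo = prepare_season(sample)
print(demo[["PlotDate", "H", "rolling_avg"]].tail(3))
```

Calling the helper once per season would yield two frames that can be passed straight to `plt.plot`, keeping the 2024 and 2025 preparation identical by construction.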