Session 2: Building Blocks – Probability and Distributions (Normal, Binomial & Simulations)

Quick Review

From the notes of Amanda Ategeka, MD: Probability models uncertainty in clinical outcomes, like patient recovery rates. Building on descriptive statistics, distributions describe how likely each possible outcome is, revealing patterns in data that a single average can hide. This is essential for evidence-based medicine.
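To make "predict" concrete, here is a minimal sketch using the same illustrative numbers as s2.py later in this session (mean recovery 10 days, standard deviation 2 days; example values, not real clinical data). It asks the normal model what share of patients would recover within 12 days, and what share would take longer than 14 days.

```python
import scipy.stats as stats

# Illustrative model (same values as s2.py): recovery time ~ Normal(mean=10, sd=2) days
mu, sigma = 10, 2

# P(recovery time <= 12 days): the cumulative distribution function (CDF) at 12
p_within_12 = stats.norm.cdf(12, loc=mu, scale=sigma)

# P(recovery time > 14 days): the survival function, which equals 1 - CDF
p_over_14 = stats.norm.sf(14, loc=mu, scale=sigma)

print(f"P(recover within 12 days)   = {p_within_12:.3f}")  # 0.841
print(f"P(take longer than 14 days) = {p_over_14:.3f}")    # 0.023
```

12 days is one standard deviation above the mean and 14 days is two, so these are the familiar 84% and 2.3% tail figures of the normal curve.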

Get Started

Open your ukubona-clinical-lab folder in VS Code. In the terminal, type wsl (if you are on Windows) to switch to Linux, then run source venv/bin/activate to activate your lab's virtual environment.
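Optionally, you can first confirm that the activated environment has the three libraries this session imports. This is a minimal sketch: type python in the activated terminal and paste it in. It only reports what is missing and installs nothing; if something is listed, pip install followed by the missing names (inside the venv) adds it.

```python
import importlib.util

# Top-level packages that s2.py imports (scipy.stats lives inside scipy)
required = ["numpy", "scipy", "matplotlib"]

# find_spec returns None when a top-level module cannot be found in this environment
missing = [name for name in required if importlib.util.find_spec(name) is None]

if missing:
    print("Missing:", ", ".join(missing), "| install with: pip install " + " ".join(missing))
else:
    print("All set: numpy, scipy and matplotlib are available.")
```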

Step 1: Create s2.py

In the terminal, type code s2.py. This opens a new file called s2.py in VS Code.

Copy and paste the code below into it, then save (Ctrl+S).

import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats
import os

os.makedirs("plots", exist_ok=True)

# Normal distribution parameters (e.g., for patient recovery times)
mu, sigma = 10, 2  # Mean 10 days, std 2 days
x = np.linspace(mu - 3*sigma, mu + 3*sigma, 100)
pdf = stats.norm.pdf(x, mu, sigma)

# Simulate 1,000 recovery times (seeded so every run, and every student, gets the same plot)
rng = np.random.default_rng(42)
data = rng.normal(mu, sigma, 1000)

# Plot histogram and PDF
fig, ax = plt.subplots()
ax.hist(data, bins=30, density=True, color='blue', alpha=0.7, label='Simulated Data')
ax.plot(x, pdf, 'r-', label='Normal PDF')
ax.set_xlabel('Recovery Time (days)')
ax.set_ylabel('Density')
ax.set_title('Normal Distribution: Patient Recovery Times')
ax.legend()
ax.grid(True)

plt.savefig("plots/session2_normal_dist.png", dpi=200)
plt.close(fig)

# Binomial example (e.g., number of successes in a clinical trial)
n, p = 10, 0.3  # 10 patients, 30% success rate each
binomial_data = rng.binomial(n, p, 1000)  # 1,000 simulated trials
fig, ax = plt.subplots()  # start a fresh figure for the second plot
ax.hist(binomial_data, bins=np.arange(0, n+2)-0.5, density=True, color='green', alpha=0.7, label='Binomial Data')
x_bin = np.arange(0, n+1)
pmf = stats.binom.pmf(x_bin, n, p)
ax.plot(x_bin, pmf, 'ro-', label='Binomial PMF')
ax.set_xlabel('Number of Successes')
ax.set_ylabel('Probability')
ax.set_title('Binomial Distribution: Clinical Trial Successes')
ax.legend()
ax.grid(True)

plt.savefig("plots/session2_binomial_dist.png", dpi=200)
plt.close()

print("✅ Plots saved!")
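The first plot's histogram should hug the red PDF curve. This minimal sketch checks the same thing numerically. It draws a larger seeded sample (100,000 values; the seed 42 is an arbitrary choice) and compares the sample mean, standard deviation, and share of patients within one and two standard deviations against the normal model's theoretical values (about 68% and 95%).

```python
import numpy as np
import scipy.stats as stats

mu, sigma = 10, 2  # same illustrative recovery-time parameters as s2.py
rng = np.random.default_rng(42)  # seeded generator so reruns give identical numbers
data = rng.normal(mu, sigma, 100_000)

# Sample summaries should sit close to the parameters used to generate the data
print(f"sample mean = {data.mean():.2f} (theory {mu})")
print(f"sample sd   = {data.std(ddof=1):.2f} (theory {sigma})")

for k in (1, 2):
    # Fraction of simulated patients within k standard deviations of the mean
    empirical = np.mean(np.abs(data - mu) <= k * sigma)
    # Exact probability of the same interval under the normal model
    theoretical = stats.norm.cdf(k) - stats.norm.cdf(-k)
    print(f"within {k} sd: simulated {empirical:.3f}, theory {theoretical:.3f}")
```

With a smaller sample, such as the 1,000 values in s2.py, the gaps between simulated and theoretical values grow. That gap is the sampling variability you see as ragged histogram bars.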

Step 2: Run the Code

Update your run.sh so it runs s2.py: open the file and change 'python s1.py' to 'python s2.py'. Then, in the terminal, type bash run.sh. This runs your script, which saves two images in plots/: session2_normal_dist.png and session2_binomial_dist.png. Open them to see simulated distributions for two clinical scenarios!

Excite: Play Around

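Edit s2.py: change mu to 15 (a longer average recovery) or p to 0.5 (a higher success rate). Save, then run bash run.sh again. See how the distributions shift? The normal curve's peak moves from 10 to 15 days, and the binomial peak moves from 3 to 5 successes and becomes symmetric. That's probability modeling real clinical variability!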
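To put numbers on those shifts, this minimal sketch uses only the parameter values named above. For p = 0.3 and p = 0.5, it reports the most likely number of successes and the expected count. For mu = 10 and mu = 15, it reports the probability of recovering within 12 days, keeping sigma at 2.

```python
import numpy as np
import scipy.stats as stats

n = 10  # patients in the trial, as in s2.py
k = np.arange(0, n + 1)  # possible numbers of successes: 0..10

for p in (0.3, 0.5):
    pmf = stats.binom.pmf(k, n, p)
    most_likely = k[np.argmax(pmf)]  # the mode: the tallest bar in the binomial plot
    print(f"p={p}: most likely {most_likely} successes, expected {n * p:.1f}")

sigma = 2
for mu in (10, 15):
    # Chance a patient recovers within 12 days under each mean recovery time
    print(f"mu={mu}: P(recovery <= 12 days) = {stats.norm.cdf(12, mu, sigma):.3f}")
```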

Homework

Try other distributions, for example the Poisson distribution for counts of rare events such as side effects (use stats.poisson). Save your best plots. Then commit to GitHub: in the terminal, run git add ., then git commit -m "Session 2 distributions", then git push.
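If you want a starting point for the Poisson part, here is a minimal numeric sketch. It assumes a hypothetical average of 0.4 side-effect events per patient over follow-up, an illustrative rate and not real data. It compares 1,000 simulated patients with stats.poisson.pmf. To turn it into a plot, you could reuse the binomial plotting pattern from s2.py.

```python
import numpy as np
import scipy.stats as stats

lam = 0.4  # hypothetical mean number of side-effect events per patient
rng = np.random.default_rng(0)  # seeded for reproducibility
counts = rng.poisson(lam, 1000)  # simulated event counts for 1,000 patients

print("events  simulated  Poisson PMF")
for k in range(0, 5):
    simulated = np.mean(counts == k)  # share of patients with exactly k events
    print(f"{k:>6}  {simulated:>9.3f}  {stats.poisson.pmf(k, lam):>11.3f}")

# For a Poisson distribution the mean and the variance both equal lam
print(f"sample mean {counts.mean():.2f}, sample variance {counts.var(ddof=1):.2f}")
```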

Math Focus

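Distributions model probability. The normal distribution suits continuous measurements (e.g., blood pressure or recovery time). The binomial distribution suits counts of successes out of a fixed number of tries (e.g., trial successes). Simulations help visualize uncertainty in medicine: drawing many random samples shows how much outcomes can vary from patient to patient.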
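For reference, here are the formulas behind stats.norm.pdf, stats.binom.pmf and stats.poisson.pmf. They use the parameter names from s2.py (mu and sigma for the normal; n and p for the binomial) and λ for the Poisson rate. The last line works one binomial value by hand for n = 10, p = 0.3. It should match the tallest red dot in the binomial plot.

```latex
f(x \mid \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)

P(X = k \mid n, p) = \binom{n}{k} p^{k} (1-p)^{n-k}, \qquad k = 0, 1, \dots, n

P(Y = k \mid \lambda) = \frac{\lambda^{k} e^{-\lambda}}{k!}, \qquad k = 0, 1, 2, \dots

P(X = 3 \mid 10, 0.3) = \binom{10}{3} (0.3)^{3} (0.7)^{7} = 120 \times 0.027 \times 0.0823543 \approx 0.267
```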