# Prompt Engineering Labs
Runnable experiments that demonstrate curriculum concepts with real LLM API calls. Each lab is a self-contained Python script — or run them free in Google Colab with zero local setup.
## 🚀 Run Free in Google Colab (Recommended)
Click any badge below to open the lab directly in your browser — no installation required:
### Free LLM Providers
The Colab notebooks let you choose a provider at runtime. No credit card required for the free options:
| Provider | Free Tier | How to Get a Key |
|---|---|---|
| Google Gemini ⭐ | 15 RPM, 1M tokens/day | aistudio.google.com/apikey |
| Groq | 30 RPM, 14.4K tokens/min | console.groq.com |
| OpenAI (paid) | Pay-per-token | platform.openai.com |
## Run Locally
**Sandbox policy:** Labs enforce isolated execution. Run them in `.venv` (recommended), `conda`, or Google Colab. Running with system Python outside an isolated environment will exit with a safety error.
### Prerequisites
- Python 3.10+
- An API key for at least one provider (see table above)
### Setup
```bash
cd learn/labs
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
cp .env.example .env
# Edit .env with your API key(s)
```
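After copying the template, fill in at least one key. The exact variable names are defined in `.env.example`; the names below are illustrative assumptions:

```shell
# Hypothetical .env contents — check .env.example for the canonical variable names.
GOOGLE_API_KEY=your-gemini-key-here
GROQ_API_KEY=your-groq-key-here
# Optional overrides (see "How Labs Work" below):
# LLM_MODEL=gemini-2.0-flash
# OPENAI_API_BASE=https://api.openai.com/v1
```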
## Available Labs
| Lab | Module Link | What You'll Learn | Time |
|---|---|---|---|
| Lab 1 — Zero-Shot vs. Few-Shot | Module 3, §3.2–§3.3 | Quantify the difference between zero-shot and few-shot on a classification task | 10 min |
| Lab 2 — Chain-of-Thought Impact | Module 3, §3.4 | Measure CoT's improvement on arithmetic reasoning (direct vs. step-by-step) | 10 min |
| Lab 3 — Specificity Experiment | Module 2, §2.1 | Compare outputs from vague vs. specific prompts across multiple runs | 10 min |
| Lab 4 — Evaluation Pipeline | Module 5, §5.4 | Build a mini evaluation pipeline with test suite, metrics, and LLM-as-Judge | 15 min |
| Lab 5 — Tool-Calling & Structured Output | Module 3, §3.6 · Module 5, §5.4 | Compare JSON-mode prompting vs. function-calling API — measure valid-JSON rate and field completeness | 15 min |
| Lab 6 — Agentic Plan-and-Execute | Module 6, §6.2 | Build a plan-and-execute agent in pure Python; compare to single-prompt baseline | 20 min |
## How Labs Work
Each lab script:
- Defines two or more prompt variants (naive vs. pattern-applied)
- Sends each variant to the configured LLM API multiple times
- Collects and scores the outputs
- Prints a comparison table showing the difference
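The four steps above can be sketched in a few lines. This is a minimal illustration, not a lab script: `call_llm` is a stub standing in for a real provider call, and the prompts and scoring rule are invented for the example.

```python
# Sketch of the lab pattern: variants -> repeated calls -> scoring -> comparison.

def call_llm(prompt: str) -> str:
    """Stub: replace with a real API call (Gemini, Groq, or OpenAI)."""
    # Fake behavior for illustration: few-shot prompts get a clean label back.
    return "positive" if "Examples:" in prompt else "I think it's positive."

VARIANTS = {
    "zero-shot": "Classify the sentiment: 'Great product!'",
    "few-shot": (
        "Examples:\n'Awful' -> negative\n'Love it' -> positive\n"
        "Classify the sentiment: 'Great product!'"
    ),
}

N_RUNS = 5

def score(output: str) -> bool:
    # Lab-specific check; here: did the model answer with a bare label?
    return output.strip() in {"positive", "negative"}

results = {}
for name, prompt in VARIANTS.items():
    outcomes = [score(call_llm(prompt)) for _ in range(N_RUNS)]
    results[name] = sum(outcomes) / N_RUNS

for name, rate in results.items():
    print(f"{name:>10}: {rate:.0%} valid-label rate")
```

The real labs follow this same shape, swapping in their own prompts, scoring functions, and comparison tables.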
The labs auto-detect your provider: Google Gemini → Groq → OpenAI (first available key wins). Override with `LLM_MODEL` and `OPENAI_API_BASE` in your `.env`.
## Failure Gallery
Interactive exercises where you diagnose broken prompts before reading the solution.
| Case | Anti-Pattern | Core Lesson |
|---|---|---|
| 01 — Kitchen Sink | Doing too much in one prompt | Task decomposition |
| 02 — Stale Context | Relying on out-of-date model knowledge | Grounding / RAG |
| 03 — Injection Vulnerable | Secrets in system prompt, no override defense | Security & input sanitisation |
| 04 — Ambiguous Format | No output schema for structured data | Constrained output (Lab 5) |
| 05 — Missing Constraints | Four of the five prompt components absent | Prompt anatomy (Module 1) |
See `failure-gallery/README.md` for the scoring rubric.
## Output Disclaimer
LLM outputs are non-deterministic. Your results will differ from run to run and from model to model. The purpose of these labs is to observe relative differences between prompt strategies, not to reproduce exact numbers.