# Prompt Engineering Labs
Runnable experiments that demonstrate curriculum concepts with real LLM API calls. Each lab is a self-contained Python script — or run them free in Google Colab with zero local setup.
## 🚀 Run Free in Google Colab (Recommended)
Click any badge below to open the lab directly in your browser — no installation required:
### Free LLM Providers
The Colab notebooks let you choose a provider at runtime. No credit card required for the free options:
| Provider | Free Tier | How to Get a Key |
|---|---|---|
| Google Gemini ⭐ | 15 RPM, 1M tokens/day | aistudio.google.com/apikey |
| Groq | 30 RPM, 14.4K tokens/min | console.groq.com |
| OpenAI (paid) | Pay-per-token | platform.openai.com |
## Run Locally
**Sandbox policy:** Labs enforce isolated execution. Run them in `.venv` (recommended), `conda`, or Google Colab. Running with system Python outside an isolated environment will exit with a safety error.
### Prerequisites
- Python 3.10+
- An API key for at least one provider (see table above)
### Setup
```bash
cd learn/labs
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
cp .env.example .env
# Edit .env with your API key(s)
```
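After copying the template, fill in at least one key. The exact variable names are defined in `.env.example`; the names below are illustrative assumptions:

```shell
# Hypothetical .env contents — check .env.example for the canonical variable names.
GOOGLE_API_KEY=your-gemini-key-here
GROQ_API_KEY=your-groq-key-here
# Optional overrides (see "How Labs Work" below):
# LLM_MODEL=gemini-2.0-flash
# OPENAI_API_BASE=https://api.openai.com/v1
```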
## Available Labs
| Lab | Module Link | What You'll Learn | Time |
|---|---|---|---|
| Lab 1 — Zero-Shot vs. Few-Shot | Module 3, §3.2–§3.3 | Quantify the difference between zero-shot and few-shot on a classification task | 10 min |
| Lab 2 — Chain-of-Thought Impact | Module 3, §3.4 | Measure CoT's improvement on arithmetic reasoning (direct vs. step-by-step) | 10 min |
| Lab 3 — Specificity Experiment | Module 2, §2.1 | Compare outputs from vague vs. specific prompts across multiple runs | 10 min |
| Lab 4 — Evaluation Pipeline | Module 5, §5.4 | Build a mini evaluation pipeline with test suite, metrics, and LLM-as-Judge | 15 min |
| Lab 5 — Tool-Calling & Structured Output | Module 3, §3.6 · Module 5, §5.4 | Compare JSON-mode prompting vs. function-calling API — measure valid-JSON rate and field completeness | 15 min |
| Lab 6 — Agentic Plan-and-Execute | Module 6, §6.2 | Build a plan-and-execute agent in pure Python; compare to single-prompt baseline | 20 min |
## How Labs Work
Each lab script:
- Defines two or more prompt variants (naive vs. pattern-applied)
- Sends each variant to the configured LLM API multiple times
- Collects and scores the outputs
- Prints a comparison table showing the difference
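The four steps above can be sketched in a few lines. This is a minimal illustration, not a lab script: `call_llm` is a stub standing in for a real provider call, and the prompts and scoring rule are invented for the example.

```python
# Sketch of the lab pattern: variants -> repeated calls -> scoring -> comparison.

def call_llm(prompt: str) -> str:
    """Stub: replace with a real API call (Gemini, Groq, or OpenAI)."""
    # Fake behavior for illustration: few-shot prompts get a clean label back.
    return "positive" if "Examples:" in prompt else "I think it's positive."

VARIANTS = {
    "zero-shot": "Classify the sentiment: 'Great product!'",
    "few-shot": (
        "Examples:\n'Awful' -> negative\n'Love it' -> positive\n"
        "Classify the sentiment: 'Great product!'"
    ),
}

N_RUNS = 5

def score(output: str) -> bool:
    # Lab-specific check; here: did the model answer with a bare label?
    return output.strip() in {"positive", "negative"}

results = {}
for name, prompt in VARIANTS.items():
    outcomes = [score(call_llm(prompt)) for _ in range(N_RUNS)]
    results[name] = sum(outcomes) / N_RUNS

for name, rate in results.items():
    print(f"{name:>10}: {rate:.0%} valid-label rate")
```

The real labs follow this same shape, swapping in their own prompts, scoring functions, and comparison tables.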
The labs auto-detect your provider: Google Gemini → Groq → OpenAI (first available key wins). Override with `LLM_MODEL` and `OPENAI_API_BASE` in your `.env`.
## Failure Gallery
Interactive exercises where you diagnose broken prompts before reading the solution.
| Case | Anti-Pattern | Core Lesson |
|---|---|---|
| 01 — Kitchen Sink | Doing too much in one prompt | Task decomposition |
| 02 — Stale Context | Relying on out-of-date model knowledge | Grounding / RAG |
| 03 — Injection Vulnerable | Secrets in system prompt, no override defense | Security & input sanitisation |
| 04 — Ambiguous Format | No output schema for structured data | Constrained output (Lab 5) |
| 05 — Missing Constraints | Four of the five prompt components absent | Prompt anatomy (Module 1) |
See `failure-gallery/README.md` for the scoring rubric.
## Output Disclaimer
LLM outputs are non-deterministic. Your results will differ from run to run and from model to model. The purpose of these labs is to observe relative differences between prompt strategies, not to reproduce exact numbers.