
Prompt Engineering Labs

Runnable experiments that demonstrate curriculum concepts with real LLM API calls. Each lab is a self-contained Python script — or run them free in Google Colab with zero local setup.


Click any badge below to open the lab directly in your browser — no installation required:

| Lab | Colab Link |
|---|---|
| Lab 1 — Zero-Shot vs. Few-Shot | Open In Colab |
| Lab 2 — Chain-of-Thought Impact | Open In Colab |
| Lab 3 — Specificity Experiment | Open In Colab |
| Lab 4 — Evaluation Pipeline | Open In Colab |
| Lab 5 — Tool-Calling & Structured Output | Open In Colab |
| Lab 6 — Agentic Plan-and-Execute | Open In Colab |

Free LLM Providers

The Colab notebooks let you choose a provider at runtime. No credit card required for the free options:

| Provider | Free Tier | How to Get a Key |
|---|---|---|
| Google Gemini | 15 RPM, 1M tokens/day | aistudio.google.com/apikey |
| Groq | 30 RPM, 14.4K tokens/min | console.groq.com |
| OpenAI (paid) | Pay-per-token | platform.openai.com |

Run Locally

Sandbox policy: Labs enforce isolated execution. Run them in .venv (recommended), conda, or Google Colab. Running with system Python outside an isolated environment will exit with a safety error.
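The isolation check described above can be sketched as follows (the function name is illustrative, not the labs' actual guard):

```python
import os
import sys

def in_isolated_env() -> bool:
    """Rough check for an isolated Python environment.

    A venv/virtualenv makes sys.prefix differ from sys.base_prefix;
    an active conda environment sets CONDA_DEFAULT_ENV.
    """
    in_venv = sys.prefix != getattr(sys, "base_prefix", sys.prefix)
    in_conda = bool(os.environ.get("CONDA_DEFAULT_ENV"))
    return in_venv or in_conda
```

A lab script would call something like `sys.exit("safety error: use a venv, conda, or Colab")` when this returns False.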

Prerequisites

  • Python 3.10+
  • An API key for at least one provider (see table above)

Setup

```bash
cd learn/labs
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
cp .env.example .env
# Edit .env with your API key(s)
```
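A filled-in .env might look like the fragment below; the key names here are assumptions based on common provider conventions, and the repo's .env.example is authoritative:

```
# Hypothetical .env — set at least one key (see provider table above)
GOOGLE_API_KEY=...
GROQ_API_KEY=...
OPENAI_API_KEY=...
```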

Available Labs

| Lab | Module Link | What You'll Learn | Time |
|---|---|---|---|
| Lab 1 — Zero-Shot vs. Few-Shot | Module 3, §3.2–§3.3 | Quantify the difference between zero-shot and few-shot on a classification task | 10 min |
| Lab 2 — Chain-of-Thought Impact | Module 3, §3.4 | Measure CoT's improvement on arithmetic reasoning (direct vs. step-by-step) | 10 min |
| Lab 3 — Specificity Experiment | Module 2, §2.1 | Compare outputs from vague vs. specific prompts across multiple runs | 10 min |
| Lab 4 — Evaluation Pipeline | Module 5, §5.4 | Build a mini evaluation pipeline with test suite, metrics, and LLM-as-Judge | 15 min |
| Lab 5 — Tool-Calling & Structured Output | Module 3, §3.6 · Module 5, §5.4 | Compare JSON-mode prompting vs. function-calling API — measure valid-JSON rate and field completeness | 15 min |
| Lab 6 — Agentic Plan-and-Execute | Module 6, §6.2 | Build a plan-and-execute agent in pure Python; compare to single-prompt baseline | 20 min |
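As an illustration of the kind of metric Lab 5 reports, a valid-JSON-rate scorer could be sketched like this (function and field names are hypothetical, not the lab's actual code):

```python
import json

def valid_json_rate(outputs, required_fields=("name", "sentiment")):
    """Fraction of model outputs that parse as JSON objects
    and contain every required field."""
    valid = 0
    for text in outputs:
        try:
            obj = json.loads(text)
        except json.JSONDecodeError:
            continue  # malformed output counts against the rate
        if isinstance(obj, dict) and all(f in obj for f in required_fields):
            valid += 1
    return valid / len(outputs) if outputs else 0.0
```

Field completeness can be measured the same way, per field instead of per output.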

How Labs Work

Each lab script:

  1. Defines two or more prompt variants (naive vs. pattern-applied)
  2. Sends each variant to the configured LLM API multiple times
  3. Collects and scores the outputs
  4. Prints a comparison table showing the difference
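The four steps above can be sketched as a minimal harness; the names and the stubbed model below are illustrative, and a real lab would call the provider API inside `call_llm`:

```python
from statistics import mean

def run_experiment(variants, score, call_llm, trials=5):
    """Send each prompt variant `trials` times, score every output,
    and return the mean score per variant."""
    results = {}
    for name, prompt in variants.items():
        scores = [score(call_llm(prompt)) for _ in range(trials)]
        results[name] = mean(scores)
    return results

# Usage with a stub standing in for the LLM call:
variants = {"naive": "Classify: ...", "few_shot": "Examples: ... Classify: ..."}
fake_llm = lambda p: "positive" if "Examples" in p else "negative"
score = lambda out: 1.0 if out == "positive" else 0.0
print(run_experiment(variants, score, fake_llm))
```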

The labs auto-detect your provider: Google Gemini → Groq → OpenAI (first available key wins). Override with LLM_MODEL and OPENAI_API_BASE in your .env.
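The first-available-key fallback can be sketched like this; the environment-variable names are assumptions based on common provider conventions, so check .env.example for the ones the labs actually read:

```python
import os

# Priority order matches the doc: Gemini, then Groq, then OpenAI.
PROVIDER_KEYS = [
    ("gemini", "GOOGLE_API_KEY"),
    ("groq", "GROQ_API_KEY"),
    ("openai", "OPENAI_API_KEY"),
]

def detect_provider(env=os.environ):
    """Return the first provider whose API key is set; first available key wins."""
    for provider, key in PROVIDER_KEYS:
        if env.get(key):
            return provider
    raise RuntimeError(
        "No API key found; set one of " + ", ".join(k for _, k in PROVIDER_KEYS)
    )
```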


Failure Gallery

Interactive exercises where you diagnose broken prompts before reading the solution.

| Case | Anti-Pattern | Core Lesson |
|---|---|---|
| 01 — Kitchen Sink | Doing too much in one prompt | Task decomposition |
| 02 — Stale Context | Relying on out-of-date model knowledge | Grounding / RAG |
| 03 — Injection Vulnerable | Secrets in system prompt, no override defense | Security & input sanitisation |
| 04 — Ambiguous Format | No output schema for structured data | Constrained output (Lab 5) |
| 05 — Missing Constraints | Four of the five prompt components absent | Prompt anatomy (Module 1) |

See failure-gallery/README.md for the scoring rubric.


Output Disclaimer

LLM outputs are non-deterministic. Your results will differ from run to run and from model to model. The purpose of these labs is to observe relative differences between prompt strategies, not to reproduce exact numbers.


← Back to curriculum