References

This file provides a centralized bibliography for all works cited across the learn/ curriculum and the comparisons/ documents. Entries follow APA 7th edition format and include DOIs or stable URLs where available.

Documents in this repository that previously used unresolvable source indices (e.g., [web:2], [web:11]) should be updated to reference entries in this bibliography by their citation key (e.g., [Wei2022], [Brown2020]).
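This migration is mechanical enough to script. The sketch below shows one way to do it, assuming a hand-built mapping from legacy indices to citation keys (the `web:2` → `Wei2022` and `web:11` → `Brown2020` pairs are illustrative; the real mapping must be reconstructed by checking each legacy citation against the entries in this file). Any legacy index with no known mapping is left untouched for manual review.

```python
import re

# Illustrative mapping from legacy source indices to citation keys.
# Reconstruct the real mapping by checking each legacy citation
# against the bibliography entries in this file.
LEGACY_TO_KEY = {
    "web:2": "Wei2022",
    "web:11": "Brown2020",
}

def migrate_citations(text: str) -> str:
    """Replace legacy [web:N] markers with bibliography citation keys."""
    def repl(match: re.Match) -> str:
        key = LEGACY_TO_KEY.get(match.group(1))
        # Leave unmapped indices as-is so they can be resolved by hand.
        return f"[{key}]" if key else match.group(0)
    return re.sub(r"\[(web:\d+)\]", repl, text)

# Example: migrate_citations("See [web:2] and [web:99].")
# maps [web:2] to [Wei2022] and leaves [web:99] untouched.
```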


Foundational Works

[Brown2020] Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D. M., Wu, J., Winter, C., … Amodei, D. (2020). Language models are few-shot learners. Advances in Neural Information Processing Systems, 33, 1877–1901. https://doi.org/10.48550/arXiv.2005.14165

[Vaswani2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30, 5998–6008. https://doi.org/10.48550/arXiv.1706.03762

[Ouyang2022] Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C. L., Mishkin, P., Zhang, C., Agarwal, S., Slama, K., Ray, A., Schulman, J., Hilton, J., Kelton, F., Miller, L., Simens, M., Askell, A., Welinder, P., Christiano, P., Leike, J., & Lowe, R. (2022). Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems, 35, 27730–27744. https://doi.org/10.48550/arXiv.2203.02155


Chain-of-Thought and Reasoning

[Wei2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., & Zhou, D. (2022). Chain-of-thought prompting elicits reasoning in large language models. Advances in Neural Information Processing Systems, 35, 24824–24837. https://doi.org/10.48550/arXiv.2201.11903

[Kojima2022] Kojima, T., Gu, S. S., Reid, M., Matsuo, Y., & Iwasawa, Y. (2022). Large language models are zero-shot reasoners. Advances in Neural Information Processing Systems, 35. https://doi.org/10.48550/arXiv.2205.11916

[Wang2023] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., & Zhou, D. (2023). Self-consistency improves chain of thought reasoning in language models. International Conference on Learning Representations (ICLR). https://doi.org/10.48550/arXiv.2203.11171


Agents and Tool Use

[Yao2023] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., & Cao, Y. (2023). ReAct: Synergizing reasoning and acting in language models. International Conference on Learning Representations (ICLR). https://doi.org/10.48550/arXiv.2210.03629

[Schick2023] Schick, T., Dwivedi-Yu, J., Dessì, R., Raileanu, R., Lomeli, M., Zettlemoyer, L., Cancedda, N., & Scialom, T. (2023). Toolformer: Language models can teach themselves to use tools. Advances in Neural Information Processing Systems, 36. https://doi.org/10.48550/arXiv.2302.04761

[Shinn2023] Shinn, N., Cassano, F., Gopinath, A., Narasimhan, K., & Yao, S. (2023). Reflexion: Language agents with verbal reinforcement learning. Advances in Neural Information Processing Systems, 36. https://doi.org/10.48550/arXiv.2303.11366

[Park2023] Park, J. S., O'Brien, J. C., Cai, C. J., Morris, M. R., Liang, P., & Bernstein, M. S. (2023). Generative agents: Interactive simulacra of human behavior. Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology (UIST). https://doi.org/10.1145/3586183.3606763

[Sumers2024] Sumers, T. R., Yao, S., Narasimhan, K., & Griffiths, T. L. (2024). Cognitive architectures for language agents. Transactions on Machine Learning Research (TMLR). https://doi.org/10.48550/arXiv.2309.02427


Retrieval-Augmented Generation

[Lewis2020] Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., Küttler, H., Lewis, M., Yih, W., Rocktäschel, T., Riedel, S., & Kiela, D. (2020). Retrieval-augmented generation for knowledge-intensive NLP tasks. Advances in Neural Information Processing Systems, 33, 9459–9474. https://doi.org/10.48550/arXiv.2005.11401

[Gao2024] Gao, Y., Xiong, Y., Gao, X., Jia, K., Pan, J., Bi, Y., Dai, Y., Sun, J., Wang, M., & Wang, H. (2024). Retrieval-augmented generation for large language models: A survey. arXiv preprint. https://doi.org/10.48550/arXiv.2312.10997


Prompt Engineering Methodology

[White2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., & Schmidt, D. C. (2023). A prompt pattern catalog to enhance prompt engineering with ChatGPT. arXiv preprint. https://doi.org/10.48550/arXiv.2302.11382

[Bach2022] Bach, S. H., Sanh, V., Yong, Z. X., Webson, A., Raffel, C., Nayak, N. V., Sharma, A., Kim, T., Bari, M. S., Fèvry, T., Alyafeai, Z., Dey, M., Santilli, A., Sun, Z., Ben-David, S., Xu, C., Chhablani, G., Wang, H., Fries, J. A., … Rush, A. M. (2022). PromptSource: An integrated development environment and repository for natural language prompts. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: System Demonstrations, 93–104. https://doi.org/10.18653/v1/2022.acl-demo.9

[Zheng2023] Zheng, L., Chiang, W.-L., Sheng, Y., Zhuang, S., Wu, Z., Zhuang, Y., Lin, Z., Li, Z., Li, D., Xing, E. P., Zhang, H., Gonzalez, J. E., & Stoica, I. (2023). Judging LLM-as-a-judge with MT-Bench and Chatbot Arena. Advances in Neural Information Processing Systems, 36. https://doi.org/10.48550/arXiv.2306.05685


Instruction Tuning

[Sanh2022] Sanh, V., Webson, A., Raffel, C., Bach, S. H., Sutawika, L., Alyafeai, Z., Chaffin, A., Stiegler, A., Raja, A., Dey, M., Bari, M. S., Xu, C., Thakker, U., Sharma, S. S., Szczechla, E., Kim, T., Chhablani, G., Nayak, N. V., Datta, D., … Rush, A. M. (2022). Multitask prompted training enables zero-shot task generalization. International Conference on Learning Representations (ICLR). https://doi.org/10.48550/arXiv.2110.08207

[Chung2022] Chung, H. W., Hou, L., Longpre, S., Zoph, B., Tay, Y., Fedus, W., Li, E., Wang, X., Dehghani, M., Brahma, S., Webson, A., Gu, S. S., Dai, Z., Suzgun, M., Chen, X., Chowdhery, A., Narang, S., Mishra, G., Yu, A., … Wei, J. (2022). Scaling instruction-finetuned language models. arXiv preprint. https://doi.org/10.48550/arXiv.2210.11416


Adversarial Robustness and Safety

[Perez2022] Perez, E., Huang, S., Song, F., Cai, T., Ring, R., Aslanides, J., Glaese, A., McAleese, N., & Irving, G. (2022). Red teaming language models with language models. Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP), 3419–3448. https://doi.org/10.18653/v1/2022.emnlp-main.225

[OWASP2025] OWASP. (2025). OWASP Top 10 for Large Language Model Applications. https://owasp.org/www-project-top-10-for-large-language-model-applications/

[Greshake2023] Greshake, K., Abdelnabi, S., Mishra, S., Endres, C., Holz, T., & Fritz, M. (2023). Not what you've signed up for: Compromising real-world LLM-integrated applications with indirect prompt injection. Proceedings of the 16th ACM Workshop on Artificial Intelligence and Security, 79–90. https://doi.org/10.1145/3605764.3623985


Automatic Prompt Optimization

[Khattab2023] Khattab, O., Singhvi, A., Maheshwari, P., Zhang, Z., Santhanam, K., Vardhamanan, S., Haq, S., Sharma, A., Joshi, T. T., Moazam, H., Miller, H., Zaharia, M., & Potts, C. (2023). DSPy: Compiling declarative language model calls into self-improving pipelines. arXiv preprint. https://doi.org/10.48550/arXiv.2310.03714

[Yang2023] Yang, C., Wang, X., Lu, Y., Liu, H., Le, Q. V., Zhou, D., & Chen, X. (2023). Large language models as optimizers. arXiv preprint. https://doi.org/10.48550/arXiv.2309.03409

[Zhou2023] Zhou, Y., Muresanu, A. I., Han, Z., Paster, K., Pitis, S., Chan, H., & Ba, J. (2023). Large language models are human-level prompt engineers. International Conference on Learning Representations (ICLR). https://doi.org/10.48550/arXiv.2211.01910

[Fernando2023] Fernando, C., Banarse, D., Michalewski, H., Osindero, S., & Rocktäschel, T. (2023). Promptbreeder: Self-referential self-improvement via prompt evolution. arXiv preprint. https://doi.org/10.48550/arXiv.2309.16797


Context Window and Attention

[Liu2024] Liu, N. F., Lin, K., Hewitt, J., Paranjape, A., Bevilacqua, M., Petroni, F., & Liang, P. (2024). Lost in the middle: How language models use long contexts. Transactions of the Association for Computational Linguistics, 12, 157–173. https://doi.org/10.1162/tacl_a_00638


Reasoning Models and Test-Time Compute

[Snell2024] Snell, C., Lee, J., Xu, K., & Kumar, A. (2024). Scaling LLM test-time compute optimally can be more effective than scaling model parameters. arXiv preprint. https://doi.org/10.48550/arXiv.2408.03314

[Lightman2023] Lightman, H., Kosaraju, V., Burda, Y., Edwards, H., Baker, B., Lee, T., Leike, J., Schulman, J., Sutskever, I., & Cobbe, K. (2023). Let's verify step by step. International Conference on Learning Representations (ICLR). https://doi.org/10.48550/arXiv.2305.20050

[Weston2023] Weston, J., & Sukhbaatar, S. (2023). System 2 attention (is something you might need too). arXiv preprint. https://doi.org/10.48550/arXiv.2311.11829


Benchmarks Referenced

GSM8K — Cobbe, K., Kosaraju, V., Bavarian, M., Chen, M., Jun, H., Kaiser, L., Plappert, M., Tworek, J., Hilton, J., Nakano, R., Hesse, C., & Schulman, J. (2021). Training verifiers to solve math word problems. arXiv preprint. https://doi.org/10.48550/arXiv.2110.14168

StrategyQA — Geva, M., Khashabi, D., Segal, E., Khot, T., Roth, D., & Berant, J. (2021). Did Aristotle use a laptop? A question answering benchmark with implicit reasoning strategies. Transactions of the Association for Computational Linguistics, 9, 346–361. https://doi.org/10.1162/tacl_a_00370

MT-Bench — See [Zheng2023] above.


How to Cite This Repository

If you use these templates or curriculum materials in academic work, please cite:

@software{suri2026promptengineering,
  author       = {Suri, Kunal},
  title        = {Prompt Engineering Playbook: Curriculum and Reusable Prompt Templates for LLM-powered Development},
  year         = {2026},
  version      = {v0.1.0-beta},
  publisher    = {Zenodo},
  doi          = {10.5281/zenodo.18827631},
  url          = {https://doi.org/10.5281/zenodo.18827631},
}

APA:

Suri, K. (2026). Prompt Engineering Playbook: Curriculum and Reusable Prompt Templates for LLM-powered Development (v0.1.0-beta). Zenodo. https://doi.org/10.5281/zenodo.18827631

Note on performance figures. Where comparison documents cite approximate performance numbers (e.g., "~85% accuracy on GSM8K"), these are illustrative figures intended for pedagogical purposes. Exact numbers vary by model, prompt variant, and evaluation protocol. Consult the primary sources listed above for precise empirical benchmarks.