References

This file provides a centralized bibliography for all works cited across the learn/ curriculum and the comparisons/ documents. Entries follow APA 7th edition format and include DOIs or stable URLs where available.

Documents in this repository that previously used unresolvable source indices (e.g., [web:2], [web:11]) should be updated to reference entries in this bibliography by their citation key (e.g., [Wei2022], [Brown2020]).
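This migration is mechanical enough to script. The sketch below shows one way to do it, assuming a hand-built mapping from legacy indices to citation keys (the `web:2` → `Wei2022` and `web:11` → `Brown2020` pairs are illustrative; the real mapping must be reconstructed by checking each legacy citation against the entries in this file). Any legacy index with no known mapping is left untouched for manual review.

```python
import re

# Illustrative mapping from legacy source indices to citation keys.
# Reconstruct the real mapping by checking each legacy citation
# against the bibliography entries in this file.
LEGACY_TO_KEY = {
    "web:2": "Wei2022",
    "web:11": "Brown2020",
}

def migrate_citations(text: str) -> str:
    """Replace legacy [web:N] markers with bibliography citation keys."""
    def repl(match: re.Match) -> str:
        key = LEGACY_TO_KEY.get(match.group(1))
        # Leave unmapped indices as-is so they can be resolved by hand.
        return f"[{key}]" if key else match.group(0)
    return re.sub(r"\[(web:\d+)\]", repl, text)

# Example: migrate_citations("See [web:2] and [web:99].")
# maps [web:2] to [Wei2022] and leaves [web:99] untouched.
```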


Foundational Works

[Brown2020] Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D. M., Wu, J., Winter, C., … Amodei, D. (2020). Language models are few-shot learners. Advances in Neural Information Processing Systems, 33, 1877–1901. https://doi.org/10.48550/arXiv.2005.14165

[Vaswani2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30, 5998–6008. https://doi.org/10.48550/arXiv.1706.03762

[Ouyang2022] Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C. L., Mishkin, P., Zhang, C., Agarwal, S., Slama, K., Ray, A., Schulman, J., Hilton, J., Kelton, F., Miller, L., Simens, M., Askell, A., Welinder, P., Christiano, P., Leike, J., & Lowe, R. (2022). Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems, 35, 27730–27744. https://doi.org/10.48550/arXiv.2203.02155


Chain-of-Thought and Reasoning

[Wei2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., & Zhou, D. (2022). Chain-of-thought prompting elicits reasoning in large language models. Advances in Neural Information Processing Systems, 35, 24824–24837. https://doi.org/10.48550/arXiv.2201.11903

[Kojima2022] Kojima, T., Gu, S. S., Reid, M., Matsuo, Y., & Iwasawa, Y. (2022). Large language models are zero-shot reasoners. Advances in Neural Information Processing Systems, 35. https://doi.org/10.48550/arXiv.2205.11916

[Wang2023] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., & Zhou, D. (2023). Self-consistency improves chain of thought reasoning in language models. International Conference on Learning Representations (ICLR). https://doi.org/10.48550/arXiv.2203.11171


Agents and Tool Use

[Yao2023] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., & Cao, Y. (2023). ReAct: Synergizing reasoning and acting in language models. International Conference on Learning Representations (ICLR). https://doi.org/10.48550/arXiv.2210.03629

[Schick2023] Schick, T., Dwivedi-Yu, J., Dessì, R., Raileanu, R., Lomeli, M., Zettlemoyer, L., Cancedda, N., & Scialom, T. (2023). Toolformer: Language models can teach themselves to use tools. Advances in Neural Information Processing Systems, 36. https://doi.org/10.48550/arXiv.2302.04761

[Shinn2023] Shinn, N., Cassano, F., Gopinath, A., Narasimhan, K., & Yao, S. (2023). Reflexion: Language agents with verbal reinforcement learning. Advances in Neural Information Processing Systems, 36. https://doi.org/10.48550/arXiv.2303.11366

[Park2023] Park, J. S., O'Brien, J. C., Cai, C. J., Morris, M. R., Liang, P., & Bernstein, M. S. (2023). Generative agents: Interactive simulacra of human behavior. Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology (UIST). https://doi.org/10.1145/3586183.3606763

[Sumers2024] Sumers, T. R., Yao, S., Narasimhan, K., & Griffiths, T. L. (2024). Cognitive architectures for language agents. Transactions on Machine Learning Research (TMLR). https://doi.org/10.48550/arXiv.2309.02427


Retrieval-Augmented Generation

[Lewis2020] Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., Küttler, H., Lewis, M., Yih, W., Rocktäschel, T., Riedel, S., & Kiela, D. (2020). Retrieval-augmented generation for knowledge-intensive NLP tasks. Advances in Neural Information Processing Systems, 33, 9459–9474. https://doi.org/10.48550/arXiv.2005.11401

[Gao2024] Gao, Y., Xiong, Y., Gao, X., Jia, K., Pan, J., Bi, Y., Dai, Y., Sun, J., Wang, M., & Wang, H. (2024). Retrieval-augmented generation for large language models: A survey. arXiv preprint. https://doi.org/10.48550/arXiv.2312.10997


Prompt Engineering Methodology

[White2023] White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., & Schmidt, D. C. (2023). A prompt pattern catalog to enhance prompt engineering with ChatGPT. arXiv preprint. https://doi.org/10.48550/arXiv.2302.11382

[Bach2022] Bach, S. H., Sanh, V., Yong, Z. X., Webson, A., Raffel, C., Nayak, N. V., Sharma, A., Kim, T., Bari, M. S., Fèvry, T., Alyafeai, Z., Dey, M., Santilli, A., Sun, Z., Ben-David, S., Xu, C., Chhablani, G., Wang, H., Fries, J. A., … Rush, A. M. (2022). PromptSource: An integrated development environment and repository for natural language prompts. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: System Demonstrations, 93–104. https://doi.org/10.18653/v1/2022.acl-demo.9

[Zheng2023] Zheng, L., Chiang, W.-L., Sheng, Y., Zhuang, S., Wu, Z., Zhuang, Y., Lin, Z., Li, Z., Li, D., Xing, E. P., Zhang, H., Gonzalez, J. E., & Stoica, I. (2023). Judging LLM-as-a-judge with MT-Bench and Chatbot Arena. Advances in Neural Information Processing Systems, 36. https://doi.org/10.48550/arXiv.2306.05685


Instruction Tuning

[Sanh2022] Sanh, V., Webson, A., Raffel, C., Bach, S. H., Sutawika, L., Alyafeai, Z., Chaffin, A., Stiegler, A., Raja, A., Dey, M., Bari, M. S., Xu, C., Thakker, U., Sharma, S. S., Szczechla, E., Kim, T., Chhablani, G., Nayak, N. V., Datta, D., … Rush, A. M. (2022). Multitask prompted training enables zero-shot task generalization. International Conference on Learning Representations (ICLR). https://doi.org/10.48550/arXiv.2110.08207

[Chung2022] Chung, H. W., Hou, L., Longpre, S., Zoph, B., Tay, Y., Fedus, W., Li, E., Wang, X., Dehghani, M., Brahma, S., Webson, A., Gu, S. S., Dai, Z., Suzgun, M., Chen, X., Chowdhery, A., Narang, S., Mishra, G., Yu, A., … Wei, J. (2022). Scaling instruction-finetuned language models. arXiv preprint. https://doi.org/10.48550/arXiv.2210.11416


Adversarial Robustness and Safety

[Perez2022] Perez, E., Huang, S., Song, F., Cai, T., Ring, R., Aslanides, J., Glaese, A., McAleese, N., & Irving, G. (2022). Red teaming language models with language models. Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP), 3419–3448. https://doi.org/10.18653/v1/2022.emnlp-main.225

[OWASP2025] OWASP. (2025). OWASP Top 10 for Large Language Model Applications. https://owasp.org/www-project-top-10-for-large-language-model-applications/

[Greshake2023] Greshake, K., Abdelnabi, S., Mishra, S., Endres, C., Holz, T., & Fritz, M. (2023). Not what you've signed up for: Compromising real-world LLM-integrated applications with indirect prompt injection. Proceedings of the 16th ACM Workshop on Artificial Intelligence and Security, 79–90. https://doi.org/10.1145/3605764.3623985


Automatic Prompt Optimization

[Khattab2023] Khattab, O., Singhvi, A., Maheshwari, P., Zhang, Z., Santhanam, K., Vardhamanan, S., Haq, S., Sharma, A., Joshi, T. T., Moazam, H., Miller, H., Zaharia, M., & Potts, C. (2023). DSPy: Compiling declarative language model calls into self-improving pipelines. arXiv preprint. https://doi.org/10.48550/arXiv.2310.03714

[Yang2023] Yang, C., Wang, X., Lu, Y., Liu, H., Le, Q. V., Zhou, D., & Chen, X. (2023). Large language models as optimizers. arXiv preprint. https://doi.org/10.48550/arXiv.2309.03409

[Zhou2023] Zhou, Y., Muresanu, A. I., Han, Z., Paster, K., Pitis, S., Chan, H., & Ba, J. (2023). Large language models are human-level prompt engineers. International Conference on Learning Representations (ICLR). https://doi.org/10.48550/arXiv.2211.01910

[Fernando2023] Fernando, C., Banarse, D., Michalewski, H., Osindero, S., & Rocktäschel, T. (2023). Promptbreeder: Self-referential self-improvement via prompt evolution. arXiv preprint. https://doi.org/10.48550/arXiv.2309.16797


Context Window and Attention

[Liu2024] Liu, N. F., Lin, K., Hewitt, J., Paranjape, A., Bevilacqua, M., Petroni, F., & Liang, P. (2024). Lost in the middle: How language models use long contexts. Transactions of the Association for Computational Linguistics, 12, 157–173. https://doi.org/10.1162/tacl_a_00638


Reasoning Models and Test-Time Compute

[Snell2024] Snell, C., Lee, J., Xu, K., & Kumar, A. (2024). Scaling LLM test-time compute optimally can be more effective than scaling model parameters. arXiv preprint. https://doi.org/10.48550/arXiv.2408.03314

[Lightman2023] Lightman, H., Kosaraju, V., Burda, Y., Edwards, H., Baker, B., Lee, T., Leike, J., Schulman, J., Sutskever, I., & Cobbe, K. (2023). Let's verify step by step. International Conference on Learning Representations (ICLR). https://doi.org/10.48550/arXiv.2305.20050

[Weston2023] Weston, J., & Sukhbaatar, S. (2023). System 2 attention (is something you might need too). arXiv preprint. https://doi.org/10.48550/arXiv.2311.11829


Benchmarks Referenced

GSM8K — Cobbe, K., Kosaraju, V., Bavarian, M., Chen, M., Jun, H., Kaiser, L., Plappert, M., Tworek, J., Hilton, J., Nakano, R., Hesse, C., & Schulman, J. (2021). Training verifiers to solve math word problems. arXiv preprint. https://doi.org/10.48550/arXiv.2110.14168

StrategyQA — Geva, M., Khashabi, D., Segal, E., Khot, T., Roth, D., & Berant, J. (2021). Did Aristotle use a laptop? A question answering benchmark with implicit reasoning strategies. Transactions of the Association for Computational Linguistics, 9, 346–361. https://doi.org/10.1162/tacl_a_00370

MT-Bench — See [Zheng2023] above.


How to Cite This Repository

If you use these templates or curriculum materials in academic work, please cite:

@software{suri2026promptengineering,
  author       = {Suri, Kunal},
  title        = {Prompt Engineering Playbook: Curriculum and Reusable Prompt Templates for LLM-powered Development},
  year         = {2026},
  version      = {v0.1.0-beta},
  publisher    = {Zenodo},
  doi          = {10.5281/zenodo.18827631},
  url          = {https://doi.org/10.5281/zenodo.18827631},
}

APA:

Suri, K. (2026). Prompt Engineering Playbook: Curriculum and Reusable Prompt Templates for LLM-powered Development (v0.1.0-beta). Zenodo. https://doi.org/10.5281/zenodo.18827631

Note on performance figures. Where comparison documents cite approximate performance numbers (e.g., "~85% accuracy on GSM8K"), these are illustrative figures intended for pedagogical purposes. Exact numbers vary by model, prompt variant, and evaluation protocol. Consult the primary sources listed above for precise empirical benchmarks.