“Further Reading” from the Book

Chapter 1

  • Custom-built LLMs can outperform general-purpose LLMs, as a team at Bloomberg showed with a version of GPT pretrained on finance data from scratch. The custom LLM outperformed ChatGPT on financial tasks while maintaining good performance on general LLM benchmarks: “BloombergGPT: A Large Language Model for Finance” (2023) by Wu et al., https://arxiv.org/abs/2303.17564.

  • Existing LLMs can likewise be adapted and fine-tuned to outperform general-purpose LLMs, as teams from Google Research and Google DeepMind showed in a medical context: “Towards Expert-Level Medical Question Answering with Large Language Models” (2023) by Singhal et al., https://arxiv.org/abs/2305.09617.

  • The following paper proposed the original transformer architecture: “Attention Is All You Need” (2017) by Vaswani et al., https://arxiv.org/abs/1706.03762.

  • On the original encoder-style transformer, called BERT, see “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding” (2018) by Devlin et al., https://arxiv.org/abs/1810.04805.

  • The paper describing the decoder-style GPT-3 model, which inspired modern LLMs and will be used as a template for implementing an LLM from scratch in this book, is “Language Models are Few-Shot Learners” (2020) by Brown et al., https://arxiv.org/abs/2005.14165.

  • The following paper covers the original vision transformer for classifying images, which illustrates that transformer architectures are not restricted to text inputs: “An Image Is Worth 16x16 Words: Transformers for Image Recognition at Scale” (2020) by Dosovitskiy et al., https://arxiv.org/abs/2010.11929.

  • The following experimental (but less popular) LLM architectures serve as examples showing that not all LLMs need to be based on the transformer architecture:

  • Meta AI’s Llama model is a popular implementation of a GPT-like model that is openly available, in contrast to GPT-3 and ChatGPT: “Llama 2: Open Foundation and Fine-Tuned Chat Models” (2023) by Touvron et al., https://arxiv.org/abs/2307.09288.

  • For additional details about the dataset referenced in section 1.5, the following paper describes the publicly available dataset The Pile, curated by EleutherAI: “The Pile: An 800GB Dataset of Diverse Text for Language Modeling” (2020) by Gao et al., https://arxiv.org/abs/2101.00027.

  • The following paper describes InstructGPT, the instruction-fine-tuning approach for GPT-3 that was mentioned in section 1.6 and will be discussed in more detail in chapter 7: “Training Language Models to Follow Instructions with Human Feedback” (2022) by Ouyang et al., https://arxiv.org/abs/2203.02155.


Additional Materials from the Web

