Mirror of https://github.com/trholding/llama2.c.git, synced 2026-02-06 11:26:53 +00:00

Commit message: Bring back legendary tag line :D

hahaha
parent 766a30bc6e → commit d0237abd32
@@ -4,6 +4,8 @@
   <img src="assets/llama_cute.jpg" width="300" height="300" alt="Cute Llama">
 </p>
 
+Have you ever wanted to inference a baby [Llama 2](https://ai.meta.com/llama/) model in pure C? No? Well, now you can!
+
 Train the Llama 2 LLM architecture in PyTorch then inference it with one simple 700-line C file ([run.c](run.c)). You might think that you need many billion parameter LLMs to do anything useful, but in fact very small LLMs can have surprisingly strong performance if you make the domain narrow enough (ref: [TinyStories](https://huggingface.co/datasets/roneneldan/TinyStories) paper). This repo is a "fullstack" train + inference solution for Llama 2 LLM, with focus on minimalism and simplicity.
 
 As the architecture is identical, you can also load and inference Meta's Llama 2 models. However, the current code only inferences models in fp32, so you will most likely not be able to productively load models larger than 7B. Work on model quantization is currently ongoing.