Mirror of https://github.com/trholding/llama2.c.git, synced 2026-02-06 11:26:53 +00:00

Commit message: Bring back legendary tag line :D

hahaha
parent 766a30bc6e → commit d0237abd32
@@ -4,6 +4,8 @@
   <img src="assets/llama_cute.jpg" width="300" height="300" alt="Cute Llama">
 </p>
 
+Have you ever wanted to inference a baby [Llama 2](https://ai.meta.com/llama/) model in pure C? No? Well, now you can!
+
 Train the Llama 2 LLM architecture in PyTorch then inference it with one simple 700-line C file ([run.c](run.c)). You might think that you need many billion parameter LLMs to do anything useful, but in fact very small LLMs can have surprisingly strong performance if you make the domain narrow enough (ref: [TinyStories](https://huggingface.co/datasets/roneneldan/TinyStories) paper). This repo is a "fullstack" train + inference solution for Llama 2 LLM, with focus on minimalism and simplicity.
 
 As the architecture is identical, you can also load and inference Meta's Llama 2 models. However, the current code only inferences models in fp32, so you will most likely not be able to productively load models larger than 7B. Work on model quantization is currently ongoing.