21 Commits

Author SHA1 Message Date
Vulcan
2a52e9d292 Llama 3.1 Support
README.md - Added examples and docs for Llama 3.1 usage
run / runq - Llama 3.1 works because Llama 3 is already supported
2024-07-24 01:56:56 +05:30
Vulcan
3d9ae22541 Update run and runq
run - mirror changes to runq
2024-07-20 21:35:09 +05:30
Vulcan
e842bf7118 Update runq.c
runq - more OpenMP / OpenACC parallel loops
2024-07-20 20:53:25 +05:30
Vulcan
1c47da5ebf Update runq.c
runq - speed up rmsnorm with OpenMP / OpenACC
2024-07-20 19:47:46 +05:30
Vulcan
16e223fbca Update runq.c
runq - Undo #pragma omp parallel sections for matmuls for now, as there is no real benefit with a low number of cores
2024-07-20 19:20:30 +05:30
Vulcan
725faaa608 Update runq.c
2024-07-20 19:14:56 +05:30
Vulcan
fae1157b0b runq - Add OpenMP parallel regions
runq - Experiment to verify whether OpenMP parallel sections speed up matmuls

Ref: https://github.com/karpathy/llama2.c/pull/75
2024-07-20 19:08:18 +05:30
Vulcan
036d7cb9f2 runq - remove blas & optimize
runq - optimize matmul and quantization functions with OpenMP
2024-07-20 17:44:29 +05:30
Vulcan
8458b68338 runq and runc tiny fixes
runq - add blas for matmul
2024-07-19 14:57:19 +05:30
Vulcan
e893f18a36 Support Llama3 8bit quantized inference
runq - add llama3 support
2024-07-12 11:52:03 +05:30
Vulcan
4d6452ed5b Makefile: LLVM BOLT Support
- Makefile: Add LLVM BOLT build

Usage:

make BOLTPREP=1 <target> ;  make run_bolt

- run.c / runq.c : Enable exit command in prompt in embedded model builds

- README.md: Update usage
2024-04-05 21:37:48 +05:30
Vulcan
d62525d980 runq.c - Disabled cblas matmul
May need invasive rewrite for 8bit quant. Won't fix.
2024-03-20 17:32:16 +05:30
Vulcan
dd82c76dce L2Efy runq.c
TODO:
- BLAS builds are broken
- Add to Makefile
2024-03-20 16:43:04 +05:30
Andrej
e0eb8b29ab Merge pull request #444 from maxbbraun/patch-1
Fix typo in runq.c comment
2024-02-12 17:21:08 -08:00
digger yu
2fbf7059aa fix some typos
2023-11-28 18:09:22 +08:00
Max Braun
c760ae6171 Fix typo in runq.c comment
2023-11-11 19:00:00 -08:00
Andrej Karpathy
b233b77058 add some docs for runq
2023-10-09 16:35:51 +00:00
atamyrat
6e52df9b41 properly handle token embeddings & shared classifier wcls
2023-08-27 08:18:03 +03:00
atamyrat
06175b946b free() quantizedtensors
2023-08-27 06:47:03 +03:00
atamyrat
f850a97c6a draft refactor to use QuantizedTensor in function arguments
2023-08-27 06:05:20 +03:00
Andrej Karpathy
df80471914 draft of int8 attempt number two
2023-08-26 22:28:08 +00:00