Inference of LLaMA model in pure C/C++

Posted by Harry Potter on 04-14-2023, 03:21 PM | 5050 Views | 0 Replies
The main goal is to run the model using 4-bit quantization on a MacBook.

Plain C/C++ implementation without dependencies
Apple silicon first-class citizen - optimized via ARM NEON and Accelerate framework
AVX2 support for x86 architectures
Mixed F16 / F32 precision
4-bit quantization support
Runs on the CPU
I have no idea if it works correctly. Please do not draw conclusions about the models based on the results from this implementation. For all I know, it can be completely wrong. This project is for educational purposes.

   


