---
license: apache-2.0
language:
- en
pipeline_tag: text-generation
inference: false
tags:
- gguf
- 4bit
---
This repo provides the GGUF format for the [aksara_v1](https://huggingface.co/cropinailab/aksara_v1) model. The model is quantized to 4-bit precision and can run inference on a GPU or on CPU only.
## Running with Python
1. **Install llama-cpp-python:**
```bash
# Build with cuBLAS support so layers can be offloaded to an NVIDIA GPU
CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python
```
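If no GPU acceleration is available, the `CMAKE_ARGS` build flag can simply be omitted for a CPU-only build (a minimal sketch, not from the original instructions):
```bash
pip install llama-cpp-python
```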
2. **Download the model:**
```python
from huggingface_hub import hf_hub_download

model_name = "cropinailab/aksara_v1_GGUF"
model_file = "aksara_v1.Q4_K_M.gguf"

model_path = hf_hub_download(
    model_name,
    filename=model_file,
    token='<YOUR_HF_TOKEN>',           # Hugging Face access token
    local_dir='<PATH_TO_SAVE_MODEL>',  # directory to save the file to
)
```
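Alternatively, the same file can be fetched from the command line with the `huggingface-cli` tool that ships with `huggingface_hub` (the placeholders below match those in the Python snippet):
```bash
huggingface-cli download cropinailab/aksara_v1_GGUF aksara_v1.Q4_K_M.gguf \
  --local-dir <PATH_TO_SAVE_MODEL> --token <YOUR_HF_TOKEN>
```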
3. **Run the model:**
```python
from llama_cpp import Llama

llm = Llama(
    model_path=model_path,  # path to the GGUF file downloaded above
    n_ctx=4096,             # max sequence length; longer contexts require much more memory
    n_gpu_layers=-1,        # number of layers to offload to GPU, if GPU acceleration is available:
                            # -1 offloads all layers, 0 disables GPU offloading
)

prompt = "What are the recommended NPK dosages for maize varieties?"

# Simple inference example
output = llm(
    f"<|user|>\n{prompt}<|end|>\n<|assistant|>",
    max_tokens=512,    # generate up to 512 tokens
    stop=["<|end|>"],  # stop generating at the end-of-turn token
    echo=True,         # whether to echo the prompt in the output
)
print(output['choices'][0]['text'])
```
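llama-cpp-python also exposes a higher-level chat API that applies a chat template for you, so the `<|user|>`/`<|assistant|>` tags do not have to be written by hand. A minimal sketch, assuming the GGUF metadata includes a chat template (otherwise a `chat_format` argument must be passed when constructing `Llama`):
```python
# Chat-style inference with the same `llm` instance created above
response = llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "What are the recommended NPK dosages for maize varieties?"}
    ],
    max_tokens=512,  # generate up to 512 tokens
)
print(response['choices'][0]['message']['content'])
```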
**For a more detailed inference pipeline, refer to this [notebook](https://colab.research.google.com/drive/13u4msrKGJX2V_5_k8PZAVJh84-7XonmA?usp=sharing).**