---
license: apache-2.0
language:
- en
pipeline_tag: text-generation
inference: false
tags:
- gguf
- 4bit
---

This repo provides the GGUF format for the [aksara_v1](https://huggingface.co/cropinailab/aksara_v1) model. The model is quantized to 4-bit precision and can run inference on a GPU as well as on CPU only.

## To run using Python:

1. **Install llama-cpp-python:**

   ```
   ! CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python
   ```

   For CPU-only inference, a plain `pip install llama-cpp-python` without the CMake flag is sufficient.

2. **Download the model:**

   ```python
   from huggingface_hub import hf_hub_download

   model_name = "cropinailab/aksara_v1_GGUF"
   model_file = "aksara_v1.Q4_K_M.gguf"
   model_path = hf_hub_download(model_name,
                                filename=model_file,
                                token='',      # your Hugging Face access token, if required
                                local_dir='')  # directory to save the file in
   ```

3. **Run the model:**

   ```python
   from llama_cpp import Llama

   llm = Llama(
       model_path=model_path,  # path to the GGUF file
       n_ctx=4096,             # max sequence length to use; longer sequences require more resources
       n_gpu_layers=-1,        # number of layers to offload to GPU, if GPU acceleration is available;
                               # set to 0 if no GPU acceleration is available, -1 to offload all layers
   )

   prompt = "What is the recommended NPK dosage for maize varieties?"

   # Simple inference example
   output = llm(
       f"<|user|>\n{prompt}<|end|>\n<|assistant|>",
       max_tokens=512,    # generate up to 512 tokens
       stop=["<|end|>"],  # stop generating at the end-of-turn token
       echo=True,         # whether to echo the prompt in the output
   )
   print(output['choices'][0]['text'])
   ```

**For a more detailed pipeline around this model, refer to the following [notebook](https://colab.research.google.com/drive/13u4msrKGJX2V_5_k8PZAVJh84-7XonmA?usp=sharing).**
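
llama-cpp-python also exposes a chat-style API that formats the prompt for you. Below is a minimal sketch, assuming the GGUF file embeds the model's chat template (if it does not, llama-cpp-python falls back to a generic format, and the explicit `<|user|>`/`<|assistant|>` prompt from step 3 is the safer choice); `model_path` is the path downloaded in step 2:

```python
from llama_cpp import Llama

llm = Llama(model_path=model_path, n_ctx=4096, n_gpu_layers=-1)

# Chat-style call; the prompt template is applied internally.
output = llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "What is the recommended NPK dosage for maize varieties?"},
    ],
    max_tokens=512,
)
print(output["choices"][0]["message"]["content"])
```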
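
For interactive use, tokens can be printed as they are generated instead of waiting for the full completion. A minimal sketch, reusing the `llm` object and `prompt` from step 3 (`stream=True` turns the call into an iterator over partial outputs):

```python
# Streaming variant of the call in step 3: stream=True yields chunks as they arrive.
stream = llm(
    f"<|user|>\n{prompt}<|end|>\n<|assistant|>",
    max_tokens=512,
    stop=["<|end|>"],
    stream=True,
)
for chunk in stream:
    print(chunk['choices'][0]['text'], end='', flush=True)
print()
```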