---
license: apache-2.0
language:
  - en
pipeline_tag: text-generation
inference: false
tags:
  - gguf
  - 4bit
---

This repo provides the aksara_v1 model in GGUF format. The model is quantized to 4-bit precision and supports inference on GPU as well as CPU-only systems.

To run the model with Python:

1. Install llama-cpp-python:

```shell
# Build with CUDA support for GPU offloading; on newer llama-cpp-python
# releases the CMake flag is -DGGML_CUDA=on. For CPU-only machines, a plain
# `pip install llama-cpp-python` is sufficient.
CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python
```
2. Download the model:

```python
from huggingface_hub import hf_hub_download

model_name = "cropinailab/aksara_v1_GGUF"
model_file = "aksara_v1.Q4_K_M.gguf"
model_path = hf_hub_download(model_name,
                             filename=model_file,
                             token='<YOUR_HF_TOKEN>',  # needed only if the repo is gated for your account
                             local_dir='<PATH_TO_SAVE_MODEL>')
```
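
Before downloading, you may want to check which quantized variants the repo actually ships. Here is a minimal sketch using `list_repo_files` from `huggingface_hub` (this assumes the repo is public or that your token grants access):

```python
from huggingface_hub import list_repo_files

# List every file in the repo and keep only the GGUF variants,
# so you can pick a quantization before downloading.
gguf_files = [f for f in list_repo_files("cropinailab/aksara_v1_GGUF")
              if f.endswith(".gguf")]
print(gguf_files)
```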
3. Run the model:

```python
from llama_cpp import Llama

llm = Llama(
    model_path=model_path,  # path to the GGUF file downloaded above
    n_ctx=4096,             # max sequence length; longer contexts require much more memory
    n_gpu_layers=-1,        # number of layers to offload to GPU: -1 offloads all layers,
                            # set to 0 if no GPU acceleration is available on your system
)

prompt = "What is the recommended NPK dosage for maize varieties?"

# Simple inference example
output = llm(
    f"<|user|>\n{prompt}<|end|>\n<|assistant|>",
    max_tokens=512,     # generate up to 512 tokens
    stop=["<|end|>"],   # stop at the end-of-turn token
    echo=True,          # echo the prompt in the output
)
print(output['choices'][0]['text'])
```
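
Instead of writing the prompt template by hand, llama-cpp-python can also apply the chat template for you through its chat API. A minimal sketch reusing the `llm` and `prompt` from above (which template gets applied depends on the metadata shipped in the GGUF file):

```python
# Same model, but using the chat API so the prompt formatting is handled
# by llama-cpp-python instead of being written by hand.
response = llm.create_chat_completion(
    messages=[{"role": "user", "content": prompt}],
    max_tokens=512,
)
print(response['choices'][0]['message']['content'])
```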

To use the model in a more detailed pipeline, refer to the following notebook.