adarshxs committed
Commit
991440d
1 Parent(s): aa24407

Update README.md

Files changed (1): README.md (+103 -1)

README.md CHANGED

---
license: apache-2.0
datasets:
- mhenrichsen/alpaca_2k_test
---

We fine-tune the base `Llama-2-7b-hf` model on the `mhenrichsen/alpaca_2k_test` dataset using PEFT LoRA.
Find the adapters at: https://huggingface.co/Tensoic/Llama-2-7B-alpaca-2k-test
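
The snippet below is a minimal usage sketch, not an official example from this repo: it assumes the standard `transformers`/`peft` loading path and an Alpaca-style prompt format matching the training data.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Llama-2-7b-hf"
adapter_id = "Tensoic/Llama-2-7B-alpaca-2k-test"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.float16, device_map="auto"
)
# Attach the published LoRA adapters on top of the frozen base weights.
model = PeftModel.from_pretrained(base, adapter_id)

# Alpaca-style prompt (assumed; matches the format of the training dataset).
prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\nExplain what LoRA fine-tuning is.\n\n### Response:\n"
)
inputs = tokenizer(prompt, return_tensors="pt").to(base.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```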

Visit us at: https://tensoic.com

![image/png](https://cdn-uploads.huggingface.co/production/uploads/644bf6ef778ecbfb977e8e84/C0btqRI3eCz0kNYGQoa9k.png)

## Training Setup:
```
Number of GPUs: 8x NVIDIA V100
GPU memory: 32 GB each (SXM2 form factor)
```
## Training Configuration:

```yaml
base_model: meta-llama/Llama-2-7b-hf
base_model_config: meta-llama/Llama-2-7b-hf
model_type: LlamaForCausalLM
tokenizer_type: LlamaTokenizer
is_llama_derived_model: true

load_in_8bit: true
load_in_4bit: false
strict: false

datasets:
  - path: mhenrichsen/alpaca_2k_test
    type: alpaca
dataset_prepared_path: last_run_prepared
val_set_size: 0.01
output_dir: ./lora-out

sequence_len: 4096
sample_packing: false
pad_to_sequence_len: true

adapter: lora
lora_model_dir:
lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:

wandb_project:
wandb_entity:
wandb_watch:
wandb_run_id:
wandb_log_model:

gradient_accumulation_steps: 4
micro_batch_size: 2
num_epochs: 3
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
bf16: false
fp16: true
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention: true
flash_attention: false

warmup_steps: 10
eval_steps: 20
save_steps:
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
  bos_token: "<s>"
  eos_token: "</s>"
  unk_token: "<unk>"
```
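
The YAML above appears to follow the axolotl trainer's config format. As a rough illustration only (not the actual training script), the key hyperparameters translate to plain `transformers` + `peft` roughly as follows; the `target_modules` list is an assumption standing in for `lora_target_linear: true`.

```python
# Rough equivalent of the key settings above using plain transformers + peft.
# This is a sketch for illustration, not the script actually used.
from transformers import (
    AutoModelForCausalLM,
    BitsAndBytesConfig,
    TrainingArguments,
)
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# load_in_8bit: true
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# adapter: lora, lora_r: 32, lora_alpha: 16, lora_dropout: 0.05
lora_config = LoraConfig(
    r=32,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=[  # assumed: all linear projections of the Llama blocks
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
)
model = get_peft_model(model, lora_config)

# micro_batch_size: 2, gradient_accumulation_steps: 4, num_epochs: 3,
# learning_rate: 0.0002 with cosine schedule, fp16, gradient checkpointing
training_args = TrainingArguments(
    output_dir="./lora-out",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    num_train_epochs=3,
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    warmup_steps=10,
    optim="adamw_bnb_8bit",
    fp16=True,
    gradient_checkpointing=True,
    logging_steps=1,
    weight_decay=0.0,
)
```

Dataset preparation (Alpaca prompt templating, the 1% validation split, sequence-length handling) and the actual `Trainer` invocation are omitted from this sketch.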

The following `bitsandbytes` quantization config was used during training:

- quant_method: bitsandbytes
- load_in_8bit: True
- load_in_4bit: False
- llm_int8_threshold: 6.0
- llm_int8_skip_modules: None
- llm_int8_enable_fp32_cpu_offload: False
- llm_int8_has_fp16_weight: False
- bnb_4bit_quant_type: fp4
- bnb_4bit_use_double_quant: False
- bnb_4bit_compute_dtype: float32
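
For reference, the same quantization settings can be expressed as a `transformers.BitsAndBytesConfig` when reloading the base model. This is a sketch with values copied from the list above, not code taken from the training run; loading in plain fp16 as in the usage snippet near the top works as well.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Values mirror the bitsandbytes config listed above.
bnb_config = BitsAndBytesConfig(
    load_in_8bit=True,
    load_in_4bit=False,
    llm_int8_threshold=6.0,
    llm_int8_skip_modules=None,
    llm_int8_enable_fp32_cpu_offload=False,
    llm_int8_has_fp16_weight=False,
    bnb_4bit_quant_type="fp4",
    bnb_4bit_use_double_quant=False,
    bnb_4bit_compute_dtype=torch.float32,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    quantization_config=bnb_config,
    device_map="auto",
)
```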