duongttr committed on
Commit
aa2cd99
1 Parent(s): 32efea7

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -47,7 +47,7 @@ The dataset consists of:
  You can find out the combined version here: [duongttr/vi-dataset-for-pretrain](https://huggingface.co/datasets/duongttr/vi-dataset-for-pretrain)
 
  ## Hyperparamters & Results
- We trained the model ~100k steps, with `lr=1e-4`, `bs=1920`, `optimizer=adamw` on TPU-VM-3.8 from [TRC Program](https://sites.research.google/trc/about/). The training costs around **2.5 days**.
+ We trained the model ~100k steps, with `lr=1e-4`, `bs=2560` (`single_batch_size=32` * `num_core=8` * `grad_cum=10`), `optimizer=adamw` on TPU-VM-3.8 from [TRC Program](https://sites.research.google/trc/about/). The training costs around **1 day**.
  |Model|Eval Loss|Eval Perplexity|
  |---|---|---|
  |**gpt2-base**|**3.939**|**51.35**|
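
The corrected `bs=2560` in this change is the product of the three factors the new line lists. A minimal sketch of that arithmetic is below; the function and variable names are hypothetical and are not taken from the training script, and the total-sequences figure assumes that each of the ~100k steps is one optimizer update over the full effective batch, which the README does not state.

```python
def effective_batch_size(per_core_batch: int, num_cores: int, grad_accum_steps: int) -> int:
    """Number of sequences contributing to one optimizer update.

    Each core processes `per_core_batch` sequences per forward/backward pass,
    and gradients are accumulated over `grad_accum_steps` passes before the
    optimizer applies an update.
    """
    return per_core_batch * num_cores * grad_accum_steps


# Values quoted in the updated README line.
single_batch_size = 32   # per-core batch size
num_core = 8             # TPU cores
grad_cum = 10            # gradient accumulation steps

bs = effective_batch_size(single_batch_size, num_core, grad_cum)
print(f"effective batch size: {bs}")

# Assumption: ~100k steps means ~100k optimizer updates.
approx_steps = 100_000
print(f"approx. sequences seen: {bs * approx_steps:,}")
```

With the README's values this reproduces the documented `bs=2560`, so the three factors and the stated batch size are consistent with each other.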