minhtriphan committed · Commit eda0730 · 1 Parent(s): c2664de · Update README.md

README.md CHANGED
@@ -19,9 +19,9 @@ We're training the model again with more care and some tricks to enhance the sem
 Furthermore, the model is trained longer (~10 epochs~ 8 epochs). ~The new pre-trained model weights will be updated as soon as the training and validation are completed.~
 
 # Time and space efficiency
-We compare the time and space efficiency of this model and some competitors. For these competitors, we clone the positional embedding layers so that they can accept input sequences with maximum length of 65536 tokens.
+We compare the time and space efficiency of this model and some competitors. For these competitors, we clone the positional embedding layers so that they can accept input sequences with the maximum length of 65536 tokens.
 
-The experiments are implemented with an NVIDIA A100-SXM4-40GB. Batch size of 1. The figures show the time and memory needed to run one batch. In the training mode, forward pass and backpropagation
+The experiments are implemented with an NVIDIA A100-SXM4-40GB. Batch size of 1. The figures show the time and memory needed to run one batch. In the training mode, forward pass and backpropagation are included. In the inferring model, only forward pass is included.
 
 ## Training mode
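The README does not show how the competitors' positional embedding layers were "cloned" to reach a 65536-token maximum length. One common reading is tiling the learned position table until it covers the target length; the sketch below assumes that interpretation (the function name, NumPy representation, and the 512×64 table are illustrative, not taken from the repository).

```python
import numpy as np

def clone_positional_embeddings(pos_emb: np.ndarray, target_len: int) -> np.ndarray:
    """Tile an existing positional-embedding matrix of shape (orig_len, dim)
    until it covers target_len positions, then truncate to exactly target_len."""
    orig_len, dim = pos_emb.shape
    repeats = -(-target_len // orig_len)  # ceiling division
    return np.tile(pos_emb, (repeats, 1))[:target_len]

# Illustrative: extend a 512-position table to the 65536 positions
# used in the efficiency comparison above.
pos_emb = np.random.randn(512, 64)
extended = clone_positional_embeddings(pos_emb, 65536)
assert extended.shape == (65536, 64)
```

With this scheme every block of 512 positions reuses the original table, so the model accepts long inputs without retraining the embedding layer, at the cost of positions 512 apart being indistinguishable to it.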