aisingapore/sea-lion-3b

Text Generation

Transformers

Safetensors

mpt

custom_code


dotw committed on Oct 24, 2023

Commit 29dd675 • 1 Parent(s): b72b898

Update README.md

Files changed (1)

README.md +36 -18

@@ -78,7 +78,7 @@ Users (both direct and downstream) should be made aware of the risks, biases and
 Use the code below to get started with the model.
-[Todo: Insert Code Here]
 ## Training Details
@@ -113,11 +113,11 @@ SEA LION 3B was trained on 980B tokens of RefinedWeb (English) and mC4 (Chinese,
 <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
-#### Preprocessing [optional]
-[More Information Needed]
-SEA LION 3B was trained on 256 A100 40GB GPUs, using MosaicML Composer.
 #### Training Hyperparameters
@@ -146,23 +146,23 @@ The training took 14 days to complete.
 <!-- This should link to a Dataset Card if possible. -->
-[More Information Needed]
 #### Factors
 <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
-[More Information Needed]
 #### Metrics
 <!-- These are the evaluation metrics being used, ideally with a description of why. -->
-[More Information Needed]
 ### Results
-[More Information Needed]
 #### Summary
@@ -202,7 +202,6 @@ SEA LION 3B is a decoder model using the MPT architecture.
 ### Compute Infrastructure
 #### Hardware
 SEA LION 3B was trained on AWS EC2 cluster comprising 32 p4d.24xlarge instances, using a total of 256 A100 40GB GPUs.
@@ -217,28 +216,47 @@ SEA LION 3B was trained using MosaicML Composer using PyTorch FullyShardedDataPa
 **BibTeX:**
-[More Information Needed]
 **APA:**
-[More Information Needed]
 ## Glossary [optional]
 <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
-[More Information Needed]
 ## More Information [optional]
-[More Information Needed]
-## Model Card Authors [optional]
-[More Information Needed]
 ## Model Card Contact
-[More Information Needed]

 Use the code below to get started with the model.
+[ Todo: Insert Code Here ]
 ## Training Details
 <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+SEA LION 3B was trained on 256 A100 40GB GPUs, using MosaicML Composer.
+#### Preprocessing [optional]
+N/A
 #### Training Hyperparameters
 <!-- This should link to a Dataset Card if possible. -->
+_Coming soon_
 #### Factors
 <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+_Coming soon_
 #### Metrics
 <!-- These are the evaluation metrics being used, ideally with a description of why. -->
+_Coming soon_
 ### Results
+_Coming soon_
 #### Summary
 ### Compute Infrastructure
 #### Hardware
 SEA LION 3B was trained on AWS EC2 cluster comprising 32 p4d.24xlarge instances, using a total of 256 A100 40GB GPUs.
 **BibTeX:**
+N/A
 **APA:**
+N/A
 ## Glossary [optional]
 <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+N/A
 ## More Information [optional]
+N/A
+## The Team
+Hamsawardhini Rengarajan
+Holy Lovenia
+Lam Clarence
+Leong Weiqi
+Li Yier
+Ng Raymond
+Ngui Jian Gang
+Railey Montalan
+Tai Ngee Chia
+Tan Choon Meng
+Thanh Ngan Nguyen
+Teo Jin Howe
+Teo Wei Yi
+Yeo Yeow Tong
+Yong Xianbin
+Yosephine
+William Tjhi
+Ong Tat-Wee David
+Darius Liu
+Leslie Teo
 ## Model Card Contact
+[ Todo: Get AISG Contact ]
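
Reviewer note (not part of the diff): the README still carries a Todo placeholder where the getting-started code should go. The following is a minimal sketch of what such a snippet might look like, not the authors' code. It assumes the repository id is `aisingapore/sea-lion-3b` (taken from the page header) and that `trust_remote_code=True` is needed because the repository is tagged `custom_code`. The helper names `build_load_kwargs` and `generate`, and the default of 20 new tokens, are hypothetical choices made for illustration.

```python
# Sketch only: a possible getting-started snippet for the README placeholder.
# MODEL_ID is assumed from the page header; helper names are hypothetical.
MODEL_ID = "aisingapore/sea-lion-3b"


def build_load_kwargs() -> dict:
    """Return keyword arguments passed to both from_pretrained calls.

    Assumption: the `custom_code` tag means the repo ships its own
    modeling/tokenizer code, so remote code has to be trusted explicitly.
    """
    return {"trust_remote_code": True}


def generate(prompt: str, max_new_tokens: int = 20) -> str:
    """Load the tokenizer and model, then return a decoded continuation."""
    # Imported lazily so this file can be imported without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    kwargs = build_load_kwargs()
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, **kwargs)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, **kwargs)

    # Pass only the input ids to generate().
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(inputs["input_ids"], max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate("Sea lions are")` would download the weights on first use. Review the repository's custom code before enabling `trust_remote_code`, since it executes code fetched from the Hub.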