SmolLM models in MLC, ONNX and GGUF format for local applications + in-browser demos
Hugging Face TB Research
AI & ML interests
Exploring synthetic datasets generated by Large Language Models (TB stands for Textbook, inspired by the "Textbooks Are All You Need" paper)
Organization Card
HuggingFaceTB
This is the home of synthetic datasets for pre-training, such as Cosmopedia v1 and v2. We're trying to scale synthetic data generation by curating diverse prompts that cover a wide range of topics and efficiently scaling the generations on GPUs with tools like llm-swarm.
We released:
- Cosmopedia: the largest open synthetic dataset, with 25B tokens and more than 30M samples. It contains synthetic textbooks, blog posts, stories, posts, and WikiHow articles generated by Mixtral-8x7B-Instruct-v0.1.
- Cosmo-1B: a 1B model trained on Cosmopedia.
- FineWeb-Edu: a version of the FineWeb dataset filtered for educational content.
- SmolLM models: a series of strong small models in three sizes: 135M, 360M and 1.7B
- SmolLM-Corpus: the pre-training corpus of the SmolLM models, which includes Cosmopedia v0.2, FineWeb-Edu and Python-Edu.
For more details check our blog posts: https://huggingface.co/blog/cosmopedia and https://huggingface.co/blog/smollm
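The prompt-curation idea mentioned above (diverse prompts covering a wide range of topics) can be illustrated with a minimal sketch. This is not the pipeline used to build Cosmopedia. The topic list, audience list, template, and the `build_prompts` helper are all hypothetical, and the formats are borrowed loosely from the content types listed for Cosmopedia. The sketch only shows how crossing a few seed lists multiplies the number of distinct prompts.

```python
from itertools import product

# Hypothetical seed lists, chosen only for illustration.
TOPICS = ["photosynthesis", "binary search", "compound interest"]
FORMATS = ["textbook section", "blog post", "story", "WikiHow article"]
AUDIENCES = ["young children", "high school students", "college students"]

# Hypothetical prompt template.
TEMPLATE = "Write a {fmt} about {topic} for {audience}."


def build_prompts(topics, formats, audiences):
    """Return one prompt per (topic, format, audience) combination."""
    return [
        TEMPLATE.format(fmt=fmt, topic=topic, audience=audience)
        for topic, fmt, audience in product(topics, formats, audiences)
    ]


prompts = build_prompts(TOPICS, FORMATS, AUDIENCES)
print(len(prompts))  # 3 topics x 4 formats x 3 audiences = 36
print(prompts[0])
```

Each resulting prompt would then be sent to a generation model. Scaling that step across GPUs is what a tool like llm-swarm is for, and it is outside the scope of this sketch.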
models (18)
- HuggingFaceTB/SmolLM-1.7B • Text Generation • 11.4k • 153
- HuggingFaceTB/SmolLM-135M-Instruct • Text Generation • 22k • 92
- HuggingFaceTB/cosmo2-tokenizer • 1
- HuggingFaceTB/SmolLM-1.7B-Instruct • Text Generation • 7.84k • 98
- HuggingFaceTB/SmolLM-360M-Instruct • Text Generation • 16.7k • 73
- HuggingFaceTB/smollm-135M-instruct-v0.2-Q8_0-GGUF • 911 • 5
- HuggingFaceTB/smollm-360M-instruct-add-basics-q0f16-MLC • 202
- HuggingFaceTB/smollm-1.7B-instruct-add-basics-q4f16_1-MLC • 5 • 1
- HuggingFaceTB/smollm-135M-instruct-add-basics-q0f16-MLC • 17
- HuggingFaceTB/smollm-360M-instruct-add-basics • Text Generation • 52 • 2
datasets (30)
- HuggingFaceTB/MATH • 112
- HuggingFaceTB/alpaca_eval_details • Viewer • 3.22k • 51
- HuggingFaceTB/smollm-corpus • Viewer • 237M • 68.6k • 224
- HuggingFaceTB/everyday-conversations-llama3.1-2k • Viewer • 2.38k • 400 • 74
- HuggingFaceTB/instruct-data-basics-smollm-H4 • Viewer • 767 • 120
- HuggingFaceTB/self-oss-instruct-sc2-H4 • Viewer • 50.7k • 133 • 1
- HuggingFaceTB/Magpie-Pro-300K-Filtered-H4 • Viewer • 300k • 165 • 2
- HuggingFaceTB/OpenHermes-2.5-H4 • Viewer • 1M • 132 • 1
- HuggingFaceTB/bisac_expanded_topics • Viewer • 34.2k • 83
- HuggingFaceTB/cosmopedia • Viewer • 31.1M • 19.1k • 559