Qwen2.5 Collection Qwen2.5 language models, with pretrained and instruction-tuned variants in 7 sizes: 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B. • 45 items • Updated Sep 18 • 268
DSBench: How Far Are Data Science Agents to Becoming Data Science Experts? Paper • 2409.07703 • Published Sep 12 • 66
Switch-Transformers release Collection This release includes various MoE (Mixture of Experts) models based on the T5 architecture. The base models use 8 to 256 experts. • 9 items • Updated Jul 31 • 15
Article Improving Hugging Face Training Efficiency Through Packing with Flash Attention Aug 21 • 21
Article ∞🧙🏼♂️AnyClassifier - Generating Synthetic Data For Text Classification By kenhktsui • Aug 19 • 8
Article The case for specialized pre-training: ultra-fast foundation models for dedicated tasks By Pclanglais • Aug 4 • 26
📈 Scaling Laws with Vocabulary Collection Increase your vocabulary size when you scale up your language model • 5 items • Updated Aug 11 • 3
Gemma 2: Improving Open Language Models at a Practical Size Paper • 2408.00118 • Published Jul 31 • 73
Spectra: A Comprehensive Study of Ternary, Quantized, and FP16 Language Models Paper • 2407.12327 • Published Jul 17 • 76
Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies Paper • 2407.13623 • Published Jul 18 • 52
🪐 SmolLM Collection A series of smol LLMs: 135M, 360M, and 1.7B. We release base and Instruct models, as well as the training corpus and some WebGPU demos. • 12 items • Updated Aug 18 • 176
SpreadsheetLLM: Encoding Spreadsheets for Large Language Models Paper • 2407.09025 • Published Jul 12 • 128
Nemotron 4 340B Collection Nemotron-4: open models for Synthetic Data Generation (SDG). Includes Base, Instruct, and Reward models. • 4 items • Updated 22 days ago • 156
Is Bigger Edit Batch Size Always Better? -- An Empirical Study on Model Editing with Llama-3 Paper • 2405.00664 • Published May 1 • 18
Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models Paper • 2405.01535 • Published May 2 • 115
Beyond Scaling Laws: Understanding Transformer Performance with Associative Memory Paper • 2405.08707 • Published May 14 • 27
Flan-T5 release Collection The Flan-T5 release covers 4 checkpoints of different sizes. It also includes upgraded versions trained using universal sampling. • 7 items • Updated Jul 31 • 19
Bee Models 🍯 Collection Models fine-tuned to be knowledgeable about apiary practice • 6 items • Updated Apr 29 • 1
Arctic-embed Collection A collection of text embedding models optimized for retrieval accuracy and efficiency • 6 items • Updated Jul 18 • 14
Meta Llama 3 Collection This collection hosts the transformers and original repos of the Meta Llama 3 and Llama Guard 2 releases • 5 items • Updated 27 days ago • 680
ScreenAI: A Vision-Language Model for UI and Infographics Understanding Paper • 2402.04615 • Published Feb 7 • 36
Qwen1.5 Collection Qwen1.5 is the improved version of Qwen, the large language model series developed by Alibaba Cloud. • 55 items • Updated Sep 18 • 206
Zeroshot Classifiers Collection These are my current best zeroshot classifiers. Some of my older models are downloaded more often, but the models in this collection are newer/better. • 11 items • Updated Apr 3 • 109
LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning Paper • 2401.01325 • Published Jan 2 • 26
SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling Paper • 2312.15166 • Published Dec 23, 2023 • 56
Nougat ONNX Collection Faster Nougat in ONNX format (Optimum ONNX Runtime) • 6 items • Updated Feb 24 • 1
Technical Report: Large Language Models can Strategically Deceive their Users when Put Under Pressure Paper • 2311.07590 • Published Nov 9, 2023 • 16
smol llama Collection 🚧"raw" pretrained smol_llama checkpoints - WIP 🚧 • 4 items • Updated Apr 29 • 6