Lysandre

lysandre

http://lysand.re

AI & ML interests

chief open-source officer @ hf

Articles

Fixing Gradient Accumulation

License to Call: Introducing Transformers Agents 2.0

We are hiring interns!

Hugging Face on PyTorch / XLA TPUs

lysandre's activity

New activity in lysandre/ctrl-clone-2 15 days ago

qwefewf

#3 opened 15 days ago by

asdasd

#2 opened 11 months ago by

New activity in lysandre/tiny-bert-random 15 days ago

asdasd

#1 opened 15 days ago by

New activity in THUDM/CogVideoX-5b about 2 months ago

Update transformers version

#1 opened about 2 months ago by

New activity in mistralai/Mistral-Large-Instruct-2407 3 months ago

consolidated vs model safetensors - what's the difference?

#9 opened 3 months ago by

Transformers implementation

#1 opened 3 months ago by

New activity in meta-llama/Llama-3.1-405B 3 months ago

Update tokenizer to prepend special token

#12 opened 3 months ago by

New activity in meta-llama/Llama-3.1-70B 3 months ago

Update tokenizer to prepend special token

#11 opened 3 months ago by

New activity in meta-llama/Llama-3.1-405B-FP8 3 months ago

Update tokenizer to prepend special token

#12 opened 3 months ago by

New activity in meta-llama/Llama-3.1-8B 3 months ago

Update tokenizer to prepend special token

#12 opened 3 months ago by

New activity in meta-llama/Llama-3.1-405B-Instruct 3 months ago

Upload tokenizer

#9 opened 3 months ago by

New activity in meta-llama/Llama-3.1-70B-Instruct 3 months ago

Upload tokenizer

#12 opened 3 months ago by

New activity in meta-llama/Llama-3.1-8B-Instruct 3 months ago

Upload tokenizer

#29 opened 3 months ago by

New activity in meta-llama/Llama-3.1-405B-Instruct-FP8 3 months ago

Upload tokenizer

#9 opened 3 months ago by

New activity in meta-llama/Llama-3.1-70B-Instruct 3 months ago

configuration-changes

#1 opened 3 months ago by

New activity in meta-llama/Llama-3.1-405B-Instruct 3 months ago

Update original/mp16/README.md

#1 opened 3 months ago by

Update original/mp8/README.md

#2 opened 3 months ago by

New activity in meta-llama/Llama-3.1-405B 3 months ago

Update original/mp16/README.md

#5 opened 3 months ago by

Update original/mp8/README.md

#4 opened 3 months ago by

New activity in meta-llama/Llama-3.1-8B 3 months ago

Have saner defaults in the generation config

#4 opened 3 months ago by

New activity in meta-llama/Llama-3.1-70B 3 months ago

Have saner defaults in the generation config

#3 opened 3 months ago by

New activity in meta-llama/Llama-3.1-405B 3 months ago

Have saner defaults in the generation config

#3 opened 3 months ago by

Have saner defaults in the generation config

#2 opened 3 months ago by

New activity in meta-llama/Llama-3.1-405B-FP8 3 months ago

Have saner defaults in the generation config

#5 opened 3 months ago by

New activity in yentinglin/Llama-3-Taiwan-8B-Instruct-128k 3 months ago

TGI model serving errors

#4 opened 4 months ago by

New activity in shenzhi-wang/Gemma-2-27B-Chinese-Chat 4 months ago

Default to eager attention

#1 opened 4 months ago by

New activity in google/gemma-2-27b-it 4 months ago

Default to 'eager' attention implementation

#22 opened 4 months ago by

New activity in google/gemma-2-27b 4 months ago

Default attention to eager implementation

#12 opened 4 months ago by

New activity in google/gemma-2-27b-it 4 months ago

Default to eager implementation

#21 opened 4 months ago by

New activity in google/gemma-2-27b 4 months ago

Default attention to eager implementation

#11 opened 4 months ago by

New activity in google/gemma-2-9b-it 4 months ago

it looks it do not work as expected , see below

#17 opened 4 months ago by

New activity in google/gemma-2-9b 4 months ago

ValueError: Transformers does not recognize this architecture.

#15 opened 4 months ago by

New activity in google/gemma-2-27b 4 months ago

The base model doesn't generate coherently

#9 opened 4 months ago by

New activity in google/gemma-2-27b-it 4 months ago

How can I get results similar to those from Google AI Studio locally?

#14 opened 4 months ago by

New activity in google/gemma-2-9b-it 4 months ago

"It is strongly recommended to train Gemma2 models with the `eager` attention implementation "

#10 opened 4 months ago by

error of ATen\native\cuda\IndexKernel.cu

#14 opened 4 months ago by

nonsense response when bsz>1

#16 opened 4 months ago by

New activity in google/gemma-2-9b 4 months ago

Can't repro MMLU: sliding window attention implementation seems broken

#11 opened 4 months ago by

TypeError: arange() received an invalid combination of arguments

#12 opened 4 months ago by

Model repeating information and "spitting out" random characters

#14 opened 4 months ago by

New activity in hf.rst.imokbook-images 5 months ago

Upload agents_db5.png

#15 opened 5 months ago by

New activity in facebook/blenderbot-3B 6 months ago

Updates incorrect tokenizer configuration file

#7 opened 8 months ago by

New activity in microsoft/Phi-3-mini-128k-instruct 6 months ago

About Transformers version

#58 opened 6 months ago by

New activity in distilbert/distilbert-base-multilingual-cased 6 months ago

Updates incorrect tokenizer configuration file

#5 opened 8 months ago by

New activity in distilbert/distilbert-base-german-cased 6 months ago

Updates incorrect tokenizer configuration file

#4 opened 8 months ago by

New activity in distilbert/distilbert-base-uncased-distilled-squad 6 months ago

Updates incorrect tokenizer configuration file

#8 opened 8 months ago by

New activity in distilbert/distilbert-base-cased-distilled-squad 6 months ago

Updates incorrect tokenizer configuration file

#10 opened 8 months ago by

New activity in distilbert/distilbert-base-cased 6 months ago

Updates incorrect tokenizer configuration file

#8 opened 8 months ago by

New activity in distilbert/distilbert-base-uncased 6 months ago

Updates incorrect tokenizer configuration file

#12 opened 8 months ago by

New activity in openai-community/gpt2 7 months ago

model output

#86 opened 7 months ago by

🚩 Report

#87 opened 7 months ago by

New activity in facebook/wav2vec2-xls-r-1b-21-to-en 7 months ago

Incorrect config file

#5 opened 7 months ago by

New activity in facebook/xlm-roberta-xl 7 months ago

Adding `safetensors` variant of this model

#3 opened 7 months ago by

New activity in lysandre/bert-test 7 months ago

shhhhh

#3 opened 7 months ago by

nononon

#2 opened 7 months ago by

New activity in openai-community/gpt2 7 months ago

OSError: gpt2 does not appear to have a file named config.json. Checkout 'https://huggingface.co/gpt2/None' for available files.

#59 opened over 1 year ago by

New activity in FacebookAI/roberta-large-mnli 7 months ago

How to finetune this model on RTE, MRPC and SST datasets in GLUE benchmark?

#9 opened 8 months ago by

New activity in google/flan-t5-xxl 7 months ago

ValueError: Need either a `state_dict` or a `save_folder` containing offloaded weights.

#53 opened over 1 year ago by

New activity in google/gemma-7b-it 7 months ago

Difficulty importing Pipeline - AttributeError: module 'keras._tf_keras.keras' has no attribute 'internal'

#71 opened 8 months ago by

New activity in open-source-metrics/stars 8 months ago

Fix splits

#2 opened 8 months ago by