Here is the full thesis if you're interested: https://research.vu.nl/ws/portalfiles/portal/355675396/dissertationlaurerfinal+-+66c885c7e9d0b.pdf
Here is the collection of my most recent models: https://huggingface.co/collections/MoritzLaurer/zeroshot-classifiers-6548b4ff407bb19ff5c3ad6f
#!pip install "huggingface_hub>=0.25.0"
from huggingface_hub import InferenceClient

client = InferenceClient(
    base_url="https://huggingface.co/api/integrations/dgx/v1",
    api_key="MY_FINEGRAINED_ENTERPRISE_ORG_TOKEN",  # see docs: https://huggingface.co/blog/inference-dgx-cloud#create-a-fine-grained-token
)

output = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-405B-Instruct-FP8",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Count to 10"},
    ],
    max_tokens=1024,
)

print(output)
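The client returns an OpenAI-compatible chat completion object, so the generated text sits in `output.choices[0].message.content` rather than at the top level. A minimal sketch of pulling it out, using a plain dict as an illustrative stand-in for the real response (calling the actual endpoint requires a valid token):

```python
# Illustrative dict mirroring the shape of an OpenAI-compatible
# chat completion response; stands in for the object returned by
# client.chat.completions.create (assumed content for the example).
fake_output = {
    "choices": [
        {"message": {"role": "assistant", "content": "1 2 3 4 5 6 7 8 9 10"}}
    ]
}

# With the real InferenceClient response object you would write:
#   text = output.choices[0].message.content
text = fake_output["choices"][0]["message"]["content"]
print(text)
```

Printing the whole `output` object is useful for debugging, but in practice you usually only want the assistant message shown above.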
@HAMRONI can you share the full inference code that caused this error? You can open a discussion in the model repo.