1 17 16

Jaward Sesay

Jaward

https://github.com/Jaykef

AI & ML interests

I like to train large deep neural nets too 🧠🤖💥 | First Paper (AutoAgents: A Framework for Automatic Agent Generation) Accepted @ IJCAI 2024 | Role Model Karpathy

Articles

Journey With Me Into The Mind of Large Language Models: Interesting Findings in AnthropicAI's Scaling Monosemanticity paper.

May 22

• 2

On Coding Your First Attention

Apr 21

• 7

Organizations

Posts 56

Post

1614

Triton nanoGPT now has a custom cross entropy loss kernel 🚀
Next: matmul, gradually overthrowing all major PyTorch ops:)

Simplified pseudo for parallel cross-entropy loss compute:
- init program: get pid, compute offsets, load targets.
- init row_max and row_sum.
- for-loop1 (find max logits): update row_max with max logits.
- for-loop2 (compute softmax and loss): compute row_sum, update loss.
- add log(row_sum) and store loss.

Code: https://github.com/Jaykef/ai-algorithms/blob/main/triton_nanoGPT.ipynb

Post

342

This has to be the first peak performance level use case of a non-autoregressive architecture for TTS. Flow matching for the win!!

Demo: mrfakename/E2-F5-TTS
Model: SWivid/E2-TTS

View all posts