Jaward Sesay

Jaward

AI & ML interests

I like to train large deep neural nets too 🧠🤖💥 | First Paper (AutoAgents: A Framework for Automatic Agent Generation) Accepted @ IJCAI 2024 | Role Model Karpathy

Posts 56

Triton nanoGPT now has a custom cross entropy loss kernel 🚀
Next: matmul, gradually overthrowing all major PyTorch ops :)

Simplified pseudocode for the parallel cross-entropy loss computation:
- init program: get pid, compute offsets, load targets.
- init row_max and row_sum.
- for-loop 1 (find max logit): update row_max with each block's max.
- for-loop 2 (compute softmax sum and loss): accumulate exp(logit - row_max) into row_sum, update loss with the target logit.
- add log(row_sum) and store loss.
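
The steps above can be sketched on the CPU in plain Python. This is not the Triton kernel from the notebook, only a rough reference for the same two-pass idea: each call to the hypothetical `cross_entropy_row` stands in for one program (one row of logits), and `block_size` is an assumed stand-in for the kernel's block of column offsets. The function names and the mean reduction in `cross_entropy` are my own assumptions for illustration.

```python
import math

def cross_entropy_row(logits, target, block_size=4):
    """Loss for one row of logits, processed block by block (hypothetical helper)."""
    n = len(logits)
    row_max = float("-inf")
    row_sum = 0.0
    loss = 0.0

    # Loop 1: scan the row in blocks and keep the running maximum logit.
    for start in range(0, n, block_size):
        block = logits[start:start + block_size]
        row_max = max(row_max, max(block))

    # Loop 2: accumulate the shifted exponentials and pick up the target term.
    for start in range(0, n, block_size):
        block = logits[start:start + block_size]
        row_sum += sum(math.exp(x - row_max) for x in block)
        if start <= target < start + len(block):
            # Negative shifted target logit: -(x_target - row_max).
            loss += row_max - logits[target]

    # Final step: add log of the softmax denominator.
    loss += math.log(row_sum)
    return loss

def cross_entropy(logits, targets, block_size=4):
    """Mean loss over rows; a real kernel would run the rows in parallel."""
    losses = [cross_entropy_row(row, t, block_size)
              for row, t in zip(logits, targets)]
    return sum(losses) / len(losses)
```

Subtracting row_max before exponentiating is what keeps the sum from overflowing on large logits, which is presumably why the max is found in a separate first loop.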

Code: https://github.com/Jaykef/ai-algorithms/blob/main/triton_nanoGPT.ipynb
