Akarshan Biswas

qnixsynapse

AI & ML interests

NLP, models, quantization

Organizations

None yet

qnixsynapse's activity

upvoted a paper about 1 month ago

Kolmogorov-Arnold Transformer

Paper • 2409.10594 • Published Sep 16 • 38

upvoted a paper about 2 months ago

Show-o: One Single Transformer to Unify Multimodal Understanding and Generation

Paper • 2408.12528 • Published Aug 22 • 50

upvoted an article 2 months ago

Tool Use, Unified

Article • Aug 12 • 56

upvoted 2 papers 3 months ago

Language Model Can Listen While Speaking

Paper • 2408.02622 • Published Aug 5 • 37

The Llama 3 Herd of Models

Paper • 2407.21783 • Published Jul 31 • 103

upvoted a collection 3 months ago

Gemma 2 2B Release

The 2.6B parameter version of Gemma 2. • 6 items • Updated Jul 31 • 76

upvoted 2 papers 3 months ago

Human-like Episodic Memory for Infinite Context LLMs

Paper • 2407.09450 • Published Jul 12 • 57

LLM-jp: A Cross-organizational Project for the Research and Development of Fully Open Japanese LLMs

Paper • 2407.03963 • Published Jul 4 • 15

upvoted a paper 4 months ago

MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention

Paper • 2407.02490 • Published Jul 2 • 23

upvoted a collection 4 months ago

SSMs

A collection of Mamba-2-based research models with 8B parameters trained on 3.5T tokens for comparison with Transformers. • 5 items • Updated 22 days ago • 25

upvoted a paper 5 months ago

The Hallucinations Leaderboard -- An Open Effort to Measure Hallucinations in Large Language Models

Paper • 2404.05904 • Published Apr 8 • 7

upvoted 3 papers 6 months ago

KAN: Kolmogorov-Arnold Networks

Paper • 2404.19756 • Published Apr 30 • 108

SpaceByte: Towards Deleting Tokenization from Large Language Modeling

Paper • 2404.14408 • Published Apr 22 • 6

The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions

Paper • 2404.13208 • Published Apr 19 • 38

upvoted a collection 6 months ago

Meta Llama 3

This collection hosts the transformers and original repos of the Meta Llama 3 and Llama Guard 2 releases • 5 items • Updated 27 days ago • 680

upvoted a paper 6 months ago

RecurrentGemma: Moving Past Transformers for Efficient Open Language Models

Paper • 2404.07839 • Published Apr 11 • 41

upvoted a paper 7 months ago

Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models

Paper • 2402.19427 • Published Feb 29 • 52

upvoted an article 7 months ago

CodeGemma - an official Google release for code LLMs

Article • Apr 9 • 99

upvoted a collection 7 months ago

Gemma release

Groups the Gemma models released by the Google team. • 40 items • Updated Jul 31 • 325

upvoted 3 papers 7 months ago

Mixture-of-Depths: Dynamically allocating compute in transformer-based language models

Paper • 2404.02258 • Published Apr 2 • 104

The Unreasonable Ineffectiveness of the Deeper Layers

Paper • 2403.17887 • Published Mar 26 • 77

Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking

Paper • 2403.09629 • Published Mar 14 • 72

upvoted 4 papers 8 months ago

The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

Paper • 2402.17764 • Published Feb 27 • 601

Beyond Language Models: Byte Models are Digital World Simulators

Paper • 2402.19155 • Published Feb 29 • 49

Simple linear attention language models balance the recall-throughput tradeoff

Paper • 2402.18668 • Published Feb 28 • 18

Transformers Can Achieve Length Generalization But Not Robustly

Paper • 2402.09371 • Published Feb 14 • 12

upvoted 2 papers 9 months ago

MobileDiffusion: Subsecond Text-to-Image Generation on Mobile Devices

Paper • 2311.16567 • Published Nov 28, 2023 • 22

EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty

Paper • 2401.15077 • Published Jan 26 • 17

upvoted 2 papers 10 months ago

MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts

Paper • 2401.04081 • Published Jan 8 • 70

TinyLlama: An Open-Source Small Language Model

Paper • 2401.02385 • Published Jan 4 • 89

upvoted a paper 11 months ago

SparQ Attention: Bandwidth-Efficient LLM Inference

Paper • 2312.04985 • Published Dec 8, 2023 • 38