Pratyay Banerjee

Neilblaze

https://neilblaze.live

AI & ML interests

Computer Vision, Object Detection, Pattern Recognition, NLP, Supervised Learning

Neilblaze's activity

upvoted 28 papers 1 day ago

Boosting Healthcare LLMs Through Retrieved Context

Paper • 2409.15127 • Published 29 days ago • 19

Large Language Models as Markov Chains

Paper • 2410.02724 • Published 19 days ago • 31

Eliminating Oversaturation and Artifacts of High Guidance Scales in Diffusion Models

Paper • 2410.02416 • Published 19 days ago • 25

OmniBooth: Learning Latent Control for Image Synthesis with Multi-modal Instruction

Paper • 2410.04932 • Published 15 days ago • 9

LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations

Paper • 2410.02707 • Published 19 days ago • 45

Differential Transformer

Paper • 2410.05258 • Published 15 days ago • 159

RevisEval: Improving LLM-as-a-Judge via Response-Adapted References

Paper • 2410.05193 • Published 15 days ago • 12

DOTS: Learning to Reason Dynamically in LLMs via Optimal Reasoning Trajectories Search

Paper • 2410.03864 • Published 18 days ago • 10

Everything Everywhere All at Once: LLMs can In-Context Learn Multiple Tasks in Superposition

Paper • 2410.05603 • Published 15 days ago • 11

StructRAG: Boosting Knowledge Intensive Reasoning of LLMs via Inference-time Hybrid Information Structurization

Paper • 2410.08815 • Published 11 days ago • 36

Rethinking Data Selection at Scale: Random Selection is Almost All You Need

Paper • 2410.09335 • Published 11 days ago • 14

VisRAG: Vision-based Retrieval-augmented Generation on Multi-modality Documents

Paper • 2410.10594 • Published 8 days ago • 21

Toward General Instruction-Following Alignment for Retrieval-Augmented Generation

Paper • 2410.09584 • Published 10 days ago • 43

Animate-X: Universal Character Image Animation with Enhanced Motion Representation

Paper • 2410.10306 • Published 9 days ago • 46

What Matters in Transformers? Not All Attention is Needed

Paper • 2406.15786 • Published Jun 22 • 27

Improving Long-Text Alignment for Text-to-Image Diffusion Models

Paper • 2410.11817 • Published 7 days ago • 13

FlatQuant: Flatness Matters for LLM Quantization

Paper • 2410.09426 • Published 11 days ago • 12

Do LLMs Have Political Correctness? Analyzing Ethical Biases and Jailbreak Vulnerabilities in AI Systems

Paper • 2410.13334 • Published 6 days ago • 12

MoH: Multi-Head Attention as Mixture-of-Head Attention

Paper • 2410.11842 • Published 7 days ago • 19

WorldCuisines: A Massive-Scale Benchmark for Multilingual and Multicultural Visual Question Answering on Global Cuisines

Paper • 2410.12705 • Published 6 days ago • 24

Harnessing Webpage UIs for Text-Rich Visual Understanding

Paper • 2410.13824 • Published 5 days ago • 27

DAWN: Dynamic Frame Avatar with Non-autoregressive Diffusion Framework for Talking Head Video Generation

Paper • 2410.13726 • Published 5 days ago • 8

Mini-Omni2: Towards Open-source GPT-4o with Vision, Speech and Duplex Capabilities

Paper • 2410.11190 • Published 8 days ago • 17

FiTv2: Scalable and Improved Flexible Vision Transformer for Diffusion Model

Paper • 2410.13925 • Published 5 days ago • 19

SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs

Paper • 2410.13276 • Published 6 days ago • 22

upvoted an article 18 days ago

ColPali: Efficient Document Retrieval with Vision Language Models 👀

Article • Jul 5 • 139

upvoted a paper 22 days ago

ReFT: Representation Finetuning for Language Models

Paper • 2404.03592 • Published Apr 4 • 89

upvoted a paper 25 days ago

LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens

Paper • 2402.13753 • Published Feb 21 • 111

upvoted a collection 25 days ago

Llama 3.2

Collection

This collection hosts the transformers and original repos of the Llama 3.2 and Llama Guard 3 • 11 items • Updated 27 days ago • 386

upvoted 4 papers 29 days ago

LLMs + Persona-Plug = Personalized LLMs

Paper • 2409.11901 • Published Sep 18 • 30

A Controlled Study on Long Context Extension and Generalization in LLMs

Paper • 2409.12181 • Published Sep 18 • 43

Training Language Models to Self-Correct via Reinforcement Learning

Paper • 2409.12917 • Published Sep 19 • 131

YesBut: A High-Quality Annotated Multimodal Dataset for evaluating Satire Comprehension capability of Vision-Language Models

Paper • 2409.13592 • Published Sep 20 • 46

upvoted 9 papers about 1 month ago

NVLM: Open Frontier-Class Multimodal LLMs

Paper • 2409.11402 • Published Sep 17 • 69

LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge Distillation

Paper • 2408.15881 • Published Aug 28 • 20

Efficient LLM Scheduling by Learning to Rank

Paper • 2408.15792 • Published Aug 28 • 19

Law of Vision Representation in MLLMs

Paper • 2408.16357 • Published Aug 29 • 92

LongCite: Enabling LLMs to Generate Fine-grained Citations in Long-context QA

Paper • 2409.02897 • Published Sep 4 • 44

MemoRAG: Moving towards Next-Gen RAG Via Memory-Inspired Knowledge Discovery

Paper • 2409.05591 • Published Sep 9 • 28

LLaMA-Omni: Seamless Speech Interaction with Large Language Models

Paper • 2409.06666 • Published Sep 10 • 55

GroUSE: A Benchmark to Evaluate Evaluators in Grounded Question Answering

Paper • 2409.06595 • Published Sep 10 • 37

Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers

Paper • 2409.04109 • Published Sep 6 • 42

upvoted 11 papers about 2 months ago

Tora: Trajectory-oriented Diffusion Transformer for Video Generation

Paper • 2407.21705 • Published Jul 31 • 25

SAM 2: Segment Anything in Images and Videos

Paper • 2408.00714 • Published Aug 1 • 106

Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters

Paper • 2408.03314 • Published Aug 6 • 33

MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models

Paper • 2408.02718 • Published Aug 5 • 60

WalledEval: A Comprehensive Safety Evaluation Toolkit for Large Language Models

Paper • 2408.03837 • Published Aug 7 • 17

Transformer Explainer: Interactive Learning of Text-Generative Models

Paper • 2408.04619 • Published Aug 8 • 154

VITA: Towards Open-Source Interactive Omni Multimodal LLM

Paper • 2408.05211 • Published Aug 9 • 46

The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery

Paper • 2408.06292 • Published Aug 12 • 115

LongVILA: Scaling Long-Context Visual Language Models for Long Videos

Paper • 2408.10188 • Published Aug 19 • 51

Building and better understanding vision-language models: insights and future directions

Paper • 2408.12637 • Published Aug 22 • 115

Diffusion Models Are Real-Time Game Engines

Paper • 2408.14837 • Published Aug 27 • 121

upvoted an article 3 months ago

Everything About Long Context Fine-tuning

Article • May 10 • 28

upvoted 2 papers 3 months ago

Mixture of Nested Experts: Adaptive Processing of Visual Tokens

Paper • 2407.19985 • Published Jul 29 • 34

From Sparse to Dense: GPT-4 Summarization with Chain of Density Prompting

Paper • 2309.04269 • Published Sep 8, 2023 • 32

upvoted an article 3 months ago

Google releases Gemma 2 2B, ShieldGemma and Gemma Scope

Article • Jul 31 • 59