Sugato Ray

sugatoray

https://linkedin.com/in/sugatoray

AI & ML interests

None yet

sugatoray's activity

commented a paper 27 days ago

Streaming Diffusion Policy: Fast Policy Synthesis with Variable Noise Diffusion Models

Paper • 2406.04806 • Published Jun 7 • 1

commented a paper about 1 month ago

Self-Reflection in LLM Agents: Effects on Problem-Solving Performance

Paper • 2405.06682 • Published May 5 • 2

New activity in dvilasuero/img-prefs-distilabel about 2 months ago

Update README.md with process-howto information

#2 opened about 2 months ago by

commented 3 papers 3 months ago

Probabilistic Programming with Programmable Variational Inference

Paper • 2406.15742 • Published Jun 22 • 2

Trace is the New AutoDiff -- Unlocking Efficient Optimization of Computational Workflows

Paper • 2406.16218 • Published Jun 23 • 1

TaskGen: A Task-Based, Memory-Infused Agentic Framework using StrictJSON

Paper • 2407.15734 • Published Jul 22 • 1

commented 3 papers 4 months ago

Grokfast: Accelerated Grokking by Amplifying Slow Gradients

Paper • 2405.20233 • Published May 30 • 5

HyperZ·Z·W Operator Connects Slow-Fast Networks for Full Context Interaction

Paper • 2401.17948 • Published Jan 31 • 2

Extreme Compression of Large Language Models via Additive Quantization

Paper • 2401.06118 • Published Jan 11 • 12

New activity in sugatoray/DeepSeek-Coder-V2-Lite-Instruct-Q4_K_M-GGUF 4 months ago

Add banner image to README.md

#2 opened 4 months ago by

Upload llama.png

#1 opened 4 months ago by

commented a paper 4 months ago

Spectrum: Targeted Training on Signal to Noise Ratio

Paper • 2406.06623 • Published Jun 7 • 7

commented 3 papers 5 months ago

OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework

Paper • 2405.11143 • Published May 20 • 33

Zero-Shot Tokenizer Transfer

Paper • 2405.07883 • Published May 13 • 4

Automating the Enterprise with Foundation Models

Paper • 2405.03710 • Published May 3 • 1

New activity in unalignment/toxic-dpo-v0.2 6 months ago

Update README.md

#2 opened 6 months ago by

New activity in HuggingFaceH4/zephyr-orpo-141b-A35b-v0.1 6 months ago

Update config.json

#11 opened 6 months ago by

New activity in mlx-community/stable-code-instruct-3b-4bit 7 months ago

Update config.json

#1 opened 7 months ago by

New activity in stabilityai/stable-code-instruct-3b 7 months ago

Update config.json with correct model-repo-name

#2 opened 7 months ago by

commented 2 papers 7 months ago

T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy

Paper • 2403.14610 • Published Mar 21 • 3

Evolutionary Optimization of Model Merging Recipes

Paper • 2403.13187 • Published Mar 19 • 49

commented 11 papers 8 months ago

EasyQuant: An Efficient Data-free Quantization Algorithm for LLMs

Paper • 2403.02775 • Published Mar 5 • 11

Learning to Compress Prompts with Gist Tokens

Paper • 2304.08467 • Published Apr 17, 2023 • 3

MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers

Paper • 2305.07185 • Published May 12, 2023 • 9

Efficient Training of Language Models to Fill in the Middle

Paper • 2207.14255 • Published Jul 28, 2022 • 1

DoRA: Weight-Decomposed Low-Rank Adaptation

Paper • 2402.09353 • Published Feb 14 • 25

LoRA: Low-Rank Adaptation of Large Language Models

Paper • 2106.09685 • Published Jun 17, 2021 • 29

Empirical Study of PEFT techniques for Winter Wheat Segmentation

Paper • 2310.01825 • Published Oct 3, 2023 • 2

From PEFT to DEFT: Parameter Efficient Finetuning for Reducing Activation Density in Transformers

Paper • 2402.01911 • Published Feb 2 • 2

LoRA+: Efficient Low Rank Adaptation of Large Models

Paper • 2402.12354 • Published Feb 19 • 6

The FinBen: An Holistic Financial Benchmark for Large Language Models

Paper • 2402.12659 • Published Feb 20 • 16

FinTral: A Family of GPT-4 Level Multimodal Financial Large Language Models

Paper • 2402.10986 • Published Feb 16 • 76

commented a paper 9 months ago

Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling

Paper • 2401.16380 • Published Jan 29 • 47