Ameer Azam

ameerazam08

https://linktr.ee/ameerazam22

AI & ML interests

Gen AI || Deep Learning || Transfer Learning

ameerazam08's activity

upvoted a paper 27 days ago

MIMO: Controllable Character Video Synthesis with Spatial Decomposed Modeling

Paper • 2409.16160 • Published 28 days ago • 32

upvoted 4 articles 28 days ago

Article

Introducing the SQL Console on Datasets

Sep 17

• 18

Article

Optimize and deploy models with Optimum-Intel and OpenVINO GenAI

Sep 20

• 14

Article

Efficient Controllable Generation for SDXL with T2I-Adapters

Sep 8, 2023

• 4

Article

Fine-tuning Parler TTS on a Specific Language

Sep 16

• 21

upvoted 2 papers 29 days ago

Kolmogorov-Arnold Transformer

Paper • 2409.10594 • Published Sep 16 • 38

Portrait Video Editing Empowered by Multimodal Generative Priors

Paper • 2409.13591 • Published Sep 20 • 15

upvoted a paper about 1 month ago

InstantDrag: Improving Interactivity in Drag-based Image Editing

Paper • 2409.08857 • Published Sep 13 • 30

upvoted an article about 1 month ago

Article

"Diffusers Image Fill" guide

Sep 13

• 34

upvoted 2 papers about 1 month ago

Instant Facial Gaussians Translator for Relightable and Interactable Facial Rendering

Paper • 2409.07441 • Published Sep 11 • 10

Hi3D: Pursuing High-Resolution Image-to-3D Generation with Video Diffusion Models

Paper • 2409.07452 • Published Sep 11 • 18

upvoted an article about 1 month ago

Article

Training Flux Locally on Mac

Sep 12

• 13

upvoted an article about 2 months ago

Article

XetHub is joining Hugging Face!

Aug 8

• 79

upvoted a collection 2 months ago

Gradio Spaces for Background Removal

Collection

Enhance your images by removing the background. Will ensure these Spaces are up and maintained for the community. • 5 items • Updated Aug 20 • 23

upvoted 2 papers 2 months ago

Puppet-Master: Scaling Interactive Video Generation as a Motion Prior for Part-Level Dynamics

Paper • 2408.04631 • Published Aug 8 • 8

Transformer Explainer: Interactive Learning of Text-Generative Models

Paper • 2408.04619 • Published Aug 8 • 154

upvoted an article 4 months ago

Article

ColPali: Efficient Document Retrieval with Vision Language Models 👀

Jul 5

• 139

upvoted a paper 4 months ago

Open-Vocabulary SAM: Segment and Recognize Twenty-thousand Classes Interactively

Paper • 2401.02955 • Published Jan 5 • 19

upvoted a collection 5 months ago

ZeroGPU Spaces

Collection

ZeroGPU Spaces made by the community • 17 items • Updated Jun 6 • 228

upvoted 3 articles 5 months ago

Article

Virtual Try-On using IP-Adapter Inpainting

Jun 4

• 24

Article

Hugging Face + Google Visual Blocks

May 16

• 21

Article

PaliGemma – Google's Cutting-Edge Open Vision Language Model

May 14

• 202

upvoted an article 6 months ago

Article

Fine-Tune ViT for Image Classification with 🤗 Transformers

Feb 11, 2022

• 26

upvoted 5 papers 6 months ago

ConsistentID: Portrait Generation with Multimodal Fine-Grained Identity Preserving

Paper • 2404.16771 • Published Apr 25 • 16

Revisiting Text-to-Image Evaluation with Gecko: On Metrics, Prompts, and Human Ratings

Paper • 2404.16820 • Published Apr 25 • 15

FlashSpeech: Efficient Zero-Shot Speech Synthesis

Paper • 2404.14700 • Published Apr 23 • 29

MeshLRM: Large Reconstruction Model for High-Quality Mesh

Paper • 2404.12385 • Published Apr 18 • 26

ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback

Paper • 2404.07987 • Published Apr 11 • 47

upvoted a paper 7 months ago

Aligning Diffusion Models by Optimizing Human Utility

Paper • 2404.04465 • Published Apr 6 • 13

upvoted a collection 7 months ago

IP-Adapter Plug-In

Collection

5 items • Updated Apr 9 • 3

upvoted 13 papers 7 months ago

CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching

Paper • 2404.03653 • Published Apr 4 • 33

LVLM-Intrepret: An Interpretability Tool for Large Vision-Language Models

Paper • 2404.03118 • Published Apr 3 • 23

ReFT: Representation Finetuning for Language Models

Paper • 2404.03592 • Published Apr 4 • 89

MiniGPT4-Video: Advancing Multimodal LLMs for Video Understanding with Interleaved Visual-Textual Tokens

Paper • 2404.03413 • Published Apr 4 • 25

ReNoise: Real Image Inversion Through Iterative Noising

Paper • 2403.14602 • Published Mar 21 • 19

AnyV2V: A Plug-and-Play Framework For Any Video-to-Video Editing Tasks

Paper • 2403.14468 • Published Mar 21 • 21

RadSplat: Radiance Field-Informed Gaussian Splatting for Robust Real-Time Rendering with 900+ FPS

Paper • 2403.13806 • Published Mar 20 • 18

V3D: Video Diffusion Models are Effective 3D Generators

Paper • 2403.06738 • Published Mar 11 • 28

Learning Generalizable Feature Fields for Mobile Manipulation

Paper • 2403.07563 • Published Mar 12 • 6

Motion Mamba: Efficient and Long Sequence Motion Generation with Hierarchical and Bidirectional Selective SSM

Paper • 2403.07487 • Published Mar 12 • 13

upvoted a paper 8 months ago

OOTDiffusion: Outfitting Fusion based Latent Diffusion for Controllable Virtual Try-on

Paper • 2403.01779 • Published Mar 4 • 27

upvoted 6 collections 8 months ago

OpenCodeInterpreter

Collection

18 items • Updated Mar 3 • 82

Masking All

Collection

2 items • Updated Feb 24 • 1

LLMs All

Collection

5 items • Updated Feb 24 • 1

SAM All

Collection

3 items • Updated Feb 24 • 1

Diffusion All

Collection

5 items • Updated Mar 8 • 4

TTS All

Collection

5 items • Updated Apr 20 • 2

upvoted a paper 8 months ago

Make a Cheap Scaling: A Self-Cascade Diffusion Model for Higher-Resolution Adaptation

Paper • 2402.10491 • Published Feb 16 • 16

upvoted 4 papers 9 months ago

Object-Driven One-Shot Fine-tuning of Text-to-Image Diffusion with Prototypical Embedding

Paper • 2401.15708 • Published Jan 28 • 10

PALP: Prompt Aligned Personalization of Text-to-Image Models

Paper • 2401.06105 • Published Jan 11 • 46

Quantum Denoising Diffusion Models

Paper • 2401.07049 • Published Jan 13 • 12

Towards A Better Metric for Text-to-Video Generation

Paper • 2401.07781 • Published Jan 15 • 14

upvoted a paper 10 months ago

VecFusion: Vector Font Generation with Diffusion

Paper • 2312.10540 • Published Dec 16, 2023 • 21

upvoted a paper 11 months ago

Pearl: A Production-ready Reinforcement Learning Agent

Paper • 2312.03814 • Published Dec 6, 2023 • 14

upvoted a collection 11 months ago

My Projects

Collection

Projects I've worked on (includes collabs) • 22 items • Updated about 17 hours ago • 7

upvoted 2 papers 11 months ago

DiffiT: Diffusion Vision Transformers for Image Generation

Paper • 2312.02139 • Published Dec 4, 2023 • 13

LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models

Paper • 2312.02949 • Published Dec 5, 2023 • 11