DITTO: Diffusion Inference-Time T-Optimization for Music Generation Paper • 2401.12179 • Published Jan 22 • 19
Loopy: Taming Audio-Driven Portrait Avatar with Long-Term Motion Dependency Paper • 2409.02634 • Published Sep 4 • 87
Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders Paper • 2408.15998 • Published Aug 28 • 83
How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites Paper • 2404.16821 • Published Apr 25 • 53
GPT-4 generated datasets Collection Collection of some GPT-4 generated datasets. It may be useful for those looking for the best-quality datasets to train competitive LLMs. • 18 items • Updated Apr 16 • 8
VLOGGER: Multimodal Diffusion for Embodied Avatar Synthesis Paper • 2403.08764 • Published Mar 13 • 34
Awesome SDXL LoRAs Collection A curated set of amazing Stable Diffusion XL LoRAs (they power the LoRA the Explorer Space) • 36 items • Updated Jun 24 • 18
LLaMA Beyond English: An Empirical Study on Language Capability Transfer Paper • 2401.01055 • Published Jan 2 • 54
MAG-Edit: Localized Image Editing in Complex Scenarios via Mask-Based Attention-Adjusted Guidance Paper • 2312.11396 • Published Dec 18, 2023 • 10
FreeInit: Bridging Initialization Gap in Video Diffusion Models Paper • 2312.07537 • Published Dec 12, 2023 • 26
Vary: Scaling up the Vision Vocabulary for Large Vision-Language Models Paper • 2312.06109 • Published Dec 11, 2023 • 20
Rank-without-GPT: Building GPT-Independent Listwise Rerankers on Open-Source Large Language Models Paper • 2312.02969 • Published Dec 5, 2023 • 12
StyleCrafter: Enhancing Stylized Text-to-Video Generation with Style Adapter Paper • 2312.00330 • Published Dec 1, 2023 • 10
Rerender A Video: Zero-Shot Text-Guided Video-to-Video Translation Paper • 2306.07954 • Published Jun 13, 2023 • 113