arxiv:2410.13370

MagicTailor: Component-Controllable Personalization in Text-to-Image Diffusion Models

Published on Oct 17 · Submitted by BryanW on Oct 21
#3 Paper of the day
Abstract

Recent advancements in text-to-image (T2I) diffusion models have enabled the creation of high-quality images from text prompts, but they still struggle to generate images with precise control over specific visual concepts. Existing approaches can replicate a given concept by learning from reference images, yet they lack the flexibility for fine-grained customization of individual components within the concept. In this paper, we introduce component-controllable personalization, a novel task that pushes the boundaries of T2I models by allowing users to reconfigure specific components when personalizing visual concepts. This task is particularly challenging due to two primary obstacles: semantic pollution, where unwanted visual elements corrupt the personalized concept, and semantic imbalance, which causes disproportionate learning of the concept and component. To overcome these challenges, we design MagicTailor, an innovative framework that leverages Dynamic Masked Degradation (DM-Deg) to dynamically perturb undesired visual semantics and Dual-Stream Balancing (DS-Bal) to establish a balanced learning paradigm for desired visual semantics. Extensive comparisons, ablations, and analyses demonstrate that MagicTailor not only excels in this challenging task but also holds significant promise for practical applications, paving the way for more nuanced and creative image generation.
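To make the idea of dynamically perturbing undesired visual semantics more concrete, here is a minimal sketch of a mask-based degradation step in the spirit of DM-Deg. It is not the paper's exact formulation: the function name, the choice of Gaussian noise, and the linear intensity schedule are assumptions for illustration only.

```python
import torch

def dynamic_masked_degradation(image, undesired_mask, step, total_steps, max_sigma=1.0):
    """Illustrative sketch (not the paper's exact method): perturb the undesired
    regions of a reference image with noise whose intensity is scheduled over
    training, discouraging the model from memorizing those visual semantics.

    image:          (C, H, W) tensor in [-1, 1]
    undesired_mask: (1, H, W) tensor, 1 where visual semantics should be suppressed
    """
    # Assumed schedule: noise intensity grows linearly with training progress.
    sigma = max_sigma * (step / max(total_steps, 1))
    noise = torch.randn_like(image) * sigma
    # Apply the perturbation only inside the undesired (masked) regions.
    degraded = (image + noise).clamp(-1.0, 1.0)
    return image * (1 - undesired_mask) + degraded * undesired_mask
```

The degraded reference would then replace the original image in the personalization loss for the masked regions; see the paper and code repository for the actual mechanism.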

Community

Paper author and submitter:

We present MagicTailor to enable component-controllable personalization, a newly formulated task aiming to reconfigure specific components of concepts during personalization.

Page: https://correr-zhou.github.io/MagicTailor/
Paper: https://arxiv.org/pdf/2410.13370
Code: https://github.com/Correr-Zhou/MagicTailor

Why didn't you compare your results with InstantID, UniPortrait, etc.? Your table makes little sense, since the other methods were proposed long ago.

Paper author:

Hi, thanks for your comments. : )

  1. The methods you mentioned focus on human faces in the vanilla personalization task, which is quite different from our setting, so they cannot be adapted to our task for a meaningful comparison.
  2. Our method follows a widely adopted tuning-based paradigm, which is still considered a worthwhile technical solution. In light of this, we compared our method with recent SOTA tuning-based methods, especially those capable of handling fine-grained visual elements, e.g., Break-A-Scene (SIGGRAPH Asia '23) and CLiC (CVPR '24).

It's always fascinating to see projects like this in 2D image generation/modification, since there's still so much to explore in this field. Can't wait to try your code!



Models citing this paper: 0
Datasets citing this paper: 0
Spaces citing this paper: 0

Collections including this paper: 3