arxiv:2410.13370

MagicTailor: Component-Controllable Personalization in Text-to-Image Diffusion Models

Published on Oct 17 · Submitted by BryanW on Oct 21
#3 Paper of the day
Abstract

Recent advancements in text-to-image (T2I) diffusion models have enabled the creation of high-quality images from text prompts, but they still struggle to generate images with precise control over specific visual concepts. Existing approaches can replicate a given concept by learning from reference images, yet they lack the flexibility for fine-grained customization of individual components within the concept. In this paper, we introduce component-controllable personalization, a novel task that pushes the boundaries of T2I models by allowing users to reconfigure specific components when personalizing visual concepts. This task is particularly challenging due to two primary obstacles: semantic pollution, where unwanted visual elements corrupt the personalized concept, and semantic imbalance, which causes disproportionate learning of the concept and component. To overcome these challenges, we design MagicTailor, an innovative framework that leverages Dynamic Masked Degradation (DM-Deg) to dynamically perturb undesired visual semantics and Dual-Stream Balancing (DS-Bal) to establish a balanced learning paradigm for desired visual semantics. Extensive comparisons, ablations, and analyses demonstrate that MagicTailor not only excels in this challenging task but also holds significant promise for practical applications, paving the way for more nuanced and creative image generation.
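To make the idea of dynamically perturbing undesired visual semantics more concrete, here is a minimal sketch of a mask-based degradation step in the spirit of DM-Deg. It is not the paper's exact formulation: the function name, the choice of Gaussian noise, and the linear intensity schedule are assumptions for illustration only.

```python
import torch

def dynamic_masked_degradation(image, undesired_mask, step, total_steps, max_sigma=1.0):
    """Illustrative sketch (not the paper's exact method): perturb the undesired
    regions of a reference image with noise whose intensity is scheduled over
    training, discouraging the model from memorizing those visual semantics.

    image:          (C, H, W) tensor in [-1, 1]
    undesired_mask: (1, H, W) tensor, 1 where visual semantics should be suppressed
    """
    # Assumed schedule: noise intensity grows linearly with training progress.
    sigma = max_sigma * (step / max(total_steps, 1))
    noise = torch.randn_like(image) * sigma
    # Apply the perturbation only inside the undesired (masked) regions.
    degraded = (image + noise).clamp(-1.0, 1.0)
    return image * (1 - undesired_mask) + degraded * undesired_mask
```

The degraded reference would then replace the original image in the personalization loss for the masked regions; see the paper and code repository for the actual mechanism.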

Community

Paper author and submitter:

We present MagicTailor to enable component-controllable personalization, a newly formulated task aiming to reconfigure specific components of concepts during personalization.

Page: https://correr-zhou.github.io/MagicTailor/
Paper: https://arxiv.org/pdf/2410.13370
Code: https://github.com/Correr-Zhou/MagicTailor

Why didn't you compare your results with InstantID, UniPortrait, etc.? Your table makes little sense, since the other methods were proposed long ago.

Paper author:

Hi, thanks for your comments. : )

  1. The methods you mentioned focus on human faces in the vanilla personalization task, which is quite different from our setting, so they cannot be adapted to our task for a meaningful comparison.
  2. Our method follows a widely adopted tuning-based paradigm, which is still considered a worthwhile technical solution. In light of this, we compared our method with recent SOTA tuning-based methods, especially those capable of handling fine-grained visual elements, e.g., Break-A-Scene (SIGGRAPH Asia '23) and CLiC (CVPR '24).

It's always fascinating to see projects like this in 2D image generation/modification, since there's still so much to explore in this field. Can't wait to try your code!



Models citing this paper: 0
Datasets citing this paper: 0
Spaces citing this paper: 0

Collections including this paper: 3