metadata
extra_gated_prompt: >-
  This version of CatVTON is available for non-commercial scientific research
  purposes only. You agree NOT to use these models and their generated content
  for any commercial purposes, and not to share these models publicly or
  privately with others. 
extra_gated_fields:
  First Name: text
  Last Name: text
  Date of birth: date_picker
  Country: country
  Affiliation: text
  Email (Institutional Email Only): text
  I agree to use these models for non-commercial use ONLY and not to share these models publicly or privately with others: checkbox
viewer: false

🐈 CatVTON: Concatenation Is All You Need for Virtual Try-On with Diffusion Models

CatVTON is a simple and efficient virtual try-on diffusion model with 1) a lightweight network (899.06M parameters in total), 2) parameter-efficient training (49.57M trainable parameters), and 3) simplified inference (< 8G VRAM at 1024×768 resolution).
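To put the two parameter counts in proportion, the short calculation below derives the trainable share from the figures quoted above. It is only arithmetic on those two numbers; the variable names are illustrative.

```python
# Parameter counts quoted in the description above, in millions.
total_params_m = 899.06
trainable_params_m = 49.57

# Share of the network that is updated during training.
trainable_fraction = trainable_params_m / total_params_m

print(f"Trainable share: {trainable_fraction:.2%}")  # prints "Trainable share: 5.51%"
```

In other words, roughly one parameter in eighteen is trained, and the rest stay frozen.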

Updates

  • 2024/10/17: The mask-free version 🤗 of CatVTON is released. Please try it in our Online Demo.
  • 2024/10/13: We have built a repo Awesome-Try-On-Models that focuses on image, video, and 3D-based try-on models published after 2023, aiming to provide insights into the latest technological trends. If you're interested, feel free to contribute or give it a 🌟 star!
  • 2024/08/13: We localize DensePose & SCHP to avoid certain environment issues.
  • 2024/08/10: Our 🤗 HuggingFace Space is available now! Thanks for the grant from ZeroGPU!
  • 2024/08/09: Evaluation code is provided to calculate metrics 📚.
  • 2024/07/27: We provide code and workflow for deploying CatVTON on ComfyUI 💥.
  • 2024/07/24: Our Paper on ArXiv is available 🥳!
  • 2024/07/22: Our App Code is released; deploy and enjoy CatVTON on your machine 🎉!
  • 2024/07/21: Our Inference Code and Weights 🤗 are released.
  • 2024/07/11: Our Online Demo is released 😁.

Installation

Create a conda environment and install the requirements

conda create -n catvton python==3.9.0
conda activate catvton
cd CatVTON-main  # or your path to CatVTON project dir
pip install -r requirements.txt

Deployment

ComfyUI Workflow

We have modified the main code to enable easy deployment of CatVTON on ComfyUI. Because the code structures are incompatible, we have published this part in the Releases, which include the code to place under custom_nodes of ComfyUI and our workflow JSON files.

To deploy CatVTON to your ComfyUI, follow these steps:

  1. Install all the requirements for both CatVTON and ComfyUI; refer to the Installation Guide for CatVTON and the Installation Guide for ComfyUI.
  2. Download ComfyUI-CatVTON.zip and unzip it in the custom_nodes folder under your ComfyUI project (cloned from ComfyUI).
  3. Run ComfyUI.
  4. Download catvton_workflow.json, drag it into your ComfyUI webpage, and enjoy 😆!

For problems under Windows, please refer to issue #8.

When you run the CatVTON workflow for the first time, the weight files are downloaded automatically, which usually takes tens of minutes.

Gradio App

To deploy the Gradio App for CatVTON on your machine, run the following command, and checkpoints will be automatically downloaded from HuggingFace.

CUDA_VISIBLE_DEVICES=0 python app.py \
--output_dir="resource/demo/output" \
--mixed_precision="bf16" \
--allow_tf32 

When using bf16 precision, generating results with a resolution of 1024x768 only requires about 8G VRAM.

Inference

1. Data Preparation

Before inference, you need to download the VITON-HD or DressCode dataset. Once the datasets are downloaded, the folder structures should look like this:

├── VITON-HD
│   ├── test_pairs_unpaired.txt
│   ├── test
│   │   ├── image
│   │   │   ├── [000006_00.jpg | 000008_00.jpg | ...]
│   │   ├── cloth
│   │   │   ├── [000006_00.jpg | 000008_00.jpg | ...]
│   │   ├── agnostic-mask
│   │   │   ├── [000006_00_mask.png | 000008_00.png | ...]
...
├── DressCode
│   ├── test_pairs_paired.txt
│   ├── test_pairs_unpaired.txt
│   ├── [dresses | lower_body | upper_body]
│   │   ├── test_pairs_paired.txt
│   │   ├── test_pairs_unpaired.txt
│   │   ├── images
│   │   │   ├── [013563_0.jpg | 013563_1.jpg | 013564_0.jpg | 013564_1.jpg | ...]
│   │   ├── agnostic_masks
│   │   │   ├── [013563_0.png| 013564_0.png | ...]
...
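Before starting a long inference run, it can help to confirm that the dataset folder matches the layout above. The following is a minimal sketch, not part of the CatVTON code base: the helper names `expected_paths` and `missing_paths` and the example root `data/VITON-HD` are hypothetical, and the expected entries are copied from the trees shown above. For DressCode, the `agnostic_masks` folders may be reported as missing until the masks have been preprocessed.

```python
from pathlib import Path

# Relative paths copied from the VITON-HD tree above.
VITONHD_EXPECTED = [
    "test_pairs_unpaired.txt",
    "test/image",
    "test/cloth",
    "test/agnostic-mask",
]

# Entries copied from the DressCode tree above.
DRESSCODE_CATEGORIES = ["dresses", "lower_body", "upper_body"]
DRESSCODE_TOP_LEVEL = ["test_pairs_paired.txt", "test_pairs_unpaired.txt"]
DRESSCODE_PER_CATEGORY = [
    "test_pairs_paired.txt",
    "test_pairs_unpaired.txt",
    "images",
    "agnostic_masks",
]


def expected_paths(dataset):
    """Return the relative paths the layout above implies for a dataset."""
    if dataset == "vitonhd":
        return list(VITONHD_EXPECTED)
    if dataset == "dresscode":
        paths = list(DRESSCODE_TOP_LEVEL)
        for category in DRESSCODE_CATEGORIES:
            paths += [f"{category}/{name}" for name in DRESSCODE_PER_CATEGORY]
        return paths
    raise ValueError(f"unknown dataset: {dataset!r}")


def missing_paths(root, dataset):
    """List expected files or folders that do not exist under root."""
    root = Path(root)
    return [rel for rel in expected_paths(dataset) if not (root / rel).exists()]


if __name__ == "__main__":
    # Hypothetical location; replace with your own dataset root.
    for rel in missing_paths("data/VITON-HD", "vitonhd"):
        print("missing:", rel)
```

The check only looks at file and folder names; it does not open images or validate the contents of the pair lists.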

For the DressCode dataset, we provide a script to preprocess the agnostic masks. Run the following command:

CUDA_VISIBLE_DEVICES=0 python preprocess_agnostic_mask.py \
--data_root_path <your_path_to_DressCode> 

2. Inference on VITON-HD/DressCode

To run inference on the DressCode or VITON-HD dataset, run the following command; checkpoints will be automatically downloaded from HuggingFace.

CUDA_VISIBLE_DEVICES=0 python inference.py \
--dataset [dresscode | vitonhd] \
--data_root_path <path> \
--output_dir <path> \
--dataloader_num_workers 8 \
--batch_size 8 \
--seed 555 \
--mixed_precision [no | fp16 | bf16] \
--allow_tf32 \
--repaint \
--eval_pair  
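When the same command has to be launched for several datasets or precisions, assembling the argument list in code avoids line-continuation mistakes. The sketch below is an illustration only: `build_inference_command` is a hypothetical helper, it uses just the flags listed in the command above, and the default of `bf16` for `--mixed_precision` is an assumption rather than a documented default. `--allow_tf32`, `--repaint`, and `--eval_pair` are treated as plain on/off switches, as they appear in the command.

```python
import shlex

# Choices listed in the command above.
DATASETS = ("dresscode", "vitonhd")
PRECISIONS = ("no", "fp16", "bf16")


def build_inference_command(dataset, data_root_path, output_dir,
                            num_workers=8, batch_size=8, seed=555,
                            mixed_precision="bf16",  # assumed default
                            allow_tf32=True, repaint=True, eval_pair=True):
    """Assemble the argument list for inference.py without running it."""
    if dataset not in DATASETS:
        raise ValueError(f"dataset must be one of {DATASETS}")
    if mixed_precision not in PRECISIONS:
        raise ValueError(f"mixed_precision must be one of {PRECISIONS}")

    cmd = [
        "python", "inference.py",
        "--dataset", dataset,
        "--data_root_path", str(data_root_path),
        "--output_dir", str(output_dir),
        "--dataloader_num_workers", str(num_workers),
        "--batch_size", str(batch_size),
        "--seed", str(seed),
        "--mixed_precision", mixed_precision,
    ]
    # Boolean switches are appended only when enabled.
    switches = (("--allow_tf32", allow_tf32),
                ("--repaint", repaint),
                ("--eval_pair", eval_pair))
    cmd += [flag for flag, enabled in switches if enabled]
    return cmd


if __name__ == "__main__":
    # Hypothetical paths; replace with your own.
    command = build_inference_command("vitonhd", "data/VITON-HD", "output")
    print(shlex.join(command))
```

The returned list can be passed to `subprocess.run`, with `CUDA_VISIBLE_DEVICES` set through its `env` argument; the helper itself never launches anything.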

3. Calculate Metrics

After obtaining the inference results, calculate the metrics using the following command:

CUDA_VISIBLE_DEVICES=0 python eval.py \
--gt_folder <your_path_to_gt_image_folder> \
--pred_folder <your_path_to_predicted_image_folder> \
--paired \
--batch_size=16 \
--num_workers=16 
  • --gt_folder and --pred_folder should be folders that contain only images.
  • To evaluate the results in a paired setting, use --paired; for an unpaired setting, simply omit it.
  • --batch_size and --num_workers should be adjusted based on your machine.
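Because `--gt_folder` and `--pred_folder` should contain only images, a quick scan before running the evaluation can catch stray files. This is a minimal sketch under stated assumptions: `non_image_entries` is a hypothetical helper, and the set of accepted extensions is a guess based on the `.jpg` and `.png` files in the dataset layout, not a list taken from `eval.py`.

```python
from pathlib import Path

# Assumed image extensions; adjust to match your data.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def non_image_entries(folder):
    """Return names of entries in folder that are not image files."""
    return sorted(
        entry.name
        for entry in Path(folder).iterdir()
        if entry.is_dir() or entry.suffix.lower() not in IMAGE_SUFFIXES
    )


if __name__ == "__main__":
    # Hypothetical folder; replace with your own prediction folder.
    for name in non_image_entries("output"):
        print("not an image:", name)
```

Subfolders are reported as well, since the scan treats anything other than an image file as a stray entry.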

Acknowledgement

Our code is modified based on Diffusers. We adopt Stable Diffusion v1.5 inpainting as the base model. We use SCHP and DensePose to automatically generate masks in our Gradio App and ComfyUI workflow. Thanks to all the contributors!

License

All the materials, including code, checkpoints, and demo, are made available under the Creative Commons BY-NC-SA 4.0 license. You are free to copy, redistribute, remix, transform, and build upon the project for non-commercial purposes, as long as you give appropriate credit and distribute your contributions under the same license.

Citation

@misc{chong2024catvtonconcatenationneedvirtual,
 title={CatVTON: Concatenation Is All You Need for Virtual Try-On with Diffusion Models}, 
 author={Zheng Chong and Xiao Dong and Haoxiang Li and Shiyue Zhang and Wenqing Zhang and Xujie Zhang and Hanqing Zhao and Xiaodan Liang},
 year={2024},
 eprint={2407.15886},
 archivePrefix={arXiv},
 primaryClass={cs.CV},
 url={https://arxiv.org/abs/2407.15886}, 
}