---

license: cc-by-nc-sa-4.0
---


# 🐈 CatVTON: Concatenation Is All You Need for Virtual Try-On with Diffusion Models

<div style="display: flex; justify-content: center; align-items: center;">
  <a href="http://arxiv.org/abs/2407.15886" style="margin: 0 2px;">
    <img src='https://img.shields.io/badge/arXiv-2407.15886-red?style=flat&logo=arXiv&logoColor=red' alt='arxiv'>

  </a>

  <a href='https://huggingface.co/zhengchong/CatVTON' style="margin: 0 2px;">

    <img src='https://img.shields.io/badge/Hugging Face-ckpts-orange?style=flat&logo=HuggingFace&logoColor=orange' alt='huggingface'>

  </a>

  <a href="https://github.com/Zheng-Chong/CatVTON" style="margin: 0 2px;">

    <img src='https://img.shields.io/badge/GitHub-Repo-blue?style=flat&logo=GitHub' alt='GitHub'>

  </a>

  <a href="http://120.76.142.206:8888" style="margin: 0 2px;">

    <img src='https://img.shields.io/badge/Demo-Gradio-gold?style=flat&logo=Gradio&logoColor=red' alt='Demo'>

  </a>

  <a href="https://huggingface.co/spaces/zhengchong/CatVTON" style="margin: 0 2px;">

    <img src='https://img.shields.io/badge/Space-ZeroGPU-orange?style=flat&logo=Gradio&logoColor=red' alt='Demo'>

  </a>

  <a href='https://zheng-chong.github.io/CatVTON/' style="margin: 0 2px;">

    <img src='https://img.shields.io/badge/Webpage-Project-silver?style=flat&logo=&logoColor=orange' alt='webpage'>

  </a>

  <a href="https://github.com/Zheng-Chong/CatVTON/LICENCE" style="margin: 0 2px;">

    <img src='https://img.shields.io/badge/License-CC BY--NC--SA--4.0-lightgreen?style=flat&logo=Lisence' alt='License'>

  </a>

</div>




**CatVTON** is a simple and efficient virtual try-on diffusion model with ***1) a Lightweight Network (899.06M parameters in total)***, ***2) Parameter-Efficient Training (49.57M trainable parameters)*** and ***3) Simplified Inference (< 8G VRAM for 1024x768 resolution)***.



## Updates
- **`2024/08/10`**: Our 🤗 [**HuggingFace Space**](https://huggingface.co/spaces/zhengchong/CatVTON) is available now! Thanks for the grant from [**ZeroGPU**](https://huggingface.co/zero-gpu-explorers)!
- **`2024/08/09`**: [**Evaluation code**](https://github.com/Zheng-Chong/CatVTON?tab=readme-ov-file#3-calculate-metrics) is provided to calculate metrics 📚.
- **`2024/07/27`**: We provide code and workflow for deploying CatVTON on [**ComfyUI**](https://github.com/Zheng-Chong/CatVTON?tab=readme-ov-file#comfyui-workflow) 💥.
- **`2024/07/24`**: Our [**Paper on ArXiv**](http://arxiv.org/abs/2407.15886) is available 🥳!
- **`2024/07/22`**: Our [**App Code**](https://github.com/Zheng-Chong/CatVTON/blob/main/app.py) is released; deploy and enjoy CatVTON on your machine 🎉!
- **`2024/07/21`**: Our [**Inference Code**](https://github.com/Zheng-Chong/CatVTON/blob/main/inference.py) and [**Weights** 🤗](https://huggingface.co/zhengchong/CatVTON) are released.
- **`2024/07/11`**: Our [**Online Demo**](http://120.76.142.206:8888) is released 😁.




## Installation
An [Installation Guide](https://github.com/Zheng-Chong/CatVTON/blob/main/INSTALL.md) is provided to help build the conda environment for CatVTON. When deploying the app, you will need Detectron2 & DensePose, which are not required for inference on datasets. Install the packages according to your needs.
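
As a quick reference, a typical setup might look like the sketch below. This is a minimal sketch, not the authoritative procedure; the Python version and the `requirements.txt` file name are assumptions here, so defer to the [Installation Guide](https://github.com/Zheng-Chong/CatVTON/blob/main/INSTALL.md) where they differ.

```PowerShell
# Minimal setup sketch -- see INSTALL.md for the authoritative steps
git clone https://github.com/Zheng-Chong/CatVTON.git
cd CatVTON
conda create -n catvton python=3.9 -y   # Python version is an assumption
conda activate catvton
pip install -r requirements.txt         # assumes the repo ships a requirements.txt
# Detectron2 & DensePose are only needed for deploying the app
# (automatic mask generation), not for inference on datasets.
```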

## Deployment 
### ComfyUI Workflow
We have modified the main code to enable easy deployment of CatVTON on [ComfyUI](https://github.com/comfyanonymous/ComfyUI). Because this code structure is not directly compatible with the main repository, we have released it separately in the [Releases](https://github.com/Zheng-Chong/CatVTON/releases/tag/ComfyUI), which include the code to be placed under `custom_nodes` of ComfyUI and our workflow JSON files.

To deploy CatVTON to your ComfyUI, follow these steps (a command sketch follows the list):
1. Install all the requirements for both CatVTON and ComfyUI; refer to the [Installation Guide for CatVTON](https://github.com/Zheng-Chong/CatVTON/blob/main/INSTALL.md) and the [Installation Guide for ComfyUI](https://github.com/comfyanonymous/ComfyUI?tab=readme-ov-file#installing).
2. Download [`ComfyUI-CatVTON.zip`](https://github.com/Zheng-Chong/CatVTON/releases/download/ComfyUI/ComfyUI-CatVTON.zip) and unzip it in the `custom_nodes` folder under your ComfyUI project (cloned from [ComfyUI](https://github.com/comfyanonymous/ComfyUI)).
3. Run ComfyUI.
4. Download [`catvton_workflow.json`](https://github.com/Zheng-Chong/CatVTON/releases/download/ComfyUI/catvton_workflow.json), drag it into your ComfyUI webpage, and enjoy 😆!
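
Steps 2–4 can be scripted roughly as follows. This is a minimal sketch assuming a Unix-like shell with `wget` and `unzip` available and a ComfyUI checkout at `./ComfyUI`; only the release URL comes from the steps above.

```PowerShell
# Minimal sketch of steps 2-4 (assumes wget/unzip and a ComfyUI checkout at ./ComfyUI)
cd ComfyUI/custom_nodes
wget https://github.com/Zheng-Chong/CatVTON/releases/download/ComfyUI/ComfyUI-CatVTON.zip
unzip ComfyUI-CatVTON.zip
cd ..
python main.py   # start ComfyUI, then drag catvton_workflow.json into the webpage
```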

> If you encounter problems under Windows OS, please refer to [issue#8](https://github.com/Zheng-Chong/CatVTON/issues/8).

When you run the CatVTON workflow for the first time, the weight files will be downloaded automatically, which usually takes tens of minutes.

 

### Gradio App

To deploy the Gradio App for CatVTON on your machine, run the following command, and checkpoints will be automatically downloaded from HuggingFace.

```PowerShell
CUDA_VISIBLE_DEVICES=0 python app.py \
--output_dir="resource/demo/output" \
--mixed_precision="bf16" \
--allow_tf32
```
When using `bf16` precision, generating results at a resolution of `1024x768` requires only about `8G` of VRAM.

## Inference
### 1. Data Preparation
Before inference, you need to download the [VITON-HD](https://github.com/shadow2496/VITON-HD) or [DressCode](https://github.com/aimagelab/dress-code) dataset.
Once the datasets are downloaded, the folder structures should look as follows:
```
├── VITON-HD
│   ├── test_pairs_unpaired.txt
│   ├── test
│   │   ├── image
│   │   │   ├── [000006_00.jpg | 000008_00.jpg | ...]
│   │   ├── cloth
│   │   │   ├── [000006_00.jpg | 000008_00.jpg | ...]
│   │   ├── agnostic-mask
│   │   │   ├── [000006_00_mask.png | 000008_00_mask.png | ...]
...
```
For the DressCode dataset, we provide [our preprocessed agnostic masks](https://drive.google.com/drive/folders/1uT88nYQl0n5qHz6zngb9WxGlX4ArAbVX?usp=share_link); download them and place them in the `agnostic_masks` folder under each category.
```
├── DressCode
│   ├── test_pairs_paired.txt
│   ├── test_pairs_unpaired.txt
│   ├── [dresses | lower_body | upper_body]
│   │   ├── test_pairs_paired.txt
│   │   ├── test_pairs_unpaired.txt
│   │   ├── images
│   │   │   ├── [013563_0.jpg | 013563_1.jpg | 013564_0.jpg | 013564_1.jpg | ...]
│   │   ├── agnostic_masks
│   │   │   ├── [013563_0.png | 013564_0.png | ...]
...
```

### 2. Inference on VITON-HD/DressCode
To run inference on the DressCode or VITON-HD dataset, run the following command; checkpoints will be downloaded automatically from HuggingFace.

```PowerShell
CUDA_VISIBLE_DEVICES=0 python inference.py \
--dataset [dresscode | vitonhd] \
--data_root_path <path> \
--output_dir <path> \
--dataloader_num_workers 8 \
--batch_size 8 \
--seed 555 \
--mixed_precision [no | fp16 | bf16] \
--allow_tf32 \
--repaint \
--eval_pair
```
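
For instance, an unpaired run on VITON-HD might look like the example below; the dataset and output paths are placeholders, so adjust them to your setup.

```PowerShell
# Example invocation (illustrative paths; omit --eval_pair for the unpaired setting)
CUDA_VISIBLE_DEVICES=0 python inference.py \
--dataset vitonhd \
--data_root_path ./data/VITON-HD \
--output_dir ./output/vitonhd-unpaired \
--dataloader_num_workers 8 \
--batch_size 8 \
--seed 555 \
--mixed_precision bf16 \
--allow_tf32 \
--repaint
```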
### 3. Calculate Metrics

After obtaining the inference results, calculate the metrics using the following command: 

```PowerShell
CUDA_VISIBLE_DEVICES=0 python eval.py \
--gt_folder <your_path_to_gt_image_folder> \
--pred_folder <your_path_to_predicted_image_folder> \
--paired \
--batch_size=16 \
--num_workers=16
```

-  `--gt_folder` and `--pred_folder` should be folders that contain **only images**.
- To evaluate the results in a paired setting, use `--paired`; for an unpaired setting, simply omit it.
- `--batch_size` and `--num_workers` should be adjusted based on your machine.
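
For example, a paired evaluation on VITON-HD might look like the sketch below; the ground-truth folder here is the dataset's `test/image` directory, and the prediction path is a placeholder.

```PowerShell
# Example: paired evaluation (illustrative paths)
CUDA_VISIBLE_DEVICES=0 python eval.py \
--gt_folder ./data/VITON-HD/test/image \
--pred_folder ./output/vitonhd-paired \
--paired \
--batch_size=16 \
--num_workers=16
```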


## Acknowledgement
Our code is built on [Diffusers](https://github.com/huggingface/diffusers). We adopt [Stable Diffusion v1.5 inpainting](https://huggingface.co/runwayml/stable-diffusion-inpainting) as the base model, and use [SCHP](https://github.com/GoGoDuck912/Self-Correction-Human-Parsing/tree/master) and [DensePose](https://github.com/facebookresearch/DensePose) to automatically generate masks in our [Gradio](https://github.com/gradio-app/gradio) App and [ComfyUI](https://github.com/comfyanonymous/ComfyUI) workflow. Thanks to all the contributors!

## License
All the materials, including code, checkpoints, and demo, are made available under the [Creative Commons BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/) license. You are free to copy, redistribute, remix, transform, and build upon the project for non-commercial purposes, as long as you give appropriate credit and distribute your contributions under the same license.


## Citation

```bibtex
@misc{chong2024catvtonconcatenationneedvirtual,
  title={CatVTON: Concatenation Is All You Need for Virtual Try-On with Diffusion Models},
  author={Zheng Chong and Xiao Dong and Haoxiang Li and Shiyue Zhang and Wenqing Zhang and Xujie Zhang and Hanqing Zhao and Xiaodan Liang},
  year={2024},
  eprint={2407.15886},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2407.15886},
}
```