|
--- |
|
license: apache-2.0 |
|
base_model: stabilityai/stable-diffusion-xl-base-1.0 |
|
tags: |
|
- stable-diffusion-xl |
|
- stable-diffusion-xl-diffusers |
|
- text-to-image |
|
- diffusers |
|
- controlnet |
|
inference: false |
|
language: |
|
- en |
|
pipeline_tag: text-to-image |
|
--- |
|
|
|
# Softedge ControlNet |
|
EcomXL contains a series of text-to-image diffusion models optimized for e-commerce scenarios, developed based on [Stable Diffusion XL](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0).<br/>

The ControlNet weights are fine-tuned from stable-diffusion-xl-base-1.0.

It works well with SDXL as well as community models based on SDXL.

The model is trained on both general-domain data and Taobao e-commerce data, so it performs well in general scenarios as well as e-commerce scenarios.
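Because the ControlNet was fine-tuned against the SDXL base weights, it can also be attached to other SDXL-family checkpoints. A minimal sketch with diffusers (the checkpoint id below is a placeholder for whichever community SDXL model you use; the full pipeline setup is shown in the Usage section):

```python
import torch
from diffusers import ControlNetModel, StableDiffusionXLControlNetPipeline

# EcomXL softedge ControlNet, fine-tuned from stable-diffusion-xl-base-1.0
controlnet = ControlNetModel.from_pretrained(
    "alimama-creative/EcomXL_controlnet_softedge", torch_dtype=torch.float16
)

pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "your-org/your-sdxl-community-checkpoint",  # placeholder: any SDXL-based checkpoint
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")
```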
|
|
|
## Examples |
|
These examples were generated using AUTOMATIC1111/stable-diffusion-webui.
|
|
|
|
|
`softedge`|`weight-0.6`|`weight-0.8`
:--:|:--:|:--:
![image](./images/1_0.png) | ![image](./images/1_1.png) | ![image](./images/1_2.png)
![image](./images/2_0.png) | ![image](./images/2_1.png) | ![image](./images/2_2.png)
![image](./images/3_0.png) | ![image](./images/3_1.png) | ![image](./images/3_2.png)
![image](./images/4_0.png) | ![image](./images/4_1.png) | ![image](./images/4_2.png)
|
|
|
|
|
## Usage with Diffusers |
|
```python
import torch
from diffusers import (
    AutoencoderKL,
    ControlNetModel,
    DPMSolverMultistepScheduler,
    StableDiffusionXLControlNetPipeline,
)
from diffusers.utils import load_image
from controlnet_aux import PidiNetDetector, HEDdetector

# Load the EcomXL softedge ControlNet and the fp16-fixed SDXL VAE
controlnet = ControlNetModel.from_pretrained(
    "alimama-creative/EcomXL_controlnet_softedge", torch_dtype=torch.float16, use_safetensors=True
)
vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16)
pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=controlnet,
    vae=vae,
    torch_dtype=torch.float16,
)

pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
# pipe.enable_xformers_memory_efficient_attention()
pipe.to("cuda")
pipe.enable_vae_slicing()

# Prepare the soft-edge condition image (HEDdetector can be used as an alternative preprocessor)
image = load_image(
    "https://huggingface.co/alimama-creative/EcomXL_controlnet_softedge/resolve/main/images/1_1.png"
)
edge_processor = PidiNetDetector.from_pretrained("lllyasviel/Annotators")
edge_image = edge_processor(image, safe=False)  # set safe=True to use the pidisafe variant

prompt = "a bottle on the Twilight Grassland, Sitting on the ground, a couple of tall grass sitting in a field of tall grass, sunset,"
negative_prompt = "low quality, bad quality, sketches"

output = pipe(
    prompt,
    negative_prompt=negative_prompt,
    image=edge_image,
    num_inference_steps=25,
    controlnet_conditioning_scale=0.6,
    guidance_scale=7,
    width=1024,
    height=1024,
).images[0]

output.save("test_edge.png")
```
|
The model performs well when the ControlNet weight (`controlnet_conditioning_scale`) is in the range 0.6 to 0.8.
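To reproduce a comparison like the one in the Examples section, you can sweep `controlnet_conditioning_scale` with the pipeline from the snippet above. This is a minimal sketch that assumes `pipe`, `edge_image`, `prompt`, and `negative_prompt` are already defined as shown earlier:

```python
# Reuses `pipe`, `edge_image`, `prompt`, and `negative_prompt` from the snippet above.
for scale in (0.6, 0.7, 0.8):
    # Re-seed each run so every weight starts from the same initial noise.
    generator = torch.Generator(device="cuda").manual_seed(42)
    result = pipe(
        prompt,
        negative_prompt=negative_prompt,
        image=edge_image,
        num_inference_steps=25,
        controlnet_conditioning_scale=scale,
        guidance_scale=7,
        width=1024,
        height=1024,
        generator=generator,
    ).images[0]
    result.save(f"test_edge_scale_{scale}.png")
```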
|
|
|
## Training details |
|
|
|
Mixed precision: FP16<br/>
Learning rate: 1e-5<br/>
Batch size: 1024<br/>
Noise offset: 0.05<br/>
Training steps: 37k
|
The training data includes 12M images from LAION-2B and internal sources with an aesthetic score of 6 or higher, as well as 3M Taobao e-commerce images. During training, the softedge preprocessor is randomly selected from pidinet, hed, pidisafe and hedsafe, the variants officially supported by AUTOMATIC1111's sd-webui and Mikubill's sd-webui-controlnet extension. The model performs well when the ControlNet weight is in the range 0.6 to 0.8.
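The random preprocessor selection described above can be approximated with `controlnet_aux`, since its `safe` flag switches pidinet/hed to their pidisafe/hedsafe variants. This is a hedged sketch of that augmentation, not the actual training code:

```python
import random
from controlnet_aux import PidiNetDetector, HEDdetector
from diffusers.utils import load_image

# The four soft-edge variants mentioned above: pidinet, hed, pidisafe, hedsafe.
pidi = PidiNetDetector.from_pretrained("lllyasviel/Annotators")
hed = HEDdetector.from_pretrained("lllyasviel/Annotators")
preprocessors = [
    lambda img: pidi(img, safe=False),  # pidinet
    lambda img: hed(img, safe=False),   # hed
    lambda img: pidi(img, safe=True),   # pidisafe
    lambda img: hed(img, safe=True),    # hedsafe
]

image = load_image(
    "https://huggingface.co/alimama-creative/EcomXL_controlnet_softedge/resolve/main/images/1_1.png"
)
edge_image = random.choice(preprocessors)(image)  # pick one variant at random, as in training
```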
|
|
|
|