Spaces:

multimodalart
/

stable-diffusion-inpainting

App Files Files Community

stable-diffusion-inpainting / clipseg /Readme.md

apolinario's picture

upload clipseg

48fa639 about 2 years ago

|

3.34 kB

	# Image Segmentation Using Text and Image Prompts
	This repository contains the code used in the paper ["Image Segmentation Using Text and Image Prompts"](https://arxiv.org/abs/2112.10003).

	The Paper has been accepted to CVPR 2022!

	<img src="overview.png" alt="drawing" height="200em"/>

	The systems allows to create segmentation models without training based on:
	- An arbitrary text query
	- Or an image with a mask highlighting stuff or an object.

	### Quick Start

	In the `Quickstart.ipynb` notebook we provide the code for using a pre-trained CLIPSeg model. If you run the notebook locally, make sure you downloaded the `rd64-uni.pth` weights, either manually or via git lfs extension.
	It can also be used interactively using [MyBinder](https://mybinder.org/v2/gh/timojl/clipseg/HEAD?labpath=Quickstart.ipynb)
	(please note that the VM does not use a GPU, thus inference takes a few seconds).


	### Dependencies
	This code base depends on pytorch, torchvision and clip (`pip install git+https://github.com/openai/CLIP.git`).
	Additional dependencies are hidden for double blind review.


	### Datasets

	* `PhraseCut` and `PhraseCutPlus`: Referring expression dataset
	* `PFEPascalWrapper`: Wrapper class for PFENet's Pascal-5i implementation
	* `PascalZeroShot`: Wrapper class for PascalZeroShot
	* `COCOWrapper`: Wrapper class for COCO.

	### Models

	* `CLIPDensePredT`: CLIPSeg model with transformer-based decoder.
	* `ViTDensePredT`: CLIPSeg model with transformer-based decoder.

	### Third Party Dependencies
	For some of the datasets third party dependencies are required. Run the following commands in the `third_party` folder.
	```bash
	git clone https://github.com/cvlab-yonsei/JoEm
	git clone https://github.com/Jia-Research-Lab/PFENet.git
	git clone https://github.com/ChenyunWu/PhraseCutDataset.git
	git clone https://github.com/juhongm999/hsnet.git
	```

	### Weights

	The MIT license does not apply to these weights.

	We provide two model weights, for D=64 (4.1MB) and D=16 (1.1MB).
	```
	wget https://owncloud.gwdg.de/index.php/s/ioHbRzFx6th32hn/download -O weights.zip
	unzip -d weights -j weights.zip
	```


	### Training and Evaluation

	To train use the `training.py` script with experiment file and experiment id parameters. E.g. `python training.py phrasecut.yaml 0` will train the first phrasecut experiment which is defined by the `configuration` and first `individual_configurations` parameters. Model weights will be written in `logs/`.

	For evaluation use `score.py`. E.g. `python score.py phrasecut.yaml 0 0` will train the first phrasecut experiment of `test_configuration` and the first configuration in `individual_configurations`.


	### Usage of PFENet Wrappers

	In order to use the dataset and model wrappers for PFENet, the PFENet repository needs to be cloned to the root folder.
	`git clone https://github.com/Jia-Research-Lab/PFENet.git `


	### License

	The source code files in this repository (excluding model weights) are released under MIT license.

	### Citation
	```
	@InProceedings{lueddecke22_cvpr,
	author = {L\"uddecke, Timo and Ecker, Alexander},
	title = {Image Segmentation Using Text and Image Prompts},
	booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
	month = {June},
	year = {2022},
	pages = {7086-7096}
	}

	```