You need to download the dataset from [Facebook AI Image Similarity Challenge: Descriptor Track](https://www.drivendata.org/competitions/79/competition-image-similarity-1-dev/) and use the `./training_images` folder as your training image root. It requires about 165 GB of disk space.
ISC is trained using [contrastive learning](https://lilianweng.github.io/posts/2021-05-31-contrastive/), which is a type of self-supervised training. The training images do not require any labels. We only need to prepare a folder `./training_images`, under which a large number of diverse training images can be stored.
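To make the idea of contrastive learning more concrete, here is a minimal sketch of a generic InfoNCE-style loss in plain Python. It is an illustration only and is not the loss used by the ISC training script; the function names `cosine_similarity` and `info_nce_loss` and the `temperature` default are assumptions made for this example. The anchor and positive stand for embeddings of two augmented views of the same image, and the negatives stand for embeddings of other images.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two non-zero vectors given as lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def info_nce_loss(anchor, positive, negatives, temperature=0.1):
    """Generic InfoNCE loss for one anchor (hypothetical helper, not the ISC loss).

    The loss is low when the anchor is more similar to the positive
    than to any of the negatives.
    """
    pos_logit = cosine_similarity(anchor, positive) / temperature
    logits = [pos_logit] + [
        cosine_similarity(anchor, n) / temperature for n in negatives
    ]
    # Numerically stable log-sum-exp over all candidates.
    max_logit = max(logits)
    log_sum_exp = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
    # Negative log-probability of picking the positive among all candidates.
    return log_sum_exp - pos_logit
```

No labels appear anywhere in this computation: the positive pair comes from augmenting the same image, which is why an unlabeled image folder is enough for training.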
In the original training of [ISC21-Descriptor-Track-1st](https://github.com/lyakaap/ISC21-Descriptor-Track-1st), the training dataset is very large, taking up more than 165 GB of disk space, and training follows a [multi-step training strategy](https://arxiv.org/abs/2104.00298).
In our fine-tuning example, for simplicity, we prepare a small dataset to run on; you can replace it with your own custom dataset.
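Before pointing training at a custom dataset, it can help to check what the folder actually contains. The following is a minimal sketch using only the Python standard library; the helper name `list_training_images` and the set of accepted extensions are assumptions for this example and are not part of the training script.

```python
from pathlib import Path

# Extensions treated as images; this set is an assumption, adjust as needed.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".webp"}


def list_training_images(root="./training_images"):
    """Return a sorted list of image paths found anywhere under `root`.

    No labels are read or required: every image is an unlabeled sample.
    Returns an empty list if `root` is not an existing directory.
    """
    root_dir = Path(root)
    if not root_dir.is_dir():
        return []
    return sorted(
        p
        for p in root_dir.rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )


if __name__ == "__main__":
    images = list_training_images()
    print(f"Found {len(images)} unlabeled training images")
```

Subfolders are searched recursively, so the images can be organized in any directory layout under the root.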
You can modify the [training script](https://towhee.io/image-embedding/isc/src/branch/main/train_isc.py) to suit your needs.
You can also refer to the [original repo](https://github.com/lyakaap/ISC21-Descriptor-Track-1st) and [paper](https://arxiv.org/abs/2112.04323) to learn more about contrastive learning and image instance retrieval.