Our Implementation of the ZeroCap Baseline Model


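Catalogue:

1. Environment Preparation
2. Image Captioning on MSCOCO
3. Image Captioning on Flickr30k
4. Cross-Domain Image Captioning on MSCOCO
5. Cross-Domain Image Captioning on Flickr30k
6. Citation
7. Acknowledgements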

1. Environment Preparation:

To install the required dependencies, run the following command:

pip install -r requirements.txt
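
If you prefer to keep these dependencies isolated from your system Python, the install step can be wrapped in a virtual environment. The following is a minimal sketch, not part of the repository: the environment directory name `.venv` and the `env_ready` flag are assumptions made for illustration, and the snippet assumes `python3` is on your PATH and that you run it from the repository root.

```shell
#!/bin/sh
# Sketch: install requirements.txt into an isolated virtual environment.
# Assumptions: ".venv" is a hypothetical directory name; python3 is on PATH.
env_ready=0
if [ -f requirements.txt ]; then
  # Create the environment, activate it in the current shell, then install.
  if python3 -m venv .venv \
      && . .venv/bin/activate \
      && python -m pip install -r requirements.txt; then
    env_ready=1
  else
    echo "environment setup failed" >&2
  fi
else
  echo "requirements.txt not found; run this from the repository root" >&2
fi
```

In later sessions, activating the environment again with `. .venv/bin/activate` should be enough before running the scripts below.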

2. Image Captioning on MSCOCO:

To perform image captioning on MSCOCO, run the following commands:

chmod +x ./mscoco_zerocap.sh
./mscoco_zerocap.sh

3. Image Captioning on Flickr30k:

To perform image captioning on Flickr30k, run the following commands:

chmod +x ./flickr30k_zerocap.sh
./flickr30k_zerocap.sh

4. Cross-Domain Image Captioning on MSCOCO:

To perform image captioning on MSCOCO with the language model from the Flickr30k domain, run the following commands:

chmod +x ./flickr30k_to_mscoco_zerocap.sh
./flickr30k_to_mscoco_zerocap.sh

5. Cross-Domain Image Captioning on Flickr30k:

To perform image captioning on Flickr30k with the language model from the MSCOCO domain, run the following commands:

chmod +x ./mscoco_to_flickr30k_zerocap.sh
./mscoco_to_flickr30k_zerocap.sh
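
The four experiments in sections 2 to 5 can also be run back to back. The following is a minimal sketch under stated assumptions, not a script shipped with the repository: it assumes you run it from the directory containing the four scripts, and the per-experiment log file names and the skip-on-missing behavior are choices made for illustration.

```shell
#!/bin/sh
# Sketch: run the four experiment scripts named in this README in sequence.
# Assumption: executed from the directory that contains the scripts.
ran=0
skipped=0
failed=0
for script in mscoco_zerocap.sh flickr30k_zerocap.sh \
              flickr30k_to_mscoco_zerocap.sh mscoco_to_flickr30k_zerocap.sh; do
  if [ ! -f "./$script" ]; then
    # Do not abort the whole run if one script is absent.
    echo "skipping ./$script (not found)" >&2
    skipped=$((skipped + 1))
    continue
  fi
  chmod +x "./$script"
  log="${script%.sh}.log"   # hypothetical log file name, one per experiment
  if "./$script" > "$log" 2>&1; then
    ran=$((ran + 1))
  else
    echo "./$script failed; see $log" >&2
    failed=$((failed + 1))
  fi
done
echo "succeeded: $ran, failed: $failed, skipped: $skipped"
```

Continuing after a failure is deliberate here, so that one broken experiment does not block the remaining ones; the final summary line reports how many scripts succeeded, failed, or were skipped.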

6. Citation:

If you find our code helpful, please cite the original paper:

@article{tewel2021zero,
  title={Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic},
  author={Tewel, Yoad and Shalev, Yoav and Schwartz, Idan and Wolf, Lior},
  journal={arXiv preprint arXiv:2111.14447},
  year={2021}
}

7. Acknowledgements:

We thank the authors of ZeroCap for releasing their code. Our reimplementation of the baseline is based on their original codebase [here].
