diff --git a/README.md b/README.md
index cfcbc6a..b496a0e 100644
--- a/README.md
+++ b/README.md
@@ -1,4 +1,4 @@
-# Image Captioning with BLIP
+# Image Captioning with ClipCap
*author: David Wang*
@@ -9,7 +9,7 @@
## Description
-This operator generates the caption with [BLIP](https://arxiv.org/abs/2201.12086) which describes the content of the given image. This is an adaptation from [salesforce/BLIP](https://github.com/salesforce/BLIP).
+This operator generates a caption with [ClipCap](https://arxiv.org/abs/2111.09734), which describes the content of the given image. ClipCap uses a CLIP encoding as a prefix to the caption: a simple mapping network projects the image encoding into the language model's embedding space, and a fine-tuned language model then generates the caption. This is an adaptation from [rmokady/CLIP_prefix_caption](https://github.com/rmokady/CLIP_prefix_caption).
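+
+The gist of this prefix-conditioning scheme can be sketched in a few lines of PyTorch. This is only an illustrative sketch, not the operator's internals; the class name and dimension choices (CLIP ViT-B/32 embeddings of size 512, GPT-2 hidden size 768, prefix length 10) follow the paper's default setup:
+
+```python
+import torch
+import torch.nn as nn
+
+class PrefixMapper(nn.Module):
+    """Illustrative mapping network: CLIP embedding -> LM prefix tokens."""
+    def __init__(self, clip_dim=512, prefix_len=10, lm_dim=768):
+        super().__init__()
+        self.prefix_len, self.lm_dim = prefix_len, lm_dim
+        hidden = (clip_dim + prefix_len * lm_dim) // 2
+        self.mlp = nn.Sequential(
+            nn.Linear(clip_dim, hidden),
+            nn.Tanh(),
+            nn.Linear(hidden, prefix_len * lm_dim),
+        )
+
+    def forward(self, clip_embedding):
+        # (batch, clip_dim) -> (batch, prefix_len, lm_dim); the result is fed
+        # to the language model as learned "virtual token" embeddings.
+        out = self.mlp(clip_embedding)
+        return out.view(-1, self.prefix_len, self.lm_dim)
+
+mapper = PrefixMapper()
+prefix = mapper(torch.randn(1, 512))  # -> torch.Size([1, 10, 768])
+```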
@@ -17,17 +17,16 @@ This operator generates the caption with [BLIP](https://arxiv.org/abs/2201.12086
## Code Example
-Load an image from path './animals.jpg' to generate the caption.
+Load an image from the path './hulk.jpg' and generate a caption.
*Write the pipeline in simplified style*:
```python
import towhee
-towhee.glob('./animals.jpg') \
+towhee.glob('./hulk.jpg') \
.image_decode() \
- .image_captioning.blip(model_name='blip_base') \
- .select() \
+ .image_captioning.clipcap(model_name='clipcap_coco') \
.show()
```
@@ -37,9 +36,9 @@ towhee.glob('./animals.jpg') \
```python
import towhee
-towhee.glob['path']('./animals.jpg') \
+towhee.glob['path']('./hulk.jpg') \
.image_decode['path', 'img']() \
- .image_captioning.blip['img', 'text'](model_name='blip_base') \
+ .image_captioning.clipcap['img', 'text'](model_name='clipcap_coco') \
.select['img', 'text']() \
.show()
```
@@ -54,14 +53,15 @@ towhee.glob['path']('./animals.jpg') \
Create the operator via the following factory method
-***blip(model_name)***
+***clipcap(model_name)***
**Parameters:**
***model_name:*** *str*
- The model name of BLIP. Supported model names:
-- blip_base
+ The model name of ClipCap. Supported model names:
+- clipcap_coco
+- clipcap_conceptual
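+
+For example, assuming towhee's standard `ops` entry point, the factory can also be called directly (a usage sketch):
+
+```python
+from towhee import ops
+
+# instantiate the operator with the Conceptual Captions weights
+op = ops.image_captioning.clipcap(model_name='clipcap_conceptual')
+```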
@@ -74,7 +74,7 @@ An image-text embedding operator takes a [towhee image](link/to/towhee/image/api
**Parameters:**
- ***data:*** *towhee.types.Image (a sub-class of numpy.ndarray)* or *str*
+ ***data:*** *towhee.types.Image (a sub-class of numpy.ndarray)*
-	The image to generate embedding.
+	The image to generate the caption from.
diff --git a/cap.png b/cap.png
new file mode 100644
index 0000000..34bfa97
Binary files /dev/null and b/cap.png differ
diff --git a/tabular.png b/tabular.png
new file mode 100644
index 0000000..77b1f9a
Binary files /dev/null and b/tabular.png differ