
improve the readme.

Signed-off-by: wxywb <xy.wang@zilliz.com>
main
wxywb 2 years ago
commit 465b48eb5e
  1. README.md (24 changes)
  2. cap.png (binary)
  3. tabular.png (binary)

24
README.md

@@ -1,4 +1,4 @@
-# Image Captioning with BLIP
+# Image Captioning with ClipCap
 *author: David Wang*
@@ -9,7 +9,7 @@
 ## Description
-This operator generates the caption with [BLIP](https://arxiv.org/abs/2201.12086) which describes the content of the given image. This is an adaptation from [salesforce/BLIP](https://github.com/salesforce/BLIP).
+This operator generates the caption with [ClipCap](https://arxiv.org/abs/2111.09734) which describes the content of the given image. ClipCap uses CLIP encoding as a prefix to the caption, by employing a simple mapping network, and then fine-tunes a language model to generate the image captions. This is an adaptation from [rmokady/CLIP_prefix_caption](https://github.com/rmokady/CLIP_prefix_caption).
 <br />
@@ -17,17 +17,16 @@ This operator generates the caption with [BLIP](https://arxiv.org/abs/2201.12086
 ## Code Example
-Load an image from path './animals.jpg' to generate the caption.
+Load an image from path './hulk.jpg' to generate the caption.
 *Write the pipeline in simplified style*:
 ```python
 import towhee
-towhee.glob('./animals.jpg') \
+towhee.glob('./hulk.jpg') \
 .image_decode() \
-.image_captioning.blip(model_name='blip_base') \
-.select() \
+.image_captioning.clipcap(model_name='clipcap_coco') \
 .show()
 ```
 <img src="./cap.png" alt="result1" style="height:20px;"/>
@@ -37,9 +36,9 @@ towhee.glob('./animals.jpg') \
 ```python
 import towhee
-towhee.glob['path']('./animals.jpg') \
+towhee.glob['path']('./hulk.jpg') \
 .image_decode['path', 'img']() \
-.image_captioning.blip['img', 'text'](model_name='blip_base') \
+.image_captioning.clipcap['img', 'text'](model_name='clipcap_coco') \
 .select['img', 'text']() \
 .show()
 ```
@@ -54,14 +53,15 @@ towhee.glob['path']('./animals.jpg') \
 Create the operator via the following factory method
-***blip(model_name)***
+***clipcap(model_name)***
 **Parameters:**
 ***model_name:*** *str*
- The model name of BLIP. Supported model names:
-- blip_base
+ The model name of ClipCap. Supported model names:
+- clipcap_coco
+- clipcap_conceptual
 <br />
@@ -74,7 +74,7 @@ An image-text embedding operator takes a [towhee image](link/to/towhee/image/api
 **Parameters:**
-***data:*** *towhee.types.Image (a sub-class of numpy.ndarray)* or *str*
+***data:*** *towhee.types.Image (a sub-class of numpy.ndarray)*
  The image to generate embedding.

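The new Description text says ClipCap turns a CLIP encoding into a caption prefix through a simple mapping network. As a rough illustration of that one step, here is a toy sketch in plain Python. It is not the operator's code or the upstream repository's code: the dimensions, the two-layer tanh network, the random weights, and every name in it (`map_to_prefix`, `linear`, `random_matrix`) are assumptions made up for this example.

```python
import math
import random

# Hypothetical toy sizes; real models use far larger dimensions.
CLIP_DIM = 8      # width of the CLIP image embedding
PREFIX_LEN = 3    # number of prefix positions handed to the language model
LM_DIM = 4        # width of one language-model token embedding
HIDDEN = 6        # hidden width of the mapping network


def random_matrix(rows, cols, rng):
    """Random weights standing in for trained parameters."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def linear(vec, weights):
    """Multiply a vector of length `rows` by a rows x cols weight matrix."""
    return [sum(v * row[j] for v, row in zip(vec, weights))
            for j in range(len(weights[0]))]


def map_to_prefix(clip_embedding, w1, w2):
    """Two-layer MLP: one CLIP embedding -> PREFIX_LEN vectors of width LM_DIM."""
    hidden = [math.tanh(x) for x in linear(clip_embedding, w1)]
    flat = linear(hidden, w2)
    # Cut the flat output into PREFIX_LEN consecutive chunks of width LM_DIM.
    return [flat[i * LM_DIM:(i + 1) * LM_DIM] for i in range(PREFIX_LEN)]


rng = random.Random(0)
w1 = random_matrix(CLIP_DIM, HIDDEN, rng)
w2 = random_matrix(HIDDEN, PREFIX_LEN * LM_DIM, rng)

# Stand-ins for a real CLIP output and for the embeddings of 5 caption tokens.
clip_embedding = [rng.gauss(0.0, 1.0) for _ in range(CLIP_DIM)]
caption_embeddings = [[0.0] * LM_DIM for _ in range(5)]

prefix = map_to_prefix(clip_embedding, w1, w2)
lm_input = prefix + caption_embeddings  # prefix first, caption tokens after it
print(len(prefix), len(lm_input), len(lm_input[0]))  # prints: 3 8 4
```

The point of the sketch is only the shape of the data: the image contributes a fixed number of prefix vectors, and the language model then sees them ahead of the caption tokens.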
BIN
cap.png

Binary file not shown. Size: 9.8 KiB

BIN
tabular.png

Binary file not shown. Size: 107 KiB
