# Fine-grained Image Captioning with CLIP Reward
*author: David Wang*
<br />
## Description
This operator generates a caption describing the content of a given image using [CLIPReward](https://arxiv.org/abs/2205.13115). CLIPReward uses CLIP as a reward function, together with a simple finetuning strategy for the CLIP text encoder that improves grammar without requiring extra text annotation, which leads to more descriptive and distinctive captions. This is an adaptation of [j-min/CLIP-Caption-Reward](https://github.com/j-min/CLIP-Caption-Reward).
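To illustrate what "CLIP as a reward function" can mean, here is a minimal sketch that scores a caption by the clipped, scaled cosine similarity between an image embedding and a caption embedding. It is not this operator's code: the function names `cosine_similarity` and `clip_s_reward` are hypothetical, the weight `w=2.5` and the clipping at zero are assumptions based on the CLIPScore formulation, and the toy 3-dimensional vectors stand in for real CLIP embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def clip_s_reward(image_emb, text_emb, w=2.5):
    """Hypothetical reward: w * max(cos(image, text), 0).

    The weight w=2.5 and the clipping at zero are assumptions,
    not values taken from this operator.
    """
    return w * max(cosine_similarity(image_emb, text_emb), 0.0)


# Toy vectors standing in for real CLIP image/text embeddings.
image_emb = [1.0, 0.0, 0.0]
good_caption_emb = [0.8, 0.6, 0.0]   # points mostly in the image's direction
bad_caption_emb = [-1.0, 0.0, 0.0]   # points in the opposite direction

reward_good = clip_s_reward(image_emb, good_caption_emb)
reward_bad = clip_s_reward(image_emb, bad_caption_emb)
print(reward_good > reward_bad)
```

In a reward-based training loop, a caption whose embedding aligns better with the image embedding would receive the higher reward, which is what pushes the captioner toward more image-specific descriptions.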
<br />
## Code Example
Load an image from path './animals.jpg' to generate the caption.
*Write a pipeline with explicit input/output name specifications:*
```python
from towhee.dc2 import pipe, ops, DataCollection
p = (
    pipe.input('url')
    .map('url', 'img', ops.image_decode.cv2_rgb())
    .map('img', 'text', ops.image_captioning.clip_caption_reward(model_name='clipRN50_clips_grammar'))
    .output('img', 'text')
)
DataCollection(p('./animals.jpg')).show()
```
<img src="./tabular.png" alt="result2" style="height:60px;"/>
<br />
## Factory Constructor
Create the operator via the following factory method:
***clip_caption_reward(model_name)***
**Parameters:**
***model_name:*** *str*
The model name of CLIPReward. Supported model names:
- clipRN50_clips_grammar
<br />
## Interface
An image captioning operator takes a [towhee image](link/to/towhee/image/api/doc) as input and generates the corresponding caption.
**Parameters:**
***img:*** *towhee.types.Image (a sub-class of numpy.ndarray)*
The image to caption.
**Returns:** *str*
The caption generated by the model.