Fine-grained Image Captioning with CLIP Reward
author: David Wang
Description
This operator uses CLIPReward to generate a caption that describes the content of a given image. CLIPReward uses CLIP as the reward function during training, together with a simple fine-tuning strategy for the CLIP text encoder that improves grammar without requiring extra text annotation. The result is more descriptive and distinctive captions. This operator is adapted from j-min/CLIP-Caption-Reward.
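To make the idea of "CLIP as a reward function" concrete, here is a minimal sketch of a similarity-based reward. It is not the operator's code. It assumes a CLIPScore-style reward (cosine similarity between an image embedding and a caption embedding, clipped at zero and scaled by a weight of 2.5), and the embeddings and captions are made-up toy values rather than real CLIP outputs.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def clip_reward(image_emb, caption_emb, weight=2.5):
    """Assumed CLIPScore-style reward: scaled cosine similarity, clipped at 0."""
    return weight * max(cosine_similarity(image_emb, caption_emb), 0.0)


# Toy stand-ins for CLIP embeddings (hypothetical values, not model outputs).
image_emb = [0.9, 0.1, 0.3]
captions = {
    "two zebras grazing in tall grass": [0.8, 0.2, 0.4],
    "some animals": [0.3, 0.9, 0.1],
}

# A caption whose embedding lies closer to the image embedding earns a higher reward.
rewards = {text: clip_reward(image_emb, emb) for text, emb in captions.items()}
best_caption = max(rewards, key=rewards.get)
print(best_caption)
```

During training, a reward of this kind would favour captions that match the specific image over generic ones, which is the intuition behind the more distinctive captions described above.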
Code Example
Load an image from path './animals.jpg' to generate the caption.
Write the pipeline in simplified style:
import towhee
towhee.glob('./animals.jpg') \
.image_decode() \
.image_captioning.clip_caption_reward(model_name='clipRN50_clips_grammar') \
.show()
Write the same pipeline with explicit input/output name specifications:
import towhee
towhee.glob['path']('./animals.jpg') \
.image_decode['path', 'img']() \
.image_captioning.clip_caption_reward['img', 'text'](model_name='clipRN50_clips_grammar') \
.select['img', 'text']() \
.show()
Factory Constructor
Create the operator via the following factory method:
clip_caption_reward(model_name)
Parameters:
model_name: str
The model name of CLIPReward. Supported model names:
- clipRN50_clips_grammar
Interface
An image captioning operator takes a towhee image as input and generates the corresponding caption.
Parameters:
img: towhee.types.Image (a sub-class of numpy.ndarray)
The image to generate a caption for.
Returns: str
The caption generated by the model.