Fine-grained Image Captioning with CLIP Reward
author: David Wang
Description
This operator uses CLIPReward to generate a caption describing the content of a given image. CLIPReward uses CLIP as a reward function together with a simple fine-tuning strategy for the CLIP text encoder that improves grammar without requiring extra text annotation, leading to more descriptive and distinctive captions. It is adapted from j-min/CLIP-Caption-Reward.
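To illustrate the idea of reward-guided captioning, here is a minimal, self-contained sketch (not the actual model): `clip_score` is a hypothetical stand-in for CLIP's image-text similarity, and the candidate caption with the highest reward is selected, which favors the more descriptive caption.

```python
# Toy illustration of CLIP-as-reward caption selection. This is NOT the real
# operator: clip_score is a hypothetical stand-in for CLIP's image-text
# similarity, scoring a caption by how many of its words match the image's
# "content" (represented here as a simple tag set).
def clip_score(image_tags, caption):
    # More matched content words -> higher reward, so an accurate, detailed
    # caption beats a short generic one (descriptiveness).
    return len(set(caption.lower().split()) & image_tags)

def best_caption(image_tags, candidates):
    # Pick the candidate with the highest CLIP-style reward.
    return max(candidates, key=lambda c: clip_score(image_tags, c))

tags = {"a", "tabby", "cat", "on", "grass"}
candidates = ["a cat", "a tabby cat on grass", "a dog on a sofa"]
print(best_caption(tags, candidates))  # prints "a tabby cat on grass"
```

The real operator replaces the toy score with CLIP's learned image-text similarity and uses it as a training reward rather than a re-ranking step.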
Code Example
Load an image from path './animals.jpg' to generate the caption.
Write a pipeline with explicit input and output name specifications:

```python
from towhee import pipe, ops, DataCollection

p = (
    pipe.input('url')
        .map('url', 'img', ops.image_decode.cv2_rgb())
        .map('img', 'text', ops.image_captioning.clip_caption_reward(model_name='clipRN50_clips_grammar'))
        .output('img', 'text')
)

DataCollection(p('./animals.jpg')).show()
```
Factory Constructor
Create the operator via the following factory method:
clip_caption_reward(model_name)
Parameters:
model_name: str
The model name of CLIPReward. Supported model names:
- clipRN50_clips_grammar
Interface
An image captioning operator takes a towhee image as input and generates the corresponding caption.
Parameters:
img: towhee.types.Image (a sub-class of numpy.ndarray)
The image to generate a caption for.
Returns: str
The caption generated by the model.