Fine-grained Image Captioning with CLIP Reward
author: David Wang
Description
This operator generates a caption describing the content of the given image with CLIPReward. CLIPReward uses CLIP as a reward function, together with a simple fine-tuning strategy for the CLIP text encoder that improves grammar without requiring extra text annotation, yielding more descriptive and distinctive captions. This is an adaptation from j-min/CLIP-Caption-Reward.
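To make the reward idea concrete, below is a minimal sketch (not the operator's internal code) of scoring a candidate caption by CLIP image-text similarity, assuming the openai/CLIP package is installed:

import clip
import torch
from PIL import Image

device = 'cuda' if torch.cuda.is_available() else 'cpu'
model, preprocess = clip.load('RN50', device=device)

def clip_reward(image_path: str, caption: str) -> float:
    """Cosine similarity between CLIP image and text embeddings,
    used as the reward signal for a candidate caption."""
    image = preprocess(Image.open(image_path)).unsqueeze(0).to(device)
    tokens = clip.tokenize([caption]).to(device)
    with torch.no_grad():
        img_feat = model.encode_image(image)
        txt_feat = model.encode_text(tokens)
    img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
    txt_feat = txt_feat / txt_feat.norm(dim=-1, keepdim=True)
    return (img_feat @ txt_feat.T).item()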
Code Example
Load an image from the path './animals.jpg' and generate its caption.
Write the pipeline in simplified style:
import towhee

towhee.glob('./animals.jpg') \
      .image_decode() \
      .image_captioning.clip_caption_reward(model_name='clipRN50_clips_grammar') \
      .show()
Write the same pipeline with explicit input/output name specifications:
import towhee
towhee.glob['path']('./animals.jpg') \
      .image_decode['path', 'img']() \
      .image_captioning.clip_caption_reward['img', 'text'](model_name='clipRN50_clips_grammar') \
      .select['img', 'text']() \
      .show()
Factory Constructor
Create the operator via the following factory method
clip_caption_reward(model_name)
Parameters:
model_name: str
The name of the CLIP-Caption-Reward model to use. Supported model names:
- clipRN50_clips_grammar
Interface
An image captioning operator that takes a towhee image as input and generates the corresponding caption.
Parameters:
img: towhee.types.Image (a subclass of numpy.ndarray)
The image to generate the caption for.
Returns: str
The caption generated by the model.
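The operator can also be constructed and called directly through towhee.ops, outside of a pipeline (a sketch; the exact calling convention may vary slightly across towhee versions):

from towhee import ops

# Build the operators via their factory methods, then call them directly.
decode = ops.image_decode()
caption_op = ops.image_captioning.clip_caption_reward(model_name='clipRN50_clips_grammar')

img = decode('./animals.jpg')   # towhee image (numpy.ndarray subclass)
print(caption_op(img))          # -> str, the generated caption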