
Image Captioning with MAGIC

author: David Wang

Description

This operator uses MAGIC to generate a caption that describes the content of a given image. MAGIC is a simple yet efficient plug-and-play framework that directly combines an off-the-shelf language model (LM), GPT-2, with an image-text matching model, CLIP, for image-grounded text generation. During decoding, MAGIC steers the LM's generation with a CLIP-induced score called the magic score. This score encourages the generated text to be semantically related to the given image while staying coherent with the previously generated context. This operator is adapted from yxuansu / MAGIC.
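The following sketch illustrates one decoding step as described above. It is a minimal, framework-free version based on our reading of the MAGIC paper, in which MAGIC extends contrastive search. For each of the top-k candidate tokens, the score combines three terms:

- the LM probability,
- a degeneration penalty (the maximum cosine similarity to earlier hidden states), and
- the magic score (a softmax over the candidates' CLIP scores between the image and the prefix extended by that candidate).

The helper names (`magic_select`, `softmax`, `cosine`), the weights `alpha` and `beta`, and the toy inputs are hypothetical illustrations. This is not the operator's or yxuansu/MAGIC's actual API, and the real implementation may differ.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def magic_select(
    lm_probs: Sequence[float],
    cand_hidden: Sequence[Sequence[float]],
    prev_hidden: Sequence[Sequence[float]],
    clip_scores: Sequence[float],
    alpha: float,
    beta: float,
) -> int:
    """Pick the next token among k candidates (hypothetical helper).

    lm_probs[i]    -- LM probability p(v_i | x_<t) of candidate i
    cand_hidden[i] -- LM hidden state for candidate i
    prev_hidden    -- LM hidden states of the already generated prefix x_<t
    clip_scores[i] -- CLIP similarity between the image and prefix + v_i
    """
    # Magic score: CLIP scores normalized over the k candidates only.
    magic = softmax(clip_scores)
    best_idx, best_val = 0, -math.inf
    for i, (prob, hidden) in enumerate(zip(lm_probs, cand_hidden)):
        # Degeneration penalty: highest similarity to any earlier hidden state,
        # which discourages repetitive continuations.
        penalty = max((cosine(hidden, h) for h in prev_hidden), default=0.0)
        score = (1 - alpha) * prob - alpha * penalty + beta * magic[i]
        if score > best_val:
            best_idx, best_val = i, score
    return best_idx


# Toy example: candidate 0 is likelier under the LM, candidate 1 matches the image.
probs = [0.6, 0.3]
cands = [[1.0, 0.0], [0.0, 1.0]]
prefix = [[1.0, 0.0]]
clip = [0.0, 5.0]
print(magic_select(probs, cands, prefix, clip, alpha=0.1, beta=0.0))  # 0: LM only
print(magic_select(probs, cands, prefix, clip, alpha=0.1, beta=1.0))  # 1: image wins
```

With `beta=0.0` the image has no influence, so the sketch reduces to contrastive search and keeps the LM's preferred token. Raising `beta` lets the magic score pull the choice toward the candidate that CLIP judges closer to the image.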

Code Example

Load an image from path './image.jpg' to generate the caption.

Write the pipeline in simplified style:

import towhee

towhee.glob('./image.jpg') \
      .image_decode() \
      .image_captioning.magic(model_name='magic_mscoco') \
      .show()

Write the same pipeline with explicit input/output name specifications:

import towhee

towhee.glob['path']('./image.jpg') \
      .image_decode['path', 'img']() \
      .image_captioning.magic['img', 'text'](model_name='magic_mscoco') \
      .select['img', 'text']() \
      .show()

Factory Constructor

Create the operator via the following factory method:

magic(model_name)

Parameters:

model_name: str

The model name of MAGIC. Supported model names:

magic_mscoco

Interface

An image captioning operator takes a towhee image as input and generates the corresponding caption.

Parameters:

data: towhee.types.Image (a sub-class of numpy.ndarray)

The image to generate a caption for.

Returns: str

The caption generated by the model.
