
# Image Captioning with CaMEL

*author: David Wang*


## Description

This operator generates a caption describing the content of a given image with CaMEL. CaMEL is a Transformer-based architecture for image captioning that leverages the interaction of two interconnected language models which learn from each other during training. The interplay between the two language models follows a mean teacher learning paradigm with knowledge distillation. This operator is an adaptation of aimagelab/camel.
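
To make the mean teacher paradigm concrete, below is a minimal PyTorch sketch of the weight update it implies: the teacher language model tracks an exponential moving average (EMA) of the student's weights while the student is trained to match the teacher's predictions. This is illustrative only (the function name `ema_update` and the decay value are assumptions), not the actual aimagelab/camel training code.

```python
import torch

@torch.no_grad()
def ema_update(student: torch.nn.Module, teacher: torch.nn.Module, decay: float = 0.999) -> None:
    """Move each teacher parameter toward its student counterpart (EMA).

    Called once per training step; the teacher is never updated by gradients.
    """
    for p_s, p_t in zip(student.parameters(), teacher.parameters()):
        p_t.mul_(decay).add_(p_s, alpha=1.0 - decay)
```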


## Code Example

Load an image from the path './image.jpg' and generate a caption for it.

Write the pipeline in simplified style:

```python
import towhee

towhee.glob('./image.jpg') \
      .image_decode() \
      .image_captioning.camel(model_name='camel_mesh') \
      .show()
```

Write the same pipeline with explicit input/output name specifications:

```python
import towhee

towhee.glob['path']('./image.jpg') \
      .image_decode['path', 'img']() \
      .image_captioning.camel['img', 'text'](model_name='camel_mesh') \
      .select['img', 'text']() \
      .show()
```


## Factory Constructor

Create the operator via the following factory method:

```python
camel(model_name)
```

**Parameters:**

*model_name*: str

The model name of CaMEL. Supported model names:

- camel_mesh
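
As a quick illustration, the operator can also be instantiated directly through Towhee's `ops` accessor. This is a sketch assuming the standard Towhee hub name resolution, not an additional documented entry point:

```python
from towhee import ops

# Resolve the operator from the Towhee hub by its registered name.
# 'camel_mesh' is the only supported model name listed above.
op = ops.image_captioning.camel(model_name='camel_mesh')
```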


## Interface

An image captioning operator takes a towhee image as input and generates the corresponding caption.

**Parameters:**

*data*: towhee.types.Image (a subclass of numpy.ndarray)

The image to generate a caption for.

**Returns:** str

The caption generated by the model.

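The input/output contract above can be exercised outside a pipeline. The sketch below assumes the `ops` accessor returns callable operators and that './image.jpg' exists on disk:

```python
import numpy as np
from towhee import ops

decode = ops.image_decode()                                  # path -> towhee image
camel = ops.image_captioning.camel(model_name='camel_mesh')  # image -> caption

img = decode('./image.jpg')
assert isinstance(img, np.ndarray)   # towhee.types.Image subclasses numpy.ndarray

caption = camel(img)
print(type(caption), caption)        # <class 'str'> plus the generated caption
```
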
## More Resources

- What is a Transformer Model? An Engineer's Guide: A transformer model is a neural network architecture that is proficient at converting a particular type of input into a distinct output. Its core strength lies in its ability to handle inputs and outputs of different sequence lengths. It does this by encoding the input into a matrix with predefined dimensions and then combining that with an attention matrix to decode. This transformation unfolds through a sequence of collaborative layers, which deconstruct words into their corresponding numerical representations.

At its heart, a transformer model is a bridge between disparate linguistic structures, employing sophisticated neural network configurations to decode and manipulate human language input. An example of a transformer model is GPT-3, which ingests human language and generates text output.
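
For reference, the attention mechanism mentioned above is usually the scaled dot-product attention of the Transformer; this formula is standard background, not anything specific to CaMEL:

$$
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right) V
$$

Here $Q$, $K$, and $V$ are the query, key, and value matrices, and $d_k$ is the key dimension used to scale the dot products.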
