diff --git a/README.md b/README.md
index cfcbc6a..b496a0e 100644
--- a/README.md
+++ b/README.md
@@ -1,4 +1,4 @@
-# Image Captioning with BLIP
+# Image Captioning with ClipCap
 
 *author: David Wang*
 
@@ -9,7 +9,7 @@
 
 ## Description
 
-This operator generates the caption with [BLIP](https://arxiv.org/abs/2201.12086) which describes the content of the given image. This is an adaptation from [salesforce/BLIP](https://github.com/salesforce/BLIP).
+This operator uses [ClipCap](https://arxiv.org/abs/2111.09734) to generate a caption that describes the content of the given image. ClipCap uses the CLIP encoding as a prefix to the caption by employing a simple mapping network, and then fine-tunes a language model to generate the image caption. This is an adaptation from [rmokady/CLIP_prefix_caption](https://github.com/rmokady/CLIP_prefix_caption).
@@ -17,17 +17,16 @@ This operator generates the caption with [BLIP](https://arxiv.org/abs/2201.12086
 
 ## Code Example
 
-Load an image from path './animals.jpg' to generate the caption.
+Load an image from path './hulk.jpg' to generate the caption.
 
 *Write the pipeline in simplified style*:
 
 ```python
 import towhee
 
-towhee.glob('./animals.jpg') \
+towhee.glob('./hulk.jpg') \
     .image_decode() \
-    .image_captioning.blip(model_name='blip_base') \
-    .select() \
+    .image_captioning.clipcap(model_name='clipcap_coco') \
     .show()
 ```
 result1
@@ -37,9 +36,9 @@ towhee.glob('./animals.jpg') \
 ```python
 import towhee
 
-towhee.glob['path']('./animals.jpg') \
+towhee.glob['path']('./hulk.jpg') \
     .image_decode['path', 'img']() \
-    .image_captioning.blip['img', 'text'](model_name='blip_base') \
+    .image_captioning.clipcap['img', 'text'](model_name='clipcap_coco') \
     .select['img', 'text']() \
     .show()
 ```
@@ -54,14 +53,15 @@ towhee.glob['path']('./animals.jpg') \
 
 Create the operator via the following factory method
 
-***blip(model_name)***
+***clipcap(model_name)***
 
 **Parameters:**
 
 ​ ***model_name:*** *str*
 
-​ The model name of BLIP. Supported model names:
-- blip_base
+​ The model name of ClipCap. Supported model names:
+- clipcap_coco
+- clipcap_conceptual
@@ -74,7 +74,7 @@ An image-text embedding operator takes a [towhee image](link/to/towhee/image/api
 
 **Parameters:**
 
-​ ***data:*** *towhee.types.Image (a sub-class of numpy.ndarray)* or *str*
+​ ***data:*** *towhee.types.Image (a sub-class of numpy.ndarray)*
 
 ​ The image to generate embedding.
diff --git a/cap.png b/cap.png
new file mode 100644
index 0000000..34bfa97
Binary files /dev/null and b/cap.png differ
diff --git a/tabular.png b/tabular.png
new file mode 100644
index 0000000..77b1f9a
Binary files /dev/null and b/tabular.png differ
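The updated description says ClipCap feeds a CLIP image encoding through a simple mapping network to obtain a prefix for the caption-generating language model. Below is a minimal shape-only sketch of that idea in plain NumPy; the dimensions, layer sizes, and function names are illustrative assumptions, not the operator's actual implementation.

```python
import numpy as np

# Assumed dimensions: a 512-d CLIP image embedding (ViT-B/32 style),
# 768-d language-model token embeddings, and a fixed-length prefix.
CLIP_DIM, GPT_DIM, PREFIX_LEN = 512, 768, 10

rng = np.random.default_rng(0)

# A "simple mapping network": a one-hidden-layer MLP whose output is
# reshaped into PREFIX_LEN pseudo-token embeddings for the language model.
W1 = rng.standard_normal((CLIP_DIM, 2 * GPT_DIM)) * 0.02
W2 = rng.standard_normal((2 * GPT_DIM, PREFIX_LEN * GPT_DIM)) * 0.02

def mapping_network(clip_embedding: np.ndarray) -> np.ndarray:
    """Map one CLIP image embedding to a sequence of prefix embeddings."""
    hidden = np.tanh(clip_embedding @ W1)
    return (hidden @ W2).reshape(PREFIX_LEN, GPT_DIM)

clip_embedding = rng.standard_normal(CLIP_DIM)  # stand-in for a real CLIP encoding
prefix = mapping_network(clip_embedding)
# ClipCap conditions the language model on these PREFIX_LEN embeddings,
# concatenated before the caption tokens, and fine-tunes it to emit the caption.
print(prefix.shape)  # (10, 768)
```

The prefix therefore acts like a learned "visual sentence" prepended to the text: only the mapping network (and optionally the language model) needs training, while CLIP stays frozen.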