# lightningdot
## Universal-Image-Text-Transformer

Research code for pre-training universal vision and language models.
## Requirements

NVIDIA driver (418.xx), Docker (19.03+), and nvidia-container-toolkit.

```bash
docker pull convaicontainerregistry1.azurecr.io/img-txt
```
## Launching the environment

```bash
# use CUDA_VISIBLE_DEVICES to assign separate GPUs to each container
source launch_container.sh $TXT_DB $IMG_DIR $OUTPUT $PRETRAIN_PATH
# TXT_DB: convaistorage2share2/TXT_DB_v3
# IMG_DIR: convaistorage2share2/Bottom-up-features/adaptive/npy_per_img_id
# OUTPUT: somewhere to store model checkpoints (can be on shared storage)
# PRETRAIN_PATH: path to the pretrained model

# when preprocessing is needed
source launch_container.sh $TXT_DB $IMG_DIR $OUTPUT $PRETRAIN_PATH --prepro
# this makes /db writable

# multi-node training
source launch_container_dist.sh $TXT_DB $IMG_DIR $OUTPUT $PRETRAIN_PATH
```
## Pretrain

```bash
# inside the docker container
horovodrun -np $N_GPU -H localhost:$N_GPU \
    python pretrain.py --config config/config-pretrain-alltask.json
```
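For background, `horovodrun -np $N_GPU` only spawns one worker process per GPU; the training script itself is expected to initialize Horovod and pin each process to its device. A minimal sketch of that standard Horovod/PyTorch pattern, with a placeholder model and loop (this is not the actual `pretrain.py` logic):

```python
import torch
import horovod.torch as hvd

def main():
    # one process per GPU: initialize Horovod and pin this process to its GPU
    hvd.init()
    torch.cuda.set_device(hvd.local_rank())

    model = torch.nn.Linear(768, 768).cuda()  # placeholder model
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

    # wrap the optimizer so gradients are all-reduced across workers
    optimizer = hvd.DistributedOptimizer(
        optimizer, named_parameters=model.named_parameters())

    # make sure every worker starts from identical weights and optimizer state
    hvd.broadcast_parameters(model.state_dict(), root_rank=0)
    hvd.broadcast_optimizer_state(optimizer, root_rank=0)

    for step in range(10):                    # placeholder training loop
        x = torch.randn(32, 768).cuda()
        loss = model(x).pow(2).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

if __name__ == '__main__':
    main()
```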
## Finetune VQA

```bash
horovodrun -np 2 -H localhost:2 \
    python train_vqa.py --config config/config-vqa-bert-2gpu-alldata.json
```
## VQA inference

```bash
# single node only
# please refer to the code for the full command-line options
horovodrun -np $N_GPU -H localhost:$N_GPU \
    python eval_vqa.py --txt_db /db/vqa_test_[base/large]-cased.db/ \
    --img_dir /img/coco_test2015 --checkpoint [NUM] \
    --output_dir /path/to/trained/vqa
```
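If the test-set predictions are to be submitted to the official VQA evaluation server, they need to be packaged as a JSON list of `{"question_id": ..., "answer": ...}` records. A hedged sketch of writing that file, assuming a hypothetical `predictions` mapping from question id to answer string (the exact on-disk output of `eval_vqa.py` may differ):

```python
import json

# hypothetical predictions: question_id -> answer string
predictions = {262148000: 'yes', 262148001: '2'}

# the VQA evaluation server expects a JSON list of
# {"question_id": int, "answer": str} records
results = [{'question_id': qid, 'answer': ans}
           for qid, ans in predictions.items()]

with open('vqa_test_results.json', 'w') as f:
    json.dump(results, f)
```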
## NLVR2 official evaluation

Use the official script to get both accuracy (our validation matched it) and consistency.

```bash
# concatenate all output files
cat $OUTPUT/result/[val/test]_results_${STEP}_rank*.csv > $OUTPUT.csv
python eval/nlvr2.py $OUTPUT.csv ANNOTATION.json
```
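For intuition: accuracy is computed per example, while consistency is the fraction of unique sentences for which *all* associated image pairs are predicted correctly. A rough sketch of both metrics, assuming hypothetical `(identifier, prediction, gold)` triples and assuming the image-pair index is the third field of the identifier (the authoritative definitions live in `eval/nlvr2.py` and the annotation JSON):

```python
from collections import defaultdict

# hypothetical (identifier, prediction, gold) triples; in NLVR2 the
# identifier encodes which image pairs share the same sentence
examples = [
    ('test1-107-0-0', 'True', 'True'),
    ('test1-107-1-0', 'False', 'True'),
    ('test1-208-0-0', 'False', 'False'),
]

correct = [pred == gold for _, pred, gold in examples]
accuracy = sum(correct) / len(correct)

# group examples of the same sentence: drop the image-pair index
# (assumed here to be the third dash-separated field)
groups = defaultdict(list)
for (ident, _, _), ok in zip(examples, correct):
    split, set_id, _pair, sent = ident.split('-')
    groups[(split, set_id, sent)].append(ok)

consistency = sum(all(g) for g in groups.values()) / len(groups)
print(f'accuracy={accuracy:.3f} consistency={consistency:.3f}')
```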
## Referring Expression Comprehension: finetuning and evaluation

```bash
# train on ground-truth (ref, sent) pairs
horovodrun -np $N_GPU -H localhost:$N_GPU \
    python train_re.py --config config/hps-refcoco+.json

# evaluate multiple splits on ground-truth boxes
horovodrun -np $N_GPU -H localhost:$N_GPU \
    python eval_re.py \
    --txt_db /db/refcoco+_val_base-cased.db:/db/refcoco+_testA_base-cased.db:/db/refcoco+_testB_base-cased.db \
    --img_dir /img/visual_grounding_coco_gt \
    --output_dir /storage/refcoco+/bert-base_mlm+itm+mrfr_pretrain-refcoco+_lr1e-4 \
    --checkpoint 26

# evaluate multiple splits on detected boxes
horovodrun -np $N_GPU -H localhost:$N_GPU \
    python eval_re.py \
    --txt_db /db/refcoco+_val_base-cased.db:/db/refcoco+_testA_base-cased.db:/db/refcoco+_testB_base-cased.db \
    --img_dir /img/visual_grounding_det_coco \
    --output_dir /storage/refcoco+/bert-base_mlm+itm+mrfr_pretrain-refcoco+_lr1e-4 \
    --checkpoint 26
```
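Note that `--txt_db` accepts several databases joined by `:`, one per split to evaluate. A minimal illustration of parsing such an argument (the real option handling is in `eval_re.py`):

```python
import argparse

parser = argparse.ArgumentParser()
# several text DBs can be passed at once, separated by ':'
parser.add_argument('--txt_db', type=lambda s: s.split(':'))
args = parser.parse_args(
    ['--txt_db',
     '/db/refcoco+_val_base-cased.db:/db/refcoco+_testA_base-cased.db'])

for db in args.txt_db:
    print('would evaluate split:', db)  # one evaluation pass per split
```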
## Misc

- Without `horovodrun`, the scripts run on a single GPU, which is useful for the debugger (`-m pdb`).
- Try `--pin_mem`; it might give a tiny performance improvement.
- `--img_format [lmdb/lmdb-compress]` trades off memory against CPU.
- Use `--n_workers $N_CPU` to specify the number of data workers (default: 4); a sketch of how these flags map onto PyTorch follows below.
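`--pin_mem` and `--n_workers` correspond to standard PyTorch `DataLoader` arguments. A minimal sketch of the trade-off they control, using a placeholder dataset (the repo's actual loaders are more involved):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.randn(1024, 768))  # placeholder data

loader = DataLoader(
    dataset,
    batch_size=32,
    num_workers=4,    # --n_workers: more workers use more CPU and RAM
                      # but prepare batches faster
    pin_memory=True,  # --pin_mem: page-locked host memory can make
                      # host-to-GPU copies slightly faster
)

for (batch,) in loader:
    if torch.cuda.is_available():
        batch = batch.cuda(non_blocking=True)  # async copy thanks to pinning
    break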