STAIR Captions
A Large-Scale Japanese Image Caption Dataset
Accepted as an ACL 2017 Short Paper
Abstract
In recent years, automatic generation of image descriptions (captions), that is, image captioning, has attracted a great deal of attention. In this paper, we particularly consider generating Japanese captions for images. Most studies on image captioning target the English language, and there are few image caption datasets in Japanese. To tackle this problem, we construct a large-scale Japanese image caption dataset based on images from MS-COCO. Our dataset consists of 820,310 Japanese captions for 164,062 images. In our experiments, we show that a neural network trained on our dataset can generate more natural and better Japanese captions than those obtained by first generating English captions and then applying English-to-Japanese machine translation.
Statistics
| | STAIR Captions | YJ Captions [1] |
|---|---|---|
| # of images | 164,062 (123,287) | 26,500 |
| # of captions | 820,310 (616,435) | 131,730 |
| Vocabulary size | 35,642 (31,938) | 13,274 |
| Avg. # of chars. | 23.79 (23.80) | 23.23 |
The table summarizes the statistics of STAIR Captions.
Numbers in brackets are the statistics of the public part of STAIR Captions.
Compared with YJ Captions [1], our dataset contains 6.23x as many Japanese captions and 6.19x as many images overall.
Even for the public part of our dataset, the numbers of images and Japanese captions are 4.65x and 4.67x greater than those in YJ Captions, respectively.
These large numbers of images and captions matter for image caption generation because they reduce the possibility of unknown scenes and objects appearing in the test images.
The vocabulary of our dataset is also 2.69x larger than that of YJ Captions. Because of this large vocabulary, a caption generation model trained on our dataset is expected to learn and generate a wider variety of captions.
The average number of characters per sentence is almost the same in our dataset and in YJ Captions.
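Since STAIR Captions is built on MS-COCO images, a natural distribution format is MS-COCO-style annotation JSON. Assuming such a COCO-style file (the file name and exact field layout below are placeholders, not guaranteed by this page), the image count, caption count, and average caption length in the table can be recomputed with a short Python script. Vocabulary size additionally requires a Japanese morphological analyzer (e.g., MeCab), which this sketch omits.

```python
import json

# Placeholder file name: assumes an MS-COCO-style annotation JSON
# with "images" and "annotations" lists.
with open("stair_captions_train.json", encoding="utf-8") as f:
    data = json.load(f)

images = data["images"]            # list of {"id": ..., "file_name": ..., ...}
annotations = data["annotations"]  # list of {"image_id": ..., "caption": ...}

num_images = len(images)
num_captions = len(annotations)
# Japanese captions: measure length in characters, as in the table above.
avg_chars = sum(len(a["caption"]) for a in annotations) / num_captions

print(f"# of images:        {num_images}")
print(f"# of captions:      {num_captions}")
print(f"avg. chars/caption: {avg_chars:.2f}")
```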
[1] Takeshi Miyazaki and Nobuyuki Shimizu, "Cross-Lingual Image Caption Generation," Annual Meeting of the Association for Computational Linguistics (ACL), 2016.
Japanese Image Caption Generation
We performed experiments in which Japanese image captions are generated by a neural network trained on STAIR Captions. The table below shows the performance of two methods, MS-COCO + MT and STAIR Captions. MS-COCO + MT first generates English captions for images using a neural network trained on the original MS-COCO dataset, and then translates the generated captions into Japanese using Google Translate. In contrast, STAIR Captions generates Japanese captions directly, using a neural network trained on the STAIR Captions dataset. For details, see our papers.
| | BLEU-1 | BLEU-2 | BLEU-3 | BLEU-4 | ROUGE_L | CIDEr |
|---|---|---|---|---|---|---|
| MS-COCO + MT | 0.565 | 0.330 | 0.204 | 0.127 | 0.449 | 0.324 |
| STAIR Captions | 0.763 | 0.614 | 0.492 | 0.385 | 0.553 | 0.883 |
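For reference, metrics of this kind can be computed with the coco-caption evaluation toolkit (available on PyPI as pycocoevalcap). The sketch below is a minimal illustration, not the paper's exact evaluation pipeline: the toy captions and their segmentation into space-separated morphemes (as a tokenizer such as MeCab would produce) are assumptions made here for the example.

```python
# Minimal metric computation with the coco-caption toolkit (pycocoevalcap).
# Captions must be pre-tokenized into space-separated tokens; for Japanese
# this assumes a morphological analyzer (e.g., MeCab) was applied beforehand.
from pycocoevalcap.bleu.bleu import Bleu
from pycocoevalcap.rouge.rouge import Rouge
from pycocoevalcap.cider.cider import Cider

# Toy data: map an image id to its reference captions (gts) and to a
# single generated caption (res). BLEU requires exactly one candidate
# caption per image id.
gts = {1: ["犬 が 芝生 の 上 を 走っ て いる", "茶色 の 犬 が 走る"]}
res = {1: ["犬 が 芝生 を 走っ て いる"]}

bleu, _ = Bleu(4).compute_score(gts, res)   # [BLEU-1, BLEU-2, BLEU-3, BLEU-4]
rouge, _ = Rouge().compute_score(gts, res)  # ROUGE_L
cider, _ = Cider().compute_score(gts, res)  # CIDEr

print("BLEU-1..4:", [round(b, 3) for b in bleu])
print("ROUGE_L:  ", round(rouge, 3))
print("CIDEr:    ", round(cider, 3))
```

Note that CIDEr is corpus-level (its tf-idf weighting is computed over the whole evaluation set), so scores on a single toy example are not meaningful; in a real evaluation, gts and res would cover all test images.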
Our Publication list
- Yuya Yoshikawa, Yutaro Shigeto, Akikazu Takeuchi, "STAIR Captions: Constructing a Large-Scale Japanese Image Caption Dataset," Annual Meeting of the Association for Computational Linguistics (ACL), Short Paper, 2017. [arXiv, poster]
- Yuya Yoshikawa, Yutaro Shigeto, Akikazu Takeuchi, "STAIR Captions: A Large-Scale Japanese Image Caption Dataset," the 23rd Annual Meeting of the Association for Natural Language Processing (NLP2017), 2017. (In Japanese) [PDF]
Bibliography
If you use the STAIR Captions dataset, please cite the following paper.
@InProceedings{Yoshikawa2017,
author = {Yoshikawa, Yuya and Shigeto, Yutaro and Takeuchi, Akikazu},
title = {STAIR Captions: Constructing a Large-Scale Japanese Image Caption Dataset},
booktitle = {Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)},
month = {July},
year = {2017},
address = {Vancouver, Canada},
publisher = {Association for Computational Linguistics},
pages = {417--421},
url = {http://aclweb.org/anthology/P17-2066}
}
About
Owner
Software Technology and Artificial Intelligence Research Laboratory (STAIR Lab)
Chiba Institute of Technology
Contributors
- Akikazu Takeuchi, STAIR Lab
- Yuya Yoshikawa, STAIR Lab
- Yutaro Shigeto, Nara Institute of Science and Technology (Internship)
- Teruhisa Kamachi, Mokha Inc.
- Hiroshi Ogino, Scaleout Inc.
- Student volunteers at Chiba Institute of Technology, and about 2,100 crowd workers
Contact
竹内彰一 (Akikazu Takeuchi)
takeuchi–at–stair.center