A Super-Resolution Poster Generation Model Based on Generative Adversarial Networks
Abstract
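... method, in which the second stage of the StackGAN model is replaced with a super-resolution model (GAN_SR3).
We introduce the CLIP model to map Chinese text to high-dimensional encodings, which improves how well the generated image matches the text.
The encoding and random noise are fed into the first stage of StackGAN to generate a low-resolution image, which is then passed to the improved super-resolution model SR3 to obtain a high-resolution poster.
In comparisons with other models on self-built datasets, the experimental results show that GAN_SR3 achieves a higher Inception Score and a lower FID on both the movie poster and web poster datasets.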
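The abstract reports FID (Fréchet Inception Distance) as one of its two metrics, where lower is better. As a minimal sketch of what that number measures, the block below computes the Fréchet distance between Gaussians fitted to two feature sets. The function name `frechet_distance` and the random stand-in features are assumptions made for illustration. A real FID evaluation would use Inception-v3 features extracted from real and generated posters, and the source does not show the authors' evaluation code.

```python
import numpy as np

def frechet_distance(feats_real, feats_gen):
    """Frechet distance between Gaussians fitted to two feature sets.

    Each input is an (n_samples, n_features) array. For an actual FID
    score these would be Inception-v3 features; here the caller
    supplies whatever features it has (assumption for illustration).
    """
    mu_r = feats_real.mean(axis=0)
    mu_g = feats_gen.mean(axis=0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_g = np.cov(feats_gen, rowvar=False)

    # Tr((cov_r cov_g)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of cov_r @ cov_g, which are real and non-negative when
    # both covariances are positive semi-definite. Clipping guards
    # against tiny negative values from floating-point error.
    eigvals = np.linalg.eigvals(cov_r @ cov_g)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(cov_r) + np.trace(cov_g) - 2.0 * tr_sqrt)

# Hypothetical stand-in features (not data from the paper).
rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, size=(2000, 8))   # "real image" features
close = rng.normal(0.0, 1.0, size=(2000, 8))  # same distribution as real
far = rng.normal(1.0, 1.0, size=(2000, 8))    # mean-shifted distribution

fid_same = frechet_distance(real, real)
fid_close = frechet_distance(real, close)
fid_far = frechet_distance(real, far)
```

In this toy setup the distance of a set to itself is zero up to rounding, and the mean-shifted set scores far worse than the set drawn from the same distribution. This is the sense in which a lower FID indicates generated images whose feature statistics are closer to those of the real data.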
DOI: http://dx.doi.org/10.12361/2661-3727-05-06-148173