
A Visual Assistance Recognition Application Based on TextCaps

Li Hanxin, Li Dan
School of Computer Science and Software, Jincheng College of Sichuan University

Abstract


China is home to nearly twenty million visually impaired people. Although the country's network of tactile paving is the longest in the world, it sees disappointingly little use: obstructed tactile paths and bulky navigation equipment are the main barriers for visually impaired pedestrians. Computer vision offers advanced models for such road-surface recognition tasks, but most depend on heavy or expensive hardware. This paper therefore adopts a lightweight image recognition model, TextCaps, as the basis of a road-surface recognition framework for the visually impaired. TextCaps is a capsule-network-based model designed for very small datasets; it retains the capsule network's ability to capture spatial features while effectively mitigating both poor image quality and the scarcity of training data. We exploit this spatial capability to classify road surfaces under varied conditions, including different lighting, obstacles, and indoor versus outdoor scenes, from a small dataset, effectively capturing the spatial features of each. On low-resolution RGB images, TextCaps and the baseline capsule network achieve classification accuracies of 66% and 51%, respectively, while conventional CNN architectures such as VGG16 reach up to 84%.
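To make the classification pipeline concrete, the following is a minimal sketch of a capsule-network classifier with dynamic routing, in the spirit of TextCaps [2] and Sabour et al. [3]. It is not the authors' released implementation: the 32x32 input size, the four road-surface classes, and all layer dimensions are illustrative assumptions, while the squash non-linearity, routing update, and margin loss follow the formulation in [3].

import tensorflow as tf

NUM_CLASSES = 4      # assumption: e.g. indoor / outdoor / obstacle / low-light
CAPS_DIM = 16        # dimensionality of each class capsule
ROUTING_ITERS = 3    # dynamic-routing iterations, as in [3]

def squash(v, axis=-1):
    # Non-linearity from [3]: keeps the vector's orientation, maps its length into [0, 1).
    sq_norm = tf.reduce_sum(tf.square(v), axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * v / tf.sqrt(sq_norm + 1e-8)

class ClassCapsules(tf.keras.layers.Layer):
    # Routes primary capsules to NUM_CLASSES output capsules by dynamic routing.
    def build(self, input_shape):  # input: (batch, num_primary, primary_dim)
        self.W = self.add_weight(
            name="W", initializer="glorot_uniform",
            shape=(1, input_shape[1], NUM_CLASSES, CAPS_DIM, input_shape[2]))

    def call(self, u):
        u = tf.expand_dims(tf.expand_dims(u, 2), -1)       # (b, P, 1, d, 1)
        u_hat = tf.squeeze(tf.matmul(self.W, u), axis=-1)  # predictions (b, P, C, CAPS_DIM)
        logits = tf.zeros_like(u_hat[..., 0])              # routing logits (b, P, C)
        for _ in range(ROUTING_ITERS):
            c = tf.nn.softmax(logits, axis=2)              # coupling coefficients
            v = squash(tf.reduce_sum(c[..., None] * u_hat, axis=1))   # (b, C, CAPS_DIM)
            logits += tf.reduce_sum(u_hat * v[:, None], axis=-1)      # agreement update
        return v

def margin_loss(y_true, v):
    # Margin loss from [3]; y_true is one-hot, a class's score is its capsule length.
    length = tf.norm(v, axis=-1)
    present = y_true * tf.square(tf.maximum(0.0, 0.9 - length))
    absent = 0.5 * (1.0 - y_true) * tf.square(tf.maximum(0.0, length - 0.1))
    return tf.reduce_sum(present + absent, axis=-1)

def build_model():
    inp = tf.keras.Input(shape=(32, 32, 3))            # assumed low-resolution RGB input
    x = tf.keras.layers.Conv2D(64, 9, activation="relu")(inp)
    x = tf.keras.layers.Conv2D(128, 9, strides=2)(x)   # primary-capsule feature maps
    x = tf.keras.layers.Reshape((-1, 8))(x)            # 8-D primary capsule vectors
    x = tf.keras.layers.Lambda(squash)(x)
    return tf.keras.Model(inp, ClassCapsules()(x))

model = build_model()
model.compile(optimizer="adam", loss=margin_loss)
# Prediction: the class whose output capsule is longest, e.g.
#   tf.argmax(tf.norm(model(images), axis=-1), axis=-1)

At inference time the predicted class is simply the capsule with the greatest vector length; the data-generation and decoder components that TextCaps adds for very small datasets are omitted here for brevity.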

Keywords


Capsule; image recognition; visual assistance

Full text:

PDF


References


[1] F. A. Breve and C. N. Fischer, “Visually impaired aid using convolutional neural networks, transfer learning, and particle competition and cooperation,” in Proceedings of the 2020 International Joint Conference on Neural Networks (IJCNN 2020), Glasgow, UK, 2020.

[2] V. Jayasundara, S. Jayasekara, H. Jayasekara, et al., “TextCaps: Handwritten character recognition with very small datasets,” in 2019 IEEE Winter Conference on Applications of Computer Vision (WACV), IEEE, 2019.

[3] S. Sabour, N. Frosst, and G. E. Hinton, “Dynamic routing between capsules,” in NIPS, Long Beach, CA, 2017, pp. 3856–3866.

[4] G. E. Hinton, A. Krizhevsky, and S. D. Wang, “Transforming auto-encoders,” in ICANN, Berlin, Heidelberg, 2011, pp. 44–51.

[5] F. Chollet, “Xception: Deep learning with depthwise separable convolutions,” in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017, pp. 1800–1807.

[6] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” Computational and Biological Learning Society, 2015, pp. 1–14.

[7] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016, pp. 770–778.

[8] K. He, X. Zhang, S. Ren, and J. Sun, “Identity mappings in deep residual networks,” in Computer Vision – ECCV 2016, B. Leibe, J. Matas, N. Sebe, and M. Welling, Eds. Cham: Springer International Publishing, 2016, pp. 630–645.

[9] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, “Rethinking the inception architecture for computer vision,” in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016, pp. 2818–2826.

[10] C. Szegedy, S. Ioffe, V. Vanhoucke, and A. A. Alemi, “Inception-v4, inception-resnet and the impact of residual connections on learning,” in Thirty-first AAAI conference on artificial intelligence, 2017.

[11] A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, and H. Adam, “Mobilenets: Efficient convolutional neural networks for mobile vision applications,” arXiv preprint arXiv:1704.04861, 2017.

[12] G. Huang, Z. Liu, L. van der Maaten, and K. Q. Weinberger, “Densely connected convolutional networks,” in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017, pp. 4700–4708.

[13] B. Zoph, V. Vasudevan, J. Shlens, and Q. V. Le, “Learning transferable architectures for scalable image recognition,” in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018, pp. 8697–8710.

[14] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen, “Mobilenetv2: Inverted residuals and linear bottlenecks,” in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018, pp. 4510–4520.




DOI: http://dx.doi.org/10.18686/jsjxt.v2i4.30226
