Abstract

Customization is a growing trend in the fashion product industry, reflecting individual lifestyles. Previous studies have examined virtual footwear try-on in augmented reality (AR) using a depth camera. However, the reliance on a depth camera restricts the practical deployment of this technology. To address this limitation, this research proposes to estimate the six degrees-of-freedom pose of a human foot from a color image using deep learning models. We construct a training dataset consisting of synthetic and real foot images that are automatically annotated. Three convolutional neural network models (deep object pose estimation (DOPE), DOPE2, and You Only Look Once (YOLO)-6D) are trained on the dataset to predict the foot pose in real time. Model performance is evaluated in terms of accuracy, computational efficiency, and training time. A prototype system implementing the best model demonstrates the feasibility of virtual footwear try-on using a red–green–blue camera. Test results also indicate that real training data are necessary to bridge the reality gap in estimating the human foot pose.
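
As an illustration of what a six degrees-of-freedom pose encodes, the following minimal sketch applies a rotation and a translation to the corners of a 3D bounding box and projects them into the image with a pinhole camera model. It is not the implementation used in this work. Keypoint-based estimators such as DOPE and YOLO-6D are generally described as predicting such projected box corners and then recovering the pose with a Perspective-n-Point solver, but that step is omitted here. All function names, camera intrinsics, box dimensions, and pose values are hypothetical.

```python
import math

def rotation_z(angle_rad):
    """3x3 rotation matrix for a rotation about the z axis."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def transform(R, t, p):
    """Map a point from the object (foot) frame to the camera frame: R p + t."""
    return [sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]

def project(K, p_cam):
    """Pinhole projection of a camera-frame point to pixel coordinates."""
    fx, fy, cx, cy = K
    x, y, z = p_cam
    return (fx * x / z + cx, fy * y / z + cy)

def box_corners(length, width, height):
    """Eight corners of an axis-aligned box centered at the object origin."""
    hx, hy, hz = length / 2, width / 2, height / 2
    return [[sx * hx, sy * hy, sz * hz]
            for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)]

# Hypothetical values, chosen only for illustration.
K = (600.0, 600.0, 320.0, 240.0)   # fx, fy, cx, cy in pixels
R = rotation_z(math.radians(30))   # 3 rotational degrees of freedom (only yaw is nonzero here)
t = [0.0, 0.1, 0.8]                # 3 translational degrees of freedom, in meters

# Project the box corners and the box center into the image.
corners_2d = [project(K, transform(R, t, c)) for c in box_corners(0.26, 0.10, 0.08)]
center_2d = project(K, transform(R, t, [0.0, 0.0, 0.0]))
```

The rotation contributes three degrees of freedom and the translation the other three. In a try-on setting, the same transform would place a virtual shoe model in the camera frame before rendering.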
