Abstract
Customization is a growing trend in the fashion industry, reflecting individual lifestyles. Previous studies have examined virtual footwear try-on in augmented reality (AR) using a depth camera; however, the need for a depth camera restricts the practical deployment of this technology. To address this problem, this research estimates the six-degrees-of-freedom pose of a human foot from a color image using deep learning models. We construct a training dataset consisting of automatically annotated synthetic and real foot images. Three convolutional neural network models (deep object pose estimation (DOPE), DOPE2, and You Only Look Once (YOLO)-6D) are trained on the dataset to predict the foot pose in real time. Model performance is evaluated using metrics for accuracy, computational efficiency, and training time. A prototype system implementing the best-performing model demonstrates the feasibility of virtual footwear try-on with a red–green–blue (RGB) camera. Test results also indicate that real training data are necessary to bridge the reality gap in estimating human foot pose.