Current approaches to measuring hand motion in hand rehabilitation require the placement of a goniometer, wearable sensors, or markers, which is time-consuming, intrusive, and can hinder natural hand movement.
This project focuses on a markerless approach, where given an image, we aim to estimate the hand and object pose that best matches the observed image.
Markerless motion capture eliminates the need for time-consuming marker placement. Estimating dynamic hand motion from the color and depth images of commercially available RGB-D cameras offers a solution that is non-contact, ubiquitous, and scalable.
We first present a minimal setup to estimate 3D hand shape and pose from a color image, using an efficient neural network running at over 75 fps on a CPU.
To overcome the loss of accuracy caused by depth ambiguity, we propose a simple method that uses mirror reflections to create a multi-view setup. This eliminates the complexity of synchronizing multiple cameras and roughly halves the joint angle error compared with a single-view setup.
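Geometrically, a point observed in a planar mirror is the reflection of the true point across the mirror plane, so the mirror behaves like a second, perfectly synchronized virtual camera. A minimal sketch of this mapping (NumPy; the plane parameters `n` and `d` are assumed to come from a calibration step and are illustrative, not the thesis's actual procedure):

```python
import numpy as np

def mirror_point(x, n, d):
    """Reflect a 3D point x across the mirror plane {p : n.p = d}.

    n must be a unit normal. A point reconstructed from the mirrored
    view can be mapped back to real-world coordinates with this
    reflection, turning one camera plus a mirror into two views.
    """
    return x - 2.0 * (np.dot(n, x) - d) * n
```

Applying the reflection twice returns the original point, which is a quick sanity check on the calibrated plane parameters.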
To handle hand-object interaction, we generate synthetic depth images of subjects with varying body shapes to train a neural network to segment forearms, hands, and objects.
In practice, the initial pose estimate or object segmentation from the neural network is never perfect, but it can be refined with model fitting. We therefore combine these methods to track an articulated hand model during object interaction. The approach scales to a multi-view setup of five synchronized color cameras, achieving an average joint angle error of around 10 degrees when validated against a marker-based motion capture system.
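The model-fitting refinement can be illustrated on a toy problem: adjust the joint angles of a kinematic hand model so that its keypoints match the keypoints predicted by the neural network. The sketch below uses a hypothetical two-joint planar finger and SciPy's least-squares solver; the link lengths, model, and solver choice are illustrative assumptions, not the thesis's actual implementation.

```python
import numpy as np
from scipy.optimize import least_squares

# Hypothetical link lengths for a two-joint planar "finger".
L1, L2 = 1.0, 1.0

def fk(theta):
    """Forward kinematics: joint angles -> stacked 2D joint positions."""
    a1, a2 = theta
    p1 = np.array([L1 * np.cos(a1), L1 * np.sin(a1)])
    p2 = p1 + np.array([L2 * np.cos(a1 + a2), L2 * np.sin(a1 + a2)])
    return np.concatenate([p1, p2])

def refine(theta_init, observed):
    """Refine an initial pose estimate by fitting the articulated model.

    Minimizes the residual between the model's keypoints and the
    (imperfect) keypoints predicted by the neural network.
    """
    res = least_squares(lambda th: fk(th) - observed, theta_init)
    return res.x
```

In the full system the residual would also include terms for the segmented object and for each camera view, but the structure (network prediction as initialization, articulated model fit as refinement) is the same.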