Bidirectional GaitNet: A Bidirectional Prediction Model of Human Gait and Anatomical Conditions

Jungnam Park, Department of Computer Science and Engineering, Seoul National University, South Korea; Moon Seok Park, Department of Orthopaedic Surgery, Seoul National University Bundang Hospital, South Korea; Jehee Lee, NCsoft, South Korea, and Department of Computer Science and Engineering, Seoul National University, South Korea; and Jungdam Won (ORCID 0000-0001-5510-6425), Department of Computer Science and Engineering, Seoul National University, South Korea

(2023)

Abstract.

We present a novel generative model, called Bidirectional GaitNet, that learns the relationship between human anatomy and gait. The simulation model of human anatomy is a comprehensive, full-body, simulation-ready musculoskeletal model with 304 Hill-type musculotendon units. The Bidirectional GaitNet consists of forward and backward models. The forward model predicts the gait pattern of a person with specific physical conditions, while the backward model estimates the physical conditions of a person from their gait pattern. Our simulation-based approach first learns the forward model by distilling the simulation data generated by a state-of-the-art predictive gait simulator and then constructs a Variational Autoencoder (VAE) with the learned forward model as its decoder. Once the VAE is learned, its encoder serves as the backward model. We demonstrate our model on a variety of healthy/impaired gaits and validate it in comparison with physical examination data of real patients.

GaitNet, Musculoskeletal Simulation, Predictive Gait Simulation, Clinical Gait Analysis

Published in: Special Interest Group on Computer Graphics and Interactive Techniques Conference Conference Proceedings (SIGGRAPH ’23 Conference Proceedings), August 6–10, 2023, Los Angeles, CA, USA. DOI: 10.1145/3588432.3591492. ISBN: 979-8-4007-0159-7/23/08. CCS concepts: Computing methodologies: Physical simulation; Motion capture; Reinforcement learning; Learning from demonstrations.

1. Introduction

Realistic simulation of human movement is one of the long-standing challenges in computer graphics. Musculoskeletal simulation has provided stepping stones to reproduce a range of human movements at the biomechanical and anatomical levels, adding significant realism to the movements created. It can also be used to predict how changes in anatomical conditions (e.g., bone deformity, muscle capacity/deficiency, mass distribution) and intrinsic/extrinsic factors (e.g., metabolic energy expenditure, fatigue, pain) affect human movement.

Accurate setting of anatomical conditions is the very first and fundamental process for generating realistic movements in musculoskeletal simulation as it creates the space for anatomically plausible human movements. A number of biomechanical studies have been conducted to accurately model these conditions through various experiments on human subjects and cadavers (Arnold et al., 2010; Rajagopal et al., 2016; Delp et al., 1990, 2007; Carbone et al., 2015). Several standard models for the typical/average human body have been adopted by the research community (Delp et al., 1990, 2007) and those average body models have significantly contributed to recent progress in biomechanics research (Dembia et al., 2020; Lee et al., 2019; Park et al., 2022). Building an accurate musculoskeletal model of a specific (healthy or impaired) person has long been a notoriously difficult challenge since accurate modeling of live organs and tissues often requires invasive measurements.

In this paper, we address the problem of estimating the physiological parameters of a musculoskeletal model from observed gait cycles. Our anatomy model describes the conditions of individual bones and muscles with 300+ parameters. Conceptually speaking, we want to uncover the relationship between human anatomy and gait. Although this relationship is well accepted empirically in biomechanics and clinical gait analysis, the relationship is probabilistic rather than a one-to-one deterministic mapping. For example, many people with different physical conditions can walk with a similar gait, and conversely, there is no guarantee that two people with similar physical conditions will walk with a similar gait. In this paper, we build a novel generative model, which we call Bidirectional GaitNet, based on a comprehensive full-body, simulation-ready, musculoskeletal model. The Bidirectional GaitNet consists of forward and backward models. The forward model is functionally equivalent to predictive gait simulation, which generates a bipedal gait for any anatomical model with a specific set of physical conditions. Conversely, the backward model is its inverse process that estimates the physical conditions of the model given a gait. More specifically, we first learn the forward model by distilling the simulation data generated by a state-of-the-art predictive gait simulator and then construct a conditional Variational Autoencoder (c-VAE) with the forward model as its decoder. Once it is learned, its encoder serves as the backward model. By the nature of VAE, our backward model generates a distribution of physical conditions that potentially produce the input gait in predictive gait simulation, from which many different physical conditions can be sampled.

We demonstrate the power of our model by showing results with a variety of healthy/impaired gaits. The simulated results are validated in comparison to not only unseen simulated gaits but also gaits from real patients. The effectiveness of the non-trivial system design choices that we made in developing our model is also validated by ablation studies. Code for this paper is available at https://github.com/namjohn10/BidirectionalGaitNet.

2. Related work

In musculoskeletal simulation, the human body is typically modeled by rigid bones and flexible musculotendon units. The bones are connected by rotational joints to which the musculotendon units are attached such that they can actuate the joints. The muscle dynamics are often formulated using Hill-type muscles (Zajac, 1989; Delp et al., 2007) composed of contractile and elastic elements, and the dynamics of each element are determined by force-length and force-velocity curves. Accurate simulation requires adequate determination of all anatomical conditions of the bones and the muscles based on reliable measurements. A set of parameters for a typical/average human has been estimated by taking measurements from live tissues and cadavers (Delp et al., 1990; Rajagopal et al., 2016; Arnold et al., 2010; Carbone et al., 2015). Estimates thus obtained have been used as default parameters in many simulation-based studies (Delp et al., 2007; Lee et al., 2019; Dembia et al., 2020). Anthropometric scaling of a musculoskeletal model can generate a range of models with different heights, weights, and limb lengths (Ryu et al., 2021). Building accurate individualized models often requires expensive medical images (e.g., CT and MRI) and labor-intensive image labeling (Matias et al., 2009; Levin et al., 2011; Li et al., 2022).

Building robust dynamic controllers that can drive musculoskeletal models has been regarded as an open challenge for decades because musculoskeletal models are high-dimensional and highly nonlinear, and their control systems are often partly under-actuated and, at the same time, partly over-determined. In this paper, we focus on legged locomotion, although there has been a series of studies on simulating other body parts and activities, such as upper bodies (Lee et al., 2009, 2018), eyes (Nakada et al., 2018), faces (Ichim et al., 2017; Cong et al., 2016), hands (Sachdeva et al., 2015), and swimming (Si et al., 2014).

Biomechanics researchers have developed locomotion controllers to answer research questions, such as how weakening certain muscles affects gait patterns. Open-loop control (Sok et al., 2007; Liu et al., 2010; Al Borno et al., 2013; Falisse et al., 2019), model-based feedback control (Geyer and Herr, 2010; Yin et al., 2007; Lee et al., 2010; Ye and Liu, 2010; Coros et al., 2010; Ha et al., 2012; Liu et al., 2016; Song and Geyer, 2015; Ong et al., 2019), and their combinations (Mordatch and Todorov, 2014) have been explored, where the controller design is often motivated by structural and functional hypotheses on nervous and motor control systems. On the other hand, computer animation researchers have focused on making physically simulated characters move as lifelike as humans by adding actuation constraints imposed by muscle dynamics. Wang et al. (2012) developed walking controllers for 3D biped characters equipped with 8 Hill-type muscles per leg with the muscle-reflex model proposed by Geyer and Herr (2010). Stochastic optimization is used to determine feedback control parameters such that biped characters can maintain their balance while walking. Geijtenbeek et al. (2013) applied a similar approach to non-human musculoskeletal characters, where manually specified initial muscle routings are further optimized to improve the motor skills of the characters. Lee et al. (2014) demonstrated interactive controllers for full-body musculoskeletal characters having more than 100 muscles: an offline optimization first refines the reference motion, and an online controller based on quadratic programming then tracks the refined reference motion at runtime.

Recently, deep reinforcement learning (DRL) has successfully demonstrated its capabilities in solving high-dimensional, continuous control problems including human motion imitation (Yu et al., 2019; Peng et al., 2018; Bergamin et al., 2019; Park et al., 2019; Won et al., 2020; Peng et al., 2021; Merel et al., 2019; Peng et al., 2022; Won et al., 2022), motion control in complex environments (Clegg et al., 2018; Liu and Hodgins, 2018; Won et al., 2021; Yang et al., 2022; Ye et al., 2022; Winkler et al., 2022) and non-human character control (Yu et al., 2018; Luo et al., 2020; Lee et al., 2022; Ishiwaka et al., 2022). The control of musculoskeletal characters is no exception to these technological innovations; in particular, controllers based on DRL have been significantly improved in terms of robustness against external perturbation, computational efficiency at runtime, and the scope of reproducible motor skills. Many simulation algorithms for controlling biped musculoskeletal models competed in the Learn-to-Move challenges (Kidziński et al., 2020) and DRL-based algorithms performed well. The winner of the final challenge proposed a model-based DRL algorithm equipped with an ensemble of probabilistic dynamics models and a risk minimization scheme by expanding the lower confidence bound of the value estimation (Kidziński et al., 2018). Lee et al. (2019) developed a two-level control architecture composed of a trajectory mimicking network and a muscle coordination network, by which comprehensive musculoskeletal characters with more than 300 muscles successfully reproduced highly dynamic human movements such as running, jumping, and cartwheels. Yifeng et al. (2019) proposed a new action space for DRL that mimics the behaviors of musculoskeletal simulation computationally more efficiently with torque-based simulation. Park et al. (2022) presented a predictive gait simulation framework, Generative GaitNet, which can predict a broad spectrum of healthy and pathological gaits of comprehensive musculoskeletal models over a high-dimensional parameter space spanned by anatomical (e.g., bone/muscle parameters, mass distribution, muscle capacity) and gait (e.g., stride and cadence) conditions.

3. Human Anatomy and Gait

The musculoskeletal model in this work is designed to represent healthy and pathological gaits often discussed in clinical gait analysis and medical engineering. Our model consists of 23 bones connected by 22 skeletal joints and 304 musculotendons. The activation of muscles generates contraction forces that drive skeletal joints. The individual bones and muscles are conditioned by anatomical parameters. Let $C_{\mathrm{anatomy}}=(C_{\mathrm{skeleton}},C_{\mathrm{muscle}})$ be the anatomical condition, where $C_{\mathrm{skeleton}}=(c_{\mathrm{head}},c_{\mathrm{trunk}},c_{1},\cdots,c_{8},\tau_{1},\tau_{2})$ represents the scaling factors of the head, the trunk, and the upper and lower segments of the four limbs with respect to a reference (average) model of healthy adults. $\tau_{1}$ and $\tau_{2}$ are the torsional angles of the femurs, which correspond to femoral anteversion and retroversion often observed in pathological gait. Each musculotendon is conditioned by two parameters: weakness and contracture. The weakness parameter of a muscle is the ratio of the maximum isometric force the muscle can exert relative to the reference model. The higher the parameter, the larger the force the muscle can produce. The contracture parameter is the scaling factor of the muscle length relative to the reference model; contracture refers to the permanent shortening of a muscle, which often limits the range of joint movements.

Conventionally, a complete two-step cycle of gait begins with a left heel strike, where both feet are in contact with the ground at the same time, and ends at the next left heel strike. To avoid the singularity at both ends of the gait cycle, two gait cycles are considered the basic unit of gait in our system. Therefore, the gait pattern is represented by a time series of full-body poses $M=(Q_{0},Q_{\delta},Q_{2\delta},\cdots)$ , where each pose $Q=(\mathbf{h},\mathbf{v},\mathbf{q}_{0},\mathbf{q}_{1},\mathbf{q}_{2},\cdots)$ consists of the root (pelvis) height $\mathbf{h}$ from the ground, the root linear velocity $\mathbf{v}$ parallel to the ground plane, and joint rotations $\mathbf{q}_{i}$ with respect to their parent joints. $\mathbf{q}_{0}$ denotes the global orientation of the root body node. We use the first two columns of the rotation matrix to describe each rotation. We sample 60 poses per two gait cycles in our implementation, which determines the sampling interval $\delta$ . The gait is conditioned by two parameters: stride and cadence.
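The two-column rotation encoding mentioned above can be converted back to a full rotation matrix with a Gram-Schmidt step. The following is a minimal sketch under our own assumptions, not the authors' code: the function names `to_two_columns` and `from_two_columns` are hypothetical, and matrices are plain nested lists.

```python
import math

def to_two_columns(R):
    """Flatten the first two columns of a 3x3 rotation matrix into 6 numbers."""
    return [R[0][0], R[1][0], R[2][0], R[0][1], R[1][1], R[2][1]]

def _normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def from_two_columns(six):
    """Rebuild a 3x3 rotation matrix from the 6-number encoding."""
    a, b = six[:3], six[3:]
    x = _normalize(a)
    dot = sum(xi * bi for xi, bi in zip(x, b))
    # Remove the component of b along x, so the columns are orthonormal
    # even when a network output is only approximately a rotation.
    y = _normalize([bi - dot * xi for xi, bi in zip(x, b)])
    # The third column is the cross product of the first two.
    z = [x[1] * y[2] - x[2] * y[1],
         x[2] * y[0] - x[0] * y[2],
         x[0] * y[1] - x[1] * y[0]]
    return [[x[0], y[0], z[0]],
            [x[1], y[1], z[1]],
            [x[2], y[2], z[2]]]

def rot_z(theta):
    """Rotation by theta radians about the z axis (used only for the demo)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

R = rot_z(0.3)
R_back = from_two_columns(to_two_columns(R))
```

Because the encoding is continuous and free of the wrap-around found in angle-based parameterizations, it is a common choice for regression targets; whether the authors reconstruct rotations exactly this way is not stated in the text.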

Figure 1. Predictive gait simulation and intelligent gait analysis.

The predictive gait simulation generates a gait pattern $M$ that likely occurs given anatomical and gait conditions (see Figure 1). Conversely, gait analysis is its inverse process of predicting the anatomical and gait conditions of a given gait pattern. The human musculoskeletal model is a dynamical system that is partly under-actuated because the root node is not actuated and, at the same time, partly over-actuated because the body has more muscles than minimally required to drive the skeletal joints. This redundancy makes the whole process probabilistic.

The predictive power of gait simulation stems from making full use of the laws of physics, providing accurate anatomical modeling, and designing reliable control policies for seemingly unstable biped locomotion. Recent approaches successfully derived robust control policies through deep reinforcement learning (Lee et al., 2019; Kidziński et al., 2018). In this case, the core of a predictive gait simulation is a policy network, which takes the current state of the musculoskeletal model as input and outputs the desired level of activation at all muscles. The state-of-the-art method learned control policies conditioned by high-dimensional conditioning vectors (more than 600 parameters) (Park et al., 2022). The musculoskeletal model with specific physical conditions driven by a policy network generates a trace of full-body poses in physics-based simulation.

In this paper, the key challenge is to find the inverse process of predictive simulation. Given a gait pattern, the goal of gait analysis is to find the corresponding physical conditions. There are three issues to be addressed. First, the search space is high-dimensional. In our system, conditioning vectors are 280-dimensional. State space search or optimization in such a high-dimensional space is often computationally intractable. Second, the solution is not unique. As discussed before, the backward mapping is probabilistic, and thus we need to find a probability distribution over the conditioning space rather than a single optimal solution. Lastly, the computational cost should be reasonable. Rolling out a single gait pattern from a predictive simulator requires the computational cost of physics-based simulation over a complete gait cycle, which becomes substantial when gait patterns are sampled stochastically.

We address these issues by pretraining forward and backward networks. We transfer control policies from policy networks to a forward network representing a direct anatomy-to-gait mapping. The transfer process is similar to policy distillation. It takes samples from the policy network and learns the target network using supervised regression. The pretrained forward network has two advantages. The forward network generates physically-valid gait patterns without actually performing physics-based simulations because it imitates the behavior of the policy network. This direct anatomy-to-gait mapping is computationally more efficient at runtime than predictive gait simulation. This computational efficiency also makes it computationally feasible to pretrain the backward network employing a conditional Variational AutoEncoder (c-VAE) that models the probabilistic mapping between anatomy and gait using Gaussian distributions in latent space.

4. Bidirectional GaitNet

Let $C_{\mathrm{anatomy}}=(C_{\mathrm{skeleton}},C_{\mathrm{muscle}})$ and $C_{\mathrm{gait}}$ be anatomical and gait conditions, respectively. The gait pattern $M=(Q_{0},Q_{\delta},Q_{2\delta},\cdots,Q_{59\delta})$ includes two gait cycles, which are parameterized by phase $\phi\in[0,4\pi]$ . Our Bidirectional GaitNet learns the relationship between anatomy and gait by conditional forward mapping $\mathbf{\hat{F}}(M|C_{\mathrm{anatomy}},C_{\mathrm{gait}})$ and backward mapping $\mathbf{\hat{B}}(C_{\mathrm{anatomy}},C_{\mathrm{gait}}|M)$ . This formulation has several issues in designing network models. The size of the forward network depends on the frame rate $\delta$ and can be larger than actually needed. Given a gait pattern, its stride, cadence, body proportions, and limb lengths are readily available in the motion capture process or can be directly computed from the gait data. To address the issues, we simplify the forward and backward mappings, respectively, by $\mathbf{F}(Q_{\phi}|C_{\mathrm{anatomy}},C_{\mathrm{gait}},\phi)$ and $\mathbf{B}(C_{\mathrm{muscle}}|M,C_{\mathrm{skeleton}},C_{\mathrm{gait}})$ where we added the phase in the input layer of the forward network and reduced the output layer such that it generates a full-body pose $Q_{\phi}$ at phase $\phi$ instead of a full gait pattern. This design decision significantly reduces the complexity of the prediction and allows us to achieve improved accuracy with smaller regression networks. The backward network also becomes smaller by removing the skeleton and gait conditions from the output layer.
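The phase-wise formulation can be illustrated with a short sketch that queries a forward model once per phase and concatenates the poses into a gait pattern. This is only an illustration: `toy_forward` is a hypothetical stand-in for the trained network, and the uniform phase grid $\phi_k = 4\pi k/60$ is our assumption about how the 60 samples are placed.

```python
import math

NUM_FRAMES = 60            # poses per two gait cycles (from the text)
PHASE_SPAN = 4 * math.pi   # two gait cycles

def sample_phases(num_frames=NUM_FRAMES):
    """Assumed uniform phase grid: phi_k = 4*pi*k / num_frames."""
    return [PHASE_SPAN * k / num_frames for k in range(num_frames)]

def assemble_gait(forward_fn, c_anatomy, c_gait):
    """Concatenate phase-wise poses Q_phi into one gait pattern M."""
    return [forward_fn(phi, c_anatomy, c_gait) for phi in sample_phases()]

def toy_forward(phi, c_anatomy, c_gait):
    """Stand-in for the trained network: a 2-value 'pose' periodic in phi."""
    return [math.sin(phi), math.cos(phi)]

M = assemble_gait(toy_forward, c_anatomy=[1.0], c_gait=[1.0, 1.0])
```

With this design the output layer only has to produce one pose, so the network size no longer grows with the number of frames in the gait pattern.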

4.1. Forward GaitNet

The Forward GaitNet is a regression network learned in a supervised manner with training data generated by a predictive gait simulator. In our work, we use Generative GaitNet (Park et al., 2022) to generate a collection of condition-gait tuples $\{(C^{i}_{\mathrm{anatomy}},C^{i}_{\mathrm{gait}},M^{i})\}$ randomly sampled over the domain of anatomy and gait conditions. The loss function for regression is

$$\sum_{i}\sum_{\phi}\mathrm{D}_{\mathrm{pose}}\big(Q^{i}_{\phi},\mathrm{FGN}(\phi,C^{i}_{\mathrm{anatomy}},C^{i}_{\mathrm{gait}})\big), \tag{1}$$

where $\mathrm{FGN}$ is the output of the regression network. $\mathrm{D}_{\mathrm{pose}}$ measures the difference between two full-body poses

$$\mathrm{D}_{\mathrm{pose}}(Q,Q^{\prime})=w_{h}\|\mathbf{h}-\mathbf{h}^{\prime}\|^{2}+w_{v}\|\mathbf{v}-\mathbf{v}^{\prime}\|^{2}+\sum_{j}\|\mathrm{D}_{\mathrm{rot}}(\mathbf{q}_{j},\mathbf{q}^{\prime}_{j})\|^{2}, \tag{2}$$

where it computes the weighted sum of the differences in root heights, root velocities, and joint rotations. $w_{h}$ and $w_{v}$ are the weights of their corresponding terms and $\mathrm{D}_{\mathrm{rot}}$ is the difference between two rotation matrices.
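As a concrete reading of the pose distance, the sketch below evaluates the three terms for poses stored as dictionaries. It rests on assumptions that are not in the text: $\mathrm{D}_{\mathrm{rot}}$ is taken to be the element-wise difference of the two-column rotation encodings, the default weights are placeholders, and the names `pose_distance` and `sq_norm` are hypothetical.

```python
def sq_norm(a, b):
    """Squared Euclidean distance between two equally long vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def pose_distance(Q, Qp, w_h=1.0, w_v=1.0):
    """Weighted pose difference.

    Q and Qp are dicts with keys 'h' (root height), 'v' (root velocity),
    and 'q' (one 6-number rotation encoding per joint).
    """
    d = w_h * sq_norm(Q["h"], Qp["h"]) + w_v * sq_norm(Q["v"], Qp["v"])
    for qj, qpj in zip(Q["q"], Qp["q"]):
        # Assumed D_rot: element-wise difference of the rotation encodings.
        d += sq_norm(qj, qpj)
    return d

identity = [1.0, 0.0, 0.0, 0.0, 1.0, 0.0]
Q_a = {"h": [1.0], "v": [0.5, 0.0], "q": [identity]}
Q_b = {"h": [0.9], "v": [0.5, 0.2], "q": [identity]}
d_ab = pose_distance(Q_a, Q_b)
```

The regression loss then sums this distance over all training tuples and all sampled phases.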

4.2. Backward GaitNet

The Forward GaitNet and the Backward GaitNet constitute a conditional Variational AutoEncoder (c-VAE) that takes a gait pattern $M$ as input (see Figure 2). Once the Forward GaitNet is learned through regression, we learn the Backward GaitNet while having the parameters of the Forward GaitNet fixed. The loss function is

$$w_{\mathrm{g}}\mathrm{D}_{\mathrm{g}}(M,\hat{M})+w_{\mathrm{kl}}\mathrm{D}_{\mathrm{kl}}\big(N(\mu,\sigma)\,\|\,N(0,I)\big)+\sum_{m}w^{m}(1-\hat{c}_{m})^{2}, \tag{3}$$

where $\hat{M}$ is the gait pattern reconstructed through the autoencoder:

$$\hat{M}=\mathrm{FGN}(\mathrm{BGN}(M,C_{\mathrm{gait}},C_{\mathrm{skeleton}}),C_{\mathrm{gait}},C_{\mathrm{skeleton}}). \tag{4}$$

Note that $\mathrm{FGN}$ here denotes the gait pattern generated by concatenating the outputs of the phase-wise forward network over two gait cycles. $\mathrm{D}_{\mathrm{kl}}$ measures the KL-divergence, $\hat{c}_{m}$ is the $m$ -th element of the predicted muscle conditions $\hat{C}_{\mathrm{muscle}}$ , and $\mathrm{D}_{\mathrm{g}}$ is the difference between two gait patterns. The first term is the reconstruction loss between input and output gaits, and the second term is the typical KL-divergence loss for VAE learning, where we use the standard normal distribution $N(0,I)$ as our prior. The last term regularizes the predicted conditions toward the reference conditions (i.e., typical healthy adults), for which every muscle condition equals 1.
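The second and third loss terms have simple closed forms, sketched below. We assume a diagonal Gaussian whose `sigma` entries are standard deviations, which the text does not state explicitly; the gait reconstruction term is passed in as a precomputed number, and all function names and default weights are hypothetical.

```python
import math

def kl_to_standard_normal(mu, sigma):
    """KL( N(mu, diag(sigma^2)) || N(0, I) ) in closed form."""
    return 0.5 * sum(s * s + m * m - 1.0 - math.log(s * s)
                     for m, s in zip(mu, sigma))

def reference_regularizer(c_hat, w):
    """sum_m w^m (1 - c_hat_m)^2: pulls predictions toward the reference value 1."""
    return sum(wm * (1.0 - cm) ** 2 for wm, cm in zip(w, c_hat))

def backward_loss(d_gait, mu, sigma, c_hat, w, w_g=1.0, w_kl=1.0):
    """Weighted sum of reconstruction, KL, and regularization terms."""
    return (w_g * d_gait
            + w_kl * kl_to_standard_normal(mu, sigma)
            + reference_regularizer(c_hat, w))

loss = backward_loss(d_gait=1.0, mu=[1.0], sigma=[1.0], c_hat=[0.5], w=[2.0])
```

The KL term vanishes when the encoder outputs exactly the prior, and the regularizer vanishes when all predicted muscle conditions equal the healthy reference.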

Note that the probability distribution is located in the middle of the backward model rather than at its output $\hat{C}_{\mathrm{muscle}}$ so that muscle conditions are encoded in the low-dimensional latent space, where the latent codes are then decoded to muscle conditions via the pre-decoder (see Figure 2). This structure has several advantages over modeling muscle conditions directly as stochastic latent variables. First, many muscles are functionally correlated, meaning that muscle coordination can have lower intrinsic dimensions and potentially be modeled efficiently in low-dimensional spaces. Second, the multi-modality of muscle conditions can be better preserved. In c-VAE, the latent space is often modeled as a uni-modal probability distribution, which is not ideal for our case because there exist disparate muscle conditions that generate almost the same gait. The pre-decoder structure enables us to map a simple uni-modal distribution to a complex multi-modal distribution.
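A toy example shows how a nonlinear pre-decoder can turn a uni-modal latent distribution into a multi-modal output distribution. The one-unit `toy_pre_decoder` with a steep sigmoid is purely hypothetical and far simpler than the network in the paper; it only illustrates the mechanism.

```python
import math
import random

def sample_latent(mu, sigma, rng):
    """Reparameterized sample z = mu + sigma * eps with eps ~ N(0, 1)."""
    return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]

def toy_pre_decoder(z, gain=8.0):
    """Hypothetical one-unit pre-decoder: a steep sigmoid of the first latent."""
    return 1.0 / (1.0 + math.exp(-gain * z[0]))

rng = random.Random(0)
samples = [toy_pre_decoder(sample_latent([0.0], [1.0], rng))
           for _ in range(1000)]

# Fractions of decoded values near 0, in the middle, and near 1.
low = sum(s < 0.25 for s in samples) / len(samples)
mid = sum(0.25 <= s <= 0.75 for s in samples) / len(samples)
high = sum(s > 0.75 for s in samples) / len(samples)
```

Although the latent variable follows a single Gaussian, most decoded values cluster near 0 or near 1, so the output distribution has two modes. Sampling the latent space repeatedly therefore yields disparate muscle conditions for the same input gait.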

4.3. Learning

4.3.1. Selection of Muscle Parameters

The musculoskeletal model we use has 604 muscle conditions in total, covering the upper and lower body. We observed that predicting muscle conditions for the upper body (especially the arms) could harm the overall performance because upper-body motion is only loosely correlated with its muscle conditions, owing to the greater freedom of motion compared to the lower body. Intuitively speaking, people can perform almost any upper-body motion while walking, which is not true for the lower body. To mitigate this problem, we only include muscle conditions for the lower body as well as a few conditions of the muscles attached to the hip (e.g., iliopsoas), 280 in total.

4.3.2. Mixture of Multiple Backward GaitNets

The space of muscle conditions covered by our musculoskeletal model is highly diverse, and we also aim to infer the extreme muscle conditions that occur in many patients. To better capture such high variations, we learn multiple Backward GaitNets, then choose one of their outputs by evaluating the results qualitatively and quantitatively (if possible). This strategy is similar to mixture-of-experts (Masoudnia and Ebrahimpour, 2014), where each expert covers a subspace of the entire space. Furthermore, this process is analogous to having a single patient diagnosed by several doctors, each of whom is good at diagnosing a specific pathology. In this work, we use three different models. One is trained only for the muscle conditions of the knees and ankles; the other two are trained for the entire set of muscles with different weights in Equation 3.

4.3.3. Data Sampling

In this work, we must explore a very high-dimensional space spanned by 280 conditioning variables. Each conditioning variable has its valid range specified by the user. Even to examine only the corners of this high-dimensional domain, we would need to sample a tremendously large number $2^{280}\approx 1.9\times 10^{84}$ of condition-gait tuples, which is computationally intractable. Surprisingly, the training of both Forward and Backward GaitNet required a much smaller collection (1.7 million) of condition-gait tuples in our experiments. Both networks generalize well enough to learn the influence of physical conditions on gait from very sparse samples. Given that sparse sampling is inevitable, uniformly random sampling in the high-dimensional space is not ideal for exploring large variations in physical conditions. We use an alternative approach called grid-based sampling, which selects samples only at corner points, where every condition takes either the minimum or the maximum value of its valid range. In our experiments, grid-based sampling outperforms uniform random sampling, in particular for reproducing severely impaired gaits. The rationale for our grid-based sampling is that the relationship between changes in muscle conditions and changes in motion is often monotonic. For example, interpolating two pathological gaits, one generated by lengthening a specific muscle and the other by shortening it, can often produce the gait that is generated when the muscle is in its normal condition. This implies that exposing the model to combinations of extreme input conditions is more efficient than uniform sampling over the specified range.
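The difference between the two sampling schemes can be sketched in a few lines. The valid ranges below are placeholders, and choosing the minimum or maximum independently with equal probability for each condition is our assumption; the text only states that samples lie at corner points.

```python
import random

def uniform_sample(ranges, rng):
    """Draw every condition uniformly inside its valid range."""
    return [rng.uniform(lo, hi) for lo, hi in ranges]

def grid_sample(ranges, rng):
    """Corner sampling: every condition takes its minimum or its maximum."""
    return [rng.choice((lo, hi)) for lo, hi in ranges]

rng = random.Random(0)
ranges = [(0.0, 1.0)] * 280        # hypothetical valid ranges
interior = uniform_sample(ranges, rng)
corner = grid_sample(ranges, rng)

# Number of distinct corner points of a 280-dimensional box.
num_corners = 2 ** len(ranges)
```

Exhaustively enumerating `num_corners` points is impossible, so in practice only a sparse random subset of corners is drawn, which matches the 1.7 million tuples reported in the text.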

5. Results

To train Bidirectional GaitNet, we collected approximately 500 hours of walking motion from our predictive gait simulator. This data generation takes approximately 10 hours on a cluster equipped with 20 Intel Xeon 6242 CPUs. We use PyTorch (Paszke et al., 2019) to implement our models. The encoder, the pre-decoder, and Forward GaitNet are modeled by feed-forward networks with layer sizes [256, 256, 256], [256, 256, 256], and [512, 512, 512], respectively. For the hidden/output activations, LeakyReLU/Linear, LeakyReLU/Sigmoid, and ReLU/Linear are used, respectively. Both models are learned by following the standard procedure of supervised learning with batch sizes of 65536 (forward) and 2048 (backward) and the Adam optimizer with a learning rate of $10^{-5}$ . Training takes 2 hours for Forward GaitNet and 1.5 hours for Backward GaitNet on a desktop equipped with a Ryzen 3950, an NVIDIA 2070, and 64 GB of RAM. The dimensions of $M$ , $z$ , $(C_{\mathrm{gait}},C_{\mathrm{skeleton}})$ , and $C_{\mathrm{muscle}}$ are 6060, 32, 13, and 268, respectively.
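The reported dimensions allow a quick sanity check of the network sizes. The sketch below derives the per-pose dimension from the size of $M$ and counts the parameters of a fully connected network. The input layout of the forward network (one scalar phase plus the 13 and 268 condition values) and the reading of [512, 512, 512] as three hidden layers are our assumptions, not statements from the text.

```python
def mlp_param_count(sizes):
    """Weights plus biases of a fully connected network with these layer sizes."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))

GAIT_DIM = 6060      # dimension of M (from the text)
NUM_FRAMES = 60      # poses per two gait cycles (from the text)
POSE_DIM = GAIT_DIM // NUM_FRAMES

# Assumed forward-network input: phase (1) + gait/skeleton (13) + muscle (268).
forward_sizes = [1 + 13 + 268, 512, 512, 512, POSE_DIM]
forward_params = mlp_param_count(forward_sizes)
```

Under these assumptions each pose has 101 values, and the forward network stays well under a million parameters, which is consistent with the short training time reported above.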

5.1. Evaluation on Unseen Simulated Data

Our Bidirectional GaitNet is trained on a dataset generated by a predictive gait simulator. We first evaluate whether our model generalizes to 51 unseen simulated data points kept separate from the training data, which include normal, fully random, and pathological cases. Of the 51 data points, 7 are created by running Generative GaitNet with muscle conditions similar to those demonstrated in previous work (Park et al., 2022) to cover well-known gait patterns (normal, foot drop, equinus, stiff knee, crouch, Trendelenburg, and waddling gait), while the remaining data points are created by randomly sampling anatomical conditions within the valid range. Note that they are generated independently, so exactly the same data are never included in the training dataset.

5.1.1. Prediction of Gaits

Ultimately, the predicted anatomical conditions should reproduce a gait similar to the input gait when run through the predictive gait simulator. We therefore measure the error between the ground truth gait and the gait simulated with the anatomical conditions predicted by Backward GaitNet; the 3rd row in Table 1 shows the average and variance of the joint angle prediction error measured over 2 gait cycles. On average, the angle difference is less than 8 degrees. This means that our model is able to generalize to unseen simulated data successfully. Furthermore, the error introduced by our model is almost visually indistinguishable, as demonstrated in both Figure 6 and the supplemental video.
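One common way to compute a joint angle error between two rotations is the angle of their relative rotation, sketched below. The text does not specify the exact metric, so this formula, the use of the population variance, and the function names are assumptions for illustration.

```python
import math

def rot_z(theta):
    """Rotation by theta radians about the z axis (used only for the demo)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def rotation_angle_deg(R1, R2):
    """Angle in degrees of the relative rotation R1^T R2."""
    # trace(R1^T R2) equals the sum of element-wise products.
    tr = sum(R1[i][j] * R2[i][j] for i in range(3) for j in range(3))
    # Clamp to guard against round-off slightly outside [-1, 1].
    cos_angle = max(-1.0, min(1.0, (tr - 1.0) / 2.0))
    return math.degrees(math.acos(cos_angle))

def mean_and_variance(values):
    """Mean and population variance of a list of per-frame errors."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, var

err = rotation_angle_deg(rot_z(0.3), rot_z(0.5))
stats = mean_and_variance([1.0, 2.0, 3.0])
```

Applying `rotation_angle_deg` to every frame of the two gait cycles and then `mean_and_variance` per joint would produce one table cell of the kind reported in Table 1.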

Table 1. Joint Angle Prediction Error. Each cell gives the mean error in degrees with the variance in parentheses. The first two columns give the sampling strategy (Uniform or Grid) listed for Forward GaitNet and Backward GaitNet.

| Forward GaitNet | Backward GaitNet | Pelvis | FemurR | TibiaR | TalusR | FootPinkyR | FootThumbR | FemurL | TibiaL | TalusL | FootPinkyL | FootThumbL |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Uniform | Uniform | 8.87398 (2.27329) | 11.9979 (1.97183) | 5.53967 (6.22365) | 14.874 (7.87755) | 8.38405 (8.08005) | 8.89714 (8.29367) | 12.2464 (7.78203) | 5.33964 (2.2433) | 14.6996 (2.27785) | 7.49264 (8.47628) | 7.35133 (12.6308) |
| Uniform | Grid | 8.77184 (4.27071) | 11.8603 (0.746043) | 5.47452 (5.75071) | 15.085 (4.78839) | 8.38307 (8.30824) | 8.45141 (8.17749) | 12.2202 (5.44729) | 5.3674 (5.05303) | 14.6827 (7.69895) | 7.29844 (12.5078) | 7.00184 (1.70778) |
| Grid | Grid | 8.47702 (2.15696) | 11.8692 (1.15372) | 5.79214 (11.8167) | 14.7716 (2.1861) | 7.62802 (6.71003) | 7.67742 (7.66728) | 11.1595 (1.70138) | 5.4935 (1.85809) | 14.5509 (1.28581) | 8.45036 (20.0347) | 8.14378 (22.3314) |

| Forward GaitNet | Backward GaitNet | Spine | Torso | Neck | Head | ShoulderR | ArmR | ForeArmR | HandR | ShoulderL | ArmL | ForeArmL | HandL |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Uniform | Uniform | 8.59933 (1.50475) | 8.22234 (1.96971) | 7.4927 (4.40804) | 6.8291 (4.05152) | 1.80658 (0.316834) | 11.5427 (6.8967) | 3.24304 (0.75264) | 9.78388 (7.1614) | 1.98267 (1.95443) | 10.9824 (1.56775) | 3.00934 (2.98694) | 8.65188 (8.31125) |
| Uniform | Grid | 8.67501 (3.51157) | 8.27154 (0.26088) | 7.57086 (4.38966) | 6.83435 (4.60315) | 1.79577 (0.985541) | 11.52 (4.41658) | 3.25868 (0.643535) | 9.98208 (7.61925) | 1.99904 (1.85795) | 10.9108 (0.524045) | 3.09474 (3.09089) | 8.7093 (5.95637) |
| Grid | Grid | 8.28418 (4.57552) | 8.14461 (0.243296) | 7.24122 (0.91435) | 6.87393 (3.44239) | 1.77189 (0.147876) | 10.8449 (1.71357) | 2.98214 (1.62909) | 10.2911 (6.87899) | 1.92183 (1.64011) | 10.5057 (1.13507) | 2.74626 (2.73175) | 8.04477 (6.98929) |

5.1.2. Prediction of Muscle Conditions

Figure 4 shows muscle conditions predicted by our Backward GaitNet given a typical pathological gait, hip-drop (Trendelenburg), which results from defective muscles on one side of the hip. Physiologically, such a dropping motion could appear either when a muscle on one side has contracture or when a muscle on the other side is weak. Our results show that both physiologically plausible conditions can be discovered successfully by our backward model. In the supplemental video, we also show a variety of predicted muscle conditions and their corresponding simulated gaits.

We also compare the ground truth muscle conditions with the estimated muscle conditions. In contrast to gait prediction error, where we can compute the joint angle difference directly, the ground truth and estimated muscle conditions are not directly comparable because multiple conditions that generate the same gait pattern could exist. Instead, given an input gait paired with the ground truth muscle conditions, we evaluate our backward model by testing whether the probability distribution predicted by our model from the input gait includes the ground truth conditions. We first create 1000 sets of muscle conditions from our backward model by randomly sampling from the distribution, then we run a dimensionality reduction technique on those conditions together with the ground truth conditions. Figure 3(a) shows the 2D embedding drawn by UMAP (McInnes et al., 2018), from which we can infer that the ground truth conditions are observable with high probability under the predicted distribution.

5.2. Evaluation on Gaits of Real Patients

We show further generalization capability by evaluating our model on gaits of real patients. Unlike the evaluation on simulated data, ground truth muscle conditions do not exist for real patients because examining such conditions for every muscle of a patient is not feasible; the examination itself is a very challenging problem. We evaluate our results qualitatively by comparing the input gaits from real patients with the gaits simulated under the anatomical conditions predicted by our backward model (see Figure 6 and our supplemental video). Even though we trained our model with simulation data only, it can still produce plausible predictions for gaits of real patients, which might differ from simulation results due to the sim-to-real gap.

We also evaluate the predicted muscle conditions by using the Physical Exam (Moon et al., 2017), which has been widely used in medicine for testing muscle conditions of patients. The exam measures the range of motion of several joints at pre-specified postures to examine whether muscle lengths are permanently shortened or lengthened, which could significantly affect functionality in locomotion if they differ from the normal conditions. We implemented the exam as follows: while performing the exam, we measure the magnitude of the joint torque accumulated from the passive forces of the muscles relevant to the joint, and we regard the joint as having reached its maximum range of motion when the magnitude is larger than 20 Nm. Figure 5 compares the ranges of motion of the left and right ankles given a patient’s gait, where our system predicts that the asymmetry of the input gait arises because the right calf muscles are weaker than the left ones. This study was approved by the Institutional Review Board of Seoul National University Hospital (B-1107-132-101).
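The torque-threshold rule can be illustrated with a short sketch. It sweeps a single joint angle and reports the first angle at which the passive torque magnitude exceeds 20 Nm. Only the 20 Nm threshold comes from the text; the `passive_torque` curve, its parameters, the sweep range, and the step size are hypothetical stand-ins for the passive muscle forces of a musculoskeletal model.

```python
import math

TORQUE_LIMIT_NM = 20.0  # threshold stated in the text


def passive_torque(angle_deg, slack_deg=10.0, stiffness=0.05, scale_nm=5.0):
    """Hypothetical passive joint torque (Nm).

    Zero until the joint passes slack_deg, then growing exponentially with
    further stretch. A real implementation would sum passive muscle forces.
    """
    stretch = max(0.0, angle_deg - slack_deg)
    return scale_nm * math.expm1(stiffness * stretch)


def max_range_of_motion(torque_fn, start_deg=0.0, end_deg=90.0, step_deg=0.5):
    """Return the first swept angle whose passive torque magnitude exceeds the limit."""
    angle = start_deg
    while angle <= end_deg:
        if abs(torque_fn(angle)) > TORQUE_LIMIT_NM:
            return angle
        angle += step_deg
    return end_deg  # limit never reached within the sweep


# A muscle with normal slack versus a shortened (contracted) one.
normal_rom = max_range_of_motion(passive_torque)
short_rom = max_range_of_motion(lambda a: passive_torque(a, slack_deg=0.0))
```

In this toy setting, the shortened muscle starts resisting earlier, so its measured range of motion is smaller than that of the normal muscle, which is the kind of left/right difference the exam is meant to expose.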

5.3. Comparison

To show the effectiveness of the technical components adopted in our method, we conduct ablation studies on the three components that we think are crucial.

5.3.1. Grid vs. Uniform Sampling

We show the effectiveness of the grid-based random sampling that we used when generating the training data by comparing it with another model trained with data generated by uniform sampling. We run an evaluation with the same unseen simulated gaits used in Section 5.1. Table 1 compares joint angle prediction errors over three design choices, Uniform-Uniform, Uniform-Grid, and Grid-Grid (ours), where the errors are generally lower when our grid-based sampling is used for learning both the forward and the backward models.
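For readers who want to reproduce this kind of comparison, the sketch below shows one plausible way to compute a joint angle prediction error and summarize it as a mean with a standard deviation. It assumes that gaits are stored as lists of frames, that each frame is a list of joint angles in degrees, and that the error is the mean absolute angle difference per joint. These conventions and all names are assumptions for illustration, not necessarily the exact metric behind Table 1.

```python
from statistics import mean, pstdev


def joint_angle_errors(predicted, reference):
    """Per-joint mean absolute angle difference (degrees) over the frames of one gait."""
    n_joints = len(reference[0])
    return [
        mean(abs(p[j] - r[j]) for p, r in zip(predicted, reference))
        for j in range(n_joints)
    ]


def summarize(errors_per_gait):
    """Mean and population standard deviation of one joint's error across several gaits."""
    return mean(errors_per_gait), pstdev(errors_per_gait)


# Hypothetical two-frame, two-joint gait.
reference = [[10.0, 20.0], [12.0, 22.0]]
predicted = [[11.0, 18.0], [13.0, 26.0]]
per_joint = joint_angle_errors(predicted, reference)

# Hypothetical errors of one joint over two test gaits.
avg, spread = summarize([1.0, 3.0])
```

Running the same summary for models trained with each sampling scheme would give values that can be compared side by side.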

6. Discussion

We developed a novel generative model, called Bidirectional GaitNet, that learns the relationship between human anatomy and its gait. It consists of forward and backward models, where the forward model predicts a gait pattern of a person with specific physical conditions, while the backward model estimates the physical conditions of a person when his/her gait pattern is provided. By constructing a c-VAE structure around the forward model, which is learned from simulation data generated by a state-of-the-art predictive gait simulator, we were able to learn its inverse mapping (i.e., the backward model) effectively. We showed that the anatomical conditions predicted by our model reproduce the input gaits in simulation for both unseen simulated gaits and real patients’ gaits.

There are still several limitations in our method. First, although joint angle prediction is fairly accurate (the error is less than 8 degrees on average), it needs to be further improved for some joints such as the talus (ankle) and the femur (upper leg), which show larger errors than other joints. Note that those are the joints that have much wider variations in our dataset than others. More sophisticated network architectures (e.g., Transformer) or loss functions that can better capture wider variations in data would be helpful in general. Second, our method can be easily extended so that it includes upper body muscle conditions. However, the prediction of those muscle conditions might be less reliable than for the lower body due to the weaker dependency between gaits and upper body muscle conditions. Simply speaking, upper body motions have much larger freedom and could be almost arbitrary. In addition, some anatomical features such as the nervous system, a flexible tendon model, and skin are missing in the musculoskeletal model we used in this research, which might affect the prediction of gait and physical conditions.

We envision several promising future directions that we want to study further. The input to the current system must be given as mocap data, which might be cumbersome and expensive to obtain. It would be interesting if we could use a monocular video as input to our system, which could be done by using existing video-to-mocap solutions or other training schemes. Another interesting direction would be exploring more complex full-body or facial musculoskeletal models that include volumetric muscles, which would have a practical impact on the animation industry because those models have often been used in commercial films.

Acknowledgements.

This study was supported by the New Faculty Startup Fund from Seoul National University, ICT (Institute of Computer Technology) at Seoul National University, and grant no. 14-2020-0012 from the SNUBH Research Fund.

References

Al Borno et al. (2013) Mazen Al Borno, Martin de Lasa, and Aaron Hertzmann. 2013. Trajectory Optimization for Full-Body Movements with Complex Contacts. IEEE Transactions on Visualization and Computer Graphics 19, 8 (2013), 1405–1414.
Arnold et al. (2010) Edith M Arnold, Samuel R Ward, Richard L Lieber, and Scott L Delp. 2010. A model of the lower limb for analysis of human movement. Annals of biomedical engineering 38, 2 (2010), 269–279.
Bergamin et al. (2019) Kevin Bergamin, Simon Clavet, Daniel Holden, and James Richard Forbes. 2019. DReCon: data-driven responsive control of physics-based characters. ACM Transactions on Graphics 38, 6 (2019).
Carbone et al. (2015) Vincenzo Carbone, René Fluit, Pim Pellikaan, MM Van Der Krogt, D Janssen, M Damsgaard, L Vigneron, T Feilkas, Hubertus FJM Koopman, and N Verdonschot. 2015. TLEM 2.0–A comprehensive musculoskeletal geometry dataset for subject-specific modeling of lower extremity. Journal of biomechanics 48, 5 (2015), 734–741.
Clegg et al. (2018) Alexander Clegg, Wenhao Yu, Jie Tan, C Karen Liu, and Greg Turk. 2018. Learning to dress: Synthesizing human dressing motion via deep reinforcement learning. ACM Transactions on Graphics 37, 6 (2018).
Cong et al. (2016) Matthew Cong, Kiran S. Bhat, and Ronald Fedkiw. 2016. Art-Directed Muscle Simulation for High-End Facial Animation. In Proceedings of the ACM SIGGRAPH/Eurographics Symposium on Computer Animation. 119–127.
Coros et al. (2010) Stelian Coros, Philippe Beaudoin, and Michiel Van de Panne. 2010. Generalized biped walking control. ACM Transactions on Graphics 29, 4 (2010).
Delp et al. (2007) Scott L Delp, Frank C Anderson, Allison S Arnold, Peter Loan, Ayman Habib, Chand T John, Eran Guendelman, and Darryl G Thelen. 2007. OpenSim: open-source software to create and analyze dynamic simulations of movement. IEEE transactions on biomedical engineering 54, 11 (2007), 1940–1950.
Delp et al. (1990) Scott L Delp, J Peter Loan, Melissa G Hoy, Felix E Zajac, Eric L Topp, and Joseph M Rosen. 1990. An interactive graphics-based model of the lower extremity to study orthopaedic surgical procedures. IEEE Transactions on Biomedical engineering 37, 8 (1990), 757–767.
Dembia et al. (2020) Christopher L Dembia, Nicholas A Bianco, Antoine Falisse, Jennifer L Hicks, and Scott L Delp. 2020. Opensim moco: musculoskeletal optimal control. PLOS Computational Biology 16, 12 (2020).
Falisse et al. (2019) Antoine Falisse, Gil Serrancolí, Christopher L Dembia, Joris Gillis, Ilse Jonkers, and Friedl De Groote. 2019. Rapid predictive simulations with complex musculoskeletal models suggest that diverse healthy and pathological human gaits can emerge from similar control strategies. Journal of the Royal Society Interface 16, 157 (2019).
Geijtenbeek et al. (2013) Thomas Geijtenbeek, Michiel Van De Panne, and A Frank Van Der Stappen. 2013. Flexible muscle-based locomotion for bipedal creatures. ACM Transactions on Graphics 32, 6 (2013).
Geyer and Herr (2010) Hartmut Geyer and Hugh Herr. 2010. A muscle-reflex model that encodes principles of legged mechanics produces human walking dynamics and muscle activities. IEEE Transactions on neural systems and rehabilitation engineering 18, 3 (2010), 263–273.
Ha et al. (2012) Sehoon Ha, Yuting Ye, and C Karen Liu. 2012. Falling and landing motion control for character animation. ACM Transactions on Graphics 31, 6 (2012).
Ichim et al. (2017) Alexandru-Eugen Ichim, Petr Kadleček, Ladislav Kavan, and Mark Pauly. 2017. Phace: Physics-based face modeling and animation. ACM Transactions on Graphics 36, 4 (2017).
Ishiwaka et al. (2022) Yuko Ishiwaka, Xiao S Zeng, Shun Ogawa, Donovan Michael Westwater, Tadayuki Tone, and Masaki Nakada. 2022. DeepFoids: Adaptive Bio-Inspired Fish Simulation with Deep Reinforcement Learning. In Advances in Neural Information Processing Systems.
Jiang et al. (2019) Yifeng Jiang, Tom Van Wouwe, Friedl De Groote, and C Karen Liu. 2019. Synthesis of biologically realistic human motion using joint torque actuation. ACM Transactions on Graphics 38, 4 (2019).
Kidziński et al. (2018) Łukasz Kidziński, Sharada Prasanna Mohanty, Carmichael F Ong, Zhewei Huang, Shuchang Zhou, Anton Pechenko, Adam Stelmaszczyk, Piotr Jarosik, Mikhail Pavlov, Sergey Kolesnikov, et al. 2018. Learning to run challenge solutions: Adapting reinforcement learning methods for neuromusculoskeletal environments. In The NIPS’17 Competition: Building Intelligent Systems. 121–153.
Kidziński et al. (2020) Łukasz Kidziński, Carmichael Ong, Sharada Prasanna Mohanty, Jennifer Hicks, Sean Carroll, Bo Zhou, Hongsheng Zeng, Fan Wang, Rongzhong Lian, Hao Tian, et al. 2020. Artificial intelligence for prosthetics: Challenge solutions. In The NeurIPS’18 Competition. 69–128.
Lee et al. (2022) Seyoung Lee, Jiye Lee, and Jehee Lee. 2022. Learning Virtual Chimeras by Dynamic Motion Reassembly. ACM Transactions on Graphics 41, 6 (2022).
Lee et al. (2019) Seunghwan Lee, Moonseok Park, Kyoungmin Lee, and Jehee Lee. 2019. Scalable muscle-actuated human simulation and control. ACM Transactions on Graphics 38, 4 (2019).
Lee et al. (2018) Seunghwan Lee, Ri Yu, Jungnam Park, Mridul Aanjaneya, Eftychios Sifakis, and Jehee Lee. 2018. Dexterous manipulation and control with volumetric muscles. ACM Transactions on Graphics 37, 4 (2018).
Lee et al. (2009) Sung-Hee Lee, Eftychios Sifakis, and Demetri Terzopoulos. 2009. Comprehensive biomechanical modeling and simulation of the upper body. ACM Transactions on Graphics 28, 4 (2009).
Lee et al. (2010) Yoonsang Lee, Sungeun Kim, and Jehee Lee. 2010. Data-driven Biped Control. ACM Transactions on Graphics 29, 4 (2010).
Lee et al. (2014) Yoonsang Lee, Moon Seok Park, Taesoo Kwon, and Jehee Lee. 2014. Locomotion control for many-muscle humanoids. ACM Transactions on Graphics 33, 6 (2014).
Levin et al. (2011) David IW Levin, Benjamin Gilles, Burkhard Mädler, and Dinesh K Pai. 2011. Extracting skeletal muscle fiber fields from noisy diffusion tensor data. Medical Image Analysis 15, 3 (2011), 340–353.
Li et al. (2022) Yuwei Li, Longwen Zhang, Zesong Qiu, Yingwenqi Jiang, Nianyi Li, Yuexin Ma, Yuyao Zhang, Lan Xu, and Jingyi Yu. 2022. NIMBLE: a non-rigid hand model with bones and muscles. ACM Transactions on Graphics 41, 4 (2022).
Liu and Hodgins (2018) Libin Liu and Jessica Hodgins. 2018. Learning basketball dribbling skills using trajectory optimization and deep reinforcement learning. ACM Transactions on Graphics 37, 4 (2018).
Liu et al. (2016) Libin Liu, Michiel Van De Panne, and KangKang Yin. 2016. Guided learning of control graphs for physics-based characters. ACM Transactions on Graphics 35, 3 (2016).
Liu et al. (2010) Libin Liu, KangKang Yin, Michiel van de Panne, Tianjia Shao, and Weiwei Xu. 2010. Sampling-based Contact-rich Motion Control. ACM Transactions on Graphics 29, 4 (2010).
Luo et al. (2020) Ying-Sheng Luo, Jonathan Hans Soeseno, Trista Pei-Chun Chen, and Wei-Chao Chen. 2020. Carl: Controllable agent with reinforcement learning for quadruped locomotion. ACM Transactions on Graphics 39, 4 (2020).
Masoudnia and Ebrahimpour (2014) Saeed Masoudnia and Reza Ebrahimpour. 2014. Mixture of experts: a literature survey. Artificial Intelligence Review 42, 2 (2014), 275–293.
Matias et al. (2009) Ricardo Matias, Carlos Andrade, and António Prieto Veloso. 2009. A transformation method to estimate muscle attachments based on three bony landmarks. Journal of biomechanics 42, 3 (2009), 331–335.
McInnes et al. (2018) Leland McInnes, John Healy, and James Melville. 2018. UMAP: Uniform manifold approximation and projection for dimension reduction. arXiv preprint arXiv:1802.03426 (2018).
Merel et al. (2019) Josh Merel, Leonard Hasenclever, Alexandre Galashov, Arun Ahuja, Vu Pham, Greg Wayne, Yee Whye Teh, and Nicolas Heess. 2019. Neural Probabilistic Motor Primitives for Humanoid Control. In 7th International Conference on Learning Representations, ICLR 2019.
Moon et al. (2017) Seung Jun Moon, Young Choi, Chin Youb Chung, Ki Hyuk Sung, Byung Chae Cho, Myung Ki Chung, Jaeyoung Kim, Mi Sun Yoo, Hyung Min Lee, and Moon Seok Park. 2017. Normative values of physical examinations commonly used for cerebral palsy. Yonsei Medical Journal 58, 6 (2017), 1170–1176.
Mordatch and Todorov (2014) Igor Mordatch and Emo Todorov. 2014. Combining the benefits of function approximation and trajectory optimization. In Proceedings of Robotics: Science and Systems.
Nakada et al. (2018) Masaki Nakada, Tao Zhou, Honglin Chen, Tomer Weiss, and Demetri Terzopoulos. 2018. Deep learning of biomimetic sensorimotor control for biomechanical human animation. ACM Transactions on Graphics 37, 4 (2018).
Ong et al. (2019) Carmichael F Ong, Thomas Geijtenbeek, Jennifer L Hicks, and Scott L Delp. 2019. Predicting gait adaptations due to ankle plantarflexor muscle weakness and contracture using physics-based musculoskeletal simulations. PLoS computational biology 15, 10 (2019), e1006993.
Park et al. (2022) Jungnam Park, Sehee Min, Phil Sik Chang, Jaedong Lee, Moon Seok Park, and Jehee Lee. 2022. Generative GaitNet. In ACM SIGGRAPH 2022 Conference Proceedings.
Park et al. (2019) Soohwan Park, Hoseok Ryu, Seyoung Lee, Sunmin Lee, and Jehee Lee. 2019. Learning predict-and-simulate policies from unorganized human motion data. ACM Transactions on Graphics 38, 6 (2019).
Paszke et al. (2019) Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Kopf, Edward Yang, Zachary DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. 2019. PyTorch: An Imperative Style, High-Performance Deep Learning Library. In Advances in Neural Information Processing Systems 32. Curran Associates, Inc., 8024–8035.
Peng et al. (2018) Xue Bin Peng, Pieter Abbeel, Sergey Levine, and Michiel van de Panne. 2018. Deepmimic: Example-guided deep reinforcement learning of physics-based character skills. ACM Transactions on Graphics 37, 4 (2018).
Peng et al. (2022) Xue Bin Peng, Yunrong Guo, Lina Halper, Sergey Levine, and Sanja Fidler. 2022. ASE: Large-Scale Reusable Adversarial Skill Embeddings for Physically Simulated Characters. ACM Transactions on Graphics 41, 4 (2022).
Peng et al. (2021) Xue Bin Peng, Ze Ma, Pieter Abbeel, Sergey Levine, and Angjoo Kanazawa. 2021. AMP: Adversarial motion priors for stylized physics-based character control. ACM Transactions on Graphics 40, 4 (2021).
Rajagopal et al. (2016) Apoorva Rajagopal, Christopher L Dembia, Matthew S DeMers, Denny D Delp, Jennifer L Hicks, and Scott L Delp. 2016. Full-body musculoskeletal model for muscle-driven simulation of human gait. IEEE transactions on biomedical engineering 63, 10 (2016), 2068–2079.
Ryu et al. (2021) Hoseok Ryu, Minseok Kim, Seungwhan Lee, Moon Seok Park, Kyoungmin Lee, and Jehee Lee. 2021. Functionality-Driven Musculature Retargeting. Computer Graphics Forum 40, 1 (2021), 341–356.
Sachdeva et al. (2015) Prashant Sachdeva, Shinjiro Sueda, Susanne Bradley, Mikhail Fain, and Dinesh K Pai. 2015. Biomechanical simulation and control of hands and tendinous systems. ACM Transactions on Graphics 34, 4 (2015).
Si et al. (2014) Weiguang Si, Sung-Hee Lee, Eftychios Sifakis, and Demetri Terzopoulos. 2014. Realistic biomechanical simulation and control of human swimming. ACM Transactions on Graphics 34, 1 (2014).
Sok et al. (2007) Kwang Won Sok, Manmyung Kim, and Jehee Lee. 2007. Simulating biped behaviors from human motion data. ACM Transactions on Graphics 26, 3 (2007).
Song and Geyer (2015) Seungmoon Song and Hartmut Geyer. 2015. A neural circuitry that emphasizes spinal feedback generates diverse behaviours of human locomotion. The Journal of physiology 593, 16 (2015), 3493–3511.
Wang et al. (2012) Jack M Wang, Samuel R Hamner, Scott L Delp, and Vladlen Koltun. 2012. Optimizing locomotion controllers using biologically-based actuators and objectives. ACM Transactions on Graphics 31, 4 (2012).
Winkler et al. (2022) Alexander Winkler, Jungdam Won, and Yuting Ye. 2022. QuestSim: Human Motion Tracking from Sparse Sensors with Simulated Avatars. In SIGGRAPH Asia 2022 Conference Proceedings.
Won et al. (2020) Jungdam Won, Deepak Gopinath, and Jessica Hodgins. 2020. A scalable approach to control diverse behaviors for physically simulated characters. ACM Transactions on Graphics 39, 4 (2020).
Won et al. (2021) Jungdam Won, Deepak Gopinath, and Jessica Hodgins. 2021. Control strategies for physically simulated characters performing two-player competitive sports. ACM Transactions on Graphics 40, 4 (2021).
Won et al. (2022) Jungdam Won, Deepak Gopinath, and Jessica Hodgins. 2022. Physics-based character controllers using conditional VAEs. ACM Transactions on Graphics 41, 4 (2022).
Yang et al. (2022) Zeshi Yang, Kangkang Yin, and Libin Liu. 2022. Learning to use chopsticks in diverse gripping styles. ACM Transactions on Graphics 41, 4 (2022).
Ye and Liu (2010) Yuting Ye and C. Karen Liu. 2010. Optimal Feedback Control for Character Animation Using an Abstract Model. ACM Transactions on Graphics 29, 4 (2010).
Ye et al. (2022) Yongjing Ye, Libin Liu, Lei Hu, and Shihong Xia. 2022. Neural3Points: Learning to Generate Physically Realistic Full-body Motion for Virtual Reality Users. arXiv preprint arXiv:2209.05753 (2022).
Yin et al. (2007) KangKang Yin, Kevin Loken, and Michiel Van de Panne. 2007. Simbicon: Simple biped locomotion control. ACM Transactions on Graphics 26, 3 (2007).
Yu et al. (2019) Ri Yu, Hwangpil Park, and Jehee Lee. 2019. Figure skating simulation from video. Computer Graphics Forum 38, 7 (2019), 225–234.
Yu et al. (2018) Wenhao Yu, Greg Turk, and C Karen Liu. 2018. Learning symmetric and low-energy locomotion. ACM Transactions on Graphics 37, 4 (2018).
Zajac (1989) Felix E Zajac. 1989. Muscle and tendon: properties, models, scaling, and application to biomechanics and motor control. Critical reviews in biomedical engineering 17, 4 (1989), 359–411.