
Sleep Posture One-Shot Learning Framework Using Kinematic Data Augmentation: In-Silico and In-Vivo Case Studies

Omar Elnaggar [email protected] Frans Coenen Andrew Hopkinson Lyndon Mason Paolo Paoletti School of Engineering, University of Liverpool, Liverpool L69 3GH, United Kingdom School of Electrical Engineering, Electronics and Computer Science, University of Liverpool, Liverpool L69 3BX, United Kingdom School of Psychology, University of Liverpool, Liverpool L69 7ZA, United Kingdom School of Medicine, University of Liverpool, Liverpool L69 3GE, United Kingdom Department of Trauma and Orthopaedics, Liverpool University Hospitals NHS Foundation Trust, Liverpool L9 7AL, United Kingdom
Abstract

Sleep posture is linked to several health conditions such as nocturnal cramps and more serious musculoskeletal issues. However, in-clinic sleep assessments are often limited to vital signs (e.g. brain waves). Wearable sensors with embedded inertial measurement units have been used for sleep posture classification; nonetheless, previous works consider only a few (commonly four) postures, which are inadequate for advanced clinical assessments. Moreover, posture learning algorithms typically require longitudinal data collection to function reliably, and often operate on raw inertial sensor readings unfamiliar to clinicians. This paper proposes a new framework for sleep posture classification based on a minimal set of joint angle measurements. The proposed framework is validated on a rich set of twelve postures in two experimental pipelines: computer animation to obtain synthetic postural data, and a human participant pilot study using custom-made miniature wearable sensors. By fusing raw geo-inertial sensor measurements to compute a filtered estimate of relative segment orientations across the wrist and ankle joints, the body posture can be characterised in a way comprehensible to medical experts. The proposed sleep posture learning framework offers plug-and-play posture classification by capitalising on a novel kinematic data augmentation method that requires only one training example per posture. Additionally, a new metric together with data visualisations are employed to extract meaningful insights from the postures dataset, demonstrate the added value of the data augmentation method, and explain the classification performance. The proposed framework attained a promising overall accuracy as high as 100% on synthetic data and 92.7% on real data, on par with state-of-the-art data-hungry algorithms available in the literature.

keywords:
Wearable sensors, Sensor fusion, Data augmentation, One-shot learning, Multi-classifier system, Human posture

1 Introduction

A recent comprehensive epidemiological study revealed that nearly 22% of the global population suffer from musculoskeletal disorders, with most cases occurring in high-income countries [1]. For example, in the United Kingdom, musculoskeletal conditions affect 1 in every 4 adults. One-third of medical consultations [2] and over 25% of all surgical interventions [3] stem from musculoskeletal conditions. Another study projects that these conditions will rise more rapidly in low- and middle-income countries [4].

The study of human posture allows for understanding the musculoskeletal system and opens the door to supporting musculoskeletal health and well-being over the whole lifespan. Over recent years, human sleep behaviour studies have gained traction among the research community [5]. Traditionally, sleep had been considered a natural mechanism to recover from the exhaustion of daily activities, but recent sleep studies have complicated this view. In fact, it was found that certain sleep behaviours could bring about health complications, such as pressure ulcers [6], or uncover underlying disorders [7], including restless leg syndrome and periodic leg movements. Interestingly, some studies linked musculoskeletal morbidity to postural cues; for example, the supine position has been correlated with apnoea more strongly than lateral positions [8]. Further evidence shows that prolonged joint immobilisation could lead to muscular contractions [9], which could potentially develop into chronic pain episodes. Moreover, muscle cramps and painful spasms can also occur during wake or sleep states due to sustained abnormal body postures, lack of exercise or pregnancy [10].

Motivated by the evidence above, clinicians and biomedical engineers are keen to investigate whether a significant statistical link exists between the development of musculoskeletal diseases and specific sleep postures. To this end, sleep postures need to be monitored by means of motion capture technologies, which are generally categorised into optical and non-optical techniques; both have been employed to monitor sleep postures.

Optical methods are categorised according to whether they involve the use of on-body retroreflective markers: marker-based versus markerless techniques. The first category tends to be impractical for sleep analysis due to the cost of the equipment involved, controlled lab setting requirements, and marker occlusions. Markerless motion capture capitalises on recent advances in computer vision and deep neural architectures to regress over body-surface coordinates given a set of image pixels [11, 12, 13]. Markerless techniques also struggle with occlusions due to body covering and are often criticised over privacy concerns, thus limiting their adoption.

Non-optical motion capture methods in sleep-related applications comprise two main categories: bed-embodied sensors and wearable sensors. Within the former category, force-sensitive resistor grids embedded into mattresses [14] and load cells attached to bed frame supports [15] are by far the most common techniques. However, bed-embodied sensors only provide measurements of the body weight distribution, which consequently require an indirect pose inference framework that is not guaranteed to be entirely reliable. Wearable inertial sensing offers a solution with low intrusiveness, does not require an optical line of sight and guarantees privacy, addressing the aforementioned limitations of both optical and bed-embodied non-optical techniques. Moreover, processing the low-dimensional timeseries from on-body sensors generally incurs a low computational cost. Therefore, wearable sensing is overall better suited to sleep monitoring applications.

There are a number of open research questions that hinder the large-scale deployment of wearable inertial sensors for tracking sleep postures. The challenges lie primarily with the sensing and intelligent perception aspects of these systems. Measurement errors [16], sensor misalignment with respect to body segments [17], and soft tissue artefacts [18] are amongst the most prominent sensing errors. Regarding intelligent perception, we focus here on three main challenges. First, wearable inertial sleep trackers have so far been exploited mostly for standard posture sensing (supine, prone and lateral positions), which has little to offer clinicians studying posture-dependent musculoskeletal pathologies, such as leg and calf cramps. Second, current works typically employ machine learning (ML) models that operate directly on raw sensor data, an incomprehensible black-box framework to clinicians who have an outsider perspective on artificial intelligence. Third, for these models to function reliably, extended data collection and expensive manual labelling are often prerequisites.

This paper proposes a human sleep posture learning framework (illustrated in Fig. 1 and detailed in Section 3) to overcome the aforementioned challenges. The framework capitalises on data augmentation to facilitate sleep posture modelling from a single postural observation (hereafter “shot”). The experimental pipelines have been developed and validated both in silico and on real world data. The main contributions of the presented work can be summarised as follows:

  1. Our approach is the first study directed at wearable-based classification of twelve sleep postures, whereas previous work had been mostly limited to four “standard” postures. The twelve postures cover a much wider range of postures common in sleep, thus making the proposed framework better suited for clinical use.

  2. To the best knowledge of the authors, we are the first to use inertial sensor fusion in sleep postural analysis. Unlike the often-used raw sensor data, our framework provides access to joint orientations, which are more human-interpretable and better serve medical diagnosis.

  3. We showcase that approximate segment-to-segment orientations are sufficient to characterise sleep postures, without the need for exhaustive sensor-to-segment calibration procedures that hinder the deployment of wearable sensors in clinical or home settings.

  4. We propose the use of three-dimensional (3D) computer graphics software to accelerate development and tune algorithms by performing an in silico sleep experiment before validating the methodology on human participants. Previous works often rely solely on real data, which may be hard to collect during the developmental phase.

  5. We propose a novel one-shot learning scheme to accelerate learning of arbitrary human sleep postures with augmented observations. This eliminates the need for longitudinal data collection and labelling, which often hinder the use of wearables.

  6. We built four non-invasive wearable sensor modules using low-cost off-the-shelf components. Each module comprises two inertial measurement units (IMUs) to offer dual-segment tracking across the distal joint of each extremity limb.

  7. We propose a metric-based approach, coupled with data visualisation, to extract quantitative and qualitative insights on posture data trends, data augmentation, and the sleep posture classification problem as a whole.

The structure of the rest of this paper is as follows. Section 2 discusses the literature relevant to the problem of human posture analysis using wearable sensors, with particular emphasis on the knowledge gap and clinical needs. In Section 3, we explain our methodology with reference to the proposed framework depicted in Fig. 1. Section 4 presents the experimental design and setup, together with a description of the framework implementation. Section 5 presents the evaluation results obtained and discusses the main findings. In Section 6, the paper highlights are summarised along with suggestions for future research directions.

Figure 1: Schematic of the proposed sleep posture classification framework.

2 Related Work

In the clinical landscape, polysomnography (PSG) is regarded as the gold-standard diagnostic tool for sleep disorders. It involves the simultaneous recording of several parameters to evaluate two major aspects: sleep staging and physiology [19]. Sleep stages are essential as they allow for the recovery and development of the body and the brain. Sleep staging is typically evaluated based on the brain neural activity from electroencephalogram signals, complemented by electrooculogram and electromyography signals, which help particularly with identifying the rapid eye movement stage [20]. Sleep physiology is necessary to assess respiratory health, blood circulation and other functions, such as those of the renal and endocrine systems [21]. Hence, PSG is a key tool for the diagnosis of various cardiovascular, neurologic and neuromuscular conditions, and in the evaluation of other sleep disorders such as insomnia and abnormal body movements.

Nevertheless, PSG faces sensing, interpretation and diagnosis challenges. To begin with, it requires overnight patient hospitalisation with 20+ intrusive on-body sensors, which adversely affect the patient’s sleep quality. As far as human posture is concerned in this work, this aspect remains underdeveloped and only partially exploited in PSG. Although some PSG implementations evaluate the positional component of sleep-disordered breathing, they often come with a basic thoracic sensor that is limited to recognising only a few body postures [22, 23]. An enhanced postural analysis tool would allow clinicians to have a more holistic understanding of how sleep is linked to other conditions such as musculoskeletal morbidities.

Clinicians often welcome the rise of wearable sleep trackers [20]. However, these trackers remain in the early validation phase, as it is unclear how the additional data can provide information beyond general wellness and sleep/wake detection [24]. For wearable trackers, each manufacturer integrates their own proprietary algorithm and there is no widely accepted standard, unlike the case of PSG, whose standard was set by the American Academy of Sleep Medicine.

From a motion capture perspective, the human motion analysis literature branches into movement quantification and classification [25]. Most of the available literature belongs to the former category and is concerned with the estimation of position and/or orientation of one or more body segments during various human activities, which is useful in sport science and film making. In contrast, classification targets high-level interpretations or labelling of the underlying human motion/posture. Within the field of sleep posture classification, only works on IMU-based wearables of a geo-inertial sensing modality are reviewed here, as they are the most relevant to the work presented in this paper.

The available literature can be grouped according to the number of postures considered. The majority of works consider only four standard sleep postures: supine, prone, right and left lateral positions. Interestingly, a single sensor attached to the chest and feeding data to a Linear Discriminant Analysis (LDA) classifier was sufficient to classify the four sleep postures with an accuracy of 99% [26]. Another study [27] evaluated four classifier architectures (Naïve Bayes, Bayesian Network, Decision Tree (DT) and Random Forest) on their performance in recognising the four postures based on statistical features extracted from an accelerometer embedded in a smartwatch. Their accuracies were found to vary between 60.3% and 91.8%, with Random Forest being the best performer. In [28], spectral features extracted from a sole upper-arm sensor on a frame-by-frame basis were used to train a Long Short-Term Memory (LSTM) network, achieving 99% accuracy in a four-posture classification problem. A recent study [29] investigated two aspects of single-sensor sleep posture classification: (1) optimal body locations for sensor placement, and (2) the evaluation of feature-based pattern recognition against deep learning models. Given a quad-posture dataset, the comparative analysis identified the chest and either thigh as optimal body locations, and revealed comparable performance between handcrafted feature-based classifiers and deep learning models. A different approach to sleep quad-posture classification was proposed in [30], where a probabilistic state transition from one posture to another is conditioned on the inertial profile of the pose change motion. The authors defined the transitioning motion profile through the extraction of 66 different features in the time and frequency domains from raw data channels sourced from three sensors attached to the chest and wrists.

Fewer works included more sleep postures in their case studies. For a care home application [31], three classifier models were evaluated in a six-posture classification problem: (1) k-nearest neighbours (k-NN), (2) Decision Tree (DT), and (3) Support Vector Machines (SVM). Using three sensors embedded into garments (socks and a T-shirt), this work adopted a pure pattern recognition approach in which the authors preprocessed and extracted features from the sensory timeseries for classifier training. The pose classifications were then fed into a knowledge-based fuzzy model to automatically determine the priority level of postural changes for the prevention of pressure ulcers. The SVM was identified as the best-performing classifier during pilot experiments, with an accuracy of 99%.

A clinical study investigated the recognition of eight sleep postures using three wearable sensors placed on the forearms and chest [32]. The eight postures represent minor variations of the four standard sleep postures. Using statistical features manually extracted from raw sensory data, the average four-posture classification accuracy was 99.5%. Notably, this figure dropped to 92.5% when considering the eight minor posture variations, with the worst model accuracy hitting as low as 84.3%. The authors also identified battery life and large sensor size as two limitations of wearable-based sleep trackers, and provided recommendations on sensor design, packaging and data capture/transmission optimisation.

With three sensors attached to the chest and ankles, a case study explored the feasibility of classifying six to eight minor variations of the four standard sleep postures [33]. Under different test settings, the generalised matrix learning vector quantisation (GMLVQ) technique was found to perform variably on 7-hour individual participant data, from 58.4% up to 99.8%. Multi-subject models were examined too and achieved a mean accuracy between 78% and 83.6%.

More recently, we reported the classification of twelve simulated benchmark sleep postures using sparse postural cues from the four extremity limbs [34]. The posture dataset covers different limb configurations common in sleep, making it better qualified for clinical use. The proposed data augmentation technique allowed for synthetically generating more postural samples and was proven to enhance the overall posture classification performance. Given a scarce dataset, the reported average classification accuracy was as high as 100% using an SVM-based classifier. To emulate sensing artefacts commonly encountered in off-the-shelf sensors, mild to extreme levels of noise-based jamming were added to the testing postural samples, with the classifier showing high robustness (above 77%).

Some case studies investigated additional aspects of sleep as well. In [35], a smartwatch embedded with an accelerometer, microphone and illumination sensor was used to capture sleep information on the body posture, hand position and acoustic events (e.g. snores and coughs). Based on the tilt of the hand, the authors employed a 1-NN classifier to recognise the four standard body sleep postures, where the similarity criterion is based on direct Euclidean distance measurement. With a 6-hour data recording per participant, the system achieved over 90% accuracy in the quad-posture classification task.

Though some works do not emerge from the domain of sleep tracking, they remain relevant to intelligent wearable sensing and the analysis of human posture and movement. A smart jumpsuit with four inertial sensors on the upper arms and thighs was used for the early detection of neurodevelopmental disorders among infants through the analysis of their body postures and movements [36]. The authors investigated: (1) feature-based ML, and (2) end-to-end deep learning, both of which performed comparably at around 95%. It was also shown that the quadruple sensor configuration improved the system’s classification accuracy by up to 24% compared to partial sensor deployment.

Another study employed a dense sensor network composed of 31 wearable sensors to classify 22 (non-sleep) body postures common in human daily activities [37]. Despite the large number of postures considered and the high throughput of sensor data, a 1-NN classifier attained an average classification accuracy of 81% using simple weighted posture attributes.

The framework proposed in this paper sits at a sweet spot between the quantification and classification branches of human motion analysis. Instead of operating on raw sensor signals, we map the sleep posture labels to the kinematic orientation space of the body’s extremity limbs. Specifically, the extremity segment-to-segment relative orientations (joint angles of the wrists and ankles) are regarded as primal indicators of body posture. This translates to a more explainable posture recognition algorithm and equips clinicians with better-qualified diagnostic tools. With twelve sleep postures, our work goes beyond the four standard poses commonly considered in the literature. Clinically speaking, our work advances the sleep posture sensing capability, which has been a main shortcoming of today’s PSG systems. According to the literature, it is evident that different classification models perform comparably, from naïve k-NN classifiers to deep learning models. Such a remarkable conclusion should draw more attention to the data collection and treatment stages. Therefore, we leverage a noise-injection-based data augmentation technique to: (1) mitigate the effect of biases present in our postures dataset, and (2) accomplish performance similar to state-of-the-art models at a fraction of the training data. We also leverage additional performance interpretation techniques to showcase the added value brought by data augmentation to the one-shot learning problem, while lending explainability to the reasoning behind the model.

3 Methods

This section describes the methods adopted in each stage of the proposed framework sketched in Fig. 1. An overview of the framework is presented in Section 3.1. The acquisition of postural cues defining the sleep posture was first formulated virtually through in silico simulations, as explained in Sections 3.2 and 3.3, and then similarly performed using real-world data collected by the wearable sensors described in Sections 3.4 and 3.5. The proposed postural data augmentation is described in Section 3.6. Lastly, the model behind the posture classification is outlined in Section 3.7.

Figure 2: In silico sleep simulation: (a) motion sequence illustrating the twelve sleep postures virtually replicated in Blender©, and (b) anthropomorphic rig used for animating the 3D character model.

3.1 Posture Learning Framework

The proposed human posture learning framework is designed to serve as a plug-and-play system to recognise any arbitrary body posture given a single training shot for that posture. The system comprises four wearable sensor modules and a server for sensor data acquisition, storage and analysis. In practice, the system would be used as follows. Following an instruction manual or video, a subject attaches the wearable sensor modules to their wrists and ankles before sleep, then replicates a defined set of sleep postures in bed, with a snapshot of sensor data recorded at each posture. All transmitted sensor data are preprocessed to extract segment-to-segment orientations (joint angles) to be used as postural cues defining each posture (see Section 3.3). To avoid the need for longitudinal data collection and labelling, the single shots of preprocessed data are subsequently augmented with many more modified copies (i.e. synthetic data samples), which accelerate effective modelling of each posture. This new augmented posture dataset is sufficiently diversified, and therefore suitable for training a multi-class classifier for this particular sleep session. The patient would then sleep while the sensor data continue to be streamed to the server for sleep posture analysis. From the sequence and duration of sleep postures overnight, a clinician may be able to extract useful clinical insights. The fact that the wearable sensor modules are not taken off between the collection of the training data and real sleep data (testing data) means that sensor-to-segment misalignment is fixed throughout the recording session, thus no calibration is required.

Unlike the vast majority of literature that considers only four standard sleep postures, we showcase the scalability of the framework with twelve wide-ranging postures common in sleep. The framework has been validated in two experimental pipelines, in silico sleep simulation and human participant study as depicted in Fig. 1.

The mechanics of the proposed framework is best explained by analogy with the flow of information in a standard pattern recognition system: data collection, preprocessing, classifier training and testing. The data collection stage involves a local server acquiring body segment orientations either from an exported sleep simulation file (see Section 3.2) or from IMU data transmitted by the wearable sensor modules (see Section 3.5).

The data preprocessing stage starts with the body pose characterisation step (described in Section 3.3), which extracts segment-to-segment orientations from each extremity limb to monitor joints perceived as relevant from a clinical perspective. These four relative orientations serve as a simplified and human-interpretable representation of the overall body posture. The segment-to-segment orientation data are then augmented as described in Section 3.6 to accelerate sleep posture modelling.

The use of the data augmentation step in the in-silico sleep simulation differs slightly from that in the in-vivo case. The sleep simulation provides only one observation for each posture; therefore, data augmentation is used twice to generate training and testing posture datasets, respectively, to validate the framework virtually. In contrast, the in-vivo session provides real recordings of body postures. Therefore, data augmentation is only used to diversify the training dataset of postures, whereas the test-labelled recordings are readily available for the purpose of validating the framework in the real world. Since real timeseries are used for testing, the relative orientation data channels from the four wearable sensors were synchronised.
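The one-shot augmentation can be illustrated with a minimal noise-injection sketch in Python. This is only a sketch of the general idea described above and in the discussion of noise-injection-based augmentation; the joint-angle layout (four joints, three angles each), the noise scale and the sample count are assumptions for illustration, not the exact parameters of Section 3.6.

```python
import numpy as np

def augment_posture(shot, n_samples=200, noise_deg=5.0, seed=0):
    """Generate synthetic training samples from a single postural "shot".

    `shot` is a 1-D vector of joint angles in degrees (here assumed to be
    four extremity joints with three angles each). Each synthetic sample
    perturbs the shot with zero-mean Gaussian noise, emulating the natural
    variability with which a person reproduces the same posture.
    """
    rng = np.random.default_rng(seed)
    shot = np.asarray(shot, dtype=float)
    noise = rng.normal(0.0, noise_deg, size=(n_samples, shot.size))
    return shot + noise

# One labelled shot per posture is enough to build a training set.
shot = np.array([30.0, -10.0, 0.0, 15.0, 5.0, -20.0,
                 0.0, 40.0, -5.0, 10.0, 0.0, 25.0])
augmented = augment_posture(shot)
print(augmented.shape)  # (200, 12)
```

The augmented samples stay centred on the original shot, so the synthetic dataset preserves the posture label while diversifying the feature space the classifier sees.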

The multi-class classification model consists of an ensemble of SVM binary classifiers. The classifier training and testing procedures are the same for both the in-silico and in-vivo pipelines. Using the augmented posture dataset for training, the classifier model is trained to recognise the underlying sleep posture. Then, the test-labelled posture dataset is used to evaluate the performance of the pre-trained model against ground-truth labels.
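The ensemble-of-binary-classifiers idea can be sketched as follows. To keep the example dependency-free, a nearest-centroid rule stands in for each binary SVM, and a pairwise (one-vs-one) decomposition is assumed; only the voting aggregation structure, not the binary learner, mirrors the paper's model.

```python
import numpy as np
from itertools import combinations
from collections import Counter

class OneVsOneEnsemble:
    """Multi-class model built from one binary classifier per class pair.

    The paper's binary learners are SVMs; here a nearest-centroid rule
    stands in for each binary decision so the sketch stays self-contained.
    """

    def fit(self, X, y):
        X, y = np.asarray(X, float), np.asarray(y)
        self.classes_ = np.unique(y)
        self.centroids_ = {c: X[y == c].mean(axis=0) for c in self.classes_}
        self.pairs_ = list(combinations(self.classes_, 2))
        return self

    def predict(self, X):
        preds = []
        for x in np.asarray(X, float):
            votes = Counter()
            for a, b in self.pairs_:  # each binary classifier casts one vote
                da = np.linalg.norm(x - self.centroids_[a])
                db = np.linalg.norm(x - self.centroids_[b])
                votes[a if da <= db else b] += 1
            preds.append(votes.most_common(1)[0][0])
        return np.array(preds)

# Toy training set: three well-separated posture clusters in feature space.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.1, size=(20, 3))
               for c in ([0, 0, 0], [5, 0, 0], [0, 5, 0])])
y = np.repeat(["supine", "prone", "lateral"], 20)
clf = OneVsOneEnsemble().fit(X, y)
print(clf.predict([[5.1, 0.0, 0.1]])[0])  # prone
```

In the actual framework, the training set for each class would be the augmented samples generated from that posture's single shot.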

3.2 In Silico Sleep Simulation

The virtual sleep simulation is built around Blender© (The Blender Foundation, Amsterdam, NL), an open-source computer graphics software package for 3D modelling, animation and video rendering. A 3D human-like character model (https://cloud.blender.org/training/animation-fundamentals/5d69ab4dea6789db11ee65d1/) is virtually animated to replicate the twelve sleep postures shown in Fig. 2(a). Each body posture was captured in a keyframe, and transitions between keyframes were interpolated to create a motion sequence simulating sleep. We followed the standard pipeline used by digital artists for character animation: an anthropomorphic rig (acting as a skeleton, see Fig. 2(b)) is carefully aligned and bound to the character model to allow for full-body animation by posing the rig alone.
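The keyframe interpolation step can be illustrated with a quaternion spherical linear interpolation (slerp) sketch. This is an illustrative stand-in only: Blender's own f-curve interpolation differs, and the function below is not part of the paper's pipeline.

```python
import numpy as np

def slerp(q0, q1, t):
    """Spherical linear interpolation between two unit quaternions (w, x, y, z).

    Illustrates how a rig orientation can be interpolated between two
    posture keyframes at fraction t in [0, 1].
    """
    q0, q1 = np.asarray(q0, float), np.asarray(q1, float)
    dot = np.dot(q0, q1)
    if dot < 0.0:          # take the shorter arc on the 4-D unit sphere
        q1, dot = -q1, -dot
    if dot > 0.9995:       # nearly parallel: fall back to normalised lerp
        q = q0 + t * (q1 - q0)
        return q / np.linalg.norm(q)
    theta = np.arccos(np.clip(dot, -1.0, 1.0))
    return (np.sin((1 - t) * theta) * q0 + np.sin(t * theta) * q1) / np.sin(theta)

q0 = np.array([1.0, 0.0, 0.0, 0.0])                          # identity pose
q1 = np.array([np.cos(np.pi/4), 0.0, 0.0, np.sin(np.pi/4)])  # 90 deg about z
q_mid = slerp(q0, q1, 0.5)                                   # 45 deg about z
```

Interpolating each rig segment this way between consecutive posture keyframes yields a smooth simulated motion sequence.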

Structurally, the rig segments have root-parent-child relationships. The root is a fully unconstrained segment with six Degrees of Freedom (DOF) representing the translation and rotation of the rig as a whole. The root segment sits at the top of the rig’s hierarchy and was chosen to be the lower spine segment. Branching off from the root segment are the kinematic chains forming the remainder of the rig (e.g. lower limbs, upper back segments, etc.). The 26 segments forming these kinematic chains follow a parent-child transformation hierarchy, in the sense that rotation or translation of a parent segment affects the pose of all its subsequent child segments, but not vice versa.

The body posture, $\bm{B}$, is completely defined by two components: (1) the combined position and orientation of the root segment, $\bm{\rho}\in\mathbb{R}^{6}$, and (2) the rotations vector $\bm{\alpha}\in\mathbb{R}^{n}$ of the remaining 26 rig segments. The $\bm{\alpha}$ vector contains the angular displacements about the $n$ active rotational axes of all body segments, depending on the joint definitions (ball, saddle, or hinge joints). To allow for body posture tracking, Blender© automatically assigns right-handed 3D coordinate systems $\{B_{i}\ |\ i\in\mathbb{Z}\colon 1\leq i\leq D\}$ to all $D$ segments, anchored at their respective parent joints’ active centres of rotation. Given an arbitrary $j^{\text{th}}$ segment, its pose can be referred to a reference coordinate system, $R$, using the forward kinematics map

$$\prescript{R}{}{\bm{T}}_{j}=\bm{T}_{1}\,\bm{T}_{2}\cdots\bm{T}_{j}\,\prescript{R}{}{\bm{T}}_{j}(0)=\left(\prod_{i=1}^{j}\bm{T}_{i}\right)\prescript{R}{}{\bm{T}}_{j}(0)\qquad(1)$$

where $\bm{T}_{i}\in\mathbb{R}^{4\times 4}$ denotes the homogeneous transformation matrix describing the rotations and translations of $B_{i}$, and $\prescript{R}{}{\bm{T}}_{j}(0)$ represents the initial postural offset between coordinate systems $j$ and $R$ after the rig binding process is completed. This map allows for the calculation of the net transformation of a child segment by combining all hierarchical transformations from parent segments. In Blender©, $R$ is often the global coordinate system of the 3D viewport, and thus the offset term for each segment is known a priori.

The terms $\bm{T}_{i}$ can be exported, along with the rendered animation video, as a BioVision Hierarchy (BVH) file containing these hierarchical transformations, quantified by the angular and translational displacements of each segment at each frame, together with the fixed segment-to-segment positional offsets.
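As a numerical illustration of the forward kinematics map in Eq. 1, the sketch below composes homogeneous transforms along a hypothetical two-segment planar chain; the segment geometry is invented for illustration and the initial offset is taken as the identity.

```python
import numpy as np

def transform(rz_deg, tx):
    """Homogeneous transform: rotation about z, then translation along x."""
    a = np.radians(rz_deg)
    T = np.eye(4)
    T[:2, :2] = [[np.cos(a), -np.sin(a)], [np.sin(a), np.cos(a)]]
    T[0, 3] = tx
    return T

def forward_kinematics(transforms, offset=np.eye(4)):
    """Eq. 1: net pose of segment j = ordered product of the chain's
    transforms, post-multiplied by the initial offset T_j(0)."""
    T = np.eye(4)
    for Ti in transforms:          # hierarchical parent-to-child order
        T = T @ Ti
    return T @ offset

# Two-link planar chain: 90-degree rotation at the root, then a unit-length
# child segment. Rotating the parent moves the child, but not vice versa.
chain = [transform(90, 0.0), transform(0, 1.0)]
tip = forward_kinematics(chain)
print(np.round(tip[:3, 3], 3))  # chain tip ends up at (0, 1, 0)
```

This parent-to-child ordering is exactly why posing a parent segment in the rig carries all of its child segments with it.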

3.3 Characterisation of Body Posture

In principle, a posture is defined by the complete set of joint angles of all body segments, which poses an unrealistic measurement and computing challenge for wearable sensing. Therefore, what we refer to as “body pose characterisation” is the selection of relatively few joint angles that are practically measurable and, at the same time, allow sleeping postures to be classified. The definition of the sleep posture is important since the outcome of this step has a strong impact on the selection of effective techniques for collecting and analysing data.

The challenge with body posture is that its study has a dual parametric and subjective nature. It is parametric in the sense that measurements of some modality are required for algorithms to use in decision making; this is the mainstream direction of the available literature, as covered in Section 2. Subjectivity relates to the human perception of the sleep posture, which varies from one person to another. For example, in [36], multiple human annotators were found to disagree when labelling postures captured on video. That said, subjectivity is not completely disjoint from measurements; humans need high-level information in some form (e.g. images or numbers), but not raw sensor readings. The subjective element in posture classification is useful because posture measurement variability within some constraint should be permissible. In this paper, we exploit this parametric-subjective nature for posture characterisation (in this section) and augmentation (see Section 3.6).

Figure 3: The kinematic definition of the wrist joint.

To reach a good compromise between the numerical and perceptual reasoning behind postural analysis, we propose the use of segment-to-segment relative orientations at the four extremity limbs (wrists and ankles) as primal indicators of the sleep posture. This provides clinicians with advantageous access to a human-interpretable and simplified posture definition alongside the output pose labels. The choice of ankle and wrist joints stems from their strong connection with various sleep-related pathologies, such as ankle osteoarthritis [38] and carpal tunnel syndrome [39]. Such intuitive postural information is envisaged to make the posture classification algorithm more comprehensible to clinicians, making it a better fit as a future medical diagnostic tool.

Segment-to-segment relative orientations represent the rotational component of the local joint transformation linking a child segment to its parent. For illustration, Fig. 3 shows the right wrist joint and the coordinate systems S_{p} and S_{c} of, respectively, the forearm (parent segment) and the hand (child segment). The wrist is a condyloid synovial joint allowing only two motions: flexion/extension and ulnar/radial deviation. Based on this definition, the hand-to-forearm rotation matrix, \prescript{S_{p}}{}{\bm{R}}_{S_{c}}, can be formulated as

\prescript{S_{p}}{}{\bm{R}}_{S_{c}}={\bm{R}}_{z}\,{\bm{R}}_{y}\,{\bm{R}}_{x} (2)

where, in the case of the wrist joint, {\bm{R}}_{z} and {\bm{R}}_{x} represent the flexion/extension and ulnar/radial deviation rotations respectively. Wrist pronation/supination originates from the elbow joint, hence {\bm{R}}_{y} is ideally an identity matrix.
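As a concrete illustration of Eq. 2, the sketch below composes the hand-to-forearm rotation from elementary rotations, with flexion/extension about z and ulnar/radial deviation about x as described above. This is a minimal sketch; the function names and pure-Python matrix helpers are our own.

```python
import math

def rot_x(a):
    c, s = math.cos(a), math.sin(a)
    return [[1, 0, 0], [0, c, -s], [0, s, c]]

def rot_y(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, 0, s], [0, 1, 0], [-s, 0, c]]

def rot_z(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]

def matmul(A, B):
    # 3x3 matrix product
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def wrist_rotation(flexion, deviation, pronation=0.0):
    """Hand-to-forearm rotation per Eq. 2: R = Rz * Ry * Rx, where Rz is
    flexion/extension, Rx is ulnar/radial deviation, and Ry
    (pronation/supination, originating from the elbow) defaults to identity."""
    return matmul(rot_z(flexion), matmul(rot_y(pronation), rot_x(deviation)))
```

With both wrist angles at zero the result is, as expected, the identity matrix.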

For the in silico sleep simulation, segment-to-segment orientations can be derived from the skeleton hierarchical transformations provided in the BVH file exported from Blender©. Indeed, using Eq. 1, the relative transformation between the parent and child segments can be obtained as

\prescript{S_{p}}{}{\bm{T}}_{S_{c}}=\left(\prescript{R}{}{\bm{T}}_{S_{p}}\right)^{T}\prescript{R}{}{\bm{T}}_{S_{c}} (3)

The rotational component of the local transformation across the extremity limb distal joint can then be extracted as

\prescript{S_{p}}{}{\bm{R}}_{S_{c}}={\bm{Q}}\ \prescript{S_{p}}{}{\bm{T}}_{S_{c}}\ {\bm{Q}}^{T} (4)

where

\bm{Q}=\begin{bmatrix}\bm{I}_{3\times 3}&\bm{0}_{3\times 1}\end{bmatrix}

Similar kinematic definitions are made for the lower extremity limbs. In this case, the local transformation of the ankle joint can be monitored by tracking both the shin and foot segments, with the allowable ankle motions being the inversion/eversion and plantar/dorsi-flexion.
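Assuming the BVH hierarchical transforms are available as 4x4 homogeneous matrices (nested lists here), Eqs. 3 and 4 can be sketched as below; the helper names are our own. Note that, although the transpose of a homogeneous transform is not its full inverse, it does yield the correct rotational block, which is all that Eq. 4 extracts.

```python
def matmul(A, B):
    # generic matrix product for nested lists
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def relative_rotation(T_parent, T_child):
    """Eqs. 3-4: form T_rel = T_parent^T * T_child, then crop the
    upper-left 3x3 rotation block with Q = [I_3 | 0]."""
    T_rel = matmul(transpose(T_parent), T_child)        # Eq. 3
    Q = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]      # 3x4 selector
    return matmul(matmul(Q, T_rel), transpose(Q))       # Eq. 4
```

For instance, with an identity parent transform, the function simply returns the rotation block of the child transform.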

The BVH articulated body representation is often expressed in Euler angle-based rotation matrices as defined in Eq. 2. However, for the purpose of this study, \prescript{S_{p}}{}{\bm{R}}_{S_{c}} was converted to its equivalent quaternion form \prescript{S_{p}}{}{\bm{q}}_{S_{c}} to obtain a more concise and numerically stable representation. Thus, the pose characterisation vector \bm{\chi}^{v} for the virtual character model is defined as

\bm{\chi}^{v}=\begin{bmatrix}\prescript{S_{p}}{}{\bm{q}}_{S_{c}}^{\mathbfcal{J}(1)}&\prescript{S_{p}}{}{\bm{q}}_{S_{c}}^{\mathbfcal{J}(2)}&\prescript{S_{p}}{}{\bm{q}}_{S_{c}}^{\mathbfcal{J}(3)}&\prescript{S_{p}}{}{\bm{q}}_{S_{c}}^{\mathbfcal{J}(4)}\end{bmatrix} (5)

where \mathbfcal{J}=\{\text{right wrist},\text{left wrist},\text{right ankle},\text{left ankle}\} denotes the set of four distal joints of the four extremity limbs.
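A minimal sketch of the rotation-to-quaternion conversion and the assembly of the pose characterisation vector of Eq. 5 is given below. The function names are our own, and the trace-based conversion is one standard method among several, with branches to avoid numerical instability when the trace approaches -1.

```python
import math

def quat_from_rotmat(R):
    """Convert a 3x3 rotation matrix to a unit quaternion (w, x, y, z)."""
    t = R[0][0] + R[1][1] + R[2][2]
    if t > 0:
        s = math.sqrt(t + 1.0) * 2.0
        return (0.25 * s, (R[2][1] - R[1][2]) / s,
                (R[0][2] - R[2][0]) / s, (R[1][0] - R[0][1]) / s)
    # pick the dominant diagonal element for numerical stability
    i = max(range(3), key=lambda k: R[k][k])
    j, k = (i + 1) % 3, (i + 2) % 3
    s = math.sqrt(R[i][i] - R[j][j] - R[k][k] + 1.0) * 2.0
    q = [0.0, 0.0, 0.0, 0.0]
    q[0] = (R[k][j] - R[j][k]) / s
    q[i + 1] = 0.25 * s
    q[j + 1] = (R[j][i] + R[i][j]) / s
    q[k + 1] = (R[k][i] + R[i][k]) / s
    return tuple(q)

def pose_vector(joint_rotations):
    """Stack the four distal-joint quaternions into chi^v (Eq. 5).
    `joint_rotations` maps joint name -> 3x3 relative rotation matrix."""
    joints = ("right wrist", "left wrist", "right ankle", "left ankle")
    return [quat_from_rotmat(joint_rotations[j]) for j in joints]
```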

It is worth noting that the pose characterisation framework proposed in this paper does not utilise any calibration poses and exploits segment-to-segment orientations, making the approach more meaningful clinically. This goes beyond the approach previously presented by the authors in [34] which required a reference calibration T-pose, and offered tracking of the child segment alone.

Refer to caption
(a)
Refer to caption
(b)
Figure 4: The real-world experimental setup: (a) placement of the wearable posture sensors on the four extremity limbs; each two black-filled circles represent one complete limb tracker, and (b) annotated illustration of the wearable sensor module.

3.4 Wearable Posture Sensors

Some form of wearable technology is required to track the segment-to-segment orientation across the four distal joints. The main requirements for such wearable technology include: (1) multi-segment orientation tracking, (2) compact size so as not to compromise sleep quality, and (3) low cost to make it affordable to the public health sector. Predominantly, reported works on human motion analysis involving several body segments simply employ multiple standalone wearable sensors, one for each segment. In this work, we opt to design a custom-made sensor module (shown in Fig. 4(b)) with dual-segment tracking capability, empowered by two embedded BNO055 IMU sensors from Bosch Sensortec© (Bosch Sensortec GmbH, Reutlingen, DE). Both IMU sensors are managed by a single ESP32-WROOM-32D microcontroller from Espressif Systems© (Espressif Systems Shanghai Co Ltd, Shanghai, CN) featuring Bluetooth connectivity for wireless data transmission. At approximately 6 cubic centimetres in volume per IMU case, the sensor module is sufficiently slim and small for wearability during sleep. Moreover, all the electronic components used in this design are commercially available, at a low total cost of approximately GBP 100. Fig. 4(a) illustrates the on-body placement of these sensor modules such that the parent and child IMU sensors are mounted on the last two segments of each extremity limb.

3.5 Intra- and Inter-Sensor Fusion

A sensor fusion algorithm is needed to estimate the attitude of each IMU sensor (intra-sensor fusion), which is a function of the orientation of the body segment it is mounted on. Afterwards, a pose characterisation framework is employed in a similar way to that described in Section 3.3. To this end, an inter-sensor fusion step is applied to fuse the two absolute IMU orientations from each wearable sensor module into one segment-to-segment orientation.

To compensate for the drift inherent to the IMU heading estimates, readings from the magnetometer embedded in the IMU were exploited to provide a stable estimate of the orientation. Herein, the Madgwick filter [40] is employed for fusing the geo-inertial measurements from the IMU sensor, thanks to its orientation tracking robustness and successful deployment in human motion analysis research [41]. Furthermore, the optimisation procedure of the filter is of low computational cost and takes place in the quaternion space, allowing for online and singularity-free attitude estimation. As formulated in Eq. 6, the filter first carries out a vector observation step which involves iteratively searching for an optimal orientation estimate, \prescript{M}{}{\hat{\bm{q}}}_{E}, defined from the IMU frame M to the Earth frame E. The validity criterion for the orientation estimate depends on how well it aligns a sensor-measured field vector \prescript{M}{}{\bm{s}}=\begin{bmatrix}0&s_{x}&s_{y}&s_{z}\end{bmatrix} with some Earth-referenced geophysical quantity \prescript{E}{}{\bm{r}}=\begin{bmatrix}0&r_{x}&r_{y}&r_{z}\end{bmatrix}.

\prescript{M}{}{\hat{\bm{q}}}_{E}=\operatorname*{\text{argmin}}_{\prescript{M}{}{\bm{q}}_{E}\in\mathbb{R}^{4}}\quad{\bm{f}}\left(\prescript{M}{}{\bm{q}}_{E},\prescript{E}{}{\bm{r}},\prescript{M}{}{\bm{s}}\right) (6)

such that

{\bm{f}}\left(\prescript{M}{}{\bm{q}}_{E},\prescript{E}{}{\bm{r}},\prescript{M}{}{\bm{s}}\right)=\prescript{M}{}{\bm{q}}_{E}^{*}\otimes\prescript{E}{}{\bm{r}}\otimes\prescript{M}{}{\bm{q}}_{E}-\prescript{M}{}{\bm{s}}

where the operator \otimes denotes quaternion multiplication.

The filter then uses the Jacobian matrix of the vector objective function to determine its gradient \nabla{\bm{f}}, which is later used to define the normalised quaternion estimation error \prescript{M}{}{\bm{q}}_{E}^{\epsilon} at time index t_{k}

\prescript{M}{}{\bm{q}}_{E}^{\epsilon}(t_{k})=\left.\frac{\nabla{\bm{f}}}{\lVert\nabla{\bm{f}}\rVert}\right|_{t_{k}} (7)

Geophysical vector observation alone provides a sluggish orientation estimate since it is a memoryless framework and is highly susceptible to sensor noise. As shown in Eqs. 8 and 9, the Madgwick filter produces a smoother orientation estimate by numerically integrating a reliable orientation rate estimate \prescript{M}{}{\dot{\bm{q}}}_{E} at each descent update step. The orientation rate is the outcome of fusing \prescript{M}{}{\bm{q}}_{E}^{\epsilon}, weighted by a hyperparameter \left(\beta\ll 1\right), with the rate of orientation change \prescript{M}{}{\dot{\bm{q}}}_{E}^{\omega} derived from the gyroscope measurement vector \prescript{M}{}{\bm{\omega}}=\begin{bmatrix}0&\omega_{x}&\omega_{y}&\omega_{z}\end{bmatrix}.

\prescript{M}{}{\hat{\bm{q}}}_{E}(t_{k})=\prescript{M}{}{\hat{\bm{q}}}_{E}(t_{k-1})+\prescript{M}{}{\dot{\bm{q}}}_{E}(t_{k})\cdot\Delta t_{k} (8)
\prescript{M}{}{\dot{\bm{q}}}_{E}(t_{k})=\prescript{M}{}{\dot{\bm{q}}}_{E}^{\omega}(t_{k})-\beta\,\prescript{M}{}{\bm{q}}_{E}^{\epsilon}(t_{k}) (9)

where

\prescript{M}{}{\dot{\bm{q}}}_{E}^{\omega}(t_{k})=\frac{1}{2}\prescript{M}{}{\hat{\bm{q}}}_{E}(t_{k-1})\otimes\prescript{M}{}{\bm{\omega}}(t_{k})
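The update equations above can be condensed into a single descent step. The sketch below is a simplified, accelerometer-only variant in which gravity is the sole Earth-referenced vector r, so the magnetometer correction used in the full filter is omitted for brevity; function names are our own.

```python
import math

def quat_mult(a, b):
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return (w1*w2 - x1*x2 - y1*y2 - z1*z2,
            w1*x2 + x1*w2 + y1*z2 - z1*y2,
            w1*y2 - x1*z2 + y1*w2 + z1*x2,
            w1*z2 + x1*y2 - y1*x2 + z1*w2)

def normalise(v):
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def madgwick_update(q, gyro, accel, beta=0.1, dt=1/30):
    """One Madgwick descent step (Eqs. 6-9) with gravity as the
    reference vector r = (0, 0, 0, 1); magnetometer terms omitted."""
    ax, ay, az = normalise(accel)
    qw, qx, qy, qz = q
    # objective function f (Eq. 6) for the gravity reference
    f = (2*(qx*qz - qw*qy) - ax,
         2*(qw*qx + qy*qz) - ay,
         2*(0.5 - qx*qx - qy*qy) - az)
    # gradient J^T f, then normalised error direction (Eq. 7)
    grad = (-2*qy*f[0] + 2*qx*f[1],
             2*qz*f[0] + 2*qw*f[1] - 4*qx*f[2],
            -2*qw*f[0] + 2*qz*f[1] - 4*qy*f[2],
             2*qx*f[0] + 2*qy*f[1])
    q_eps = normalise(grad) if any(grad) else (0.0, 0.0, 0.0, 0.0)
    # gyroscope-derived orientation rate, fused (Eq. 9) and integrated (Eq. 8)
    q_dot_w = tuple(0.5 * c for c in quat_mult(q, (0.0,) + tuple(gyro)))
    q_dot = tuple(qdw - beta * qe for qdw, qe in zip(q_dot_w, q_eps))
    return normalise(tuple(qi + qd * dt for qi, qd in zip(q, q_dot)))
```

When the sensor is at rest with gravity along its z-axis, the identity orientation is a fixed point of this update.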
Refer to caption
Figure 5: An illustration of wearable posture sensing for the upper limb. By applying Madgwick filtering to the angular velocity, acceleration and magnetic measurements from the IMUs, the absolute orientations \prescript{M_{p}}{}{\bm{q}}_{E} and \prescript{M_{c}}{}{\bm{q}}_{E} of the two IMUs can be estimated.

As depicted in Fig. 5, the two IMUs M_{p} and M_{c} built into each wearable sensor module are attached to the two most distal segments of each limb. Leveraging the sensor fusion algorithm outlined above, the absolute orientations of both IMUs are first estimated, and then fused to determine the IMU-to-IMU orientation \prescript{M_{p}}{}{\bm{q}}_{M_{c}} as

\prescript{M_{p}}{}{\bm{q}}_{M_{c}}=\prescript{E}{}{\bm{q}}_{M_{c}}\otimes\prescript{M_{p}}{}{\bm{q}}_{E}=\prescript{M_{c}}{}{\bm{q}}_{E}^{*}\otimes\prescript{M_{p}}{}{\bm{q}}_{E} (10)

This quaternion is computed for each extremity limb to approximately measure the underlying segment-to-segment orientation. Unlike works that require IMU-to-segment misalignment calibration [42, 43], the proposed framework instead aims at fast posture classification using approximate segment orientations. This also serves the feasibility of the proposed system, since it is impractical to calibrate eight IMUs before each use.
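The inter-sensor fusion of Eq. 10 amounts to one quaternion conjugation and one quaternion multiplication; a minimal sketch (function names are our own):

```python
def quat_conj(q):
    w, x, y, z = q
    return (w, -x, -y, -z)

def quat_mult(a, b):
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return (w1*w2 - x1*x2 - y1*y2 - z1*z2,
            w1*x2 + x1*w2 + y1*z2 - z1*y2,
            w1*y2 - x1*z2 + y1*w2 + z1*x2,
            w1*z2 + x1*y2 - y1*x2 + z1*w2)

def relative_orientation(q_parent, q_child):
    """Eq. 10: fuse the two absolute (Earth-referenced) IMU orientations
    into one IMU-to-IMU orientation, q_child^* (x) q_parent."""
    return quat_mult(quat_conj(q_child), q_parent)
```

As a sanity check, two identical absolute orientations yield the identity relative orientation.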

Similar to Eq. 5, the pose characterisation vector \bm{\chi}^{w} based on wearable sensor data is defined as

\bm{\chi}^{w}=\begin{bmatrix}\prescript{M_{p}}{}{\bm{q}}_{M_{c}}^{\mathbfcal{J}(1)}&\prescript{M_{p}}{}{\bm{q}}_{M_{c}}^{\mathbfcal{J}(2)}&\prescript{M_{p}}{}{\bm{q}}_{M_{c}}^{\mathbfcal{J}(3)}&\prescript{M_{p}}{}{\bm{q}}_{M_{c}}^{\mathbfcal{J}(4)}\end{bmatrix} (11)

3.6 Postural Data Augmentation

This section embarks on the data preprocessing stage, where segment-to-segment orientations are augmented to create a larger dataset better suited to the ML algorithm used for pose classification (see Section 3.7). To begin with, let a generic variable \bm{\chi} be defined as either \bm{\chi}^{v} or \bm{\chi}^{w}, depending on whether posture tracking takes place in silico or in the real world. Next, we define a collective pose characterisation vector \bm{\Psi} that brings together all twelve sleep postures

\bm{\Psi}=\begin{bmatrix}\bm{\chi}_{1}&\bm{\chi}_{2}&\dots&\bm{\chi}_{12}\end{bmatrix} (12)

where \bm{\chi}_{j} corresponds to the j^{\text{th}} sleep posture.

Based on this definition, \bm{\Psi} resembles a reference dictionary containing postural cues for the sleep postures included in the presented case study. In practice, with such single-observation definitions of postures, over-fitting and poor generalisation are unavoidable outcomes for any classifier, regardless of its type. As covered in Section 2, related works record extended sensor data timeseries for each posture, sometimes amounting to several hours or nights of training data. Extended data collection translates to a higher cost of manual data labelling. Moreover, each subject may have slightly different sleep postures of interest to clinicians; training data collection and labelling would then have to be repeated for each subject. This would clearly be an obstacle for clinical use of wearable-based sleep monitoring solutions.

Refer to caption
(a)
Refer to caption
(c)
Refer to caption
(b)
Refer to caption
(d)
Figure 6: Postural data augmentation results (N=100): (a) augmented axes and (b) angles of rotation after injecting Gaussian noise (variance = 0.1) to a quaternion; (c) augmented axes and (d) angles of rotation after injecting Gaussian noise (\sigma_{\phi}=\sigma_{\theta}=30\degree) to the axis-angle based orientation (proposed method). Blue-coloured data are synthetically generated, whereas the input reference axis-angle orientation is in red. The black dashed lines in (b) and (d) represent the sample standard deviation of the angle timeseries.

In this work, data augmentation is a key preprocessing step that trades off the cost of data collection against the requirements of timeseries classification. It is essential in applications where only scarce [44] or class-imbalanced [45] datasets are available. Another possible use of data augmentation is to obtain a more capable ML model by enhancing the quantity and quality of the training data through deliberately introduced synthetic samples. When assigned correct labels, synthetic data allow the ML model to explore regions of the input space absent from the real training dataset. This expands the decision boundary of the model, thus lowering the risk of over-fitting [46].

Several families of data augmentation techniques are extensively reviewed in [47, 48], including, but not limited to, pattern mixing, signal decomposition and generative neural networks. However, these techniques require medium to large timeseries datasets, so directly applying any of them to our single “snapshots” of postures is infeasible. To address this one-shot learning problem, we propose a noise-injection-based data augmentation approach that facilitates timeseries generation from a single observation of each posture. The addition of artificial noise helps overcome the scarcity and bias issues present in the training data, and provides a good compromise between the parametric and subjective aspects of the human pose definition. Another advantage of noise injection is that the noise generation process can be easily modelled, which means that the data augmentation is both editable and invertible. Artificially noised datasets reportedly led to increased robustness to sensor noise and improved classification performance in real-world applications, including construction equipment activity recognition [49] and meteorological sensor data processing [50].

Nonetheless, simple addition of noise to a quaternion leads to a chaotic data augmentation process with nonsensical synthetic samples, as illustrated in Figs. 6(a) and 6(b). Therefore, to generate near-realistic postural data, the quaternion-based pose descriptor \bm{\chi} is first converted into its corresponding axis-angle representation \bm{x}. As shown in Fig. 7, the axis of rotation is defined in the singularity-free Cartesian space, while the augmentation step is performed in an intermediate spherical coordinate system to obtain a more homogeneous augmented dataset. In particular, \bm{x} is defined as

\begin{split}{\bm{x}}&=\begin{bmatrix}{\bm{G}}(\phi_{p}^{\mathbfcal{J}(1)},\phi_{a}^{\mathbfcal{J}(1)})\cdot\prescript{p}{}{\bm{e}}_{c}^{\mathbfcal{J}(1)}\\ {\bm{G}}(\phi_{p}^{\mathbfcal{J}(2)},\phi_{a}^{\mathbfcal{J}(2)})\cdot\prescript{p}{}{\bm{e}}_{c}^{\mathbfcal{J}(2)}\\ {\bm{G}}(\phi_{p}^{\mathbfcal{J}(3)},\phi_{a}^{\mathbfcal{J}(3)})\cdot\prescript{p}{}{\bm{e}}_{c}^{\mathbfcal{J}(3)}\\ {\bm{G}}(\phi_{p}^{\mathbfcal{J}(4)},\phi_{a}^{\mathbfcal{J}(4)})\cdot\prescript{p}{}{\bm{e}}_{c}^{\mathbfcal{J}(4)}\end{bmatrix}+\begin{bmatrix}{\bm{g}}(\phi_{p}^{\mathbfcal{J}(1)},\phi_{a}^{\mathbfcal{J}(1)})\\ {\bm{g}}(\phi_{p}^{\mathbfcal{J}(2)},\phi_{a}^{\mathbfcal{J}(2)})\\ {\bm{g}}(\phi_{p}^{\mathbfcal{J}(3)},\phi_{a}^{\mathbfcal{J}(3)})\\ {\bm{g}}(\phi_{p}^{\mathbfcal{J}(4)},\phi_{a}^{\mathbfcal{J}(4)})\end{bmatrix}\\ &=\begin{bmatrix}{\bm{x}}^{\mathbfcal{J}(1)}&{\bm{x}}^{\mathbfcal{J}(2)}&{\bm{x}}^{\mathbfcal{J}(3)}&{\bm{x}}^{\mathbfcal{J}(4)}\end{bmatrix}^{T}\end{split} (13)

such that {\bm{G}}(\cdot)\in\mathbb{R}^{4\times 3} and {\bm{g}}(\cdot)\in\mathbb{R}^{4\times 1} denote a parametric matrix and vector, respectively, used to transform the axis-angle representation from the spherical space to the Cartesian space, and a generic \prescript{p}{}{\bm{e}}_{c}^{\mathbfcal{J}(j)} is defined as

\prescript{p}{}{\bm{e}}_{c}^{\mathbfcal{J}(j)}=\underbrace{\phi_{p}^{\mathbfcal{J}(j)}\,\hat{\bm{e}}_{1}+\phi_{a}^{\mathbfcal{J}(j)}\,\hat{\bm{e}}_{2}}_{\text{axis}}+\underbrace{\theta^{\mathbfcal{J}(j)}\,\hat{\bm{e}}_{3}}_{\text{angle}} (14)

where:

  • 1.

    subscript c and superscript p stand for the child and parent frames, respectively, anchored to either a body segment S or an IMU M.

  • 2.

    \hat{\bm{e}}_{i} for all i represents a standard-basis vector.

  • 3.

    \phi_{p}^{\mathbfcal{J}(j)}\in[0,180] and \phi_{a}^{\mathbfcal{J}(j)}\in[0,360) denote the polar and azimuthal angles, respectively, defining a unit axis of rotation in a spherical coordinate system.

  • 4.

    \theta^{\mathbfcal{J}(j)}\in[0,180] is the angle of rotation about the defined axis.

For convenience of notation, \bm{x}\in\mathbb{R}^{16\times 1} is reshaped into a row vector \bm{x}\in\mathbb{R}^{1\times 16}. Then, an augmented dictionary variable, \bm{\Psi}({\bm{\tau}}), is defined as the collective pose characterisation vector timeseries obtained through the augmentation of \bm{\Psi}

\bm{\Psi}({\bm{\tau}})=\begin{bmatrix}{\bm{x}}_{1}(\tau_{1})&{\bm{x}}_{2}(\tau_{1})&\cdots&{\bm{x}}_{12}(\tau_{1})\\ {\bm{x}}_{1}(\tau_{2})&{\bm{x}}_{2}(\tau_{2})&&\\ \vdots&&\ddots&\\ &&{\bm{x}}_{11}(\tau_{N-1})&{\bm{x}}_{12}(\tau_{N-1})\\ {\bm{x}}_{1}(\tau_{N})&\cdots&{\bm{x}}_{11}(\tau_{N})&{\bm{x}}_{12}(\tau_{N})\end{bmatrix} (15)

where \bm{\tau}\in\mathbb{Z}^{N} represents the time index vector for the augmented timeseries.

For each arbitrary time index \tau_{k}, we sample two Gaussian-distributed noise terms, \bm{\epsilon}_{1}\in\mathbb{R}^{2} and \epsilon_{2}\in\mathbb{R}, to augment the axis-angle representation outlined in Eq. 14 as follows

\begin{split}\prescript{p}{}{\bm{e}}_{c}^{\mathbfcal{J}(j)}(\tau_{k})&=\begin{bmatrix}\phi_{p}^{\mathbfcal{J}(j)}&\phi_{a}^{\mathbfcal{J}(j)}&\theta^{\mathbfcal{J}(j)}\end{bmatrix}^{T}+\begin{bmatrix}\bm{\epsilon}_{1}\\ \epsilon_{2}\end{bmatrix}\\ &=(\phi_{p}^{\mathbfcal{J}(j)}+\delta\phi_{p})\,\hat{\bm{e}}_{1}+(\phi_{a}^{\mathbfcal{J}(j)}+\delta\phi_{a})\,\hat{\bm{e}}_{2}+(\theta^{\mathbfcal{J}(j)}+\delta\theta)\,\hat{\bm{e}}_{3}\end{split} (16)

where \bm{\epsilon}_{1}\sim\mathcal{N}_{1}\left(\bm{0}_{2\times 1},\bm{\Sigma}\right) is used to augment \phi_{p}^{\mathbfcal{J}(j)} and \phi_{a}^{\mathbfcal{J}(j)}. The symmetric covariance matrix \bm{\Sigma} is parameterised by a variance \sigma_{\phi}^{2}

\bm{\Sigma}(\sigma_{\phi}^{2})=\begin{bmatrix}\sigma_{\phi}^{2}&0\\ 0&\sigma_{\phi}^{2}\end{bmatrix} (17)

and \epsilon_{2}\sim\mathcal{N}_{2}(0,\sigma_{\theta}^{2}) is used to augment \theta^{\mathbfcal{J}(j)} and is parameterised by a variance \sigma_{\theta}^{2}.
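Putting Eqs. 14-17 together, one noisy sample is drawn per time index: the quaternion is converted to its spherical axis-angle form, Gaussian noise is injected into the polar/azimuthal angles and into the rotation angle, and the result is converted back. The sketch below assumes angles in radians and uses the standard-library `random.gauss` sampler; the function names are our own.

```python
import math
import random

def quat_to_spherical(q):
    """Quaternion (w, x, y, z) -> (polar, azimuth, angle), i.e. the
    spherical-coordinate axis-angle form of Eq. 14, in radians."""
    w, x, y, z = q
    angle = 2.0 * math.acos(max(-1.0, min(1.0, w)))
    s = math.sqrt(max(1e-12, 1.0 - w * w))       # guard near-zero rotation
    ax, ay, az = x / s, y / s, z / s
    polar = math.acos(max(-1.0, min(1.0, az)))
    azimuth = math.atan2(ay, ax)
    return polar, azimuth, angle

def spherical_to_quat(polar, azimuth, angle):
    ax = math.sin(polar) * math.cos(azimuth)
    ay = math.sin(polar) * math.sin(azimuth)
    az = math.cos(polar)
    half = 0.5 * angle
    s = math.sin(half)
    return (math.cos(half), ax * s, ay * s, az * s)

def augment(q, n, sigma_phi, sigma_theta):
    """Generate n noisy copies of orientation q (Eq. 16): Gaussian noise
    on the polar/azimuthal axis angles (std sigma_phi) and on the
    rotation angle (std sigma_theta)."""
    polar, azimuth, angle = quat_to_spherical(q)
    return [spherical_to_quat(polar + random.gauss(0.0, sigma_phi),
                              azimuth + random.gauss(0.0, sigma_phi),
                              angle + random.gauss(0.0, sigma_theta))
            for _ in range(n)]
```

With both noise parameters set to zero the procedure reproduces the reference orientation, confirming that the augmentation is centred on the one-shot training sample.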

Refer to caption
Figure 7: Annotated visualisation of Cartesian- and spherical-based axis-angle representations.

The strength of the proposed data augmentation technique is that it has a single controllable hyperparameter for each of the two elements defining a static orientation: \sigma_{\phi}^{2} for the axis and \sigma_{\theta}^{2} for the angle of rotation. Assigning different values to the two hyperparameters provides varying data augmentation characteristics as per the application requirements. Figs. 6(c) and 6(d) illustrate one possible augmentation result using the proposed method. The carefully noised augmented timeseries resembles the output signals of the microelectromechanical systems (MEMS) making up many of today's commercial IMU sensors. Moreover, the addition of noise can be intentionally exaggerated to boost the robustness of trained classifiers.

3.7 Sleep Posture Classification

The definition of the collective pose characterisation vector timeseries is context-dependent; it can be either \bar{\bm{\Psi}}(\cdot) or \prescript{*}{}{\bm{\Psi}}(\cdot), denoting training and testing timeseries respectively. Herein, (\cdot) refers to the time index vector \bm{t}\in\mathbb{Z}^{O} or \bm{\tau}, indicating real or augmented timeseries respectively. By definition, a classifier \mathcal{F}\colon\bm{x}\rightarrow y is required such that y\in\mathbfcal{Y}=\{\mathcal{Y}_{1},\mathcal{Y}_{2},\dots,\mathcal{Y}_{12}\} denotes the posture label. For clarity of notation, a generic \bm{x} can be either a training \bar{\bm{x}} or testing \prescript{*}{}{\bm{x}}, corresponding to \bar{y} and \prescript{*}{}{y} respectively, regardless of whether a real or augmented time index is considered.

We leverage an error-correcting output codes (ECOC) model [51, 52] to achieve multi-class classification by aggregating binary classifiers f_{i}, i\in\mathbb{Z},\ 1\leq i\leq L. The ECOC framework begins with an encoding step in which an encoding matrix, \mathbfcal{M}\in\{-1,0,+1\}^{12\times L}, dictates the class memberships for each f_{i}, with these values denoting negative, ignored and positive classes respectively. The element of \mathbfcal{M} corresponding to an arbitrary class \mathcal{Y}_{j} and binary classifier f_{i} is denoted by m_{j}^{i}. Depending on the adopted encoding strategy, the number of employed binary classifiers and their collective generalisation capability may vary. Herein, we use the one-against-one encoding technique, which explores all possible pairs of classes (L=66), as this was found to offer good generalisation capability without compromising computational efficiency [53].

Once all binary classifiers are fully trained, the ECOC model then relies on a decoding step to map the output of f_{i} to the corresponding class label. To accomplish this, a base of reference codewords is created to define the aggregate outputs from all classifiers for each class. The ECOC model eventually compares a given test codeword against each of the reference codewords to determine the class of the largest likelihood. Different decoding strategies have been proposed in the literature, with the most popular being (1) distance-based, (2) probabilistic and (3) pattern space transformation techniques [54]. In this work, the pairwise Hamming distance is used as the loss measure to estimate the most likely class label \prescript{*}{}{\hat{y}}_{j}(t_{k}) for \prescript{*}{}{\bm{x}}_{j}(t_{k}), i.e.

\prescript{*}{}{\hat{y}}_{j}(t_{k})=\operatorname*{\text{argmin}}_{\mathcal{Y}_{j}\in\mathbfcal{Y}}\ \frac{1}{2L}\ \sum_{i}\left[1-\text{sgn}\left(m_{j}^{i}\cdot f_{i}\left(\prescript{*}{}{\bm{x}}_{j}(t_{k})\right)\right)\right] (18)

where m_{j}^{i}\in\{+1,-1\}\ \forall i,j. In this context, m_{j}^{i} serves as the groundtruth binary label for f_{i}(\cdot) given \mathcal{Y}_{j}. A similar expression can be formulated for \prescript{*}{}{\hat{y}}_{j}(\tau_{k}) and \prescript{*}{}{\bm{x}}_{j}(\tau_{k}) too.
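The one-against-one encoding and the Hamming-distance decoding of Eq. 18 can be sketched as follows, with 12 classes giving L=66 binary problems. The `outputs` list in the usage example stands in for the signed decisions of hypothetical trained binary classifiers; the function names are our own.

```python
from itertools import combinations

def one_vs_one_matrix(n_classes):
    """One-against-one ECOC encoding matrix M (n_classes x L) with
    L = n_classes*(n_classes-1)/2 column pairs: +1/-1 mark the positive
    and negative class of each binary problem, 0 marks ignored classes."""
    pairs = list(combinations(range(n_classes), 2))
    M = [[0] * len(pairs) for _ in range(n_classes)]
    for i, (a, b) in enumerate(pairs):
        M[a][i] = +1
        M[b][i] = -1
    return M

def sign(v):
    return 1 if v > 0 else (-1 if v < 0 else 0)

def decode(M, outputs):
    """Eq. 18: pick the class whose codeword row minimises the pairwise
    Hamming-distance loss against the binary classifier outputs; zero
    (ignored-class) entries are skipped."""
    L = len(outputs)
    def loss(row):
        return sum(1 - sign(m * o) for m, o in zip(row, outputs) if m != 0) / (2 * L)
    return min(range(len(M)), key=lambda j: loss(M[j]))
```

For example, when every pairwise classifier involving the true class votes correctly, that class attains zero loss and is recovered exactly.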

In regard to the binary classifiers, we apply an ensemble of soft margin SVM algorithms to the training timeseries \bar{\bm{\Psi}}({\bm{\tau}}) obtained from the one-shot learning phase. The algorithm uses slack variables \xi_{k,j} to tolerate minimal misclassifications owing to outliers in the training dataset, and a scalar hyperparameter C to control the smoothness of the classifier's decision boundary. The standard optimisation problem for each f_{i} is outlined in Eq. 19, where only one positive and one negative class are selected according to the one-against-one encoding defined by \mathbfcal{M}. While searching for a solution hyperplane, parameterised by a weight vector \bm{W}\in\mathbb{R}^{1\times 2N} and a scalar bias b, a Gaussian kernel \bm{\varphi}_{\gamma}\colon\mathbb{R}^{1\times 16}\rightarrow\mathbb{R}^{1\times 2N} with spread hyperparameter \gamma is applied to the support vectors to enhance the separability of classes [55]. Finally, a Bayesian optimisation algorithm [56] is used to find the optimal values of the aforementioned hyperparameters:

\min_{{\bm{W}},b}\quad\frac{1}{2}\,{\lVert\bm{W}\rVert}^{2}+C\,\sum_{k}\sum_{j}{\xi_{k,j}} (19)
s.t. m_{j}^{i}\cdot\left({\bm{\varphi}}_{\gamma}\left(\bar{\bm{x}}_{j}(\tau_{k})\right)\cdot{\bm{W}}^{T}+b\right)\geqslant 1-\xi_{k,j}
\xi_{k,j}\geqslant 0
i\colon f_{i}\in\mathcal{F}
j\colon j\in\mathbb{Z},\ m_{j}^{i}\in\{+1,-1\}
k\colon k\in\mathbb{Z},\ 1\leq k\leq N

4 Experimental Setup

This section describes the experimental design and setup for implementing the posture learning framework reported in Section 3, for both the virtual and the human participant pipelines, from data collection through to performance evaluation and interpretation.

Refer to caption
Figure 8: Reconstruction of in silico sleep motion sequence in MATLAB©. Pentagram symbols are used to annotate the four extremity limb distal joints defining \bm{\chi}^{v} at each sleep posture.

4.1 Virtual Sleep Pipeline

As mentioned in Section 3.2, in silico sleep simulation is built around a motion sequence animated in Blender© through manually keyframing each sleeping pose as depicted in Fig. 2(a). The motion sequence keeps each pose for ten consecutive frames before making another ten-frame transition to the next pose, thus making the whole animation 230 frames long in total. The animation relies on linear interpolation to fill in the gaps between each two consecutive keyframes.

The motion sequence is then exported from Blender© in the BVH file format and imported into the MATLAB© (The MathWorks, Massachusetts, US) environment via a bespoke parser script. The parser creates a data structure to allow for the reconstruction of \bm{B} throughout the motion sequence as shown in Fig. 8. Another pose characterisation script then extracts \bm{\chi}^{v} at each keyframed sleep posture, forming the pose characterisation vector \bm{\Psi}^{v} to be used in the one-shot learning scheme explained in Section 4.3.

4.2 Participant Study Pipeline

An experimental setup was built at an outdoor university facility for ideal data collection conditions, avoiding measurement anomalies due to, for example, interference from the building environment with the magnetometer. Prior to the pilot experiment, all IMU sensors were calibrated as described in [57, 58] to estimate and reduce errors owing to constant bias, scale factors, cross-axis sensitivity and response nonlinearity. The protocol was approved by The University of Liverpool Research Ethics Committee (review reference: 9850).

The microcontroller chip built into each wearable sensor performs uniform sampling of both IMUs at a rate of 30 Hz. For optimal multi-sensor data transmission, dual-IMU data packets are simultaneously sent over Bluetooth from all four wearable sensor modules (clients) to the localhost server running a Python script. All data packets are timestamped using a monotonic digital clock with microsecond resolution. At the end of the data collection session and after sensor fusion, these timestamps are used to synchronise, via linear interpolation, the quad-sensor relative orientations under one unified time vector as illustrated in Fig. 1.
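The timestamp-based synchronisation can be sketched as a per-channel linear interpolation onto the unified time vector. The simplified scalar-stream version below, with function names of our own choosing, illustrates the idea; in practice each relative-orientation component of each sensor module would be resampled this way.

```python
def resample(timestamps, values, t_unified):
    """Linearly interpolate a timestamped scalar stream onto a unified
    time vector, as used to synchronise the four sensor modules."""
    out = []
    i = 0
    for t in t_unified:
        # advance to the bracketing interval [timestamps[i], timestamps[i+1]]
        while i + 1 < len(timestamps) - 1 and timestamps[i + 1] < t:
            i += 1
        t0, t1 = timestamps[i], timestamps[i + 1]
        v0, v1 = values[i], values[i + 1]
        a = (t - t0) / (t1 - t0)
        out.append(v0 + a * (v1 - v0))
    return out
```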

Figure 9: Visual illustration of the data collection protocol for the human participant experiment.

Both IMU sensors were placed on each extremity limb such that they are approximately aligned with the distal joint axes, so that segment-to-segment orientations can be measured accurately. As depicted in Fig. 5, the $y$-axis and $z$-axis of both IMUs were aligned as closely as possible with the flexion/extension and ulnar/radial deviation axes of the wrist joint when the hand is parallel to the forearm. Both IMUs were positioned as close as possible to the wrist joint to reduce artefacts from muscle contractions and skin movements, and to avoid interference with elbow rotation. Similar considerations were taken into account for the placement of the lower limb sensor modules.

A leaflet containing pictures of the sleep postures was given to the participant to assist them in replicating the desired poses before each sample was recorded. As portrayed in Fig. 9, each sleep posture is recorded twice: one recording in each of two trial sets. To ensure the postural data resemble those of a realistic sleep scenario, we collect statistically independent posture samples by randomly shuffling the pose order throughout each trial set. In addition, to account for the participant gaining familiarity with pose replication over the course of the experiment, we adopt a randomised train/test trial assignment strategy.

At the server back end, all received sensor data are immediately logged into a comma-separated values (CSV) file for subsequent import into MATLAB©. Therein, a sensor fusion script applies Madgwick filtering to the data channels of each IMU to estimate its orientation, given a unit quaternion as the initial orientation estimate and a learning rate $\beta=0.1$. All IMU orientations are then collectively fed into a pose characterisation script which extracts ${\bm{\chi}}^{w}$ from each train-labelled posture recording, and ${}^{*}{\bm{\Psi}}^{w}({\bm{t}})$ from the test-labelled trials. For each training posture trial, only one randomly selected sample from the quad-sensor relative orientation timeseries is used to identify ${\bm{\chi}}^{w}$ for that pose. The resultant ${\bm{\Psi}}^{w}$ is later utilised in the one-shot learning described in Section 4.3. When constructing the ${}^{*}{\bm{\Psi}}^{w}({\bm{t}})$ timeseries, the time vector ${\bm{t}}$ has length $O=\max_{j}(\Omega_{j})$, where $\Omega_{j}$ is the timeseries length of the $j^{\text{th}}$ test-labelled posture recording. Thus, ${}^{*}{\bm{\Psi}}^{w}({\bm{t}})$ can be mathematically written as follows:

$${}^{*}{\bm{\Psi}}^{w}({\bm{t}})=\left[\begin{array}{ccc}{}^{*}{\bm{x}}_{1}({\bm{t}}_{1:\Omega_{1}})&\cdots&{}^{*}{\bm{x}}_{12}({\bm{t}}_{1:\Omega_{12}})\\{\bm{\Phi}}(\bar{\bm{t}}_{1:\Omega_{1}})&\cdots&{\bm{\Phi}}(\bar{\bm{t}}_{1:\Omega_{12}})\end{array}\right] \qquad (20)$$

such that $\bar{\bm{t}}_{1:\Omega_{j}}$ is the relative complement of ${\bm{t}}_{1:\Omega_{j}}$ in ${\bm{t}}$:

$$\bar{\bm{t}}_{1:\Omega_{j}}={\bm{t}}\setminus{\bm{t}}_{1:\Omega_{j}}$$

and ${\bm{\Phi}}(\bar{\bm{t}}_{1:\Omega_{j}})$ denotes a sixteen-column matrix of Not a Number (NaN) elements to account for any $j^{\text{th}}$ test-labelled posture recording with $\Omega_{j}<O$.
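The NaN-padding scheme of Eq. 20 can be sketched as follows; this is a minimal illustration, assuming each test recording is an $(\Omega_{j} \times 16)$ array as described above:

```python
import numpy as np

def pad_recordings(recordings):
    """Stack variable-length posture recordings (each of shape (Omega_j, 16))
    into one array of shape (n_postures, O, 16), padding tails with NaN."""
    O = max(r.shape[0] for r in recordings)          # O = max_j(Omega_j)
    out = np.full((len(recordings), O, recordings[0].shape[1]), np.nan)
    for j, r in enumerate(recordings):
        out[j, :r.shape[0], :] = r
    return out

# Hypothetical test-labelled recordings with unequal lengths
rng = np.random.default_rng(1)
recs = [rng.standard_normal((n, 16)) for n in (50, 80, 65)]
padded = pad_recordings(recs)
print(padded.shape)                   # (3, 80, 16)
print(np.isnan(padded[0, 60]).all())  # True: frames beyond Omega_0 are NaN
```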

4.3 One-Shot Learning

In scenarios where only scarce data are available, data augmentation is necessary. As outlined in Section 3.6, we propose a one-shot learning method for modelling human sleep postures given a single observation per pose. Depending on whether the virtual or the human participant pipeline is considered, the usage of data augmentation varies slightly.

For the in silico sleep simulation, the motion sequence only provides ${\bm{\Psi}}^{v}$, meaning that separate training and testing timeseries are unavailable for ML. In this case, data augmentation is employed twice to generate training and testing timeseries, ${}^{+}{\bm{\Psi}}^{v}({\bm{\tau}})$ and ${}^{*}{\bm{\Psi}}^{v}({\bm{\tau}})$ respectively. Besides the single posture observation, newly augmented samples $(N=499)$ are appended to the ${}^{+}{\bm{\Psi}}^{v}({\bm{\tau}})$ timeseries, contributing to a total of $500$ training samples per sleep posture. Additional augmented samples $(N=125)$ are designated for ${}^{*}{\bm{\Psi}}^{v}({\bm{\tau}})$.

With regard to the human participant experiment, since ${}^{*}{\bm{\Psi}}^{w}({\bm{t}})$ is obtained from the test-labelled trial recordings, only one timeseries needs to be generated for classifier training, namely ${}^{+}{\bm{\Psi}}^{w}({\bm{\tau}})$. Hence, for each posture, augmented samples $(N=999)$ are appended to the single observation, contributing to a total of $1000$ training samples per sleep posture.

The timeseries augmentation step described in Section 3.6 for both the virtual and human participant experiments was carried out over the same range of hyperparameter settings, $(\sigma_{\phi}^{2},\sigma_{\theta}^{2})$. A grid ${\bm{\Re}}\in\mathbb{Z}^{6\times 6}$ of discrete points in the hyperparameter space was constructed, where $\sigma_{\phi}^{2}\in{\bm{\Re}}_{\phi}=\{20,200,400,600,800,1000\}$ and $\sigma_{\theta}^{2}\in{\bm{\Re}}_{\theta}=\{20,100,200,300,400,500\}$, yielding 36 different data augmentation settings. For each pair of hyperparameters, the respective training and testing timeseries datasets are used for the training and testing of the posture classification algorithm outlined in Section 3.7. For the soft margin SVM problem, the Bayesian optimisation algorithm iteratively searches ($60$ iterations) for the optimal values of the two hyperparameters $C$ and $\gamma$ over the range $\left[10^{-3},10^{3}\right]$.
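The augmentation method itself is defined in Section 3.6; purely as an illustration of Gaussian noise injection on axis-angle pose parameters, a sketch might look as follows. The additive perturbation model, the degree units of the variances, and the re-normalisation step are assumptions for this sketch, not the paper's exact formulation:

```python
import numpy as np

def augment_pose(axes, angles, var_axis, var_angle, n, rng):
    """Generate n augmented copies of one pose observation by jittering
    each joint's rotation axis and angle with zero-mean Gaussian noise.
    axes: (4, 3) unit rotation axes; angles: (4,) rotation angles in degrees.
    Variances are assumed to be in squared degrees (an assumption, not the
    paper's stated units); axes are re-normalised after perturbation."""
    # Perturb angles directly
    ang = angles + rng.normal(0.0, np.sqrt(var_angle), size=(n, 4))
    # Perturb axes with additive component noise, then re-normalise to
    # keep each perturbed axis on the unit sphere
    ax = axes + np.deg2rad(rng.normal(0.0, np.sqrt(var_axis), size=(n, 4, 3)))
    ax /= np.linalg.norm(ax, axis=-1, keepdims=True)
    return ax, ang

rng = np.random.default_rng(2)
axes = np.tile(np.array([0.0, 0.0, 1.0]), (4, 1))
angles = np.array([30.0, -15.0, 45.0, 10.0])
ax_aug, ang_aug = augment_pose(axes, angles, var_axis=800, var_angle=100,
                               n=999, rng=rng)
print(ax_aug.shape, ang_aug.shape)  # (999, 4, 3) (999, 4)
```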

4.4 Performance Evaluation

Two main metrics are used for the evaluation of the posture classification performance: the accuracy $(m_{acc})$ and the F1 score $(m_{F1})$. The accuracy refers to the ratio of correct classifications to the total number of testing samples, and is reliable when the testing dataset is evenly distributed, as is the case in the virtual sleep experiment. For class-imbalanced datasets, such as ${}^{*}{\bm{\Psi}}^{w}({\bm{t}})$, the F1 score offers a less biased assessment of model performance by taking the harmonic mean of precision and recall [59]. To mitigate any skewed dataset distribution, we employ the macro-averaged F1 score expressed in Eq. 21, which computes all class-specific scores independently and then takes their overall unweighted arithmetic mean.

$$m_{F1}=\frac{1}{12}\sum_{j=1}^{12}\left(2\times\frac{\text{recall}(j)\times\text{precision}(j)}{\text{recall}(j)+\text{precision}(j)}\right) \qquad (21)$$

such that

$$\text{recall}(j)=\frac{\text{TP}(j)}{\text{TP}(j)+\text{FN}(j)}\qquad\text{precision}(j)=\frac{\text{TP}(j)}{\text{TP}(j)+\text{FP}(j)}$$

and $\text{TP}(j)$, $\text{FP}(j)$ and $\text{FN}(j)$ correspond to the true positives, false positives and false negatives, respectively, of a given arbitrary $j^{\text{th}}$ posture class.
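The macro-averaged F1 score of Eq. 21 can be computed directly from predicted and true labels, as in this minimal sketch:

```python
import numpy as np

def macro_f1(y_true, y_pred, n_classes=12):
    """Macro-averaged F1: unweighted mean of per-class F1 scores (Eq. 21)."""
    scores = []
    for j in range(n_classes):
        tp = np.sum((y_pred == j) & (y_true == j))
        fp = np.sum((y_pred == j) & (y_true != j))
        fn = np.sum((y_pred != j) & (y_true == j))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores.append(f1)
    return float(np.mean(scores))

# Toy check: a perfect prediction gives a macro F1 of 1.0
y = np.repeat(np.arange(12), 5)
print(macro_f1(y, y.copy()))  # 1.0
```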

Additionally, all performance evaluation experiments are repeated ten times; the mean and standard deviation of both metrics indicate the effectiveness of the Bayesian optimisation algorithm in solving for the optimal SVM hyperparameters.

4.5 Performance Interpretation

The metrics described in Section 4.4 are used to monitor, measure and compare the performance of one or more models during the training and testing phases. To build confidence in deploying ML algorithms in real-world applications, additional interpretation methods are needed to allow human users (e.g. clinicians) to comprehend and trust the outputs and decisions made by these algorithms. Ideally, these methods should unravel the reasoning behind ML algorithms and be able to explain their cases of success and failure.

Therefore, we herein present two approaches to lend more explainability to the posture learning algorithm. These approaches are employed to explore any interesting data trends, and the findings are then used to interpret the model’s posture inference.

The first, visualisation-based approach utilises uniform manifold approximation and projection (UMAP) [60] to produce a two-dimensional (2D) force-directed graph $\mathbfcal{U}$ of high-dimensional datasets, such as ${}^{+}{\bm{\Psi}}({\bm{\tau}})$ and ${}^{*}{\bm{\Psi}}(\cdot)$. Dimensionality reduction has been successfully applied to visualise data in many domains, including human motion analysis [36], pedagogical research [61] and speaker recognition [62]. In this work, UMAP facilitates the visualisation of postural observations of the same posture (intra-class distribution) as well as across different postures (inter-class distribution). Such data analysis can provide insights on the research methods (e.g. assessing the added value of data augmentation) and on the problem as a whole (e.g. estimating human postural variability). The visualisation software was based on a MATLAB implementation of UMAP [63].

Although UMAP is known for its capability to preserve local and global data structure in $\mathbfcal{U}$, it offers no guarantee of faithfully reconstructing the actual cluster sizes and inter-cluster distances. Therefore, further interpretation tools, possibly metric-based, are required for a fine-resolution analysis.

Figure 10: Performance evaluation metrics for the virtual sleep experiment.

Localisation and pose estimation approaches often use quaternions for tracking the orientation of a target asset. Several works have reported the use of the angular offset $\Delta\theta_{a,b}$, expressed in Eq. 22, between any two quaternions ${\bm{q}}_{a}$ and ${\bm{q}}_{b}$ as a common metric to assess the (dis)similarity between orientations [64, 65, 66]:

$$\Delta\theta_{a,b}=2\arccos{\left[{\bm{q}}_{a,b}^{\epsilon}\right]_{w}} \qquad (22)$$

where ${\bm{q}}_{a,b}^{\epsilon}={\bm{q}}_{a}^{*}\otimes{\bm{q}}_{b}$ represents the residual orientation error between ${\bm{q}}_{a}$ and ${\bm{q}}_{b}$, and $\left[\,\cdot\,\right]_{w}$ is an operation extracting the scalar term of ${\bm{q}}_{a,b}^{\epsilon}$.
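A direct implementation of Eq. 22, assuming $[w, x, y, z]$ quaternion ordering (the absolute value in this sketch additionally folds the quaternion double cover so the smaller equivalent angle is returned):

```python
import numpy as np

def quat_conj(q):
    """Conjugate of a quaternion stored as [w, x, y, z]."""
    return np.array([q[0], -q[1], -q[2], -q[3]])

def quat_mul(a, b):
    """Hamilton product a ⊗ b, both stored as [w, x, y, z]."""
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return np.array([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2,
    ])

def angular_offset(qa, qb):
    """Delta-theta of Eq. 22: twice the arccos of the scalar part of the
    residual quaternion q_a^* ⊗ q_b."""
    q_err = quat_mul(quat_conj(qa), qb)
    return 2.0 * np.arccos(np.clip(abs(q_err[0]), -1.0, 1.0))

# 90-degree rotation about z versus the identity -> offset of pi/2
qa = np.array([1.0, 0.0, 0.0, 0.0])
qb = np.array([np.cos(np.pi/4), 0.0, 0.0, np.sin(np.pi/4)])
print(np.degrees(angular_offset(qa, qb)))  # 90.0
```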

Such an error metric can be used to fuse different orientation estimates into a single more robust estimate, or to compare different estimation techniques. Although useful for some applications, it completely overlooks the axis-of-rotation error in ${\bm{q}}_{a,b}^{\epsilon}$. For the purposes of this paper, it is essential to identify where any postural discrepancies/overlaps may emerge from: are they due to the angle, the axis, or both components of the extremity limb orientations? Such information can be used to interpret the model's perception, and potentially enable clinicians to make evidence-backed future changes to the sleep analysis problem itself, such as the insertion/removal of postures or altering the pose characterisation method.

Therefore, we propose a second performance interpretation approach based on a hybrid metric $\Lambda$ that evaluates the (dis)similarity between multiple posture observations, fusing the axes similarity $\Lambda_{\phi}$ with the angles similarity $\Lambda_{\theta}$. Given two arbitrary postural observations ${\bm{x}}_{a}$ and ${\bm{x}}_{b}$, $\Lambda$ is defined as

$$\Lambda=\Lambda_{\phi}+\Lambda_{\theta} \qquad (23)$$

where

$$\Lambda_{\phi}=\sum_{j=1}^{4}\left({\bm{x}}_{a,1:3}^{\mathbfcal{J}(j)}\cdot{\bm{x}}_{b,1:3}^{\mathbfcal{J}(j)}\right)\qquad\Lambda_{\theta}=\frac{4\pi-\sum_{j=1}^{4}\left\lvert{\bm{x}}_{a,4}^{\mathbfcal{J}(j)}-{\bm{x}}_{b,4}^{\mathbfcal{J}(j)}\right\rvert}{\pi}$$

The quadruple axes similarity is captured in $\Lambda_{\phi}$ using the vector dot product of ${\bm{x}}_{a,1:3}^{\mathbfcal{J}(j)}$ and ${\bm{x}}_{b,1:3}^{\mathbfcal{J}(j)}$, whereas $\Lambda_{\theta}$ computes a normalised similarity measure based on the total absolute angle error between ${\bm{x}}_{a,4}^{\mathbfcal{J}(j)}$ and ${\bm{x}}_{b,4}^{\mathbfcal{J}(j)}$. Each of $\Lambda_{\phi}$ and $\Lambda_{\theta}$ is defined $\forall j\in\left[1,4\right]$ and scored out of four (i.e. a full score indicates ${\bm{x}}_{a}={\bm{x}}_{b}$), contributing to a total similarity score out of eight for $\Lambda$.
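A minimal sketch of the hybrid metric, assuming each observation is stored as a $4\times4$ array (rows: the four joints; columns: unit rotation axis followed by the rotation angle in radians):

```python
import numpy as np

def hybrid_similarity(x_a, x_b):
    """Hybrid metric Lambda = Lambda_phi + Lambda_theta (Eq. 23) for two
    postural observations, each of shape (4, 4): columns 0:3 hold the unit
    rotation axis and column 3 the rotation angle in radians."""
    axes_a, axes_b = x_a[:, :3], x_b[:, :3]
    ang_a, ang_b = x_a[:, 3], x_b[:, 3]
    # Lambda_phi: sum of per-joint axis dot products (max 4)
    lam_phi = float(np.sum(np.einsum('ij,ij->i', axes_a, axes_b)))
    # Lambda_theta: normalised total absolute angle error (max 4)
    lam_theta = float((4 * np.pi - np.sum(np.abs(ang_a - ang_b))) / np.pi)
    return lam_phi + lam_theta

# Identical observations score the full 8.0 (4 from axes + 4 from angles)
x = np.hstack([np.tile([0.0, 0.0, 1.0], (4, 1)), np.full((4, 1), 0.5)])
print(hybrid_similarity(x, x))  # 8.0
```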

Figure 11: Performance evaluation metrics for the human participant pilot experiment.

5 Results and Discussion

This section presents the results obtained following the proposed experimental methods and protocols covered in Sections 3 and 4. The discussion first sheds light on the role data augmentation plays in both the in silico (Section 5.1) and the human participant posture analysis pipelines (Section 5.2). Thereafter, further performance interpretation uncovers qualitative and quantitative insights on sleep postures and the classification problem as a whole. A comparison of the results obtained with the proposed approach and the state-of-the-art available in the literature is reported afterwards in Section 5.3.

5.1 Virtual Sleep Experiment

The in silico sleep posture learning pipeline operates on augmented posture datasets in both the training and testing phases. The same data augmentation hyperparameters ($\sigma_{\phi}^{2}$ and $\sigma_{\theta}^{2}$) are shared by ${}^{+}{\bm{\Psi}}^{v}({\bm{\tau}})$ and ${}^{*}{\bm{\Psi}}^{v}({\bm{\tau}})$, so the posture classification model is tested on postural observations whose level of variability is known a priori. As presented in Section 4.3, this evaluation is conducted repeatedly at different levels of postural variability, as dictated by the hyperparameter settings $\sigma_{\phi}^{2}$ and $\sigma_{\theta}^{2}$ in ${\bm{\Re}}$. Therefore, the results obtained from the in silico pipeline inform on how sensitive the posture learning framework is to variations in postural observations.

Fig. 10 shows the sleep posture classification performance for each augmentation setting in ${\bm{\Re}}$. Since all posture classes are equally weighted in ${}^{+}{\bm{\Psi}}^{v}({\bm{\tau}})$ and ${}^{*}{\bm{\Psi}}^{v}({\bm{\tau}})$, we only show the mean and standard deviation of $m_{F1}$, as they are almost identical to those of $m_{acc}$. At the bottom left corner of ${\bm{\Re}}$, where the level of injected noise is lowest, $(\sigma_{\phi}^{2},\sigma_{\theta}^{2})=(20,20)$, a perfect $100\%$ $m_{F1}$ score with zero standard deviation is attained. At the opposite corner, where $(\sigma_{\phi}^{2},\sigma_{\theta}^{2})=(1000,500)$, exaggerated noise injection poses a greater challenge to the classification task; nevertheless, the mean $m_{F1}$ remains above $81\%$ with a small standard deviation.

The mean $m_{F1}$ heat map can be used to study the effect of each augmentation hyperparameter. When $\sigma_{\phi}^{2}$ is below $200$, the gradual increase of $\sigma_{\theta}^{2}$ barely affects the classification performance. On the other hand, the increase in $\sigma_{\phi}^{2}$ tends to have more influence over the performance when $\sigma_{\theta}^{2}$ is below $100$. Moving diagonally along ${\bm{\Re}}$ towards the top right corner, the influence of stepping up $\sigma_{\theta}^{2}$ overtakes that of $\sigma_{\phi}^{2}$. Overall, the results obtained through the virtual experiment suggest that the proposed sleep posture learning framework is robust to mild-to-extreme variations in postural observations.

5.2 Participant Pilot Experiment

The participant study pipeline utilises one-shot learning only during the training phase, then validates the resultant trained model on “unseen” real timeseries. Some level of discrepancy (mismatch) is therefore present between the augmented training dataset and the test-labelled posture recordings. The participant study thus complements the virtual experiment by exploring what data augmentation offers to sleep posture learning when an unknown discrepancy exists between ${}^{+}{\bm{\Psi}}^{w}({\bm{\tau}})$ and ${}^{*}{\bm{\Psi}}^{w}({\bm{t}})$.

Figure 12: UMAP visualisations of ${}^{+}{\bm{\Psi}}^{w}({\bm{\tau}})$ and ${}^{*}{\bm{\Psi}}^{w}({\bm{t}})$ at (a) mild noise injection, and (b) optimal axis-dominated augmentation. In each row of the figure, either the train- or test-labelled datapoints are coloured based on their posture labels, while the others are greyed out.

Fig. 11 shows the results obtained via the proposed sleep posture learning framework. Owing to the class imbalance in ${}^{*}{\bm{\Psi}}^{w}({\bm{t}})$, both the $m_{acc}$ and $m_{F1}$ scores are reported. It is evident that the data augmentation settings have a substantial influence over the classification performance, with $m_{F1}$ ranging roughly between $60\%$ and $90\%$. Given mild noise injection at $(\sigma_{\phi}^{2},\sigma_{\theta}^{2})=(20,20)$, the mean $m_{F1}$ score was found to be $72.2\%$.

Figure 13: Intra-posture similarity matrix. Black squares highlight high-similarity regions.

Examining the $m_{F1}$ heat map of the human participant study provides a good picture of how augmentation hyperparameter tuning influences the classification performance. In the virtual experiment, each classifier was trained and tested on augmented observations sharing the same postural variability. In the participant study, different data augmentation settings are used to pre-train multiple classifier models that are later tested on the same testing posture dataset, ${}^{*}{\bm{\Psi}}^{w}({\bm{t}})$, with unknown train-test pose discrepancy. Logically, a good data augmentation setting is then one that produces an augmented training posture dataset whose variability level is close to the actual train-test pose discrepancy. On this basis, recommendations for the optimal tuning of the data augmentation hyperparameters can be made.

To facilitate understanding, let us subdivide ${\bm{\Re}}$ in Fig. 11 into three subgrids, annotated 1, 2 and 3, to study the effect of different augmentation settings on the classification performance. Subgrid 1 shows that angle-dominant augmentation yields performance metrics similar to those of $(\sigma_{\phi}^{2},\sigma_{\theta}^{2})=(20,20)$. In subgrid 2, the performance undergoes a falling trend ($m_{F1}$ as low as $60\%$) in response to augmenting the axes and angles of rotation simultaneously. Finally, subgrid 3 showcases the further performance enhancement brought by axis-dominant augmentation, boosting $m_{F1}$ to about $90\%$. The reason why augmenting the axes is more useful may be related to the presence of environmental objects (e.g. mattress and pillow) which constrain joint rotations; hence, variation in sleep postures is mostly due to deviations in the joint axes of rotation. Consequently, subgrid 3 is recommended for data augmentation, specifically with $\sigma_{\phi}^{2}\geqslant 800$ and $\sigma_{\theta}^{2}=100$.

To further understand how axis-dominant augmentation can contribute up to a $30\%$ gain in performance compared to other augmentation settings, we use the UMAP-empowered data visualisation described in Section 4.5. Fig. 12 reports $\mathbfcal{U}$ for two different scenarios: mild noise injection at $(\sigma_{\phi}^{2},\sigma_{\theta}^{2})=(20,20)$ (Fig. 12(a)), and optimal axis-dominated augmentation at $(\sigma_{\phi}^{2},\sigma_{\theta}^{2})=(800,100)$ (Fig. 12(b)). At mild noise injection, Fig. 12(a) shows large discrepancies between training and testing observations across most sleep postures, as reflected by the sparse distribution of scattered clusters of observations. This clarifies why, in the absence of a sufficiently large dataset, it is hard to accomplish satisfactory posture classification performance. On the other hand, Fig. 12(b) showcases the effectiveness of the axis-dominant augmentation in bringing structure to the data distribution, as the training and testing observations of each posture are located in close proximity. The resultant classifier-friendly data distribution stands behind the significant rise in performance $(m_{acc}=92.7\%)$ and the robustness to postural discrepancies compared to the mild augmentation case. Additionally, it is noteworthy how overlaps emerged between certain postures, as in the annotated regions 4, 5 and 6.

Figure 14: Performance evaluation and interpretation given the optimal augmentation settings $(\sigma_{\phi}^{2},\sigma_{\theta}^{2})=(800,100)$: (a) row-normalised confusion matrix, and (b), (c) “one testing versus all training” analysis corresponding to $\mathbfcal{Y}_{5}$ and $\mathbfcal{Y}_{7}$.

Figure 15: Mean joint orientations of training and testing observations for $\mathbfcal{Y}_{5}$ and $\mathbfcal{Y}_{7}$.

The hybrid metric $\Lambda$ proposed in Section 4.5 can also be used to further understand the intra-posture similarities between: (i) $\mathbfcal{Y}_{1}\leftrightarrow\mathbfcal{Y}_{9}$, (ii) $\mathbfcal{Y}_{3}\leftrightarrow\mathbfcal{Y}_{12}$, and (iii) $\mathbfcal{Y}_{6}\leftrightarrow\mathbfcal{Y}_{10}$. Fig. 13 presents the mean $\Lambda$ with ${\bm{x}}_{a}$ and ${\bm{x}}_{b}$ exhausting all combinations of posture-specific (augmented) training and testing observations for the mild augmentation case $(\sigma_{\phi}^{2},\sigma_{\theta}^{2})=(20,20)$. Remarkably, the metric $\Lambda$ is capable of revealing postural similarities that UMAP did not capture in Fig. 12(a). Sifting the data for such correlations and trends is of great significance to researchers and clinicians in terms of rethinking human postural analysis, for instance to evaluate the efficacy of pose characterisation methods. Another possible usage of this map is the examination of posture definitions, confirming their parametric and subjective distinction from other postures before inclusion in the study.

Fig. 14(a) shows the confusion matrix given the optimal augmentation setting, $(\sigma_{\phi}^{2},\sigma_{\theta}^{2})=(800,100)$. The SVM-ECOC model achieves 100% classification accuracy on the testing observations of all postures except $\mathbfcal{Y}_{5}$, demonstrating satisfactory robustness to the postural overlaps outlined in Figs. 12(b) and 13. Such overlaps were viewed as a challenge in similar studies, although these works considered no more than eight postures of standard-to-moderate complexity; see for example [33]. To understand why the model confuses $\mathbfcal{Y}_{5}$ with $\mathbfcal{Y}_{7}$, we conduct a $\Lambda$-based similarity assessment specifically focused on the misclassification in Fig. 14(b). Since the model relies on augmented training data to handle unseen testing observations, we compare ${\bm{x}}_{a}={}^{*}\tilde{\bm{x}}_{5}({\bm{t}}_{1:\Omega_{5}})$ against all ${\bm{x}}_{b}={}^{+}\tilde{\bm{x}}_{j}({\bm{\tau}})\ \forall j\in\left[1,12\right]$, where $\tilde{(\cdot)}$ denotes the mean in the time domain. Fig. 14(b) reveals a small difference in $\Lambda$ between $\mathbfcal{Y}_{5}$ and $\mathbfcal{Y}_{7}$. Moreover, the $\Lambda$ scores of both $\mathbfcal{Y}_{5}$ and $\mathbfcal{Y}_{7}$ indicate a moderate similarity level of only around $6.5$ out of $8.0$, with no clear winner. Recalling region 7 from Fig. 12(b), the relatively large train-to-test distance for $\mathbfcal{Y}_{5}$ confirms the participant was (unintentionally) inconsistent in replicating that posture during data collection, which again explains the misclassification of ${}^{*}{\bm{x}}_{5}({\bm{t}})$. Further inspection into the root cause of such discrepancy is presented in Fig. 15, which shows that the participant's $\mathbfcal{J}(2)$ mean orientation differed considerably between the train- and test-labelled recordings of $\mathbfcal{Y}_{5}$. Such inexact posture recreation by participants is an occasional challenge inherent to similar works, as in [32].

For the sake of comparison, Fig. 14(c) shows the result of the same assessment as in Fig. 14(b), but with ${\bm{x}}_{a}={}^{*}\tilde{\bm{x}}_{7}({\bm{t}}_{1:\Omega_{7}})$. Fig. 14(c) reveals an uncertainty-free scenario where $\mathbfcal{Y}_{7}$ has a similarity metric of about $7.9$ out of $8.0$. Therefore, the $\Lambda$ metric can be regarded as a confidence measure associated with the output posture label, indicating how far one can trust the system at any instant of time.

Interestingly, Figs. 14(b) and 14(c) show that $\Lambda_{\phi}$ experiences more acute variations than $\Lambda_{\theta}$. This clarifies why axis-dominant augmentation improves the performance more than angle-dominant augmentation. This observation also reflects the nature of in-bed postural analysis: environmental constraints essentially inhibit the mobility of joints, causing variation to take place mostly in the axial component.

For reference, posture classification using only the real training data available was conducted to directly evaluate the simulation-to-real (Sim2Real) gap. In this case, the training of the classification model is not limited to only one observation per posture (one shot). Instead, the SVM-ECOC model leverages the whole length of the train-labelled posture recordings and utilises all observed segment-to-segment orientations for posture modelling. After 10 repeated train-test runs, the average classification accuracy of the model on ${}^{*}{\bm{\Psi}}^{w}({\bm{t}})$ was found to be around $48.1\%$, and the macro-averaged $m_{F1}$ was $40.1\%$, placing the Sim2Real gap at $40\%$ to $60\%$. This low accuracy reveals poor generalisation performance and justifies the need for a posture learning framework that better deals with data insufficiency, which is the aim of this paper. The proposed framework indeed provides a boost in $m_{F1}$ of more than 20% out of the box, before fine-tuning the data augmentation hyperparameters.

5.3 Comparison with state of the art

To the best of the authors' knowledge, the presented work constitutes the most advanced sleep posture analysis in the literature, with twelve complex postures representing non-standard postural variations common during sleep. Similar works covered no more than eight postures, many of which are minor variations of the four standard sleep postures. The sleep data collection protocol is another crucial aspect that sets the presented methodology apart from others reported in the literature. Specifically, randomised strategies for both pose shuffling and train/test trial assignment are adopted to ensure statistical independence and to account for the participant gaining familiarity with the experiment over time. Moreover, the proposed one-shot posture learning framework makes the use of wearable technologies far easier and more viable, as it removes the need for expensive training data collection sessions. Lastly, the exclusive use of inertial sensor fusion yields approximate segment orientations instead of the rudimentary raw-data approach often adopted in the literature. Therefore, our approach provides a posture representation more comprehensible to non-technical experts such as clinicians. To better highlight the benefits of the framework proposed in this paper, a comparison with existing works is summarised in Table 1.

Some state-of-the-art studies [32, 33] reported the possible occurrence of intra-posture similarity and inexact posture recreation, and regarded these as limitations without a clear attempt to formally verify or quantify them. A distinctive highlight of this work is the proposed interpretation approaches, which provide qualitative and quantitative insights into the nature of sleep postures, their augmentation, and the classification problem as a whole. Our investigation shows that appropriate augmentation settings can make the classification robust to intra-posture similarity.

6 Conclusions

A novel human sleep posture learning framework is proposed, capable of classifying twelve complex sleep postures. This goes beyond related works that mostly consider only the four standard postures (supine, prone and the two lateral positions). The framework was first developed and tested through in silico sleep simulation, then successfully validated in a pilot human participant study. In both experimental pipelines, aggregate segment-to-segment orientations from four distal joints (wrists and ankles) were used to characterise the body posture. This simplified representation was the basis for the sleep posture learning task assigned to an ensemble classifier model. Computer graphics software and custom-made wearable sensor modules with inertial sensing capability were used, respectively, in the virtual and participant pipelines. A major highlight of this work is the use of inertial sensor fusion to gauge segment orientations instead of the raw sensor readings heavily used in the literature. Therefore, our posture representation is more comprehensible to non-technical end users, such as clinicians. Another prominent contribution of this work is the augmentation of postural observations, which accelerated posture modelling with increased robustness given only one observation (shot) per posture, omitting the need for longitudinal data collection. The proposed one-shot learning scheme was found to boost the posture classification performance by up to $50\%$ with respect to learning from scarce postural observations. Despite insufficient training data and a diversified posture selection, we report performance comparable to state-of-the-art works. Lastly, we outlined a new metric-based approach and used it along with data visualisation to extract quantitative and qualitative insights on postural analysis, the added value of data augmentation, and the interpretation of the classification performance.
The results carry evidence-backed findings that could potentially inform policies and recommendations for the use of wearable sensors in sleep medicine.
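As an illustration of the segment-to-segment orientation representation summarised above, the relative orientation across a joint can be derived from the two adjoining segments' absolute orientation quaternions, as produced by a fusion filter such as Madgwick's [40]. The sketch below is not the authors' implementation; it is a minimal example assuming the Hamilton convention with quaternions stored as [w, x, y, z], and the function and variable names are illustrative only.

```python
import numpy as np

def quat_conj(q):
    """Conjugate (inverse, for unit quaternions) of [w, x, y, z]."""
    w, x, y, z = q
    return np.array([w, -x, -y, -z])

def quat_mult(q1, q2):
    """Hamilton product of two quaternions in [w, x, y, z] order."""
    w1, x1, y1, z1 = q1
    w2, x2, y2, z2 = q2
    return np.array([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2,
    ])

def relative_orientation(q_proximal, q_distal):
    """Orientation of the distal segment expressed in the proximal
    segment's frame: q_rel = q_proximal^{-1} (x) q_distal."""
    return quat_mult(quat_conj(q_proximal), q_distal)

# Hypothetical example: forearm at identity, hand rotated 90 deg about x.
q_forearm = np.array([1.0, 0.0, 0.0, 0.0])
q_hand = np.array([np.cos(np.pi/4), np.sin(np.pi/4), 0.0, 0.0])
q_rel = relative_orientation(q_forearm, q_hand)  # joint angle across the wrist
```

Because the relative quaternion cancels the shared global heading of the two segments, such a representation depends only on the joint configuration, which is one reason it reads more naturally to clinicians than raw accelerometer or gyroscope traces.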

A number of directions may guide future research. Since inexact posture recreation (i.e. human non-compliance) appears to be a persistent open challenge, it may be useful to avoid discretising the human posture space into classes and to resort to partial- or full-body posture estimation instead. It would also be interesting to examine the system performance when further increasing the number and complexity of sleep postures, and to check whether the robustness to intra-posture similarity continues to hold. A further direction could investigate whether incorporating additional absolute segment orientations at the pose characterisation stage would strengthen the differences between postures and reduce their overlap.

Acknowledgements

The authors would like to thank Daniel Potts for his assistance with the development of the wearable sensors.

Funding

Omar Elnaggar (first author) is supported by the University of Liverpool Doctoral Network in AI for Future Digital Health.

Declaration of Competing Interest

All authors declare that they have no conflict of interest.

References

  • [1] A. Cieza, K. Causey, K. Kamenov, S. W. Hanson, S. Chatterji, T. Vos, Global estimates of the need for rehabilitation based on the global burden of disease study 2019: a systematic analysis for the global burden of disease study 2019, The Lancet 396 (2020) 2006–2017. doi:10.1016/S0140-6736(20)32340-0.
  • [2] Department of Health, The musculoskeletal services framework – a joint responsibility: doing it differently (2006).
  • [3] P. M. Clark, B. M. Ellis, A public health approach to musculoskeletal health (2014). doi:10.1016/j.berh.2014.10.002.
  • [4] J. Hartvigsen, M. J. Hancock, A. Kongsted, Q. Louw, M. L. Ferreira, S. Genevay, D. Hoy, J. Karppinen, G. Pransky, J. Sieper, R. J. Smeets, M. Underwood, R. Buchbinder, D. Cherkin, N. E. Foster, C. G. Maher, M. van Tulder, J. R. Anema, R. Chou, S. P. Cohen, L. M. Costa, P. Croft, P. H. Ferreira, J. M. Fritz, D. P. Gross, B. W. Koes, B. Öberg, W. C. Peul, M. Schoene, J. A. Turner, A. Woolf, What low back pain is and why we need to pay attention, The Lancet 391 (2018) 2356–2367. doi:10.1016/S0140-6736(18)30480-X.
  • [5] M. Abdel-Basset, W. Ding, L. Abdel-Fatah, The fusion of internet of intelligent things (ioit) in remote diagnosis of obstructive sleep apnea: A survey and a new model, Information Fusion 61 (2020) 84–100. doi:10.1016/j.inffus.2020.03.010.
  • [6] L. Paquay, R. Wouters, T. Defloor, F. Buntinx, R. Debaillie, L. Geys, Adherence to pressure ulcer prevention guidelines in home care: A survey of current practice, Journal of Clinical Nursing 17 (2008) 627–636. doi:10.1111/j.1365-2702.2007.02109.x.
  • [7] V. Ibáñez, J. Silva, O. Cauli, A survey on sleep questionnaires and diaries, Sleep Medicine 42 (2018) 90–96. doi:10.1016/j.sleep.2017.08.026.
  • [8] G. D. Pinna, E. Robbi, M. T. L. Rovere, A. E. Taurino, C. Bruschi, G. Guazzotti, R. Maestri, Differential impact of body position on the severity of disordered breathing in heart failure patients with obstructive vs. central sleep apnoea, European Journal of Heart Failure 17 (2015) 1302–1309. doi:10.1002/ejhf.410.
  • [9] W. H. Akeson, D. Amiel, M. F. Abel, S. R. Garfin, S. L. Woo, Effects of immobilization on joints, Clinical Orthopaedics and Related Research 219 (1987) 28–37. doi:10.1097/00003086-198706000-00006.
  • [10] L. Parisi, F. Pierelli, G. Amabile, G. Valente, E. Calandriello, F. Fattapposta, P. Rossi, M. Serrao, Muscular cramps: Proposals for a new classification, Acta Neurologica Scandinavica 107 (2003) 176–186. doi:10.1034/j.1600-0404.2003.01289.x.
  • [11] S. Akbarian, G. Delfi, K. Zhu, A. Yadollahi, B. Taati, Automated non-contact detection of head and body positions during sleep, IEEE Access 7 (2019) 72826–72834. doi:10.1109/ACCESS.2019.2920025.
  • [12] Y. Y. Li, Y. J. Lei, L. C. L. Chen, Y. P. Hung, Sleep posture classification with multi-stream cnn using vertical distance map, International Workshop on Advanced Image Technology (IWAIT) (2018) 1–4. doi:10.1109/IWAIT.2018.8369761.
  • [13] S. M. Mohammadi, M. Alnowami, S. Khan, D. J. Dijk, A. Hilton, K. Wells, Sleep posture classification using a convolutional neural network, 40th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC) (2018) 1–4. doi:10.1109/EMBC.2018.8513009.
  • [14] T. H. Kim, S. J. Kwon, H. M. Choi, Y. S. Hong, Determination of lying posture through recognition of multitier body parts, Wireless Communications and Mobile Computing 2019 (2019) 1–16. doi:10.1155/2019/9568584.
  • [15] M. Alaziz, Z. Jia, R. Howard, X. Lin, Y. Zhang, In-bed body motion detection and classification system, ACM Transactions on Sensor Networks 16 (2020) 13:1–13:26. doi:10.1145/3372023.
  • [16] U. Qureshi, F. Golnaraghi, An algorithm for the in-field calibration of a mems imu, IEEE Sensors Journal 17 (2017) 7479–7486. doi:10.1109/JSEN.2017.2751572.
  • [17] B. Fan, Q. Li, T. Tan, P. Kang, P. B. Shull, Effects of imu sensor-to-segment misalignment and orientation error on 3-d knee joint angle estimation, IEEE Sensors Journal 22 (2022) 2543–2552. doi:10.1109/JSEN.2021.3137305.
  • [18] A. Leardini, A. Chiari, U. D. Croce, A. Cappozzo, Human movement analysis using stereophotogrammetry part 3. soft tissue artifact assessment and compensation, Gait and Posture 21 (2005) 212–225. doi:10.1016/j.gaitpost.2004.05.002.
  • [19] B. V. Vaughn, P. Giallanza, Technical review of polysomnography, Chest 134 (2008) 1310–1319. doi:10.1378/chest.08-0812.
  • [20] L. C. Markun, A. Sampat, Clinician-focused overview and developments in polysomnography, Current Sleep Medicine Reports 6 (2020) 309–321. doi:10.1007/s40675-020-00197-5.
  • [21] H. R. Colten, B. M. Altevogt, Sleep disorders and sleep deprivation: An unmet public health problem, 2006. doi:10.17226/11617.
  • [22] A. Tiotiu, O. Mairesse, G. Hoffmann, D. Todea, A. Noseda, Body position and breathing abnormalities during sleep: A systematic study, Pneumologia 60 (2011) 216–221.
    URL https://europepmc.org/article/med/22420172
  • [23] J. Verbraecken, Applications of evolving technologies in sleep medicine, Breathe 9 (2013) 442–455. doi:10.1183/20734735.012213.
  • [24] J. Razjouyan, H. Lee, S. Parthasarathy, J. Mohler, A. Sharafkhaneh, B. Najafi, Information from postural/sleep position changes and body acceleration: A comparison of chest-worn sensors, wrist actigraphy, and polysomnography, Journal of Clinical Sleep Medicine 13 (2017) 1301–1310. doi:10.5664/jcsm.6802.
  • [25] I. H. Lopez-Nava, M. M. Angelica, Wearable inertial sensors for human motion analysis: A review, IEEE Sensors Journal 16 (2016) 7821–7834. doi:10.1109/JSEN.2016.2609392.
  • [26] Z. Zhang, G. Z. Yang, Monitoring cardio-respiratory and posture movements during sleep: What can be achieved by a single motion sensor, IEEE 12th International Conference on Wearable and Implantable Body Sensor Networks (BSN) (2015) 1–6. doi:10.1109/BSN.2015.7299409.
  • [27] X. Sun, L. Qiu, Y. Wu, Y. Tang, G. Cao, Sleepmonitor: Monitoring respiratory rate and body position during sleep using smartwatch, Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 1 (2017) 1–22. doi:10.1145/3130969.
  • [28] O. S. Eyobu, Y. W. Kim, D. Cha, D. S. Han, A real-time sleeping position recognition system using imu sensor motion data, IEEE International Conference on Consumer Electronics (ICCE) 2018-Janua (2018) 1–2. doi:10.1109/ICCE.2018.8326209.
  • [29] P. Alinia, A. Samadani, M. Milosevic, H. Ghasemzadeh, S. Parvaneh, Pervasive lying posture tracking, Sensors (Switzerland) 20 (2020) 1–22. doi:10.3390/s20205953.
  • [30] S. Jeon, T. Park, A. Paul, Y. S. Lee, S. H. Son, A wearable sleep position tracking system based on dynamic state transition framework, IEEE Access 7 (2019) 135742–135756. doi:10.1109/ACCESS.2019.2942608.
  • [31] E. B. Monroy, A. P. Rodríguez, M. E. Estevez, J. M. Quero, Fuzzy monitoring of in-bed postural changes for the prevention of pressure ulcers using inertial sensors attached to clothing, Journal of Biomedical Informatics 107 (2020) 1–12. doi:10.1016/j.jbi.2020.103476.
  • [32] R. M. Kwasnicki, G. W. Cross, L. Geoghegan, Z. Zhang, P. Reilly, A. Darzi, G. Z. Yang, R. Emery, A lightweight sensing platform for monitoring sleep quality and posture: A simulated validation study, European Journal of Medical Research 23 (2018) 1–9. doi:10.1186/s40001-018-0326-9.
  • [33] S. Fallmann, R. V. Veen, L. Chen, D. Walker, F. Chen, C. Pan, Wearable accelerometer based extended sleep position recognition, IEEE 19th International Conference on e-Health Networking, Applications and Services (Healthcom) (2017) 1–6. doi:10.1109/HealthCom.2017.8210806.
  • [34] O. Elnaggar, F. Coenen, P. Paoletti, In-bed human pose classification using sparse inertial signals, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 12498 LNAI (2020) 331–344. doi:10.1007/978-3-030-63799-6_25.
  • [35] L. Chang, J. Lu, J. Wang, X. Chen, D. Fang, Z. Tang, P. Nurmi, Z. Wang, Sleepguard: Capturing rich sleep information using smartwatch sensing data, Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 2 (2018) 1–34. doi:10.1145/3264908.
  • [36] M. Airaksinen, O. Räsänen, E. Ilén, T. Häyrinen, A. Kivi, V. Marchi, A. Gallen, S. Blom, A. Varhe, N. Kaartinen, L. Haataja, S. Vanhatalo, Automatic posture and movement tracking of infants with wearable movement sensors, Scientific Reports 10 (2020) 1–12. doi:10.1038/s41598-019-56862-5.
  • [37] H. Ohashi, M. Al-Naser, S. Ahmed, K. Nakamura, T. Sato, A. Dengel, Attributes’ importance for zero-shot pose-classification based on wearable sensors, Sensors 18 (2018) 1–17. doi:10.3390/s18082485.
  • [38] A. Mobasheri, M. Batt, An update on the pathophysiology of osteoarthritis, Annals of Physical and Rehabilitation Medicine 59 (2016) 333–339. doi:10.1016/j.rehab.2016.07.004.
  • [39] S. J. McCabe, A. L. Uebele, V. Pihur, R. S. Rosales, I. Atroshi, Epidemiologic associations of carpal tunnel syndrome and sleep position: Is there a case for causation?, Hand 2 (2007) 127–134. doi:10.1007/s11552-007-9035-5.
  • [40] S. O. Madgwick, A. J. Harrison, R. Vaidyanathan, Estimation of imu and marg orientation using a gradient descent algorithm, Proceedings of the IEEE International Conference on Rehabilitation Robotics (2011) 179–185. doi:10.1109/ICORR.2011.5975346.
  • [41] X. Xiao, S. Zarar, Machine learning for placement-insensitive inertial motion capture, IEEE International Conference on Robotics and Automation (ICRA) (2018) 6716–6721. doi:10.1109/ICRA.2018.8463176.
  • [42] M. Nazarahari, H. Rouhani, Semi-automatic sensor-to-body calibration of inertial sensors on lower limb using gait recording, IEEE Sensors Journal 19 (2019) 12465–12474. doi:10.1109/JSEN.2019.2939981.
  • [43] T. Zimmermann, B. Taetz, G. Bleser, Imu-to-segment assignment and orientation alignment for the lower body using deep learning, Sensors 18 (2018) 1–35. doi:10.3390/s18010302.
  • [44] M. Olson, A. J. Wyner, R. Berk, Modern neural networks generalize on small data sets, Advances in Neural Information Processing Systems (2018) 3619–3628.
    URL https://papers.nips.cc/paper/2018/hash/fface8385abbf94b4593a0ed53a0c70f-Abstract.html
  • [45] R. Blagus, L. Lusa, Smote for high-dimensional class-imbalanced data, BMC Bioinformatics 14 (2013) 1–16. doi:10.1186/1471-2105-14-106.
  • [46] C. Shorten, T. M. Khoshgoftaar, A survey on image data augmentation for deep learning, Journal of Big Data 6 (2019) 1–48. doi:10.1186/s40537-019-0197-0.
  • [47] Q. Wen, L. Sun, F. Yang, X. Song, J. Gao, X. Wang, H. Xu, Time series data augmentation for deep learning: A survey, Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence (2021) 4653–4660. doi:10.24963/ijcai.2021/631.
  • [48] B. K. Iwana, S. Uchida, An empirical survey of data augmentation for time series classification with neural networks, PLoS ONE 16 (2021). doi:10.1371/journal.pone.0254841.
  • [49] K. M. Rashid, J. Louis, Window-warping: A time series data augmentation of imu data for construction equipment activity identification, Proceedings of the 36th International Symposium on Automation and Robotics in Construction (ISARC) (2019) 651–657. doi:10.22260/isarc2019/0087.
  • [50] M. Arslan, M. Guzel, M. Demirci, S. Ozdemir, Smote and gaussian noise based sensor data augmentation, Proceedings of the 4th International Conference on Computer Science and Engineering (UBMK) (2019) 458–462. doi:10.1109/UBMK.2019.8907003.
  • [51] T. G. Dietterich, G. Bakiri, Solving multiclass learning problems via error-correcting output codes, Journal of Artificial Intelligence Research 2 (1995) 263–286. doi:10.1613/jair.105.
  • [52] E. L. Allwein, R. E. Schapire, Y. Singer, Reducing multiclass to binary: A unifying approach for margin classifiers, Journal of Machine Learning Research 1 (2001) 113–141. doi:10.1162/15324430152733133.
  • [53] C. W. Hsu, C. J. Lin, A comparison of methods for multiclass support vector machines, IEEE Transactions on Neural Networks 13 (2002) 415–425. doi:10.1109/72.991427.
  • [54] S. Escalera, O. Pujol, P. Radeva, On the decoding process in ternary error-correcting output codes, IEEE Transactions on Pattern Analysis and Machine Intelligence 32 (2010) 120–134. doi:10.1109/TPAMI.2008.266.
  • [55] S. Abe, Chapter 2: Two-class support vector machines (2010). doi:10.1007/978-1-84996-098-4_2.
  • [56] B. Shahriari, K. Swersky, Z. Wang, R. P. Adams, N. D. Freitas, Taking the human out of the loop: A review of bayesian optimization, Proceedings of the IEEE 104 (2016) 148–175. doi:10.1109/JPROC.2015.2494218.
  • [57] O. J. Woodman, An introduction to inertial navigation (report no. ucam-cl-tr-696) (2007).
    URL https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-696.pdf
  • [58] M. Kok, J. D. Hol, T. B. Schön, Using inertial sensors for position and orientation estimation, Foundations and Trends in Signal Processing 11 (2017). doi:10.1561/2000000094.
  • [59] H. He, Y. Ma, Imbalanced learning: Foundations, algorithms, and applications, 2013. doi:10.1002/9781118646106.
  • [60] L. McInnes, J. Healy, J. Melville, Umap: Uniform manifold approximation and projection for dimension reduction, arXiv preprint (2018) 1–63. doi:10.48550/arXiv.1802.03426.
  • [61] O. Elnaggar, R. Arelhi, Quantification of knowledge exchange within classrooms: An ai-based approach, The European Conference on Education 2021: Official Conference Proceedings (2021) 1–11. doi:10.22492/issn.2188-1162.2021.17.
  • [62] O. Elnaggar, R. Arelhi, A new unsupervised short-utterance based speaker identification approach with parametric t-sne dimensionality reduction, International Conference on Artificial Intelligence in Information and Communication (ICAIIC) (2019) 1–10. doi:10.1109/ICAIIC.2019.8669051.
  • [63] C. Meehan, J. Ebrahimian, W. Moore, S. Meehan, Uniform manifold approximation and projection (umap), MATLAB Central File Exchange.
    URL https://www.mathworks.com/matlabcentral/fileexchange/71902
  • [64] B. Taetz, G. Bleser, M. Miezal, Towards self-calibrating inertial body motion capture, Proceedings of the 19th International Conference on Information Fusion (FUSION) (2016) 1751–1759. doi:10.48550/arXiv.1606.03754.
  • [65] J. Solà, Quaternion kinematics for the error-state kalman filter, arXiv preprint (2017) 1–95. doi:10.48550/arXiv.1711.02508.
  • [66] E. Kraft, A quaternion-based unscented kalman filter for orientation tracking, Proceedings of the 6th International Conference on Information Fusion (FUSION) 1 (2003) 47–54. doi:10.1109/ICIF.2003.177425.
Table 1: Comparison with existing works in the literature.

| Reference | Sleep Postures | Classification Algorithm | Sensor Placement | Dataset Duration | Data Augmentation | Sensor Fusion | m_acc | m_F1 | Additional Information |
|---|---|---|---|---|---|---|---|---|---|
| Zhang et al. [26] | 4 (standard) | LDA | 1 IMU (chest) | - | No | No | 99% | - | Heart rate; respiratory rate |
| Sun et al. [27] | 4 (standard) | naïve Bayes, Bayesian network, DT, Random Forest | 1 IMU (left wrist) | 70 nights | No | No | (60.3-91.8)% | - | Respiratory rate |
| Eyobu et al. [28] | 4 (standard) | LSTM | 1 IMU (upper arm) | - | No | No | 99% | - | - |
| Alinia et al. [29] | 4 (standard) | Adaptive LSTM | 1 IMU (9 locations) | >56 min. | No | No | (64.9-98.4)% | (62.9-98.2)% | - |
| Alinia et al. [29] | 4 (standard) | Ensemble tree classifier | 1 IMU (9 locations) | >56 min. | No | No | (62.9-94.4)% | (60.9-93.6)% | - |
| Jeon et al. [30] | 4 (standard) | Dynamic state transition framework | 3 IMUs (chest and wrists) | - | No | No | 94% | 79% | In-bed motion recognition |
| Monroy et al. [31] | 4 major; 2 minor (standard) | k-NN, SVM | 3 IMUs (chest and ankles) | ~60 min. | No | No | - | 100% | Alert need for postural change |
| Monroy et al. [31] | 4 major; 2 minor (standard) | DT | 3 IMUs (chest and ankles) | ~60 min. | No | No | - | 51% | Alert need for postural change |
| Kwasnicki et al. [32] | 4 major; 4 minor (moderate) | LDA, k-NN, naïve Bayes, DT | 3 IMUs (chest and wrists) | ~160 sec. | No | No | 92.5% | - | Sleep phase prediction |
| Fallmann et al. [33] | 4 major; 4 minor (moderate) | GMLVQ | 3 IMUs (chest and ankles) | >3.75 hr. | No | No | (78-99.8)% | - | Evaluation at different settings |
| Fallmann et al. [33] | 4 major; 2 minor (standard) | GMLVQ | 3 IMUs (chest and ankles) | ~7 hr. | No | No | (58-98)% | - | Evaluation at different settings |
| Chang et al. [35] | 4 (standard) | k-NN | 1 IMU (left wrist) | ~6 hr. | No | No | >90% | - | Nocturnal behavioural analysis |
| Our approach (virtual sleep) | 12 (non-standard, complex) | SVM-ECOC | N/A | N/A | Yes | Yes | (81-100)% | (81-100)% | - |
| Our approach (participant study w/o one-shot learning) | 12 (non-standard, complex) | SVM-ECOC | 4 dual-IMU modules (wrists and ankles) | ~24 min. | No | Yes | 48.4% | 40.1% | - |
| Our approach (participant study with one-shot learning) | 12 (non-standard, complex) | SVM-ECOC | 4 dual-IMU modules (wrists and ankles) | 1/30 sec. (one shot per pose for training); total dataset ~24 min. | Yes | Yes | (73.9-92.7)% | (72.2-89.7)% | Postural similarity; visualisation of posture datasets |