(Supplementary) Gimme Signals: Discriminative signal encoding for multimodal activity recognition

Raphael Memmesheimer

Content

• Figure 1 gives an additional overview of the proposed approach.
• Table 3 shows results on the NTU-60 subset of the NTU-120 dataset.
• Figure 2 shows a confusion matrix of our results on the UTD-MHAD dataset (Skeleton + AIS).
• Figure 3 shows a confusion matrix of our results on the ARIL dataset (AIS).
• Figure 4 shows a confusion matrix of our results on the NTU 60 dataset (AIS).
• Results on the NTU-120 Cross View split and the Simitate dataset (MoCap) are not attached, to avoid conflicting with the supplementary material submission guidelines.

Figure 1: Approach overview.

Approach	Accuracy	Top-5 Accuracy
Ours (Raw, ResNet152)	0.943470	0.988304
Ours (AIS, ResNet152)	0.961014	0.992203
Ours (Raw, ResNet50)	0.925926	0.984405
Ours (AIS, ResNet50)	0.959064	0.994152
Ours (Raw, ResNet18)	0.925926	0.984405
Ours (AIS, ResNet18)	0.966862	0.996101

Table 1: Results on Simitate
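
For readers unfamiliar with the two metrics in Table 1, the following is a minimal sketch of how accuracy and top-k accuracy are commonly computed from per-class scores. It is not the evaluation code used for the reported results. The function name `top_k_accuracy` and the toy scores are hypothetical, and the result is a fraction in [0, 1], which matches how the values in Table 1 appear to be reported.

```python
from typing import Sequence


def top_k_accuracy(
    scores: Sequence[Sequence[float]], labels: Sequence[int], k: int = 1
) -> float:
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("at least one sample is required")
    hits = 0
    for row, label in zip(scores, labels):
        # Indices of the k classes with the highest score for this sample.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        hits += label in ranked
    return hits / len(labels)


# Toy example with made-up scores: 4 samples, 3 classes.
scores = [
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
    [0.5, 0.4, 0.1],
    [0.2, 0.3, 0.5],
]
labels = [0, 2, 1, 0]

top1 = top_k_accuracy(scores, labels, k=1)  # plain accuracy
top2 = top_k_accuracy(scores, labels, k=2)
```

With `k=5`, the same function would give a Top-5 Accuracy of the kind listed in the table, provided the classifier outputs at least five classes.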

Figure 2: Confusion Matrix for UTD-MHAD (Skeleton, AIS)

Figure 3: Confusion Matrix for ARIL (AIS)
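
As a reading aid for the confusion matrices, the sketch below shows one common way such a matrix is built from true and predicted class indices. It assumes that rows index the true class and columns the predicted class, and that rows are normalized to sum to one. The source does not state either convention, so both are assumptions, and the helper names and toy labels are hypothetical.

```python
from typing import List, Sequence


def confusion_matrix(
    true_labels: Sequence[int], predicted_labels: Sequence[int], num_classes: int
) -> List[List[int]]:
    """Count matrix with rows = true class, columns = predicted class (assumed convention)."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(true_labels, predicted_labels):
        matrix[t][p] += 1
    return matrix


def normalize_rows(matrix: List[List[int]]) -> List[List[float]]:
    """Divide each row by its sum so that the diagonal holds per-class recall."""
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([count / total if total else 0.0 for count in row])
    return normalized


# Toy example with made-up labels: 6 samples, 3 classes.
true_labels = [0, 0, 1, 1, 2, 2]
predicted_labels = [0, 1, 1, 1, 2, 0]

cm = confusion_matrix(true_labels, predicted_labels, num_classes=3)
cm_normalized = normalize_rows(cm)
```

Under these assumptions, an off-diagonal entry in row i and column j is the share of class-i samples that were predicted as class j.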

Approach	Cross Subject
Liu et al. [liu2018recognizing]	91.7
Liu et al. [liu2017enhanced]	80.03
Caetano et al. [caetano2019skelemotion]	76.5
Kim et al. [kim2017interpretable]	74.3
Ours (AIS)	72.33
Shahroudy et al. 2 Layer P-LSTM [shahroudy2016ntu]	62.93
Shahroudy et al. 1 Layer P-LSTM [shahroudy2016ntu]	62.05
Shahroudy et al. 2 Layer LSTM [shahroudy2016ntu]	60.69
Shahroudy et al. 1 Layer LSTM [shahroudy2016ntu]	59.14
Shahroudy et al. 2 Layer RNN [shahroudy2016ntu]	56.29
Shahroudy et al. 1 Layer RNN [shahroudy2016ntu]	56.02

Table 2: Approach comparison on NTU RGB+D 60. Accuracies are given in %.

Approach	Cross Subject	Cross View
Shahroudy et al. [shahroudy2016ntu]	25.5	26.3
Hu et al. [hu2018early]	36.3	44.9
Hu et al. [hu2015jointly]	50.8	54.7
Liu et al. [liu2016spatio]	55.7	57.9
Liu et al. [liu2017skeleton1]	58.2	60.9
Liu et al. [liu2017global]	58.3	59.2
Ke et al. [ke2017new]	58.4	57.9
Liu et al. [liu2019skeleton]	59.9	62.4
Liu et al. [liu2017enhanced]	60.3	63.2
Liu et al. [liu2017skeleton]	61.2	63.3
Ke et al. [ke2018learning]	62.2	61.8
Ours (AIS)	63.62	64.86
Liu et al. [liu2018recognizing]	64.6	66.9
Caetano et al. [caetano2019skelemotion] + [yang2018action]	67.7	66.9

Table 3: Approach comparison on NTU RGB+D 120. Accuracies are given in %.