
Exploring the Long Short-Term Dependencies to Infer Shot Influence in Badminton Matches

Wei-Yao Wang1, Teng-Fong Chan1, Hui-Kuo Yang1, 3, Chih-Chuan Wang1, Yao-Chung Fan2, Wen-Chih Peng1 1National Yang Ming Chiao Tung University, Hsinchu, Taiwan
2National Chung Hsing University, Taichung, Taiwan
{sf1638.cs05, tfchan.cs07g, wangcc, wcpeng}@nctu.edu.tw, 3[email protected], 2[email protected]
Abstract

Identifying significant shots in a rally is important for evaluating players’ performance in badminton matches. While several studies have quantified player performance in other sports, the analysis of badminton data remains largely untouched. In this paper, we introduce a badminton language to fully describe the process of a shot and propose a deep learning model composed of a novel short-term extractor and a long-term encoder for capturing a shot-by-shot sequence in a badminton rally by framing the problem as predicting the rally result. Our model incorporates an attention mechanism to enable transparency between the action sequence and the rally result, which is essential for badminton experts to gain interpretable predictions. Experimental evaluation on a real-world dataset demonstrates that our proposed model outperforms strong baselines. The source code is publicly available at https://github.com/yao0510/Shot-Influence.

Index Terms:
sport analytics, badminton language representation, shot influence, attention mechanism

I Introduction

In recent years, a growing body of research has started to explore applying artificial intelligence to the sports industry due to the availability of data and the advancement of techniques. Such applications not only play an important role during matches but also have a great influence on the training stage. Because of the considerable number of professional matches and the enormous number of spectators, sports such as soccer attract the most attention. For example, several studies on performance analysis [1], tactic discovery [2], action valuing [3], and similar play retrieval [4] have been carried out. Other sports research, involving the evaluation of actions in basketball [5] and pattern recognition for tennis [6], has also been conducted.

In the field of badminton, measuring how good a shot in a rally is matters for decision-making and tactic investigation, yet the task of quantifying player performance still leaves much room for exploration. Existing approaches in the literature [7, 8, 9] focus on notational and temporal variables, utilizing statistical data such as forced or unforced errors to analyze a player’s behavior and evaluate how likely the player is to win. Moreover, analyses of shot quality and advanced tactics usually focus on the last few shots before scoring.

These approaches suffer from four limitations. First, existing approaches label data according to their own objectives, which leads to redundant format-design efforts whenever new studies are conducted by different groups. Second, most approaches fail to account for the details of each shot throughout the whole rally. Third, statistic-based methods ignore the contextual information between shots, and the complexity of computing each situation grows considerably with the number of possible results of each shot; besides, assessing experts’ advanced tactics requires repeatedly reviewing broadcast videos to discover patterns, which is time consuming. Fourth, objectively measuring the influence of a shot is difficult for non-experts since it requires capturing both long-term and short-term dependencies between shots.

To address these limitations, we propose a unified language for describing badminton plays, which provides a uniform mechanism for translating match videos into a dataset for analysis. Based on this proposed language, we built our dataset by manually labeling players’ shot actions from broadcast videos of badminton matches held between 2018 and 2020. Specifically, the basic instance of our data is a single shot action. By recording the shots made between opposing players from the serve to the score, the sequence of shots forms a rally. Building up from shots to rallies enables interpretation without reviewing videos and makes the data amenable to deep learning algorithms for time-series tasks, bridging the gap between computer science and the badminton field.

In this paper, we propose a badminton rally analysis application by leveraging deep learning, which measures the influence of shots based on the action sequence of a rally. By making use of the detailed information of each shot and identifying action-outcome correlations, it helps strengthen decision-making for improving badminton players’ performance.

In summary, the main contributions of our paper are to:

  1. Propose a language to represent badminton from shot to rally;

  2. Design and develop a deep learning model which captures both short-term and long-term dependencies in a badminton rally to measure shot influences;

  3. Introduce an attention mechanism into the model to enable the transparency of the model on action sequences with respect to rally outcomes;

  4. Conduct extensive experiments on a real-world dataset to show our model’s capability to infer shot influence.

II BLSR: A Language for Representing Badminton from Shot to Rally

In sports data analysis, data are generally labeled manually by professional experts. Specifically, by reviewing match videos, professional experts transcribe the video contents into a unified data format. For example, for soccer game analysis, SPADL [3] was proposed to unify existing event stream data formats and improve data analysis. There are also companies such as Dartfish (https://www.dartfish.com/) which provide techniques to analyze and record data from various sports.

However, to the best of our knowledge, there is no ready-to-use format designed for badminton analysis. As can be expected, directly adopting an existing format, e.g., SPADL, is not feasible due to the different natures of the sports. Moreover, vendors design their own formats, which vary according to their objectives.

To address the aforementioned issues, we propose BLSR (Badminton Language from Shot to Rally), developed in consultation with badminton experts, as a representation to formalize event stream data. The goal of our language is to be human-readable, general, and professional yet simple. These properties allow badminton players and coaches to easily interpret the process of a rally without reviewing match videos.

Specifically, BLSR is a language for describing the process of a rally. A match is composed of two to three sets, each of which has a number of rallies, and each rally consists of several shots by two players. This structure resembles the composition of a corpus in natural language. Therefore, inspired by the relation of corpus, sentence, and word in natural language, we treat a shot as a word and a rally as a sentence. Specifically, a rally i can be represented as a sequence of shots \{\boldsymbol{s}^{(i)}_{1},\boldsymbol{s}^{(i)}_{2},\cdots,\boldsymbol{s}^{(i)}_{N^{(i)}}\} and a tuple of rally information \boldsymbol{R}^{(i)}, where N^{(i)} is the total number of shots in that rally. Each shot \boldsymbol{s}^{(i)}_{n} is a tuple of eight attributes described as follows: 1) Player: the player who performed the shot; 2) Timestamp: the shot’s hitting time; 3) Type: the type of shot; 4) Back_hand: whether the shuttle was hit with the back hand; 5) Around_head: whether the shuttle was hit around the head; 6) Hit_area: the location where the racket hits the shuttle; 7) Player_area: the location of the player who performed the shot; 8) Opponent_area: the location of the player who prepared to receive the shot. The rally information \boldsymbol{R}^{(i)} can be described as follows: 1) Roundscore_A: Player A’s current score in the set; 2) Roundscore_B: Player B’s current score in the set; 3) Getpoint_player: the player who won the rally; 4) End_reason: the reason why the rally ended.
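
As an illustration, the sketch below lays out one BLSR shot and its rally record as Python dataclasses. The field names follow the attributes above; the concrete types (e.g., seconds for the timestamp, strings for the categorical fields) are our assumptions.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Shot:
    """One BLSR shot record; fields follow the eight attributes above."""
    player: str           # player who performed the shot
    timestamp: float      # hitting time (unit is an assumption)
    shot_type: str        # one of the 18 defined shot types
    back_hand: bool       # hit with the back hand or not
    around_head: bool     # hit around the head or not
    hit_area: int         # grid area where the racket hits the shuttle
    player_area: int      # grid area of the player who performed the shot
    opponent_area: int    # grid area of the player receiving the shot

@dataclass
class Rally:
    """One BLSR rally: a shot sequence plus rally-level information."""
    shots: List[Shot]
    roundscore_A: int     # Player A's current score in the set
    roundscore_B: int     # Player B's current score in the set
    getpoint_player: str  # player who won the rally
    end_reason: str       # in / out / touch net / not pass over net / misjudge
```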

To support different usages, we record the timestamp of each shot event, which can not only be used for basic applications such as video replay but can also serve as an additional feature for data analysis. To describe shot types accurately, we consulted professional coaches and players and defined 18 shot types: net shot; return net; smash; wrist smash; lob; defensive return lob; clear; drive; driven flight; back-court drive; drop; passive drop; push; rush; defensive return drive; cross-court net shot; short service; and long service. These shot types have specific and clear definitions such that similar hits fall into the same class.

On the other hand, the locations where players perform and receive shots are critical information for detecting tactics. To alleviate the sampling error of raw coordinates, also noted in [4], we discussed with domain experts and propose a grid system designed for badminton courts to represent locations. In this grid system, a half-court, including the outside part, is divided into 16 areas as shown in Fig. 1. Note that the grid is symmetric about the net in the middle to unify the location information for shots made on different sides. This grid system is used to represent all of the locations in our data.

Figure 1: The grid system layout used to represent locations on the badminton court. The blue line represents the net.
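
To make the grid representation concrete, the helper below snaps raw court coordinates to one of 16 areas of a half court. The 4x4 layout, the court dimensions, and the area numbering used here are hypothetical assumptions for illustration; only the idea of a coarse, net-symmetric grid follows Fig. 1.

```python
def to_grid_area(x: float, y: float,
                 court_width: float = 6.1, half_length: float = 6.7) -> int:
    """Map a point on one half of the court to a hypothetical grid index 1..16.

    Assumes (0, 0) is the corner where the net meets the left sideline,
    x grows along the net and y grows away from the net (both in meters).
    """
    col = min(max(int(4 * x / court_width), 0), 3)   # 4 columns across the court
    row = min(max(int(4 * y / half_length), 0), 3)   # 4 rows away from the net
    return row * 4 + col + 1                         # area id in 1..16

# Because both halves share the same grid (symmetric about the net), shots made
# on either side can use one location vocabulary.
print(to_grid_area(1.2, 3.0))  # example lookup
```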

Generally speaking, the reason a rally ends can be classified into five possible results: in; out; touch net; not pass over net; and misjudge. The cases in and out denote that the last shot lands inside or outside the opponent’s court, respectively. Touch net and not pass over net both describe hitting the shuttle but failing to get it over the net; their major difference is whether the shuttle touches the net.

III Problem formulation

With BLSR describing a badminton rally, all information can be presented in a structured manner. Generally speaking, a shot is effectively assessed by whether the player who made it wins the rally. Therefore, to objectively quantify how good a shot is, the problem can be framed as predicting the final outcome of a rally as follows:

Given a set of rallies \{(\boldsymbol{r}^{(i)},\boldsymbol{R}^{\prime(i)},y^{(i)})\}, where \boldsymbol{r}^{(i)} represents a sequence of shots \{\boldsymbol{s}^{(i)}_{1},\boldsymbol{s}^{(i)}_{2},\cdots,\boldsymbol{s}^{(i)}_{N^{(i)}}\}, \boldsymbol{R}^{\prime(i)} contains only information not related to the result (End_reason and Getpoint_player are removed from \boldsymbol{R}^{(i)}, and the scores describe only the condition at the beginning of the rally), and y^{(i)} indicates the outcome (win or lose) of a player in the rally, our goal is to learn the correlation between player actions, rally state, and outcome. Specifically, we aim to approximate a rally’s outcome by means of the shots made in that rally and the game progress.

IV Method

Fig. 2 shows a graphical pipeline of the proposed model. Our model consists of three major components for predicting the outcome of a rally. The first component is to encode the shot data (Subsection IV-A), the second component is to extract short-term pattern information from the processed sequence (Subsection IV-B), and the last component is to encode the pattern sequence (Subsection IV-C).

Figure 2: Architecture of the proposed model. The blue solid line and red dashed line in the pattern extractor represent individual convolution operations from two different 1-D convolutional neural networks.

IV-A Encoding Shot Data

First, we introduce our designs for transforming the input of shot-level information. Specifically, we apply embedding methods to encode features of each shot in a rally and obtain a processed shot sequence.

IV-A1 Location embedding

A naive method to encode location is one-hot encoding. However, one-hot encoding cannot preserve contextual relationships between locations, whereas an embedding layer takes such context into account. Thus, each location is assigned a learnable vector, which is then concatenated with the other shot-level information as the output of the first stage, as shown in Fig. 2.
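
As a minimal sketch in TensorFlow (the framework used in Section V), the snippet below contrasts the learnable lookup with one-hot encoding for the 16 grid areas; the embedding dimension of 10 follows the parameter settings in Section V-A3, while the toy indices and batch shapes are assumptions.

```python
import tensorflow as tf

NUM_AREAS = 16      # grid areas from Fig. 1
LOC_EMB_DIM = 10    # location embedding size from Section V-A3

# A learnable lookup table: areas that behave similarly can end up with
# similar vectors, which a fixed one-hot encoding cannot express.
location_embedding = tf.keras.layers.Embedding(input_dim=NUM_AREAS,
                                               output_dim=LOC_EMB_DIM)

# hit_area: 0-based area indices per shot, shape (batch, rally_length)
hit_area = tf.constant([[3, 7, 12, 0]])    # toy rally of four shots (assumption)
hit_vec = location_embedding(hit_area)     # shape (1, 4, 10)

# The embedded locations are then concatenated with the other shot-level
# features to form the processed shot sequence (exact wiring is an assumption).
```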

IV-A2 Enhanced shot type embedding

We follow the same idea as for the location encoding to encode the shot type. That is, we apply an embedding layer to the shot type to capture its underlying information.

Furthermore, inspired by the consideration of irregular time gaps in TASA [10], we adopt temporal score learning to reflect how a shot’s influence evolves as the rally progresses. The timestamp t^{(i)}_{n} of the n-th shot in rally i is converted to a time proportion \tau^{(i)}_{n} with

\tau^{(i)}_{n}=\frac{t^{(i)}_{n}-t^{(i)}_{1}}{t^{(i)}_{N^{(i)}}-t^{(i)}_{1}}. (1)

Normalized to the range between 0 and 1, the proportion increases linearly and approaches 1 as the shot occurs closer to the last action of the rally. In other words, it acts as the relative closeness in time of an action to the end of the rally. On the other hand, the original shot type is converted into two latent variables \theta^{(i)}_{n},\mu^{(i)}_{n} and then combined with the time proportion to form the temporal score \delta^{(i)}_{n} of the shot using

\delta^{(i)}_{n}=\sigma(\theta^{(i)}_{n}+\mu^{(i)}_{n}\tau^{(i)}_{n}), (2)

where \sigma is the sigmoid function. The two latent variables \theta^{(i)}_{n},\mu^{(i)}_{n} are learnable embeddings during the training process; their purposes are to determine the effect of the action itself and the effect of the action as time progresses, respectively. Finally, the obtained temporal score \delta^{(i)}_{n} is multiplied with the corresponding shot type embedding, similar to [10], to preserve these influences and enhance the quality of the shot type embedding.
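
A sketch of Eqs. (1)-(2) is given below: timestamps are converted to time proportions and the shot type embedding is scaled by the temporal score. Realizing \theta^{(i)}_{n} and \mu^{(i)}_{n} as per-shot-type embedding lookups is our reading of "learnable embeddings" and therefore an assumption, as are the tensor shapes.

```python
import tensorflow as tf

NUM_TYPES = 18        # the 18 defined shot types
TYPE_EMB_DIM = 15     # shot type embedding size from Section V-A3

type_embedding = tf.keras.layers.Embedding(NUM_TYPES, TYPE_EMB_DIM)
theta = tf.keras.layers.Embedding(NUM_TYPES, 1)  # effect of the action itself
mu = tf.keras.layers.Embedding(NUM_TYPES, 1)     # effect as time progresses

def enhanced_type_embedding(shot_type, timestamp):
    """shot_type: (batch, N) int ids; timestamp: (batch, N) hitting times."""
    t1 = timestamp[:, :1]                       # first shot of the rally
    tN = timestamp[:, -1:]                      # last shot of the rally
    tau = (timestamp - t1) / (tN - t1 + 1e-8)   # Eq. (1): time proportion
    delta = tf.nn.sigmoid(theta(shot_type)[..., 0]
                          + mu(shot_type)[..., 0] * tau)          # Eq. (2)
    return type_embedding(shot_type) * delta[..., tf.newaxis]     # scaled embedding
```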

IV-B Extracting Local Shot Patterns

Given the shot sequence representation, we further incorporate convolutional neural networks (CNNs) for pattern extraction. Two individual 1-D CNNs are used to separately focus on and obtain the patterns of the two players. Specifically, both perform convolutions with d_{cnn} filters of kernel size K from the beginning to the end of the shot sequence representation. Note that only the 2k-th output from the first network and the (2k+1)-th output from the second network are extracted and then merged alternately to generate the merged pattern sequence, \forall k\geq 0. That is, the merged pattern sequence \boldsymbol{p}^{(i)}_{:,j} of rally i from the j-th filter is obtained by

\begin{split}\hat{\boldsymbol{p}}^{(i)}_{:,j}=&ReLU(\boldsymbol{W}^{[C_{1}]}_{j}\ast\boldsymbol{r}^{\prime(i)}+b_{j}^{[C_{1}]}),\\ \bar{\boldsymbol{p}}^{(i)}_{:,j}=&ReLU(\boldsymbol{W}^{[C_{2}]}_{j}\ast\boldsymbol{r}^{\prime(i)}+b_{j}^{[C_{2}]}),\\ \boldsymbol{p}^{(i)}_{:,j}=&AlternateMerge(\hat{\boldsymbol{p}}^{(i)}_{:,j},\bar{\boldsymbol{p}}^{(i)}_{:,j})=(\hat{\boldsymbol{p}}^{(i)}_{1,j},\bar{\boldsymbol{p}}^{(i)}_{2,j},\hat{\boldsymbol{p}}^{(i)}_{3,j},\cdots),\end{split} (3)

where \ast is the convolution operator, \hat{\boldsymbol{p}}^{(i)}_{:,j} and \bar{\boldsymbol{p}}^{(i)}_{:,j} are the outputs of the two convolutions, \boldsymbol{W}^{[C_{1}|C_{2}]} and b^{[C_{1}|C_{2}]} are the learnable parameters of the networks, and \boldsymbol{r}^{\prime(i)} is the concatenation of the processed shot-level features of the whole rally. Zero padding is used in the convolution process to ensure that the resulting pattern sequence has the same length as the input. In other words, a pattern sequence p^{(i)}\in\mathbb{R}^{N^{(i)}\times d_{cnn}} is obtained at this stage. The choice of kernel size K only affects the number of shots that form one pattern, not the size of the resulting pattern sequence.

There are two reasons for extracting local patterns. First, we treat a shot as the base unit of a badminton rally. However, from the players’ perspective, a sequence of consecutive shots is much more informative than a single shot: a sub-sequence of the shot sequence usually represents a shot pattern of the player’s tactics. Therefore, when modeling the shot sequence of a rally, we consider sub-sequence-level patterns by employing 1-D CNNs, which are known to be effective at capturing local features along the time domain in time-series tasks. Second, the shot sequence of a rally is formed by two players making shots alternately. This motivates us to treat the patterns made by the two players as different behaviors and to mimic the players’ reasoning about short-term influence through two separate 1-D CNNs.
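
The sketch below follows Eq. (3): two independent Conv1D layers run over the processed shot sequence, and their outputs are interleaved so that even-indexed steps come from one network and odd-indexed steps from the other, mirroring the players' alternating shots. The filter count of 32 and kernel size of 3 follow Section V-A3; the masking-based interleaving is one possible implementation and an assumption.

```python
import tensorflow as tf

D_CNN, KERNEL = 32, 3   # filters and kernel size from Section V-A3

conv_a = tf.keras.layers.Conv1D(D_CNN, KERNEL, padding="same", activation="relu")
conv_b = tf.keras.layers.Conv1D(D_CNN, KERNEL, padding="same", activation="relu")

def alternate_merge(r):
    """r: processed shot sequence, shape (batch, N, feature_dim)."""
    p_hat = conv_a(r)   # short-term patterns from one player's perspective
    p_bar = conv_b(r)   # short-term patterns from the other player's perspective
    n = tf.shape(r)[1]
    # Take even positions (0, 2, ...) from conv_a and odd positions from conv_b;
    # "same" padding keeps the pattern sequence the same length as the input.
    even = tf.cast(tf.range(n) % 2 == 0, p_hat.dtype)[tf.newaxis, :, tf.newaxis]
    return p_hat * even + p_bar * (1.0 - even)   # shape (batch, N, D_CNN)
```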

IV-C Encoding Pattern Sequence

The previous stage only captures the short-term patterns in a shot sequence. However, long-term dependencies are also critical in a badminton rally for modeling global playing strategies. Therefore, we adopt the Gated Recurrent Unit (GRU) [11] to extract long-term relationships from the shot pattern sequences. We employ a bidirectional GRU to learn the long-term influence in a rally from both directions.

The state update operations for the forward direction involved in the n-th pattern are

\boldsymbol{h}^{(i)}_{n,:}=GRU(\boldsymbol{h}^{(i)}_{n-1,:},\boldsymbol{p}^{(i)}_{n,:};\boldsymbol{W}^{gru}), (4)

where \boldsymbol{W}^{gru} are the GRU parameters and \boldsymbol{h}^{(i)}_{n-1,:} is the hidden state output of the previous pattern. The hidden state of the GRU is initialized with zeros. Each hidden state is a vector of length d_{gru}, the number of units used in the GRU network. Therefore, these operations produce a hidden state output h^{(i)}\in\mathbb{R}^{N^{(i)}\times d_{gru}}.

We further incorporate an attention mechanism [12] to construct the final representation of the sequence, which simultaneously enables shot pattern transparency by treating the attention weights as the significance of each pattern with respect to the shot sequence representation. An attention score e^{(i)}_{n} for the n-th step is calculated from the short-term dependency p^{(i)}_{n,:} and the long-term dependency h^{(i)}_{n,:} via a skip-connection from the corresponding step as

e^{(i)}_{n}=(\boldsymbol{p}^{(i)}_{n,:}\oplus\boldsymbol{h}^{(i)}_{n,:})\boldsymbol{W}^{[A]}+b^{[A]}, (5)

where \oplus denotes the concatenation operator. In the learning phase, \boldsymbol{W}^{[A]} and b^{[A]} are parameters to be optimized. Next, the scores are passed through a softmax function and normalized into the weights \alpha^{(i)}_{n} as follows:

\alpha^{(i)}_{n}=\frac{\exp(e^{(i)}_{n})}{\sum_{k=1}^{N^{(i)}}\exp(e^{(i)}_{k})}. (6)

With the weighted scores, we obtain the representation \hat{\boldsymbol{r}}^{(i)}\in\mathbb{R}^{d_{cnn}+d_{gru}} of the whole shot sequence as the weighted sum of the short-term patterns and long-term states with

\hat{\boldsymbol{r}}^{(i)}=\sum_{n=1}^{N^{(i)}}\alpha^{(i)}_{n}(\boldsymbol{p}^{(i)}_{n,:}\oplus\boldsymbol{h}^{(i)}_{n,:}). (7)
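
Eqs. (5)-(7) can be sketched as below: each short-term pattern is concatenated with the corresponding bidirectional GRU state via a skip-connection, scored by a single linear layer, normalized with a softmax, and summed. Note that a bidirectional GRU doubles the state width, so the exact output dimension depends on how the two directions are combined; this and the omitted masking of padded steps are implementation assumptions.

```python
import tensorflow as tf

D_GRU = 32   # recurrent units from Section V-A3

bigru = tf.keras.layers.Bidirectional(
    tf.keras.layers.GRU(D_GRU, return_sequences=True))   # Eq. (4), both directions
attn_score = tf.keras.layers.Dense(1)                    # W^[A], b^[A] in Eq. (5)

def encode_rally(p):
    """p: merged pattern sequence, shape (batch, N, D_CNN)."""
    h = bigru(p)                               # long-term states, (batch, N, 2*D_GRU)
    ph = tf.concat([p, h], axis=-1)            # skip-connection in Eq. (5)
    e = attn_score(ph)                         # attention scores, (batch, N, 1)
    alpha = tf.nn.softmax(e, axis=1)           # Eq. (6): normalized weights
    return tf.reduce_sum(alpha * ph, axis=1)   # Eq. (7): rally representation
```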

IV-D Predicting Win Probability

With the feature representation \hat{\boldsymbol{r}}^{(i)} from the previous stages, we further merge this representation with the rally-level information \hat{\boldsymbol{R}}^{(i)}\in\mathbb{R}^{d_{rally}} by concatenation, forming an overall description in \mathbb{R}^{d_{cnn}+d_{gru}+d_{rally}} of the complete rally. \hat{\boldsymbol{R}}^{(i)} differs from the input \boldsymbol{R}^{\prime(i)} in that the features are transformed: it contains the score difference between the two players and the number of consecutive points scored, thus providing more details about the game state. The last step is to transform this vector into a win probability P^{(i)}_{win} by a fully-connected layer with the learnable parameters \boldsymbol{W}^{[L]} and b^{[L]} and a sigmoid activation:

P^{(i)}_{win}=\sigma((\hat{\boldsymbol{r}}^{(i)}\oplus\hat{\boldsymbol{R}}^{(i)})\boldsymbol{W}^{[L]}+b^{[L]}), (8)

where the output is a value between 0 and 1 indicating the win probability of the player. Specifically, 0.5 serves as the decision threshold: the player is predicted to win if the value is higher than 0.5, and to lose otherwise.
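
A minimal sketch of Eq. (8): the rally representation is concatenated with the transformed rally-level features and mapped to a probability by one sigmoid unit. The input shapes are assumptions.

```python
import tensorflow as tf

win_head = tf.keras.layers.Dense(1, activation="sigmoid")   # W^[L], b^[L] in Eq. (8)

def win_probability(r_hat, rally_feats):
    """r_hat: (batch, d_repr) rally representation from Eq. (7);
    rally_feats: (batch, d_rally) transformed rally-level features, e.g. the
    score difference and consecutive points (exact encoding is an assumption)."""
    return win_head(tf.concat([r_hat, rally_feats], axis=-1))   # (batch, 1)
```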

IV-E Learning Objectives

We use the cross-entropy loss

\mathcal{L}_{p}=-\sum_{i}(y^{(i)}\log P^{(i)}_{win}+(1-y^{(i)})\log(1-P^{(i)}_{win})) (9)

as one of our learning objectives, which is commonly used for classification tasks.

Finally, we define the total loss by combining the cross-entropy loss and a regularization loss \mathcal{L}_{r} as

\mathcal{L}=\mathcal{L}_{p}+\lambda\mathcal{L}_{r}, (10)

where \lambda is a hyperparameter to adjust the strength of regularization. Adam is adopted to minimize the loss and optimize our model. An early stopping mechanism is also applied to avoid over-fitting.
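
The training objective of Eqs. (9)-(10) can be sketched as a custom training step, assuming the regularization loss \mathcal{L}_{r} is the sum of the model's layer regularization losses (the paper does not spell out its exact form); the learning rate and \lambda follow Section V-A3.

```python
import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy()
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3)
LAMBDA = 0.01   # regularization strength from Section V-A3

@tf.function
def train_step(model, shots, rally_feats, y):
    with tf.GradientTape() as tape:
        p_win = model([shots, rally_feats], training=True)
        loss_p = bce(y, p_win)                                  # Eq. (9)
        loss_r = tf.add_n(model.losses) if model.losses else 0.0
        loss = loss_p + LAMBDA * loss_r                         # Eq. (10)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss
```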

V Experiment

In this section, we evaluate the performance and discuss case studies of our proposed model on a real-world dataset to answer the following research questions: Q1: Does our model outperform other potential baselines? Q2: Does each component of the model affect the prediction results?

V-A Experiment Setup

V-A1 Dataset

Data used in the following experiments were collected from real-world badminton matches (http://bwf.tv) and were manually labeled by domain experts using a specially designed labeling tool following our BLSR design. It should be noted that our dataset is private according to internal corporate policies, and the selected players are anonymized to ensure their privacy. Our data contain a total of 15,742 shots from 1,409 rallies, which occurred in 19 international men’s singles high-ranking matches from 2018 to 2020. Rallies with missing shot data due to highlights in the replay video were filtered out from the dataset. We use one of the players (denoted as Player B; his opponents are denoted as Player A) as the target player and predict his rally performance. Specifically, his winning rallies are labeled as positive, while his losing rallies are labeled as negative. We trained our model on 17 of the 19 matches, approximately 85% of the total sequences. The remaining 2 matches were used as a validation set and a testing set, with 5% and 10% of the total sequences respectively, for evaluating the performance of our model.

V-A2 Baselines

To evaluate and compare the performance of our model, we also implemented the following baselines. We applied the same optimization procedure with default hyperparameter settings on each of them:

  • Bidirectional LSTM (BiLSTM) is capable of learning long-term dependencies, in contrast to a vanilla RNN;

  • Prototype Sequence Network (ProSeNet) [13] provides interpretability while maintaining the accuracy of the network;

  • DeepMoji [12] is a model designed to understand sentences and predict the corresponding emotions;

  • Ordered Neurons LSTM (ON-LSTM) [14] provides a new recurrent architecture that performs neuron ordering to induce latent structures in sequences;

  • Transformer [15] is a powerful architecture whose success is widely attributed to the self-attention mechanism, achieving better performance on long-sequence tasks.

V-A3 Parameter settings

In the following set of experiments, the dimension of the location embedding, the dimension of the shot type embedding, the number of filters in the CNN components, and the number of units in the recurrent components were set to 10, 15, 32, and 32, respectively. All models were trained for 100 epochs. The default CNN kernel size is set to 3, as suggested by professional badminton experts based on the fact that players and coaches usually treat three shots as a pattern. For optimization, we initialized the Adam optimizer with a learning rate of 0.001 and set the regularization hyperparameter \lambda to 0.01. Early stopping was set to monitor the AUC. For each model, we report the average performance over 100 repeated runs.

V-A4 Evaluation platform

All of our evaluation processes were performed on a machine with an Intel® Core™ i7-8700 3.2GHz CPU, an Nvidia GeForce RTX 2070 GPU, and 32GB RAM, while the methods were implemented in Python 3.6.9 with the TensorFlow 2.0 framework.

V-B Prediction Performance

V-B1 Comparing different models

Since our task was formulated as a classification problem, we aimed to comprehensively evaluate the quality of the predicted probabilities as an indication of the outcomes. We used two common evaluation metrics: the Brier score (BS) and the area under the receiver operating characteristic curve (AUC). BS is obtained by

BS=\frac{1}{D}\sum_{i}(P^{(i)}_{win}-y^{(i)})^{2}, (11)

where D is the number of sequences in the dataset.
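
Both metrics are available in scikit-learn; the sketch below evaluates a set of predicted win probabilities against the ground-truth rally outcomes (the toy arrays are assumptions).

```python
import numpy as np
from sklearn.metrics import roc_auc_score, brier_score_loss

y_true = np.array([1, 0, 1, 1, 0])           # toy rally outcomes (assumption)
p_win = np.array([0.8, 0.3, 0.6, 0.4, 0.2])  # toy predicted probabilities

auc = roc_auc_score(y_true, p_win)           # area under the ROC curve
bs = brier_score_loss(y_true, p_win)         # Eq. (11): mean squared error
print(f"AUC = {auc:.4f}, BS = {bs:.4f}")
```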

Table I shows the performance of our model and the compared baselines. It can be observed that our model outperformed the other baselines on rally outcome prediction in both metrics. This result suggests that considering dependencies with alternating patterns is more flexible than naive sequence processing for evaluating the performance of the players, and is thus more capable of capturing rich information. We note that the Transformer marginally outperformed the other baselines, demonstrating its robust generalization across domains. The experiment also shows that using only shot-level data is not sufficient: rally-level data act as extra information for determining the final result, which conforms with domain experts’ view that tactics change under different score conditions.

TABLE I: Prediction performance compared to different baselines. The best results are in boldface.
Models AUC BS
ProSeNet 0.5333 0.2500
DeepMoji 0.7385 0.1935
BiLSTM 0.7436 0.1859
ON-LSTM 0.7487 0.2001
Transformer 0.7538 0.2009
Ours w/o Rally-level input 0.8649 0.1476
Ours 0.8966 0.1329

V-B2 Ablation study

To verify the design of the proposed model, we conducted an ablation study, summarized in Table II. By removing specific components from our model, we could observe the performance differences and investigate how each component affects the model’s effectiveness. It is clear that all components have an impact on predicting the outcome of the rally. Moreover, we found that our specially designed CNN components significantly improved the model performance, which provides evidence of the significance of mimicking the two players’ thoughts on short-term influence. Although the attention mechanism was mainly designed to provide more interpretable details behind the model’s final decision, it still slightly improved the performance on both metrics.

TABLE II: Prediction performance compared to reduced versions of our model. The best results are in boldface.
Models AUC BS
Full w/o 2 CNNs 0.7583 0.2117
Full w/o 1 CNN 0.7596 0.2100
Full w/o BiGRU 0.8903 0.1347
Full w/o temporal score enhanced shot type embedding 0.8957 0.1347
Full w/o Attention 0.8964 0.1339
Full 0.8966 0.1329

VI Related Work

Currently, little research focuses on badminton data analysis, and exploring advanced tactics remains challenging due to data inaccessibility. There are several statistic-based approaches [7, 8] for badminton match analysis which use notational and temporal variables from broadcast videos. Ghosh et al. [16] provided an end-to-end framework for analyzing broadcast badminton match videos; they detect player actions using image processing techniques and provide analytical outcomes based on the identified motions. Wang et al. [9] used deep learning techniques to retrieve information from match videos and constructed a platform to visualize the results. These approaches mainly focus on frequency analysis and fail to illustrate how good a shot is in a rally. Furthermore, these related works design their own formats based on their objectives. Hence, we propose a language to fully describe the rally, together with a deep learning model to capture the relationships between shots.

Different state-of-the-art approaches have been applied to sports such as soccer and basketball for estimating the outcome of actions. Decroos et al. [3] proposed a machine learning model applied to soccer event-based data: it utilizes action sequences to form game states, which are then used to predict scoring and conceding probabilities, and the difference in these probabilities between game states indicates the value of an action. Besides, they introduced a language to unify the description of player actions, and the data science challenges they posed motivated our design. Sicilia et al. [5] took a different approach to a similar purpose for basketball tracking data: they developed a stacked recurrent model that predicts the probability of each outcome from a temporal window of possessions, and the obtained probabilities can be transformed into expected points. These sports differ from badminton in that a team can perform multiple consecutive actions before any specific outcome, whereas the rules of badminton require a player to hit the shuttle and wait for the opponent’s return until the shuttle lands. While we introduce our model for analyzing badminton data, it can also be applied to other racket sports (e.g., tennis).

VII Conclusions and Future Works

In this paper, we have introduced BLSR, a language to describe the process of a rally. To measure the influence of shots, we proposed a deep learning model that captures both long-term and short-term dependencies between those shots. Also, we introduced an attention mechanism that enables the transparency of the model. The experimental evaluation on a real-world dataset shows that our model outperforms the baselines, and the ablation study supports the reasonableness of our design. In future research, we plan to label more players’ data, apply them to our model, and investigate how well other players perform tactics against different opponents. We also aim to use the proposed approach to customize playing styles and regenerate players’ tactics.

Acknowledgments

This work was supported by the Ministry of Science and Technology of Taiwan under Grant MOST-109-2627-H-009-001.

References

  • [1] H. Ruiz, P. Power, X. Wei, and P. Lucey, “‘The Leicester City fairytale?’: Utilizing new soccer analytics tools to compare performance in the 15/16 & 16/17 EPL seasons,” in KDD, 2017.
  • [2] T. Decroos, J. V. Haaren, and J. Davis, “Automatic discovery of tactics in spatio-temporal soccer match data,” in KDD, 2018.
  • [3] T. Decroos, L. Bransen, J. V. Haaren, and J. Davis, “Actions speak louder than goals: Valuing player actions in soccer,” in KDD, 2019.
  • [4] Z. Wang, C. Long, G. Cong, and C. Ju, “Effective and efficient sports play retrieval with deep representation learning,” in KDD, 2019.
  • [5] A. Sicilia, K. Pelechrinis, and K. Goldsberry, “Deephoops: Evaluating micro-actions in basketball using deep feature representations of spatio-temporal data,” in KDD, 2019.
  • [6] N. Miyahara, T. Tezuka, and Y. Nakauchi, “Pattern recognition for tennis tactics using hidden Markov model from rally series,” in SII, 2019.
  • [7] M.-Á. Gómez-Ruano, A. Cid, F. Rivas, and L.-M. Ruiz, “Serving patterns of women’s badminton medalists in the Rio 2016 Olympic Games,” Frontiers in Psychology, vol. 11, 2020.
  • [8] M. A. Gomez, A. S. Leicht, F. Rivas, and P. Furley, “Long rallies and next rally performances in elite men’s and women’s badminton,” PLoS ONE, vol. 15, no. 3, p. e0229604, 2020.
  • [9] W.-Y. Wang, K.-S. Chang, T.-F. Chen, C.-C. Wang, W.-C. Peng, and C.-W. Yi, “Badminton Coach AI: A badminton match data analysis platform based on deep learning,” Physical Education Journal, vol. 53, no. 2, pp. 201–213, 2020.
  • [10] M. Pavlovski, J. Gligorijevic, I. Stojkovic, S. Agrawal, S. Komirishetty, D. Gligorijevic, N. Bhamidipati, and Z. Obradovic, “Time-aware user embeddings as a service,” in KDD, 2020.
  • [11] K. Cho, B. van Merrienboer, Ç. Gülçehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio, “Learning phrase representations using RNN encoder-decoder for statistical machine translation,” in EMNLP, 2014.
  • [12] B. Felbo, A. Mislove, A. Søgaard, I. Rahwan, and S. Lehmann, “Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm,” in EMNLP, 2017.
  • [13] Y. Ming, P. Xu, H. Qu, and L. Ren, “Interpretable and steerable sequence learning via prototypes,” in KDD, 2019.
  • [14] Y. Shen, S. Tan, A. Sordoni, and A. C. Courville, “Ordered neurons: Integrating tree structures into recurrent neural networks,” in ICLR, 2019.
  • [15] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, “Attention is all you need,” in NIPS, 2017.
  • [16] A. Ghosh, S. Singh, and C. V. Jawahar, “Towards structured analysis of broadcast badminton videos,” in WACV, 2018.