
Bitcoin: Like a Satellite or Always Hardcore?
A Core-Satellite Identification in the Cryptocurrency Market
(First Draft: May 14, 2021)

Christoph J. Börner, Ingo Hoffmann, Jonas Krettek, Lars M. Kürzinger, Tim Schmitz
Financial Services, Faculty of Business Administration and Economics,
Heinrich Heine University Düsseldorf, 40225 Düsseldorf, Germany
Abstract

Cryptocurrencies (CCs) are becoming increasingly interesting for institutional investors’ strategic asset allocation and will be a fixed component of professional portfolios in the future. This asset class differs from established assets especially in the extreme characteristics of its statistical parameters. The question arises whether CCs with similar statistical key figures exist. On this basis, a core market incorporating CCs with comparable properties enables the implementation of a tracking error approach. A prerequisite for this is the segmentation of the CC market into a core and a satellite, the latter comprising the residual CCs in the complement. Using a concrete example, we segment the CC market into these components based on modern methods from image and pattern recognition.

Keywords: Cryptocurrencies, Core-Satellite Identification, Market Segmentation, Pattern Recognition
JEL Classification: C14, C46, C55, E22, G10
ORCID IDs: 0000-0001-5722-3086 (Christoph J. Börner), 0000-0001-7575-5537 (Ingo Hoffmann), 0000-0002-0978-6252 (Jonas Krettek), 0000-0001-5774-1983 (Lars M. Kürzinger), 0000-0001-9002-5129 (Tim Schmitz)
Acknowledgement: We thank Coinmarketcap.com for generously providing the cryptocurrency time series data for our research.

1 Introduction

Cryptocurrencies (CCs) have gained tremendous attention and popularity in media and society in recent years, not least because of the extreme characteristics of their statistical parameters, especially their volatility. Due to their nature, CCs are seen more as investment objects than as currencies in the classical sense (Baur et al., 2018). Investment volumes have been rising for years, and it can be assumed that CCs are gradually on their way to becoming an established asset class. Against this background, it seems plausible that CCs will become a fixed component of institutional investors’ portfolios in the future.

In professional portfolio management, one approach is to segment the investment universe into a core of assets with homogeneous statistical properties and assets that differ significantly from these properties, the so-called satellite. The core market can then be tracked with a suitable selection of assets using a tracking error approach. The satellite investments, which are mostly actively managed sub-portfolios covering selected areas, represent only a small proportion of the total portfolio. They are meant to deliver above-average returns and to have a diversifying effect due to their low correlation with the core investment (Amenc et al., 2012).

In standard portfolios, for example, satellite investments in specific geographical regions, in asset classes other than that of the core investment, or in portfolios with different management styles or strategies are suitable for enriching or diversifying the core portfolio. It is also possible to consider a single asset class and differentiate between core and satellite investments within it. A sector selection for corporate bonds or the segmentation of stocks into "with the market" (core) and "high beta stocks" (satellite) serve as examples.

In this paper, we are looking specifically at the CC asset class and propose a method to segment the core market from the satellite, based on the development of key statistical parameters.

Attempts to depict the CC market holistically for the purposes of portfolio and risk management have already been investigated in the literature. A prominent strand of literature deals with index construction for CCs. In this context, the CC market index CRIX proposed by Trimborn and Härdle (2018) is a well-known example, intended to serve as a starting point for addressing these economic questions. A similar top-down approach based on the 30 largest CCs by market capitalization is used to calculate the CC index CCi30 (Rivin et al., 2017).

However, instead of focusing on market capitalization and trading volume and thus prioritizing larger CCs, we identify the core market by applying a core-satellite approach based on the individual risk-return profile. Our approach has some potential advantages compared to a top-down constructed index. While the indices only take the largest CCs into account and may suffer from survivorship bias, the core-satellite approach identifies the core of the market, i.e. those CCs that behave similarly in statistical terms. Even though we currently consider only 27 CCs due to data gaps, it might become possible, as the market grows, to use our method to identify the core market from a large number of CCs. To build a portfolio, investors would then no longer have to replicate the indices, but could deliberately buy fewer individual assets from the core of the CC universe and combine them with assets from the satellite. Since the core market could be represented with fewer assets in the portfolio, the monitoring costs for the portfolio would decrease. Moreover, potential problems in the portfolio, such as price collapses, operational risk (Trimborn et al., 2020) or the extinction of entire CCs, could be countered more quickly. This is a decisive advantage, especially given the dynamics and speed of the market.

In order to determine CCs showing comparable performance over the course of 2014 to 2019, we consider returns as well as standard deviations, as proposed by modern portfolio theory (Markowitz, 1952). One general problem is that CCs differ from traditional asset classes, especially in terms of extreme tails and the corresponding tail risk. Against this background, Majoros and Zempléni (2018) and Börner et al. (2021) show that the stable distribution (SDI) is well suited to statistically model the returns of CCs, both overall and especially in the tail area. Thus, we extend our database by including the tail parameter $\alpha$ of the SDI to specifically consider the tail risk. To identify similarity patterns in the development of the statistical parameters, we use the Dynamic Time Warping (DTW) algorithm. This algorithm was originally developed for speech recognition (Sakoe and Chiba, 1978), but is widely used today for clustering and classification in various fields of application (Giorgino, 2009).

The DTW analysis yields DTW distances that are defined pairwise. The question arises whether assets, in the present case CCs, can be grouped together such that they are similar in the sense of short DTW distances with respect to specified criteria, here the aforementioned statistical indicators. This would allow assets to be divided into a core and a satellite. The particular difficulty lies in the fact that the individual DTW distances, once sorted, form a monotonically increasing sequence which in many cases covers the possible value range $[0,\infty)$ almost continuously. Based on this, it must be examined whether a specific DTW distance can be derived purely from the data that then acts as a boundary dividing the investment universe into core and satellite. In the following, we present a general procedure based on modern methods from pattern recognition that answers precisely these questions. Step by step, we show the separation of core assets within an investment universe. The process is not restricted to a specific asset class and can be used wherever it is important to separate similar from dissimilar assets.

Using the statistical parameters’ development, we show that segmenting the CC market into a core and a satellite succeeds when applying our method. Furthermore, we answer the question of whether Bitcoin is indeed part of the hard core of the cryptocurrency market or just a satellite. As the CC market becomes more professional, that is, as market capitalization, liquidity, and market depth increase, the method might become an indispensable tool for professional asset management.

The remainder of this paper is structured as follows: In Sec. 2 we describe the data used for our analysis. In Sec. 3 a brief overview of the DTW methodology is given. In the main part of our study, Sec. 4, we develop the identification method to separate the CC universe into a core market segment and a complementary segment which is an accumulation of the residual CCs remaining – called satellite. The separating procedure is shown using real data. The last section summarizes our most important results and gives an overview of further research topics.

2 Data

As the foundation of our analysis, we follow various studies by extracting CCs’ daily prices from the website coinmarketcap.com (Fry and Cheah, 2016; Hayes, 2017; Brauneis and Mestel, 2018; Caporale et al., 2018; Gandal et al., 2018; Glas, 2019). In order to depict the CC market as a whole, we aim to include as many CCs in our analysis as possible. However, there is a trade-off between the length of the time series and the number of CCs in the sample because, on average, seven CCs per week die out (ElBahrawy et al., 2017). Against this background, we choose an observation period from 2014-01-01 to 2019-06-01, taking into consideration 66 potential CCs from the CoinMarketCap market cap ranking at the reference date of 2014-01-01 which have been present throughout the entire timeframe.

As data gaps appear in the time series of most CCs, we exclude all CCs with five or more consecutive missing observations. By applying the Last Observation Carried Forward (LOCF) approach, as previously done in Schmitz and Hoffmann (2020), Trimborn et al. (2020) and Börner et al. (2021), we are able to include all CCs with smaller data gaps. Hence, $N=27$ CCs remain in our data set, as depicted in Tab. 1.

CC ID CC ID CC ID
Anoncoin ANC BitBar BTB Bitcoin BTC
CasinoCoin CSC Deutsche eMark DEM Diamond DMD
Digitalcoin DGC Dogecoin DOGE Feathercoin FTC
FLO FLO Freicoin FRC GoldCoin GLC
Infinitecoin IFC Litecoin LTC Megacoin MEC
Namecoin NMC Novacoin NVC Nxt NXT
Omni OMNI Peercoin PPC Primecoin XPM
Quark QRK Ripple XRP TagCoin TAG
Terracoin TRC WorldCoin WDC Zetacoin ZET
Table 1: Considered CCs, data source: CoinMarketCap.
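The gap-handling rule described above can be sketched in Python with pandas. This is a minimal illustration under stated assumptions, not the authors' code: prices sit in a DataFrame with one column per CC and a daily index, missing observations are NaN, and the helper names `longest_nan_run` and `clean_gaps` as well as the tickers in the usage lines are hypothetical.

```python
import numpy as np
import pandas as pd


def longest_nan_run(series: pd.Series) -> int:
    """Length of the longest run of consecutive missing values in a series."""
    longest = current = 0
    for missing in series.isna():
        current = current + 1 if missing else 0
        longest = max(longest, current)
    return longest


def clean_gaps(prices: pd.DataFrame, max_gap: int = 5) -> pd.DataFrame:
    """Drop every CC with max_gap or more consecutive missing observations,
    then fill the remaining shorter gaps by Last Observation Carried Forward."""
    keep = [cc for cc in prices.columns if longest_nan_run(prices[cc]) < max_gap]
    # LOCF: each missing price is replaced by the last observed price.
    return prices[keep].ffill()


# Usage with hypothetical data: "AAA" has a 2-day gap, "BBB" a 5-day gap.
idx = pd.date_range("2014-01-01", periods=10, freq="D")
prices = pd.DataFrame({
    "AAA": [1, 2, np.nan, np.nan, 5, 6, 7, 8, 9, 10],
    "BBB": [1, np.nan, np.nan, np.nan, np.nan, np.nan, 7, 8, 9, 10],
}, index=idx)
cleaned = clean_gaps(prices)   # keeps only "AAA", its gap filled with 2.0
```

Counting the longest run explicitly keeps the exclusion rule ("five or more consecutive missing observations") separate from the filling step, so the threshold can be varied without touching the LOCF logic.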

In the next step, we convert the CC closing prices denoted in USD to EUR prices, using the daily USD-EUR exchange rates retrieved from Thomson Reuters Eikon. To prevent potential weekday biases, the resulting daily observations are converted to weekly observations (Dorfleitner and Lung, 2018; Aslanidis et al., 2020). The choice of weekly input data also reflects the fact that professional asset management by institutional investors often does not operate on the basis of daily data. Intraday data are not considered, to further avoid biases, e.g. from pump-and-dump schemes.
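The currency conversion and the switch to weekly data can be sketched as follows. The paper specifies neither the quotation convention of the exchange rate nor the weekday that represents a week, so both are assumptions here: `eurusd` is taken as USD per EUR, and each week is represented by its last daily price, with weeks ending on Sunday (the pandas alias "W"). The function name `to_weekly_eur` is hypothetical.

```python
import pandas as pd


def to_weekly_eur(prices_usd: pd.DataFrame, eurusd: pd.Series) -> pd.DataFrame:
    """Convert daily USD closing prices to EUR and sample them weekly.

    Assumptions (not stated in the paper): `eurusd` is quoted as USD per EUR,
    and each week is represented by its last available daily price, with
    weeks ending on Sunday.
    """
    # CCs trade every day while FX rates are typically missing on weekends:
    # align the rate to the CC calendar and carry the last rate forward.
    rate = eurusd.reindex(prices_usd.index).ffill()
    prices_eur = prices_usd.div(rate, axis=0)
    return prices_eur.resample("W").last()
```

If the exchange rate were instead quoted as EUR per USD, the division would become a multiplication; the resampling step is unaffected.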

As a starting point, we compute logarithmic weekly returns, which are referred to simply as returns in the following. Based on these, we calculate the average weekly return and the standard deviation per year and fit the tail parameter $\alpha$ of the SDI per year. Our longitudinal analysis from 2014 to 2019 allows us to examine the market dynamics of these statistical parameters.
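A minimal sketch of the return statistics, assuming a pandas Series of weekly EUR prices with a DatetimeIndex. Whether the authors used the sample (ddof=1) or the population standard deviation is not stated; the sketch uses the sample version. Both statistics are reported in percent per week, matching the units of Tab. 2. The function name `yearly_stats` is hypothetical, and the yearly fit of $\alpha$ is not included here.

```python
import numpy as np
import pandas as pd


def yearly_stats(weekly_prices: pd.Series) -> pd.DataFrame:
    """Weekly log returns aggregated per calendar year.

    Returns one row per year with the mean weekly return <r> and the
    sample standard deviation s, both in percent per week.
    """
    log_returns = np.log(weekly_prices).diff().dropna()
    grouped = log_returns.groupby(log_returns.index.year)
    return pd.DataFrame({
        "mean_pct": 100 * grouped.mean(),
        "std_pct": 100 * grouped.std(ddof=1),
    })
```

Grouping by calendar year assigns each weekly return to the year of its end date; with an observation period ending on 2019-06-01, the 2019 row is based on fewer weeks than the others.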

3 Dynamic Time Warping

As we use three variables simultaneously, a simple correlation analysis is not adequate: it would only provide information about the similarity with respect to a single statistical variable and would have no valid significance for an $N\times N$ matrix with $N=27$, given only six annual measuring points. Furthermore, it does not capture the dynamic character of the development of the statistical parameters. Therefore, we use the DTW algorithm to segment the market into CCs that show similar behaviour over the investigation period.

Sigaki et al. (2019) employ this methodology, revealing clusters of CCs with similar information efficiency. However, they consider only returns as a variable, over a shorter time period. Although pursuing a different goal, we also end up with a DTW distance matrix, but instead of analyzing clusters, we use pattern recognition methods to identify core and satellite CCs. Because the DTW algorithm is well known and widely used, only a cursory overview of the method is given in the following, focusing on the features relevant to our analysis.

The DTW distance can be used as a shape-based dissimilarity measure that finds the optimal warping path between two time series by minimizing a cost function (Sakoe and Chiba, 1978; Aghabozorgi et al., 2015). Following the definitions and notation of the main strand of literature, in the first step a so-called distance matrix needs to be calculated for each pair of time series compared. This distance matrix can be based on various metrics. For our analysis and for reasons of robustness, we compute Manhattan, Euclidean and squared Euclidean distance matrices. As explained in more detail in Sec. 4, we use three variables per CC to determine the distance matrices between each pair of (multivariate) time series over the course of 2014 to 2019. We thus end up with a distance matrix for each of the three metrics and each pair of time series. Note that, due to the six discrete points in time, the distance matrix described so far is a scheme of six rows and six columns. For a specific CC pair, each cell of the scheme contains the distance, in the respective metric, between the vector of the first CC at one point in time and the vector of the second CC at one (possibly different) point in time. In the literature, this scheme is also referred to as the local cost matrix. It must be carefully distinguished from the distance matrix $\mathbf{D}$ defined in Sec. 4.

Given the distance matrix, i.e. the $6\times 6$ scheme of each CC pair, the DTW algorithm finds the optimal alignment through it, starting at $(2014,2014)$ and finishing at $(2019,2019)$ (Sakoe and Chiba, 1978). The time differences between the time series are eliminated by warping the time axis of one series so that maximum coincidence with the other is attained (Sakoe and Chiba, 1978). The individual distances along the DTW path are aggregated to total costs using a cost function. The total costs, referred to as the DTW distance $d$, reflect the minimum costs between the time series compared. Note that the DTW distance between identical objects equals 0, since there is no dissimilarity. The upper left part of Fig. 1 shows the DTW distance $d_{mn}$ for each pair $m,n=1,\ldots,N$ of the $N=27$ CCs.
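The two steps just described, building the $6\times 6$ local cost matrix and accumulating costs along the optimal warping path, can be sketched in NumPy. This is an illustration, not the authors' implementation: the step pattern, window restrictions and normalization they used are not stated, so the sketch assumes the basic symmetric recursion without normalization, in which every step into cell $(i,j)$ adds the local cost of that cell. Inputs are arrays with one row per year and one column per variable, e.g. $(\langle r\rangle, s, \alpha)$; the function names and the toy series are hypothetical.

```python
import numpy as np


def local_cost(x: np.ndarray, y: np.ndarray, metric: str) -> np.ndarray:
    """n x m local cost matrix between two multivariate series.

    x has shape (n, k), y has shape (m, k); cell (i, j) holds the distance
    between the vector of x at time i and the vector of y at time j.
    """
    diff = x[:, None, :] - y[None, :, :]          # shape (n, m, k)
    if metric == "manhattan":
        return np.abs(diff).sum(axis=-1)
    if metric == "euclidean":
        return np.sqrt((diff ** 2).sum(axis=-1))
    if metric == "sqeuclidean":
        return (diff ** 2).sum(axis=-1)
    raise ValueError(f"unknown metric: {metric}")


def dtw_distance(cost: np.ndarray) -> float:
    """Minimum accumulated cost of a warping path from the first cell
    (e.g. 2014, 2014) to the last cell (e.g. 2019, 2019)."""
    n, m = cost.shape
    acc = np.full((n + 1, m + 1), np.inf)         # padded accumulation table
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    return float(acc[n, m])


# Hypothetical univariate series; y runs one step ahead of x.
x = np.array([[0.0], [0.0], [1.0], [2.0], [3.0], [3.0]])
y = np.array([[0.0], [1.0], [2.0], [3.0], [3.0], [3.0]])
d = dtw_distance(local_cost(x, y, "manhattan"))   # 0.0: warping absorbs the shift
```

A year-by-year comparison of `x` and `y` would report a positive difference, whereas the warping path aligns the shifted patterns at zero cost; this is the property that lets DTW treat CCs with the same but time-shifted parameter dynamics as similar.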

We outline the underlying methodology only briefly; the algorithm and the cost function allow several restrictions and setting options. For a more detailed overview see e.g. Sakoe and Chiba (1978) and Giorgino (2009).

4 Core-Satellite Identification

In strategic asset allocation, a core-satellite strategy divides the investment into a broadly diversified core investment, which is intended to offer a basic return with moderate risk, and several individual investments (satellites) with higher risk and higher earnings potential. The latter serve to increase the return of the overall investment (Methling and von Nitzsch, 2019).

The sample average of the returns $\langle r\rangle$, their standard deviation $s$ and the tail parameter $\alpha$ are examined as the essential statistical parameters of the CCs.

A brief overview of the parameterization used and of the SDI’s main features is given in Appendix A. The tail parameter $\alpha$ plays a significant role in differentiating between CCs whose returns almost follow a normal distribution (i.e. $\alpha\rightarrow 2$) and CCs whose returns possess a long tail (i.e. $\alpha\ll 2$) with correspondingly high tail risks.

Overall, we consider the dynamics of the sample vector $(\langle r\rangle, s, \alpha)'$ over time for the years 2014 to 2019 in our analysis.

In Tab. 2 an exemplary excerpt of four CCs of the whole data set is shown. The aim is to use the temporal development of the statistical parameters to infer CCs that can be assigned to a market core due to their similar statistical behaviour.

CC No. ID Value 2014 2015 2016 2017 2018 2019
6 DMD $\langle r\rangle$ -4.52 1.67 -0.73 8.46 -5.50 0.85
$s$ 28.12 22.96 9.65 20.00 14.37 16.48
$\alpha$ 1.81 1.18 2.00 2.00 1.30 2.00
11 FRC $\langle r\rangle$ -7.10 -1.62 -0.25 5.92 -1.97 5.05
$s$ 18.74 20.27 59.19 44.84 29.43 104.04
$\alpha$ 2.00 1.44 0.63 1.18 1.51 0.90
DTW distance $d_{mn}$ (DMD, FRC): Manh. 6.01, Eucl. 3.10, sq. Eucl. 5.03
21 XPM $\langle r\rangle$ -7.27 -0.38 -0.58 5.38 -2.98 0.70
$s$ 15.58 24.28 8.66 26.47 22.84 13.35
$\alpha$ 1.67 1.65 1.79 1.63 1.76 1.57
27 ZET $\langle r\rangle$ -5.74 0.16 0.51 2.88 -3.73 1.11
$s$ 28.72 24.63 16.39 35.35 20.14 24.01
$\alpha$ 1.65 1.68 1.72 1.74 1.83 1.80
DTW distance $d_{mn}$ (XPM, ZET): Manh. 1.06, Eucl. 0.53, sq. Eucl. 0.13
Table 2: Input data for the DTW distance analyses (exemplary excerpt) and the resulting DTW distances $d_{mn}$ of the two CC pairs in each metric. Return $\langle r\rangle$ and standard deviation $s$ in percent per week.

The three-dimensional vector $(\langle r\rangle, s, \alpha)'$ is examined over the course of six years, and the DTW distance $d_{mn}$ is determined pairwise, with $m,n=1,\ldots,N$. The DTW distance is calculated in three different metrics: Manhattan, Euclidean and squared Euclidean. This yields three matrices $\mathbf{D}_{\text{Metric}}\in\mathbb{R}^{N\times N}$ for the CCs, one for each metric, with elements $d_{mn;\text{Metric}}$. These square matrices are symmetric, $d_{nm}=d_{mn}$, their diagonal entries are zero, $d_{mm}=0$, and their off-diagonal elements are all positive.

For the two pairs of CCs in Tab. 2, the calculated DTW distances can be compared in each metric. Note that the number in the first column corresponds to the numbering in Tab. 3.

The first pair (Diamond DMD and Freicoin FRC) exhibits a considerable distance in each metric. A detailed analysis of the vectors over time shows the reason for this large distance. On the one hand, clear differences can be observed in the absolute level and the sign of the returns within the same year. On the other hand, there are strong differences in the standard deviation and in the tail parameter within the same year. While the standard deviation of Diamond stays within a comparatively narrow range, the dispersion of the FRC returns increases dramatically in 2019. For both CCs, it is remarkable that the underlying return distribution changes from almost normal to heavy-tailed. This can be clearly seen in the year-to-year change of the tail parameter $\alpha$. Since this change does not occur at the same time for the two CCs, the DTW analysis results in large distances. Furthermore, we observe this changing distribution behavior for other CCs as well. At this point, we recommend that the potentially time-varying, hence non-stationary, return distribution be examined more closely in the risk controlling of institutional investors if CCs represent a significant component of their asset allocation.

The second pair of CCs (Primecoin XPM and Zetacoin ZET) illustrates two CCs that behave very similarly. Overall, comparatively small DTW distances can be observed here. The returns, the standard deviations and the tail parameters are closely related. It is also noteworthy that the form of the underlying return distribution hardly changes; the variability of the tail parameter is likely due to estimation error given the small number of observations per year.

The upper left part of Fig. 1 shows the distance matrix $\mathbf{D}_{\text{SE}}$ of the DTW distances in the squared Euclidean metric as a surface plot for all CCs. The ordering follows the numbering in the first column of Tab. 3. The colors indicate the value of the DTW distance from small (white) to large (black), i.e. in this example from 0 to about 5. The entries with zero DTW distance are white.

The problem of identifying groupings in the set of CCs thus becomes the problem of finding structures in the distance matrix $\mathbf{D}_{\text{Metric}}$. One possibility for this structural analysis of the distance matrices is to apply methods that have long been used in the investigation of hierarchical matrices (Liu et al., 2012; Hackbusch, 2015). Similar to image recognition, these methods aim to recognize patterns in matrices.

In a first step, the CCs are rearranged such that the CC with the greatest average distance to all others is placed on the right. The CCs with the next smaller average distances are arranged to the left in descending order.¹ As a result, we obtain an ordered set of CCs, and the resulting surface plot changes as shown in the upper right part of Fig. 1. Different CCs are similar with regard to the dynamics of the statistical key figures when their DTW distance is small and tends towards zero. This is the case for the CCs in the white upper left corner of the sorted matrix. Starting from the top left corner and moving along the main diagonal up to a certain distance $d_{\text{bound}}$, the CCs thus delimited represent the market core of similar CCs.

¹ This form of ordering is the same as sorting according to the row or column totals.
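Since the matrix is symmetric, the reordering amounts to sorting by row totals, which is equivalent to sorting by column totals or by average distance to all other CCs. A NumPy sketch with the hypothetical helper name `sort_by_total_distance`:

```python
import numpy as np


def sort_by_total_distance(D: np.ndarray, labels: list):
    """Reorder a symmetric distance matrix by ascending row totals, so the CC
    with the largest total (equivalently, average) distance ends up last."""
    order = np.argsort(D.sum(axis=1), kind="stable")
    # Apply the same permutation to rows and columns to keep symmetry.
    return D[np.ix_(order, order)], [labels[i] for i in order]
```

Permuting rows and columns together preserves the zero diagonal and the symmetry, so the sorted matrix still describes the same pairwise distances.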

Since the height profile above the sorted distance matrix has a peaked, rough structure, cf. Fig. B.2 in the appendix, the set of CCs belonging to the core cannot be delimited reliably. Therefore, the height profile is first modeled by a smoother surface. We use a modeling method comparable to the analysis of hierarchical matrices, cf. e.g. Hackbusch (2015) and the literature cited therein.

The matrix has a basic structure that simplifies the modeling problem. The sorted distance matrix is square and symmetric, with zeros on the diagonal and positive elements elsewhere. Towards the right and lower edges of the sorted distance matrix, the entries become larger on average, so that an essentially concave structure is present. In addition, we are only looking for a certain block of the sorted matrix, which starts in the upper left corner and is itself square.

The surface’s concave structure can be modeled well with radial basis functions whose centres are placed, similar to a frame, in the outer area along the edges of the sorted distance matrix. A brief overview of the radial basis function model class used is given in Appendix B. If the individual elements of the distance matrix are normalized to the range 0 to 1, the modeling leads to a surface whose height profile can be seen in the lower right part of Fig. 1.
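A sketch of this smoothing step under explicit assumptions: the paper specifies neither the basis function, nor the number and spacing of the centres, nor a width parameter. The code therefore uses Gaussian kernels $\phi(r)=\exp(-(r/\varepsilon)^2)$, which are strictly positive as required in Appendix B, centred on a square frame one cell outside the matrix, with an illustrative width $\varepsilon=N/2$, and determines the weights $\lambda_m$ of Eq. (2) by linear least squares. Entries are scaled to [0, 1] by dividing by the maximum, since the zero diagonal is the minimum. All names are hypothetical.

```python
import numpy as np


def frame_centres(n: int, spacing: int = 2) -> np.ndarray:
    """RBF centres on a square frame one cell outside an n x n grid."""
    ticks = np.arange(-1, n + 1, spacing, dtype=float)
    lo, hi = -1.0, float(n)
    pts = ([(lo, t) for t in ticks] + [(hi, t) for t in ticks]
           + [(t, lo) for t in ticks] + [(t, hi) for t in ticks])
    return np.unique(np.array(pts), axis=0)    # drop duplicate corners


def rbf_smooth(D_sorted: np.ndarray, eps=None) -> np.ndarray:
    """Least-squares RBF surface fitted to the normalized sorted matrix."""
    n = D_sorted.shape[0]
    Z = D_sorted / D_sorted.max()               # normalize to [0, 1]
    eps = n / 2.0 if eps is None else eps       # illustrative kernel width
    rows, cols = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    grid = np.column_stack([rows.ravel(), cols.ravel()]).astype(float)
    centres = frame_centres(n)
    # Euclidean distance from every matrix cell to every centre.
    r = np.linalg.norm(grid[:, None, :] - centres[None, :, :], axis=-1)
    phi = np.exp(-(r / eps) ** 2)               # Gaussian kernel, strictly positive
    lam, *_ = np.linalg.lstsq(phi, Z.ravel(), rcond=None)
    return (phi @ lam).reshape(n, n)
```

Because all centres lie outside the matrix, no kernel can reproduce a single spike inside it; this is what makes the fitted surface smoother than the peaked raw height profile.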

Fig. 1: Procedure to identify and separate a core of CCs within the whole market. The remaining CCs form the satellite.

In the next step, we define the threshold $d_{\text{bound}}$, which delimits the set of CCs belonging to the core. The distance matrix has only non-negative elements, but because its diagonal is zero, its eigenvalues sum to zero: it is never positive definite and has negative eigenvalues as soon as any off-diagonal element is positive. An analysis of the eigenvalue spectrum therefore does not lead to the definition of a suitable threshold $d_{\text{bound}}$.
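The eigenvalue argument can be made explicit. Since $\mathbf{D}$ is real and symmetric, its eigenvalues $\lambda_1,\ldots,\lambda_N$ are real and sum to its trace, which vanishes because of the zero diagonal:

```latex
\sum_{i=1}^{N}\lambda_i \;=\; \operatorname{tr}(\mathbf{D}) \;=\; \sum_{m=1}^{N} d_{mm} \;=\; 0 .
```

If all $\lambda_i$ were non-negative, they would all have to be zero, and a real symmetric matrix whose eigenvalues are all zero is the zero matrix. Hence any distance matrix with at least one positive off-diagonal entry has at least one negative eigenvalue and is indefinite.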

Instead, we analyze the empirical distribution function of the elements of the upper triangular matrix, normalized to the range 0 to 1, and delimit an area that contains the $p$ percent smallest distances, cf. the lower left part of Fig. 1. In practice, the value of $p$ will lie somewhere between $60\%$ and $90\%$ (dotted lines), where the steep slope of the empirical distribution function merges into the flatter area. This flatter area represents the tail of the empirical distribution function, in which the empirically measured DTW distances increase rapidly. Small values of $d_{\text{bound}}$ lead to a more homogeneous core; the larger $d_{\text{bound}}$, the more heterogeneous the core becomes with regard to the statistical parameters. In our example, we find the kink at about $p\approx 75\%$. The associated threshold $d_{\text{bound}}=0.357$ is used to draw a contour (white) in the modeled height profile on the right, which delimits an upper left block matrix.
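A sketch of the threshold rule and of the block delimitation. The paper locates the kink of the empirical distribution function by inspection (here $p\approx 75\%$), so the sketch takes $p$ as an input rather than detecting the kink. How exactly the contour is turned into a block is also an interpretation: the sketch takes the largest leading $k\times k$ block of the sorted, normalized (and ideally smoothed) height profile whose entries all stay at or below $d_{\text{bound}}$. The first $k$ CCs of the sorted order then form the core. Function names are hypothetical.

```python
import numpy as np


def normalize(D: np.ndarray) -> np.ndarray:
    """Scale a distance matrix to [0, 1]; its zero diagonal is the minimum."""
    return D / D.max()


def threshold_from_ecdf(D: np.ndarray, p: float) -> float:
    """Empirical p-quantile of the normalized upper-triangle distances
    (diagonal excluded), used as d_bound."""
    Z = normalize(D)
    upper = Z[np.triu_indices_from(Z, k=1)]
    return float(np.quantile(upper, p))


def leading_core_size(height: np.ndarray, d_bound: float) -> int:
    """Largest k such that every entry of the top-left k x k block of the
    sorted, normalized height profile is <= d_bound."""
    k = 0
    for size in range(1, height.shape[0] + 1):
        if height[:size, :size].max() > d_bound:
            break
        k = size
    return k
```

For example, with three mutually close CCs and one distant CC, a threshold at the median of the distances delimits a core of size three.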

This block matrix describes the market core if the squared Euclidean metric is used. The penultimate column of Table 3 shows the CCs belonging to the core for this metric (1 indicates membership in the core).

CC No. ID Name Manhattan Euclidean Sq. Eucl. Core (intersection)
Threshold $d_{\text{bound}}$: 0.554 0.564 0.357
1 ANC Anoncoin 1 1 1 C
2 BTB BitBar 1 1 1 C
3 BTC Bitcoin 1 1 1 C
4 CSC CasinoCoin 0 0 0 S
5 DEM Deutsche eMark 1 1 1 C
6 DMD Diamond 0 1 0 S
7 DGC Digitalcoin 1 1 1 C
8 DOGE Dogecoin 1 1 0 S
9 FTC Feathercoin 1 1 1 C
10 FLO FLO 1 1 1 C
11 FRC Freicoin 0 0 0 S
12 GLC GoldCoin 1 1 1 C
13 IFC Infinitecoin 0 0 0 S
14 LTC Litecoin 1 1 1 C
15 MEC Megacoin 1 1 1 C
16 NMC Namecoin 1 1 1 C
17 NVC Novacoin 1 1 1 C
18 NXT Nxt 1 1 1 C
19 OMNI Omni 1 1 1 C
20 PPC Peercoin 1 1 1 C
21 XPM Primecoin 1 1 1 C
22 QRK Quark 1 1 1 C
23 XRP Ripple 0 0 0 S
24 TAG TagCoin 1 1 0 S
25 TRC Terracoin 1 1 1 C
26 WDC WorldCoin 0 0 0 S
27 ZET Zetacoin 1 1 1 C
Table 3: Analysis of the DTW distance for the different metrics. Identification of the core (C) and the satellite set (S).

In our analysis, we examine the DTW matrices for all metrics in the same way. The CCs belonging to the core according to the respective metric are shown in Tab. 3. Note that this method can also be used to find very similar CCs with almost the same statistical behavior if the lower kink of the empirical distribution function is identified and the bound $d_{\text{bound}}$ is determined accordingly. We have also examined this path of segmentation (not explicitly shown here). It leads to the delimitation of five CCs in the white upper left corner of the lower right part of Fig. 1, which behave almost identically with regard to the statistical key figures over a long period of time.

Table 3 shows that the number of CCs belonging to the core depends on the metric. In portfolio management, the choice of metric, and thus of which dependency to accept, may come down to practicability. However, this dependency can be avoided by considering all metrics and selecting as the market core those CCs that are contained in the intersection of all metrics. This approach is illustrated in the last column of Tab. 3, in which all CCs belonging to the core according to all metrics are marked with C. This knowledge provides a decisive advantage in asset management when integrating a certain share of CCs in a portfolio.
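The intersection rule can be checked directly against Tab. 3. The sketch below transcribes the per-metric membership flags of Tab. 3 (Manhattan, Euclidean, squared Euclidean) and marks a CC as core only if all three flags are 1; the dictionary and function names are illustrative.

```python
# Per-metric core flags from Tab. 3: (Manhattan, Euclidean, squared Euclidean).
TAB3_FLAGS = {
    "ANC": (1, 1, 1), "BTB": (1, 1, 1), "BTC": (1, 1, 1), "CSC": (0, 0, 0),
    "DEM": (1, 1, 1), "DMD": (0, 1, 0), "DGC": (1, 1, 1), "DOGE": (1, 1, 0),
    "FTC": (1, 1, 1), "FLO": (1, 1, 1), "FRC": (0, 0, 0), "GLC": (1, 1, 1),
    "IFC": (0, 0, 0), "LTC": (1, 1, 1), "MEC": (1, 1, 1), "NMC": (1, 1, 1),
    "NVC": (1, 1, 1), "NXT": (1, 1, 1), "OMNI": (1, 1, 1), "PPC": (1, 1, 1),
    "XPM": (1, 1, 1), "QRK": (1, 1, 1), "XRP": (0, 0, 0), "TAG": (1, 1, 0),
    "TRC": (1, 1, 1), "WDC": (0, 0, 0), "ZET": (1, 1, 1),
}


def intersection_core(flags: dict) -> dict:
    """Label a CC 'C' (core) only if it is in the core under every metric,
    otherwise 'S' (satellite)."""
    return {cc: "C" if all(f) else "S" for cc, f in flags.items()}


labels = intersection_core(TAB3_FLAGS)
core = sorted(cc for cc, lab in labels.items() if lab == "C")
```

Applied to these flags, the rule reproduces the last column of Tab. 3: 19 core CCs and 8 satellite CCs. In this example, the intersection coincides with the squared Euclidean set, the most restrictive of the three metrics.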

Table 2 shows examples of CCs representing the identified core (Primecoin XPM and Zetacoin ZET) and the identified satellite (Diamond DMD and Freicoin FRC). In practice, a simple portfolio could be constructed as follows. For example, 5 to 10 CCs with high liquidity and market depth that belong to the core and are similar to XPM and ZET are selected from the entire data set. These CCs form the core investment. Individual CCs can then be selected from the satellite, which can be expected to offer higher returns at higher risk. The tracking error can be measured either relative to the core (first case) or relative to the overall market (second case). In either case, the composition is optimized subject to a specified limit on the tracking error. Continuous control of the tracking error and tactical readjustment of the weights lead to tracking of the core (first case) or of the overall market (second case), while the tracking error specified by the institutional investor is adhered to.
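For reference, the tracking error mentioned above is commonly measured ex post as the standard deviation of the active return, i.e. the difference between portfolio and benchmark returns, where the benchmark is either the core or the overall market. A minimal sketch; annualizing weekly figures with $\sqrt{52}$ is an assumption for illustration, and the function name is hypothetical.

```python
import numpy as np


def tracking_error(portfolio, benchmark, periods_per_year: int = 52) -> float:
    """Ex-post tracking error: sample standard deviation of the active
    return (portfolio minus benchmark), annualized by sqrt(periods_per_year)."""
    active = np.asarray(portfolio, dtype=float) - np.asarray(benchmark, dtype=float)
    return float(np.std(active, ddof=1) * np.sqrt(periods_per_year))
```

A portfolio that beats its benchmark by a constant margin every week has zero tracking error; the measure captures only the variability of the deviation, which is what a tracking error limit constrains.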

5 Conclusion

In our study, we show how a general, purely data-driven process can be set up to separate an investment universe into similar assets (core) and dissimilar assets (satellite). We demonstrate the feasibility of this approach and outline the necessary sequence of steps for the segmentation in detail. Using the example of the modern CC asset class, we carry out the separation of the investment universe into similar CCs (core) and dissimilar CCs (satellite) as the residual share. In addition, we obtain interesting results specifically concerning CCs.

The question raised at the beginning, whether Bitcoin actually represents the hard core of the cryptocurrency market, can be answered in a differentiated manner. It turns out that although Bitcoin is counted as part of the core, it rather marks the edge of the core. A dominant role, as suggested by other analyses, cannot be confirmed.

Our proposed segmentation can be used in portfolio management by institutional investors to track the core market with a few selected CCs in a tracking error approach. In order to increase returns, a higher-level management approach can then be used to build up individual positions in CCs that belong to the satellite, thus implementing a core-satellite portfolio.

One potential challenge for this approach might lie in liquidity problems, especially in the case of smaller altcoins (CCs other than Bitcoin). However, studies indicate that CCs constitute only a small component of portfolios of traditional assets, which mitigates this issue (Dorfleitner and Lung, 2018; Schmitz and Hoffmann, 2020). In addition, methods such as the Liquidity Bounded Risk-return Optimization (LIBRO) approach by Trimborn et al. (2020) exist, which can be used to perform portfolio optimization under liquidity constraints. Furthermore, it is conceivable that liquid CCs are incorporated in the core, so that they can be purchased without concern about liquidity restrictions. Beyond that, it can be assumed that the development of the CC market will make it suitable for larger investment volumes in the future.

As already mentioned, the proposed method is not limited to CCs. A suitable market segmentation in other asset classes is conceivable as well. The advantages of a product-based implementation of a topic-centered, combined ’core-satellite & tracking-error’ strategy in the private or institutional investor segment are left for further studies.

Appendix A Stable distribution – the tail parameter $\alpha$

The analyses in Börner et al. (2021) showed that the family of SDIs is the most promising for modeling the distribution of CC returns. Therefore, this family of distributions is also used in the present study and is introduced here in more detail. Several different parametrizations exist for the SDI. In the following, we adopt the presentation and parametrization of the SDI described in Nolan (2020, Def. 1.4 therein).

SDIs are a class of probability distributions suitable for modeling heavy tails and skewness. A positive linear combination of two independent, identically stable distributed random variables has the same distribution as the individual variables, up to location and scale. A random variable $X$ follows the SDI $S(\alpha,\beta,\gamma,\delta)$ if its characteristic function is given by:

$$\text{E}\left[\exp(\text{i}tX)\right] = \begin{cases} \exp\left(\text{i}\delta t - |\gamma t|^{\alpha}\left[1 + \text{i}\beta\,\text{sign}(t)\tan\left(\frac{\pi\alpha}{2}\right)\left(|\gamma t|^{1-\alpha} - 1\right)\right]\right), & \alpha \neq 1,\\[8pt] \exp\left(\text{i}\delta t - |\gamma t|\left[1 + \text{i}\beta\,\text{sign}(t)\,\frac{2}{\pi}\ln|\gamma t|\right]\right), & \alpha = 1. \end{cases} \qquad (1)$$

The first parameter $0<\alpha\leq 2$ is called the shape parameter and describes the tails of the distribution, with smaller values of $\alpha$ corresponding to heavier tails. This parameter is also referred to as the tail parameter, index of stability or characteristic exponent. The second parameter $-1\leq\beta\leq+1$ is a skewness parameter. If $\beta=0$, the distribution is symmetric; otherwise it is left-skewed ($\beta<0$) or right-skewed ($\beta>0$). The smaller $\alpha$, the stronger the effect of $\beta$ on the shape of the distribution; as $\alpha$ increases, the effect of $\beta$ decreases. Further, $\gamma\in\mathbb{R}^{+}$ is called the scale parameter and $\delta\in\mathbb{R}$ the location parameter.

For the special case $\alpha=2$, the characteristic function in Eq. (1) reduces to $\text{E}\left[\exp\left(\text{i}tX\right)\right]=\exp\left(\text{i}\delta t-(\gamma t)^{2}\right)$, since $\tan(\pi)=0$. It thus becomes independent of the skewness parameter $\beta$, and the SDI equals a normal distribution with mean $\delta$ and standard deviation $\sigma=\sqrt{2}\gamma$. This is an important property for portfolio theory, for example when considering multivariate distributions, because normally distributed components of a random vector can then be modeled within the same function class.
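To make Eq. (1) concrete, the following Python sketch evaluates the characteristic function of $S(\alpha,\beta,\gamma,\delta)$ exactly as written above and checks two properties stated in the text: the reduction to a normal distribution with $\sigma=\sqrt{2}\gamma$ for $\alpha=2$, and stability under summation in the symmetric case. The function name `stable_cf` and all parameter values are hypothetical; this is an illustrative sketch, not the estimation code used in this study.

```python
import cmath
import math


def stable_cf(t: float, alpha: float, beta: float, gamma: float, delta: float) -> complex:
    """Characteristic function E[exp(itX)] of S(alpha, beta, gamma, delta), following Eq. (1)."""
    if not (0 < alpha <= 2 and -1 <= beta <= 1 and gamma > 0):
        raise ValueError("require 0 < alpha <= 2, -1 <= beta <= 1 and gamma > 0")
    if t == 0:
        return 1 + 0j  # every characteristic function equals 1 at t = 0
    gt = abs(gamma * t)
    sign_t = math.copysign(1.0, t)
    if alpha != 1:
        skew = math.tan(math.pi * alpha / 2) * (gt ** (1 - alpha) - 1)
        exponent = 1j * delta * t - gt ** alpha * (1 + 1j * beta * sign_t * skew)
    else:
        skew = (2 / math.pi) * math.log(gt)
        exponent = 1j * delta * t - gt * (1 + 1j * beta * sign_t * skew)
    return cmath.exp(exponent)


# alpha = 2: normal distribution with mean delta and sigma = sqrt(2) * gamma; beta has no effect
gamma, delta = 0.5, 0.1
sigma = math.sqrt(2) * gamma
for t in (-2.0, 0.3, 1.7):
    normal_cf = cmath.exp(1j * delta * t - 0.5 * (sigma * t) ** 2)
    assert abs(stable_cf(t, 2.0, 0.7, gamma, delta) - normal_cf) < 1e-12

# Stability, symmetric case: the sum of two iid S(a, 0, g, 0) variables is S(a, 0, 2**(1/a) * g, 0),
# because the characteristic function of the sum is the square of the individual one
a, g = 1.5, 0.8
for t in (-1.0, 0.4, 2.5):
    assert abs(stable_cf(t, a, 0.0, g, 0.0) ** 2 - stable_cf(t, a, 0.0, 2 ** (1 / a) * g, 0.0)) < 1e-12
```

The two loops double as usage examples; for $\beta\neq 0$ the sum additionally picks up a location shift, which is why the stability check is restricted to the symmetric case.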

In the main part of the paper, the tail parameter $\alpha$ is estimated for each year under consideration on the basis of weekly returns and serves as input data for the DTW distance analysis.
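For readers unfamiliar with dynamic time warping, the following Python sketch computes a classical DTW distance (Sakoe and Chiba, 1978) between two series, for example two yearly series of estimated tail parameters $\alpha$, and assembles the symmetric pairwise distance matrix. It assumes a squared-Euclidean local cost and the plain symmetric step pattern (match, insertion, deletion) without weights or normalization; the step pattern and normalization behind the results in the main part (cf. Giorgino, 2009) may differ, so absolute values need not coincide. The function names are hypothetical.

```python
from typing import List, Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classical DTW with squared-Euclidean local cost and unweighted symmetric steps."""
    n, m = len(a), len(b)
    inf = float("inf")
    # acc[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = (a[i - 1] - b[j - 1]) ** 2
            # predecessor: deletion, insertion or match
            acc[i][j] = local + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[n][m]


def dtw_matrix(series: List[Sequence[float]]) -> List[List[float]]:
    """Symmetric matrix D with D[m][n] = DTW distance between series m and n."""
    k = len(series)
    D = [[0.0] * k for _ in range(k)]
    for m in range(k):
        for n in range(m + 1, k):
            D[m][n] = D[n][m] = dtw_distance(series[m], series[n])
    return D
```

Because warping allows one series to repeat values, `dtw_distance([1.6, 1.5], [1.6, 1.5, 1.5])` is zero, although a pointwise comparison would not even be defined for series of different length.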

Appendix B Modeling with radial basis functions

In many scientific areas (Powell, 1977; Poggio and Girosi, 1990; Sahin, 1997; Biancolini, 2017), radial basis functions are used to carry out a function approximation of the following form:

$$
y(\mathbf{x}) = \sum_{m=1}^{M}\lambda_{m}\,\phi\left(\|\mathbf{x}-\mathbf{x}_{m}\|\right) \qquad (2)
$$

where $y(\mathbf{x})$ is a scalar function of $\mathbf{x}\in\mathbb{R}^{n}$. The function $y(\mathbf{x})$ is modelled as a sum of $M$ radial basis functions, each centred at a different centre $\mathbf{x}_m$ and weighted with an appropriate coefficient $\lambda_m$. The value of each radial basis function depends only on the distance $r=\|\mathbf{x}-\mathbf{x}_m\|$ between the point $\mathbf{x}$ and the centre $\mathbf{x}_m$, measured in a previously chosen norm; for the Gaussian type used below, this value is strictly positive. Throughout our analyses we use the Euclidean norm.

To model and reconstruct the height profile over the distance matrix $\mathbf{D}$ in Sec. 4, we use radial basis functions of Gaussian type

$$
\phi(r) = \exp\left(-ar^{2}\right) \qquad (3)
$$

with infinite support and a positive shape parameter $a$, which controls the effective range of the radial basis function: the larger $a$, the faster $\phi$ decays. If $R$ denotes the distance between two neighbouring centres and $0<p<1$ the desired residual effect at the next centre, i.e. $\phi(R)=p$, then the range of effect can be set via $a=-R^{-2}\ln p$.
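As a quick numerical check of the rule $a=-R^{-2}\ln p$, the following Python sketch computes the shape parameter from a centre spacing $R$ and a residual effect $p$ and confirms that the Gaussian of Eq. (3) indeed takes the value $p$ at distance $R$. The function names and the example values ($R=0.5$, $p=0.1$) are hypothetical and only serve as illustration.

```python
import math


def gaussian_rbf(r: float, a: float) -> float:
    """Gaussian radial basis function phi(r) = exp(-a r^2), Eq. (3)."""
    return math.exp(-a * r ** 2)


def shape_parameter(R: float, p: float) -> float:
    """Shape parameter a such that phi(R) = p, i.e. a = -ln(p) / R^2."""
    if R <= 0 or not 0 < p < 1:
        raise ValueError("require R > 0 and 0 < p < 1")
    return -math.log(p) / R ** 2


# Hypothetical example: neighbouring centres 0.5 apart, 10 % residual effect at the next centre
a = shape_parameter(0.5, 0.10)
assert math.isclose(gaussian_rbf(0.5, a), 0.10)
```

Since $a$ scales with $R^{-2}$, halving the spacing of the centres quadruples the shape parameter for the same residual effect $p$.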

The parameter vector $\boldsymbol{\lambda}$ is determined by a least-squares approach. In some applications, however, the least-squares fit suffered from ill-conditioned matrices. We therefore extend the Lagrange function to be minimized by a regularization term. This term is also referred to as a cost functional and accounts for the cost of deviating from a smooth function. The theoretical foundations of this approach go back to early work by Tikhonov (1943, 1963). The implemented regularization procedure is nowadays standard (Poggio and Girosi, 1990); cf. also Sahin (1997), Biancolini (2017) and the extensive literature cited therein.

Hence, we set the Lagrange function

$$
\mathcal{L} = \sigma^{2}+\alpha\,\boldsymbol{\lambda}^{\prime}\boldsymbol{\lambda} \qquad (4)
$$

where $\sigma^{2}$ denotes the sum of squared errors between the modelled values $\hat{y}_i$ and the sample values $y_i$, $i=1,\ldots,N$, for a sample of length $N$. Furthermore, $\alpha$ is a non-negative real number called the regularization parameter (not to be confused with the tail parameter $\alpha$ of Appendix A). If $\alpha\rightarrow 0$, the problem is unconstrained and the resulting model is determined entirely by the sample. If, on the other hand, $\alpha\rightarrow\infty$, the a priori desired smoothness of the resulting model dominates and leads to a very smooth function, in the limit nearly flat and almost independent of the measured sample.

Finally, the solution of the minimization problem in Eq. (4) is

$$
\boldsymbol{\lambda} = \left(\boldsymbol{\Phi}+\frac{\alpha}{N}\mathbf{E}\right)^{-1}\mathbf{v}. \qquad (5)
$$

Here, $\phi_{im}=\phi(\|\mathbf{x}_i-\mathbf{x}_m\|)$ abbreviates the value of the $m$-th radial basis function at the sample point $\mathbf{x}_i$, $i=1,\ldots,N$, with observed output $y_i$. Then the vector is $\mathbf{v}=\left(\langle y_i\phi_{im}\rangle\right)_{m=1,\ldots,M}$ and the matrix is $\boldsymbol{\Phi}=\left(\langle\phi_{ik}\phi_{im}\rangle\right)_{k,m=1,\ldots,M}$, where $\langle\cdot\rangle$ denotes the sample average over $i$. Further, $\mathbf{E}$ denotes the identity matrix in $\mathbb{R}^{M\times M}$.
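The factor $\alpha/N$ in Eq. (5) follows from Eq. (4) if $\sigma^{2}$ is read as the sum (rather than the mean) of squared errors, $\sigma^{2}=\sum_{i=1}^{N}(y_i-\hat{y}_i)^{2}$ with $\hat{y}_i=\sum_{k=1}^{M}\lambda_k\phi_{ik}$. A short derivation under this reading, setting the gradient of $\mathcal{L}$ to zero:

$$
\begin{aligned}
\frac{\partial\mathcal{L}}{\partial\lambda_m}
  &= -2\sum_{i=1}^{N}\Big(y_i-\sum_{k=1}^{M}\lambda_k\phi_{ik}\Big)\phi_{im}+2\alpha\lambda_m = 0,
  \qquad m=1,\ldots,M,\\
\Longleftrightarrow\quad
\sum_{k=1}^{M}\langle\phi_{ik}\phi_{im}\rangle\,\lambda_k+\frac{\alpha}{N}\lambda_m
  &= \langle y_i\phi_{im}\rangle
  \qquad(\text{after division by } 2N),\\
\Longleftrightarrow\quad
\Big(\boldsymbol{\Phi}+\frac{\alpha}{N}\mathbf{E}\Big)\boldsymbol{\lambda} &= \mathbf{v}.
\end{aligned}
$$

Since $\boldsymbol{\Phi}$ is a Gram matrix of sample averages and hence positive semi-definite, adding $\frac{\alpha}{N}\mathbf{E}$ with $\alpha>0$ makes the system matrix positive definite, which is why the regularization remedies ill-conditioning.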

In practice, in the very few applications that required it, we assigned successively increasing values $0\leq\alpha<100$ to the regularization parameter until the visible local roughness or strongly peaked structure of the modelled surface vanished. We observed that the height profile of the distance matrix $\mathbf{D}$ is still well reconstructed, but that the absolute height is modelled less accurately as the influence of the regularization increases. The modelling properties improve if a constant term is added to the model in Eq. (2). The solution in Eq. (5) keeps its form: the number of radial basis functions is increased by one, $M\rightarrow M+1$, the first radial basis function is set identically to one, $\phi_1=1$ for all $\mathbf{x}\in\mathbb{R}^{n}$, and these changes are reflected in the elements of the vector $\mathbf{v}$ and the matrix $\boldsymbol{\Phi}$.
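The following Python/NumPy sketch implements the regularized solution in Eq. (5) with the Gaussian basis functions of Eq. (3) and, optionally, the constant term described above (a leading basis function $\phi_1=1$). It is a minimal illustration under our own assumptions (hypothetical function names, one-dimensional toy data, a fixed shape parameter $a$), not the code used to produce Fig. 2; the argument `reg` plays the role of the regularization parameter $\alpha$.

```python
import numpy as np


def rbf_design(X, centres, a, constant=False):
    """Matrix phi[i, m] = exp(-a * ||x_i - x_m||^2) with the Euclidean norm, cf. Eq. (3).

    With constant=True, a leading column of ones is prepended (M -> M + 1, phi_1 = 1).
    """
    X = np.asarray(X, dtype=float).reshape(len(X), -1)              # shape (N, n)
    C = np.asarray(centres, dtype=float).reshape(len(centres), -1)  # shape (M, n)
    r = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2)       # r[i, m] = ||x_i - x_m||
    phi = np.exp(-a * r ** 2)
    if constant:
        phi = np.hstack([np.ones((phi.shape[0], 1)), phi])
    return phi


def fit_rbf(X, y, centres, a, reg=0.0, constant=False):
    """Coefficients lambda solving (Phi + reg/N * E) lambda = v, cf. Eq. (5)."""
    phi = rbf_design(X, centres, a, constant)
    N, M = phi.shape
    Phi = phi.T @ phi / N                       # Phi[k, m] = <phi_ik phi_im>
    v = phi.T @ np.asarray(y, dtype=float) / N  # v[m] = <y_i phi_im>
    return np.linalg.solve(Phi + (reg / N) * np.eye(M), v)


def predict_rbf(X, centres, a, lam, constant=False):
    """Model values y(x) = sum_m lambda_m phi(||x - x_m||), Eq. (2), plus optional constant."""
    return rbf_design(X, centres, a, constant) @ lam
```

The sketch solves the linear system rather than forming the inverse in Eq. (5) explicitly, which is numerically preferable. On toy data with an offset, e.g. $y=5+\exp(-(x-1.5)^2)$ sampled on $[0,3]$ with centres at 0, 1.5 and 3 and $a=1$, the variant with the constant term reproduces the data essentially exactly, whereas the variant without it has to approximate the offset with the Gaussians alone and leaves visible residuals. Increasing `reg` shrinks the coefficients towards zero, which corresponds to the flattening described for large $\alpha$.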

Most of the analyses could be carried out with $\alpha=0$, i.e. without regularization, and still led to very good results. No regularization was applied for the results shown in the main part.

Fig. 2 shows an example of the modelling process with radial basis functions and $\alpha=0$. The left panel shows the rough and peaked height structure $d_{mn}$ above the DTW distance matrix $\mathbf{D}_{\text{SE}}$ calculated in Sec. 4. It is the three-dimensional counterpart of the upper right panel of Fig. 1, viewed from the upper left corner along the main diagonal. The right panel shows the surface of the standardized height structure $\hat{d}_{mn}$ modelled with radial basis functions, together with some contour lines (dashed white) spaced 0.2 units apart. The contour at the threshold $d_{\text{bound}}=0.357$ for the squared Euclidean metric is shown in light grey, cf. Sec. 4. The bullets and the corresponding vertical dashed lines mark the centres of the radial basis functions and their positions.

Fig. 2: An explicit and detailed representation of the modeling process that led to the segmentation of the CC market in Sec. 4.

References

  • Aghabozorgi et al. (2015) Aghabozorgi, S., Seyed Shirkhorshidi, A., Ying Wah, T., 2015. Time-series clustering – a decade review. Information Systems 53, 16–38. doi:10.1016/j.is.2015.04.007.
  • Amenc et al. (2012) Amenc, N., Malaise, P., Martellini, L., 2012. Revisiting core-satellite investing - a dynamic model of relative risk management.
  • Aslanidis et al. (2020) Aslanidis, N., Bariviera, A.F., Savva, C.S., 2020. Weekly dynamic conditional correlations among cryptocurrencies and traditional assets. SSRN Electronic Journal doi:10.2139/ssrn.3550879.
  • Baur et al. (2018) Baur, D.G., Hong, K., Lee, A.D., 2018. Bitcoin: Medium of exchange or speculative assets? Journal of International Financial Markets, Institutions and Money 54, 177–189. doi:10.1016/j.intfin.2017.12.004.
  • Biancolini (2017) Biancolini, M.E., 2017. Fast Radial Basis Functions for Engineering Applications. Springer, Cham. doi:10.1007/978-3-319-75011-8.
  • Börner et al. (2021) Börner, C.J., Hoffmann, I., Krettek, J., Kürzinger, L.M., Schmitz, T., 2021. On the return distributions of a basket of cryptocurrencies and subsequent implications. Working paper, 1–34.
  • Brauneis and Mestel (2018) Brauneis, A., Mestel, R., 2018. Price discovery of cryptocurrencies: Bitcoin and beyond. Economics Letters 165, 58–61.
  • Caporale et al. (2018) Caporale, G.M., Gil-Alana, L., Plastun, A., 2018. Persistence in the cryptocurrency market. Research in International Business and Finance 46, 141–148. doi:10.1016/j.ribaf.2018.01.002.
  • Dorfleitner and Lung (2018) Dorfleitner, G., Lung, C., 2018. Cryptocurrencies from the perspective of euro investors: a re-examination of diversification benefits and a new day-of-the-week effect. Journal of Asset Management 19, 472–494. doi:10.1057/s41260-018-0093-8.
  • ElBahrawy et al. (2017) ElBahrawy, A., Alessandretti, L., Kandler, A., Pastor-Satorras, R., Baronchelli, A., 2017. Evolutionary dynamics of the cryptocurrency market. The Royal Society Open Science 4.
  • Fry and Cheah (2016) Fry, J., Cheah, E.T., 2016. Negative bubbles and shocks in cryptocurrency markets. International Review of Financial Analysis 47, 343–352. doi:10.1016/j.irfa.2016.02.008.
  • Gandal et al. (2018) Gandal, N., Hamrick, J.T., Moore, T., Oberman, T., 2018. Price manipulation in the bitcoin ecosystem. Journal of Monetary Economics 95, 86–96.
  • Giorgino (2009) Giorgino, T., 2009. Computing and visualizing dynamic time warping alignments in R: The dtw package. Journal of Statistical Software 31. doi:10.18637/jss.v031.i07.
  • Glas (2019) Glas, T.N., 2019. Investments in cryptocurrencies: Handle with care! The Journal of Alternative Investments 22, 96–113.
  • Hackbusch (2015) Hackbusch, W., 2015. Hierarchical Matrices: Algorithms and Analysis. Volume 49 of Springer Series in Computational Mathematics. Springer, Heidelberg.
  • Hayes (2017) Hayes, A.S., 2017. Cryptocurrency value formation: An empirical study leading to a cost of production model for valuing bitcoin. Telematics & Informatics 34, 1308–1321. doi:10.1016/j.tele.2016.05.005.
  • Liu et al. (2012) Liu, X., Zhu, X.H., Qiu, P., Chen, W., 2012. A correlation-matrix-based hierarchical clustering method for functional connectivity analysis. Journal of Neuroscience Methods 211, 94–102. doi:10.1016/j.jneumeth.2012.08.016.
  • Majoros and Zempléni (2018) Majoros, S., Zempléni, A., 2018. Multivariate stable distributions and their applications for modelling cryptocurrency-returns. Working Paper 2018. URL: http://arxiv.org/pdf/1810.09521v1.
  • Markowitz (1952) Markowitz, H.M., 1952. Portfolio selection. The Journal of Finance 7, 77–91.
  • Methling and von Nitzsch (2019) Methling, F., von Nitzsch, R., 2019. Thematic portfolio optimization: challenging the core satellite approach. Financial Markets and Portfolio Management 33, 133–154. doi:10.1007/s11408-019-00329-0.
  • Nolan (2020) Nolan, J.P., 2020. Univariate Stable Distributions: Models for Heavy Tailed Data. Springer Series in Operations Research and Financial Engineering. 1st ed., Springer, Cham. doi:10.1007/978-3-030-52915-4.
  • Poggio and Girosi (1990) Poggio, T., Girosi, F., 1990. Regularization algorithms for learning that are equivalent to multilayer networks. Science 247, 978–982. doi:10.1126/science.247.4945.978.
  • Powell (1977) Powell, M.J.D., 1977. Restart procedures for the conjugate gradient method. Mathematical Programming 12, 241–254. doi:10.1007/bf01593790.
  • Rivin et al. (2017) Rivin, I., Scevola, C., Davis, R., 2017. Cci30® - the crypto currencies index. URL: https://cci30.com/.
  • Sahin (1997) Sahin, F., 1997. A Radial Basis Function Approach to a Color Image Classification Problem in a Real Time Industrial Application: Thesis. Virginia Tech, Blacksburg, Virginia. URL: http://hdl.handle.net/10919/36847.
  • Sakoe and Chiba (1978) Sakoe, H., Chiba, S., 1978. Dynamic programming algorithm optimization for spoken word recognition. IEEE Transactions on Acoustics, Speech, and Signal Processing 26, 43–49.
  • Schmitz and Hoffmann (2020) Schmitz, T., Hoffmann, I., 2020. Re-evaluating cryptocurrencies’ contribution to portfolio diversification – a portfolio analysis with special focus on german investors. Working Paper URL: http://arxiv.org/pdf/2006.06237v2.
  • Sigaki et al. (2019) Sigaki, H.Y.D., Perc, M., Ribeiro, H.V., 2019. Clustering patterns in efficiency and the coming-of-age of the cryptocurrency market. Scientific reports 9, 1440. doi:10.1038/s41598-018-37773-3.
  • Tikhonov (1943) Tikhonov, A.N., 1943. [translated:] on the stability of inverse problems. Doklady Akademii Nauk SSSR 39, 195–198.
  • Tikhonov (1963) Tikhonov, A.N., 1963. [translated:] solution of incorrectly formulated problems and the regularization method. Doklady Akademii Nauk SSSR, 501–504.
  • Trimborn and Härdle (2018) Trimborn, S., Härdle, W.K., 2018. Crix an index for cryptocurrencies. Journal of Empirical Finance 49, 107–122.
  • Trimborn et al. (2020) Trimborn, S., Li, M., Härdle, W.K., 2020. Investing with cryptocurrencies - a liquidity constrained investments approach. Journal of Financial Econometrics 18, 280–306.