
Two Classes of Optimal Multi-Input Structures for Node Computations in Message Passing Algorithms

Teng Lu, Xuan He, and Xiaohu Tang
Information Coding and Transmission Key Lab of Sichuan Province,
Southwest Jiaotong University, Chengdu 611756, China
Email: [email protected], [email protected], [email protected]
Abstract

In this paper, we delve into the computations performed at a node within a message-passing algorithm. We investigate low-complexity/latency multi-input structures that can be adopted by the node for computing outgoing messages \mathbf{y}=(y_{1},y_{2},\ldots,y_{n}) from incoming messages \mathbf{x}=(x_{1},x_{2},\ldots,x_{n}), where each y_{j}, j=1,2,\ldots,n, is computed via a multi-way tree with leaves \mathbf{x} excluding x_{j}. Specifically, we propose two classes of structures for different scenarios. For the scenario where complexity has a higher priority than latency, star-tree-based structures are proposed. The complexity-optimal ones (as well as their lowest latency) are obtained; they have the near-lowest (and sometimes the lowest) complexity among all structures. For the scenario where latency has a higher priority than complexity, isomorphic-directed-rooted-tree-based structures are proposed. The latency-optimal ones (as well as their lowest complexity) are obtained, and they are proved to have the lowest latency among all structures.

Index Terms:
Multi-input structure, complexity, latency, low-density parity-check (LDPC) code, message passing algorithm.

I Introduction

A large variety of algorithms in the areas of coding, signal processing, and artificial intelligence can be viewed as instances of the message passing algorithm [1, 2]. Specific instances of such algorithms include Kalman filtering and smoothing [3, 4, 5]; the forward-backward algorithm for hidden Markov models [6, 7, 8]; probability propagation in Bayesian networks [9, 10]; and decoding algorithms for error-correcting codes such as low-density parity-check (LDPC) codes [11, 12, 13, 14, 15, 16, 17]. These algorithms operate under a graphical framework, where each node collects incoming messages via its connecting edges, computes outgoing messages, and propagates them back via the same edges.

In particular, we focus on the message passing on a specific node. As shown in Fig. 1, the incoming messages to this node are expressed as a vector \mathbf{x}=(x_{1},x_{2},\ldots,x_{n}), where x_{j} comes from the j-th connecting edge, for each j\in[n]\triangleq\{1,2,\ldots,n\}. Subsequently, the node computes n outgoing messages, represented as \mathbf{y}=(y_{1},y_{2},\ldots,y_{n}), where y_{j}\triangleq f(x_{1},\ldots,x_{j-1},x_{j+1},\ldots,x_{n}) is the outgoing message sent back to the j-th connecting edge and f denotes a general function for computing outgoing messages. Note that when computing y_{j}, the function f does not take x_{j} as an input. This is a well-known computation principle in message passing algorithms, which avoids propagating x_{j}'s message back to the edge it came from.

Figure 1: Message passing on a specific node.

For any j\in[n], the process of computing y_{j} can be abstracted as a directed rooted tree (DRT), whose root is y_{j} and whose leaves are \mathbf{x} excluding x_{j}. In such a DRT, each i-input internal node corresponds to an i-input operator. For example, assume n=7 and y_{j}=f(x_{1},\ldots,x_{j-1},x_{j+1},\ldots,x_{n}), \forall j\in[n]. The computation of \mathbf{y} can be handled by the seven DRTs shown in Fig. 2. The process of computing y_{1} can be represented as y_{1}=f(f(x_{2},x_{3},x_{4}),f(x_{5},x_{6},x_{7})), corresponding to the first DRT in Fig. 2. The computation of any other y_{j} is analogous, and all these DRTs can be handled in parallel. As a result, the complexity of computing \mathbf{y} by these seven DRTs can be measured by fourteen 3-input and seven 2-input nodes, and the latency by one 3-input and one 2-input node.

In fact, we can reuse the computation units corresponding to the same subtree among different DRTs, so that the complexity can be reduced under the same latency. Formally, we use a class of graphs, called structures, to represent this process. For instance, the seven DRTs in Fig. 2 can be united into the structure in Fig. 3, which realizes the same computation of \mathbf{y} but requires lower complexity under the same latency. More specifically, the complexity of the structure can be measured by eight 3-input and seven 2-input nodes. We remark that the structure in Fig. 3 was indeed proposed in [17] for efficiently implementing the node update in stochastic LDPC decoders.

Figure 2: DRTs rooted at \mathbf{y} with n=7.
Figure 3: A structure in [17] for computation of \mathbf{y} with n=7.
Figure 4: A latency-optimal isomorphic-DRT-based structure for computation of \mathbf{y} with n=7.

However, the latency and complexity may vary across different structures. For example, the computing process corresponding to the structure in Fig. 4 can also obtain \mathbf{y} with n=7. The complexity of this structure can be measured by seven 3-input and seven 2-input nodes, and the latency by one 3-input and one 2-input node. Therefore, under the same latency, this structure has a lower complexity (one less 3-input node) than that in Fig. 3. It is always of significant interest to find structures with lower complexity/latency. To this end, we [18] investigated a class of binary-input structures, but the systematic construction of low-complexity and/or low-latency multi-input structures remains open. Here “multi-input” means that there may exist k-input computation nodes for any k\in[2,m]\triangleq\{2,3,\ldots,m\}, where m\geq 2 is the maximum number of inputs for a node and should be regarded as a preset and fixed integer throughout this paper.

In this paper, we focus on low-complexity/latency multi-input structures for computing \mathbf{y} from \mathbf{x}. Specifically, we propose two classes of multi-input structures for different scenarios:

  • For the scenario where complexity has a higher priority than latency, we propose a class of structures, called star-tree-based structures. We derive an algorithm to obtain the complexity-optimal ones of such structures, and illustrate that they have the near-lowest (and sometimes the lowest) complexity among all structures. We further derive an algorithm to obtain the latency-optimal ones among the complexity-optimal star-tree-based structures.

  • For the scenario where latency has a higher priority than complexity, we propose a class of structures, called isomorphic-DRT-based structures. We derive an algorithm to obtain the latency-optimal ones of such structures, and prove that they have the lowest latency among all structures. We further develop a construction for the complexity-optimal ones among the latency-optimal isomorphic-DRT-based structures.

The remainder of this paper is organized as follows. Firstly, Section II presents preliminaries related to graphs and trees. Next, Section III defines the structures under consideration for node computation in this paper. Then, Sections IV and V delve into star-tree-based and isomorphic-DRT-based structures, respectively. Finally, Section VI concludes this paper.

II Preliminaries

In this section, we introduce preliminaries regarding graphs and trees, mainly based on [19]. For convenience, define \mathbb{Z}_{+}\triangleq\{0,1,\ldots\} as the set of non-negative integers. For any i,j\in\mathbb{Z}_{+} with i<j, define [i,j]\triangleq\{i,i+1,\ldots,j\} and [j]\triangleq\{1,2,\ldots,j\}. For any i\in[m-1], let \mathbf{e}_{i} be a vector of length m-1 whose i-th entry is 1 and whose other entries are 0.

II-A Graphs

A graph, denoted as G\triangleq G_{v}\cup G_{e}, consists of a node set G_{v} and an edge set G_{e}. Any element in G_{e} is called a directed (resp. undirected) edge, which is denoted by an ordered (resp. unordered) pair (a,b)\in G_{v}\times G_{v}. The term “ordered” (resp. “unordered”) implies that (a,b)\neq(b,a) (resp. (a,b)=(b,a)). A graph G is a directed (undirected) graph if and only if (iff) each element in G_{e} is a directed (undirected) edge. When graphically represented, directed edges are depicted using arrows, whereas undirected edges are depicted using lines. A graph G^{\prime} is called a subgraph of G if G^{\prime}\subseteq G.

In a directed graph G, an edge (a,b) is referred to as a leaving or outgoing edge of node a, and an entering or incoming edge of node b. In contrast, for an undirected graph G, the edge (a,b) is simply described as an edge of a and b. To describe the operations of removing or adding a node a or an edge (a,b) from/to a graph G, we utilize the notation of subtraction/addition (represented by -/+). Specifically, G-a\triangleq G\setminus(\{a\}\cup\{(a_{1},a_{2})\in G:a_{1}=a~\text{or}~a_{2}=a\}), G+a\triangleq G\cup\{a\}, G-(a,b)\triangleq G\setminus\{(a,b)\}, and G+(a,b)\triangleq G\cup\{a,b\}\cup\{(a,b)\}.

For any graph G and node a\in G, define d(a,G) as the degree of a in G. If G is an undirected graph, d(a,G) is the number of edges connecting with a. If G is a directed graph, d(a,G) is the number of incoming edges of a. For any graph G and i\in\mathbb{Z}_{+}, define d_{i}(G) as the number of degree-i nodes in G. For any graph G, define \mathbf{q}\triangleq(q_{1},q_{2},\ldots,q_{m-1})\in\mathbb{Z}_{+}^{m-1} as the degree vector of G, where for each i\in[m-1], q_{i}=d_{i+2}(G) if G is undirected and q_{i}=d_{i+1}(G) if G is directed.

We call a node sequence P=(v_{0},v_{1},\ldots,v_{k}) a path in G iff (v_{i-1},v_{i})\in G,\forall i\in[k], where k is called the length of P. Moreover, P is called a simple path iff v_{i}\neq v_{j},\forall~0\leq i<j\leq k. For any a,b\in G, we say that a is reachable from b if there exists a path from b to a. The distance from a to b is defined as the length of the shortest path from a to b. If no such path exists, the distance is regarded as \infty. We further call P a cycle if k\geq 2, v_{0}=v_{k}, (v_{0},v_{1})\neq(v_{1},v_{2}), and (v_{1},v_{2},\ldots,v_{k}) is a simple path.

For any directed graph G, we call G a directed acyclic graph (DAG) iff there exists no cycle in G. For any DAG G and a\in G, define E(a,G)\triangleq(\{b\in G:a\text{ is reachable from }b\},\{(b^{\prime},b^{\prime\prime})\in G:a\text{ is reachable from }b^{\prime\prime}\}) as a subgraph of G. Moreover, we define E(G)\triangleq\{E(a,G):a\in G\}. An undirected graph is connected if every node is reachable from all other nodes.
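As a quick illustration (our own, not from [19]), the reachability and distance notions above can be computed by breadth-first search; the sketch below assumes a directed graph stored as an adjacency dict mapping each node to its outgoing neighbours.

```python
from collections import deque

def distance(adj, b, a):
    """Length of the shortest path from b to a in a directed graph,
    matching the convention that a is reachable from b via a path b -> a.
    Returns float('inf') if a is not reachable from b."""
    if b == a:
        return 0
    seen, queue = {b}, deque([(b, 0)])
    while queue:
        v, d = queue.popleft()
        for u in adj.get(v, ()):
            if u == a:
                return d + 1
            if u not in seen:
                seen.add(u)
                queue.append((u, d + 1))
    return float('inf')

# Toy DAG (hypothetical): edges x1 -> u, x2 -> u, u -> y.
adj = {'x1': ['u'], 'x2': ['u'], 'u': ['y']}
```

For this toy DAG, y is reachable from x1 at distance 2, while x1 is not reachable from y.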

II-B Trees

A tree is defined as a connected, acyclic, undirected graph. Given any tree T and any node a\in T, we call a a leaf of T if its degree d(a,T)=1. Otherwise, we call a an internal node.

A rooted tree is a tree in which one node is designated as the root. We consider a rooted tree T, with its root denoted by r(T). The depth of any node a\in T is defined as the distance between a and r(T). Additionally, for a non-negative integer i, the i-th level denotes the set of all nodes with depth i. The height of T is the greatest depth among all its nodes. For any edge (a,b)\in T, if the depth of a is greater than the depth of b, then b is designated as the parent of a, and conversely, a is the child of b.

The directed variant of a rooted tree T, say T^{\prime}, is created by converting each undirected edge (a,b)\in T, where a is a child of b, into a directed edge from a to b in T^{\prime}. We refer to T^{\prime} as a directed rooted tree (DRT), with T being its undirected counterpart. For any node a\in T^{\prime}, E(a,T^{\prime}) is called the subtree of T^{\prime} rooted at a. For any (a,b)\in T, let D(a,b,T)\triangleq E(a,T_{b}), where T_{b} is the DRT obtained by making T a DRT with root b. Let D(T)\triangleq\{D(a,b,T):(a,b)\in T\}.

II-C Labels of Graphs

For any graph G, we call G a labelled graph iff every node in G is given a unique label (as a result, each edge is also given a unique label). Conversely, if not all nodes in G are uniquely labelled, G is termed a partially unlabelled graph.

Two labelled graphs G and G^{\prime} are considered identical, i.e., G=G^{\prime}, iff G and G^{\prime} have the same labelled nodes and edges (and the same root for rooted trees). Meanwhile, two partially unlabelled graphs G and G^{\prime} are deemed identical iff there exists a way to label the unlabelled nodes in both G and G^{\prime} that renders them labelled and identical. Furthermore, graphs G and G^{\prime} are isomorphic if there is a way to reassign labels to all nodes in both graphs, resulting in identically labelled G and G^{\prime}.

III Structures for Node Computation

Recall that \mathbf{x}=(x_{1},x_{2},\ldots,x_{n}) and \mathbf{y}=(y_{1},y_{2},\ldots,y_{n}) represent the incoming and outgoing messages, respectively. In this paper, we focus on the scenario where for each j\in[n], a DRT T_{j} is employed to delineate the computation of y_{j} from \mathbf{x} excluding x_{j}. Specifically, within T_{j}, the leaves correspond to the incoming messages \mathbf{x} excluding x_{j}, the internal nodes correspond to operations whose number of inputs is between 2 and m, and the root r(T_{j}) corresponds to y_{j}. Some examples of such DRTs for n=7 are depicted in Fig. 2.

Define an input set X=\{x_{j}:j\in[n]\} and an output set Y=\{y_{j}:j\in[n]\}, where node x_{j} (resp. y_{j}) corresponds to the j-th incoming message (resp. outgoing message). We remark that in this paper, for any graph G and any node a\in G, a is labelled in G iff a is an input node from X or an output node from Y. Consequently, G is partially unlabelled if G_{v}\setminus(X\cup Y)\neq\emptyset. To describe the computation process, we introduce the structure defined as follows.

Definition 1.

For any DAG S, we say that S is a structure with input size n iff S fulfills the following properties.

  • 1)

    S contains n nodes with no incoming edges, which are exactly X=\{x_{1},x_{2},\ldots,x_{n}\}.

  • 2)

    S contains n nodes with no outgoing edges, which are exactly Y=\{y_{1},y_{2},\ldots,y_{n}\}.

  • 3)

    For any j\in[n], E(y_{j},S) is a DRT whose leaves are exactly X\setminus\{x_{j}\}.

  • 4)

    For any two different nodes a,b\in S, E(a,S)\neq E(b,S).

  • 5)

    For any node a\in S\setminus X, we have 2\leq d(a,S)\leq m.

We remark that Definition 1 corresponds to [18, Definition 2] in the special case where m=2. For any n\geq 2, define \mathcal{S}_{n} as the set of all structures with input size n. We may sometimes use subsets of a structure, referred to as substructures, which are defined below.

Definition 2.

A DAG S^{\prime} is called a substructure iff there exists a structure S satisfying S^{\prime}\subseteq S. For any two substructures S_{1} and S_{2}, denote the union of S_{1} and S_{2} by S_{1}\vee S_{2}, where only one copy of the same subtrees is kept, corresponding to the fourth property of Definition 1.

According to Definition 2, for any two substructures S_{1} and S_{2}, S_{1}\vee S_{2} is still a substructure if S_{1}\cap S_{2}\cap Y=\emptyset. Moreover, for any structure S\in\mathcal{S}_{n}, E(y_{j},S),\forall j\in[n] is a substructure of S, and S=\vee_{j\in[n]}E(y_{j},S) always holds. As a more specific example, the seven DRTs depicted in Fig. 2 are substructures whose union (under the operation \vee) results in the structure in Fig. 3.

In practice, each degree-i node corresponds to an i-input computation unit, whose complexity and latency should increase as i increases. We therefore define the complexity factor and latency factor to describe this fact. Specifically, for each i\in[0,m], denote c_{i} and l_{i} respectively as the complexity factor and latency factor of an i-input computation node. The complexity and latency factors are determined by the actual hardware. In particular, c_{0},c_{1},l_{0}, and l_{1} are set to 0, and it is reasonable to assume c_{i}\leq c_{j} and l_{i}\leq l_{j} for 0\leq i<j\leq m. Accordingly, we can define the complexity and latency of a substructure (as well as a structure) as follows.

Definition 3.

For any substructure SS, the complexity of SS, denoted by c(S)c(S), is defined as the weighted number of nodes in SS:

c(S)\triangleq\sum_{i=2}^{m}d_{i}(S)c_{i}.
Definition 4.

For any substructure SS, the latency of SS is defined as

l(S)\triangleq\max_{P\subseteq S}l(P,S),

where P=(v_{0},v_{1},\ldots,v_{k}) is a path in S and l(P,S)\triangleq\sum_{i=0}^{k}l_{d(v_{i},S)} is the latency of P.

As an example, the complexity and latency of the structure in Fig. 4 are 7c_{2}+7c_{3} and l_{2}+l_{3}, respectively. It is reasonable to use complexity and latency as two key criteria for evaluating the performance of a structure. In this paper, our main purpose is to discover the structures with low complexity and/or latency.
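As an illustration (our own, not prescribed by the paper), Definitions 3 and 4 can be evaluated mechanically on any structure given as a DAG. The sketch below assumes structures are stored as maps from each node to its list of predecessors (incoming edges), and uses illustrative factor values for c_{i} and l_{i}.

```python
from functools import lru_cache

def complexity_latency(preds, c, l):
    """preds: dict mapping each node to its list of predecessors.
    c, l: lists where c[i], l[i] are the factors of an i-input node
    (c[0] = c[1] = l[0] = l[1] = 0, as in the paper).
    Returns (c(S), l(S)) per Definitions 3 and 4."""
    @lru_cache(maxsize=None)
    def lat(v):
        # Latency of the worst path ending at v: v's own factor plus
        # the largest latency among its predecessors.
        d = len(preds[v])
        return l[d] + max((lat(u) for u in preds[v]), default=0)
    comp = sum(c[len(p)] for p in preds.values())  # sum of c_{indegree}
    return comp, max(lat(v) for v in preds)

# Toy example (hypothetical): y = f(f(x1, x2), x3) as a single DRT.
preds = {'x1': [], 'x2': [], 'x3': [], 'u': ['x1', 'x2'], 'y': ['u', 'x3']}
c = [0, 0, 1, 2]  # illustrative: c2 = 1, c3 = 2
l = [0, 0, 1, 2]  # illustrative: l2 = 1, l3 = 2
```

Here the toy DRT has two 2-input nodes, so c(S) = 2c_2 and l(S) = 2l_2 under these assumed factors.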

IV Star-Tree-Based Structures

In this section, we construct structures for the scenario where complexity has a higher priority than latency. To this end, we focus on a class of structures, called star-tree-based structures, which probably have the (near-)lowest complexity among all structures. Specifically, Section IV-A proposes a class of trees, called star trees, and further presents a one-to-one mapping between star trees and star-tree-based structures. Next, Section IV-B establishes a necessary and sufficient condition between degree vectors and the existence of star trees. Thus, the complexity of a star-tree-based structure is completely determined by the degree vector of the corresponding star tree. Therefore, Theorem 1 is proposed to find the degree vectors that lead to the complexity-optimal star-tree-based structures. Finally, Section IV-C derives Theorem 2 to find a latency-optimal star-tree-based structure S corresponding to a given degree vector \mathbf{q}. As a result, if \mathbf{q} is the uniquely optimal degree vector obtained from Theorem 1, then S is a complexity-optimal star-tree-based structure which also has the lowest latency among all complexity-optimal star-tree-based structures.

IV-A Star Trees and Star-Tree-Based Structures

In this subsection, we propose a class of trees, called star trees, in Definition 5. Next, Construction 1 and Lemma 1 illustrate that, based on any star tree T, we can obtain a low-complexity structure, denoted as h(T), which is called a star-tree-based structure.

Definition 5 (star trees).

An n-input star tree T is an (undirected) tree fulfilling the following properties.

  • T has n leaves, which are exactly X.

  • The degree of each internal node in T is between 3 and m+1.

Figure 5: The only star tree T\in\mathcal{T}_{3}.
Figure 6: An example of the transformation from a star tree to DRTs.

Denote \mathcal{T}_{n} as the set of all n-input star trees. For example, Fig. 5 shows the only star tree in \mathcal{T}_{3}, and Fig. 6(a) shows a star tree in \mathcal{T}_{7}. In fact, we can obtain a structure based on any star tree. For instance, consider the star tree in Fig. 6(a). First, making x_{1} the root and changing undirected edges to directed ones leads to the DRT in Fig. 6(b). Second, removing x_{1} and labelling the new root as y_{1} results in the DRT in Fig. 6(c), which can be used for computing y_{1} from X\setminus\{x_{1}\}. Third, repeating the above steps with x_{1} and y_{1} replaced by x_{j} and y_{j}, \forall j\in[2,7], respectively, leads to the seven DRTs in Fig. 7. Finally, these DRTs can be united (under the union operation \vee) into the structure in Fig. 8. We formally describe the construction from star trees to structures in Construction 1, and use Lemma 1 to state its correctness.

Figure 7: The DRTs transformed from the star tree in Fig. 6(a).
Construction 1.

Given a star tree T𝒯nT\in\mathcal{T}_{n}, return

h(T)\triangleq\vee_{j\in[n],(a,x_{j})\in T}D(a,x_{j},T)

as the constructed star-tree-based structure corresponding to T, where D(a,x_{j},T) is a DRT defined in Section II-B.

All the structures derived from star trees via Construction 1 are called star-tree-based structures. For example, considering n=7 and the star tree in Fig. 6(a), the DRTs D(a,x_{j},T),\forall j\in[n] correspond to the seven DRTs in Fig. 7, and the star-tree-based structure h(T) corresponds to the structure in Fig. 8.

Lemma 1.

For any n\geq 3 and T\in\mathcal{T}_{n}, we have h(T)\in\mathcal{S}_{n}.

Proof:

For any n\geq 3, T\in\mathcal{T}_{n}, and j\in[n], x_{j} is a leaf in T. Let (a,x_{j})\in T be the only edge of x_{j}. Then D(a,x_{j},T) is a DRT with root y_{j} and leaves X\setminus\{x_{j}\}. As a result, we have h(T)\in\mathcal{S}_{n}. ∎

Figure 8: A complexity-optimal star-tree-based structure for computation of \mathbf{y} with n=7.

According to Lemma 1, Construction 1 is a one-to-one mapping from star trees to star-tree-based structures. As shown in Fig. 8, for any j\in[7], each computation node either can reach y_{j} or is reachable from x_{j}. Noting that in any structure, a node reachable from x_{j} can never be contained in the DRT rooted at y_{j}, the computation nodes in Fig. 8 are contained in as many DRTs in Fig. 7 as possible, indicating that they are fully reused. In particular, it was proved in [18] that for m=2, the star-tree-based structures have the lowest complexity among all binary-input structures. This implies that for m>2, star-tree-based structures are likely to have the near-lowest (and sometimes the lowest) complexity among all structures. We thus focus on star-tree-based structures in this section.

IV-B The Lowest Complexity of Star-Tree-Based Structures

In this subsection, first, Lemma 2 illustrates that the complexity of a star-tree-based structure is totally determined by the degree vector of the corresponding star tree. Next, Lemma 3 establishes a necessary and sufficient condition between degree vectors and the existence of star trees. Based on these preparations, we finally formulate Problem 1 and then solve it in Theorem 1, aiming to obtain complexity-optimal star-tree-based structures via optimizing degree vectors.

Lemma 2.

For any T\in\mathcal{T}_{n}, the complexity of h(T) is

c(h(T))=\sum_{i=3}^{m+1}id_{i}(T)c_{i-1}, (1)

where d_{i}(T) denotes the number of degree-i nodes in T and c_{i-1} is the complexity factor of (i-1)-input nodes.

Proof:

It holds that E(h(T))=\cup_{j\in[n],(a,x_{j})\in T}E(D(a,x_{j},T))=D(T), where the notations can be found in Section II. As a result, we have

c(h(T)) = \sum_{a\in h(T)}c_{d(a,h(T))} = \sum_{T^{\prime}\in E(h(T))}c_{d(r(T^{\prime}),T^{\prime})} = \sum_{T^{\prime}\in D(T)}c_{d(r(T^{\prime}),T^{\prime})} = \sum_{(a,b)\in T}c_{d(a,D(a,b,T))} = \sum_{(a,b)\in T}c_{d(a,T)-1} = \sum_{i=3}^{m+1}id_{i}(T)c_{i-1}. ∎

Recall that the degree vector of T is \mathbf{q}=(d_{3}(T),d_{4}(T),\ldots,d_{m+1}(T)). Therefore, Lemma 2 indicates that c(h(T)) is fully determined by \mathbf{q}. We are interested in optimizing degree vectors to generate complexity-optimal star-tree-based structures. Since Construction 1 establishes a one-to-one mapping between star trees and star-tree-based structures, the remaining question is what degree vectors can result in valid n-input star trees. The following lemma answers this question.

Lemma 3.

For any n\geq 3, there exists a star tree T\in\mathcal{T}_{n} with degree vector \mathbf{q}=(q_{1},q_{2},\ldots,q_{m-1}) iff

2+\sum_{i=1}^{m-1}iq_{i}=n. (2)
Proof:

Necessity: Let T\in\mathcal{T}_{n}. We prove that the degree vector \mathbf{q} of T satisfies (2). Firstly, for n=3, there exists only one star tree, as shown in Fig. 5, and (2) holds obviously. Secondly, for n^{\prime}\geq 4, assume (2) holds for any n\in[3,n^{\prime}-1]. We now prove that (2) holds for n=n^{\prime}. If T contains only one internal node, (2) holds obviously. If T contains at least two internal nodes, there always exists an internal node a that connects with one internal node and d(a,T)-1 leaves. Without loss of generality, denote the d(a,T)-1 leaves connecting with a as x_{n^{\prime}},x_{n^{\prime}-1},\ldots,x_{n^{\prime}-d(a,T)+2}, respectively. Then, let T^{\prime}=T-x_{n^{\prime}}-x_{n^{\prime}-1}-\cdots-x_{n^{\prime}-d(a,T)+2}, and relabel a as x_{n^{\prime}-d(a,T)+2}. We can verify that T^{\prime}\in\mathcal{T}_{n^{\prime}-d(a,T)+2}, whose degree vector is \mathbf{q}-\mathbf{e}_{d(a,T)-2}. Since (2) holds for n=n^{\prime}-d(a,T)+2, we have 2+(\sum_{i=1}^{m-1}iq_{i})-(d(a,T)-2)=n^{\prime}-(d(a,T)-2), leading to 2+\sum_{i=1}^{m-1}iq_{i}=n^{\prime}, i.e., (2) holds for n=n^{\prime}. Therefore, the necessity is proved.

Sufficiency: Suppose the degree vector \mathbf{q} satisfies (2). We provide Construction 2 to prove that there always exists a star tree T\in\mathcal{T}_{n} with degree vector \mathbf{q}.

Construction 2.

For a given degree vector \mathbf{q}, we construct a star tree by the following steps.

Step 1: Select an i\in[m-1] with q_{i}>0. Construct an undirected tree T which contains one degree-(i+2) internal node and i+2 leaves. Decrease q_{i} by 1.

Step 2: If \mathbf{q}=\mathbf{0}, go to Step 3. Otherwise, select an i\in[m-1] with q_{i}>0 and select an arbitrary leaf b\in T. Connect i+1 new leaves to b, so that b becomes a degree-(i+2) internal node. We still use T to denote the new tree and decrease q_{i} by 1. Repeat Step 2.

Step 3: Label the leaves in T as x_{1},\ldots,x_{n} and keep the internal nodes unlabelled. Return T as the constructed star tree.
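The steps above can be sketched in code (our own transcription; trees are assumed to be stored as adjacency dicts with fresh integer ids for nodes, and the leaf labelling of Step 3 is left implicit). The final leaf count also lets one check the condition (2) of Lemma 3 numerically.

```python
def build_star_tree(q):
    """Construction 2 sketch. q: dict mapping i -> q_i, the number of
    internal nodes of degree i+2. Returns (adjacency dict, list of leaves)."""
    q = dict(q)
    counter = [0]
    def new_node():
        counter[0] += 1
        return counter[0]
    adj = {}
    def add_edge(a, b):
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    leaves = []
    # Step 1: one internal node of degree i+2 together with i+2 leaves.
    i = next(k for k, v in q.items() if v > 0)
    root = new_node()
    for _ in range(i + 2):
        leaf = new_node()
        add_edge(root, leaf)
        leaves.append(leaf)
    q[i] -= 1
    # Step 2: while q is nonzero, turn an arbitrary leaf b into a
    # degree-(i+2) internal node by attaching i+1 new leaves.
    while any(v > 0 for v in q.values()):
        i = next(k for k, v in q.items() if v > 0)
        b = leaves.pop()
        for _ in range(i + 1):
            leaf = new_node()
            add_edge(b, leaf)
            leaves.append(leaf)
        q[i] -= 1
    return adj, leaves

# Degree vector (q1, q2) = (1, 2) for m = 3: by (2), n = 2 + 1*1 + 2*2 = 7.
adj, leaves = build_star_tree({1: 1, 2: 2})
```

Running this yields a tree with 7 leaves, one degree-3 internal node, and two degree-4 internal nodes, matching the chosen degree vector.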

For any n\geq 3, there exists at least one degree vector \mathbf{q}=(q_{1},q_{2},\ldots,q_{m-1}) satisfying (2), from which a star tree can be generated via Construction 2. Note that Construction 2 may result in multiple star trees, since there are multiple feasible choices of q_{i} (1\leq i<m) from \mathbf{q} in Steps 1 and 2. Combined with Lemma 1, \mathbf{q} thus corresponds to multiple star-tree-based structures. Denote \mathcal{S}^{\mathrm{star}}(\mathbf{q}) as the set of all the star-tree-based structures derived from \mathbf{q}. Since the complexity of any structure in \mathcal{S}^{\mathrm{star}}(\mathbf{q}) is the same by Lemma 2, it suffices to optimize degree vectors to generate complexity-optimal star-tree-based structures. Specifically, define C_{\min}^{\mathrm{star}}(n)\triangleq\min_{T\in\mathcal{T}_{n}}c(h(T)) as the minimum complexity of star-tree-based structures with input size n. Computing C_{\min}^{\mathrm{star}}(n) is equivalent to solving the following optimization problem.

Problem 1: C_{\min}^{\mathrm{star}}(n)=\min_{\mathbf{q}}\sum_{i=1}^{m-1}(i+2)q_{i}c_{i+1}
s.t. 2+\sum_{i=1}^{m-1}iq_{i}=n.

It remains to solve Problem 1, which can be done by Algorithm 1. The following theorem establishes the correctness and time complexity of Algorithm 1.

Algorithm 1 Finding an optimal solution to Problem 1
Input: The input size n.
Output: C_{\min}^{\mathrm{star}}(n).
1:  Initialize C_{\min}^{\mathrm{star}}(2)=0.
2:  for i=3 to n do
3:     C_{\min}^{\mathrm{star}}(i)=\min_{t\in[m-1],t\leq i-2}\{C_{\min}^{\mathrm{star}}(i-t)+(t+2)c_{t+1}\}.
4:  end for
5:  return C_{\min}^{\mathrm{star}}(n).
Theorem 1.

C_{\min}^{\mathrm{star}}(n) can be computed by Algorithm 1 with time complexity O(mn).

Proof:

In Algorithm 1, as shown in Line 1, we have C_{\min}^{\mathrm{star}}(2)=0, since y_{1}=x_{2} and y_{2}=x_{1} for n=2, and no computation nodes are required. Then, for any i\in[3,n], assume \mathbf{q} is an optimal solution to Problem 1 for C_{\min}^{\mathrm{star}}(i). For any t\in[m-1] with q_{t}>0, let \mathbf{q}^{\prime}=\mathbf{q}-\mathbf{e}_{t}, where \mathbf{e}_{t} is a vector of length m-1 whose t-th entry is 1 and whose other entries are 0. Then \mathbf{q}^{\prime} is an optimal solution for C_{\min}^{\mathrm{star}}(i-t), leading to C_{\min}^{\mathrm{star}}(i)=C_{\min}^{\mathrm{star}}(i-t)+(t+2)c_{t+1}. Therefore, C_{\min}^{\mathrm{star}}(i)=\min_{t\in[m-1],t\leq i-2}\{C_{\min}^{\mathrm{star}}(i-t)+(t+2)c_{t+1}\} holds, corresponding to Line 3 of Algorithm 1.

Noting that Line 3 in Algorithm 1 has time complexity O(m) and is carried out O(n) times, the time complexity of Algorithm 1 is O(mn). ∎

By Algorithm 1, we are able to determine the lowest complexity C_{\min}^{\mathrm{star}}(n) of star-tree-based structures with input size n. Further, by backtracking, we can obtain one (or all, if needed) optimal degree vector \mathbf{q}, which leads to the complexity-optimal star-tree-based structures \mathcal{S}^{\mathrm{star}}(\mathbf{q}). However, the structures in \mathcal{S}^{\mathrm{star}}(\mathbf{q}) may have different latency. We are interested in further obtaining the latency-optimal structures in \mathcal{S}^{\mathrm{star}}(\mathbf{q}) in the following subsection.
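As an illustration of Algorithm 1 together with the backtracking step just mentioned, the following sketch (our own; the factor values c_2 = 1, c_3 = 2 are illustrative assumptions, not from the paper) computes C_{\min}^{\mathrm{star}}(n) and one optimal degree vector.

```python
def min_star_complexity(n, m, c):
    """Algorithm 1 with backtracking. c: list with c[i] the complexity
    factor of an i-input node (c[0] = c[1] = 0).
    Returns (C_min^star(n), one optimal degree vector q of length m-1)."""
    INF = float('inf')
    C = [INF] * (n + 1)
    choice = [0] * (n + 1)  # the t minimizing Line 3 at each i
    C[2] = 0                # Line 1: no computation nodes needed for n = 2
    for i in range(3, n + 1):           # Line 2
        for t in range(1, m):           # t in [m-1]
            if t <= i - 2 and C[i - t] + (t + 2) * c[t + 1] < C[i]:
                C[i] = C[i - t] + (t + 2) * c[t + 1]   # Line 3
                choice[i] = t
    # Backtrack one optimal degree vector (q_1, ..., q_{m-1}).
    q, i = [0] * (m - 1), n
    while i > 2:
        t = choice[i]
        q[t - 1] += 1
        i -= t
    return C[n], q

# Example: n = 7, m = 3, illustrative factors c2 = 1, c3 = 2.
result = min_star_complexity(7, 3, [0, 0, 1, 2])
```

Under these assumed factors, each t = 1 step costs 3c_2 = 3 per unit of n while t = 2 costs 4c_3 = 8 per two units, so the optimum for n = 7 uses five degree-3 internal nodes.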

IV-C The Lowest Latency of Star-Tree-Based Structures for Given Degree Vectors

In this subsection, first, Lemma 4 defines the latency of star trees and shows that it fully characterizes the latency of the corresponding star-tree-based structures. Then, Lemma 5 illustrates how to compute the latency of star trees. Based on these preparations, we finally formulate Problem 2 and then solve it in Theorem 2, aiming to obtain the latency-optimal structures in \mathcal{S}^{\mathrm{star}}(\mathbf{q}) via minimizing the latency of star trees corresponding to a given degree vector \mathbf{q}.

Lemma 4.

For any star tree TT, define

ϕ(T)max{i=0kld(vi,T)1:(v0,v1,,vk) is a simple path in T}\phi(T)\triangleq\max\left\{\sum_{i=0}^{k}l_{d(v_{i},T)-1}:(v_{0},v_{1},\ldots,v_{k})\text{ is a simple path in }T\right\}

as the latency of TT. Then, l(h(T))=ϕ(T)l(h(T))=\phi(T).

Proof:

Assume the star tree T𝒯nT\in\mathcal{T}_{n}. Obviously, there exists a simple path (v0,v1,,vk)T(v_{0},v_{1},\ldots,v_{k})\subseteq T iff there exists a path (v0,v1,,vk1)h(T)(v^{\prime}_{0},v^{\prime}_{1},\ldots,v^{\prime}_{k-1})\subseteq h(T), where d(vi,h(T))=d(vi,T)1d(v^{\prime}_{i},h(T))=d(v_{i},T)-1 for each i[0,k1]i\in[0,k-1]. As a result, we have l(h(T))=max(v0,v1,,vk1)h(T){i=0k1ld(vi,h(T))}=max(v0,v1,,vk)T{i=0kld(vi,T)1}=ϕ(T)l(h(T))=\max_{(v^{\prime}_{0},v^{\prime}_{1},\ldots,v^{\prime}_{k-1})\subseteq h(T)}\{\sum_{i=0}^{k-1}l_{d(v_{i},h(T))}\}=\max_{(v_{0},v_{1},\ldots,v_{k})\subseteq T}\{\sum_{i=0}^{k}l_{d(v_{i},T)-1}\}=\phi(T). ∎

As shown in Lemma 4, minimizing the latency of structures derived from the star trees with a given degree vector is equivalent to minimizing the latency of the star trees. The following lemma shows how to compute the latency of a star tree.

Lemma 5.

For any star tree TT with n3n\geq 3, there always exists an edge (a,b)T(a,b)\in T such that

{l(D(a,b,T))ld(a,T)1l(D(b,a,T)),l(D(a,b,T))>l(D(b,a,T)).\displaystyle\left\{\begin{array}[]{l}l(D(a,b,T))-l_{d(a,T)-1}\leq l(D(b,a,T)),\\ l(D(a,b,T))>l(D(b,a,T)).\end{array}\right. (5)

If (a,b)(a,b) satisfies (5), we have

ϕ(T)=l(D(a,b,T))+l(D(b,a,T)).\phi(T)=l(D(a,b,T))+l(D(b,a,T)). (6)
Proof:

Suppose (v0,v1,,vk)(v_{0},v_{1},\ldots,v_{k}) is a simple path in TT such that i=0kld(vi,T)1=ϕ(T)\sum_{i=0}^{k}l_{d(v_{i},T)-1}=\phi(T). There always exists an i[k]i\in[k] such that (5) holds for a=vi1a=v_{i-1} and b=vib=v_{i}. This completes the proof of (5).

We are now to prove (6). Suppose (a,b)T(a,b)\in T is an arbitrary edge satisfying (5). According to Definition 4, l(D(a,b,T))l(D(a,b,T)) (resp. l(D(b,a,T))l(D(b,a,T))) denotes the latency of the path in D(a,b,T)D(a,b,T) (resp. D(b,a,T)D(b,a,T)) from a leaf to aa (resp. bb), and we let this leaf be xix_{i} (resp. xjx_{j}), without loss of generality. Then, we have l(D(a,b,T))+l(D(b,a,T))=i=0kld(vi,T)1l(D(a,b,T))+l(D(b,a,T))=\sum_{i=0}^{k}l_{d(v_{i},T)-1}, where (v0,v1,,vk)(v_{0},v_{1},\ldots,v_{k}) is a simple path in TT with v0=xiv_{0}=x_{i} and vk=xjv_{k}=x_{j}. Therefore, we have ϕ(T)l(D(a,b,T))+l(D(b,a,T))\phi(T)\geq l(D(a,b,T))+l(D(b,a,T)).

To prove (6), it now suffices to prove ϕ(T)l(D(a,b,T))+l(D(b,a,T))\phi(T)\leq l(D(a,b,T))+l(D(b,a,T)). Let xix_{i}, xjx_{j} be two arbitrary leaves in TT such that there exists a simple path connecting them whose latency is ϕ(T)\phi(T). If (xiD(a,b,T)(x_{i}\in D(a,b,T) and xjD(b,a,T))x_{j}\in D(b,a,T)) or (xjD(a,b,T)(x_{j}\in D(a,b,T) and xiD(b,a,T))x_{i}\in D(b,a,T)), we have ϕ(T)l(D(a,b,T))+l(D(b,a,T))\phi(T)\leq l(D(a,b,T))+l(D(b,a,T)). If xi,xjD(a,b,T)x_{i},x_{j}\in D(a,b,T), we have ϕ(T)2l(D(a,b,T))ld(a,T)1l(D(a,b,T))+l(D(b,a,T))\phi(T)\leq 2l(D(a,b,T))-l_{d(a,T)-1}\leq l(D(a,b,T))+l(D(b,a,T)), where the last inequality follows from the first condition in (5). If xi,xjD(b,a,T)x_{i},x_{j}\in D(b,a,T), we have ϕ(T)2l(D(b,a,T))ld(b,T)1<l(D(a,b,T))+l(D(b,a,T))\phi(T)\leq 2l(D(b,a,T))-l_{d(b,T)-1}<l(D(a,b,T))+l(D(b,a,T)), where the last inequality follows from the second condition in (5). Therefore, ϕ(T)l(D(a,b,T))+l(D(b,a,T))\phi(T)\leq l(D(a,b,T))+l(D(b,a,T)). ∎

Lemma 5 indicates that the minimum latency of star trees with degree vector 𝐪\mathbf{q} is equal to the minimum value of l(D1)+l(D2)l(D_{1})+l(D_{2}), where D1D_{1} and D2D_{2} are two DRTs (respectively corresponding to D(a,b,T)D(a,b,T) and D(b,a,T)D(b,a,T) in Lemma 5) satisfying the following three conditions: (i) l(D1)ld(r(D1),D1)l(D2)l(D_{1})-l_{d(r(D_{1}),D_{1})}\leq l(D_{2}), (ii) l(D1)>l(D2)l(D_{1})>l(D_{2}), and (iii) di+1(D1)+di+1(D2)=qi,i[m1]d_{i+1}(D_{1})+d_{i+1}(D_{2})=q_{i},\forall i\in[m-1], i.e., the summation of D1D_{1} and D2D_{2}’s degree vectors equals 𝐪\mathbf{q}. As a result, computing Lminstar(𝐪)minS𝒮star(𝐪)l(S)L_{\min}^{\mathrm{star}}(\mathbf{q})\triangleq\min_{S\in\mathcal{S}^{\mathrm{star}}(\mathbf{q})}l(S), the minimum latency of the star-tree-based structures derived from degree vector 𝐪\mathbf{q}, is equivalent to solving the following optimization problem.

Problem 2: Lminstar(𝐪)=minD1,D2l(D1)+l(D2)\displaystyle L_{\min}^{\mathrm{star}}(\mathbf{q})=\min\limits_{D_{1},D_{2}}\,l(D_{1})+l(D_{2})
s.t.\displaystyle\,s.t.\quad l(D1)ld(r(D1),D1)l(D2),\displaystyle l(D_{1})-l_{d(r(D_{1}),D_{1})}\leq l(D_{2}), (7)
l(D1)>l(D2),\displaystyle l(D_{1})>l(D_{2}), (8)
di+1(D1)+di+1(D2)=qi,i[m1].\displaystyle d_{i+1}(D_{1})+d_{i+1}(D_{2})=q_{i},\forall i\in[m-1]. (9)

To this end, for any degree vector 𝐮\mathbf{u} and positive integer tt, define τ(𝐮,t)=minD1,,Dt\tau(\mathbf{u},t)=\min_{D^{\prime}_{1},...,D^{\prime}_{t}} max{l(D1),,l(Dt)}\max\{l(D^{\prime}_{1}),...,l(D^{\prime}_{t})\}, where D1,,DtD^{\prime}_{1},...,D^{\prime}_{t} are DRTs and the summation of their degree vectors equals 𝐮\mathbf{u}. Here, τ(𝐮,t)\tau(\mathbf{u},t) returns the lowest latency of the substructures that have the degree vector 𝐮\mathbf{u} and consist of tt disjoint DRTs. It will be computed later in Algorithm 3. For any two degree vectors 𝐮\mathbf{u}, 𝐮+m1\mathbf{u}^{\prime}\in\mathbb{Z}_{+}^{m-1}, we say 𝐮𝐮\mathbf{u}\preceq\mathbf{u}^{\prime} iff for each i[m1]i\in[m-1], uiuiu_{i}\leq u_{i}^{\prime}. Then, we can solve Problem 2 by Algorithm 2.

Algorithm 2 Finding optimal solution to Problem 2
Input:  𝐪\mathbf{q}.
Output:  Lminstar(𝐪)L_{\min}^{\mathrm{star}}(\mathbf{q}).
1:  Initialize Lminstar(𝐪)=L_{\min}^{\mathrm{star}}(\mathbf{q})=\infty. %𝐮\mathbf{u} and 𝐪𝐮\mathbf{q}-\mathbf{u} are the degree vectors of D1D_{1} and D2D_{2}, respectively
2:  for each 𝐮𝐪\mathbf{u}\preceq\mathbf{q} do
3:     %The degree of D1D_{1}’s root is d(r(D1),D1)=i+1d(r(D_{1}),D_{1})=i+1
4:     for i=1i=1 to m1m-1 do
5:        if ui1u_{i}\geq 1 then
6:           %As 𝐮\mathbf{u} and ii are given, the minimum values of l(D1)l(D_{1}) and l(D2)l(D_{2}) are τ(𝐮𝐞i,i+1)+li+1\tau(\mathbf{u}-\mathbf{e}_{i},i+1)+l_{i+1} and τ(𝐪𝐮,1)\tau(\mathbf{q}-\mathbf{u},1), respectively
7:           if τ(𝐮𝐞i,i+1)τ(𝐪𝐮,1)\tau(\mathbf{u}-\mathbf{e}_{i},i+1)\leq\tau(\mathbf{q}-\mathbf{u},1) (i.e., Eq. (7)) and τ(𝐮𝐞i,i+1)+li+1>τ(𝐪𝐮,1)\tau(\mathbf{u}-\mathbf{e}_{i},i+1)+l_{i+1}>\tau(\mathbf{q}-\mathbf{u},1) (i.e., Eq. (8)) then
8:              l=τ(𝐮𝐞i,i+1)+li+1+τ(𝐪𝐮,1)l=\tau(\mathbf{u}-\mathbf{e}_{i},i+1)+l_{i+1}+\tau(\mathbf{q}-\mathbf{u},1). %The minimum value of l(D1)+l(D2)l(D_{1})+l(D_{2}) for given 𝐮\mathbf{u} and ii
9:              Lminstar(𝐪)=min{Lminstar(𝐪),l}L_{\min}^{\mathrm{star}}(\mathbf{q})=\min\{L_{\min}^{\mathrm{star}}(\mathbf{q}),l\}.
10:           end if
11:        end if
12:     end for
13:  end for
14:  return  Lminstar(𝐪)L_{\min}^{\mathrm{star}}(\mathbf{q}).
Theorem 2.

Given a degree vector 𝐪\mathbf{q} and τ(𝐮,t)\tau(\mathbf{u},t), 𝐮𝐪,t[m]\forall\mathbf{u}\preceq\mathbf{q},t\in[m], Algorithm 2 can compute Lminstar(𝐪)L_{\min}^{\mathrm{star}}(\mathbf{q}) with time complexity O(mi=1m1(qi+1))O(m\prod_{i=1}^{m-1}(q_{i}+1)).

Proof:

Let 𝐮\mathbf{u} be the degree vector of D1D_{1} and i+1i+1 be the degree of D1D_{1}’s root. According to constraint (9), the degree vector of D2D_{2} must be 𝐪𝐮\mathbf{q}-\mathbf{u}. Therefore, as shown in Lines 2–13, by traversing 𝐮𝐪\mathbf{u}\preceq\mathbf{q} and ii from 1 to m1m-1, we can enumerate all the feasible solutions and find the one with the lowest l(D1)+l(D2)l(D_{1})+l(D_{2}). Specifically, given 𝐮\mathbf{u} and ii, the minimum latencies of D1D_{1} and D2D_{2} are τ(𝐮𝐞i,i+1)+li+1\tau(\mathbf{u}-\mathbf{e}_{i},i+1)+l_{i+1} and τ(𝐪𝐮,1)\tau(\mathbf{q}-\mathbf{u},1) respectively, and the conditions in Line 7 correspond to the constraints (7) and (8).

In Algorithm 2, Line 8 is carried out O(mi=1m1(qi+1))O(m\prod_{i=1}^{m-1}(q_{i}+1)) times, each of which has time complexity O(1)O(1). Therefore, the time complexity of Algorithm 2 is O(mi=1m1(qi+1))O(m\prod_{i=1}^{m-1}(q_{i}+1)). ∎

At this point, the remaining problem is to compute τ(𝐮,t)\tau(\mathbf{u},t), 𝐮𝐪,t[m]\forall\mathbf{u}\preceq\mathbf{q},t\in[m]. We complete this task in Algorithm 3.

Algorithm 3 Computation of τ(𝐮,t),𝐮𝐪,t[m].\tau(\mathbf{u},t),\forall\mathbf{u}\preceq\mathbf{q},t\in[m].
Input:  𝐪\mathbf{q}.
Output:  τ(𝐮,t),𝐮𝐪,t[m].\tau(\mathbf{u},t),\forall\mathbf{u}\preceq\mathbf{q},t\in[m].
1:  for t=1t=1 to mm do
2:     τ(𝟎,t)=0\tau(\mathbf{0},t)=0.
3:  end for
4:  for each 𝐮𝐪\mathbf{u}\preceq\mathbf{q}, in non-decreasing order of 𝐮1||\mathbf{u}||_{1},  do
5:     for t=1t=1 to mm do
6:        %Suppose GG is an arbitrary substructure that has degree vector 𝐮\mathbf{u} and consists of tt disjoint DRTs
7:        if t==1t==1 then
8:           % GG is a DRT. Fixing the degree of its root as d(r(G),G)=i+1d(r(G),G)=i+1, the graph Gr(G)G-r(G) has degree vector 𝐮𝐞i\mathbf{u}-\mathbf{e}_{i} and consists of i+1i+1 disjoint DRTs
9:           τ(𝐮,t)=mini[m1],ui>0{τ(𝐮𝐞i,i+1)+li+1}\tau(\mathbf{u},t)=\min_{i\in[m-1],u_{i}>0}\{\tau(\mathbf{u}-\mathbf{e}_{i},i+1)+l_{i+1}\}.
10:        else
11:           % GG consists of tt disjoint DRTs, named D1,D2,,DtD_{1},D_{2},\ldots,D_{t}. Fixing the degree vector of D1D_{1} as 𝐮\mathbf{u}^{\prime}, the graph GD1G\setminus D_{1} has degree vector 𝐮𝐮\mathbf{u}-\mathbf{u}^{\prime} and consists of t1t-1 disjoint DRTs
12:           τ(𝐮,t)=min𝐮𝐮max{τ(𝐮,1),τ(𝐮𝐮,t1)}\tau(\mathbf{u},t)=\min_{\mathbf{u}^{\prime}\preceq\mathbf{u}}\max\{\tau(\mathbf{u}^{\prime},1),\tau(\mathbf{u}-\mathbf{u}^{\prime},t-1)\}.
13:        end if
14:     end for
15:  end for
16:  return  τ(𝐮,t),𝐮𝐪,t[m].\tau(\mathbf{u},t),\forall\mathbf{u}\preceq\mathbf{q},t\in[m].
Lemma 6.

Given a degree vector 𝐪\mathbf{q}, then τ(𝐮,t),𝐮𝐪,t[m]\tau(\mathbf{u},t),\forall\mathbf{u}\preceq\mathbf{q},t\in[m] can be computed by Algorithm 3 with time complexity O(mi=1m1(qi+1)(qi+2)2)O(m\prod_{i=1}^{m-1}\frac{(q_{i}+1)(q_{i}+2)}{2}).

Proof:

Firstly, as shown in Line 2, for any t[m]t\in[m], we have τ(𝟎,t)=0\tau(\mathbf{0},t)=0, since there exist no computation nodes in the graph. Then, for any degree vector 𝐮𝐪\mathbf{u}\preceq\mathbf{q} and t[m]t\in[m], suppose GG is an arbitrary substructure that has degree vector 𝐮\mathbf{u} and consists of tt disjoint DRTs. If t=1t=1, the substructure GG is a DRT. Fixing the degree of its root as d(r(G),G)=i+1d(r(G),G)=i+1, the graph Gr(G)G-r(G) has degree vector 𝐮𝐞i\mathbf{u}-\mathbf{e}_{i} and consists of i+1i+1 disjoint DRTs. Therefore, when fixing d(r(G),G)=i+1d(r(G),G)=i+1, the minimum latency of GG is τ(𝐮𝐞i,i+1)+li+1\tau(\mathbf{u}-\mathbf{e}_{i},i+1)+l_{i+1}. As shown in Line 9, by traversing all the possible ii, we can find the minimum latency of GG:

τ(𝐮,t)=mini[m1],ui>0{τ(𝐮𝐞i,i+1)+li+1}.\tau(\mathbf{u},t)=\min_{i\in[m-1],u_{i}>0}\{\tau(\mathbf{u}-\mathbf{e}_{i},i+1)+l_{i+1}\}. (10)

If t>1t>1, the substructure GG consists of tt disjoint DRTs, named D1,D2,,DtD_{1},D_{2},\ldots,D_{t}. Fixing the degree vector of D1D_{1} as 𝐮\mathbf{u}^{\prime}, the graph GD1G\setminus D_{1} has degree vector 𝐮𝐮\mathbf{u}-\mathbf{u}^{\prime} and consists of t1t-1 disjoint DRTs. Therefore, when fixing the degree vector of D1D_{1} as 𝐮\mathbf{u}^{\prime}, the minimum latency of GG is max{τ(𝐮,1),τ(𝐮𝐮,t1)}\max\{\tau(\mathbf{u}^{\prime},1),\tau(\mathbf{u}-\mathbf{u}^{\prime},t-1)\}. As shown in Line 12, by traversing all the possible 𝐮\mathbf{u}^{\prime}, we can find the minimum latency of GG:

τ(𝐮,t)=min𝐮𝐮max{τ(𝐮,1),τ(𝐮𝐮,t1)}.\tau(\mathbf{u},t)=\min_{\mathbf{u}^{\prime}\preceq\mathbf{u}}\max\{\tau(\mathbf{u}^{\prime},1),\tau(\mathbf{u}-\mathbf{u}^{\prime},t-1)\}. (11)

As shown in Lines 4–15, we enumerate all the possible 𝐮\mathbf{u} and tt and compute τ(𝐮,t)\tau(\mathbf{u},t). Here, we enumerate 𝐮\mathbf{u} in non-decreasing order of 𝐮1||\mathbf{u}||_{1} and enumerate tt from 1 to mm, so that when computing τ(𝐮,t)\tau(\mathbf{u},t), the required values of τ(𝐮,t)\tau(\mathbf{u}^{\prime},t^{\prime}) have already been computed. Therefore, given 𝐪\mathbf{q}, Algorithm 3 can obtain τ(𝐮,t),𝐮𝐪,t[m]\tau(\mathbf{u},t),\forall\mathbf{u}\preceq\mathbf{q},t\in[m].

In the following, we investigate the time complexity of Algorithm 3. In Line 4, we enumerate 𝐮𝐪\mathbf{u}\preceq\mathbf{q}. In Line 5, we enumerate tt from 11 to mm. And in Line 12, we enumerate 𝐮𝐮\mathbf{u}^{\prime}\preceq\mathbf{u}. Thus, the number of computations is at most 𝐮𝐪m|{𝐮:𝐮𝐮}|\sum_{\mathbf{u}\preceq\mathbf{q}}m|\{\mathbf{u}^{\prime}:\mathbf{u}^{\prime}\preceq\mathbf{u}\}|. We can simplify it as follows:

𝐮𝐪m|{𝐮:𝐮𝐮}|\displaystyle\sum_{\mathbf{u}\preceq\mathbf{q}}m|\{\mathbf{u}^{\prime}:\mathbf{u}^{\prime}\preceq\mathbf{u}\}|
=\displaystyle= m𝐮𝐪|{𝐮:𝐮𝐮}|\displaystyle m\sum_{\mathbf{u}\preceq\mathbf{q}}|\{\mathbf{u}^{\prime}:\mathbf{u}^{\prime}\preceq\mathbf{u}\}|
=\displaystyle= mu1=0q1(u2,,um1)(q2,,qm1)|{𝐮:𝐮𝐮}|\displaystyle m\sum_{u_{1}=0}^{q_{1}}\sum_{(u_{2},\ldots,u_{m-1})\preceq(q_{2},\ldots,q_{m-1})}|\{\mathbf{u}^{\prime}:\mathbf{u}^{\prime}\preceq\mathbf{u}\}|
=\displaystyle= mu1=0q1(u2,,um1)(q2,,qm1)|{(u1,,um1):u1[0,u1],(u2,,um1)(u2,,um1)}|\displaystyle m\sum_{u_{1}=0}^{q_{1}}\sum_{(u_{2},\ldots,u_{m-1})\preceq(q_{2},\ldots,q_{m-1})}|\{(u^{\prime}_{1},\ldots,u^{\prime}_{m-1}):u^{\prime}_{1}\in[0,u_{1}],(u^{\prime}_{2},\ldots,u^{\prime}_{m-1})\preceq(u_{2},\ldots,u_{m-1})\}|
=\displaystyle= mu1=0q1((u1+1)(u2,,um1)(q2,,qm1)|{(u2,,um1):(u2,,um1)(u2,,um1)}|)\displaystyle m\sum_{u_{1}=0}^{q_{1}}((u_{1}+1)\sum_{(u_{2},\ldots,u_{m-1})\preceq(q_{2},\ldots,q_{m-1})}|\{(u^{\prime}_{2},\ldots,u^{\prime}_{m-1}):(u^{\prime}_{2},\ldots,u^{\prime}_{m-1})\preceq(u_{2},\ldots,u_{m-1})\}|)
=\displaystyle= m(q1+1)(q1+2)2(u2,,um1)(q2,,qm1)|{(u2,,um1):(u2,,um1)(u2,,um1)}|\displaystyle m\frac{(q_{1}+1)(q_{1}+2)}{2}\sum_{(u_{2},\ldots,u_{m-1})\preceq(q_{2},\ldots,q_{m-1})}|\{(u^{\prime}_{2},\ldots,u^{\prime}_{m-1}):(u^{\prime}_{2},\ldots,u^{\prime}_{m-1})\preceq(u_{2},\ldots,u_{m-1})\}|
=\displaystyle= m(q1+1)(q1+2)2(q2+1)(q2+2)2\displaystyle m\frac{(q_{1}+1)(q_{1}+2)}{2}\frac{(q_{2}+1)(q_{2}+2)}{2}
(u3,,um1)(q3,,qm1)|{(u3,,um1):(u3,,um1)(u3,,um1)}|\displaystyle\sum_{(u_{3},\ldots,u_{m-1})\preceq(q_{3},\ldots,q_{m-1})}|\{(u^{\prime}_{3},\ldots,u^{\prime}_{m-1}):(u^{\prime}_{3},\ldots,u^{\prime}_{m-1})\preceq(u_{3},\ldots,u_{m-1})\}|
=\displaystyle= mi=1m1(qi+1)(qi+2)2.\displaystyle m\prod_{i=1}^{m-1}\frac{(q_{i}+1)(q_{i}+2)}{2}.

The proof is completed. ∎
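Algorithms 2 and 3 can be sketched together in Python, with τ implemented as a memoized recursion over the recurrences (10) and (11). Degree vectors are encoded as tuples and l[d] denotes the latency factor of a d-input node; both encodings are ours, and the top-down memoization replaces the bottom-up enumeration of Algorithm 3 without changing the computed values.

```python
from functools import lru_cache

def l_min_star(q, l):
    """Sketch of Algorithms 2 and 3: L_min^star(q) for a degree vector
    q = (q_1, ..., q_{m-1}); l[d] is the latency factor of a d-input node.
    tau follows recurrences (10) and (11) via top-down memoization."""
    m = len(q) + 1
    INF = float("inf")

    def dec(u, i):                       # u - e_i
        v = list(u); v[i - 1] -= 1
        return tuple(v)

    def sub(u, up):                      # componentwise u - u'
        return tuple(a - b for a, b in zip(u, up))

    def subvectors(u):                   # all u' with u' componentwise <= u
        out = [()]
        for cap in u:
            out = [p + (h,) for p in out for h in range(cap + 1)]
        return out

    @lru_cache(maxsize=None)
    def tau(u, t):                       # Algorithm 3
        if all(x == 0 for x in u):       # no computation nodes at all
            return 0
        if t == 1:                       # Eq. (10): fix the root degree i+1
            return min(tau(dec(u, i), i + 1) + l[i + 1]
                       for i in range(1, m) if u[i - 1] > 0)
        # Eq. (11): fix the degree vector u' of the first DRT
        return min(max(tau(up, 1), tau(sub(u, up), t - 1))
                   for up in subvectors(u))

    best = INF                           # enumeration of Algorithm 2
    for up in subvectors(q):             # degree vector of D1
        for i in range(1, m):            # root degree of D1 is i+1
            if up[i - 1] >= 1:
                l1 = tau(dec(up, i), i + 1) + l[i + 1]
                l2 = tau(sub(q, up), 1)
                if l1 - l[i + 1] <= l2 < l1:   # constraints (7) and (8)
                    best = min(best, l1 + l2)
    return best
```

For m=2 and l_2=1, l_min_star((3,), {2: 1}) returns 3: an optimal star tree for n=5 is a path of three tree-degree-3 nodes, whose longest leaf-to-leaf path passes through all three computation nodes.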

By Algorithms 2 and 3, we are able to determine the lowest latency Lminstar(𝐪)L_{\min}^{\mathrm{star}}(\mathbf{q}) of the star-tree-based structures 𝒮star(𝐪)\mathcal{S}^{\mathrm{star}}(\mathbf{q}) derived from degree vector 𝐪\mathbf{q}. Further by backtracking, we can find the corresponding latency-optimal star tree TT with degree vector 𝐪\mathbf{q}, and TT can be further used to generate a latency-optimal star-tree-based structure SS from 𝒮star(𝐪)\mathcal{S}^{\mathrm{star}}(\mathbf{q}) via Construction 1. Note that 𝐪\mathbf{q} can be any valid degree vector. However, if 𝐪\mathbf{q} is the uniquely optimal solution to Problem 1, SS has the lowest latency among all the complexity-optimal star-tree-based structures. On the other hand, if Problem 1 has multiple optimal solutions, we can enumerate them to find the latency-optimal ones among complexity-optimal star-tree-based structures.

V Isomorphic-DRT-Based Structures

In this section, we construct structures for the scenario where latency has a higher priority than complexity. To this end, we focus on a class of structures, called isomorphic-DRT-based structures, since there always exist isomorphic-DRT-based structures (or structures derived from them) achieving the lowest latency, as explicated in Lemma 8. Section V-A proposes a class of DRTs, called isomorphic DRTs, and further provides a construction from isomorphic DRTs to isomorphic-DRT-based structures. Next, Section V-B presents a construction from type vectors to isomorphic DRTs. Here, the type vector is used to characterize an isomorphic DRT, each component of which is the number of levels that contain nodes with the corresponding degree. We further illustrate that, given a type vector, the latency of its corresponding isomorphic-DRT-based structures is uniquely determined. Therefore, Theorem 3 is proposed to find the type vectors that lead to the latency-optimal isomorphic-DRT-based structures. Finally, Section V-C derives Construction 5 and Theorem 4 to find a complexity-optimal isomorphic-DRT-based structure SS corresponding to a given type vector 𝐰\mathbf{w}. As a result, if 𝐰\mathbf{w} is the uniquely optimal type vector obtained from Theorem 3, then SS is a latency-optimal isomorphic-DRT-based structure which also has the lowest complexity among all latency-optimal isomorphic-DRT-based structures.

V-A Isomorphic DRTs and Isomorphic-DRT-Based Structures

In this subsection, we propose a class of DRTs, called isomorphic DRTs, in Definition 6. Then, Construction 3 and Lemma 7 show that, based on any isomorphic DRT DD, we can obtain a structure, which is called an isomorphic-DRT-based structure. Finally, Lemma 8 illustrates that it suffices to investigate the isomorphic-DRT-based structures when the latency of structures is of interest.

Definition 6 (isomorphic DRT).

We refer to a DRT DD as an isomorphic DRT iff for any internal node aa in DD, all the subtrees rooted at aa’s children are isomorphic.

Denote 𝒟n\mathcal{D}_{n} as the set of all isomorphic DRTs with exactly nn leaves. Fig. 2 shows an example of isomorphic DRTs in 𝒟6\mathcal{D}_{6}. In an isomorphic DRT DD, the nodes in the same level have the same degree (number of incoming edges). For the sake of convenience, for any i[2,m]i\in[2,m], we refer to a level of DD as a type-ii level if it contains degree-ii nodes. Moreover, we define the type vector of DD by 𝐰=(w1,w2,,wm1)+m1\mathbf{w}=(w_{1},w_{2},\ldots,w_{m-1})\in\mathbb{Z}_{+}^{m-1}, where wi,i[m1]w_{i},i\in[m-1] is the number of type-(i+1)(i+1) levels in DD.

In the following, we propose a construction from isomorphic DRTs to structures in Construction 3, and use Lemma 7 to state the correctness of the construction.

Construction 3.

For any n3n\geq 3 and an isomorphic DRT D𝒟n1D\in\mathcal{D}_{n-1}, we construct the structure by the following steps:

Step 1: For any j[n]j\in[n], let DjD_{j} be a copy of DD, and label all the leaves of DjD_{j} as X{xj}X\setminus\{x_{j}\} respectively.

Step 2: Return S=j[n]DjS=\vee_{j\in[n]}D_{j} as the constructed structure.

Lemma 7.

For any n3n\geq 3 and D𝒟n1D\in\mathcal{D}_{n-1}, supposing that SS is a structure derived from DD via Construction 3, then S𝒮nS\in\mathcal{S}_{n}.

Proof:

It holds obviously, according to Definitions 1 and 3. ∎

All the structures that can be derived from isomorphic DRTs via Construction 3 are called isomorphic-DRT-based structures. Fig. 3 shows an example of an isomorphic-DRT-based structure corresponding to the isomorphic DRT given in Fig. 2. Note that the resultant structure of Construction 3 is generally not unique for a given isomorphic DRT DD, since we may label the leaves of DjD_{j} in different ways. For convenience, denote 𝒮isom(D)\mathcal{S}^{\mathrm{isom}}(D) as the set of all possible isomorphic-DRT-based structures derived from DD via Construction 3.

In the following, we illustrate that the minimum latency of the structures for given nn can be connected to the minimum latency of isomorphic-DRT-based structures with input size nnn^{\prime}\geq n.

Lemma 8.

For any S𝒮nS\in\mathcal{S}_{n}, there always exists an isomorphic-DRT-based structure S𝒮nS^{\prime}\in\mathcal{S}_{n^{\prime}} satisfying that nnn^{\prime}\geq n and l(S)l(S)l(S^{\prime})\leq l(S).

Proof:

For a given structure S𝒮nS\in\mathcal{S}_{n}, we attempt to construct an isomorphic-DRT-based structure SS^{\prime} satisfying that nnn^{\prime}\geq n and l(S)l(S)l(S^{\prime})\leq l(S) by the following steps.

Step 1: Let DD denote the DRT E(y1,S)E(y_{1},S) and then unlabel DD. Set i=1i=1.

Step 2: Let AiA_{i} be the set of nodes in the ii-th level of DD. If all the nodes in AiA_{i} are leaves, go to Step 3. Otherwise, select a node aAia\in A_{i} such that the number of leaves of E(a,D)E(a,D) is equal to or greater than that of E(b,D),bAiE(b,D),\forall b\in A_{i}. Then, for each bAi{a}b\in A_{i}\setminus\{a\}, modify E(b,D)E(b,D) so that it is isomorphic to E(a,D)E(a,D). Increase ii by 1 and then repeat Step 2.

Step 3: Return SS^{\prime} derived from DD via Construction 3.

In Step 2, the number of DD’s leaves is either increased or unchanged, and the latency of DD is at most l(S)l(S). As a result, SS^{\prime} is an isomorphic-DRT-based structure satisfying that nnn^{\prime}\geq n and l(S)l(S)l(S^{\prime})\leq l(S). The proof is completed. ∎

According to Lemma 8, it suffices to investigate the latency-optimal isomorphic-DRT-based structures whose number of leaves is at least nn when the minimum latency of S𝒮nS\in\mathcal{S}_{n} is of interest. In particular, if we can first construct a latency-optimal isomorphic-DRT-based structure S𝒮nS^{\prime}\in\mathcal{S}_{n^{\prime}} with nnn^{\prime}\geq n, then we can properly remove certain nodes (including xn+1,,xnx_{n+1},...,x_{n^{\prime}}, yn+1,,yny_{n+1},...,y_{n^{\prime}}) from SS^{\prime} to obtain a latency-optimal structure S𝒮nS\in\mathcal{S}_{n}, where l(S)=l(S)l(S)=l(S^{\prime}) by Lemma 8 and by the fact that removing nodes from SS^{\prime} does not increase latency. For instance, we can initially construct the latency-optimal isomorphic-DRT-based structure S𝒮7S^{\prime}\in\mathcal{S}_{7} shown in Fig. 4, and then remove the nodes x7x_{7} and y7y_{7} from SS^{\prime} to obtain the latency-optimal S𝒮6S\in\mathcal{S}_{6} shown in Fig. 9.

According to the above analysis, we only focus on the latency and complexity of isomorphic-DRT-based structures in the rest of this section.

Figure 9: A structure in 𝒮6\mathcal{S}_{6} which is generated from that in Fig. 4 by removing nodes x7x_{7} and y7y_{7}.

V-B The Lowest Latency of Isomorphic-DRT-Based Structures

In this subsection, first, Lemma 9 investigates the latency of isomorphic-DRT-based structures and finds that, given a type vector, the latency of its corresponding isomorphic-DRT-based structures is uniquely determined. Next, Construction 4 provides a method to obtain isomorphic DRTs for given type vectors. Based on these preparations, we finally formulate Problem 3 and solve it in Theorem 3, aiming to obtain latency-optimal isomorphic-DRT-based structures via optimizing type vectors.

Lemma 9.

For any n3n\geq 3, D𝒟n1D\in\mathcal{D}_{n-1} and S𝒮isom(D)S\in\mathcal{S}^{\mathrm{isom}}(D), the latency of SS is

l(S)=l(D)=i=1m1wili+1,l(S)=l(D)=\sum_{i=1}^{m-1}w_{i}l_{i+1}, (12)

where wiw_{i} denotes the number of type-(i+1)(i+1) levels in DD and li+1l_{i+1} is the latency factor of (i+1)(i+1)-input nodes.

Proof:

According to Definition 6, for any i,j[n]i,j\in[n] with iji\neq j, the path from xix_{i} to yjy_{j} has latency k=1m1wklk+1\sum_{k=1}^{m-1}w_{k}l_{k+1}. Noting Definition 3, the proof is completed. ∎

According to Lemma 9, the latency l(S)l(S) is fully determined by the type vector 𝐰=(w1,w2,,wm1)\mathbf{w}=(w_{1},w_{2},\ldots,w_{m-1}). We are interested in optimizing type vectors to generate latency-optimal isomorphic-DRT-based structures. Since Construction 3 can generate isomorphic-DRT-based structures from isomorphic DRTs, the remaining question is what type vectors can result in valid (n1)(n-1)-input isomorphic DRTs. The following lemma answers this question.

Lemma 10.

For any n3n\geq 3, there exists an isomorphic DRT D𝒟n1D\in\mathcal{D}_{n-1} with type vector 𝐰=(w1,w2,,wm1)\mathbf{w}=(w_{1},w_{2},\ldots,w_{m-1}) iff

i=1m1(i+1)wi=n1.\displaystyle\prod_{i=1}^{m-1}(i+1)^{w_{i}}=n-1. (13)
Proof:

Necessity: According to Definition 6, the necessity holds obviously.

Sufficiency: Suppose the type vector 𝐰\mathbf{w} satisfies (13). We provide Construction 4 to prove that there always exists an isomorphic DRT D𝒟n1D\in\mathcal{D}_{n-1} with type vector 𝐰\mathbf{w}.

Construction 4.

For a given type vector 𝐰\mathbf{w}, we construct an isomorphic DRT by the following steps.

Step 1: Select a wi>0w_{i}>0. Then, we can construct a DRT DD, which contains one degree-(i+1)(i+1) internal node and (i+1)(i+1) leaves. Decrease wiw_{i} by 1.

Step 2: If 𝐰=𝟎\mathbf{w}=\mathbf{0}, return DD as the constructed isomorphic DRT. Otherwise, select a wi>0w_{i}>0, replace each leaf in DD with a subtree that contains one degree-(i+1)(i+1) internal node and (i+1)(i+1) leaves, and decrease wiw_{i} by 1. Repeat Step 2.
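Construction 4 admits a compact recursive sketch: since all subtrees hanging below a level are isomorphic, a tree can be encoded as either the string "leaf" or a pair (degree, child subtree). This encoding is our own; it also lets one check (13) and the latency formula of Lemma 9 numerically.

```python
def build(w):
    """Construction 4 sketch: an isomorphic DRT for type vector
    w = (w_1, ..., w_{m-1}).  Because all subtrees below a level are
    isomorphic, a tree is encoded as "leaf" or (degree, child subtree);
    this compact encoding is our own, not the paper's."""
    tree = "leaf"
    for i, wi in enumerate(w, start=1):  # w_i levels of degree-(i+1) nodes
        for _ in range(wi):
            tree = (i + 1, tree)         # wrap a new level around the tree
    return tree

def num_leaves(tree):
    # leaf count; equals prod (i+1)^{w_i}, i.e., the n - 1 of Eq. (13)
    if tree == "leaf":
        return 1
    d, child = tree
    return d * num_leaves(child)

def latency(tree, l):
    # DRT latency; equals sum w_i * l_{i+1} as in Lemma 9
    if tree == "leaf":
        return 0
    d, child = tree
    return l[d] + latency(child, l)
```

For w=(1,1), the tree has 2·3=6 leaves (so n=7, consistent with the example of Fig. 4), and its latency is l_2+l_3 regardless of the order in which the two levels are applied.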

Note that for some n3n\geq 3, there may exist no type vector satisfying (13), in which case 𝒟n1=\mathcal{D}_{n-1}=\emptyset. However, as discussed at the end of Section V-A, we only need to focus on those nn such that (13) can be satisfied and 𝒟n1\mathcal{D}_{n-1}\neq\emptyset. Further recall that a type vector can generate multiple isomorphic DRTs via Construction 4 and an isomorphic DRT can also generate multiple isomorphic-DRT-based structures via Construction 3. As a result, a type vector corresponds to multiple isomorphic-DRT-based structures. Denote 𝒮isom(𝐰)\mathcal{S}^{\mathrm{isom}}(\mathbf{w}) as the set of all the isomorphic-DRT-based structures derived from type vector 𝐰\mathbf{w}. Since the latency of any structure in 𝒮isom(𝐰)\mathcal{S}^{\mathrm{isom}}(\mathbf{w}) is the same by Lemma 9, it suffices to optimize type vectors to generate latency-optimal isomorphic-DRT-based structures. Specifically, define Lminisom(n1)minD𝒟n1l(D)L_{\min}^{\mathrm{isom}}(n-1)\triangleq\min_{D\in\mathcal{D}_{n-1}}l(D) for 𝒟n1\mathcal{D}_{n-1}\neq\emptyset, which is the minimum latency of isomorphic DRTs with input size n1n-1 as well as the minimum latency of isomorphic-DRT-based structures with input size nn. Computing Lminisom(n1)L_{\min}^{\mathrm{isom}}(n-1) is equivalent to solving the following optimization problem.

Problem 3: Lminisom(n1)=min𝐰i=1m1wili+1\displaystyle L_{\min}^{\mathrm{isom}}(n-1)=\min\limits_{\mathbf{w}}\,\sum_{i=1}^{m-1}w_{i}l_{i+1}
s.t.\displaystyle\,s.t.\quad i=1m1(i+1)wi=n1.\displaystyle\prod_{i=1}^{m-1}(i+1)^{w_{i}}=n-1. (14)

The remaining problem is to solve Problem 3, which is done in Algorithm 4. The following theorem illustrates the correctness and time complexity of Algorithm 4.

Algorithm 4 Finding optimal solutions to Problem 3
Input:  The input size nn.
Output:  Lminisom(n1)L_{\min}^{\mathrm{isom}}(n-1).
1:  Initialize Lminisom(1)=0L_{\min}^{\mathrm{isom}}(1)=0.
2:  for i=2i=2 to n1n-1 do
3:     Lminisom(i)=mint[2,m] is a factor of i{Lminisom(i/t)+lt}L_{\min}^{\mathrm{isom}}(i)=\min_{t\in[2,m]\text{~{}is a factor of~{}}i}\{L_{\min}^{\mathrm{isom}}(i/t)+l_{t}\}.
4:  end for
5:  return  Lminisom(n1)L_{\min}^{\mathrm{isom}}(n-1).
Theorem 3.

Lminisom(n1)L_{\min}^{\mathrm{isom}}(n-1) can be computed by Algorithm 4 with time complexity O(mn)O(mn).

Proof:

In Algorithm 4, as shown in Line 1, we have Lminisom(1)=0L_{\min}^{\mathrm{isom}}(1)=0, since no computation nodes are required. Then, for any i[2,n1]i\in[2,n-1], assume 𝐰\mathbf{w} is an optimal solution to Problem 3 for Lminisom(i)L_{\min}^{\mathrm{isom}}(i). For any integer t[2,m]t\in[2,m] satisfying wt1>0w_{t-1}>0, tt must be a factor of ii by (13). Let 𝐰=𝐰𝐞t1\mathbf{w}^{\prime}=\mathbf{w}-\mathbf{e}_{t-1}, where 𝐞t1\mathbf{e}_{t-1} is a vector of length m1m-1 whose (t1)(t-1)-th entry is 1 and other entries are 0. Then 𝐰\mathbf{w}^{\prime} is an optimal solution for Lminisom(i/t)L_{\min}^{\mathrm{isom}}(i/t), leading to Lminisom(i)=Lminisom(i/t)+ltL_{\min}^{\mathrm{isom}}(i)=L_{\min}^{\mathrm{isom}}(i/t)+l_{t}. Therefore, Lminisom(i)=mint[2,m] is a factor of i{Lminisom(i/t)+lt}L_{\min}^{\mathrm{isom}}(i)=\min_{t\in[2,m]\text{~{}is a factor of~{}}i}\{L_{\min}^{\mathrm{isom}}(i/t)+l_{t}\} holds, corresponding to Line 3 of Algorithm 4.

Noting that Line 3 in Algorithm 4 has time complexity O(m)O(m) and is carried out O(n)O(n) times, the time complexity of Algorithm 4 is O(mn)O(mn). ∎
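Algorithm 4 transcribes directly into Python. Here l[t] is the latency factor of a t-input node (a dict encoding of our own); sizes i with no factor in [2,m] keep latency infinity, corresponding to 𝒟_i being empty.

```python
def l_min_isom(n, l):
    """Transcription of Algorithm 4: L_min^isom(n-1) by dynamic programming.
    l[t] is the latency factor of a t-input node, t = 2..m (our own dict
    encoding); sizes with no factor in [2, m] stay at infinity,
    i.e., no isomorphic DRT of that size exists."""
    m = max(l)
    INF = float("inf")
    L = [INF] * n
    L[1] = 0                            # a single leaf needs no computation
    for i in range(2, n):
        L[i] = min(
            (L[i // t] + l[t] for t in range(2, m + 1) if i % t == 0),
            default=INF,
        )
    return L[n - 1]
```

For instance, with only 2-input nodes available (m=2, l_2=1), input size n=6 yields infinity, since no isomorphic DRT with 5 leaves can be built from 2-input nodes.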

By Algorithm 4, we are able to determine the lowest latency Lminisom(n1)L_{\min}^{\mathrm{isom}}(n-1) of isomorphic-DRT-based structures with input size nn. Further by backtracking, we can obtain an (or all, if needed) optimal type vector 𝐰\mathbf{w} which leads to latency-optimal isomorphic-DRT-based structures 𝒮isom(𝐰)\mathcal{S}^{\mathrm{isom}}(\mathbf{w}). However, the structures in 𝒮isom(𝐰)\mathcal{S}^{\mathrm{isom}}(\mathbf{w}) may have different complexities. We are interested in further obtaining the complexity-optimal structures in 𝒮isom(𝐰)\mathcal{S}^{\mathrm{isom}}(\mathbf{w}) in the following subsection.

V-C The Lowest Complexity of Isomorphic-DRT-Based Structures for Given Type Vectors

In this subsection, for a given type vector 𝐰\mathbf{w}, we first generate an isomorphic-DRT-based structure S𝒮isom(𝐰)S\in\mathcal{S}^{\mathrm{isom}}(\mathbf{w}) via Construction 5, and then prove in Theorem 4 that SS has the lowest complexity among all isomorphic-DRT-based structures in 𝒮isom(𝐰)\mathcal{S}^{\mathrm{isom}}(\mathbf{w}).

Construction 5.

For a given type vector 𝐰\mathbf{w}, we construct a structure SS by the following steps.

Step 1: Obtain an isomorphic DRT DD from 𝐰\mathbf{w} via Construction 4.

Step 2: Obtain an isomorphic-DRT-based structure SS from DD via Construction 3, where for each j[n]j\in[n], label leaves of DjD_{j} from left to right in a consecutive way: xj+1,,xn,x1,,xj1x_{j+1},\ldots,x_{n},x_{1},\ldots,x_{j-1}, where xnx_{n} and x1x_{1} are considered as consecutive to each other. Return SS as the constructed structure.

Fig. 4 shows a structure SS obtained from Construction 5 for n=7n=7, m=3m=3 and 𝐰=(1,1)\mathbf{w}=(1,1). The following theorem states that SS does have the lowest complexity among 𝒮isom(𝐰)\mathcal{S}^{\mathrm{isom}}(\mathbf{w}).

Theorem 4.

For a given type vector 𝐰\mathbf{w} and n=1+i=1m1(i+1)win=1+\prod_{i=1}^{m-1}(i+1)^{w_{i}}, let SS be the structure obtained from Construction 5. Then, S𝒮isom(𝐰)S\in\mathcal{S}^{\mathrm{isom}}(\mathbf{w}) and c(S)=i=1m1nwici+1=minS𝒮isom(𝐰)c(S)c(S)=\sum_{i=1}^{m-1}nw_{i}c_{i+1}=\min_{S^{\prime}\in\mathcal{S}^{\mathrm{isom}}(\mathbf{w})}c(S^{\prime}).

Proof:

Let S𝒮isom(𝐰)S^{\prime}\in\mathcal{S}^{\mathrm{isom}}(\mathbf{w}). We first prove

c(S)i=1m1nwici+1.c(S^{\prime})\geq\sum_{i=1}^{m-1}nw_{i}c_{i+1}. (15)

For any j[n]j\in[n], E(yj,S)E(y_{j},S^{\prime}) must be an isomorphic DRT of height ζ\zeta, where ζ=i=1m1wi\zeta=\sum_{i=1}^{m-1}w_{i}. For i[0,ζ]i\in[0,\zeta], let HiH_{i} be the ii-th level in SS^{\prime}, i.e., any node aHia\in H_{i} iff the distance from aa to some output node in SS^{\prime} is ii. As a result, H0=Y={y1,y2,,yn}H_{0}=Y=\{y_{1},y_{2},\ldots,y_{n}\}. Since SS^{\prime} is an isomorphic-DRT-based structure, the nodes in HiH_{i} have the same degree, which is referred to as ρi\rho_{i}. Our idea is to prove |Hi|n,i[0,ζ1]|H_{i}|\geq n,\forall i\in[0,\zeta-1] such that c(S)=i=0ζ1|Hi|cρii=0ζ1ncρi=i=1m1nwici+1c(S^{\prime})=\sum_{i=0}^{\zeta-1}|H_{i}|c_{\rho_{i}}\geq\sum_{i=0}^{\zeta-1}nc_{\rho_{i}}=\sum_{i=1}^{m-1}nw_{i}c_{i+1}.

For any aSa\in S^{\prime}, let F(a)=(f1,f2,,fn)F(a)=(f_{1},f_{2},\ldots,f_{n}), where for any j[n]j\in[n], fj=1f_{j}=1 if xjE(a,S)x_{j}\in E(a,S^{\prime}) and fj=0f_{j}=0 otherwise. Meanwhile, for any i[0,ζ1]i\in[0,\zeta-1] and HHiH\subseteq H_{i}, let F(H)=aHF(a)F(H)=\oplus_{a\in H}F(a), where \oplus is the component-wise XOR operation. If H=H=\emptyset, let F(H)=(0,0,,0)F(H)=(0,0,\ldots,0) (nn zeros in total). Moreover, let F(i)={F(H):HHi,|H| is even},i[0,ζ1]F(i)=\{F(H):H\subseteq H_{i},|H|\text{~{}is even}\},\forall i\in[0,\zeta-1].

On the one hand, for any i[0,ζ2]i\in[0,\zeta-2] and H={a1,a2,,ak}HiH=\{a_{1},a_{2},\ldots,a_{k}\}\subseteq H_{i} with even |H||H|, we have F(H)=F(a1)F(ak)=(F(b1,1)F(b1,ρi))(F(bk,1)F(bk,ρi))F(i+1)F(H)=F(a_{1})\oplus\cdots\oplus F(a_{k})=(F(b_{1,1})\oplus\cdots\oplus F(b_{1,\rho_{i}}))\oplus\cdots\oplus(F(b_{k,1})\oplus\cdots\oplus F(b_{k,\rho_{i}}))\in F(i+1), where bk,jb_{k^{\prime},j} is the jj-th child of aka_{k^{\prime}} in E(ak,S)E(a_{k^{\prime}},S^{\prime}) for k[k],j[ρi]k^{\prime}\in[k],j\in[\rho_{i}]. That is, F(0)F(1)F(ζ1)F(0)\subseteq F(1)\subseteq\cdots\subseteq F(\zeta-1). On the other hand, for any HH0=YH\subseteq H_{0}=Y with even |H||H|, we have F(H)=(f1,f2,,fn)F(H)=(f_{1},f_{2},\ldots,f_{n}), where for any j[n]j\in[n], fj=1f_{j}=1 if yjHy_{j}\in H and fj=0f_{j}=0 otherwise. This leads to |F(0)|=2n1|F(0)|=2^{n-1}. As a result, we have |F(i)|2n1,i[0,ζ1]|F(i)|\geq 2^{n-1},\forall i\in[0,\zeta-1]. Since F(i)F(i) contains at most 2|Hi|12^{|H_{i}|-1} elements (one for each even-sized subset of HiH_{i}), this implies |Hi|n|H_{i}|\geq n as well as (15).

Let $S$ be the structure returned by Construction 5. Obviously, $S\in S^{\mathrm{isom}}(\mathbf{w})$. Given (15), it suffices to prove $c(S)\leq\sum_{i=1}^{m-1}nw_ic_{i+1}$ to complete the proof of Theorem 4. Let $A_i$ denote the $i$-th level of $S$. By Construction 5, the DRTs $E(a,S)$, $a\in A_i$ are isomorphic to each other and the leaves of each $E(a,S)$ are labelled consecutively, implying that $|A_i|=|\{E(a,S):a\in A_i\}|\leq n$. Then, $c(S)\leq\sum_{i=1}^{m-1}nw_ic_{i+1}$ follows. ∎

We remark that for $m=2$, any isomorphic DRT is a perfect DRT, which was considered in [18]. As a result, Theorem 4 can be thought of as an extension of [18, Theorem 6] from $m=2$ to $m\geq 2$. Note that $\mathbf{w}$ can be any valid degree vector. However, if $\mathbf{w}$ is the unique optimal solution to Problem 3, then $S$ has the lowest complexity among all latency-optimal isomorphic-DRT-based structures. If Problem 3 instead has multiple optimal solutions, we can enumerate them to find the complexity-optimal ones among the latency-optimal isomorphic-DRT-based structures.

VI Conclusion

In this paper, we have investigated the structures that a node in a message passing algorithm can use to compute outgoing messages from incoming messages. For the scenario where complexity has a higher priority than latency, we proposed the class of star-tree-based structures and derived algorithms to determine both their lowest achievable complexity and the corresponding lowest latency. For the scenario where latency has a higher priority than complexity, we proposed the class of isomorphic-DRT-based structures, derived an algorithm to compute the lowest latency achievable with them, and constructed latency-optimal isomorphic-DRT-based structures with the lowest complexity. Notably, our results extend the work in [18] to multi-input scenarios. Future work includes investigating optimal or near-optimal structures that achieve a good tradeoff between complexity and latency.

References

  • [1] T. J. Richardson and R. L. Urbanke, Modern Coding Theory.   Cambridge University Press, 2008.
  • [2] F. R. Kschischang, B. J. Frey, and H. Loeliger, “Factor graphs and the sum-product algorithm,” IEEE Trans. Inf. Theory, vol. 47, no. 2, pp. 498–519, 2001.
  • [3] I. Arasaratnam and S. Haykin, “Cubature Kalman filters,” IEEE Trans. Autom. Control, vol. 54, no. 6, pp. 1254–1269, 2009.
  • [4] W. Xu and F. Zhang, “FAST-LIO: A fast, robust lidar-inertial odometry package by tightly-coupled iterated Kalman filter,” IEEE Robot. Autom. Lett., vol. 6, no. 2, pp. 3317–3324, 2021.
  • [5] X. Yu and Z. Meng, “Robust Kalman filters with unknown covariance of multiplicative noise,” IEEE Trans. Autom. Control, vol. 69, no. 2, pp. 1171–1178, 2024.
  • [6] J. Gauvain and C. Lee, “Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains,” IEEE Trans. Speech Audio Process., vol. 2, no. 2, pp. 291–298, 1994.
  • [7] N. M. Ghahjaverestan, S. Masoudi, M. B. Shamsollahi, A. Beuchee, P. Pladys, D. Ge, and A. I. Hernández, “Coupled hidden Markov model-based method for apnea bradycardia detection,” IEEE J. Biomed. Health Inform., vol. 20, no. 2, pp. 527–538, 2016.
  • [8] C. Guo, Z. Liang, J. Zeng, M. Song, and Z. Xue, “A predictive hidden semi-Markov model for bridges subject to chloride-induced deterioration,” in Proc. 21st IEEE Int. Conf. Software Quality, Reliability and Security Companion (QRS-C), Hainan, China, Dec. 2021, pp. 751–756.
  • [9] S. Sun, C. Zhang, and G. Yu, “A Bayesian network approach to traffic flow forecasting,” IEEE Trans. Intell. Transp. Syst., vol. 7, no. 1, pp. 124–132, 2006.
  • [10] J. Zeng, G. Zhou, Y. Qiu, C. Li, and Q. Zhao, “Bayesian tensor network structure search and its application to tensor completion,” Neural Networks, vol. 175, p. 106290, 2024.
  • [11] R. G. Gallager, “Low-density parity-check codes,” IRE Trans. Inf. Theory, vol. IT-8, no. 1, pp. 21–28, Jan. 1962.
  • [12] D. MacKay, “Good error-correcting codes based on very sparse matrices,” IEEE Trans. Inf. Theory, vol. 45, no. 2, pp. 399–431, 1999.
  • [13] T. J. Richardson and R. L. Urbanke, “The capacity of low-density parity-check codes under message-passing decoding,” IEEE Trans. Inf. Theory, vol. 47, no. 2, pp. 599–618, Feb. 2001.
  • [14] 3GPP, “NR; Multiplexing and channel coding,” Release 15, Technical Specification (TS) 38.212, 2017.
  • [15] Y. Ding, X. He, K. Cai, G. Song, B. Dai, and X. Tang, “An efficient joint decoding scheme for outer codes in DNA-based data storage,” in Proc. IEEE/CIC Int. Conf. Commun. China Workshops, Aug. 2023, pp. 1–6.
  • [16] J. Chen, A. Dholakia, E. Eleftheriou, M. P. Fossorier, and X.-Y. Hu, “Reduced-complexity decoding of LDPC codes,” IEEE Trans. Commun., vol. 53, no. 8, pp. 1288–1299, Aug. 2005.
  • [17] Y.-L. Ueng, C.-Y. Wang, and M.-R. Li, “An efficient combined bit-flipping and stochastic LDPC decoder using improved probability tracers,” IEEE Trans. Signal Process., vol. 65, no. 20, pp. 5368–5380, 2017.
  • [18] X. He, K. Cai, and L. Zhou, “A class of optimal structures for node computations in message passing algorithms,” IEEE Trans. Inf. Theory, vol. 68, no. 1, pp. 93–104, Jan. 2022.
  • [19] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein, Introduction to Algorithms: 3rd Edition.   Cambridge, MA, USA: MIT Press, 2009.