
Two Classes of Optimal Multi-Input Structures for Node Computations in Message Passing Algorithms

Teng Lu, Xuan He, and Xiaohu Tang
Information Coding and Transmission Key Lab of Sichuan Province,
Southwest Jiaotong University, Chengdu 611756, China
Email: [email protected], [email protected], [email protected]
Abstract

In this paper, we delve into the computations performed at a node within a message-passing algorithm. We investigate low-complexity/latency multi-input structures that can be adopted by the node for computing outgoing messages \mathbf{y}=(y_{1},y_{2},\ldots,y_{n}) from incoming messages \mathbf{x}=(x_{1},x_{2},\ldots,x_{n}), where each y_{j}, j=1,2,\ldots,n, is computed via a multi-way tree with leaves \mathbf{x} excluding x_{j}. Specifically, we propose two classes of structures for different scenarios. For the scenario where complexity has a higher priority than latency, star-tree-based structures are proposed. The complexity-optimal ones (as well as their lowest latency) are obtained; they have the near-lowest (and sometimes the lowest) complexity among all structures. For the scenario where latency has a higher priority than complexity, isomorphic-directed-rooted-tree-based structures are proposed. The latency-optimal ones (as well as their lowest complexity) are obtained, and they are proved to have the lowest latency among all structures.

Index Terms:
Multi-input structure, complexity, latency, low-density parity-check (LDPC) code, message passing algorithm.

I Introduction

A large variety of algorithms in the areas of coding, signal processing, and artificial intelligence can be viewed as instances of the message passing algorithm [1, 2]. Specific instances of such algorithms include Kalman filtering and smoothing [3, 4, 5]; the forward-backward algorithm for hidden Markov models [6, 7, 8]; probability propagation in Bayesian networks [9, 10]; and decoding algorithms for error-correcting codes such as low-density parity-check (LDPC) codes [11, 12, 13, 14, 15, 16, 17]. These algorithms operate under a graphical framework, where each node collects incoming messages via its connecting edges, computes outgoing messages, and propagates them back via the same edges.

In particular, we focus on the message passing on a specific node. As shown in Fig. 1, the incoming messages to this node are expressed as a vector \mathbf{x}=(x_{1},x_{2},\ldots,x_{n}), where x_{j} comes from the j-th connecting edge, for each j\in[n]\triangleq\{1,2,\ldots,n\}. Subsequently, the node computes n outgoing messages, represented as \mathbf{y}=(y_{1},y_{2},\ldots,y_{n}), where y_{j}\triangleq f(x_{1},\ldots,x_{j-1},x_{j+1},\ldots,x_{n}) is the outgoing message sent back to the j-th connecting edge and f denotes a general function for computing outgoing messages. Note that when computing y_{j}, the function f does not take x_{j} as an input. This is a well-known computation principle in message passing algorithms, which avoids propagating x_{j}'s message back to the edge it came from.

Figure 1: Message passing on a specific node.

For any j\in[n], the process of computing y_{j} can be abstracted as a directed rooted tree (DRT), whose root is y_{j} and whose leaves are \mathbf{x} excluding x_{j}. In such a DRT, each i-input internal node corresponds to an i-input operator. For example, assume n=7 and y_{j}=f(x_{1},\ldots,x_{j-1},x_{j+1},\ldots,x_{n}), \forall j\in[n]. The computation of \mathbf{y} can be handled by the seven DRTs shown in Fig. 2. The process of computing y_{1} can be represented as y_{1}=f(f(x_{2},x_{3},x_{4}),f(x_{5},x_{6},x_{7})), corresponding to the first DRT in Fig. 2. The computation of any other y_{j} is analogous, and all these DRTs can be handled in parallel. As a result, the complexity of computing \mathbf{y} by these seven DRTs can be measured by fourteen 3-input and seven 2-input nodes, and the latency by one 3-input and one 2-input node.

In fact, we can reuse the computation units corresponding to the same subtree among different DRTs, so that the complexity can be reduced under the same latency. Formally, we use a class of graphs, called structures, to represent this process. For instance, the seven DRTs in Fig. 2 can be united into the structure in Fig. 3, which realizes the same computation of \mathbf{y} but requires lower complexity under the same latency. More specifically, the complexity of the structure can be measured by eight 3-input and seven 2-input nodes. We remark that the structure in Fig. 3 was indeed proposed in [17] for efficiently implementing the node update in stochastic LDPC decoders.

Figure 2: DRTs rooted at \mathbf{y} with n=7.
Figure 3: A structure in [17] for computation of \mathbf{y} with n=7.
Figure 4: A latency-optimal isomorphic-DRT-based structure for computation of \mathbf{y} with n=7.

However, the latency and complexity may vary across different structures. For example, the computing process corresponding to the structure in Fig. 4 can also obtain \mathbf{y} with n=7. The complexity of this structure can be measured by seven 3-input and seven 2-input nodes, and the latency by one 3-input and one 2-input node. Therefore, under the same latency, this structure has a lower complexity (one less 3-input node) than that in Fig. 3. It is always of significant interest to find structures with lower complexity/latency. To this end, we [18] investigated a class of binary-input structures, but the systematic construction of low-complexity and/or low-latency multi-input structures remains open. Here “multi-input” means that there may exist k-input computation nodes for any k\in[2,m]\triangleq\{2,3,\ldots,m\}, where m\geq 2 is the maximum number of inputs for a node and should be regarded as a preset and fixed integer throughout this paper.

In this paper, we focus on low-complexity/latency multi-input structures for computing \mathbf{y} from \mathbf{x}. Specifically, we propose two classes of multi-input structures for different scenarios:

  • For the scenario where complexity has a higher priority than latency, we propose a class of structures, called star-tree-based structures. We derive an algorithm to obtain the complexity-optimal ones of such structures, and illustrate that they have the near-lowest (and sometimes the lowest) complexity among all structures. We further derive an algorithm to obtain the latency-optimal ones among the complexity-optimal star-tree-based structures.

  • For the scenario where latency has a higher priority than complexity, we propose a class of structures, called isomorphic-DRT-based structures. We derive an algorithm to obtain the latency-optimal ones of such structures, and prove that they have the lowest latency among all structures. We further develop a construction for the complexity-optimal ones among the latency-optimal isomorphic-DRT-based structures.

The remainder of this paper is organized as follows. Firstly, Section II presents preliminaries related to graphs and trees. Next, Section III defines the structures under consideration for node computation in this paper. Then, Sections IV and V delve into star-tree-based and isomorphic-DRT-based structures, respectively. Finally, Section VI concludes this paper.

II Preliminaries

In this section, we introduce preliminaries regarding graphs and trees, mainly based on [19]. For convenience, define \mathbb{Z}_{+}\triangleq\{0,1,\ldots\} as the set of non-negative integers. For any i,j\in\mathbb{Z}_{+} with i<j, define [i,j]\triangleq\{i,i+1,\ldots,j\} and [j]\triangleq\{1,2,\ldots,j\}. For any i\in[m-1], let \mathbf{e}_{i} be a vector of length m-1 whose i-th entry is 1 and whose other entries are 0.

II-A Graphs

A graph, denoted as G\triangleq G_{v}\cup G_{e}, consists of a node set G_{v} and an edge set G_{e}. Any element in G_{e} is called a directed (resp. undirected) edge, which is denoted by an ordered (resp. unordered) pair (a,b)\in G_{v}\times G_{v}. The term “ordered” (resp. “unordered”) implies that (a,b)\neq(b,a) (resp. (a,b)=(b,a)). A graph G is a directed (undirected) graph if and only if (iff) each element in G_{e} is a directed (undirected) edge. When graphically represented, directed edges are depicted using arrows, whereas undirected edges are depicted using lines. A graph G^{\prime} is called a subgraph of G if G^{\prime}\subseteq G.

In a directed graph G, an edge (a,b) is referred to as a leaving or outgoing edge of node a, and an entering or incoming edge of node b. In contrast, for an undirected graph G, the edge (a,b) is simply described as an edge of a and b. To describe the operations of removing or adding a node a or an edge (a,b) from/to a graph G, we utilize the notation of subtraction/addition (represented by -/+). Specifically, G-a\triangleq G\setminus(\{a\}\cup\{(a_{1},a_{2})\in G:a_{1}=a~\text{or}~a_{2}=a\}), G+a\triangleq G\cup\{a\}, G-(a,b)\triangleq G\setminus\{(a,b)\}, and G+(a,b)\triangleq G\cup\{a,b\}\cup\{(a,b)\}.

For any graph G and node a\in G, define d(a,G) as the degree of a in G. If G is an undirected graph, d(a,G) is the number of edges connecting with a. If G is a directed graph, d(a,G) is the number of incoming edges of a. For any graph G and i\in\mathbb{Z}_{+}, define d_{i}(G) as the number of degree-i nodes in G. For any graph G, define \mathbf{q}\triangleq(q_{1},q_{2},\ldots,q_{m-1})\in\mathbb{Z}_{+}^{m-1} as the degree vector of G, where for each i\in[m-1], q_{i}=d_{i+2}(G) if G is undirected and q_{i}=d_{i+1}(G) if G is directed.

We call a node sequence P=(v_{0},v_{1},\ldots,v_{k}) a path in G iff (v_{i-1},v_{i})\in G,\forall i\in[k], where k is called the length of P. Moreover, P is called a simple path iff v_{i}\neq v_{j},\forall~0\leq i<j\leq k. For any a,b\in G, we say that a is reachable from b if there exists a path from b to a. The distance from a to b is defined as the length of the shortest path from a to b. If no such path exists, the distance is regarded as \infty. We further call P a cycle if k\geq 2, v_{0}=v_{k}, (v_{0},v_{1})\neq(v_{1},v_{2}), and (v_{1},v_{2},\ldots,v_{k}) is a simple path.

For any directed graph G, we call G a directed acyclic graph (DAG) iff there exists no cycle in G. For any DAG G and a\in G, define E(a,G)\triangleq(\{b\in G:a\text{ is reachable from }b\},\{(b^{\prime},b^{\prime\prime})\in G:a\text{ is reachable from }b^{\prime\prime}\}) as a subgraph of G. Moreover, we define E(G)\triangleq\{E(a,G):a\in G\}. An undirected graph is connected if every node is reachable from all other nodes.
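As a quick illustration (our own, not from [19]), the reachability and distance notions above can be computed by breadth-first search; the sketch below assumes a directed graph stored as an adjacency dict mapping each node to its outgoing neighbours.

```python
from collections import deque

def distance(adj, b, a):
    """Length of the shortest path from b to a in a directed graph,
    matching the convention that a is reachable from b via a path b -> a.
    Returns float('inf') if a is not reachable from b."""
    if b == a:
        return 0
    seen, queue = {b}, deque([(b, 0)])
    while queue:
        v, d = queue.popleft()
        for u in adj.get(v, ()):
            if u == a:
                return d + 1
            if u not in seen:
                seen.add(u)
                queue.append((u, d + 1))
    return float('inf')

# Toy DAG (hypothetical): edges x1 -> u, x2 -> u, u -> y.
adj = {'x1': ['u'], 'x2': ['u'], 'u': ['y']}
```

For this toy DAG, y is reachable from x1 at distance 2, while x1 is not reachable from y.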

II-B Trees

A tree is defined as a connected, acyclic, undirected graph. Given any tree T and any node a\in T, we call a a leaf of T if its degree d(a,T)=1. Otherwise, we call a an internal node.

A rooted tree is a tree in which one node is designated as the root. We consider a rooted tree T, with its root denoted by r(T). The depth of any node a\in T is defined as the distance between a and r(T). Additionally, for a non-negative integer i, the i-th level denotes the set of all nodes with depth i. The height of T is the greatest depth among all its nodes. For any edge (a,b)\in T, if the depth of a is greater than the depth of b, then b is designated as the parent of a, and conversely, a is the child of b.

The directed variant of a rooted tree T, say T^{\prime}, is created by converting each undirected edge (a,b)\in T, where a is a child of b, into a directed edge from a to b in T^{\prime}. We refer to T^{\prime} as a directed rooted tree (DRT), with T being its undirected counterpart. For any node a\in T^{\prime}, E(a,T^{\prime}) is called the subtree of T^{\prime} rooted at a. For any (a,b)\in T, let D(a,b,T)\triangleq E(a,T_{b}), where T_{b} is the DRT obtained by making T a DRT with root b. Let D(T)\triangleq\{D(a,b,T):(a,b)\in T\}.

II-C Labels of Graphs

For any graph G, we call G a labelled graph iff every node in G is given a unique label (as a result, each edge is also given a unique label). Conversely, if not all nodes in G are uniquely labelled, G is termed a partially unlabelled graph.

Two labelled graphs G and G^{\prime} are considered identical, i.e., G=G^{\prime}, iff G and G^{\prime} have the same labelled nodes and edges (and the same root for rooted trees). Meanwhile, two partially unlabelled graphs G and G^{\prime} are deemed identical iff there exists a way to label the unlabelled nodes in both G and G^{\prime} that renders them labelled and identical. Furthermore, graphs G and G^{\prime} are isomorphic if there is a way to reassign labels to all nodes in both graphs, resulting in identically labelled G and G^{\prime}.

III Structures for Node Computation

Recall that \mathbf{x}=(x_{1},x_{2},\ldots,x_{n}) and \mathbf{y}=(y_{1},y_{2},\ldots,y_{n}) represent the incoming and outgoing messages, respectively. In this paper, we focus on the scenario where for each j\in[n], a DRT T_{j} is employed to delineate the computation of y_{j} from \mathbf{x} excluding x_{j}. Specifically, within T_{j}, the leaves correspond to the incoming messages \mathbf{x} excluding x_{j}, the internal nodes correspond to operations whose number of inputs is between 2 and m, and the root r(T_{j}) corresponds to y_{j}. Some examples of such DRTs for n=7 are depicted in Fig. 2.

Define an input set X=\{x_{j}:j\in[n]\} and an output set Y=\{y_{j}:j\in[n]\}, where node x_{j} (resp. y_{j}) corresponds to the j-th incoming message (resp. outgoing message). We remark that in this paper, for any graph G and any node a\in G, a is labelled in G iff a is an input node from X or an output node from Y. Consequently, G is partially unlabelled if G_{v}\setminus(X\cup Y)\neq\emptyset. To describe the computation process, we introduce the structure defined as follows.

Definition 1.

For any DAG S, we say that S is a structure with input size n iff S fulfills the following properties.

  • 1)

    S contains n nodes with no incoming edges, which are exactly X=\{x_{1},x_{2},\ldots,x_{n}\}.

  • 2)

    S contains n nodes with no outgoing edges, which are exactly Y=\{y_{1},y_{2},\ldots,y_{n}\}.

  • 3)

    For any j\in[n], E(y_{j},S) is a DRT whose leaves are exactly X\setminus\{x_{j}\}.

  • 4)

    For any two different nodes a,b\in S, E(a,S)\neq E(b,S).

  • 5)

    For any node a\in S\setminus X, we have 2\leq d(a,S)\leq m.

We remark that Definition 1 corresponds to [18, Definition 2] in the special case where m=2. For any n\geq 2, define \mathcal{S}_{n} as the set of all structures with input size n. We may sometimes use subsets of a structure, referred to as substructures, which are defined below.

Definition 2.

A DAG S^{\prime} is called a substructure iff there exists a structure S satisfying S^{\prime}\subseteq S. For any two substructures S_{1} and S_{2}, denote the union of S_{1} and S_{2} by S_{1}\vee S_{2}, where only one copy of the same subtrees is kept, corresponding to the fourth property of Definition 1.

According to Definition 2, for any two substructures S_{1} and S_{2}, S_{1}\vee S_{2} is still a substructure if S_{1}\cap S_{2}\cap Y=\emptyset. Moreover, for any structure S\in\mathcal{S}_{n}, E(y_{j},S),\forall j\in[n] is a substructure of S, and S=\vee_{j\in[n]}E(y_{j},S) always holds. As a more specific example, the seven DRTs depicted in Fig. 2 are substructures whose union (under the operation \vee) results in the structure in Fig. 3.

In practice, each degree-i node corresponds to an i-input computation unit, whose complexity and latency should increase as i increases. We therefore define the complexity factor and latency factor to describe this fact. Specifically, for each i\in[0,m], denote c_{i} and l_{i} respectively as the complexity factor and latency factor of an i-input computation node. The complexity and latency factors are determined by the actual hardware. In particular, c_{0},c_{1},l_{0}, and l_{1} are set to 0, and it is reasonable to assume c_{i}\leq c_{j} and l_{i}\leq l_{j} for 0\leq i<j\leq m. Accordingly, we can define the complexity and latency of a substructure (as well as a structure) as follows.

Definition 3.

For any substructure SS, the complexity of SS, denoted by c(S)c(S), is defined as the weighted number of nodes in SS:

c(S)\triangleq\sum_{i=2}^{m}d_{i}(S)c_{i}.
Definition 4.

For any substructure SS, the latency of SS is defined as

l(S)\triangleq\max_{P\subseteq S}l(P,S),

where P=(v_{0},v_{1},\ldots,v_{k}) is a path in S and l(P,S)\triangleq\sum_{i=0}^{k}l_{d(v_{i},S)} is the latency of P.

As an example, the complexity and latency of the structure in Fig. 4 are 7c_{2}+7c_{3} and l_{2}+l_{3}, respectively. It is reasonable to use complexity and latency as two key criteria for evaluating the performance of a structure. In this paper, our main purpose is to discover the structures with low complexity and/or latency.
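As an illustration (our own, not prescribed by the paper), Definitions 3 and 4 can be evaluated mechanically on any structure given as a DAG. The sketch below assumes structures are stored as maps from each node to its list of predecessors (incoming edges), and uses illustrative factor values for c_{i} and l_{i}.

```python
from functools import lru_cache

def complexity_latency(preds, c, l):
    """preds: dict mapping each node to its list of predecessors.
    c, l: lists where c[i], l[i] are the factors of an i-input node
    (c[0] = c[1] = l[0] = l[1] = 0, as in the paper).
    Returns (c(S), l(S)) per Definitions 3 and 4."""
    @lru_cache(maxsize=None)
    def lat(v):
        # Latency of the worst path ending at v: v's own factor plus
        # the largest latency among its predecessors.
        d = len(preds[v])
        return l[d] + max((lat(u) for u in preds[v]), default=0)
    comp = sum(c[len(p)] for p in preds.values())  # sum of c_{indegree}
    return comp, max(lat(v) for v in preds)

# Toy example (hypothetical): y = f(f(x1, x2), x3) as a single DRT.
preds = {'x1': [], 'x2': [], 'x3': [], 'u': ['x1', 'x2'], 'y': ['u', 'x3']}
c = [0, 0, 1, 2]  # illustrative: c2 = 1, c3 = 2
l = [0, 0, 1, 2]  # illustrative: l2 = 1, l3 = 2
```

Here the toy DRT has two 2-input nodes, so c(S) = 2c_2 and l(S) = 2l_2 under these assumed factors.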

IV Star-Tree-Based Structures

In this section, we construct structures for the scenario where complexity has a higher priority than latency. To this end, we focus on a class of structures, called star-tree-based structures, which probably have the (near-)lowest complexity among all structures. Specifically, Section IV-A proposes a class of trees, called star trees, and further presents a one-to-one mapping between star trees and star-tree-based structures. Next, Section IV-B establishes a necessary and sufficient condition between degree vectors and the existence of star trees. Thus, the complexity of a star-tree-based structure is completely determined by the degree vector of the corresponding star tree. Therefore, Theorem 1 is proposed to find the degree vectors that lead to the complexity-optimal star-tree-based structures. Finally, Section IV-C derives Theorem 2 to find a latency-optimal star-tree-based structure S corresponding to a given degree vector \mathbf{q}. As a result, if \mathbf{q} is the uniquely optimal degree vector obtained from Theorem 1, then S is a complexity-optimal star-tree-based structure which also has the lowest latency among all complexity-optimal star-tree-based structures.

IV-A Star Trees and Star-Tree-Based Structures

In this subsection, we propose a class of trees, called star trees, in Definition 5. Next, Construction 1 and Lemma 1 illustrate that, based on any star tree T, we can obtain a low-complexity structure, denoted as h(T), which is called a star-tree-based structure.

Definition 5 (star trees).

An n-input star tree T is an (undirected) tree fulfilling the following properties.

  • T has n leaves, which are exactly X.

  • The degree of each internal node in T is between 3 and m+1.

Figure 5: The only star tree T\in\mathcal{T}_{3}.
Figure 6: An example of the transformation from a star tree to DRTs.

Denote \mathcal{T}_{n} as the set of all n-input star trees. For example, Fig. 5 shows the only star tree in \mathcal{T}_{3}, and Fig. 6(a) shows a star tree in \mathcal{T}_{7}. In fact, we can obtain a structure based on any star tree. For instance, consider the star tree in Fig. 6(a). First, making x_{1} the root and changing undirected edges to directed ones leads to the DRT in Fig. 6(b). Second, removing x_{1} and labelling the new root as y_{1} results in the DRT in Fig. 6(c), which can be used for computing y_{1} from X\setminus\{x_{1}\}. Third, repeating the above steps with x_{1} and y_{1} replaced by x_{j} and y_{j}, \forall j\in[2,7], respectively, leads to the seven DRTs in Fig. 7. Finally, these DRTs can be united (under the union operation \vee) into the structure in Fig. 8. We formally describe the construction from star trees to structures in Construction 1, and use Lemma 1 to state its correctness.

Figure 7: The DRTs transformed from the star tree in Fig. 6(a).
Construction 1.

Given a star tree T𝒯nT\in\mathcal{T}_{n}, return

h(T)\triangleq\vee_{j\in[n],(a,x_{j})\in T}D(a,x_{j},T)

as the constructed star-tree-based structure corresponding to T, where D(a,x_{j},T) is a DRT defined in Section II-B.

All the structures derived from star trees via Construction 1 are called star-tree-based structures. For example, considering n=7 and the star tree in Fig. 6(a), the DRTs D(a,x_{j},T),\forall j\in[n] correspond to the seven DRTs in Fig. 7, and the star-tree-based structure h(T) corresponds to the structure in Fig. 8.

Lemma 1.

For any n\geq 3 and T\in\mathcal{T}_{n}, we have h(T)\in\mathcal{S}_{n}.

Proof:

For any n\geq 3, T\in\mathcal{T}_{n}, and j\in[n], x_{j} is a leaf in T. Let (a,x_{j})\in T be the only edge of x_{j}. Then D(a,x_{j},T) is a DRT with root y_{j} and leaves X\setminus\{x_{j}\}. As a result, we have h(T)\in\mathcal{S}_{n}. ∎

Figure 8: A complexity-optimal star-tree-based structure for computation of \mathbf{y} with n=7.

According to Lemma 1, Construction 1 is a one-to-one mapping from star trees to star-tree-based structures. As shown in Fig. 8, for any j\in[7], each computation node either can reach y_{j} or is reachable from x_{j}. Noting that in any structure, a node reachable from x_{j} can never be contained in the DRT rooted at y_{j}, the computation nodes in Fig. 8 are contained in as many DRTs in Fig. 7 as possible, indicating that they are fully reused. In particular, it was proved in [18] that for m=2, the star-tree-based structures have the lowest complexity among all binary-input structures. This implies that for m>2, star-tree-based structures are likely to have the near-lowest (and sometimes the lowest) complexity among all structures. We thus focus on star-tree-based structures in this section.

IV-B The Lowest Complexity of Star-Tree-Based Structures

In this subsection, first, Lemma 2 illustrates that the complexity of a star-tree-based structure is totally determined by the degree vector of the corresponding star tree. Next, Lemma 3 establishes a necessary and sufficient condition between degree vectors and the existence of star trees. Based on these preparations, we finally formulate Problem 1 and then solve it in Theorem 1, aiming to obtain complexity-optimal star-tree-based structures via optimizing degree vectors.

Lemma 2.

For any T\in\mathcal{T}_{n}, the complexity of h(T) is

c(h(T))=\sum_{i=3}^{m+1}id_{i}(T)c_{i-1}, (1)

where d_{i}(T) denotes the number of degree-i nodes in T and c_{i-1} is the complexity factor of (i-1)-input nodes.

Proof:

It holds that E(h(T))=\cup_{j\in[n],(a,x_{j})\in T}E(D(a,x_{j},T))=D(T), where the notations can be found in Section II. As a result, we have

c(h(T)) = \sum_{a\in h(T)}c_{d(a,h(T))} = \sum_{T^{\prime}\in E(h(T))}c_{d(r(T^{\prime}),T^{\prime})} = \sum_{T^{\prime}\in D(T)}c_{d(r(T^{\prime}),T^{\prime})} = \sum_{(a,b)\in T}c_{d(a,D(a,b,T))} = \sum_{(a,b)\in T}c_{d(a,T)-1} = \sum_{i=3}^{m+1}id_{i}(T)c_{i-1}. ∎

Recall that the degree vector of T is \mathbf{q}=(d_{3}(T),d_{4}(T),\ldots,d_{m+1}(T)). Therefore, Lemma 2 indicates that c(h(T)) is fully determined by \mathbf{q}. We are interested in optimizing degree vectors to generate complexity-optimal star-tree-based structures. Since Construction 1 establishes a one-to-one mapping between star trees and star-tree-based structures, the remaining question is what degree vectors can result in valid n-input star trees. The following lemma answers this question.

Lemma 3.

For any n\geq 3, there exists a star tree T\in\mathcal{T}_{n} with degree vector \mathbf{q}=(q_{1},q_{2},\ldots,q_{m-1}) iff

2+\sum_{i=1}^{m-1}iq_{i}=n. (2)
Proof:

Necessity: Let T\in\mathcal{T}_{n}. We prove that the degree vector \mathbf{q} of T satisfies (2). Firstly, for n=3, there exists only one star tree, as shown in Fig. 5, and (2) holds obviously. Secondly, for n^{\prime}\geq 4, assume (2) holds for any n\in[3,n^{\prime}-1]. We now prove that (2) holds for n=n^{\prime}. If T contains only one internal node, (2) holds obviously. If T contains at least two internal nodes, there always exists an internal node a that connects with one internal node and d(a,T)-1 leaves. Without loss of generality, denote the d(a,T)-1 leaves connecting with a as x_{n^{\prime}},x_{n^{\prime}-1},\ldots,x_{n^{\prime}-d(a,T)+2}, respectively. Then, let T^{\prime}=T-x_{n^{\prime}}-x_{n^{\prime}-1}-\cdots-x_{n^{\prime}-d(a,T)+2}, and relabel a as x_{n^{\prime}-d(a,T)+2}. We can verify that T^{\prime}\in\mathcal{T}_{n^{\prime}-d(a,T)+2}, whose degree vector is \mathbf{q}-\mathbf{e}_{d(a,T)-2}. Since (2) holds for n=n^{\prime}-d(a,T)+2, we have 2+(\sum_{i=1}^{m-1}iq_{i})-(d(a,T)-2)=n^{\prime}-(d(a,T)-2), leading to 2+\sum_{i=1}^{m-1}iq_{i}=n^{\prime}, i.e., (2) holds for n=n^{\prime}. Therefore, the necessity is proved.

Sufficiency: Suppose the degree vector \mathbf{q} satisfies (2). We provide Construction 2 to prove that there always exists a star tree T\in\mathcal{T}_{n} with degree vector \mathbf{q}.

Construction 2.

For a given degree vector \mathbf{q}, we construct a star tree by the following steps.

Step 1: Select an i\in[m-1] with q_{i}>0. Construct an undirected tree T which contains one degree-(i+2) internal node and i+2 leaves. Decrease q_{i} by 1.

Step 2: If \mathbf{q}=\mathbf{0}, go to Step 3. Otherwise, select an i\in[m-1] with q_{i}>0 and select an arbitrary leaf b\in T. Connect i+1 new leaves to b, so that b becomes a degree-(i+2) internal node. We still use T to denote the new tree and decrease q_{i} by 1. Repeat Step 2.

Step 3: Label the leaves in T as x_{1},\ldots,x_{n} and keep the internal nodes unlabelled. Return T as the constructed star tree.
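The steps above can be sketched in code (our own transcription; trees are assumed to be stored as adjacency dicts with fresh integer ids for nodes, and the leaf labelling of Step 3 is left implicit). The final leaf count also lets one check the condition (2) of Lemma 3 numerically.

```python
def build_star_tree(q):
    """Construction 2 sketch. q: dict mapping i -> q_i, the number of
    internal nodes of degree i+2. Returns (adjacency dict, list of leaves)."""
    q = dict(q)
    counter = [0]
    def new_node():
        counter[0] += 1
        return counter[0]
    adj = {}
    def add_edge(a, b):
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    leaves = []
    # Step 1: one internal node of degree i+2 together with i+2 leaves.
    i = next(k for k, v in q.items() if v > 0)
    root = new_node()
    for _ in range(i + 2):
        leaf = new_node()
        add_edge(root, leaf)
        leaves.append(leaf)
    q[i] -= 1
    # Step 2: while q is nonzero, turn an arbitrary leaf b into a
    # degree-(i+2) internal node by attaching i+1 new leaves.
    while any(v > 0 for v in q.values()):
        i = next(k for k, v in q.items() if v > 0)
        b = leaves.pop()
        for _ in range(i + 1):
            leaf = new_node()
            add_edge(b, leaf)
            leaves.append(leaf)
        q[i] -= 1
    return adj, leaves

# Degree vector (q1, q2) = (1, 2) for m = 3: by (2), n = 2 + 1*1 + 2*2 = 7.
adj, leaves = build_star_tree({1: 1, 2: 2})
```

Running this yields a tree with 7 leaves, one degree-3 internal node, and two degree-4 internal nodes, matching the chosen degree vector.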

For any n\geq 3, there exists at least one degree vector \mathbf{q}=(q_{1},q_{2},\ldots,q_{m-1}) satisfying (2), from which a star tree can be generated via Construction 2. Note that Construction 2 may result in multiple star trees, since there are multiple feasible choices of q_{i} (1\leq i<m) from \mathbf{q} in Steps 1 and 2. Combined with Lemma 1, \mathbf{q} thus corresponds to multiple star-tree-based structures. Denote \mathcal{S}^{\mathrm{star}}(\mathbf{q}) as the set of all the star-tree-based structures derived from \mathbf{q}. Since the complexity of any structure in \mathcal{S}^{\mathrm{star}}(\mathbf{q}) is the same by Lemma 2, it suffices to optimize degree vectors to generate complexity-optimal star-tree-based structures. Specifically, define C_{\min}^{\mathrm{star}}(n)\triangleq\min_{T\in\mathcal{T}_{n}}c(h(T)) as the minimum complexity of star-tree-based structures with input size n. Computing C_{\min}^{\mathrm{star}}(n) is equivalent to solving the following optimization problem.

Problem 1: C_{\min}^{\mathrm{star}}(n)=\min_{\mathbf{q}}\sum_{i=1}^{m-1}(i+2)q_{i}c_{i+1}
s.t. 2+\sum_{i=1}^{m-1}iq_{i}=n.

It remains to solve Problem 1, which can be done by Algorithm 1. The following theorem establishes the correctness and time complexity of Algorithm 1.

Algorithm 1 Finding an optimal solution to Problem 1
Input: The input size n.
Output: C_{\min}^{\mathrm{star}}(n).
1:  Initialize C_{\min}^{\mathrm{star}}(2)=0.
2:  for i=3 to n do
3:     C_{\min}^{\mathrm{star}}(i)=\min_{t\in[m-1],t\leq i-2}\{C_{\min}^{\mathrm{star}}(i-t)+(t+2)c_{t+1}\}.
4:  end for
5:  return C_{\min}^{\mathrm{star}}(n).
Theorem 1.

C_{\min}^{\mathrm{star}}(n) can be computed by Algorithm 1 with time complexity O(mn).

Proof:

In Algorithm 1, as shown in Line 1, we have C_{\min}^{\mathrm{star}}(2)=0, since y_{1}=x_{2} and y_{2}=x_{1} for n=2, and no computation nodes are required. Then, for any i\in[3,n], assume \mathbf{q} is an optimal solution to Problem 1 for C_{\min}^{\mathrm{star}}(i). For any t\in[m-1] with q_{t}>0, let \mathbf{q}^{\prime}=\mathbf{q}-\mathbf{e}_{t}, where \mathbf{e}_{t} is a vector of length m-1 whose t-th entry is 1 and whose other entries are 0. Then \mathbf{q}^{\prime} is an optimal solution for C_{\min}^{\mathrm{star}}(i-t), leading to C_{\min}^{\mathrm{star}}(i)=C_{\min}^{\mathrm{star}}(i-t)+(t+2)c_{t+1}. Therefore, C_{\min}^{\mathrm{star}}(i)=\min_{t\in[m-1],t\leq i-2}\{C_{\min}^{\mathrm{star}}(i-t)+(t+2)c_{t+1}\} holds, corresponding to Line 3 of Algorithm 1.

Noting that Line 3 in Algorithm 1 has time complexity O(m) and is carried out O(n) times, the time complexity of Algorithm 1 is O(mn). ∎

By Algorithm 1, we are able to determine the lowest complexity C_{\min}^{\mathrm{star}}(n) of star-tree-based structures with input size n. Further, by backtracking, we can obtain one (or all, if needed) optimal degree vector \mathbf{q}, which leads to the complexity-optimal star-tree-based structures \mathcal{S}^{\mathrm{star}}(\mathbf{q}). However, the structures in \mathcal{S}^{\mathrm{star}}(\mathbf{q}) may have different latency. We are interested in further obtaining the latency-optimal structures in \mathcal{S}^{\mathrm{star}}(\mathbf{q}) in the following subsection.
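As an illustration of Algorithm 1 together with the backtracking step just mentioned, the following sketch (our own; the factor values c_2 = 1, c_3 = 2 are illustrative assumptions, not from the paper) computes C_{\min}^{\mathrm{star}}(n) and one optimal degree vector.

```python
def min_star_complexity(n, m, c):
    """Algorithm 1 with backtracking. c: list with c[i] the complexity
    factor of an i-input node (c[0] = c[1] = 0).
    Returns (C_min^star(n), one optimal degree vector q of length m-1)."""
    INF = float('inf')
    C = [INF] * (n + 1)
    choice = [0] * (n + 1)  # the t minimizing Line 3 at each i
    C[2] = 0                # Line 1: no computation nodes needed for n = 2
    for i in range(3, n + 1):           # Line 2
        for t in range(1, m):           # t in [m-1]
            if t <= i - 2 and C[i - t] + (t + 2) * c[t + 1] < C[i]:
                C[i] = C[i - t] + (t + 2) * c[t + 1]   # Line 3
                choice[i] = t
    # Backtrack one optimal degree vector (q_1, ..., q_{m-1}).
    q, i = [0] * (m - 1), n
    while i > 2:
        t = choice[i]
        q[t - 1] += 1
        i -= t
    return C[n], q

# Example: n = 7, m = 3, illustrative factors c2 = 1, c3 = 2.
result = min_star_complexity(7, 3, [0, 0, 1, 2])
```

Under these assumed factors, each t = 1 step costs 3c_2 = 3 per unit of n while t = 2 costs 4c_3 = 8 per two units, so the optimum for n = 7 uses five degree-3 internal nodes.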

IV-C The Lowest Latency of Star-Tree-Based Structures for Given Degree Vectors

In this subsection, first, Lemma 4 defines the latency of star trees and shows that it fully characterizes the latency of the corresponding star-tree-based structures. Then, Lemma 5 illustrates how to compute the latency of star trees. Based on these preparations, we finally formulate Problem 2 and then solve it in Theorem 2, aiming to obtain the latency-optimal structures in \mathcal{S}^{\mathrm{star}}(\mathbf{q}) via minimizing the latency of star trees corresponding to a given degree vector \mathbf{q}.

Lemma 4.

For any star tree TT, define

ϕ(T)max{i=0kld(vi,T)1:(v0,v1,,vk) is a simple path in T}\phi(T)\triangleq\max\left\{\sum_{i=0}^{k}l_{d(v_{i},T)-1}:(v_{0},v_{1},\ldots,v_{k})\text{ is a simple path in }T\right\}

as the latency of TT. Then, l(h(T))=ϕ(T)l(h(T))=\phi(T).

Proof:

Assume the star tree T𝒯nT\in\mathcal{T}_{n}. Obviously, there exists a simple path (v0,v1,,vk)T(v_{0},v_{1},\ldots,v_{k})\subseteq T iff there exists a path (v0,v1,,vk1)h(T)(v^{\prime}_{0},v^{\prime}_{1},\ldots,v^{\prime}_{k-1})\subseteq h(T), where d(vi,h(T))=d(vi,T)1d(v^{\prime}_{i},h(T))=d(v_{i},T)-1 for each i[0,k1]i\in[0,k-1]. As a result, we have l(h(T))=max(v0,v1,,vk1)h(T){i=0k1ld(vi,h(T))}=max(v0,v1,,vk)T{i=0kld(vi,T)1}=ϕ(T)l(h(T))=\max_{(v^{\prime}_{0},v^{\prime}_{1},\ldots,v^{\prime}_{k-1})\subseteq h(T)}\{\sum_{i=0}^{k-1}l_{d(v_{i},h(T))}\}=\max_{(v_{0},v_{1},\ldots,v_{k})\subseteq T}\{\sum_{i=0}^{k}l_{d(v_{i},T)-1}\}=\phi(T). ∎

As shown in Lemma 4, minimizing the latency of structures derived from the star trees with a given degree vector is equivalent to minimizing the latency of the star trees. The following lemma shows how to compute the latency of a star tree.

Lemma 5.

For any star tree TT with n3n\geq 3, there always exists an edge (a,b)T(a,b)\in T such that

{l(D(a,b,T))ld(a,T)1l(D(b,a,T)),l(D(a,b,T))>l(D(b,a,T)).\displaystyle\left\{\begin{array}[]{l}l(D(a,b,T))-l_{d(a,T)-1}\leq l(D(b,a,T)),\\ l(D(a,b,T))>l(D(b,a,T)).\end{array}\right. (5)

If (a,b)(a,b) satisfies (5), we have

ϕ(T)=l(D(a,b,T))+l(D(b,a,T)).\phi(T)=l(D(a,b,T))+l(D(b,a,T)). (6)
Proof:

Suppose (v0,v1,,vk)(v_{0},v_{1},\ldots,v_{k}) is a simple path in TT such that i=0kld(vi,T)1=ϕ(T)\sum_{i=0}^{k}l_{d(v_{i},T)-1}=\phi(T). There always exists an i[k]i\in[k] such that (5) holds for a=vi1a=v_{i-1} and b=vib=v_{i}. This completes the proof of (5).

We are now to prove (6). Suppose (a,b)T(a,b)\in T is an arbitrary edge satisfying (5). According to Definition 4, l(D(a,b,T))l(D(a,b,T)) (resp. l(D(b,a,T))l(D(b,a,T))) denotes the latency of the path in D(a,b,T)D(a,b,T) (resp. D(b,a,T)D(b,a,T)) from a leaf to aa (resp. bb), and we let this leaf be xix_{i} (resp. xjx_{j}), without loss of generality. Then, we have l(D(a,b,T))+l(D(b,a,T))=i=0kld(vi,T)1l(D(a,b,T))+l(D(b,a,T))=\sum_{i=0}^{k}l_{d(v_{i},T)-1}, where (v0,v1,,vk)(v_{0},v_{1},\ldots,v_{k}) is a simple path in TT with v0=xiv_{0}=x_{i} and vk=xjv_{k}=x_{j}. Therefore, we have ϕ(T)l(D(a,b,T))+l(D(b,a,T))\phi(T)\geq l(D(a,b,T))+l(D(b,a,T)).

To prove (6), it now suffices to prove ϕ(T)l(D(a,b,T))+l(D(b,a,T))\phi(T)\leq l(D(a,b,T))+l(D(b,a,T)). Let xix_{i}, xjx_{j} be two arbitrary leaves in TT such that there exists a simple path connecting them whose latency is ϕ(T)\phi(T). If (xiD(a,b,T)(x_{i}\in D(a,b,T) and xjD(b,a,T))x_{j}\in D(b,a,T)) or (xjD(a,b,T)(x_{j}\in D(a,b,T) and xiD(b,a,T))x_{i}\in D(b,a,T)), we have ϕ(T)l(D(a,b,T))+l(D(b,a,T))\phi(T)\leq l(D(a,b,T))+l(D(b,a,T)). If xi,xjD(a,b,T)x_{i},x_{j}\in D(a,b,T), we have ϕ(T)2l(D(a,b,T))ld(a,T)1l(D(a,b,T))+l(D(b,a,T))\phi(T)\leq 2l(D(a,b,T))-l_{d(a,T)-1}\leq l(D(a,b,T))+l(D(b,a,T)), where the last inequality follows from the first condition in (5). If xi,xjD(b,a,T)x_{i},x_{j}\in D(b,a,T), we have ϕ(T)2l(D(b,a,T))ld(b,T)1<l(D(a,b,T))+l(D(b,a,T))\phi(T)\leq 2l(D(b,a,T))-l_{d(b,T)-1}<l(D(a,b,T))+l(D(b,a,T)), where the last inequality follows from the second condition in (5). Therefore, ϕ(T)l(D(a,b,T))+l(D(b,a,T))\phi(T)\leq l(D(a,b,T))+l(D(b,a,T)). ∎

Lemma 5 indicates that the minimum latency of star trees with degree vector 𝐪\mathbf{q} is equal to the minimum value of l(D1)+l(D2)l(D_{1})+l(D_{2}), where D1D_{1} and D2D_{2} are two DRTs (respectively corresponding to D(a,b,T)D(a,b,T) and D(b,a,T)D(b,a,T) in Lemma 5) satisfying the following three conditions: (i) l(D1)ld(r(D1),D1)l(D2)l(D_{1})-l_{d(r(D_{1}),D_{1})}\leq l(D_{2}), (ii) l(D1)>l(D2)l(D_{1})>l(D_{2}), and (iii) di+1(D1)+di+1(D2)=qi,i[m1]d_{i+1}(D_{1})+d_{i+1}(D_{2})=q_{i},\forall i\in[m-1], i.e., the summation of D1D_{1} and D2D_{2}’s degree vectors equals 𝐪\mathbf{q}. As a result, computing Lminstar(𝐪)minS𝒮star(𝐪)l(S)L_{\min}^{\mathrm{star}}(\mathbf{q})\triangleq\min_{S\in\mathcal{S}^{\mathrm{star}}(\mathbf{q})}l(S), the minimum latency of the star-tree-based structures derived from degree vector 𝐪\mathbf{q}, is equivalent to solving the following optimization problem.

Problem 2: Lminstar(𝐪)=minD1,D2l(D1)+l(D2)\displaystyle L_{\min}^{\mathrm{star}}(\mathbf{q})=\min\limits_{D_{1},D_{2}}\,l(D_{1})+l(D_{2})
s.t.\displaystyle\,s.t.\quad l(D1)ld(r(D1),D1)l(D2),\displaystyle l(D_{1})-l_{d(r(D_{1}),D_{1})}\leq l(D_{2}), (7)
l(D1)>l(D2),\displaystyle l(D_{1})>l(D_{2}), (8)
di+1(D1)+di+1(D2)=qi,i[m1].\displaystyle d_{i+1}(D_{1})+d_{i+1}(D_{2})=q_{i},\forall i\in[m-1]. (9)

To this end, for any degree vector 𝐮\mathbf{u} and positive integer tt, define τ(𝐮,t)=minD1,,Dt\tau(\mathbf{u},t)=\min_{D^{\prime}_{1},...,D^{\prime}_{t}} max{l(D1),,l(Dt)}\max\{l(D^{\prime}_{1}),...,l(D^{\prime}_{t})\}, where D1,,DtD^{\prime}_{1},...,D^{\prime}_{t} are DRTs and the summation of their degree vectors equals 𝐮\mathbf{u}. Here, τ(𝐮,t)\tau(\mathbf{u},t) returns the lowest latency of the substructures that have the degree vector 𝐮\mathbf{u} and consist of tt disjoint DRTs. It will be computed later in Algorithm 3. For any two degree vectors 𝐮\mathbf{u}, 𝐮+m1\mathbf{u}^{\prime}\in\mathbb{Z}_{+}^{m-1}, we say 𝐮𝐮\mathbf{u}\preceq\mathbf{u}^{\prime} iff for each i[m1]i\in[m-1], uiuiu_{i}\leq u_{i}^{\prime}. Then, we can solve Problem 2 by Algorithm 2.

Algorithm 2 Finding optimal solution to Problem 2
Input:  𝐪\mathbf{q}.
Output:  Lminstar(𝐪)L_{\min}^{\mathrm{star}}(\mathbf{q}).
1:  Initialize Lminstar(𝐪)=L_{\min}^{\mathrm{star}}(\mathbf{q})=\infty. %𝐮\mathbf{u} and 𝐪𝐮\mathbf{q}-\mathbf{u} are the degree vectors of D1D_{1} and D2D_{2}, respectively
2:  for each 𝐮𝐪\mathbf{u}\preceq\mathbf{q} do
3:     %The degree of D1D_{1}’s root is d(r(D1),D1)=i+1d(r(D_{1}),D_{1})=i+1
4:     for i=1i=1 to m1m-1 do
5:        if ui1u_{i}\geq 1 then
6:           %As 𝐮\mathbf{u} and ii are given, the minimum values of l(D1)l(D_{1}) and l(D2)l(D_{2}) are τ(𝐮𝐞i,i+1)+li+1\tau(\mathbf{u}-\mathbf{e}_{i},i+1)+l_{i+1} and τ(𝐪𝐮,1)\tau(\mathbf{q}-\mathbf{u},1), respectively
7:           if τ(𝐮𝐞i,i+1)τ(𝐪𝐮,1)\tau(\mathbf{u}-\mathbf{e}_{i},i+1)\leq\tau(\mathbf{q}-\mathbf{u},1) (i.e., Eq. (7)) and τ(𝐮𝐞i,i+1)+li+1>τ(𝐪𝐮,1)\tau(\mathbf{u}-\mathbf{e}_{i},i+1)+l_{i+1}>\tau(\mathbf{q}-\mathbf{u},1) (i.e., Eq. (8)) then
8:              l=τ(𝐮𝐞i,i+1)+li+1+τ(𝐪𝐮,1)l=\tau(\mathbf{u}-\mathbf{e}_{i},i+1)+l_{i+1}+\tau(\mathbf{q}-\mathbf{u},1). %The minimum value of l(D1)+l(D2)l(D_{1})+l(D_{2}) for given 𝐮\mathbf{u} and ii
9:              Lminstar(𝐪)=min{Lminstar(𝐪),l}L_{\min}^{\mathrm{star}}(\mathbf{q})=\min\{L_{\min}^{\mathrm{star}}(\mathbf{q}),l\}.
10:           end if
11:        end if
12:     end for
13:  end for
14:  return  Lminstar(𝐪)L_{\min}^{\mathrm{star}}(\mathbf{q}).
Theorem 2.

Given a degree vector 𝐪\mathbf{q} and τ(𝐮,t)\tau(\mathbf{u},t), 𝐮𝐪,t[m]\forall\mathbf{u}\preceq\mathbf{q},t\in[m], Algorithm 2 can compute Lminstar(𝐪)L_{\min}^{\mathrm{star}}(\mathbf{q}) with time complexity O(mi=1m1(qi+1))O(m\prod_{i=1}^{m-1}(q_{i}+1)).

Proof:

Let 𝐮\mathbf{u} be the degree vector of D1D_{1} and i+1i+1 be the degree of D1D_{1}’s root. According to constraint (9), the degree vector of D2D_{2} must be 𝐪𝐮\mathbf{q}-\mathbf{u}. Therefore, as shown in Lines 2–13, by traversing 𝐮𝐪\mathbf{u}\preceq\mathbf{q} and ii from 1 to m1m-1, we can enumerate all the feasible solutions and find the one with the lowest l(D1)+l(D2)l(D_{1})+l(D_{2}). Specifically, given 𝐮\mathbf{u} and ii, the minimum latencies of D1D_{1} and D2D_{2} are τ(𝐮𝐞i,i+1)+li+1\tau(\mathbf{u}-\mathbf{e}_{i},i+1)+l_{i+1} and τ(𝐪𝐮,1)\tau(\mathbf{q}-\mathbf{u},1) respectively, and the conditions in Line 7 correspond to the constraints (7) and (8).

In Algorithm 2, Line 8 is carried out O(mi=1m1(qi+1))O(m\prod_{i=1}^{m-1}(q_{i}+1)) times, each of which has time complexity O(1)O(1). Therefore, the time complexity of Algorithm 2 is O(mi=1m1(qi+1))O(m\prod_{i=1}^{m-1}(q_{i}+1)). ∎

At this point, the remaining problem is to compute τ(𝐮,t)\tau(\mathbf{u},t), 𝐮𝐪,t[m]\forall\mathbf{u}\preceq\mathbf{q},t\in[m]. We complete this task in Algorithm 3.

Algorithm 3 Computation of τ(𝐮,t),𝐮𝐪,t[m].\tau(\mathbf{u},t),\forall\mathbf{u}\preceq\mathbf{q},t\in[m].
Input:  𝐪\mathbf{q}.
Output:  τ(𝐮,t),𝐮𝐪,t[m].\tau(\mathbf{u},t),\forall\mathbf{u}\preceq\mathbf{q},t\in[m].
1:  for t=1t=1 to mm do
2:     τ(𝟎,t)=0\tau(\mathbf{0},t)=0.
3:  end for
4:  for each 𝐮𝐪\mathbf{u}\preceq\mathbf{q}, in non-decreasing order of 𝐮1||\mathbf{u}||_{1},  do
5:     for t=1t=1 to mm do
6:        %Suppose GG is an arbitrary substructure that has degree vector 𝐮\mathbf{u} and consists of tt disjoint DRTs
7:        if t==1t==1 then
8:           % GG is a DRT. Fixing the degree of its root as d(r(G),G)=i+1d(r(G),G)=i+1, the graph Gr(G)G-r(G) has degree vector 𝐮𝐞i\mathbf{u}-\mathbf{e}_{i} and consists of i+1i+1 disjoint DRTs
9:           τ(𝐮,t)=mini[m1],ui>0{τ(𝐮𝐞i,i+1)+li+1}\tau(\mathbf{u},t)=\min_{i\in[m-1],u_{i}>0}\{\tau(\mathbf{u}-\mathbf{e}_{i},i+1)+l_{i+1}\}.
10:        else
11:           % GG consists of tt disjoint DRTs, named D1,D2,,DtD_{1},D_{2},\ldots,D_{t}. Fixing the degree vector of D1D_{1} as 𝐮\mathbf{u}^{\prime}, the graph GD1G\setminus D_{1} has degree vector 𝐮𝐮\mathbf{u}-\mathbf{u}^{\prime} and consists of t1t-1 disjoint DRTs
12:           τ(𝐮,t)=min𝐮𝐮max{τ(𝐮,1),τ(𝐮𝐮,t1)}\tau(\mathbf{u},t)=\min_{\mathbf{u}^{\prime}\preceq\mathbf{u}}\max\{\tau(\mathbf{u}^{\prime},1),\tau(\mathbf{u}-\mathbf{u}^{\prime},t-1)\}.
13:        end if
14:     end for
15:  end for
16:  return  τ(𝐮,t),𝐮𝐪,t[m].\tau(\mathbf{u},t),\forall\mathbf{u}\preceq\mathbf{q},t\in[m].
Lemma 6.

Given a degree vector 𝐪\mathbf{q}, then τ(𝐮,t),𝐮𝐪,t[m]\tau(\mathbf{u},t),\forall\mathbf{u}\preceq\mathbf{q},t\in[m] can be computed by Algorithm 3 with time complexity O(mi=1m1(qi+1)(qi+2)2)O(m\prod_{i=1}^{m-1}\frac{(q_{i}+1)(q_{i}+2)}{2}).

Proof:

Firstly, as shown in Line 2, for any t[m]t\in[m], we have τ(𝟎,t)=0\tau(\mathbf{0},t)=0, since there exist no computation nodes in the graph. Then, for any degree vector 𝐮𝐪\mathbf{u}\preceq\mathbf{q} and t[m]t\in[m], suppose GG is an arbitrary substructure that has degree vector 𝐮\mathbf{u} and consists of tt disjoint DRTs. If t=1t=1, the substructure GG is a DRT. Fixing the degree of its root as d(r(G),G)=i+1d(r(G),G)=i+1, the graph Gr(G)G-r(G) has degree vector 𝐮𝐞i\mathbf{u}-\mathbf{e}_{i} and consists of i+1i+1 disjoint DRTs. Therefore, when fixing d(r(G),G)=i+1d(r(G),G)=i+1, the minimum latency of GG is τ(𝐮𝐞i,i+1)+li+1\tau(\mathbf{u}-\mathbf{e}_{i},i+1)+l_{i+1}. As shown in Line 9, by traversing all the possible ii, we can find the minimum latency of GG:

τ(𝐮,t)=mini[m1],ui>0{τ(𝐮𝐞i,i+1)+li+1}.\tau(\mathbf{u},t)=\min_{i\in[m-1],u_{i}>0}\{\tau(\mathbf{u}-\mathbf{e}_{i},i+1)+l_{i+1}\}. (10)

If t>1t>1, the substructure GG consists of tt disjoint DRTs, named D1,D2,,DtD_{1},D_{2},\ldots,D_{t}. Fixing the degree vector of D1D_{1} as 𝐮\mathbf{u}^{\prime}, the graph GD1G\setminus D_{1} has degree vector 𝐮𝐮\mathbf{u}-\mathbf{u}^{\prime} and consists of t1t-1 disjoint DRTs. Therefore, when fixing the degree vector of D1D_{1} as 𝐮\mathbf{u}^{\prime}, the minimum latency of GG is max{τ(𝐮,1),τ(𝐮𝐮,t1)}\max\{\tau(\mathbf{u}^{\prime},1),\tau(\mathbf{u}-\mathbf{u}^{\prime},t-1)\}. As shown in Line 12, by traversing all the possible 𝐮\mathbf{u}^{\prime}, we can find the minimum latency of GG:

τ(𝐮,t)=min𝐮𝐮max{τ(𝐮,1),τ(𝐮𝐮,t1)}.\tau(\mathbf{u},t)=\min_{\mathbf{u}^{\prime}\preceq\mathbf{u}}\max\{\tau(\mathbf{u}^{\prime},1),\tau(\mathbf{u}-\mathbf{u}^{\prime},t-1)\}. (11)

As shown in Lines 4–15, we enumerate all the possible 𝐮\mathbf{u} and tt and compute τ(𝐮,t)\tau(\mathbf{u},t). Here, we enumerate 𝐮\mathbf{u} in non-decreasing order of 𝐮1||\mathbf{u}||_{1} and enumerate tt from 1 to mm, so that when computing τ(𝐮,t)\tau(\mathbf{u},t), the required values of τ(𝐮,t)\tau(\mathbf{u}^{\prime},t^{\prime}) have already been computed. Therefore, given 𝐪\mathbf{q}, Algorithm 3 can obtain τ(𝐮,t),𝐮𝐪,t[m]\tau(\mathbf{u},t),\forall\mathbf{u}\preceq\mathbf{q},t\in[m].

In the following, we investigate the time complexity of Algorithm 3. In Line 4, we enumerate 𝐮𝐪\mathbf{u}\preceq\mathbf{q}. In Line 5, we enumerate tt from 11 to mm. And in Line 12, we enumerate 𝐮𝐮\mathbf{u}^{\prime}\preceq\mathbf{u}. Thus, the number of computations is at most 𝐮𝐪m|{𝐮:𝐮𝐮}|\sum_{\mathbf{u}\preceq\mathbf{q}}m|\{\mathbf{u}^{\prime}:\mathbf{u}^{\prime}\preceq\mathbf{u}\}|. We can simplify it as follows:

𝐮𝐪m|{𝐮:𝐮𝐮}|\displaystyle\sum_{\mathbf{u}\preceq\mathbf{q}}m|\{\mathbf{u}^{\prime}:\mathbf{u}^{\prime}\preceq\mathbf{u}\}|
=\displaystyle= m𝐮𝐪|{𝐮:𝐮𝐮}|\displaystyle m\sum_{\mathbf{u}\preceq\mathbf{q}}|\{\mathbf{u}^{\prime}:\mathbf{u}^{\prime}\preceq\mathbf{u}\}|
=\displaystyle= mu1=0q1(u2,,um1)(q2,,qm1)|{𝐮:𝐮𝐮}|\displaystyle m\sum_{u_{1}=0}^{q_{1}}\sum_{(u_{2},\ldots,u_{m-1})\preceq(q_{2},\ldots,q_{m-1})}|\{\mathbf{u}^{\prime}:\mathbf{u}^{\prime}\preceq\mathbf{u}\}|
=\displaystyle= mu1=0q1(u2,,um1)(q2,,qm1)|{(u1,,um1):u1[0,u1],(u2,,um1)(u2,,um1)}|\displaystyle m\sum_{u_{1}=0}^{q_{1}}\sum_{(u_{2},\ldots,u_{m-1})\preceq(q_{2},\ldots,q_{m-1})}|\{(u^{\prime}_{1},\ldots,u^{\prime}_{m-1}):u^{\prime}_{1}\in[0,u_{1}],(u^{\prime}_{2},\ldots,u^{\prime}_{m-1})\preceq(u_{2},\ldots,u_{m-1})\}|
=\displaystyle= mu1=0q1((u1+1)(u2,,um1)(q2,,qm1)|{(u2,,um1):(u2,,um1)(u2,,um1)}|)\displaystyle m\sum_{u_{1}=0}^{q_{1}}((u_{1}+1)\sum_{(u_{2},\ldots,u_{m-1})\preceq(q_{2},\ldots,q_{m-1})}|\{(u^{\prime}_{2},\ldots,u^{\prime}_{m-1}):(u^{\prime}_{2},\ldots,u^{\prime}_{m-1})\preceq(u_{2},\ldots,u_{m-1})\}|)
=\displaystyle= m(q1+1)(q1+2)2(u2,,um1)(q2,,qm1)|{(u2,,um1):(u2,,um1)(u2,,um1)}|\displaystyle m\frac{(q_{1}+1)(q_{1}+2)}{2}\sum_{(u_{2},\ldots,u_{m-1})\preceq(q_{2},\ldots,q_{m-1})}|\{(u^{\prime}_{2},\ldots,u^{\prime}_{m-1}):(u^{\prime}_{2},\ldots,u^{\prime}_{m-1})\preceq(u_{2},\ldots,u_{m-1})\}|
=\displaystyle= m(q1+1)(q1+2)2(q2+1)(q2+2)2\displaystyle m\frac{(q_{1}+1)(q_{1}+2)}{2}\frac{(q_{2}+1)(q_{2}+2)}{2}
(u3,,um1)(q3,,qm1)|{(u3,,um1):(u3,,um1)(u3,,um1)}|\displaystyle\sum_{(u_{3},\ldots,u_{m-1})\preceq(q_{3},\ldots,q_{m-1})}|\{(u^{\prime}_{3},\ldots,u^{\prime}_{m-1}):(u^{\prime}_{3},\ldots,u^{\prime}_{m-1})\preceq(u_{3},\ldots,u_{m-1})\}|
=\displaystyle= mi=1m1(qi+1)(qi+2)2.\displaystyle m\prod_{i=1}^{m-1}\frac{(q_{i}+1)(q_{i}+2)}{2}.

The proof is completed. ∎
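Algorithms 2 and 3 can be sketched together in Python, with τ implemented as a memoized recursion over the recurrences (10) and (11). Degree vectors are encoded as tuples and l[d] denotes the latency factor of a d-input node; both encodings are ours, and the top-down memoization replaces the bottom-up enumeration of Algorithm 3 without changing the computed values.

```python
from functools import lru_cache

def l_min_star(q, l):
    """Sketch of Algorithms 2 and 3: L_min^star(q) for a degree vector
    q = (q_1, ..., q_{m-1}); l[d] is the latency factor of a d-input node.
    tau follows recurrences (10) and (11) via top-down memoization."""
    m = len(q) + 1
    INF = float("inf")

    def dec(u, i):                       # u - e_i
        v = list(u); v[i - 1] -= 1
        return tuple(v)

    def sub(u, up):                      # componentwise u - u'
        return tuple(a - b for a, b in zip(u, up))

    def subvectors(u):                   # all u' with u' componentwise <= u
        out = [()]
        for cap in u:
            out = [p + (h,) for p in out for h in range(cap + 1)]
        return out

    @lru_cache(maxsize=None)
    def tau(u, t):                       # Algorithm 3
        if all(x == 0 for x in u):       # no computation nodes at all
            return 0
        if t == 1:                       # Eq. (10): fix the root degree i+1
            return min(tau(dec(u, i), i + 1) + l[i + 1]
                       for i in range(1, m) if u[i - 1] > 0)
        # Eq. (11): fix the degree vector u' of the first DRT
        return min(max(tau(up, 1), tau(sub(u, up), t - 1))
                   for up in subvectors(u))

    best = INF                           # enumeration of Algorithm 2
    for up in subvectors(q):             # degree vector of D1
        for i in range(1, m):            # root degree of D1 is i+1
            if up[i - 1] >= 1:
                l1 = tau(dec(up, i), i + 1) + l[i + 1]
                l2 = tau(sub(q, up), 1)
                if l1 - l[i + 1] <= l2 < l1:   # constraints (7) and (8)
                    best = min(best, l1 + l2)
    return best
```

For m=2 and l_2=1, l_min_star((3,), {2: 1}) returns 3: an optimal star tree for n=5 is a path of three tree-degree-3 nodes, whose longest leaf-to-leaf path passes through all three computation nodes.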

By Algorithms 2 and 3, we are able to determine the lowest latency Lminstar(𝐪)L_{\min}^{\mathrm{star}}(\mathbf{q}) of the star-tree-based structures 𝒮star(𝐪)\mathcal{S}^{\mathrm{star}}(\mathbf{q}) derived from degree vector 𝐪\mathbf{q}. Further by backtracking, we can find the corresponding latency-optimal star tree TT with degree vector 𝐪\mathbf{q}, and TT can be further used to generate a latency-optimal star-tree-based structure SS from 𝒮star(𝐪)\mathcal{S}^{\mathrm{star}}(\mathbf{q}) via Construction 1. Note that 𝐪\mathbf{q} can be any valid degree vector. However, if 𝐪\mathbf{q} is the uniquely optimal solution to Problem 1, SS has the lowest latency among all the complexity-optimal star-tree-based structures. On the other hand, if Problem 1 has multiple optimal solutions, we can enumerate them to find the latency-optimal ones among complexity-optimal star-tree-based structures.

V Isomorphic-DRT-Based Structures

In this section, we construct structures for the scenario where latency has a higher priority than complexity. To this end, we focus on a class of structures, called isomorphic-DRT-based structures, since there always exist isomorphic-DRT-based structures (or structures derived from them) achieving the lowest latency, as explicated in Lemma 8. Section V-A proposes a class of DRTs, called isomorphic DRTs, and further provides a construction from isomorphic DRTs to isomorphic-DRT-based structures. Next, Section V-B presents a construction from type vectors to isomorphic DRTs. Here, the type vector is used to characterize an isomorphic DRT, each component of which is the number of levels that contain nodes with the corresponding degree. We further illustrate that, given a type vector, the latency of its corresponding isomorphic-DRT-based structures is uniquely determined. Therefore, Theorem 3 is proposed to find the type vectors that lead to the latency-optimal isomorphic-DRT-based structures. Finally, Section V-C derives Construction 5 and Theorem 4 to find a complexity-optimal isomorphic-DRT-based structure SS corresponding to a given type vector 𝐰\mathbf{w}. As a result, if 𝐰\mathbf{w} is the uniquely optimal type vector obtained from Theorem 3, then SS is a latency-optimal isomorphic-DRT-based structure which also has the lowest complexity among all latency-optimal isomorphic-DRT-based structures.

V-A Isomorphic DRTs and Isomorphic-DRT-Based Structures

In this subsection, we propose a class of DRTs, called isomorphic DRTs, in Definition 6. Then, Construction 3 and Lemma 7 show that, based on any isomorphic DRT DD, we can obtain a structure, which is called an isomorphic-DRT-based structure. Finally, Lemma 8 illustrates that it suffices to investigate the isomorphic-DRT-based structures when the latency of structures is of interest.

Definition 6 (isomorphic DRT).

We refer to a DRT DD as an isomorphic DRT iff for any internal node aa in DD, all the subtrees rooted at aa’s children are isomorphic.

Denote 𝒟n\mathcal{D}_{n} as the set of all isomorphic DRTs with exactly nn leaves. Fig. 2 shows an example of isomorphic DRTs in 𝒟6\mathcal{D}_{6}. In an isomorphic DRT DD, the nodes in the same level have the same degree (number of incoming edges). For the sake of convenience, for any i[2,m]i\in[2,m], we refer to a level of DD as a type-ii level if it contains degree-ii nodes. Moreover, we define the type vector of DD by 𝐰=(w1,w2,,wm1)+m1\mathbf{w}=(w_{1},w_{2},\ldots,w_{m-1})\in\mathbb{Z}_{+}^{m-1}, where wi,i[m1]w_{i},i\in[m-1] is the number of type-(i+1)(i+1) levels in DD.

In the following, we propose a construction from isomorphic DRTs to structures in Construction 3, and use Lemma 7 to state the correctness of the construction.

Construction 3.

For any n3n\geq 3 and an isomorphic DRT D𝒟n1D\in\mathcal{D}_{n-1}, we construct the structure by the following steps:

Step 1: For any j[n]j\in[n], let DjD_{j} be a copy of DD, and label all the leaves of DjD_{j} as X{xj}X\setminus\{x_{j}\} respectively.

Step 2: Return S=j[n]DjS=\vee_{j\in[n]}D_{j} as the constructed structure.

Lemma 7.

For any n3n\geq 3 and D𝒟n1D\in\mathcal{D}_{n-1}, supposing that SS is a structure derived from DD via Construction 3, then S𝒮nS\in\mathcal{S}_{n}.

Proof:

It holds obviously, according to Definitions 1 and 3. ∎

All the structures that can be derived from isomorphic DRTs via Construction 3 are called isomorphic-DRT-based structures. Fig. 3 shows an example of an isomorphic-DRT-based structure corresponding to the isomorphic DRT given in Fig. 2. Note that the resultant structure of Construction 3 is generally not unique for a given isomorphic DRT DD, since we may label the leaves of DjD_{j} in different ways. For convenience, denote 𝒮isom(D)\mathcal{S}^{\mathrm{isom}}(D) as the set of all possible isomorphic-DRT-based structures derived from DD via Construction 3.

In the following, we illustrate that the minimum latency of the structures for given nn can be connected to the minimum latency of isomorphic-DRT-based structures with input size nnn^{\prime}\geq n.

Lemma 8.

For any S𝒮nS\in\mathcal{S}_{n}, there always exists an isomorphic-DRT-based structure S𝒮nS^{\prime}\in\mathcal{S}_{n^{\prime}} satisfying that nnn^{\prime}\geq n and l(S)l(S)l(S^{\prime})\leq l(S).

Proof:

For a given structure S𝒮nS\in\mathcal{S}_{n}, we attempt to construct an isomorphic-DRT-based structure SS^{\prime} satisfying that nnn^{\prime}\geq n and l(S)l(S)l(S^{\prime})\leq l(S) by the following steps.

Step 1: Let DD denote the DRT E(y1,S)E(y_{1},S) and then unlabel DD. Set i=1i=1.

Step 2: Let AiA_{i} be the set of nodes in the ii-th level of DD. If all the nodes in AiA_{i} are leaves, go to Step 3. Otherwise, select a node aAia\in A_{i} such that the number of leaves of E(a,D)E(a,D) is equal to or greater than that of E(b,D),bAiE(b,D),\forall b\in A_{i}. Then, for each bAi{a}b\in A_{i}\setminus\{a\}, modify E(b,D)E(b,D) so that it is isomorphic to E(a,D)E(a,D). Increase ii by 1 and then repeat Step 2.

Step 3: Return SS^{\prime} derived from DD via Construction 3.

In Step 2, the number of DD’s leaves is either increased or unchanged, and the latency of DD is at most l(S)l(S). As a result, SS^{\prime} is an isomorphic-DRT-based structure satisfying that nnn^{\prime}\geq n and l(S)l(S)l(S^{\prime})\leq l(S). The proof is completed. ∎

According to Lemma 8, it suffices to investigate the latency-optimal isomorphic-DRT-based structures whose number of leaves is at least nn when the minimum latency of S𝒮nS\in\mathcal{S}_{n} is of interest. In particular, if we can first construct a latency-optimal isomorphic-DRT-based structure S𝒮nS^{\prime}\in\mathcal{S}_{n^{\prime}} with nnn^{\prime}\geq n, then we can properly remove certain nodes (including xn+1,,xnx_{n+1},...,x_{n^{\prime}}, yn+1,,yny_{n+1},...,y_{n^{\prime}}) from SS^{\prime} to obtain a latency-optimal structure S𝒮nS\in\mathcal{S}_{n}, where l(S)=l(S)l(S)=l(S^{\prime}) by Lemma 8 and by the fact that removing nodes from SS^{\prime} does not increase latency. For instance, we can initially construct the latency-optimal isomorphic-DRT-based structure S𝒮7S^{\prime}\in\mathcal{S}_{7} shown in Fig. 4, and then remove the nodes x7x_{7} and y7y_{7} from SS^{\prime} to obtain the latency-optimal S𝒮6S\in\mathcal{S}_{6} shown in Fig. 9.

According to the above analysis, we only focus on the latency and complexity of isomorphic-DRT-based structures in the rest of this section.

Figure 9: A structure in 𝒮6\mathcal{S}_{6} which is generated from that in Fig. 4 by removing nodes x7x_{7} and y7y_{7}.

V-B The Lowest Latency of Isomorphic-DRT-Based Structures

In this subsection, first, Lemma 9 investigates the latency of isomorphic-DRT-based structures and finds that, given a type vector, the latency of its corresponding isomorphic-DRT-based structures is uniquely determined. Next, Construction 4 provides a method to obtain isomorphic DRTs for given type vectors. Based on these preparations, we finally formulate Problem 3 and solve it in Theorem 3, aiming to obtain latency-optimal isomorphic-DRT-based structures via optimizing type vectors.

Lemma 9.

For any n3n\geq 3, D𝒟n1D\in\mathcal{D}_{n-1} and S𝒮isom(D)S\in\mathcal{S}^{\mathrm{isom}}(D), the latency of SS is

l(S)=l(D)=i=1m1wili+1,l(S)=l(D)=\sum_{i=1}^{m-1}w_{i}l_{i+1}, (12)

where wiw_{i} denotes the number of type-(i+1)(i+1) levels in DD and li+1l_{i+1} is the latency factor of (i+1)(i+1)-input nodes.

Proof:

According to Definition 6, for any i,j[n]i,j\in[n] with iji\neq j, the path from xix_{i} to yjy_{j} has latency k=1m1wklk+1\sum_{k=1}^{m-1}w_{k}l_{k+1}. Noting Definition 3, the proof is completed. ∎

According to Lemma 9, the latency l(S)l(S) is fully determined by the type vector 𝐰=(w1,w2,,wm1)\mathbf{w}=(w_{1},w_{2},\ldots,w_{m-1}). We are interested in optimizing type vectors to generate latency-optimal isomorphic-DRT-based structures. Since Construction 3 can generate isomorphic-DRT-based structures from isomorphic DRTs, the remaining question is what type vectors can result in valid (n1)(n-1)-input isomorphic DRTs. The following lemma answers this question.

Lemma 10.

For any n3n\geq 3, there exists an isomorphic DRT D𝒟n1D\in\mathcal{D}_{n-1} with type vector 𝐰=(w1,w2,,wm1)\mathbf{w}=(w_{1},w_{2},\ldots,w_{m-1}) iff

i=1m1(i+1)wi=n1.\displaystyle\prod_{i=1}^{m-1}(i+1)^{w_{i}}=n-1. (13)
Proof:

Necessity: According to Definition 6, the necessity holds obviously.

Sufficiency: Suppose the type vector 𝐰\mathbf{w} satisfies (13). We provide Construction 4 to prove that there always exists an isomorphic DRT D𝒟n1D\in\mathcal{D}_{n-1} with type vector 𝐰\mathbf{w}.

Construction 4.

For a given type vector 𝐰\mathbf{w}, we construct an isomorphic DRT by the following steps.

Step 1: Select a wi>0w_{i}>0. Then, we can construct a DRT DD, which contains one degree-(i+1)(i+1) internal node and (i+1)(i+1) leaves. Decrease wiw_{i} by 1.

Step 2: If 𝐰=𝟎\mathbf{w}=\mathbf{0}, return DD as the constructed isomorphic DRT. Otherwise, select a wi>0w_{i}>0, replace each leaf in DD with a subtree that contains one degree-(i+1)(i+1) internal node and (i+1)(i+1) leaves, and decrease wiw_{i} by 1. Repeat Step 2.
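Construction 4 admits a compact recursive sketch: since all subtrees hanging below a level are isomorphic, a tree can be encoded as either the string "leaf" or a pair (degree, child subtree). This encoding is our own; it also lets one check (13) and the latency formula of Lemma 9 numerically.

```python
def build(w):
    """Construction 4 sketch: an isomorphic DRT for type vector
    w = (w_1, ..., w_{m-1}).  Because all subtrees below a level are
    isomorphic, a tree is encoded as "leaf" or (degree, child subtree);
    this compact encoding is our own, not the paper's."""
    tree = "leaf"
    for i, wi in enumerate(w, start=1):  # w_i levels of degree-(i+1) nodes
        for _ in range(wi):
            tree = (i + 1, tree)         # wrap a new level around the tree
    return tree

def num_leaves(tree):
    # leaf count; equals prod (i+1)^{w_i}, i.e., the n - 1 of Eq. (13)
    if tree == "leaf":
        return 1
    d, child = tree
    return d * num_leaves(child)

def latency(tree, l):
    # DRT latency; equals sum w_i * l_{i+1} as in Lemma 9
    if tree == "leaf":
        return 0
    d, child = tree
    return l[d] + latency(child, l)
```

For w=(1,1), the tree has 2·3=6 leaves (so n=7, consistent with the example of Fig. 4), and its latency is l_2+l_3 regardless of the order in which the two levels are applied.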

Note that for some n3n\geq 3, there may exist no type vector satisfying (13), in which case 𝒟n1=\mathcal{D}_{n-1}=\emptyset. However, as discussed at the end of Section V-A, we only need to focus on those nn such that (13) can be satisfied and 𝒟n1\mathcal{D}_{n-1}\neq\emptyset. Further recall that a type vector can generate multiple isomorphic DRTs via Construction 4 and an isomorphic DRT can also generate multiple isomorphic-DRT-based structures via Construction 3. As a result, a type vector corresponds to multiple isomorphic-DRT-based structures. Denote 𝒮isom(𝐰)\mathcal{S}^{\mathrm{isom}}(\mathbf{w}) as the set of all the isomorphic-DRT-based structures derived from type vector 𝐰\mathbf{w}. Since the latency of any structure in 𝒮isom(𝐰)\mathcal{S}^{\mathrm{isom}}(\mathbf{w}) is the same by Lemma 9, it suffices to optimize type vectors to generate latency-optimal isomorphic-DRT-based structures. Specifically, define Lminisom(n1)minD𝒟n1l(D)L_{\min}^{\mathrm{isom}}(n-1)\triangleq\min_{D\in\mathcal{D}_{n-1}}l(D) for 𝒟n1\mathcal{D}_{n-1}\neq\emptyset, which is the minimum latency of isomorphic DRTs with input size n1n-1 as well as the minimum latency of isomorphic-DRT-based structures with input size nn. Computing Lminisom(n1)L_{\min}^{\mathrm{isom}}(n-1) is equivalent to solving the following optimization problem.

Problem 3: Lminisom(n1)=min𝐰i=1m1wili+1\displaystyle L_{\min}^{\mathrm{isom}}(n-1)=\min\limits_{\mathbf{w}}\,\sum_{i=1}^{m-1}w_{i}l_{i+1}
s.t.\displaystyle\,s.t.\quad i=1m1(i+1)wi=n1.\displaystyle\prod_{i=1}^{m-1}(i+1)^{w_{i}}=n-1. (14)

The remaining problem is to solve Problem 3, which is done in Algorithm 4. The following theorem illustrates the correctness and time complexity of Algorithm 4.

Algorithm 4 Finding optimal solutions to Problem 3
Input:  The input size nn.
Output:  Lminisom(n1)L_{\min}^{\mathrm{isom}}(n-1).
1:  Initialize Lminisom(1)=0L_{\min}^{\mathrm{isom}}(1)=0.
2:  for i=2i=2 to n1n-1 do
3:     Lminisom(i)=mint[2,m] is a factor of i{Lminisom(i/t)+lt}L_{\min}^{\mathrm{isom}}(i)=\min_{t\in[2,m]\text{~{}is a factor of~{}}i}\{L_{\min}^{\mathrm{isom}}(i/t)+l_{t}\}.
4:  end for
5:  return  Lminisom(n1)L_{\min}^{\mathrm{isom}}(n-1).
Theorem 3.

Lminisom(n1)L_{\min}^{\mathrm{isom}}(n-1) can be computed by Algorithm 4 with time complexity O(mn)O(mn).

Proof:

In Algorithm 4, as shown in Line 1, we have Lminisom(1)=0L_{\min}^{\mathrm{isom}}(1)=0, since no computation nodes are required. Then, for any i[2,n1]i\in[2,n-1], assume 𝐰\mathbf{w} is an optimal solution to Problem 3 for Lminisom(i)L_{\min}^{\mathrm{isom}}(i). For any integer t[2,m]t\in[2,m] satisfying wt1>0w_{t-1}>0, tt must be a factor of ii by (13). Let 𝐰=𝐰𝐞t1\mathbf{w}^{\prime}=\mathbf{w}-\mathbf{e}_{t-1}, where 𝐞t1\mathbf{e}_{t-1} is a vector of length m1m-1 whose (t1)(t-1)-th entry is 1 and other entries are 0. Then 𝐰\mathbf{w}^{\prime} is an optimal solution for Lminisom(i/t)L_{\min}^{\mathrm{isom}}(i/t), leading to Lminisom(i)=Lminisom(i/t)+ltL_{\min}^{\mathrm{isom}}(i)=L_{\min}^{\mathrm{isom}}(i/t)+l_{t}. Therefore, Lminisom(i)=mint[2,m] is a factor of i{Lminisom(i/t)+lt}L_{\min}^{\mathrm{isom}}(i)=\min_{t\in[2,m]\text{~{}is a factor of~{}}i}\{L_{\min}^{\mathrm{isom}}(i/t)+l_{t}\} holds, corresponding to Line 3 of Algorithm 4.

Noting that Line 3 in Algorithm 4 has time complexity O(m)O(m) and is carried out O(n)O(n) times, the time complexity of Algorithm 4 is O(mn)O(mn). ∎
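Algorithm 4 transcribes directly into Python. Here l[t] is the latency factor of a t-input node (a dict encoding of our own); sizes i with no factor in [2,m] keep latency infinity, corresponding to 𝒟_i being empty.

```python
def l_min_isom(n, l):
    """Transcription of Algorithm 4: L_min^isom(n-1) by dynamic programming.
    l[t] is the latency factor of a t-input node, t = 2..m (our own dict
    encoding); sizes with no factor in [2, m] stay at infinity,
    i.e., no isomorphic DRT of that size exists."""
    m = max(l)
    INF = float("inf")
    L = [INF] * n
    L[1] = 0                            # a single leaf needs no computation
    for i in range(2, n):
        L[i] = min(
            (L[i // t] + l[t] for t in range(2, m + 1) if i % t == 0),
            default=INF,
        )
    return L[n - 1]
```

For instance, with only 2-input nodes available (m=2, l_2=1), input size n=6 yields infinity, since no isomorphic DRT with 5 leaves can be built from 2-input nodes.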

By Algorithm 4, we are able to determine the lowest latency Lminisom(n1)L_{\min}^{\mathrm{isom}}(n-1) of isomorphic-DRT-based structures with input size nn. Further by backtracking, we can obtain an (or all, if needed) optimal type vector 𝐰\mathbf{w} which leads to latency-optimal isomorphic-DRT-based structures 𝒮isom(𝐰)\mathcal{S}^{\mathrm{isom}}(\mathbf{w}). However, the structures in 𝒮isom(𝐰)\mathcal{S}^{\mathrm{isom}}(\mathbf{w}) may have different complexities. We are interested in further obtaining the complexity-optimal structures in 𝒮isom(𝐰)\mathcal{S}^{\mathrm{isom}}(\mathbf{w}) in the following subsection.

V-C The Lowest Complexity of Isomorphic-DRT-Based Structures for Given Type Vectors

In this subsection, for a given type vector 𝐰\mathbf{w}, we first generate an isomorphic-DRT-based structure S𝒮isom(𝐰)S\in\mathcal{S}^{\mathrm{isom}}(\mathbf{w}) via Construction 5, and then prove in Theorem 4 that SS has the lowest complexity among all isomorphic-DRT-based structures in 𝒮isom(𝐰)\mathcal{S}^{\mathrm{isom}}(\mathbf{w}).

Construction 5.

For a given type vector 𝐰\mathbf{w}, we construct a structure SS by the following steps.

Step 1: Obtain an isomorphic DRT DD from 𝐰\mathbf{w} via Construction 4.

Step 2: Obtain an isomorphic-DRT-based structure SS from DD via Construction 3, where for each j[n]j\in[n], label leaves of DjD_{j} from left to right in a consecutive way: xj+1,,xn,x1,,xj1x_{j+1},\ldots,x_{n},x_{1},\ldots,x_{j-1}, where xnx_{n} and x1x_{1} are considered as consecutive to each other. Return SS as the constructed structure.

Fig. 4 shows a structure SS obtained from Construction 5 for n=7n=7, m=3m=3 and 𝐰=(1,1)\mathbf{w}=(1,1). The following theorem states that SS does have the lowest complexity among 𝒮isom(𝐰)\mathcal{S}^{\mathrm{isom}}(\mathbf{w}).

Theorem 4.

For a given type vector 𝐰\mathbf{w} and n=1+i=1m1(i+1)win=1+\prod_{i=1}^{m-1}(i+1)^{w_{i}}, let SS be the structure obtained from Construction 5. Then, S𝒮isom(𝐰)S\in\mathcal{S}^{\mathrm{isom}}(\mathbf{w}) and c(S)=i=1m1nwici+1=minS𝒮isom(𝐰)c(S)c(S)=\sum_{i=1}^{m-1}nw_{i}c_{i+1}=\min_{S^{\prime}\in\mathcal{S}^{\mathrm{isom}}(\mathbf{w})}c(S^{\prime}).

Proof:

Let S𝒮isom(𝐰)S^{\prime}\in\mathcal{S}^{\mathrm{isom}}(\mathbf{w}). We first prove

c(S)i=1m1nwici+1.c(S^{\prime})\geq\sum_{i=1}^{m-1}nw_{i}c_{i+1}. (15)

For any j[n]j\in[n], E(yj,S)E(y_{j},S^{\prime}) must be an isomorphic DRT of height ζ\zeta, where ζ=i=1m1wi\zeta=\sum_{i=1}^{m-1}w_{i}. For i[0,ζ]i\in[0,\zeta], let HiH_{i} be the ii-th level in SS^{\prime}, i.e., any node aHia\in H_{i} iff the distance from aa to some output node in SS^{\prime} is ii. As a result, H0=Y={y1,y2,,yn}H_{0}=Y=\{y_{1},y_{2},\ldots,y_{n}\}. Since SS^{\prime} is an isomorphic-DRT-based structure, the nodes in HiH_{i} have the same degree, which is referred to as ρi\rho_{i}. Our idea is to prove |Hi|n,i[0,ζ1]|H_{i}|\geq n,\forall i\in[0,\zeta-1] such that c(S)=i=0ζ1|Hi|cρii=0ζ1ncρi=i=1m1nwici+1c(S^{\prime})=\sum_{i=0}^{\zeta-1}|H_{i}|c_{\rho_{i}}\geq\sum_{i=0}^{\zeta-1}nc_{\rho_{i}}=\sum_{i=1}^{m-1}nw_{i}c_{i+1}.

For any aSa\in S^{\prime}, let F(a)=(f1,f2,,fn)F(a)=(f_{1},f_{2},\ldots,f_{n}), where for any j[n]j\in[n], fj=1f_{j}=1 if xjE(a,S)x_{j}\in E(a,S^{\prime}) and fj=0f_{j}=0 otherwise. Meanwhile, for any i[0,ζ1]i\in[0,\zeta-1] and HHiH\subseteq H_{i}, let F(H)=aHF(a)F(H)=\oplus_{a\in H}F(a), where \oplus is the component-wise XOR operation. If H=H=\emptyset, let F(H)=(0,0,,0)F(H)=(0,0,\ldots,0) (nn zeros in total). Moreover, let F(i)={F(H):HHi,|H| is even},i[0,ζ1]F(i)=\{F(H):H\subseteq H_{i},|H|\text{~{}is even}\},\forall i\in[0,\zeta-1].

On the one hand, for any i[0,ζ2]i\in[0,\zeta-2] and H={a1,a2,,ak}HiH=\{a_{1},a_{2},\ldots,a_{k}\}\subseteq H_{i} with even |H||H|, we have F(H)=F(a1)F(ak)=(F(b1,1)F(b1,ρi))(F(bk,1)F(bk,ρi))F(i+1)F(H)=F(a_{1})\oplus\cdots\oplus F(a_{k})=(F(b_{1,1})\oplus\cdots\oplus F(b_{1,\rho_{i}}))\oplus\cdots\oplus(F(b_{k,1})\oplus\cdots\oplus F(b_{k,\rho_{i}}))\in F(i+1), where bk,jb_{k^{\prime},j} is the jj-th child of aka_{k^{\prime}} in E(ak,S)E(a_{k^{\prime}},S^{\prime}) for k[k],j[ρi]k^{\prime}\in[k],j\in[\rho_{i}]. That is, F(0)F(1)F(ζ1)F(0)\subseteq F(1)\subseteq\cdots\subseteq F(\zeta-1). On the other hand, for any HH0=YH\subseteq H_{0}=Y with even |H||H|, we have F(H)=(f1,f2,,fn)F(H)=(f_{1},f_{2},\ldots,f_{n}), where for any j[n]j\in[n], fj=1f_{j}=1 if yjHy_{j}\in H and fj=0f_{j}=0 otherwise. This leads to |F(0)|=2n1|F(0)|=2^{n-1}. As a result, we have |F(i)|2n1,i[0,ζ1]|F(i)|\geq 2^{n-1},\forall i\in[0,\zeta-1]. Since F(i)F(i) contains at most 2|Hi|12^{|H_{i}|-1} elements (one for each even-sized subset of HiH_{i}), this implies |Hi|n|H_{i}|\geq n as well as (15).

Let $S$ be the structure returned by Construction 5. Obviously, $S\in S^{\mathrm{isom}}(\mathbf{w})$. Given (15), it suffices to prove $c(S)\leq\sum_{i=1}^{m-1}nw_ic_{i+1}$ to complete the proof of Theorem 4. Let $A_i$ denote the $i$-th level of $S$. By Construction 5, the DRTs $E(a,S)$, $a\in A_i$ are isomorphic to each other and the leaves of each $E(a,S)$ are labelled consecutively, implying that $|A_i|=|\{E(a,S):a\in A_i\}|\leq n$. Then, $c(S)\leq\sum_{i=1}^{m-1}nw_ic_{i+1}$ follows. ∎

We remark that for $m=2$, any isomorphic DRT is a perfect DRT, which was considered in [18]. As a result, Theorem 4 can be thought of as an extension of [18, Theorem 6] from $m=2$ to $m\geq 2$. Note that $\mathbf{w}$ can be any valid degree vector. However, if $\mathbf{w}$ is the unique optimal solution to Problem 3, then $S$ has the lowest complexity among all latency-optimal isomorphic-DRT-based structures. If Problem 3 instead has multiple optimal solutions, we can enumerate them to find the complexity-optimal ones among the latency-optimal isomorphic-DRT-based structures.

VI Conclusion

In this paper, we have investigated the structures that a node in a message passing algorithm can use to compute outgoing messages from incoming messages. For the scenario where complexity has a higher priority than latency, we proposed the class of star-tree-based structures and derived algorithms to determine both their lowest achievable complexity and the corresponding lowest latency. For the scenario where latency has a higher priority than complexity, we proposed the class of isomorphic-DRT-based structures, derived an algorithm to compute the lowest latency achievable with them, and constructed latency-optimal isomorphic-DRT-based structures with the lowest complexity. Notably, our results extend the work in [18] to multi-input scenarios. Future work includes investigating optimal or near-optimal structures that achieve a good tradeoff between complexity and latency.

References

  • [1] T. J. Richardson and R. L. Urbanke, Modern Coding Theory.   Cambridge University Press, 2008.
  • [2] F. R. Kschischang, B. J. Frey, and H. Loeliger, “Factor graphs and the sum-product algorithm,” IEEE Trans. Inf. Theory, vol. 47, no. 2, pp. 498–519, 2001.
  • [3] I. Arasaratnam and S. Haykin, “Cubature Kalman filters,” IEEE Trans. Autom. Control, vol. 54, no. 6, pp. 1254–1269, 2009.
  • [4] W. Xu and F. Zhang, “FAST-LIO: A fast, robust lidar-inertial odometry package by tightly-coupled iterated Kalman filter,” IEEE Robot. Autom. Lett., vol. 6, no. 2, pp. 3317–3324, 2021.
  • [5] X. Yu and Z. Meng, “Robust Kalman filters with unknown covariance of multiplicative noise,” IEEE Trans. Autom. Control, vol. 69, no. 2, pp. 1171–1178, 2024.
  • [6] J. Gauvain and C. Lee, “Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains,” IEEE Trans. Speech Audio Process., vol. 2, no. 2, pp. 291–298, 1994.
  • [7] N. M. Ghahjaverestan, S. Masoudi, M. B. Shamsollahi, A. Beuchee, P. Pladys, D. Ge, and A. I. Hernández, “Coupled hidden Markov model-based method for apnea bradycardia detection,” IEEE J. Biomed. Health Inform., vol. 20, no. 2, pp. 527–538, 2016.
  • [8] C. Guo, Z. Liang, J. Zeng, M. Song, and Z. Xue, “A predictive hidden semi-Markov model for bridges subject to chloride-induced deterioration,” in Proc. 21st IEEE Int. Conf. Software Quality, Reliability and Security Companion (QRS-C), Hainan, China, Dec. 2021, pp. 751–756.
  • [9] S. Sun, C. Zhang, and G. Yu, “A Bayesian network approach to traffic flow forecasting,” IEEE Trans. Intell. Transp. Syst., vol. 7, no. 1, pp. 124–132, 2006.
  • [10] J. Zeng, G. Zhou, Y. Qiu, C. Li, and Q. Zhao, “Bayesian tensor network structure search and its application to tensor completion,” Neural Networks, vol. 175, p. 106290, 2024.
  • [11] R. G. Gallager, “Low-density parity-check codes,” IRE Trans. Inf. Theory, vol. IT-8, no. 1, pp. 21–28, Jan. 1962.
  • [12] D. MacKay, “Good error-correcting codes based on very sparse matrices,” IEEE Trans. Inf. Theory, vol. 45, no. 2, pp. 399–431, 1999.
  • [13] T. J. Richardson and R. L. Urbanke, “The capacity of low-density parity-check codes under message-passing decoding,” IEEE Trans. Inf. Theory, vol. 47, no. 2, pp. 599–618, Feb. 2001.
  • [14] 3GPP, “NR; Multiplexing and channel coding,” Release 15, Technical Specification (TS) 38.212, 2017.
  • [15] Y. Ding, X. He, K. Cai, G. Song, B. Dai, and X. Tang, “An efficient joint decoding scheme for outer codes in DNA-based data storage,” in Proc. IEEE/CIC Int. Conf. Commun. China Workshops, Aug. 2023, pp. 1–6.
  • [16] J. Chen, A. Dholakia, E. Eleftheriou, M. P. Fossorier, and X.-Y. Hu, “Reduced-complexity decoding of LDPC codes,” IEEE Trans. Commun., vol. 53, no. 8, pp. 1288–1299, Aug. 2005.
  • [17] Y.-L. Ueng, C.-Y. Wang, and M.-R. Li, “An efficient combined bit-flipping and stochastic LDPC decoder using improved probability tracers,” IEEE Trans. Signal Process., vol. 65, no. 20, pp. 5368–5380, 2017.
  • [18] X. He, K. Cai, and L. Zhou, “A class of optimal structures for node computations in message passing algorithms,” IEEE Trans. Inf. Theory, vol. 68, no. 1, pp. 93–104, Jan. 2022.
  • [19] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein, Introduction to Algorithms: 3rd Edition.   Cambridge, MA, USA: MIT Press, 2009.