Optimal Ternary Codes with Weight and Distance in -Metric
Abstract
The study of constant-weight codes in -metric was motivated by the duplication-correcting problem for data storage in live DNA. It is interesting to determine the maximum size of a code given the length , weight , minimum distance and the alphabet size . In this paper, based on graph decompositions, we determine the maximum size of ternary codes with constant weight and distance for all sufficiently large length . Previously, this was known only for a very sparse family of density .
Index Terms:
DNA storage, constant-weight code, -metric, packing.I Introduction
Codes in -metric distance have attracted a lot attention in the last decade for their useful applications in rank-modulation scheme for flash memory [1, 2, 3, 4, 5, 6]. In the recent error correcting problem of data storage in live DNA, -metric was also revealed to play an important role in attacking the tandem duplication errors [7]. To store information in live DNA, the reliability of data is crucial. A tandem duplication error, which means an error happens by inserting a copy of a segment of the DNA adjacent to its original position during replication, is one of the errors we must fight against during DNA coding. According to [8], tandem duplications constitute about 3% of our human genome and may cause serious phenomena and severe information loss. As a consequence, duplication correcting codes have been studied by many recent works, for example [7, 9, 10, 11, 12]. Specially, the work in [7] connects duplication correcting codes with constant-weight codes (CWCs) in -metric over integers, and shows that the existence of optimal CWCs with certain weights and lengths will result in optimal tandem duplication correcting codes [7, Theorem 20].
Motivated by this connection, it is interesting to study CWCs in -metric. The central problem regarding CWCs is to determine the exact value of maximum cardinality of a code with given length, weight, minimum distance, and the alphabet size. CWCs with Hamming distance have been well studied [13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24] due to its wide applications in coding for bandwidth-efficient channels [25], the design of oligonucleotide sequences for DNA computing [26, 27], and so on. If the code is binary, the Hamming distance is indeed the -metric. However, when it is -ary with , the metrics are different. The study of non-binary CWCs with -metric is very rare. Based on our knowledge, only works in [28, 29, 30] are involved.
In this paper, we consider ternary CWCs with -metric. A ternary code of constant -weight , minimum -distance and length is denoted by an code. The maximum size among all codes is denoted by , and a code achieving this size is called optimal. Optimal codes were constructed in [30] when for all possible distances and length with only finite many cases undetermined. For general weight and distance , the authors in [30] provided an upper bound , and showed that this bound can be achieved for is large enough and or . This is only a fraction of the numbers when is big.
Our main contribution in this paper is showing that, given the weight , the upper bound of provided in [30] can be achieved for all sufficiently large , which greatly improves the result in [30]. We state our main result as follows.
Main Result: For any integer , for all sufficiently large .
Further, we show that there exists an optimal code that is balanced unless when is odd and with certain restrictions. Here, a code is balanced means that the collection of supports of all codewords, viewed as cliques, forms a decomposition of the complete graph . Our methods rely on a famous result of Alon et al. [31] on graph decompositions, which reduces the whole problem to looking for a sparse code with desired properties.
This paper is organized as follows. In Section II, we first introduce some necessary definitions and notations, then connect codes with graph packings to reduce the problem of finding an optimal code to finding a proper sub-code . Section III gives the general idea of how to find the sub-code , and illustrates the idea by taking with certain restrictions. A complete solution to the cases of is given in Section IV, where a nonexistence result of an optimal balanced code with under additional conditions is deduced. In Section V, we deal with all other cases, with by an algorithmic construction of the required sub-code . Finally we conclude our results in Section VI.
II Preliminaries
For integers we denote and for short. Let be an integer. A -ary code of length is a set of vectors in , in which . An element in is called a codeword. For two codewords and , the -distance between and is . Here the sum and subtraction are on . The -weight of is the -distance between and vector . If a code satisfies that the -weight of any codeword is for some integer , then we call a constant-weight code. If further the minimum -distance between any two distinct codewords in is at least , we say that is an code.
In this paper, we focus on ternary codes, that is The maximum size among all codes is denoted by We call an code optimal if the size of reaches The following lemma from [30] gives us an upper bound for
Lemma II.1.
[30]
When weight is fixed, let for convenience. Our main work is to show that the upper bound can be achieved when is sufficiently large for any fixed .
For any codeword , the support of is defined as There is a canonical one-to-one correspondence between a vector in to a subset of . We can express the vector as a subset , which we call the labeled support of . Specially for any , if , we say the position is labeled by the symbol in the codeword . For short, we usually write a member in the form .
Example II.1.
When , codewords and can be described as and , respectively. Furthermore in , the position 3 is labeled by .
For ternary codes of constant weight , the type of a codeword is denoted by for some nonnegative integers and satisfying . Here, a codeword is of type if there are in total different positions labeled by and different positions labeled by in it. We usually write instead of for convenience. By a simple observation, there exists a necessary and sufficient condition of an code.
Fact II.1.
A ternary CWC code with weight is an code if and only if the following conditions are satisfied:
-
(a)
For any two different codewords and in , .
-
(b)
If there exists a codeword and a position such that , then for any other codeword , .
On the one hand, conditions (a) and (b) can lead to an code trivially. On the other hand, if any one of them are violated, there must be two different codewords in such that the -distance between them is less than We will use the language in graph packings to rewrite this fact in the next subsection.
II-A From Codes to Graph Packings
A graph is a pair , where is a finite set of vertices and is a set of -subsets of , called edges. The order of is the number of vertices, denoted by . Correspondingly, means the numbers of edges. The degree of a vertex in is the number of edges containing , and it is denoted by A graph is called complete if each pair of vertices is contained in an edge. We write the complete graph with order as .
We have given a one-to-one correspondence between a ternary vector and its labeled support. Here we give another one-to-one correspondence between a ternary vector and a colored complete subgraph. Given a vector , we map it to a complete subgraph (clique) of with vertex set equipped with a -coloring, such that the vertex is colored by . By these correspondences, a vector in , a labeled support in , and a -colored clique are indeed the same objects, so they share the same notations like weight and type.
Next we will restate Fact II.1 in the graph packing language. The packing of a graph is a set of its subgraphs such that any edge of appears in at most one subgraph in . If the packing covers all edges in , we call it a decomposition of . If all are isomorphic to some graph , then we call an -packing. Further if an -packing is also a decomposition, we call it an -decomposition. We have the following corollary from Fact II.1.
Corollary II.1.
A ternary CWC code with weight is an code if and only if the following conditions are satisfied:
-
The set forms a packing of .
-
Each vertex is colored by in at most once, i.e. there exists at most one codeword such that is colored by 2 in .
The proof of Corollary II.1 is straightforward from Fact II.1. By Corollary II.1, the problem of finding an optimal code turns to the one of finding a maximum-sized packing of by colored cliques with the same weight satisfying condition . Here we point out that the number of codewords of type with in an code is no more than due to condition . This means the set of codewords of type is an absolute majority when the upper bound is attained for large .
Our desired CWC is constructed by a packing of with two parts , where consists of all cliques of order in the packing, and consists of all others. So corresponds to all codewords of type and corresponds to at most codewords of type with some . The theorem below given in [31] guarantees the existence of as long as we have a suitable , which has been used in [30] for a construction of optimal CWCs of distance for several congruence classes. We need some notations to state this theorem.
For a graph without isolated vertices, let denote the greatest common divisor of the degrees of all vertices of . A graph is called -divisible if is divisible by , while is called nowhere -divisible if no vertex of has degree divisible by . The -packing number of , denoted by is the maximum cardinality of an -packing of .
Theorem II.1.
[31] Let be a graph with edges, and let . Then there exist , and such that for any -divisible or nowhere -divisible graph with vertices and ,
unless when is -divisible and , in which case
Here, is the degree of vertex , rounded down to the closest multiple of .
Remark II.1.
If we choose , then . For any given positive integer , if we can find a subgraph of such that and for any vertex ,
then there exists an integer such that if , an -decomposition of is guaranteed.
In fact, by the condition , we have for any . This remark can be immediately derived from Theorem II.1.
By Corollary II.1 and Remark II.1, to construct an code of size , it is sufficient to find a subgraph of such that the following conditions are satisfied:
-
(1)
satisfies the conditions in Remark II.1, so that has a -decomposition . The cliques in will produce all codewords of type .
-
(2)
has a packing of size containing cliques of order less than , such that each clique could be colored by independently to become a colored clique of weight , and each vertex is colored by at most once among all cliques in . The cliques in will yield all codewords of type for some .
If is also a decomposition of , we say that the desired code is a balanced code, that is, the set of cliques corresponding to all codewords forms a decomposition of . How to construct such a graph with a packing by colored cliques, or equivalently the sub-code consisting of only codewords of type for some , needs some techniques. The most important tool used throughout this paper is the modular Golomb ruler.
II-B Modular Golomb rulers
An modular Golomb ruler [32] is a set of integers , such that all of the differences, , are distinct and nonzero. Notice that all calculations here are in . The set is called the set of differences of the modular Golomb ruler. Given an modular Golomb ruler , we construct codewords of type for any . Here the set of positions is replaced by with a natural order. These codewords have pairwise distances at least by definition, and each position in is labeled by exactly once. Using the language of graphs, these colored cliques of type are edge-disjoint on and each vertex in is colored by exactly once among these colored cliques.
Example II.2.
The set is an modular Golomb ruler. The eight codewords are and Each vertex in is colored by exactly once.
We show an example to explain how we use modular Golomb ruler to get subgraphs and of and then lead to an optimal code. The example below is from [30, Theorem VI.2.].
Example II.3.
When , there exists an modular Golomb ruler [32]. Let the whole graph be of vertex set By our discussion above, the codewords , from the modular Golomb ruler form a packing by colored cliques of type . Let , which has a trivial decomposition of size , and let For any , Further, there are edges in . When it is easy to check that satisfies all conditions in Remark II.1 and then there exists a decomposition of by colored cliques of type as long as With simple calculations we get Hence the code forms a balanced code of size by Corollary II.1. This lead to the following conclusion:
Given any , if for any sufficiently large integer
A finite set is free from an modular Golomb ruler if all , as elements of , do not appear in the set of differences. Such an modular Golomb ruler is called -free. It is trivial to observe that the modular Golomb ruler in Example II.2 is -free. Specially, by definition if is a -free modular Golomb ruler, so is the set for all
Lemma II.2.
For any integer and a set with the largest element , there exists a -free modular Golomb ruler when . Specially when is empty, we define .
Proof.
We will greedily find pairwise distinct integers in , which form a -free modular Golomb ruler when viewed as elements of . Start with . The set is clearly a -free modular Golomb ruler.
For any , assume that there exists pairwise distinct integers in such that forms a -free modular Golomb ruler. We consider the th step. If there exists an integer , such that both the following conditions are satisfied,
-
•
for any ;
-
•
for any is not in the set of differences of ,
then is a -free modular Golomb ruler after we set There are at most integers in violate the first condition and at most integers violate the second one. When , by pigeonhole principle such always exists. ∎
Before ending this section, we mention that in our constructions in the sections to follow, we usually use with a natural order, to denote the set of positions of the code, or equivalently the vertex set of to be packed. This setting will help us to give a clearer prospect for the structure of our desired subgraph and its packing with colored cliques.
III General Idea
In this section, we illustrate the idea of our main construction of by simple cases such that the resulting code is optimal and balanced. To make our constructions simple, we assume that only contains codewords of types and . Let , and denote the number of codewords of types , and , respectively.
Based on the above assumption, we first figure out the valid numbers , , and of codewords with different types by the balanced property. We start with , , and , which in total reaches the upper bound. Note that the current is the union of cliques of order from these codewords of type . Let be the number in the range such that
(1) |
that is
(2) |
Let , then the right hand side of Equation (1) is the number of edges in . If , i.e., , then there does not exist a -decomposition of , that is the code can not be balanced. To make it balanced, we need to modify the initial numbers , and so that all cliques consume extra edges. We do the following two operations without changing the sum .
-
(O1)
Replace one codeword of type by a codeword of type to consume more edges, i.e., , ;
-
(O2)
Replace two codewords of type by a codeword of type and one of type to consume one more edge, i.e., , and .
By these two operations, we are able to figure out some values of , and , such that the total number of codewords still holds, and extra edges are all consumed, that is, they are valid for a balanced code. Suppose that we do times of (O1) and times of (O2), then is just any nonnegative integral solution to
(3) |
Then , , . We always choose as the smallest nonnegative integer such that Equation (3) has a nonnegative integral solution.
Next, we try to construct codewords of type and codewords of type to form a new subgraph . It is clear now that the new will satisfy . To apply Theorem II.1 or Remark II.1, we need that is -divisible, or equivalently, for all , . For any we define and the number of cliques of type and , respectively, containing in . Thus and
Denote . For simplicity, we assume can only be the least two non-negative integers satisfying this equation. Assume that with . Let
(4) |
Then .
When constructing , we follow an additional rule that each vertex can be in at most one clique of type i.e., or for all Since there are in total cliques of type , Note that , then Let be the number of vertices having in . By the handshaking lemma, , and we have
That is
(5) |
So far, we have two equations (3) and (5), and three undetermined parameters . It is possible for us to find a common nonnegative integral solution to both equations. Note that the value reflects the distribution of vertices in codewords of type and in some way. In Example II.3, an modular Golomb ruler produces codewords of type such that each vertex occurs in exactly such codewords, which contributes to or for each .
Based on the above valid choices of , we have the following result.
Lemma III.1.
Given an integer and a sufficiently large for some , let and be non-negative integers satisfying Equations (3)-(5). If is an code consisting of codewords of type and codewords of type , such that
-
(1)
there are exactly positions (or vertices of ) with , all others are ;
-
(2)
the sets for all codewords of type are disjoint,
then there is an optimal balanced code and .
Proof.
Let be a graph and We apply Remark II.1 to prove that has a -decomposition of size .
First, for any Specially, since we have i.e. is -divisible.
With all conditions satisfied, Remark II.1 gives us an integer such that if there exists a -decomposition of . By vertex-coloring all cliques in with color , is a code with codewords of type
By Lemma III.1, to show that an optimal balanced code exists with , we only need to find an code satisfying Lemma III.1. Next, we take to show the way of computing the valid triple , and then the numbers and . By computing the right hand side of Equation (2), we know that . Assume that in the following constructions, since in this case we only need to do operation (O1). We will give constructions for code satisfying Lemma III.1.
III-A and
Let , and for some . Then we do operation (O1) times to consume edges, that is and . The value and in this case, so by (5). The candidates of numbers of codewords of types , and are , and . Let be the complete graph with vertex set . The code is constructed as follows.
When is large enough such that , by Lemma II.2, there exists an modular Golomb ruler on . Define codewords of type for . This gives us cliques of type which form a packing of . Define . Each vertex in is colored by exactly once, and contained in different colored cliques of , that is . The vertices are not contained in any colored clique of , that is . So is an code satisfying Lemma III.1, thus for sufficiently large .
III-B and
Let , and for some . Then we do operation (O1) times to consume edges, that is and . The value and in this case, so . The numbers of codewords of types , and will be , and .
Let , and let the vertex set of be the union of three parts: , and The set will be the vertices with . The code is constructed as follows.
When is large enough such that , by Lemma II.2 there exists a -free modular Golomb ruler . From this, we get codewords of type , for . The other codewords of type are constructed as follows, whose supports form a partition of the vertex set of . When , ; when , . Here, means that all integers in the interval are labeled by .
Let . It is clear that each vertex of is colored by at most once. We claim that as a set of colored cliques forms a packing of . If it is false, there exists two different , such that and have one edge in common. In other words, and are both in As and are both packings already, we assume and for some and Thus, both and are in and we may assume as integers. Since and are in , which contradicts to the fact that is a -free modular Golomb ruler. So is an code. Further, each vertex is contained in different colored cliques in , that is ; each is contained in only one colored clique in , that is . So satisfies Lemma III.1, and for sufficiently large .
IV Completing the Cases
Subsections III-A and III-B give us two similar constructions of the required codes when and . The property always holds when is even. In this section, we assume that is odd, and consider constructions of optimal codes when and .
IV-A and
Let and for some Then we do operation (O1) times and operation (O2) times to consume edges, that is and The value and in this case, and So the candidates of numbers of codewords of type , and are , and
Let be the complete graph with vertex set , where . The set has vertices, , and . The set consists of vertices, , and , . An code consisting of codewords of type and codewords of type is given below.
Construction IV.1.
Suppose is large enough so that . We construct six disjoint classes of codewords below.
-
(1)
The codewords of type are
Let Note that , form a partition of the set , and each , is labeled by exactly once.
-
(2)
Since , there exists a -free Golomb ruler , which derives the first codewords of type ,
Let . Each vertex appears in exactly cliques in and is labeled by in exactly one codeword.
-
(3)
Let . Since , the next codewords of type are
Let . Note that , are pairwise disjoint, covering all in , and elements in . Further, each is labeled by exactly once.
-
(4)
Let . Note that and . We consider a sequence with elements in as follows. When , ; when , In this sequence, each , appears times and each , appears times. The third class of codewords of type are
Let Each element from appears in exactly one clique in , where . Each , is labeled by once.
-
(5)
Since , is a positive integer. The fourth class of codewords of type are
Let Each element from in appears in exactly one clique in , where . Each , is labeled by once.
-
(6)
The last codewords of type are
for . Let The sets are pairwise disjoint, and cover all the remaining elements in and . Further, each , is labeled by once.
Finally, Let and All vertices in other than , are colored by exactly once in For any vertex , More specifically, is only contained in , while is only contained in . Similarly, any vertex in is contained in exactly one clique in and cliques in so . Each vertex is contained in exactly cliques in and no clique in , so . It is left to check that as a set of cliques forms a packing of .
If it is false, there exists two different such that and have one edge in common. Note that from modular Golomb ruler, is already a packing, and is also a packing by simple verification. So we assume and If all vertices in are in while no codeword in has more than one element in , hence and are edge disjoint. If , then both and are in Suppose that as integers. Since and by our construction , which contradicts to the fact that is a -free modular Golomb ruler.
Theorem IV.1.
For any , there exists an integer such that, for any integer and . Furthermore, there exists a balanced optimal code.
IV-B and
A balanced optimal code may not exist for certain parameters and . In this subsection, we will show the nonexistence of balanced optimal codes when and . In other words, if there is an code of size , then there must be at least one edge which is not covered by any colored clique. We first state that a code reaching the theoretical upper bound cannot be balanced.
Theorem IV.2.
Suppose that and for some . Then an code of size is not balanced.
We need the following lemma to prove Theorem IV.2.
Lemma IV.1.
Under the same parameters as in Theorem IV.2, if there is a balanced code of size , then must contain a codeword which is neither of type nor of type
Proof.
Suppose to the contrary that consists of only codewords of type and , and , are their numbers, respectively. Then
Since is balanced,
Combining both equations, we get . But substituting into this expression, we have , which is not even an integer. ∎
Proof of Theorem IV.2.
Suppose that is a balanced code of size . Let , be the number of codewords in with type . By Lemma IV.1, there exists some such that . Consider the complete graph with vertex set . Let be the union of all cliques from codewords of type with , and let be the number of non-isolated vertices in . Since a codeword of type labels vertices by , the condition in Corollary II.1 gives us the following inequality.
(6) |
For any vertex , let be the number of codewords of type whose support contains . Since is balanced, is -divisible, and so is . Thus for any
That is, for any vertex , there exists some integer such that
The integer if and only if is not isolated in . Sum both sides above over all vertices to get
(7) |
On the other hand,
(8) |
which holds if and only if for all . This contradicts to Lemma IV.1. ∎
Since a balanced code of size does not exist, we next construct an optimal code that is not balanced. Let and for some . We only allow codewords of type and . By doing operation (O1) times and consuming more edges, there are finally edges left uncovered. In this case, and
We denote all vertices of as the union of three parts: one single vertex , vertices , and Define a sequence of different vertices by
Let , then by Lemma II.2 there exists a -free modular Golomb ruler on . Assume that . When is large enough such that , we can embed into in a natural way so that is also a -free modular Golomb ruler in . The following calculations are all in .
Let Let . Replace the codeword by two other special ones of type ,
Let which is a set of cliques of type . Define a set of edges where . It is easy to see that is not contained in any clique of . Then let and . We will show that forms a packing of , and has a -decomposition when is large enough.
Lemma IV.2.
The set above forms a packing of .
Proof.
Suppose not, then there are two different codewords , such that and share at least one edge. Since is a packing, and cannot both belong to . Furthermore, since and for each the vertices , and , are all distinct, that is to say .
Suppose that and . Since is a -free modular Golomb ruler and each , is less than while . The cliques , never share any edge with for . So forms a packing of ∎
Lemma IV.3.
There exists an integer such that when , has a -decomposition.
Proof.
If all conditions given in Remark II.1 hold, the proof is complete. For any vertex , if for , it is contained in cliques of and one edge in . So If for , is in one clique of and one edge in . So If , it is contained in cliques in , and Furthermore, for each vertex ,
We check that is an integer.
By Remark II.1, there exists an integer such that when , the -decomposition of exists. ∎
We color all cliques in as type and get a code of codewords. By Corollary II.1, is an optimal code with size but not balanced. Combining the result above and Subsection III-A, we have proven the following theorem.
Theorem IV.3.
For any integer , there exists an integer such that for any and .
However, by Theorem IV.2, an optimal balanced code does not exist when , and for some .
V Constructions for with
In this section, we show that an optimal balanced code exists for sufficiently large when . Our method is similar to the cases when , that is, constructing a code satisfying Lemma III.1, but is more complicated. Our main result is the following theorem.
Theorem V.1.
For any integers and , there exists an integer such that for any and there is a balanced code with size , that is .
V-A Strategy
When is fixed, let for some be a sufficiently large integer. With defined in Equation (2), there are unique non-negative integers and such that for and In this sense, and are functions of with values smaller than
We apply operations (O1) times and (O2) times to digest the exceeding edges, that is and . By Equation (4), and . Then by Equation (5), , which is a positive integer by Equation (2). So the desired numbers of codewords of types , and are and , respectively.
By Lemma III.1, it suffices to construct a code consisting of codewords of type and codewords of type satisfying all conditions in Lemma III.1. The vertex set of is defined as a union of three parts with the prescribed distribution of different values of in each part as follows.
-
(R1)
Let and , then the first part of is . There will be vertices in with , and all others with .
-
(R2)
Let be the second part consisting vertices, which are , and , . All vertices in have .
-
(R3)
Let be the third part consisting all remaining vertices, which are , with , and , with . Here . Note that , and there will be vertices that are not colored by in the desired .
Note that in this setting, the number of vertices with is exactly , and with is . See Fig. 1 for an illustration of the vertices and . Our strategy of constructing is as follows.

First, we construct an code over consisting of codewords of type , such that the requirement of in (R1) is satisfied. See Construction V.1 and Proposition V.1 for details of . View as a code of length over the vertex set of . Note that colors each vertex in by exactly once.
Second, we construct codewords of type
whose supports form a partition of the set . Let Each is colored by exactly once.
Third, construct an array with entries in , where , such that
-
•
the first column of covers each vertex , , and each vertex in except some vertices exactly once.
-
•
the number of occurrence of each in is if , and is if , where is defined in (R2) and (R3).
Such an array exists by simply checking the equality . Each row of will serve as a candidate of a codeword of type by coloring the first element by and all others by . However, the current is not valid since each row may have repeated elements. Let denote the collection of colored rows of the current . We would like to do some operations of , so that finally, will give the desired . The symbol means the disjoint union of collections.
Before presenting our algorithms of the operations of in SectionV-C, we give a construction of .
V-B Construction of and its Properties
Construction V.1.
Suppose that . Start from an modular Golomb ruler , and view it as a subset in , where . The code consisting of codewords of type is constructed below.
-
(1)
For each and let
-
(2)
For each and , let
where the vertex is colored by and all other vertices are colored by .
Then It is easy to verify that each vertex in is colored by 2 exactly once in .
Proposition V.1.
The code in Construction V.1 is an code. Further, there are exactly vertices in with , and all others with , which satisfies the requirement of in (R1).
Proof.
For any vertex , there are exactly cliques containing . Let for some . If is in then is contained in exactly cliques , otherwise does not appear in any clique of This proves the distribution of .
It is left to prove that the collection of all supports of codewords in forms a packing. Here are five cases.
-
(1)
By modular Golomb ruler, for any and
-
(2)
Since any vertex of modulo equals , then for any with any and ,
-
(3)
It is easy to see that for any
-
(4)
For any and , we also have Otherwise, there exists a pair from with such that the following two equations hold:
We calculate the differences of these two equations on both side and get However, , a contradiction.
-
(5)
For any , and , we have That is because all vertices in are the same while all vertices of are pairwise different after modulo
∎
V-C Constructing
Let , , and be defined as before. By abuse of notation, we also use , , and to denote the corresponding collections of unlabeled (multi) subsets of . We express as row vectors of an array with each row being the support of a codeword in , where the only vertex labeled by is always located in the first column. Let , be obtained from and by deleting the first column.
Let be the disjoint union of the collections , and , denoted by . Observe that if we can make an code by only exchanging entries in and , then the resultant code will satisfy all conditions of Lemma III.1, that is, contain codewords of type whose supports are pairwise disjoint (codewords in ), contain codewords of type (codewords in modified ), and has exactly vertices with and all others with . Since each vertex is colored by in exactly once except vertices , by Corollary II.1, it suffices to exchange entries in and so that there are no repeated entries in each member of and no common pairs among all members of . The initial state of is already an code, so we can do this exchange one by one on all entries of with a natural order.
In the algorithm to follow, all our operations are supposed to exchange some entry in and some entry in only. Let denote the th row of , and let be the th entry of . Suppose that we have modified all entries with for any , and for , such that
-
(h1)
the updated has no repeated vertex in each element;
-
(h2)
the updated entries are pairwise distinct;
-
(h3)
the updated multiset is a set, and further a packing over .
Now we focus on the entry . If the following properties (p1)-(p3) are already satisfied, we do nothing but change if , or change and if . Otherwise, we need to find an entry in the current , so that exchanging and will result in that
-
(p1)
the updated have no repeated vertex in each element;
-
(p2)
the updated entries are pairwise distinct;
-
(p3)
the updated multiset is a set, and further a packing over .
The properties (p1)-(p3) will be used as the new (h1)-(h3) in the next step of modification. We claim that the existence of is always possible for fixed and sufficiently large . We give the proof below.
The number of rows of is , so the set of distinct entries in any updated is of size at most . Each vertex of can appear in at most members of , so the set of vertices in appearing in the same member of with some vertex in is of size at most . All vertices in appear in at most members of . Since when goes to infinity, , there is at least one codeword of containing no vertex in . Choose such a codeword and a vertex that is colored by . It is clear that is an entry of . Let be modified by replacing with , and update . Replace by and update . We show that the three properties (p1)–(p3) follow after this round of updating.
In fact, by the disjointness between and , contains no vertices from . Thus properties (p1) and (p2), and the part of telling the multiset is a set in property (p3) are obvious. The only thing need to prove is the packing property in (p3). For convenience, let be the collection before exchanging and , and let be after the exchange. If (p3) is false, there exist two vertices and in such that at least two different elements, denoted as and , in contain both of them. If and are both out of , the number of members in containing both and is the same as that in . By induction hypothesis is already a packing and thus such two elements and do not exist. As for the case when contains at least one vertex of and , contradiction can be derived directly by that is disjoint from .
Finally, we complete the proof by showing the initial hypothesis holds. When and , no entries of need to be modified before. So properties (h1)-(h3) are satisfied naturally from our constructions of and . Thus we can find an to exchange such that (p1)-(p3) are satisfied, which will be used as the new (h1)-(h3) when we modify . Continuing this exchanging algorithm until we modify all entries in , so that the modified is a packing over . Then is the set with the prescribed color structures.
VI Conclusion
In this paper we study ternary constant weight codes in -metric for any given length , weight and minimum distance We show that the upper bound on the maximum size of an code in [30] can be achieved for all sufficiently large . Our main result is stated below by combining all results in Theorems IV.1, IV.3 and V.1.
Theorem VI.1.
For any integer , there exists an integer such that for any , . Further, there exists an optimal code that is balanced if and only if the following does not happen.
-
•
The weight is odd, and for some integer such that
The optimal codes that we construct only contain codewords of types , and . The method we use is transferring the problem of finding an optimal code to the problem of finding a packing of of the same size by colored cliques with extra restrictions. We use -free modular Golomb rulers to find a packing with size no more than of colored cliques to produce all codewords of types and . Our constructions of satisfy the conditions of Lemma III.1, so that has a -decomposition by Theorem II.1 when is large enough, which produces all codewords of type .
Our method works well on the distance . However, it does not work when the minimum distance Meanwhile, since the existence of a -decomposition in Theorem II.1 is not explicit, efficient algorithms to find out an optimal code in polynomial time are in need. We leave these problems for future study.
References
- [1] A. Barg and A. Mazumdar, “Codes in permutations and error correction for rank modulation,” IEEE Transactions on Information Theory, vol. 56, no. 7, pp. 3158–3165, 2010.
- [2] A. Jiang, H. Li, and Y. Wang, “Error scrubbing codes for flash memories,” in 2009 11th Canadian Workshop on Information Theory. IEEE, 2009, pp. 32–35.
- [3] L. G. Tallini and B. Bose, “On -distance error control codes,” in 2011 IEEE International Symposium on Information Theory Proceedings. IEEE, 2011, pp. 1061–1065.
- [4] H. Zhou, M. Schwartz, A. A. Jiang, and J. Bruck, “Systematic error-correcting codes for rank modulation,” IEEE Transactions on Information Theory, vol. 61, no. 1, pp. 17–32, 2014.
- [5] F. Farnoud, V. Skachek, and O. Milenkovic, “Error-correction in flash memories via codes in the ulam metric,” IEEE Transactions on Information Theory, vol. 59, no. 5, pp. 3003–3020, 2013.
- [6] G. Kabatiansky and S. Kruglik, “On codes correcting constant number of errors in metric,” in Sbornik trudov 39-æmezhdisciplinarnoæxkody-konferencii IPPI RAN “Informacionnye tehnologii i sistemy 2015”, 2015, pp. 152–157.
- [7] S. Jain, F. F. Hassanzadeh, M. Schwartz, and J. Bruck, “Duplication-correcting codes for data storage in the DNA of living organisms,” IEEE Transactions on Information Theory, vol. 63, no. 8, pp. 4996–5010, 2017.
- [8] E. Lander, L. Linton, B. Birren, C. Nusbaum, M. Zody, J. Baldwin, K. Devon, K. Dewar, M. Doyle, W. FitzHugh et al., “Initial sequencing and analysis of the human genome.” Nature, vol. 409, no. 6822, pp. 860–921, 2001.
- [9] M. Kovačević and V. Y. F. Tan, “Asymptotically optimal codes correcting fixed-length duplication errors in DNA storage systems,” IEEE Communications Letters, vol. 22, no. 11, pp. 2194–2197, 2018.
- [10] A. Lenz, N. Jünger, and A. Wachter-Zeh, “Bounds and constructions for multi-symbol duplication error correcting codes,” arXiv preprint arXiv:1807.02874, 2018.
- [11] Y. Tang, Y. Yehezkeally, M. Schwartz, and F. F. Hassanzadeh, “Single-error detection and correction for duplication and substitution channels,” in 2019 IEEE International Symposium on Information Theory (ISIT). IEEE, 2019, pp. 300–304.
- [12] Y. Yehezkeally and M. Schwartz, “Reconstruction codes for DNA sequences with uniform tandem-duplication errors,” IEEE Transactions on Information Theory, pp. 1–1, 2019.
- [13] R. Graham and N. Sloane, “Lower bounds for constant weight codes,” IEEE Transactions on Information Theory, vol. 26, no. 1, pp. 37–43, 1980.
- [14] A. Brouwer, J. Shearer, N. Sloane, and W. Smith, “A new table of constant weight codes,” IEEE Transactions on Information Theory, vol. 36, no. 6, pp. 1334–1380, 1990.
- [15] E. Agrell, A. Vardy, and K. Zeger, “Upper bounds for constant-weight codes,” IEEE Transactions on Information Theory, vol. 46, no. 7, pp. 2373–2395, 2000.
- [16] H. Zhang, X. Zhang, and G. Ge, “Optimal ternary constant-weight codes with weight 4 and distance 5,” IEEE Transactions on Information Theory, vol. 58, no. 5, pp. 2706–2718, 2012.
- [17] Y. M. Chee, H. Zhang, and X. Zhang, “Complexity of dependences in bounded domains, Armstrong codes, and generalizations,” IEEE Transactions on Information Theory, vol. 61, no. 2, pp. 812–819, 2014.
- [18] Y. M. Chee, G. Ge, H. Zhang, and X. Zhang, “Hanani triple packings and optimal -ary codes of constant weight three,” Designs, Codes and Cryptography, vol. 75, no. 3, pp. 387–403, 2015.
- [19] Y. M. Chee and X. Zhang, “Linear size constant-composition codes meeting the Johnson bound,” IEEE Transactions on Information Theory, vol. 64, no. 2, pp. 909–917, 2017.
- [20] Y. M. Chee, H. M. Kiah, H. Zhang, and X. Zhang, “Constructions of optimal and near-optimal multiply constant-weight codes,” IEEE Transactions on Information Theory, vol. 63, no. 6, pp. 3621–3629, 2017.
- [21] Y. M. Chee, F. Gao, H. M. Kiah, A. C. Hung Ling, H. Zhang, and X. Zhang, “Decompositions of edge-colored digraphs: A new technique in the construction of constant-weight codes and related families,” SIAM Journal on Discrete Mathematics, vol. 33, no. 1, pp. 209–229, 2019.
- [22] H. Cao, L. Ji, and L. Zhu, “Constructions for generalized steiner systems GS(3, 4, v, 2),” Designs, Codes and Cryptography, vol. 45, pp. 185–197, 2007.
- [23] H. Zhang and G. Ge, “Optimal ternary constant-weight codes of weight four and distance six,” IEEE Transactions on Information Theory, vol. 56, no. 5, pp. P.2188–2203, 2010.
- [24] G. Bogdanova, “New bounds for the maximum size of ternary constant weight codes,” Serdica Math J, vol. 26, no. 1, pp. 5–12, 2000.
- [25] D. J. Costello and G. D. Forney, “Channel coding: The road to channel capacity,” Proceedings of the IEEE, vol. 95, no. 6, pp. 1150–1177, 2007.
- [26] O. D. King, “Bounds for DNA codes with constant GC-content,” The Electronic Journal of Combinatorics, vol. 10, no. 1, p. 33, 2003.
- [27] O. Milenkovic and N. Kashyap, “On the design of codes for DNA computing,” in International Workshop on Coding and Cryptography. Springer, 2005, pp. 100–119.
- [28] M. Kovačević and V. Y. F. Tan, “Codes in the space of multisets-coding for permutation channels with impairments,” IEEE Transactions on Information Theory, vol. 64, no. 7, pp. 5156–5169, 2018.
- [29] H. Jinushi and K. Sakaniwa, “A construction method for multilevel error-correcting codes based on absolute summation weight,” in Abst. 1990 IEEE Int. Symp. Inform. Theory, 1990, p. 87.
- [30] T. Chen, Y. Ma, and X. Zhang, “Optimal codes with small constant weight in -metric,” arXiv preprint arXiv:2003.00483v2, 2020.
- [31] N. Alon, Y. Caro, and R. Yuster, “Packing and covering dense graphs,” Journal of Combinatorial Designs, vol. 6, no. 6, pp. 451–472, 1998.
- [32] J. B. Shearer, “Difference triangle sets,” in Handbook of Combinatorial Designs. Chapman and Hall/CRC, 2006, pp. 436–440.