Abstract

In distributed computing, storage, and machine learning systems, encoding and decoding methods that combine shift-and-addition (SA) and zigzag decoding (ZD) have been proposed to reduce computational complexity. However, in each encoded packet, one element takes part in the encoding only once, so the resulting overhead is extremely high. In this work, based on the idea of multidimensional encoding/modulation, we propose to use one element multiple times when constructing one encoded packet based on the Cauchy matrix, thereby leveraging the favourable properties of Cauchy-matrix-based codes. The overhead is reduced from quadratic to logarithmic in certain parameters.

1. Introduction

In the era of big data [1], the amount of data roughly doubles every year. Data processing has shifted from centralized to distributed processing. However, in distributed applications, not all devices are reliable: some devices may fail, and the performance of devices is not consistent. In any data processing task, there will be some unreliable devices whose computing speed is slower than the average, which are called stragglers [2, 3]. For example, in a Facebook data center, more than 100 nodes may fail per day [4, 5]. The completion time of data processing is constrained by the slowest working node. Therefore, how to deal with stragglers becomes a challenge for data processing. To solve this problem, network coding techniques have been developed, and codes with the combination property (CP) have been proposed: $k$ original packets are encoded into $n$ packets, where $n \geq k$, and any $k$ out of these $n$ packets are able to recover the original data. Codes with the CP have been widely used in distributed systems, including distributed storage (DS) [6–10], distributed computing (DC) [11–16], and distributed machine learning [17].

In distributed systems, most coding schemes adopt linear codes, but linear codes involve many multiplication and division operations, which greatly increase the complexity of encoding and decoding. To lower the computational complexity, a class of CP-ZD codes [18–20], which possess the CP and combine shift-and-addition (SA) with zigzag decoding (ZD) [21], has been proposed. However, when one element takes part in the encoding only once in each encoded packet, the overhead is as high as the square of the code parameters.

Multidimensional encoding/modulation promises a high data rate [18, 19] and has been used as a promising technique in communication [20] and distributed systems [22, 23]. To further reduce the overhead, based on the idea of multidimensional encoding/modulation, we propose that one element takes part in the encoding process multiple times when constructing one encoded packet. Using the properties of the code based on the Cauchy matrix over a finite field, we design a framework that realizes this idea. The overhead of this coding framework is reduced from quadratic to logarithmic with respect to the parameter. Specifically, when one element takes part in the encoding only once in each encoded packet, each source packet is treated as an element and occurs at most once in a coded packet. In contrast, when one element takes part in the encoding process multiple times, each source packet occurs multiple times in an encoded packet; that is, several distinctly shifted copies of the same source packet are added together.

2. Preliminary

2.1. Definition of Cauchy Matrix

Given , , let , then the matrix is called the Cauchy matrix [24], and its determinant is as follows:

Similarly, a Cauchy matrix over a finite field is defined as follows: let and be two sets of elements in a finite field. Among them, and . If for , , the following is satisfied: (1)(2), , (3)Then, the following matrix is called a Cauchy matrix over a finite field:

It is straightforward to obtain the following theorem from the construction rules of the Cauchy matrix:

Theorem 1. When is a Cauchy matrix, any square submatrix of is nonsingular, where indicates the number of rows and columns of the submatrix (); then,

In other words, every square submatrix of the Cauchy matrix is invertible.
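As a minimal sketch of the construction and of Theorem 1, the following Python snippet builds a Cauchy matrix over $GF(2^3)$ and checks by brute force that every square submatrix is nonsingular. The entry form $1/(x_i + y_j)$, the primitive polynomial $x^3 + x + 1$, the element sets passed to `cauchy`, and all function names are assumptions chosen for illustration; they are not taken from the text.

```python
from itertools import combinations

W, POLY = 3, 0b1011  # assumed field GF(2^3) with polynomial x^3 + x + 1


def gf_mul(a, b):
    """Multiply two field elements (bit i = coefficient of x^i)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> W:          # degree reached W: reduce modulo POLY
            a ^= POLY
    return r


def gf_inv(a):
    """Brute-force multiplicative inverse; acceptable for a tiny field."""
    return next(b for b in range(1, 1 << W) if gf_mul(a, b) == 1)


def gf_det(m):
    """Determinant by Gaussian elimination (signs vanish in characteristic 2)."""
    m = [row[:] for row in m]
    det = 1
    for c in range(len(m)):
        pivot = next((r for r in range(c, len(m)) if m[r][c]), None)
        if pivot is None:
            return 0
        m[c], m[pivot] = m[pivot], m[c]
        det = gf_mul(det, m[c][c])
        inv = gf_inv(m[c][c])
        for r in range(c + 1, len(m)):
            f = gf_mul(m[r][c], inv)
            for j in range(c, len(m)):
                m[r][j] ^= gf_mul(f, m[c][j])
    return det


def cauchy(X, Y):
    """Entries 1 / (x + y); addition in GF(2^w) is XOR. X and Y must be disjoint."""
    return [[gf_inv(x ^ y) for y in Y] for x in X]


def all_square_submatrices_nonsingular(m):
    rows, cols = len(m), len(m[0])
    for size in range(1, min(rows, cols) + 1):
        for rs in combinations(range(rows), size):
            for cs in combinations(range(cols), size):
                if gf_det([[m[r][c] for c in cs] for r in rs]) == 0:
                    return False
    return True


C = cauchy([0, 1, 2], [3, 4, 5, 6, 7])  # hypothetical element sets
print(C, all_square_submatrices_nonsingular(C))
```

The brute-force check is only practical for small matrices; it is meant to illustrate the statement of the theorem, not to replace its proof.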

2.2. The Arithmetic Operation in Finite Fields

Finite field is a field with a finite number of elements, for example, represents a finite field containing elements. Before describing the arithmetic operation in finite fields, we briefly introduce the concept of the primitive polynomial.

A primitive polynomial is an irreducible polynomial (one that cannot be factored over the base field) whose root generates all nonzero elements of the extension field. Once a primitive polynomial is fixed for a finite field, the arithmetic operations in that field are also determined. In general, a primitive polynomial of a field can be obtained from a look-up table, and it is not unique. Take the finite field as an example: there is more than one primitive polynomial over , and the most common primitive polynomial is . Table 1 shows some of the primitive polynomials [25] present in .

The addition and subtraction operation [22] of finite fields are the XOR operation in polynomial calculation. The rule for adding and subtracting is to XOR coefficients of the same order in two polynomials, and there is no difference between the two operations, such as . At present, the multiplication and division operations [23] of finite fields usually count on the look-up tables. Each field has positive and negative tables, which are denoted as and , respectively, on the field. Taking as an example, its table and are generated as shown in Table 2 [16]:

If the multiplication and division operations are performed on the field, as shown in Table 2, the multiplication operation is as follows:

and the division operation is as follows:
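The table-based multiplication and division can be sketched in Python as follows. This is a sketch under stated assumptions: the positive and negative tables are interpreted as logarithm and antilogarithm tables, the field is taken to be $GF(2^3)$ with primitive polynomial $x^3 + x + 1$, and the names `build_tables`, `gf_mul`, and `gf_div` are hypothetical. The actual field and tables of Table 2 may differ.

```python
def build_tables(w=3, poly=0b1011):
    """Return (log, antilog) tables for GF(2^w) generated by the element x."""
    size = 1 << w
    log = [None] * size          # log[0] is undefined
    antilog = [0] * (size - 1)
    value = 1
    for exp in range(size - 1):
        antilog[exp] = value
        log[value] = exp
        value <<= 1              # multiply by x
        if value & size:         # degree reached w: reduce modulo poly
            value ^= poly
    return log, antilog


def gf_mul(a, b, log, antilog):
    """Multiply by adding logarithms modulo 2^w - 1."""
    if a == 0 or b == 0:
        return 0
    return antilog[(log[a] + log[b]) % len(antilog)]


def gf_div(a, b, log, antilog):
    """Divide by subtracting logarithms modulo 2^w - 1."""
    if b == 0:
        raise ZeroDivisionError("division by zero in GF(2^w)")
    if a == 0:
        return 0
    return antilog[(log[a] - log[b]) % len(antilog)]


log, antilog = build_tables()
print(antilog)                       # powers of x in numerical form
print(gf_mul(5, 7, log, antilog))    # (x^2 + 1)(x^2 + x + 1)
print(gf_div(6, 7, log, antilog))
```

Addition and subtraction need no table: both are the bitwise XOR of the two numerical forms.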

2.3. Transformation from Field to Field

The field $GF(2^w)$ is constructed by finding a primitive polynomial $q(x)$ of degree $w$ over $GF(2)$ and then enumerating the elements (in polynomial form) by using the generating element $x$. Addition in this field is polynomial addition, and multiplication is polynomial multiplication followed by reduction modulo $q(x)$. Such a field can be written as $GF(2^w) = GF(2)[x]/q(x)$ [25]; equivalently, the fields $GF(2^w)$ and $GF(2)[x]/q(x)$ are isomorphic [26].

The conversion rule from field $GF(2^w)$ to field $GF(2)[x]/q(x)$ [25] is the conversion of numerical form to polynomial form. Taking $GF(2^w)$ as an example, the implementation steps are as follows:

Step 1. Initialize the set as $\{0, 1, x\}$.

Step 2. Multiply the last element of the set by $x$, such as $x \cdot x = x^2$, and reduce the result modulo $q(x)$ if the resulting degree is greater than or equal to $w$.

Step 3. Continue Step 2 until there are $2^w$ elements in the set, at which point the last element, multiplied by $x$ and reduced modulo $q(x)$, gives 1.

To better understand the above steps, let us elaborate on a simple example:

Example 1. Suppose $w = 2$; then, the primitive polynomial is $q(x) = x^2 + x + 1$. To construct $GF(2^2)$, we initialize the set as $\{0, 1, x\}$, so the next element is $x \cdot x = x^2$. Since the degree of this element is 2, we reduce it modulo $q(x)$, which results in $x + 1$. Therefore, four elements are generated: $\{0, 1, x, x + 1\}$, and the corresponding numerical forms are $\{0, 1, 2, 3\}$, which are shown in Table 3. If we continue, we get $(x + 1) \cdot x = x^2 + x$, which reduces to 1 modulo $q(x)$. According to Step 3, we can end the enumeration.
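The enumeration in Steps 1 to 3 can be sketched in Python as follows. Elements are stored in numerical form, where bit $i$ is the coefficient of $x^i$; the function names and the integer encoding of the primitive polynomial are assumptions made for illustration. The sketch assumes a degree of at least 2, since the initial set already holds three elements.

```python
def enumerate_field(w, poly):
    """Enumerate GF(2^w) in the order 0, 1, x, x^2 mod q(x), ...

    poly encodes q(x) as an integer, e.g. 0b111 for x^2 + x + 1.
    """
    elements = [0, 1, 0b10]                 # Step 1: the set {0, 1, x}
    while len(elements) < (1 << w):         # Step 3: stop at 2^w elements
        nxt = elements[-1] << 1             # Step 2: multiply by x
        if nxt >> w:                        # degree >= w: reduce modulo q(x)
            nxt ^= poly
        elements.append(nxt)
    return elements


def to_poly_string(e):
    """Render a numerical element as a polynomial in x."""
    terms = []
    for i in range(e.bit_length() - 1, -1, -1):
        if (e >> i) & 1:
            terms.append("1" if i == 0 else "x" if i == 1 else f"x^{i}")
    return " + ".join(terms) or "0"


field = enumerate_field(2, 0b111)
print(field, [to_poly_string(e) for e in field])
```

For a degree-2 field with $x^2 + x + 1$, the sketch produces four elements, and multiplying the last one by $x$ once more wraps around to 1, which is the stopping condition of Step 3.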

2.4. Mathematical Model

We want to construct code that possesses the CP. This section adopts the method in reference [27]. We represent each packet as a polynomial of , where a number is denoted by several bits within this packet. Source packet is represented with the polynomial form, as shown in Formula (7), , where indicates the length of the source packet and indicates the right shift by one bit.

For , the -th encoded packet can be expressed as . Let denote the number of parity packets. For , the polynomial form of the -th parity packet is as follows: where , ,.

Combined with the systematic packets and parity packets, the final coding expression is shown in the following formula: where , , and which is a matrix with dimension of , is a identity matrix, and is a shift matrix,

The exponent of the element in is indicated by , whose actual meaning is the shifted bits of packets.
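To make the role of the exponents concrete, here is a small Python sketch of shift-and-addition encoding for one encoded packet. It assumes that a packet is a list of bits, that a right shift by $t$ bits prepends $t$ zeros, and that `None` marks a source packet that does not participate; the function name and the `None` marker are hypothetical and not taken from the text.

```python
def encode_row(sources, exponents):
    """XOR the source packets, each shifted right by its exponent.

    sources   : list of equal-length bit lists
    exponents : one shift per source packet, or None if it does not participate
    """
    used = [t for t in exponents if t is not None]
    length = len(sources[0]) + (max(used) if used else 0)
    out = [0] * length
    for bits, t in zip(sources, exponents):
        if t is None:
            continue
        for i, b in enumerate(bits):
            out[i + t] ^= b      # addition over GF(2) is XOR
    return out


s1, s2 = [1, 0, 1], [1, 1, 0]
print(encode_row([s1, s2], [0, 1]))      # s1 unshifted plus s2 shifted by 1
print(encode_row([s1, s2], [None, 0]))   # only s2 participates
```

The encoded packet is longer than a source packet by the largest shift used, which is the overhead discussed in this paper.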

3. Encoding Design

The encoding framework, in which one element takes part in the encoding process multiple times when constructing one encoded packet based on the Cauchy matrix, is constructed in three steps. The detailed rules are as follows:

Step 1. Determine the size of the finite field and the dimension of the Cauchy matrix according to the relation of coding parameters .

According to the relation of , when and is a positive integer, we can determine the size of the finite field to be , where and the sign indicates an integer ceiling function. In finite field , the dimension of Cauchy matrix over is determined as according to the size of and . The specific construction process of the corresponding Cauchy matrix (i.e., the coding matrix) is to determine the element set of and first. The elements of could be any elements in , , which correspond to parity packets. The elements of are any elements in the elements in finite field, , which correspond to parity packets. According to the construction rules of Cauchy matrix in the finite field, the following relationships should be met between the element sets of and :

If for , , the following is satisfied: (1)(2)(3)

Then, the elements in and are different, and a coding matrix with dimension can be constructed as follows:

Step 2. Convert the numeric form of elements in the coding matrix to polynomial form.

Through the arithmetic operation in finite fields and the transformation from field to field in Section 2.3, the matrix is transformed into the polynomial form. Each element of can be uniquely represented by a polynomial, that is, .

In particular, the exponent of of each polynomial represents the size of the bit shift. For example, represents that a systematic packet is shifted to the right by 2 and 1 bits, respectively, and the two shifted copies are then added together. For example, the shift value of the source packet of length is , as shown in Figure 1.
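The rule that each exponent of a polynomial is a bit shift can be sketched as follows. The sketch assumes that a field element is stored in numerical form (bit $i$ set means the term of degree $i$ is present) and that a packet is a list of bits; `shifts_of` and `multi_shift` are hypothetical names.

```python
def shifts_of(element):
    """Exponents present in the polynomial form, highest first."""
    return [i for i in range(element.bit_length() - 1, -1, -1)
            if (element >> i) & 1]


def multi_shift(bits, element, w):
    """Add (XOR) one shifted copy of the packet per exponent of the element.

    The output has len(bits) + w - 1 bits, since every shift is below w.
    """
    out = [0] * (len(bits) + w - 1)
    for t in shifts_of(element):
        for i, b in enumerate(bits):
            out[i + t] ^= b
    return out


# Element 6 has polynomial form x^2 + x: shifts of 2 and 1 bits.
print(shifts_of(6))
print(multi_shift([1, 0, 1], 6, w=3))
```

Because the same source packet contributes several shifted copies, a single output bit may depend on two different bits of that packet, which is what distinguishes this framework from single-participation encoding.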

Step 3. Determine the shift matrix combined with the systematic code.

Combined with the systematic code, a coding matrix with dimension is obtained through vertical connection.

In order to better understand the design rules that one element takes part in the encoding process multiple times, we will walk through the coding steps with a concrete example.

Example 2. If $(n, k) = (8, 5)$, then the system has 5 systematic packets and 3 parity packets.

Step 1. We have , , so we can determine that the size of the finite field is . For the finite field , the commonly used polynomial is chosen as the primitive polynomial. In this case, and take 3 and 5 elements, respectively, and the elements of and are different. Taking and as an example, the coding matrix is as follows:

Step 2. Through the arithmetic operations in the finite field, the addition operation is converted to the XOR operation, for example, . Multiplication and division can be calculated by looking up tables. For example, Table 2 lists the positive and negative tables of . Based on the above, we can convert the matrix into the following form:

Through the transformation from field to field in Section 2.3, the elements of coding matrix are expressed by polynomials one by one to obtain the coding matrix :

Step 3. By vertically concatenating 5 systematic codes, we can obtain the coding matrix:
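The three steps can be sketched end to end in Python for a code with 5 systematic and 3 parity packets. The field $GF(2^3)$ with primitive polynomial $x^3 + x + 1$, the element sets passed to `build_generator`, the entry form $1/(x_i + y_j)$, and the representation of each matrix entry as a list of shifts are all assumptions made for illustration; the sets used in Example 2 may differ.

```python
W, POLY = 3, 0b1011  # assumed field GF(2^3) with polynomial x^3 + x + 1


def gf_mul(a, b):
    """Multiply two field elements in numerical form."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> W:
            a ^= POLY
    return r


def gf_inv(a):
    """Brute-force inverse; acceptable for a tiny field."""
    return next(b for b in range(1, 1 << W) if gf_mul(a, b) == 1)


def shifts_of(element):
    """Step 2: numerical form -> exponents of the polynomial form."""
    return [i for i in range(W - 1, -1, -1) if (element >> i) & 1]


def build_generator(X, Y):
    """Stack the systematic part on top of the Cauchy part.

    Each entry is a list of shifts; an empty list means the source
    packet does not participate in that encoded packet.
    """
    k = len(Y)
    systematic = [[[0] if r == c else [] for c in range(k)] for r in range(k)]
    parity = [[shifts_of(gf_inv(x ^ y)) for y in Y] for x in X]   # Step 1
    return systematic + parity                                     # Step 3


G = build_generator([0, 1, 2], [3, 4, 5, 6, 7])   # hypothetical X and Y
for row in G:
    print(row)
```

Every shift in the result is at most 2, so each encoded packet is at most 2 bits longer than a source packet under these assumptions.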

4. Properties of the Code Based on Cauchy Matrix

4.1. CP Property

Before we prove the CP property of the code, we will introduce the properties and lemmas mentioned in the proof.

Isomorphism property: the mathematical idea of isomorphism is to establish a one-to-one mapping between two sets that preserves their operations. For example, assuming that the sets and of algebraic operations are isomorphic, if one set has a property that depends only on its algebraic operations, then the other set has the corresponding property [28].

Lemma 2. Any square submatrix of is invertible.

Proof. According to Section 3, matrix is a Cauchy matrix, and the transformation from to its polynomial form is equivalent to the transformation from field to field , as shown in Section 2.3. In addition, since field and field are isomorphic [17] and the isomorphism property indicates that invertibility in one field is preserved in the isomorphic field, the invertibility of in field is mapped to that of in field . Theorem 1 says that any square submatrix of is invertible, so any square submatrix of is also invertible.

CP property: Any out of encoded packets are able to recover the information of the original packets.

Proof. First, we use a mathematical model to solve the above coding and decoding problems. The Cauchy matrix and the coding matrix are constructed from the previous coding design in Section 3, where and represents the matrix with the dimension of . For example, where element 0 means that the source packet involved in encoding shifts 0 bit, which means nonshift, and element means that the source packet does not participate in encoding.
According to the mathematical model of Section 2.4, in the finite field , the matrix above is represented by polynomial form of , where is the radix (assuming that the modulus of is greater than 1), and every element of matrix is raised to a proper power and then mod the original polynomial of the finite field. This process is actually a transformation from field to field . For example, the identity matrix to applies the above transformation to where element 1 indicates that the source packet participating in the encoding is not shifted, and element 0 indicates that the source packet is not participating in the encoding. The transformation process of matrix to is shown in Section 2.3. At this point, the polynomial form of the coding matrix is which is a matrix with a size of .
Based on the polynomial form of above and the mathematical model in Section 2.4, the polynomial represents the encoded packet and the polynomial represents the systematic packet. The coding relationship can be expressed as . Take any packets from coded packets, that is, extract the rows at the same positions from the encoded packets to form , and correspondingly take the rows at the same positions from to form the matrix . The encoding relationship can then be expressed as .
Based on the above model, satisfying CP property is equivalent to the invertibility of . We claim that is invertible due to the following reasons:
In take any rows from and form matrix . In this case, to be decoded can be formed in two ways: First, does not contain any rows of ; from Lemma 2 above, we know that every square submatrix of is invertible, so is invertible. Second, is composed of rows in and rows in , where represents rows in and represents rows in , and . Since the matrix is a known systematic packet, substituting it into is equivalent to deleting columns from it. The deleted is equivalent to extracting the submatrix from , where , as shown in the following Example 3. Similarly, Lemma 2 shows that any square submatrix of is invertible, so is also invertible.
To sum up, is invertible, so this coding framework can meet the CP property.
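The argument above can be checked numerically for a small case. The following sketch stacks an identity matrix on a Cauchy matrix over $GF(2^3)$ and verifies that every choice of 5 out of the 8 rows is invertible in that field. The field, the primitive polynomial $x^3 + x + 1$, the element sets, and all function names are assumptions for illustration; the check is carried out in numerical form and relies on the isomorphism to carry over to the polynomial form.

```python
from itertools import combinations

W, POLY = 3, 0b1011  # assumed field GF(2^3) with polynomial x^3 + x + 1


def gf_mul(a, b):
    """Multiply two field elements in numerical form."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> W:
            a ^= POLY
    return r


def gf_inv(a):
    """Brute-force inverse; acceptable for a tiny field."""
    return next(b for b in range(1, 1 << W) if gf_mul(a, b) == 1)


def gf_det(m):
    """Determinant by Gaussian elimination (signs vanish in characteristic 2)."""
    m = [row[:] for row in m]
    det = 1
    for c in range(len(m)):
        pivot = next((r for r in range(c, len(m)) if m[r][c]), None)
        if pivot is None:
            return 0
        m[c], m[pivot] = m[pivot], m[c]
        det = gf_mul(det, m[c][c])
        inv = gf_inv(m[c][c])
        for r in range(c + 1, len(m)):
            f = gf_mul(m[r][c], inv)
            for j in range(c, len(m)):
                m[r][j] ^= gf_mul(f, m[c][j])
    return det


def systematic_generator(X, Y):
    """Identity matrix stacked on the Cauchy matrix with entries 1 / (x + y)."""
    k = len(Y)
    identity = [[int(r == c) for c in range(k)] for r in range(k)]
    return identity + [[gf_inv(x ^ y) for y in Y] for x in X]


def has_cp(G, k):
    """True if every k-row submatrix of G is invertible."""
    return all(gf_det([G[r] for r in rows]) != 0
               for rows in combinations(range(len(G)), k))


G = systematic_generator([0, 1, 2], [3, 4, 5, 6, 7])   # hypothetical X and Y
print(has_cp(G, 5))
```

Replacing one parity row with a copy of another breaks the property, which is a quick way to confirm that the check is not vacuous.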

Example 3. Following Example 2, if , the system has 5 systematic packets, parity packets, where . The matrix has been constructed as Combining the identity matrix , If we choose any submatrix in , where is composed of some parts of and , let and then, it can be seen from the matrix that the known systematic packets are , so can be simplified as which is equivalent to the submatrix of . According to Lemma 2, any submatrix of is invertible, so is invertible.

4.2. Zigzag Decodability (ZD) Property

The ZD property of the framework in which one element takes part in the encoding process multiple times when constructing one encoded packet can be examined experimentally, and the experiments show that the code is not fully zigzag decodable. We set several groups of parameters to estimate the probability of zigzag decodability. In the simulation, the parameters are set as , , , and , respectively, and the zigzag decodable and non-decodable cases are shown in Table 4.

It can be seen from Figure 2 that the encoding framework in which one element takes part in the encoding process multiple times when constructing one encoded packet based on the Cauchy matrix is zigzag decodable with a probability of about 80%, which means that zigzag decoding fails with a probability of about 20%. The following two examples illustrate cases in which the encoding framework can and cannot be zigzag decoded.

Example 4. Following Example 2 where , we assume that there are encoded packets , where and . Based on the above, the remaining problem is to use the encoded packets to decode the three source packets . The corresponding coding matrix is as follows: Assuming that the length of the encoded packets is , we can obtain the decoding diagram of Figure 3 through shift-and-addition encoding by row. In this condition, the coding framework can be zigzagged: can be obtained directly from the first exposed bit of . Substitute into the first bit of to obtain through shift-and-addition, and then, substitute into to obtain . In this manner, the first bit of each source packet has been obtained. The decoding of the second bit of each source packet is similar to the decoding of the first bit. Substitute , , into , respectively, so that can be obtained. Substitute into the second bit of , and can be obtained by XOR operation. Finally, substitute into , and then, can be obtained.

Example 5. Following Example 2 where , we assume that there are encoded packets , where and .Therefore, the remaining problem is to use the encoded packets to decode the three source packets to form the encoding matrix, Assuming the length of the encoded packets is , through shift-and-addition encoding by row, we cannot perform zigzag decoding in this case. From the first bit of each coded packet, only is exposed bit. When it is substituted into the second bit of , respectively, no new exposed bit can be obtained. The whole decoding process is locked, so the zigzag decoding cannot proceed, as shown in Figure 4.
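Whether a given set of encoded packets can be zigzag decoded can be tested with a simple peeling procedure: repeatedly look for an encoded bit that depends on exactly one unknown source bit, resolve it, and substitute it everywhere. The sketch below is a generic checker under stated assumptions and not the simulation used in the experiments; the representation of each matrix entry as a list of shifts and the name `zigzag_decodable` are hypothetical.

```python
def zigzag_decodable(rows, length):
    """Check zigzag decodability by peeling.

    rows   : one list per received encoded packet; entry j is the list of
             shifts applied to unknown source packet j (empty = absent)
    length : number of bits in each source packet
    """
    k = len(rows[0])
    equations = []
    for row in rows:
        max_shift = max((t for entry in row for t in entry), default=0)
        for p in range(length + max_shift):
            eq = set()                      # unknown source bits XORed into bit p
            for j, entry in enumerate(row):
                for t in entry:
                    if 0 <= p - t < length:
                        eq ^= {(j, p - t)}
            if eq:
                equations.append(eq)

    known = set()
    progress = True
    while progress:
        progress = False
        for eq in equations:
            eq -= known                     # substitute bits already decoded
            if len(eq) == 1:                # exposed bit found
                known |= eq
                progress = True
    return len(known) == k * length


# Two unknown packets: y1 = s1 + s2 and y2 = s1 + (s2 shifted by 1).
print(zigzag_decodable([[[0], [0]], [[0], [1]]], length=3))
# Two identical encoded packets expose no bit, so decoding is locked.
print(zigzag_decodable([[[0], [0]], [[0], [0]]], length=3))
```

A result of `False` means only that the peeling procedure stalls, as in Example 5; the packets may still be recoverable by solving the linear system, since the CP property holds.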

5. Performance Analysis

When one element takes part in the encoding only once in shift-and-addition encoding, the overhead of several existing CP-ZD codes is the square of or . As can be seen from Section 3, the overhead of the encoding framework in which one element takes part in the encoding process multiple times when constructing one encoded packet is related to the size of the finite field; if the size of the finite field is , then . Therefore, the maximum overhead of this encoding framework is determined by the number of encoded packets, and the overhead is . The overhead OH thus grows logarithmically with the number of encoded packets, which is a substantial advantage over the existing zigzag codes. However, because a source packet may appear more than once in an encoded packet, the structure is more complex than in the single-participation case, which can cause zigzag decoding to fail, as shown in Example 5. As can be seen from Figure 2, the ZD decoding rate of this encoding framework is about 80%.
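The logarithmic growth can be illustrated with a short calculation. The sketch assumes that the field exponent is the smallest integer $w$ with $2^w$ at least the number of encoded packets, and that the overhead is measured as the largest possible shift, which is the highest polynomial degree $w - 1$; both the measure and the function names are assumptions, since the exact expression is not reproduced here.

```python
def field_exponent(n):
    """Smallest w >= 1 with 2**w >= n, i.e. ceil(log2(n)) for n >= 2."""
    return max(1, (n - 1).bit_length())


def max_shift(n):
    """Largest shift (extra bits per encoded packet) under the assumption above."""
    return field_exponent(n) - 1


for n in (8, 64, 1000):
    print(n, field_exponent(n), max_shift(n))
```

Under these assumptions, increasing the number of encoded packets from 8 to 1000 raises the largest shift only from 2 to 9 bits.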

In general, compared with the encoding framework in which one element takes part in the encoding only once in shift-and-addition, the encoding framework with multiple participation of elements based on the Cauchy matrix constrains the overhead well and reduces it from the existing quadratic level to the logarithmic level. However, it loses part of the ZD property and cannot guarantee zigzag decoding.

6. Conclusions

To address the problem of high overhead in CP-ZD codes, in this paper we design a coding framework, based on the Cauchy matrix and shift-and-addition, in which elements take part in the encoding multiple times when constructing an encoded packet. It is proved that the framework has the CP property but does not fully possess the ZD property. Experimental results show that the ZD decoding rate of this code is about 80%. However, the overhead is , which is reduced from the existing quadratic level to the logarithmic level. The coding framework confirms the advantage of letting an element participate in the encoding multiple times and lays a foundation for future research.

7. Future Works

The new coding framework proposed in this paper satisfies the CP property but does not fully satisfy the ZD property. At present, there is no mature encoding framework in which one element takes part in the encoding process multiple times. Designing a feasible ZD decoding framework for this idea is therefore of clear academic and practical value. Moreover, it is also necessary to study the closed-form expression of CP-ZD codes, which can be used to describe the necessary and sufficient conditions for CP-ZD codes in which elements take part in the encoding process multiple times.

Data Availability

No data were used to support this study.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This study was supported by the National Key Research and Development Program under Grant 2019YFB1803305, the Natural Science Foundation of China (62071304), the Natural Science Foundation of Guangdong Province (2020A1515010381), the Basic Research Foundation of Shenzhen City (20200826152915001), the Guangdong Basic and Applied Basic Research Foundation (2022A1515011219), the Natural Science Foundation of Shenzhen City (JCYJ20190808120415286), and the Natural Science Foundation of Shenzhen University (00002501).