Abstract
The computation of channel capacity is a classical issue in information theory. We prove that algorithms based on self-concordant functions can be used to deal with such issues, especially when constraints are included. A new algorithm to compute the channel capacity per unit cost is proposed. The same approach also applies to the computation of maximum entropy. All the algorithms run in polynomial time.
1. Introduction
The computation of channel capacity is a classical issue in information theory and has received much attention [1–3]. Despite some progress [3], the Arimoto-Blahut algorithm [4–7] is still the most widely used method. It has been generalized into the alternating minimization method [8] and has found applications in various fields [9–11]. Recently, the computation of channel capacity with constraints, such as the computation of the capacity-cost function, has become increasingly important [5, 12–15]. The Arimoto-Blahut algorithm is very limited in dealing with such issues. For lack of an appropriate means of eliminating the Lagrange multipliers, the method in [5], which is based on the Arimoto-Blahut algorithm, cannot obtain the optimal point.
The computation of channel capacity has always been an optimization problem, but no general-purpose optimization algorithm, such as the gradient method or Newton’s method, has become the mainstream method. One reason is that so far none of them has been as effective as the Arimoto-Blahut algorithm. However, with advances in optimization theory and with the increasing importance of constrained channel capacity issues, the situation is changing.
In this paper, some new algorithms based on self-concordant functions are proposed. Besides being efficient (they run in polynomial time), the algorithms can compute constrained channel capacity easily. In Section 2, we introduce the definition of the self-concordant function and its main properties, as well as some algorithms for convex optimization problems. In Section 3, we give some new algorithms for channel capacity with or without constraints and for channel capacity per unit cost. Because they are based on self-concordant functions, these algorithms run in polynomial time. An example is given in Section 4, and some conclusions are given in Section 5.
2. Self-Concordant Function and Convex Programming
For convenience, in this section, we make a brief introduction to self-concordant functions, including the definition, main propositions, and algorithms for convex optimization.
2.1. Self-Concordant Function
Self-concordant function theory, published by Nesterov and Nemirovskii in 1994 [16, 17], is of landmark importance for establishing general interior point polynomial-time algorithms for convex programming. The theory reveals the main property of the interior point method for linear programming initiated by Karmarkar in 1984 [18]: self-concordance of the barrier function.
Suppose that $f : D \to \mathbb{R}$ is three times continuously differentiable and strictly convex, where the domain $D$ is an open convex subset in $\mathbb{R}^n$.
Definition 1 (see [19]). Let $n = 1$. A convex function $f$ is self-concordant if $|f'''(x)| \le 2 f''(x)^{3/2}$ for all $x \in D$.
Definition 2. A function $f$ is self-concordant if it is self-concordant along every line in its domain, that is, if the function $g(t) = f(x + tv)$ is a self-concordant function of $t$ for all $x \in D$ and for all $v \in \mathbb{R}^n$.
Since linear and convex quadratic functions have zero third derivatives, they are trivial self-concordant functions. The most common nontrivial self-concordant function is the logarithmic function $-\log x$.
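The logarithmic example can be checked directly. The short derivation below assumes the standard one-dimensional condition $|f'''(x)| \le 2 f''(x)^{3/2}$ from [19] as the inequality of Definition 1; under that assumption, $-\log x$ meets the bound with equality.

```latex
f(x) = -\log x,\quad x > 0:\qquad
f''(x) = \frac{1}{x^{2}},\qquad
f'''(x) = -\frac{2}{x^{3}},\qquad
|f'''(x)| = \frac{2}{x^{3}}
          = 2\left(\frac{1}{x^{2}}\right)^{3/2}
          = 2\, f''(x)^{3/2}.
```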
We can get more self-concordant functions due to the following simple and useful combination rules [19].
Proposition 3. If $f_1$, $f_2$ are self-concordant on $D$ and $a_1 \ge 1$, $a_2 \ge 1$, then $a_1 f_1 + a_2 f_2$ is self-concordant on $D$.
Proposition 4. If is self-concordant on , and is a linear map, , then is self-concordant on .
Proposition 5. If is self-concordant on . Let . Then, is self-concordant on .
2.2. Newton’s Method for Self-Concordant Functions
Consider the convex optimization problem where is a bounded, closed, convex subset with a nonempty interior. Let be a convex function.
Damped Newton Algorithm
Given an initial point , with tolerance .
Repeat
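(1) Compute the Newton step $\Delta x = -\nabla^2 f(x)^{-1} \nabla f(x)$
and Newton decrement $\lambda(x) = \left(\nabla f(x)^T \nabla^2 f(x)^{-1} \nabla f(x)\right)^{1/2}$.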
(2) Stopping criterion. Quit if .
(3) If , then
else
If is a self-concordant function on , then the damped Newton algorithm above possesses the following good properties:(i)all if .(ii)if , then .(iii)if at some iteration we have , then we are in the region of quadratic convergence of the method, that is, for every ,
(iv)the number of Newton steps required to find a point with is bounded by
for some constant .
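The damped Newton iteration can be sketched in Python with NumPy as follows. This is a minimal sketch under stated assumptions, not the exact algorithm above: it always uses the classical step length 1/(1 + λ) for self-concordant functions and does not reproduce the switching threshold between the damped and the full step. The names `damped_newton`, `grad`, `hess` and the test objective are illustrative.

```python
import numpy as np

def damped_newton(grad, hess, x0, eps=1e-10, max_iter=100):
    """Damped Newton iteration x <- x + dx / (1 + lam).

    dx is the Newton step and lam the Newton decrement. The step length
    1/(1 + lam) is an assumed (classical) choice for self-concordant f.
    Returns the final point and the number of steps taken.
    """
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        g, H = grad(x), hess(x)
        dx = -np.linalg.solve(H, g)          # Newton step
        lam = np.sqrt(max(-g @ dx, 0.0))     # Newton decrement
        if lam <= eps:                       # stopping criterion
            return x, k
        x = x + dx / (1.0 + lam)
    return x, max_iter

# Illustrative self-concordant objective (an assumption, not from the paper):
# f(x) = c^T x - sum(log x_i), whose minimiser is x_i = 1 / c_i.
c = np.array([1.0, 2.0, 4.0])
grad = lambda x: c - 1.0 / x
hess = lambda x: np.diag(1.0 / x**2)
x_star, iters = damped_newton(grad, hess, x0=np.ones(3))
```

For this objective every iterate stays strictly positive without any line search, which illustrates the feasibility property in (i).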
For the convex optimization problem with equality constraints , Newton’s method and the bound (8) are still valid as long as the initial point satisfies , and the Newton step and decrement are computed as follows:
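General convex optimization problems have the form
$$\min f_0(x) \quad \text{s.t. } f_i(x) \le 0,\ i = 1, \ldots, m, \quad Ax = b, \tag{10}$$
where $f_0, \ldots, f_m$ are convex and twice continuously differentiable, and $A$ is a $p \times n$ matrix with rank $p$. By means of the logarithmic barrier function, (10) can be formulated approximately as an equality constrained problem as follows:
$$\min t f_0(x) + \phi(x) \quad \text{s.t. } Ax = b, \tag{11}$$
where
$$\phi(x) = -\sum_{i=1}^{m} \log\left(-f_i(x)\right) \tag{12}$$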
is called the logarithmic barrier function for (10). Let be the optimal point of (11) and let be the optimal value of (10), then . Therefore, we can simply take and solve the equality constrained problem (11) using Newton’s method to get an -solution of (10), and if the objective function of (11) is self-concordant, then the number of Newton’s iterate is bounded by (8). However, this method usually requires a large , which may bring about a large and a large number of Newton’s iterate, thus it is rarely used. A commonly used method is the path-following method as follows.
Path-Following Method
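Given strictly feasible $x$, $t := t^{(0)} > 0$, $\mu > 1$, with tolerance $\epsilon > 0$.
Repeat
(1) Centering step. Compute $x^*(t)$ by minimizing $t f_0 + \phi$, subject to $Ax = b$, starting at $x$.
(2) Update. $x := x^*(t)$.
(3) Stopping criterion. Quit if $m/t < \epsilon$.
(4) Increase $t$. $t := \mu t$.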
If for any is self-concordant, then the total number of Newton steps in the path-following method, not counting the initial centering step, is
The term in brackets is the iterate number of from to , and the term in parentheses is the iterate number of Newton’s method per centering step.
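To make the outer-loop count concrete, the sketch below counts how many times the barrier parameter is increased before the stopping criterion is met. It assumes the standard form of the method in [19]: the update is t := μt and the loop quits once m/t < ε, where m is the number of inequality constraints. The function name and the sample values are illustrative.

```python
import math

def outer_iterations(m, t0, mu, eps):
    """Count the increases t := mu * t performed until m / t < eps."""
    k, t = 0, t0
    while m / t >= eps:
        t *= mu
        k += 1
    return k

# Illustrative values (assumptions): 5 constraints, t0 = 1, mu = 20, eps = 1e-6.
k = outer_iterations(5, 1.0, 20.0, 1e-6)
# Closed-form estimate of the same count, ceil(log(m / (eps * t0)) / log(mu)).
k_closed = math.ceil(math.log(5 / (1e-6 * 1.0)) / math.log(20.0))
```

The count grows only logarithmically in 1/ε, which is why the total Newton work is dominated by the cost per centering step.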
3. Algorithms Based on Self-Concordant Functions for Channel Capacity
In this section, we prove that the channel capacity, with or without constraints, can be computed by the path-following method in polynomial time. We also prove that the channel capacity per unit cost is a single-peak function of the expected cost.
3.1. Channel Capacity without Constraints
Let and be the input and output alphabets, respectively. Let and be the distributions of and , respectively. Let be a transition matrix from to . Hence,
If is the mutual information between and , then
Channel capacity is defined as follows: Equation (16) is a function of and and we denote it by .
The Arimoto-Blahut algorithm utilized (16) as follows. In (16), both and are unknown, but given , (16) can get its maximum at and given , (16) can get its maximum at If , then the channel capacity can be approximated by computing through (18) and (19), alternatively. Let be the mutual information , then,
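For comparison, the alternating update of the Arimoto-Blahut algorithm can be sketched as follows. This is the textbook form of the iteration, not code from the paper; the notation is an assumption: `W[i, j]` is the probability of output j given input i, no column of `W` is identically zero, and the capacity is returned in nats.

```python
import numpy as np

def blahut_arimoto(W, tol=1e-10, max_iter=10_000):
    """Arimoto-Blahut iteration for a channel matrix W (rows sum to 1).

    Returns (capacity lower bound in nats, input distribution p).
    """
    n = W.shape[0]
    p = np.full(n, 1.0 / n)                       # uniform start
    for _ in range(max_iter):
        q = p @ W                                 # output distribution
        # D[i] = KL divergence between row i of W and q (0 log 0 := 0)
        ratio = np.where(W > 0, W / q, 1.0)
        D = np.sum(W * np.log(ratio), axis=1)
        lower = np.log(p @ np.exp(D))             # lower bound on capacity
        upper = D.max()                           # upper bound on capacity
        if upper - lower < tol:
            break
        p = p * np.exp(D)                         # multiplicative update
        p /= p.sum()
    return lower, p

# Illustrative Z-channel (an assumption): input 0 is noiseless,
# input 1 is flipped with probability 1/2.
C_z, p_z = blahut_arimoto(np.array([[1.0, 0.0], [0.5, 0.5]]))
```

The gap between the two bounds gives a stopping rule, but the update has no direct way to enforce additional constraints on the input distribution, which is the limitation discussed above.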
In order to get an algorithm based on the self-concordant function, we utilize (15). Let By (15), the channel capacity can be expressed as follows: It is a standard convex optimization problem. Unfortunately, the objective function of (22) is not self-concordant, even when the logarithmic barrier −log is added.
It seems that we should add as constraints for (22) since there are log in the objective function of (22). But the are not independent variables. It can be computed by (14) and maintain positive as long as all the , hold. Nevertheless, we still add the logarithmic barrier −log in the objective of (22) to get a self-concordant objective function.
For , consider the convex optimization problem as follows: In order to show the self-concordance of the objective function of (23), we need Proposition 6.
Proposition 6. For any is a self-concordant function on .
Proof. We have Thus
By Propositions 3–6, the objective function of (23) is self-concordant, so we can solve it by the path following method, and the number of Newton iterations is bounded by (13).
Equation (13) shows that solving (23) by the path-following method is a polynomial-time algorithm. In addition, a main advantage is that the algorithm can deal with constraints very easily.
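A path-following computation of the unconstrained capacity might look like the sketch below. Several choices are assumptions rather than the paper's exact formulation: the centering objective is taken to be −t·I(p) − Σ log p_i − Σ log q_j with q = pW, minimized subject to Σ p_i = 1; the channel matrix `W` (rows are inputs) is strictly positive; the damped step 1/(1 + λ) is used with a positivity safeguard; and the parameters `t`, `mu`, `eps` are illustrative defaults.

```python
import numpy as np

def capacity_barrier(W, eps=1e-8, t=1.0, mu=10.0, newton_tol=1e-9):
    """Barrier (path-following) sketch for C = max_p I(p), in nats.

    W[i, j] = P(Y=j | X=i) > 0. Centering objective (assumed form):
    F_t(p) = -t*I(p) - sum(log p) - sum(log q), q = p @ W, sum(p) = 1.
    """
    n, k = W.shape
    h = -np.sum(W * np.log(W), axis=1)        # row entropies H(Y | X=i)
    ones = np.ones(n)
    p = np.full(n, 1.0 / n)                   # strictly feasible start
    m = n + k                                 # number of barrier terms
    while True:
        for _ in range(100):                  # centering by damped Newton
            q = p @ W
            g = t * (W @ (np.log(q) + 1.0) + h) - 1.0 / p - W @ (1.0 / q)
            H = (W * (t / q + 1.0 / q**2)) @ W.T + np.diag(1.0 / p**2)
            # KKT system: the step dp must satisfy sum(dp) = 0
            K = np.block([[H, ones[:, None]],
                          [ones[None, :], np.zeros((1, 1))]])
            dp = np.linalg.solve(K, np.concatenate([-g, [0.0]]))[:n]
            lam = np.sqrt(max(dp @ H @ dp, 0.0))   # Newton decrement
            if lam <= newton_tol:
                break
            step = 1.0 / (1.0 + lam)
            while np.any(p + step * dp <= 0):      # safeguard only
                step *= 0.5
            p = p + step * dp
        if m / t < eps:                       # duality-gap bound m / t
            break
        t *= mu
    q = p @ W
    capacity = np.sum(p * (W * np.log(W / q)).sum(axis=1))
    return capacity, p

# Illustrative binary symmetric channel with crossover probability 0.1.
C_bsc, p_bsc = capacity_barrier(np.array([[0.9, 0.1], [0.1, 0.9]]))
```

Further inequality constraints would enter only as extra logarithmic terms in the gradient and Hessian, which is the sense in which constraints are easy to handle in this framework.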
3.2. Channel Capacity with Constraints and Channel Capacity Per Unit Cost
Consider the channel capacity with constraints as follows: For , [5] gave some algorithms based on the Arimoto-Blahut method. Because one cannot eliminate the Lagrange multiplier, the algorithms in [5] cannot get the optimal solution for (26).
By adding logarithmic barriers for the inequality constraints in the objective of (26) as follows, we can get a polynomial time algorithm for (26): The objective function of (27) is self-concordant, so we can solve (27) by the path-following method.
Channel capacity per unit cost is one of the typical problems of channel capacity with constraints [15, 20–22]. By [15], channel capacity per unit cost can be computed by where is the capacity-cost function, which is the solution of the following constrained problem [12]: Equation (29) is a special case of (26) (). Here we discuss the channel capacity per unit cost in a more general manner.
Theorem 7. Let be a convex set. Let be a concave function on and a convex function on . Let , and suppose . Let . Then,(1)is a concave increasing function,(2)let then is either a unimodal function or a monotone function.
Proof. (1) Let , then . Therefore, , For any , there are such that and , and there are such that and . Since is convex and is concave, we get
Since is arbitrary, is concave.
(2) Without loss of generality, we can assume is differentiable. If not, we can approach it by a differentiable function to any precision [23]. Suppose there is a , such that
Let ; since is concave, we have ). Let
It is the equation of a straight line that has slope ) and passes through the . Since is increasing and concave, we get and
that is, .
Let , since is concave, ). Let
It is the equation of a straight line that has slope ) and passes through . Since is increasing and concave, we get and
that is, .
Therefore, is a maximum point of function .
(3) If
then for any , there are , such that
Let , then , so
Since is arbitrary, we get
In turn, if
then for any , there are , , such that
Since
so
Therefore, if
is valid, then so is
and vice versa.
Suppose (46) is valid, for any , there are , such that
Let , then
Since is arbitrary, we get
On the other hand, since (47) is valid too, for any , there are , such that
Let , such that and , then
Since is arbitrary, we get
If is the average mutual information defined by (15) and , then (3) of Theorem 7 is Theorem 2 in [15].
In addition to average mutual information, entropy is another frequently used function in applications [24]. It is obvious that Theorem 7 is valid when is an entropy function (as a function of the distribution of input symbols). It is useful to point out that Theorem 7 is still valid when is a positive definite quadratic function [25].
Making use of (2) of Theorem 7, we can write an algorithm to compute the sup as follows. The algorithm is based on the 0.618 method (golden section search), which locates the optimal point of a single-peak function.
Algorithm 8 (0.618 method). Let , and assume .
(1) Given search interval , and tolerance . compute and as follows:
; evaluate and . Let .
(2) If , then go to 3, else go to 4.
(3) If , then output and , else, let ; evaluate , go to 5.
(4) If , then output and , else, let ; evaluate , go to 5.
(5) , go to 2.
Even if , that is when there are zero cost input symbols, the algorithm is still valid.
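The 0.618 search itself can be sketched as a generic golden section routine for a single-peak function. The function name, the tolerance, and the test function are illustrative assumptions; in the setting of Algorithm 8 each evaluation of `phi` would itself require solving a constrained capacity problem.

```python
import math

def golden_section_max(phi, a, b, tol=1e-6):
    """Locate the maximiser of a single-peak function phi on [a, b]."""
    r = (math.sqrt(5.0) - 1.0) / 2.0          # about 0.618
    lo, hi = a, b
    x1 = lo + (1.0 - r) * (hi - lo)
    x2 = lo + r * (hi - lo)
    f1, f2 = phi(x1), phi(x2)
    while hi - lo > tol:
        if f1 < f2:                           # peak lies in [x1, hi]
            lo, x1, f1 = x1, x2, f2           # reuse the old x2 as new x1
            x2 = lo + r * (hi - lo)
            f2 = phi(x2)
        else:                                 # peak lies in [lo, x2]
            hi, x2, f2 = x2, x1, f1           # reuse the old x1 as new x2
            x1 = lo + (1.0 - r) * (hi - lo)
            f1 = phi(x1)
    x = 0.5 * (lo + hi)
    return x, phi(x)

# Illustrative ratio of a concave function to a linear one (an assumption):
# sqrt(x) / (1 + x) has a single peak at x = 1 with value 0.5.
x_best, v_best = golden_section_max(lambda x: math.sqrt(x) / (1.0 + x), 0.1, 5.0)
```

Each pass needs only one new function evaluation because the interval shrinks by the factor 0.618 and one interior point is reused.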
3.3. Maximum Entropy
A similar but simpler problem is the computation of maximum entropy. The typical form is as follows: where are known, and can be computed from sample data, . The GIS algorithm [26–28], one of the most common algorithms for (54), is suited only to linear constraints. Although the computation is discussed in detail in [27], there is no guarantee that GIS is a polynomial-time algorithm.
In [25] a maximum entropy problem with nonlinear constraints is considered. It is as follows: where is a positive definite matrix.
We can solve (54) and (55) in polynomial time. In fact, for (54), we add logarithmic barriers in its objective function, for (55), we add logarithmic barrier and in its objective function, and the problems can be transformed into (3). It is good to know that the function in (56) is self-concordant. Therefore, the path following method is valid.
4. An Example
Let the transition matrix be as follows:
Let . Suppose that the initial distribution is discrete uniform, with penalty factor and initial penalty parameter . Iterating 70 times by the path-following method, we get channel capacity and the optimal distribution:.
The Arimoto-Blahut algorithm reaches the same results after 180 iterations.
Let be the cost vector of the input symbols. Let be the expected cost of the input symbols. The curve of is shown in Figure 1. By Algorithm 8, the channel capacity per unit cost is 0.5801, and it is attained at . The search interval is [1, 5.8]. The number of iterations of the path-following method is 17. The number of iterations of the 0.618 method is 19. The number of iterations of Newton’s algorithm is 5.
Let be a positive definite quadratic function, where The curve of is shown in Figure 2. The maximum is 0.2156, and it is attained at . The number of iterations of the path-following method is 17. The number of iterations of the 0.618 method is 19. The number of iterations of Newton’s algorithm is 4.
5. Conclusion
By means of self-concordant function theory, the computation of channel capacity becomes very simple, especially when there are constraints and when the constraints are nonlinear. When the numerator is a general concave function and the denominator is a general convex function, the formula for channel capacity per unit cost is still valid. Furthermore, the function is single-peak; hence we can get some new algorithms for channel capacity per unit cost.