Journal of Electrical and Computer Engineering

Volume 2012 (2012), Article ID 318946, 9 pages

http://dx.doi.org/10.1155/2012/318946

## Computation of Channel Capacity Based on Self-Concordant Functions

^{1}Search Center for Complex System Science, University of Shanghai for Science and Technology, Shanghai 200093, China

^{2}Shanghai Stock Communication Co., Ltd., Shanghai 200131, China

Received 14 July 2011; Revised 1 November 2011; Accepted 15 November 2011

Academic Editor: Tamal Bose

Copyright © 2012 Da-gang Tian and Yi-qun Huang. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

The computation of channel capacity is a classical problem in information theory. We prove that algorithms based on self-concordant functions can be used to solve such problems, especially when constraints are included. A new algorithm to compute the channel capacity per unit cost is proposed. The same approach applies to the computation of maximum entropy. All the algorithms run in polynomial time.

#### 1. Introduction

The computation of channel capacity is a classical problem in information theory, and much attention has been paid to it [1–3]. Despite some progress [3], the Arimoto-Blahut algorithm [4–7] is still the most widely used method. This method has been generalized into the alternating minimization method [8] and has found applications in various fields [9–11]. Recently, the computation of channel capacity with constraints, such as the computation of the capacity-cost function, has become increasingly important [5, 12–15]. The Arimoto-Blahut algorithm is very limited in dealing with such problems. For lack of an appropriate means of eliminating the Lagrange multipliers, the method in [5], which is based on the Arimoto-Blahut algorithm, cannot obtain the optimal point.

The computation of channel capacity has always been an optimization problem, but no general-purpose optimization algorithm, such as the gradient method or Newton’s method, has become the mainstream method. One reason is that so far no such algorithm has been as effective as the Arimoto-Blahut algorithm. However, with advances in optimization theory and with the increasing importance of constrained channel capacity problems, the situation is changing.

In this paper, some new algorithms based on self-concordant functions are proposed. Besides being efficient (they run in polynomial time), the algorithms can compute constrained channel capacity easily. In Section 2, we introduce the definition of the self-concordant function and its main properties, as well as some algorithms for convex optimization problems. In Section 3, we give some new algorithms for channel capacity with or without constraints and for channel capacity per unit cost. Because they are based on self-concordant functions, these algorithms run in polynomial time. An example is given in Section 4, and some conclusions are given in Section 5.

#### 2. Self-Concordant Function and Convex Programming

For convenience, in this section, we make a brief introduction to self-concordant functions, including the definition, main propositions, and algorithms for convex optimization.

##### 2.1. Self-Concordant Function

Self-concordant function theory, published by Nesterov and Nemirovskii in 1994 [16, 17], is of landmark importance for establishing general interior point polynomial-time algorithms for convex programming. The theory reveals the main property of the interior point method for linear programming initiated by Karmarkar in 1984 [18]: self-concordance of the barrier function.

Suppose that $f : D \to \mathbb{R}$ is three times continuously differentiable and strictly convex, where the domain $D$ is an open convex subset of $\mathbb{R}^n$.

*Definition 1 (see [19]). *Let $n = 1$. A convex function $f : D \to \mathbb{R}$ is self-concordant if
$$|f'''(x)| \le 2 f''(x)^{3/2}$$
for all $x \in D$.

*Definition 2. *A function $f : D \to \mathbb{R}$ is self-concordant if it is self-concordant along every line in its domain, that is, if the function $\tilde{f}(t) = f(x + tv)$ is a self-concordant function of $t$ for all $x \in D$ and for all $v \in \mathbb{R}^n$.

Since linear and convex quadratic functions have zero third derivatives, they are trivially self-concordant. The most common nontrivial self-concordant function is the logarithmic function $-\log x$.

We can get more self-concordant functions due to the following simple and useful combination rules [19].

Proposition 3. *If , are self-concordant on and , , then is self-concordant on D.*

Proposition 4. *If is self-concordant on , and is a linear map, , then is self-concordant on .*

Proposition 5. *If is self-concordant on . Let . Then, is self-concordant on .*
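The one-dimensional condition $|f'''(x)| \le 2 f''(x)^{3/2}$ from [19] can be probed numerically. The sketch below evaluates the ratio $|f'''|/(2 f''^{3/2})$ on a grid for three illustrative functions chosen here as assumptions: $-\log x$, $x\log x - \log x$, and $x \log x$. A grid check is only numerical evidence, not a proof.

```python
import numpy as np

def sc_ratio(d2, d3, xs):
    """Return |f'''(x)| / (2 f''(x)^{3/2}) on the grid xs.

    Values <= 1 mean the self-concordance inequality holds at that point.
    d2 and d3 are callables giving the second and third derivatives.
    """
    xs = np.asarray(xs, dtype=float)
    return np.abs(d3(xs)) / (2.0 * d2(xs) ** 1.5)

# Logarithmically spaced grid on (0, inf); the range is an arbitrary choice.
xs = np.logspace(-4, 4, 2001)

# f(x) = -log x:  f'' = 1/x^2, f''' = -2/x^3  (ratio is identically 1)
ratio_log = sc_ratio(lambda x: 1 / x**2, lambda x: -2 / x**3, xs)

# f(x) = x log x - log x:  f'' = 1/x + 1/x^2, f''' = -1/x^2 - 2/x^3
ratio_ent_bar = sc_ratio(lambda x: 1 / x + 1 / x**2,
                         lambda x: -1 / x**2 - 2 / x**3, xs)

# f(x) = x log x alone:  f'' = 1/x, f''' = -1/x^2  (ratio = 1/(2 sqrt(x)))
ratio_ent = sc_ratio(lambda x: 1 / x, lambda x: -1 / x**2, xs)

print("max ratio, -log x          :", ratio_log.max())
print("max ratio, x log x - log x :", ratio_ent_bar.max())
print("max ratio, x log x         :", ratio_ent.max())
```

The entropy-like term $x \log x$ violates the inequality near zero, while adding the barrier $-\log x$ restores it on this grid, which is the motivation for adding logarithmic barriers to entropy-type objectives.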

##### 2.2. Newton’s Method for Self-Concordant Functions

Consider the convex optimization problem where is a bounded, closed, convex subset with a nonempty interior. Let be a convex function.

*Damped Newton Algorithm*

Given an initial point , with tolerance .

**Repeat**

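(1) Compute the Newton step
$$\Delta x = -\nabla^2 f(x)^{-1} \nabla f(x)$$
and Newton decrement
$$\lambda(x) = \left(\nabla f(x)^T \nabla^2 f(x)^{-1} \nabla f(x)\right)^{1/2}.$$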

(2) Stopping criterion. **Quit** if .

(3) If , then

else

If is a self-concordant function on , then the damped Newton algorithm above possesses the following good properties:(i)all if .(ii)if , then .(iii)if at some iteration we have , then we are in the region of quadratic convergence of the method, that is, for every ,
(iv)the number of Newton steps required to find a point with is bounded by
for some constant .
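A damped Newton iteration of this kind can be sketched as follows. The step length $1/(1+\lambda)$ in the damped phase and the switching threshold `lam_star = 0.25` are assumptions taken from the standard treatment of self-concordant functions, not values stated in the text; the test objective $f(x) = c^T x - \sum_i \log x_i$ (minimizer $x_i = 1/c_i$) is a hypothetical example.

```python
import numpy as np

def damped_newton(grad, hess, x0, eps=1e-10, lam_star=0.25, max_iter=100):
    """Minimize a self-concordant function by damped Newton steps.

    grad, hess : callables returning the gradient vector and Hessian matrix.
    lam_star   : assumed threshold between the damped and full-step phases.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        H = hess(x)
        dx = -np.linalg.solve(H, g)            # Newton step
        lam = np.sqrt(max(-(g @ dx), 0.0))     # Newton decrement: lam^2 = g^T H^-1 g
        if lam <= eps:                         # stopping criterion
            break
        if lam > lam_star:
            x = x + dx / (1.0 + lam)           # damped step
        else:
            x = x + dx                         # full Newton step
    return x

# Hypothetical example: f(x) = c^T x - sum(log x_i), minimized at x_i = 1/c_i.
c = np.array([1.0, 2.0, 4.0])
x_opt = damped_newton(lambda x: c - 1.0 / x,
                      lambda x: np.diag(1.0 / x**2),
                      x0=np.ones(3))
print(x_opt)
```

Because the Hessian dominates the barrier curvature, a step of length at most $\lambda/(1+\lambda) < 1$ in the local norm keeps every iterate inside the domain without a line search.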

For the convex optimization problem with equality constraints , Newton’s method and the bound (8) are still valid as long as the initial point satisfies , and the Newton step and decrement are computed as follows:

General convex optimization problems have the form
$$\min\ f_0(x) \quad \text{s.t.}\quad f_i(x) \le 0,\ i = 1, \ldots, m, \quad Ax = b, \tag{10}$$
where $f_0, \ldots, f_m$ are convex and twice continuously differentiable, and $A$ is a $p \times n$ matrix with rank $p$. By means of the logarithmic barrier function, (10) can be formulated approximately as an equality constrained problem as follows:
$$\min\ t f_0(x) + \phi(x) \quad \text{s.t.}\quad Ax = b, \tag{11}$$
where $t > 0$ and
$$\phi(x) = -\sum_{i=1}^{m} \log(-f_i(x)) \tag{12}$$
is called the logarithmic barrier function for (10). Let $x^*(t)$ be the optimal point of (11) and let $p^*$ be the optimal value of (10), then $f_0(x^*(t)) - p^* \le m/t$. Therefore, we can simply take $t = m/\varepsilon$ and solve the equality constrained problem (11) using Newton’s method to get an $\varepsilon$-solution of (10), and if the objective function of (11) is self-concordant, then the number of Newton iterations is bounded by (8). However, this method usually requires a large $t$, which tends to make the bound (8) large and to require a large number of Newton iterations; thus it is rarely used. A commonly used method is the path-following method as follows.

*Path-Following Method*

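Given strictly feasible $x$, $t := t^{(0)} > 0$, $\mu > 1$, with tolerance $\varepsilon > 0$.

**Repeat**

(1) Centering step. Compute $x^*(t)$ by minimizing $t f_0 + \phi$, subject to $Ax = b$, starting at $x$.

(2) Update. $x := x^*(t)$.

(3) Stopping criterion. **Quit** if $m/t < \varepsilon$.

(4) Increase $t$. $t := \mu t$.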

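If $t f_0 + \phi$ is self-concordant for any $t \ge t^{(0)}$, then the total number of Newton steps in the path-following method, not counting the initial centering step, is bounded by
$$\left\lceil \frac{\log\left(m/(t^{(0)}\varepsilon)\right)}{\log \mu} \right\rceil \left( \frac{m(\mu - 1 - \log \mu)}{\gamma} + c \right), \tag{13}$$
where $\gamma$ and $c$ are constants [19].

The term in brackets is the number of updates of $t$ needed to go from $t^{(0)}$ to $m/\varepsilon$, and the term in parentheses bounds the number of Newton iterations per centering step.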
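As a small worked check of the outer-iteration count, the sketch below compares the closed form $\lceil \log(m/(t^{(0)}\varepsilon)) / \log\mu \rceil$ (the standard expression in [19], assumed here) with a direct simulation of the "quit if $m/t < \varepsilon$, else $t := \mu t$" loop. The parameter values are hypothetical.

```python
import math

def outer_iterations(m, t0, eps, mu):
    """Closed-form number of increases of t before m/t < eps holds."""
    return math.ceil(math.log(m / (t0 * eps)) / math.log(mu))

def outer_iterations_simulated(m, t0, eps, mu):
    """Count the increases of t by running the stopping rule directly."""
    t, k = t0, 0
    while m / t >= eps:   # stopping criterion not yet met
        t *= mu           # increase t
        k += 1
    return k

# Hypothetical settings: 4 barrier terms, t0 = 1, eps = 1e-8, mu = 10.
k_formula = outer_iterations(4, 1.0, 1e-8, 10.0)
k_loop = outer_iterations_simulated(4, 1.0, 1e-8, 10.0)
print(k_formula, k_loop)
```

A larger $\mu$ reduces this outer count but makes each centering step harder, which is the trade-off expressed by the two factors of the bound.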

#### 3. Algorithms Based on Self-Concordant Functions for Channel Capacity

In this section, we prove that the channel capacity, with or without constraints, can be computed by the path-following method in polynomial time. We also prove that the channel capacity per unit cost is a single-peak function of the expected cost.

##### 3.1. Channel Capacity without Constraints

Let $X$ and $Y$ be the input and output alphabets, respectively. Let $p = (p_i)$ and $q = (q_j)$ be the distributions on $X$ and $Y$, respectively. Let $Q = (Q_{ij})$ be a transition matrix from $X$ to $Y$. Hence,
$$q_j = \sum_i p_i Q_{ij}. \tag{14}$$

If is the mutual information between and , then

Channel capacity is defined as follows: Equation (16) is a function of and and we denote it by .

The Arimoto-Blahut algorithm utilized (16) as follows. In (16), both and are unknown, but given , (16) can get its maximum at and given , (16) can get its maximum at If , then the channel capacity can be approximated by computing through (18) and (19), alternatively. Let be the mutual information , then,
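For comparison with the methods developed below, here is a minimal sketch of the Arimoto-Blahut iteration in its common textbook form: $p_i \leftarrow p_i e^{D_i} / \sum_k p_k e^{D_k}$ with $D_i = \sum_j Q_{ij} \log(Q_{ij}/q_j)$. The function name, the stopping rule based on the bounds $\log \sum_i p_i e^{D_i} \le C \le \max_i D_i$, and the use of natural logarithms (nats) are assumptions of this sketch; it also assumes every output symbol is reachable.

```python
import numpy as np

def blahut_arimoto(Q, tol=1e-10, max_iter=10000):
    """Approximate the capacity (in nats) of a discrete memoryless channel.

    Q[i, j] is the probability of output j given input i.
    Returns (capacity estimate, input distribution).
    """
    Q = np.asarray(Q, dtype=float)
    n = Q.shape[0]
    p = np.full(n, 1.0 / n)                  # start from the uniform distribution
    safe_Q = np.where(Q > 0, Q, 1.0)         # so that 0 * log 0 evaluates to 0
    lower = 0.0
    for _ in range(max_iter):
        q = p @ Q                            # output distribution
        # D_i = sum_j Q_ij log(Q_ij / q_j)
        D = np.sum(Q * (np.log(safe_Q) - np.log(q)), axis=1)
        lower = np.log(p @ np.exp(D))        # lower bound on capacity
        upper = D.max()                      # upper bound on capacity
        if upper - lower < tol:
            break
        p = p * np.exp(D)                    # multiplicative update
        p /= p.sum()
    return lower, p

# Binary symmetric channel with crossover 0.1 (illustrative example).
C_bsc, p_bsc = blahut_arimoto([[0.9, 0.1], [0.1, 0.9]])
# Z-channel with error probability 0.5 (illustrative example).
C_z, p_z = blahut_arimoto([[1.0, 0.0], [0.5, 0.5]])
print(C_bsc, p_bsc, C_z, p_z)
```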

In order to get an algorithm based on the self-concordant function, we utilize (15). Let By (15), the channel capacity can be expressed as follows: It is a standard convex optimization problem. Unfortunately, the objective function of (22) is not self-concordant, even when the logarithmic barrier −log is added.

It seems that we should add $q_j > 0$ as constraints for (22), since $\log q_j$ appears in the objective function of (22). But the $q_j$ are not independent variables: they are computed by (14) and remain positive as long as all $p_i > 0$ hold. Nevertheless, we still add the logarithmic barrier $-\log q_j$ to the objective of (22) to get a self-concordant objective function.

For , consider the convex optimization problem as follows: In order to show the self-concordance of the objective function of (23), we need Proposition 6.

Proposition 6. *For any is a self-concordant function on .*

*Proof. *We have
Thus

By Propositions 3–6, the objective function of (23) is self-concordant, so we can solve it by the path-following method, and the number of Newton iterations is bounded by (13).

Equation (13) shows that solving (23) by the path-following method is a polynomial-time algorithm. In addition, a main advantage is that the algorithm can deal with constraints very easily.

##### 3.2. Channel Capacity with Constraints and Channel Capacity Per Unit Cost
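The approach of this subsection can be sketched in a few lines. The code below minimizes $t \cdot (-I(p)) - \sum_i \log p_i - \sum_j \log q_j$ subject to $\sum_i p_i = 1$ by damped Newton steps, increasing $t$ by a factor $\mu$ between centering steps. It is a sketch under stated assumptions, not the authors' implementation: natural logarithms, the damping rule $1/(1+\lambda)$ with threshold 0.25, $\mu = 10$, $t^{(0)} = 1$, and all function and parameter names are choices made here, and every output symbol is assumed reachable.

```python
import numpy as np

def capacity_barrier(Q, eps=1e-8, mu=10.0, t0=1.0, lam_star=0.25):
    """Channel capacity (nats) by a log-barrier path-following method."""
    Q = np.asarray(Q, dtype=float)
    n, n_out = Q.shape
    safe_Q = np.where(Q > 0, Q, 1.0)
    r = np.sum(Q * np.log(safe_Q), axis=1)   # r_i = sum_j Q_ij log Q_ij
    ones = np.ones(n)
    p = np.full(n, 1.0 / n)                  # strictly feasible start
    t = t0
    n_barrier = n + n_out                    # number of barrier terms
    while True:
        for _ in range(500):                 # centering by damped Newton
            q = p @ Q
            # gradient and Hessian of t*(-I(p)) - sum log p_i - sum log q_j
            g = t * (-r + Q @ np.log(q) + 1.0) - 1.0 / p - Q @ (1.0 / q)
            H = t * (Q / q) @ Q.T + np.diag(1.0 / p**2) + (Q / q**2) @ Q.T
            # KKT system enforcing sum(dx) = 0
            K = np.block([[H, ones[:, None]],
                          [ones[None, :], np.zeros((1, 1))]])
            dx = np.linalg.solve(K, np.concatenate([-g, [0.0]]))[:n]
            lam = np.sqrt(max(dx @ H @ dx, 0.0))   # Newton decrement
            if lam <= 1e-9:
                break
            p = p + dx / (1.0 + lam) if lam > lam_star else p + dx
        if n_barrier / t < eps:              # duality-gap style stopping rule
            break
        t *= mu
    q = p @ Q
    return p @ r - q @ np.log(q), p          # I(p) and the input distribution

# Z-channel with error probability 0.5 (illustrative example).
C_z, p_z = capacity_barrier([[1.0, 0.0], [0.5, 0.5]])
print(C_z, p_z)
```

For this Z-channel the known capacity is $\log 1.25$ nats, attained with probability 0.4 on the noisy input, which gives a convenient check. Extra inequality constraints would enter only as additional barrier terms in `g` and `H`.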

Consider the channel capacity with constraints as follows: For , [5] gave some algorithms based on the Arimoto-Blahut method. Because one cannot eliminate the Lagrange multiplier, the algorithms in [5] cannot get the optimal solution for (26).

By adding logarithmic barriers for the inequality constraints in the objective of (26) as follows, we can get a polynomial time algorithm for (26): The objective function of (27) is self-concordant, so we can solve (27) by the path-following method.

Channel capacity per unit cost is one of the typical problems of channel capacity with constraints [15, 20–22]. By [15], channel capacity per unit cost can be computed by where is the capacity-cost function, which is the solution of the following constrained problem [12]: Equation (29) is a special case of (26) (). Here we discuss the channel capacity per unit cost in a more general manner.

Theorem 7. *Let be a convex set. Let be a concave function on and a convex function on . Let , and suppose . Let . Then,*(1)*is a concave increasing function,*(2)*let
then is either a unimodal function or a monotone function. *

*Proof. *(1) Let , then . Therefore, , For any , there are such that and , and there are such that and . Since is convex and is concave, we get
Since is arbitrary, is concave.

(2) Without loss of generality, we can assume is differentiable. If not, we can approach it by a differentiable function to any precision [23]. Suppose there is a , such that

Let ; since is concave, we have ). Let
It is the equation of a straight line that has slope ) and passes through the . Since is increasing and concave, we get and
that is, .

Let , since is concave, ). Let
It is the equation of a straight line that has slope ) and passes through . Since is increasing and concave, we get and
that is, .

Therefore, is a maximum point of function .

(3) If
then for any , there are , such that
Let , then , so
Since is arbitrary, we get
In turn, if
then for any , there are , , such that
Since
so
Therefore, if
is valid, then so is
and vice versa.

Suppose (46) is valid, for any , there are , such that
Let , then
Since is arbitrary, we get

On the other hand, since (47) is valid too, for any , there are , such that
Let , such that and , then
Since is arbitrary, we get

If is the average mutual information defined by (15) and , then (3) of Theorem 7 is Theorem 2 in [15].

In addition to average mutual information, entropy is another frequently used function in applications [24]. It is obvious that Theorem 7 is valid when is an entropy function (as a function of the distribution of input symbols). It is useful to point out that Theorem 7 is still valid when is a positive definite quadratic function [25].

Making use of (2) of Theorem 7, we can write an algorithm to compute the supremum as follows. The algorithm is based on the 0.618 method (golden-section search), which locates the optimal point of a single-peak function.

*Algorithm 8 (0.618 method). *Let , and assume .

(1) Given search interval , and tolerance . compute and as follows:

; evaluate and . Let .

(2) If , then go to 3, else go to 4.

(3) If , then output and , else, let ; evaluate , go to 5.

(4) If , then output and , else, let ; evaluate , go to 5.

(5) , go to 2.

Even if , that is when there are zero cost input symbols, the algorithm is still valid.
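The 0.618 method is the standard golden-section search. The sketch below is a generic version for maximizing a single-peak function on an interval, not a line-by-line transcription of Algorithm 8. The test function $\sqrt{\beta - 1}/\beta$ on $[1, 5.8]$ is a hypothetical stand-in for a ratio of a concave increasing numerator to a cost; its maximum is 0.5 at $\beta = 2$.

```python
import math

def golden_section_max(f, a, b, tol=1e-7):
    """Locate the maximizer of a unimodal function f on [a, b]."""
    r = (math.sqrt(5.0) - 1.0) / 2.0         # golden ratio conjugate, about 0.618
    x1 = b - r * (b - a)                     # left trial point
    x2 = a + r * (b - a)                     # right trial point
    f1, f2 = f(x1), f(x2)
    while b - a > tol:
        if f1 < f2:
            # maximum lies in [x1, b]; the old x2 becomes the new left point
            a = x1
            x1, f1 = x2, f2
            x2 = a + r * (b - a)
            f2 = f(x2)
        else:
            # maximum lies in [a, x2]; the old x1 becomes the new right point
            b = x2
            x2, f2 = x1, f1
            x1 = b - r * (b - a)
            f1 = f(x1)
    x = 0.5 * (a + b)
    return x, f(x)

# Hypothetical single-peak ratio: sqrt(beta - 1) / beta on [1, 5.8].
beta_opt, val_opt = golden_section_max(lambda b: math.sqrt(b - 1.0) / b, 1.0, 5.8)
print(beta_opt, val_opt)
```

Each iteration reuses one of the two previous function values, so only one new evaluation is needed per step; this matters when each evaluation is itself a constrained capacity computation.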

##### 3.3. Maximum Entropy

A similar but simpler problem is the computation of maximum entropy. The typical form is as follows: where are known, and can be computed by sample data, . The GIS algorithm [26–28], one of the most common algorithms for (54), is suited only to linear constraints. Although there is a good discussion of the computation in [27], we cannot ensure that GIS is a polynomial-time algorithm.

In [25] a maximum entropy problem with nonlinear constraints is considered. It is as follows: where is a positive definite matrix.

We can solve (54) and (55) in polynomial time. In fact, for (54), we add logarithmic barriers in its objective function, for (55), we add logarithmic barrier and in its objective function, and the problems can be transformed into (3). It is good to know that the function in (56) is self-concordant. Therefore, the path following method is valid.

#### 4. An Example

Let transmission matrix be as follows:

Let . Suppose that the initial distribution is discrete uniform, with penalty factor and initial penalty parameter . Iterating 70 times by the path-following method, we get channel capacity and the optimal distribution:.

Iterating 180 times with the Arimoto-Blahut algorithm gives the same results.

Let be the cost vector of the input symbols. Let be the expected cost of the input symbols. The curve of is shown in Figure 1. By Algorithm 8, the channel capacity per unit cost is 0.5801, and it is attained at . The search interval is [1, 5.8]. The number of iterations of the path-following method is 17. The number of iterations of the 0.618 method is 19. The number of iterations of Newton’s algorithm is 5.

Let be a positive definite quadratic function, where

The curve of is shown in Figure 2. The maximum is 0.2156, and it is attained at . The number of iterations of the path-following method is 17. The number of iterations of the 0.618 method is 19. The number of iterations of Newton’s algorithm is 4.

#### 5. Conclusion

By means of self-concordant function theory, the computation of channel capacity becomes very simple, especially when there are constraints and when the constraints are nonlinear. When the numerator is a general concave function and the denominator is a general convex function, the formula for channel capacity per unit cost is still valid. Furthermore, the function is single-peaked; hence we can get some new algorithms for channel capacity per unit cost.

#### References

- M. Chiang and S. Boyd, “Geometric programming duals of channel capacity and rate distortion,” *IEEE Transactions on Information Theory*, vol. 50, no. 2, pp. 245–258, 2004.
- A. Ben-Tal and M. Teboulle, “Extension of some results for channel capacity using a generalized information measure,” *Applied Mathematics and Optimization*, vol. 17, no. 2, pp. 121–132, 1988.
- X. B. Liang, “An algebraic, analytic, and algorithmic investigation on the capacity and capacity-achieving input probability distributions of finite-input-finite-output discrete memoryless channels,” *IEEE Transactions on Information Theory*, vol. 54, no. 3, pp. 1003–1023, 2008.
- S. Arimoto, “Algorithm for computing the capacity of arbitrary discrete memoryless channels,” *IEEE Transactions on Information Theory*, vol. 18, no. 1, pp. 14–20, 1972.
- R. E. Blahut, “Computation of channel capacity and rate-distortion functions,” *IEEE Transactions on Information Theory*, vol. 18, no. 4, pp. 460–473, 1972.
- F. Dupuis, W. Yu, and F. M. J. Willems, “Blahut-Arimoto algorithms for computing channel capacity and rate-distortion with side information,” in *Proceedings of IEEE International Symposium on Information Theory*, p. 179, July 2004.
- Y. Yu, “Squeezing the Arimoto-Blahut algorithm for faster convergence,” *IEEE Transactions on Information Theory*, vol. 56, no. 7, Article ID 5484972, pp. 3149–3157, 2010.
- I. Csiszar and G. Tusnady, “Information geometry and alternating minimization procedures,” *Statistics and Decisions*, supplement 1, pp. 205–237, 1984.
- U. Niesen, D. Shah, and G. W. Wornell, “Adaptive alternating minimization algorithms,” *IEEE Transactions on Information Theory*, vol. 55, no. 3, pp. 1423–1429, 2009.
- W. Byrne, “Alternating minimization and Boltzmann machine learning,” *IEEE Transactions on Neural Networks*, vol. 3, no. 4, pp. 612–620, 1992.
- Y. Wang, J. F. Yang, W. Yin, and Y. Zhang, “A new alternating minimization algorithm for total variation image reconstruction,” *SIAM Journal on Imaging Sciences*, vol. 1, no. 3, pp. 248–272, 2008.
- R. W. Yeung, *Information Theory and Network Coding*, Springer, Berlin, Germany, 2008.
- A. S. Khayrallah and D. L. Neuhoff, “Coding for channels with cost constraints,” *IEEE Transactions on Information Theory*, vol. 42, no. 3, pp. 854–867, 1996.
- F. Alajaji and N. Whalen, “The capacity-cost function of discrete additive noise channels with and without feedback,” *IEEE Transactions on Information Theory*, vol. 46, no. 3, pp. 1131–1140, 2000.
- S. Verdu, “On channel capacity per unit cost,” *IEEE Transactions on Information Theory*, vol. 36, no. 5, pp. 1019–1030, 1990.
- Y. Nesterov and A. Nemirovskii, *Interior-Point Polynomial Methods in Convex Programming*, Society for Industrial and Applied Mathematics, 1994.
- J. Renegar, *A Mathematical View of Interior-Point Methods in Convex Optimization*, Society for Industrial and Applied Mathematics, 2001.
- N. Karmarkar, “A new polynomial-time algorithm for linear programming,” *Combinatorica*, vol. 4, no. 4, pp. 373–395, 1984.
- S. Boyd and L. Vandenberghe, *Convex Optimization*, Cambridge University Press, Cambridge, UK, 2004.
- L. R. Varshney, “Variations on channel capacity per unit cost,” 2005, http://web.mit.edu/lrv/www/writing/cap_cost.pdf.
- M. Mitzenmacher, “A survey of results for deletion channels and related synchronization channels,” *Probability Surveys*, vol. 6, pp. 1–33, 2009.
- M. Kleiner and B. Rimoldi, “On fidelity per unit cost,” in *IEEE International Symposium on Information Theory (ISIT '09)*, pp. 1639–1643, July 2009.
- K. A. Kopotun, D. Leviatan, and I. A. Shevchuk, “Convex polynomial approximation in the uniform norm: conclusion,” *Canadian Journal of Mathematics*, vol. 57, no. 6, pp. 1224–1248, 2005.
- M. Dudík, S. J. Phillips, and R. E. Schapire, “Maximum entropy density estimation with generalized regularization and an application to species distribution modeling,” *Journal of Machine Learning Research*, vol. 8, pp. 1217–1260, 2007.
- A. K. Bera and S. Y. Park, “Optimal portfolio diversification using the maximum entropy principle,” *Econometric Reviews*, vol. 27, no. 4–6, pp. 484–512, 2008.
- J. Darroch and D. Ratcliff, “Generalized iterative scaling for log-linear models,” *Annals of Mathematical Statistics*, vol. 43, no. 5, pp. 1470–1480, 1972.
- G. Liang, B. Yu, and N. Taft, “Maximum entropy models: convergence rates and applications in dynamic system monitoring,” in *Proceedings of IEEE International Symposium on Information Theory*, pp. 168–175, July 2004.
- K. Nigam, J. Lafferty, and A. McCallum, “Using maximum entropy for text classification,” in *the Workshop on Machine Learning for Information Filtering (IJCAI '99)*, pp. 61–67, 1999.