Table of Contents Author Guidelines Submit a Manuscript
Journal of Electrical and Computer Engineering
Volume 2012 (2012), Article ID 318946, 9 pages
Research Article

Computation of Channel Capacity Based on Self-Concordant Functions

1Search Center for Complex System Science, University of Shanghai for Science and Technology, Shanghai 200093, China
2Shanghai Stock Communication Co., Ltd., Shanghai 200131, China

Received 14 July 2011; Revised 1 November 2011; Accepted 15 November 2011

Academic Editor: Tamal Bose

Copyright © 2012 Da-gang Tian and Yi-qun Huang. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


The computation of channel capacity is a classical issue in information theory. We prove that algorithms based on self-concordant functions can be used to deal with such issues, especially when constrains are included. A new algorithm to compute the channel capacity per unit cost is proposed. The same view is suited to the computation of maximum entropy. All the algorithms are of polynomial time.

1. Introduction

The computation of channel capacity is a classical issue in information theory. Much attention has been paid to it [13]. Despite some progress [3], the Arimoto-Blahut algorithm [47] is still the most used method. This method has been developed into a general algorithm, alternative minimization method [8], and has found applications in various fields [911]. Recently, the computation of channel capacity with constraints, such as the computation of capacity-cost function, is becoming increasingly important [5, 1215]. The Arimoto-Blahut algorithm is very limited in dealing with such issues. Due to lack of appropriate means to eliminate the Lagrange multipliers, the method in [5], which is based on Arimoto-Blahut algorithm, cannot obtain the optimal point.

The computation of channel capacity has always been an optimization problem, but no one optimization algorithm, such as the gradient method and Newton’s method, and so forth, has become the mainstream method. One reason is that so far no one optimization algorithm can be as effective as the Arimoto-Blahut algorithm. However, with advances in optimization theory and with the increasing importance of constrained channel capacity issues, the situation is changing.

In this paper, some new algorithms based on self-concordant function are proposed. Besides the effectiveness—it is of polynomial time—the algorithm can compute constrained channel capacity easily. In Section 2, we introduced the definition of the self-concordant function and the main properties, as well as some algorithms for convex optimization problems. In Section 3, we give some new algorithms for channel capacity with or without constraints and for channel capacity per unit cost. Because it is based on the self-concordant function, these algorithms are of polynomial time. Some conclusions are given in Section 4.

2. Self-Concordant Function and Convex Programming

For convenience, in this section, we make a brief introduction to self-concordant functions, including the definition, main propositions, and algorithms for convex optimization.

2.1. Self-Concordant Function

Self-concordant function theory, published by Nesterov and Nemirovskii in 1994 [16, 17], is of landmark importance for establishing general interior point polynomial-time algorithms for convex programming. The theory reveals the main property of the interior point method for linear programming initiated by Karmarkar in 1984 [18]: self-concordance of the barrier function.

Suppose that 𝑓𝐷𝑅 is three times continuously differentiable and strictly convex, where the domain 𝐷 is an open convex subset in 𝑅𝑛.

Definition 1 (see [19]). Let 𝑛=1. A convex function 𝑓𝑅𝑅 is self-concordant if ||𝑓||(𝑥)2𝑓(𝑥)3/2(1) for all 𝑥𝐷.

Definition 2. A function 𝑓𝑅𝑛𝑅 is self-concordant if it is self-concordant along every line in its domain, that is, if the function 𝑔(𝑡)=𝑓(𝑥+𝑡𝑣) is a self-concordant function of 𝑡 for all 𝑥𝐷 and for all 𝑣𝑅𝑛.
Since linear and convex quadratic functions have zero third derivatives, they are trivial self-concordant functions. The most common nontrivial self-concordant function is the logarithmic function −log 𝑥.
We can get more self-concordant functions due to the following simple and useful combination rules [19].

Proposition 3. If 𝑓1, 𝑓2 are self-concordant on 𝐷 and 𝑎11, 𝑎21, then 𝑎1𝑓1+𝑎2𝑓2 is self-concordant on D.

Proposition 4. If 𝑓 is self-concordant on 𝐷𝑅𝑛, and 𝐴𝑅𝑚𝑅𝑛 is a linear map, 𝑏𝑅𝑛, then 𝑓(𝐴𝑥+𝑏) is self-concordant on {𝑥𝑅𝑚𝐴𝑥+𝑏𝐷}.

Proposition 5. If 𝑓𝑖 is self-concordant on 𝐷𝑖𝑅𝑛𝑖,𝑖=1,2,,𝑚. Let 𝐷=𝐷1××𝐷𝑚. Then, 𝑓(𝑥1,,𝑥𝑚)=𝑓1(𝑥1)++𝑓𝑚(𝑥𝑚) is self-concordant on 𝐷.

2.2. Newton’s Method for Self-Concordant Functions

Consider the convex optimization problemmin𝐷𝑓(𝑥),(2) where 𝐷𝑅𝑛 is a bounded, closed, convex subset with a nonempty interior. Let 𝑓(𝑥) be a convex function.

Damped Newton Algorithm
Given an initial point 𝑥0int(𝐷), with tolerance 𝜀>0.

(1)Compute the Newton step Δ𝑥𝑛𝑡=2𝑓(𝑥)1𝑓(𝑥),(3) and Newton decrement 𝜆(𝑓,𝑥)=𝑓(𝑥)𝑇2𝑓(𝑥)1𝑓(𝑥)1/2.(4)
(2) Stopping criterion. Quit if 𝜆(𝑓,𝑥)2/2𝜀.
(3) If 𝜆(𝑓,𝑥)>1/4, then 1𝑥=𝑥+1+𝜆(𝑓,𝑥)Δ𝑥𝑛𝑡,(5)
else 𝑥=𝑥+Δ𝑥𝑛𝑡.(6)
If 𝑓(𝑥) is a self-concordant function on 𝐷, then the damped Newton algorithm above possesses the following good properties:(i)all 𝑥int(𝐷) if 𝑥0int(𝐷).(ii)if 𝜆(𝑓,𝑥)>0.25, then 𝑓(𝑥𝑖)𝑓(𝑥𝑖+1)0.026.(iii)if at some iteration 𝑖 we have 𝜆(𝑓,𝑥𝑖)0.25, then we are in the region of quadratic convergence of the method, that is, for every 𝑗𝑖, 𝜆𝑓,𝑥𝑗+12𝜆2𝑓,𝑥𝑗12𝜆𝑓,𝑥𝑗,𝑓𝑥𝑗𝑝𝜆2𝑓,𝑥𝑗21𝜆𝑓,𝑥𝑗.(7)(iv)the number of Newton steps required to find a point 𝑥int(𝐷) with 𝜆(𝑓,𝑥𝑖)𝜀<1 is bounded by𝐶𝑓𝑥(0)𝑝+log2log21𝜀(8) for some constant 𝐶.
For the convex optimization problem with equality constraints 𝐴𝑥=𝑏, Newton’s method and the bound (8) are still valid as long as the initial point 𝑥0int(𝐷) satisfies 𝐴𝑥0=𝑏, and the Newton step Δ𝑥𝑛𝑡 and decrement 𝜆(𝑓,𝑥) are computed as follows: 2𝑓(𝑥)𝐴𝑇𝐴0Δ𝑥𝑛𝑡𝑤=0,𝑓(𝑥)𝜆(𝑓,𝑥)=Δ𝑥𝑇𝑛𝑡2𝑓(𝑥)Δ𝑥𝑛𝑡1/2.(9)
For general convex optimization problems are min𝑓0(𝑥),s.t𝑓𝑖(𝑥)0,𝑖=1,,𝑚,𝐴𝑥=𝑏,(10) where 𝑓0,,𝑓𝑚𝑅𝑛𝑅 are convex and twice continuously differentiable, and 𝐴 is a 𝑝×𝑛 matrix with rank (𝐴)=𝑝<𝑛. By means of the logarithmic barrier function, (10) can be formulated approximately as an equality constrained problem as follows: min𝑡𝑓0(𝑥)+𝜙(𝑥),s.t𝐴𝑥=𝑏,(11) where 𝜙(𝑥)=𝑚𝑖=1log𝑓𝑖(𝑥)(12) is called the logarithmic barrier function for (10). Let 𝑥(𝑡) be the optimal point of (11) and let 𝑝 be the optimal value of (10), then 𝑓0(𝑥(𝑡))𝑝𝑚/𝑡. Therefore, we can simply take 𝑡=𝑚/𝜀 and solve the equality constrained problem (11) using Newton’s method to get an 𝜀-solution of (10), and if the objective function of (11) is self-concordant, then the number of Newton’s iterate is bounded by (8). However, this method usually requires a large 𝑡, which may bring about a large 𝑓(𝑥(0))𝑝 and a large number of Newton’s iterate, thus it is rarely used. A commonly used method is the path-following method as follows.

Path-Following Method
Given strictly feasible 𝑥,𝑡=𝑡0>0,𝜇>1, with tolerance 𝜀>0.

(1)Centering step.Compute 𝑥(𝑡) by minimizing 𝑡𝑓0(𝑡)+𝜑, subject to 𝐴𝑥=𝑏, starting at 𝑥.(2)Update. 𝑥=𝑥(𝑡).(3)Stopping criterion. Quit if 𝑚/𝑡<𝜀.(4)Increase 𝑡. 𝑡=𝜇𝑡.

If for any 𝑡>0,𝑓(𝑥)=𝑡𝑓0(𝑥)+𝜑(𝑥) is self-concordant, then the total number of Newton steps in the path-following method, not counting the initial centering step, is𝑡𝑁=log𝑚/0𝜀×log𝜇𝑚(𝜇1log𝜇)+log2log21𝜀.(13)

The term in brackets is the iterate number of 𝑡 from 𝑡0 to 𝑚/𝜀, and the term in parentheses is the iterate number of Newton’s method per centering step.

3. Algorithms Based on Self-Concordant Functions for Channel Capacity

In this section, it is proved that the channel capacity, with or without constrains, can be computed by the path-following method in polynomial time. Meanwhile, we prove that channel capacity per unit cost is a single-peak function of the expected cost.

3.1. Channel Capacity without Constrains

Let 𝑋={𝑥1,,𝑥𝑚} and 𝑌={𝑦1,,𝑦𝑛} be the input and output alphabets, respectively. Let 𝑝(𝑥)=(𝑝(𝑥1),,𝑝(𝑥𝑚)) and 𝑞(𝑦)=(𝑞(𝑦1),,𝑞(𝑦𝑛)) be the distributions of 𝑋 and 𝑌, respectively. Let 𝑇=(𝑝(𝑦𝑗𝑥𝑖))𝑚×𝑛=(𝑝𝑖𝑗)𝑚×𝑛be a transition matrix from 𝑋 to 𝑌. Hence,𝑞𝑦𝑗𝑥=𝑝1𝑝1𝑗𝑥++𝑝𝑚𝑝𝑚𝑗,𝑗=1,,𝑛.(14)

If 𝐼(𝑋;𝑌) is the mutual information between 𝑋 and 𝑌, then𝐼(𝑋;𝑌)=𝑖,𝑗𝑝𝑥𝑖𝑝𝑦𝑗𝑥𝑖𝑝𝑦log𝑗𝑥𝑖𝑞𝑦𝑗(15)=𝑖,𝑗𝑝𝑥𝑖𝑝𝑦𝑗𝑥𝑖𝑝𝑥log𝑖𝑦𝑗𝑝𝑥𝑖.(16)

Channel capacity is defined as follows:𝐶=max𝑝(𝑥)𝐼(𝑋;𝑌).(17) Equation (16) is a function of 𝑝(𝑥) and 𝑝(𝑥|𝑦) and we denote it by 𝐼(𝑝(𝑥);𝑝(𝑥|𝑦)).

The Arimoto-Blahut algorithm utilized (16) as follows. In (16), both 𝑝(𝑥) and 𝑝(𝑥|𝑦) are unknown, but given 𝑝(𝑥), (16) can get its maximum at𝑝𝑥𝑖𝑦𝑗=𝑝𝑥𝑖𝑝𝑦𝑗𝑥𝑖𝑚𝑘=1𝑝𝑥𝑘𝑝𝑦𝑗𝑥𝑘,𝑖=1,,𝑚.𝑗=1,,𝑛,(18) and given 𝑝(𝑥|𝑦), (16) can get its maximum at𝑝𝑥𝑖=𝑗𝑝𝑥𝑖𝑦𝑗𝑝(𝑦𝑗𝑥𝑖)𝑚𝑖=1𝑗𝑝𝑥𝑖𝑦𝑗𝑝(𝑦𝑗𝑥𝑖),𝑖=1,,𝑚.(19) If 𝑝0(𝑥)=(1/𝑚,,1/𝑚), then the channel capacity can be approximated by computing 𝑝0(𝑥|𝑦),𝑝1(𝑥),𝑝1(𝑥|𝑦), through (18) and (19), alternatively. Let 𝐶(𝑁+1,𝑁) be the mutual information 𝐼(𝑝𝑁+1(𝑥);𝑝𝑁(𝑥|𝑦)), then,𝐶𝐶(𝑁+1,𝑁)log𝑚𝑁.(20)

In order to get an algorithm based on the self-concordant function, we utilize (15). Let𝑐𝑖=𝑛𝑗=1𝑝𝑦𝑗𝑥𝑖𝑦log𝑝𝑗𝑥𝑖.(21) By (15), the channel capacity can be expressed as follows:min𝑚𝑖=1𝑐𝑖𝑝𝑖+𝑛𝑗=1𝑞𝑗log𝑞𝑗,s.t𝑞𝑗=𝑚𝑖=1𝑝𝑖𝑝𝑦𝑗𝑥𝑖𝑝,𝑗=1,,𝑛,1++𝑝𝑚𝑝=1,1>0,,𝑝𝑚>0.(22) It is a standard convex optimization problem. Unfortunately, the objective function of (22) is not self-concordant, even when the logarithmic barrier −log 𝑝𝑖 is added.

It seems that we should add 𝑞𝑗>0 as constraints for (22) since there are log𝑞𝑗 in the objective function of (22). But the 𝑞𝑗 are not independent variables. It can be computed by (14) and maintain positive as long as all the 𝑝𝑖>0,𝑖=1,2,,𝑚, hold. Nevertheless, we still add the logarithmic barrier −log 𝑞𝑗 in the objective of (22) to get a self-concordant objective function.

For 𝑡>0, consider the convex optimization problem as follows:min𝑡𝑚𝑖=1𝑐𝑖𝑝𝑖+𝑡𝑛𝑗=1𝑞𝑗log𝑞𝑗𝑚𝑖=1log𝑝𝑖𝑛𝑗=1log𝑞𝑗,s.t𝑞𝑗=𝑚𝑖=1𝑝𝑖𝑝𝑦𝑗𝑥𝑖𝑝,𝑗=1,,𝑛,1++𝑝𝑚=1.(23) In order to show the self-concordance of the objective function of (23), we need Proposition 6.

Proposition 6. For any 𝑡>0,𝑓(𝑥)=𝑡𝑥log𝑥log𝑥 is a self-concordant function on 𝑥>0.

Proof. We have 𝑓1(𝑥)=𝑡log𝑥+𝑡𝑥,𝑓𝑡(𝑥)=𝑥+1𝑥2,||𝑓||=𝑡(𝑥)𝑥2+2𝑥3.(24) Thus ||𝑓||(𝑥)𝑓(𝑥)3/2=(𝑡𝑥+2)(𝑡𝑥+1)3/22.(25)

By Propositions 36, the objective function of (23) is self-concordant, so we can solve it by the path following method, and the number of Newton iterations is bounded by (13).

Equation (13) shows that solving (23) by the path following method is a polynomial time algorithm. In addition, a main advantage is that the algorithm can deal with constrains very easily.

3.2. Channel Capacity with Constrains and Channel Capacity Per Unit Cost

Consider the channel capacity with constraints as follows:max𝐼(𝑋;𝑌),s.t𝑚𝑗=1𝑎𝑖𝑗𝑝𝑗𝐸𝑖𝑝,𝑖=1,,𝑟,1++𝑝𝑚𝑝=1,10,,𝑝𝑚0.(26) For 𝑟=2, [5] gave some algorithms based on the Arimoto-Blahut method. Because one cannot eliminate the Lagrange multiplier, the algorithms in [5] cannot get the optimal solution for (26).

By adding logarithmic barriers for the inequality constraints in the objective of (26) as follows, we can get a polynomial time algorithm for (26):min𝑡𝑚𝑖=1𝑐𝑖𝑝𝑖+𝑡𝑛𝑚𝑗=1𝑖=1𝑝𝑖𝑝𝑦𝑗𝑥𝑖log𝑚𝑖=1𝑝𝑖𝑝𝑦𝑗𝑥𝑖𝑚𝑖=1log𝑝𝑖𝑛𝑗=1log𝑚𝑖=1𝑝𝑖𝑝𝑦𝑗𝑥𝑖𝑟𝑘=1𝐸log𝑘𝑚𝑖=1𝑝𝑖𝑎𝑟𝑖,s.t𝑝1++𝑝𝑚=1.(27) The objective function of (27) is self-concordant, so we can solve (27) by the path-following method.

Channel capacity per unit cost is one of the typical problems of channel capacity with constraints [15, 2022]. By [15], channel capacity per unit cost can be computed by𝐶=sup𝛽>0𝐶(𝛽)𝛽,(28) where 𝐶(𝛽) is the capacity-cost function, which is the solution of the following constrained problem [12]:max𝐼(𝑋;𝑌),s.t𝑚𝑗=1𝑎𝑗𝑝𝑗𝑝𝛽,1++𝑝𝑚𝑝=1,10,,𝑝𝑚0.(29) Equation (29) is a special case of (26) (𝑟=1). Here we discuss the channel capacity per unit cost in a more general manner.

Theorem 7. Let 𝐷𝑅𝑛 be a convex set. Let 𝐹(𝑥) be a concave function on 𝐷 and 𝐺(𝑥) a convex function on 𝐷. Let 𝑎=inf{𝐺(𝑥)|𝑥𝐷},𝑏=sup{𝐺(𝑥)|𝑥𝐷}, and suppose 𝑎0. Let 𝐶(𝛽)=sup{𝐹(𝑥)|𝐺(𝑥)𝛽}. Then,(1)𝐶(𝛽)is a concave increasing function,(2)let 𝜌(𝛽)=𝐶(𝛽)𝛽,(30) then 𝜌(𝛽) is either a unimodal function or a monotone function. sup𝐷𝐹(𝑥)𝐺(𝑥)=sup𝑎<𝛽<𝑏𝐶(𝛽)𝛽.(31)

Proof. (1) Let 𝛽1>𝛽2, then {𝑥|𝐺(𝑥)𝛽2}{𝑥|𝐺(𝑥)𝛽1}. Therefore, 𝐶(𝛽2)𝐶(𝛽1), For any 𝜀>0, there are 𝑥1 such that 𝐺(𝑥1)𝛽1 and 𝐶(𝛽1)=𝐹(𝑥1)+𝜀, and there are 𝑥2 such that 𝐺(𝑥2)𝛽2 and 𝐶(𝛽2)=𝐹(𝑥2)+𝜀. Since 𝐺(𝑥) is convex and 𝐹(𝑥) is concave, we get 𝐺𝛼𝑥1+(1𝛼)𝑥2𝑥𝛼𝐺1+𝑥(1𝛼)𝐺2𝛼𝛽1+(1𝛼)𝛽2,𝛽𝛼𝐶1𝛽+(1𝛼)𝐶2𝑥=𝛼𝐹1𝑥+(1𝛼)𝐹2+𝜀𝐹𝛼𝑥1+(1𝛼)𝑥2+𝜀𝐶𝛼𝛽1+(1𝛼)𝛽2+𝜀.(32) Since 𝜀 is arbitrary, 𝐶(𝛽) is concave.
(2) Without loss of generality, we can assume 𝐶(𝛽) is differentiable. If not, we can approach it by a differentiable function to any precision [23]. Suppose there is a 𝛽0(𝑎,𝑏), such that 𝜌|||(𝛽)𝛽=𝛽0=0,thatis𝐶𝛽0𝛽0𝛽𝐶0=0.(33)
Let 𝛽1>𝛽0; since 𝐶(𝛽) is concave, we have 𝐶(𝛽1)𝐶(𝛽0). Let 𝜌(||𝛽)𝛽=𝛽0=0,thatis𝐶𝛽0𝛽0𝛽𝐶0=0.(34) It is the equation of a straight line that has slope 𝐶(𝛽1) and passes through the (𝛽0,𝐶(𝛽0)). Since 𝐶(𝛽) is increasing and concave, we get 𝑦1(𝛽1)𝐶(𝛽1) and 𝐶𝛽1𝐶𝛽1𝛽1𝐶𝛽1𝑦1𝛽1𝛽1=𝐶𝛽1𝛽0𝛽1𝐶𝛽0𝛽1𝐶𝛽0𝛽0𝛽1𝐶𝛽0𝛽1=0,(35) that is, 𝜌(𝛽1)0.
Let 𝛽2<𝛽0, since 𝐶(𝛽) is concave, 𝐶(𝛽2)𝐶(𝛽0). Let 𝑦0(𝛽)=𝐶𝛽0𝛽𝛽0𝛽+𝐶0.(36) It is the equation of a straight line that has slope 𝐶(𝛽0) and passes through (𝛽0,𝐶(𝛽0)). Since 𝐶(𝛽) is increasing and concave, we get 𝑦0(𝛽2)𝐶(𝛽2) and 𝐶𝛽2𝐶𝛽2𝛽2𝐶𝛽2𝑦0𝛽2𝛽2𝐶𝛽0𝑦0𝛽2𝛽2=𝐶𝛽0𝛽0𝛽2𝐶𝛽0𝛽2=0,(37) that is, 𝜌(𝛽2)0.
Therefore, 𝛽0 is a maximum point of function 𝜌(𝛽)=𝐶(𝛽)/𝛽.
(3) If sup𝐷𝐹(𝑥)𝐺(𝑥)=+,(38) then for any 𝑁>0, there are 𝑥0, such that 𝐹𝑥0𝐺𝑥0>𝑁.(39) Let 𝐺(𝑥0)=𝛽0, then 𝐶(𝛽0)=sup{𝐹(𝑥)|𝐺(𝑥)𝛽0}𝐹(𝑥0), so sup𝑎<𝛽<𝑏𝐶(𝛽)𝛽𝐶𝛽0𝛽0𝐹𝑥0𝐺𝑥0>𝑁.(40) Since 𝑁 is arbitrary, we get sup𝑎<𝛽<𝑏𝐶(𝛽)𝛽=+.(41) In turn, if sup𝑎<𝛽<𝑏𝐶(𝛽)𝛽=+,(42) then for any 𝑁>0, there are 𝛽0, 𝑎<𝛽0<𝑏, such that 𝐶𝛽0𝛽0>𝑁.(43) Since sup𝐷𝐹(𝑥)𝐺(𝑥)sup𝐺(𝑥)𝛽0𝐹(𝑥)𝐺(𝑥)sup𝐺(𝑥)𝛽0𝐹(𝑥)𝛽0=𝐶𝛽0𝛽0>𝑁(44) so sup𝐷𝐹(𝑥)𝐺(𝑥)=+.(45) Therefore, if sup𝐷𝐹(𝑥)𝐺(𝑥)<+(46) is valid, then so is sup𝑎<𝛽<𝑏𝐶(𝛽)𝛽<+(47) and vice versa.
Suppose (46) is valid, for any 𝜀>0, there are 𝑥0𝐷, such that 𝐹𝑥0𝐺𝑥0+𝜀=sup𝐷𝐹(𝑥)𝐺(𝑥).(48) Let 𝐺(𝑥0)=𝛽0, then sup𝐷𝐹(𝑥)=𝐹𝑥𝐺(𝑥)0𝐺𝑥0𝐶𝛽+𝜀0𝛽0+𝜀sup𝑎<𝛽<𝑏𝐶(𝛽)𝛽+𝜀.(49) Since 𝜀 is arbitrary, we get sup𝐷𝐹(𝑥)𝐺(𝑥)sup𝑎<𝛽<𝑏𝐶(𝛽)𝛽.(50)
On the other hand, since (47) is valid too, for any 𝜀>0, there are 𝑎<𝛽0<𝑏, such that 𝐶𝛽0𝛽0+𝜀=sup𝑎<𝛽<𝑏𝐶(𝛽)𝛽.(51) Let 𝑥0𝐷, such that 𝐺(𝑥0)𝛽0 and 𝐹(𝑥0)+𝜀𝛽0=sup{𝐹(𝑥)|𝐺(𝑥)𝛽0}=𝐶(𝛽0), then sup𝐷𝐹(𝑥)𝐺(𝑥)sup𝐺(𝑥)𝛽0𝐹(𝑥)𝐹𝑥𝐺(𝑥)0𝛽0=sup𝑎<𝛽<𝑏𝐶(𝛽)𝛽2𝜀.(52) Since 𝜀 is arbitrary, we get sup𝐷𝐹(𝑥)𝐺(𝑥)sup𝑎<𝛽<𝑏𝐶(𝛽)𝛽.(53)

If 𝐹(𝑥)=𝐼(𝑋,𝑌) is the average mutual information defined by (15) and 𝐺(𝑥)=𝐸[𝑏(𝑋)], then (3) of Theorem 7 is Theorem  2 in [15].

In addition to average mutual information, entropy is another frequently used function in applications [24]. It is obvious that Theorem 7 is valid when 𝐹(𝑥) is an entropy function (as a function of the distribution of input symbols). It is useful to point out that Theorem 7 is still valid when 𝐺(𝑥) is a positive definite quadratic function [25].

Making use of (2) of Theorem 7, we can write an algorithm to compute the sup(𝐹(𝑥)/𝐺(𝑥))as follows. The algorithm is based on the 0.618 method used to locate the optimal point of a single peak function.

Algorithm 8 (0.618 method). Let 𝑎=inf{𝐺(𝑥)|𝑥𝐷},𝑏=sup{𝐺(𝑥)|𝑥𝐷}, and assume 𝑎0.
(1) Given search interval [𝑎,𝑏], and tolerance 𝜀>0. compute 𝜆 and 𝜇 as follows:
𝜆=𝑎+0.382(𝑏𝑎),𝜇=𝑎+0.618(𝑏𝑎); evaluate 𝐶(𝜆)/𝜆 and 𝐶(𝜇)/𝜇. Let 𝑘=1.
(2) If 𝐶(𝜆)/𝜆>𝐶(𝜇)/𝜇, then go to 3, else go to 4.
(3) If 𝑏𝜆𝜀, then output 𝜇 and 𝐶(𝜇)/𝜇, else, let 𝑎=𝜆,𝜆=𝜇,𝐶(𝜆)/𝜆=𝐶(𝜇)/𝜇,𝜇=𝑎+0.618(𝑏𝑎); evaluate 𝐶(𝜇)/𝜇, go to 5.
(4) If 𝜇𝑎𝜀, then output 𝜆 and 𝐶(𝜆)/𝜆, else, let 𝑏=𝜇,𝜇=𝜆,𝐶(𝜇)/𝜇=𝐶(𝜆)/𝜆,𝜆=𝑎+0.382(𝑏𝑎); evaluate 𝐶(𝜆)/𝜆, go to 5.
(5)  𝑘=𝑘+1, go to 2.
Even if 𝑎=0, that is when there are zero cost input symbols, the algorithm is still valid.

3.3. Maximum Entropy

A similar but simpler problem is the computation of maximum entropy. The typical form is as follows:min𝑚𝑖=1𝑝𝑖𝑝log𝑖𝜋𝑖,s.t𝑚𝑗=1𝑎𝑖𝑗𝑝𝑗=𝐸𝑖𝑝,𝑖=1,,𝑟,1++𝑝𝑚𝑝=1,10,,𝑝𝑚0,(54) where 𝜋𝑖 are known, and 𝐸𝑖 can be computed by sample data, 𝑖=1,,𝑟. The GIS algorithm [2628] is one of the most common algorithms for (54), which is only suited to linear constraints. In spite of that there is a good discussion on the computation in [27]; however, we cannot ensure that GIS is a polynomial time algorithm.

In [25] a maximum entropy problem with nonlinear constraints is considered. It is as follows:min𝑚𝑖=1𝑝𝑖𝑝log𝑖𝜋𝑖,s.t𝑚𝑖=1𝑝𝑖𝑎𝑖𝑖,𝑗𝜎𝑖𝑗𝑝𝑖𝑝𝑗𝑝𝑏,1++𝑝𝑚𝑝=1,10,,𝑝𝑚0,(55) where (𝜎𝑖𝑗)𝑚×𝑚 is a positive definite matrix.

We can solve (54) and (55) in polynomial time. In fact, for (54), we add logarithmic barriers log𝑝𝑖 in its objective function, for (55), we add logarithmic barrier log𝑝𝑖 andlog𝑚𝑖=1𝑝𝑖𝑎𝑖𝑖,𝑗𝜎𝑖𝑗𝑝𝑖𝑝𝑗𝑏(56) in its objective function, and the problems can be transformed into (3). It is good to know that the function in (56) is self-concordant. Therefore, the path following method is valid.

4. An Example

Let transmission matrix 𝑃 be as follows: 𝑃=

Let 𝜀=0.00001. Suppose that the initial distribution is discrete uniform, with penalty factor 𝜇=2 and initial penalty parameter 𝑡0=10. Iterating 70 times by the path-following method, we get channel capacity 𝐶=1.370656 and the optimal distribution:(0.0893518,0.2414400,0.1743530,0.1548483,0.0000003,0.0811754,0.1946234,0.0000004,0.0642042,0.0000080).

With the Arimoto-Blahut algorithm, iterate 180 times, we get the same results.

Let 𝑏=(1,2,5,7,6,8,3,9,3,4)𝑇 be cost vector of the input symbols. Let 𝐺(𝑝)=𝑏𝑇𝑝 be the expect cost of the input symbols. The curve of 𝐶(𝛽)/𝛽 is shown in Figure 1; By Algorithm 8, the channel capacity per unit cost is 0.5801, and it is attained at 𝛽=1.8325. The search interval is [1, 5.8]. The number of iterations of the path-following is 17. The number of iterations of the 0.618 method is 19. The number of iterations of Newton’s algorithm is 5.

Figure 1: Curve of 𝐶(𝛽)/𝛽, where 𝐺(𝑥) is the expected cost of the input symbol.

Let 𝐺(𝑥)=𝑏𝑥+𝑥𝐷𝑥 be a positive definite quadratic function, where𝐷=4.27933.63903.32993.15052.59393.29773.17773.01313.22074.71073.63904.45363.68853.66442.25663.77253.39123.16152.16363.73113.32993.68854.84362.88522.96073.99941.95092.65992.76744.19903.15053.66442.88524.10042.27732.89203.84542.61062.67863.30132.59392.25662.96072.27733.63622.27642.04941.38582.49843.08553.29773.77253.99942.89202.27645.41582.92413.21133.05803.08603.17773.39121.95093.84542.04942.92416.02923.86793.23173.45823.01313.16152.65992.61061.38583.21133.86794.92803.42043.52313.22072.16362.76742.67862.49843.05803.23173.42045.18563.65434.71073.73114.19903.30133.08553.08603.45823.52313.65436.5953.(58)The curve of 𝐶(𝛽)/𝛽 is shown in Figure 2. The maximum is 0.2156, and it is attained at 𝛽=5.9093. The number of iterations of the path-following is 17. The number of iterations of the 0.618 method is 19. The number of iterations of Newton’s algorithm is 4.

Figure 2: Curve of 𝐶(𝛽)/𝛽, where 𝐺(𝑥) is a positive definite quadratic function.

5. Conclusion

By means of self-concordant function theory, the computation of channel capacity, especially, when there are constraints and when constraints are nonlinear, becomes very simple. When the numerator is a general concave function and the denominator is a general convex function, the formula about channel capacity per unit cost is still valid. Furthermore, the function 𝐶(𝛽)/𝛽 is single peak, hence we can get some new algorithms for channel capacity per unit cost.


  1. M. Chiang and S. Boyd, “Geometric programming duals of channel capacity and rate distortion,” IEEE Transactions on Information Theory, vol. 50, no. 2, pp. 245–258, 2004. View at Publisher · View at Google Scholar · View at MathSciNet · View at Scopus
  2. A. Ben-Tal and M. Teboulle, “Extension of some results for channel capacity using a generalized information measure,” Applied Mathematics and Optimization, vol. 17, no. 2, pp. 121–132, 1988. View at Google Scholar · View at Scopus
  3. X. B. Liang, “An algebraic, analytic, and algorithmic investigation on the capacity and capacity-achieving input probability distributions of finite-input-finite-output discrete memoryless channels,” IEEE Transactions on Information Theory, vol. 54, no. 3, pp. 1003–1023, 2008. View at Publisher · View at Google Scholar · View at Scopus
  4. S. Arimoto, “Algorithm for computing the capacity of arbitrary discrete memoryless channels,” IEEE Transactions on Information Theory, vol. 18, no. 1, pp. 14–20, 1972. View at Google Scholar · View at Scopus
  5. R. E. Blahut, “Computation of channel capacity and rate- distortion functions,” IEEE Transactions on Information Theory, vol. 18, no. 4, pp. 460–473, 1972. View at Google Scholar · View at Scopus
  6. F. Dupuis, W. Yu, and F. M. J. Willems, “Blahut-Arimoto algorithms for computing channel capacity and rate-distortion with side information,” in Proceedings of IEEE International Symposium on Information Theory, p. 179, July 2004.
  7. Y. Yu, “Squeezing the arimotoblahut algorithm for faster convergence,” IEEE Transactions on Information Theory, vol. 56, no. 7, Article ID 5484972, pp. 3149–3157, 2010. View at Publisher · View at Google Scholar
  8. I. Csiszar and G. Tusnady, “Information geometry and alternating minimization procedures,” Statistics and Decisions, supplement 1, pp. 205–237, 1984. View at Google Scholar
  9. U. Niesen, D. Shah, and G. W. Wornell, “Adaptive alternating minimization algorithms,” IEEE Transactions on Information Theory, vol. 55, no. 3, pp. 1423–1429, 2009. View at Publisher · View at Google Scholar
  10. W. Byrne, “Alternating minimization and Boltzmann machine learning,” IEEE Transactions on Neural Networks, vol. 3, no. 4, pp. 612–620, 1992. View at Publisher · View at Google Scholar · View at PubMed
  11. Y. Wang, J. F. Yang, W. Yin, and Y. Zhang, “A new alternating minimization algorithm for total variation image reconstruction,” SIAM Journal on Imaging Sciences, vol. 1, no. 3, pp. 248–272, 2008. View at Google Scholar
  12. R. W. Yeung, Information Theory and Network Coding, Springer, Berlin, Germany, 2008.
  13. A. S. Khayrallah and D. L. Neuhoff, “Coding for channels with cost constraints,” IEEE Transactions on Information Theory, vol. 42, no. 3, pp. 854–867, 1996. View at Google Scholar
  14. F. Alajaji and N. Whalen, “The capacity-cost function of discrete additive noise channels with and without feedback,” IEEE Transactions on Information Theory, vol. 46, no. 3, pp. 1131–1140, 2000. View at Google Scholar
  15. S. Verdu, “On channel capacity per unit cost,” IEEE Transactions on Information Theory, vol. 36, no. 5, pp. 1019–1030, 1990. View at Publisher · View at Google Scholar · View at MathSciNet
  16. Y. Nesterov and A. Nemirovskii, Interior-Point Polynomial Methods in Convex Programming, Society for Industrial and Applied Mathematics, 1994.
  17. J. Renegar, A Mathematical View of Interior-Point Methods in Convex Optimization, Society for Industrial and Applied Mathematics, 2001.
  18. N. Karmarkar, “A new polynomial-time algorithm for linear programming,” Combinatorica, vol. 4, no. 4, pp. 373–395, 1984. View at Publisher · View at Google Scholar · View at Zentralblatt MATH · View at MathSciNet
  19. S. Boyd and L. Vandenberghe, Convex Optimization, Cambridge University Press, Cambridge, UK, 2004.
  20. R. Lav, “Varshney, Variations On Channel Capacity per Unit Cost,” 2005,
  21. M. Mitzenmacher, “A survey of results for deletion channels and related synchronization channels,” Probability Surveys, vol. 6, pp. 1–33, 2009. View at Google Scholar
  22. M. Kleiner and B. Rimoldi, “On fidelity per unit cost,” in IEEE International Symposium on Information Theory (ISIT '09), pp. 1639–1643, July 2009. View at Publisher · View at Google Scholar
  23. K. A. Kopotun, D. Leviatan, and I. A. Shevchuk, “Convex polynomial approximation in the uniform norm: conclusion,” Canadian Journal of Mathematics, vol. 57, no. 6, pp. 1224–1248, 2005. View at Google Scholar
  24. M. Dudík, S. J. Phillips, and R. E. Schapire, “Maximum entropy density estimation with generalized regularization and an application to species distribution modeling,” Journal of Machine Learning Research, vol. 8, pp. 1217–1260, 2007. View at Google Scholar
  25. A. K. Bera and S. Y. Park, “Optimal portfolio diversification using the maximum entropy principle,” Econometric Reviews, vol. 27, no. 4–6, pp. 484–512, 2008. View at Publisher · View at Google Scholar
  26. J. Darroch and D. Ratcli, “Generlized iterative scaling for log-linear models,” Annals of Mathematical Statistics, vol. 43, no. 5, pp. 1470–1480, 1972. View at Google Scholar
  27. G. Liang, B. Yu, and N. Taft, “Maximum entropy models: convergence rates and applications in dynamic system monitoring,” in Proceedings of IEEE International Symposium on Information Theory, pp. 168–175, July 2004.
  28. K. Nigam, J. Lafferty, and A. Mccallum, “Using maximum entropy for text classification,” in the Workshop on Machine Learning for Information Filtering (IJCAI '99), pp. 61–67, 1999.