Abstract

Although cross-layer has been thought as one of the most effective and efficient ways for multimedia communications over wireless networks and a plethora of research has been done in this area, there is still lacking of a rigorous mathematical model to gain in-depth understanding of cross-layer design tradeoffs, spanning from application layer to physical layer. As a result, many existing cross-layer designs enhance the performance of certain layers at the price of either introducing side effects to the overall system performance or violating the syntax and semantics of the layered network architecture. Therefore, lacking of a rigorous theoretical study makes existing cross-layer designs rely on heuristic approaches which are unable to guarantee sound results efficiently and consistently. In this paper, we attempt to fill this gap and develop a new methodological foundation for cross-layer design in wireless multimedia communications. We first introduce a delay-distortion-driven cross-layer optimization framework which can be solved as a large-scale dynamic programming problem. Then, we present new approximate dynamic programming based on significance measure and sensitivity analysis for high-dimensional nonlinear cross-layer optimization in support of real-time multimedia applications. The major contribution of this paper is to present the first rigorous theoretical modeling for integrated cross-layer control and optimization in wireless multimedia communications, providing design insights into multimedia communications over current wireless networks and throwing light on design optimization of the next-generation wireless multimedia systems and networks.

1. Introduction

In recent years, ubiquitous computing devices such as laptop computers, PDAs, smart phones, automotive computing devices, and wearable computers have been ever growing in popularity and capability, and people have begun more heavily to rely on these ubiquitous computing devices. Therefore, there has been a strong user demand for bringing multimedia streaming to the devices such as iTunes, PPLive, MSN, and YouTube. However, bringing delay-sensitive and loss-tolerant multimedia services based on the current wireless Internet is a very challenging task due to the fact that the original design goal of the Internet is to offer simple delay-insensitive loss-sensitive data services with little QoS consideration. Therefore, this shift of design goal urges us to rethink the current Internet architecture and develop a new design methodology for multimedia communications over the current and future wireless Internet. So far, cross-layer design has been thought as one of the most effective and efficient ways to provide quality of service (QoS) over wireless networks, and it has been receiving many research efforts. The basic idea of cross-layer design is to fully utilize the interactions among design variables (system parameters) residing in different network functional entities (network layers) to achieve the optimal design performance of time-varying wireless networks.

In order to achieve the global optimality of cross-layer design, we need to consider design variables and the interactions among them as much as possible. However, more does not necessarily mean better. The more design variables we consider, the more difficult is orchestrating a large number of design variables to make them work harmonically and synergetically. From the point of view of nonlinear optimization, the number of design variables increases and the size of state space of the objective function will increase exponentially, making the optimization problem unmanageable. To overcome this problem, one often used approach is to reduce the size of the problem at the system modeling phase and then solve the simplified problem by using various optimization algorithms such as gradient-based local search, linear/nonlinear programming, genetic algorithm, exhaustive search, and heuristic-based approach like artificial neural networks.

However, reducing a high-dimensional cross-layer optimization problem to a low-dimensional problem in the system modeling phase raises a series of questions:(1)how to evaluate the fidelity of the simplified problem compared with the problem as what it should be,(2)how to evaluate the quality of the suboptimal solution to the global optimum,(3)how to evaluate the robustness of the solution, that is, whether the solution can guarantee the predictable sound results at all possible circumstances. Unfortunately, at the time of this writing, we have no clear answers to all these three questions.

Moreover, reducing the size of the problem in the problem formulation means that only part of the current Internet architecture can be considered, causing a shift of the design goal of multimedia services from the best user experience to some layer-specific performance metrics such as distortion at the application layer, delay at the network layer, and goodput at the MAC/PHY layer. This shift of design goal may cause an “Ellsberg paradox,” where each individual design variable makes good decisions for maximizing the objective function. But the overall outcome violates the expected utility function. In other words, breaking a big problem into several smaller problems in the system modeling phase can only increase the solvability of the original problem but cannot guarantee that it is a good solution. The “Ellsberg paradox” also tells us that the traditional additive measure such as probability measure may no longer hold in the context of cross-layer design due to the possible strong coupling (interdependency) among design variables. At the point of this writing, there have been many researches done on interdependency modeling in the context of cross-layer design, but they are mostly qualitative rather than quantitative approaches, and their applications are still within the scope of local cross-layer optimization.

We argue that all aforementioned difficulties in the area of cross-layer design of wireless multimedia communications are due to lacking of methodological foundation and in-depth understanding of cross-layer behavior. Our goal is to provide a flexible yet scalable theoretical cross-layer framework to accommodate all major design variables of interest, spanning from application layer to physical layer, for delay-bounded multimedia communications over wireless single/multihop networks. We start from proposing an integrated cross-layer framework for the best user experience. Although the engineering side of cross-layer design is not the main focus of this paper, we still briefly discuss how to utilize the methodological foundation to achieve real-time multimedia communications through a fast algorithm for large-scale global cross-layer optimization based on quantitative significance measure and sensitivity analysis.

The rest of the paper is organized as follows. We briefly introduce the related work in Section 2. In Section 3, we present a unified theoretical cross-layer framework for wireless multimedia communications based on link adaptation, rate-distortion theory, and dynamic programming. A further discussion of how to apply the proposed methodological foundation for real-time applications is made in Section 4, where new feature-based approximate dynamic programming is introduced, followed by the conclusion in Section 5.

In literature, topics involving video delivery over multihop networks such as video coding, multihop routing, QoS provisioning, link adaptation are separately studied. Therefore, the corresponding video compression efficiency and the transmission efficiency are also separately optimized. In prediction mode, selection of video coding, periodic intracoding of whole frames [1], continuous blocks [2], or random blocks [3] has been firstly proposed. These methods apply intracoding uniformly to all the regions of the frame. Then, “content-adaptive" methods are proposed to apply frequent intra-update to regions that undergo significant changes [4], or where a rough estimate of decoder error exceeds a given threshold [5, 6]. A significant advance over the above early heuristic mode switching strategies is the rate-distortion (RD) optimized mode selection. The RD optimized mode selection is achieved by choosing a mode that minimizes the quantization distortion between the original frame/macroblock and the reconstructed one under a given bit budget [7, 8]. However, the encoders in [7, 8] have no capability to accurately estimate the overall distortion. So, the selected prediction mode is not necessarily optimal. The work in [9] proposes an algorithm to optimally estimate the overall distortion of decoder frame reconstruction due to quantization, error propagation, and error concealment. The accurate estimate is integrated into a rate-distortion-based framework for optimal switching between intracoding and intercoding modes per macroblock. However, the joint optimization between mode selection and video transmission parameters under wireless environment is not addressed in [9]. The work in [10] presents an end-to-end approach to solve the fundamental problem of RD optimized mode selection over packet-switched networks, but it only aims at Internet peer-to-peer video communication.

In routing for video delivery in multihop networks, an application-centric cross-layer approach has been proposed to formulate an optimal routing problem for multiple description video communications [11]. Physical and MAC layer dynamics of wireless links are translated into network layer parameters. The application layer performance, that is, average video distortion, is considered as the function of network layer performance metrics, for example, bandwidth, loss, and path correlation. But the routing metric, that is, average video distortion, is roughly computed from a simple rate-distortion model without discussion on selection of source coding parameters. The same problem goes to [12] and [13] even though optimal paths are selected optimizing the quality under various constraints. In addition, in [13] exhaustive algorithm is adopted for the determination of the cross-layer optimized mesh-network path selection, which may incur heavy computational load and make it unpractical for real applications.

Cross-layer optimized wireless video has been studied from different aspects, such as cross-layer architecture [14, 15], content analysis [1618], video compression and RD optimization [24, 6, 7, 8, 9, 10, 19, 20, 21], source packetization [22, 23], QoS provisioning [2426], application-centric routing [1113, 27], queueing and scheduling [2831], energy efficiency [32, 33], and link adaptation [34, 35]. To reach a global optimality at the level of frame or video sequence rather than at the level of packet, we need to evaluate the overall distortion and the effect of packet pipelining in a network on the total delay of a frame or a video sequence. To the best of our knowledge, although some works focus on the cross-layer design for video delivery over multihop wireless networks, there is still no substantial work that can reach such kinds of global optimality.

In wireless video, optimization has to be done over multiple source coding units, such as frames and pixel blocks, for the best reconstructed video quality. There is “only one exact method for solving problems of optimization over time; in the general case of nonlinearities with random disturbance, it is dynamic programming (DP)" [36]. However, the biggest challenge of applying dynamic programming in practical large-scale problems is curse of dimensionality [37], where the size of state space normally increases exponentially with the number of control variables increasing. Therefore, the most sensible way is to map a huge state space to a much smaller feature space (), which is called approximate dynamic programming (ADP), also known as neurodynamic programming, adaptive dynamic programming, adaptive critics, or reinforced learning, depending on in which discipline the technique is used [36, 38, 39].

Existing ADP approaches have largely ignored the interdependencies among control variables, which might lead to loose approximation error bounds. Nonadditive measure theory was developed to characterize the interactions among control variables [4043], and it has been widely used in various areas. Choquet integral [44] is regarded as the most effective and efficient way to calculate nonadditive measure and has received a significant amount of research [4548]. Since nonadditive measure is defined on the power set, fast algorithms [49] have been studied to speed up the calculation process. However, current research on nonadditive measure still focuses on static linear systems with commensurable data [50].

3. A Theoretical Cross-Layer Framework for Wireless Multimedia Communications

3.1. Problem Statement

In the protocol stack of multimedia over wireless networks, each layer has one or multiple key system parameters which would significantly impact the overall system performance. At the application layer, tradeoff between rate and distortion is an inherent feature of every lossy compression scheme for video source coding. Prediction mode and quantization level are two critical parameters. At the network layer, routing algorithm is important to find the best delivery path over a single/multihop wireless network. At the data link layer, hybrid automatic repeat request (HARQ), media access control protocols, and packetization are often used to maintain a low packet loss rate. However, the choice of maximum retransmission number is a tradeoff between resultant packet delay and packet loss rate. Note that for real-time multimedia applications, we might not consider HARQ due to strict delay constraints. At the physical layer, adaptive modulation and coding scheme is an important tradeoff between transmission rate and packet loss rate. Furthermore, the end-to-end performance is not completely determined by the parameters of individual layer, but rather by all parameters of all layers. For example, the end-to-end delay consists of propagation delay (determined by the number of hops of the selected path), transmission delay (determined by channel conditions, modulation and channel coding, maximum retransmission number, and source rate), and queueing delay (determined by source rate, transmission rate, and the selected path). Moreover, due to the time-varying nature of wireless channels, each node in the network should be capable of adjusting these parameters quickly to maintain a good instantaneous performance. Clearly, the layer-separated design no longer guarantees an optimal end-to-end performance for multimedia delivery over wireless networks.

3.2. Methodology

We develop a cross-layer framework to optimize multimedia communications over single/multihop wireless networks. In order to demonstrate the main idea of the proposed framework as shown in Figure 1, at the application layer, we implement our framework based on the ITU-T H.264 standard. The rate-distortion tradeoff in video source coding makes it very critical to select suitable video coding parameters such as prediction mode (PM) and quantization parameter (QP). Without losing generality, we consider a multihop wireless network scenario in which all nodes can act as either a source or destination as well as a router for other nodes. To carry out end-to-end delay-bounded multimedia communications, at the network layer, we assume that certain routing protocols are used to come up with the routing table. Then, a quality-aware routing algorithm needs to be developed to select the best multihop path from the source to the destination. Each hop adopts adaptive modulation and coding (AMC) at the physical layer to overcome the adverse effects caused by the time-varying channel condition.

Let us denote by the number of frames of a video clip, , and let be the macroblocks of frame . Since each frame is processed in units of macroblock (corresponding to 16*16 pixels in the original frame), let denote the coding parameter vector of macroblock in frame as quantization parameter (QP) and prediction mode choice (I or P frame). Let denote the consumed bits in coding the macroblock with the coding parameter vector ; then the total bits consumed by the frame can be expressed as .

We assume that the considered multihop network consists of nodes . For any two nodes and , if can directly communicate with , we say that there exists a hop between and . Let denote the hop between the node and the node . Considering the time-varying nature of the network, let , denote all the connectivity information within the network when transmitting the frame . Accurate can be obtained from certain routing protocols such as OLSR routing protocol. Let be a path for transmitting frame from the source node to the destination node. Clearly, there exist and . Let us denote and as the channel SNR and transmission rate of the link with , and as the modulation mode and associated channel coding rate, and as the number of retransmissions. Then, the delay in transmitting the frame on the link can be written as . Clearly, the total delay in transmitting the whole video clip can be expressed byLet denote the reconstructed th frame at the receiver side. Using the mean square error as distortion metric, the overall expected distortion for the whole video clip is

Note that in this work, of can be calculated by any distortion estimation method such as the mean square error (MSE) estimation method and the recursive optimal per-pixel estimate (ROPE) method. Likewise, any error concealment schemes can be used at the receiver side to further enhance the perceivable video quality. Since the formulation discussed above considers consecutive video frames, the spatial-temporal correlation among frames and macroblocks has been taken into account in the global optimization framework.

Thus, the proposed cross-layer framework for wireless multimedia communications can be formulated as where is a predefined delay budget for delivering the given video clip.

Recall that the focus of the proposed framework is to jointly find the optimal parameter set for each frame , including the source coding , the delivery path , the maximum number of retransmissions , and the modulation with the associate coding . Here, is the index of each hop on the path . Clearly, the optimal solution for the problem described by (3) can be written as with the delay constraint

Clearly, in (4) we assume that the decoder side has a sufficient size buffer to hold part of the decoded video frames, say, a group of pictures. Given the dramatically fast growing silicon performance and the decreasing size and cost for the memory and silicon, the assumption is reasonable for most scenarios. But when the size of decoder buffer is constrained, (4) would be rewritten as follows: with the delay constraintwhere represents each of the frames, which has a delay constraint. Clearly, (6) does add difficulties on top of (4), although a number of constraints are included to eliminate some valid solutions for the original problem.

Note that the unique feature of (4) is that it is essentially a convex function, which has been shown in large amount of research done on rate-distortion relationship under the context of multimedia processing and transmissions. In other words, there always exists a global optimality of this formulation. This is a very important conclusion, since other existing global cross-layer optimization frameworks focusing on network QoS or using decomposition approach cannot guarantee the convexity of all decomposed subproblems. For a given multimedia application, the global optimization problem described in (4) turns into a constrained nonlinear optimization problem, which can be solved by Lagrangian multipliers (LMs) and Lagrangian relaxation (LR) [51]. So, we can use the derived Lagrangian cost function as the unified cost function. In this work, the cost function is the average distortion over the given video clip .

For the global optimality of system performance, we need to optimize current control action over time ; in other words, current control action needs to be chosen with considerations of future cost . For example, the end user will evaluate the perceivable video quality based on the overall quality of the whole video clip rather than the quality of each individual video frame. Therefore, the cost function for optimization over time based on (4) isHere, is the state at time , and the value is introduced to capture the future cost (i.e., ) at time incurred as a result of taking the control action at time .

So far, there is only one exact method for global optimization over time with nonlinearities and random disturbances [36], which is dynamic programming (DP). DP provides methods for choosing a value function to derive an optimal policy . There has been a plenty of research on how to use DP-based algorithms for multimedia processing and transmission. In order to use DP to find the global optimality of (4), a unified cost-to-go function has to be constructed:

Then, the global optimization problem turns into calculating the cost-to-go function , which is the overall cost to be incurred in the finite horizon of steps.

3.3. Numerical Results

We have evaluated the performance of the proposed integrated cross-layer framework through extensive simulations based on H.264 JM12.2 codec. In general, we are interested in comparing our integrated cross-layer design with the best possible results of H.264 codec. Our goal here is to illustrate the difference of performance gain between the global optimality achieved by the proposed framework and the superposition of multiple local optimality done separately at different network layer (s). In this paper, the best baseline performance is derived: (1) at the application layer, it uses the rate control scheme of H.264 codec; (2) at the network layer, it always chooses the path with the best average SNR at each hop; (3) at the MAC and PHY layers, it always chooses the AMC scheme for the shortest delay while keeping the predefined PER performance.

From the simulation results, up to 3 dB PSNR gain can be achieved by using the proposed approach compared with using the existing piecemeal approach, as shown in Figure 2.

Remark 1. We have proposed a top-down theoretical cross-layer framework for multimedia over wireless networks, and the correctness of the proposed methodology is based on its rigorous theoretical foundation. Moreover, the proposed methodology is based on dynamic programming, which means that it is very flexible and scalable; any interaction of interest in the system can be easily integrated into the proposed framework. Since we consider all the major interactions of interest spanning from application layer to physical layer, we have overcome the major drawback of existing cross-layer designs where the simplification occurs at the system modeling phase rather than the problem solving phase. Therefore, the proposed methodology provides the true global optimality and a new design guidance to the cross-layer design for multimedia over various wireless networks.

4. Further Discussion

In this section, we will further discuss how to apply the aforementioned global optimization framework for real-time multimedia communications as formulated in (4). This is not only practically important but also theoretically interesting.

4.1. Problem Statement

So far, we have presented a new theoretical framework for cross-layer design of multimedia communications over wireless networks, which provides a sound methodological foundation for us to evaluate cross-layer designs using dynamic programming (DP) which has been widely adopted to study sequential decision-making problems (stochastic control). However, the practical applications of dynamic programming are limited mostly due to the dual curses of dimensionality and uncertainty, that is, the large size of underlying state space of the cost-to-go function which is a function of the current state for evaluating the expected future cost to be incurred. The “curse of dimensionality" means that the computational complexity of the cross-layer design can be increased exponentially when the number of considered design variables increases. The “curse of uncertainty" (modeling) indicates the fact that in a complex networking system there exist various uncertainties making it very difficult to know the explicit system model and/or states. Generally speaking, uncertainties can be classified into two categories: measurement uncertainties and model uncertainties. Under the context of cross-layer design, measurement uncertainties are mainly caused by randomness in data collecting process such as inaccurate channel feedback, while model uncertainties are mainly caused by various approximations made in system modeling process such as approximations made on channel quality, traffic load, node mobility, number of users, and user behaviors. For cross-layer design, uncertainties existing in interdependency among design variables may cause severe performance degradation. Therefore, the “dual curses” make cross-layer optimization a very challenging problem.

4.2. Methodology
4.2.1. Feature-Based Approximate Dynamic Programming

The most sensible and rational way to deal with the difficulty caused by “dual curses” is to generate a compact parametric representation (compact representation, for brevity) to approximate the cost-to-go function for a significant complexity reduction through mapping the huge state space to a much smaller feature space characterized by a compact representation.

Currently, the selection of a compact representation largely relies on heuristics which somewhat contradicts the nonheuristic aspects of the dynamic programming methodology. Therefore, we propose a new method based on nonadditive measure theory, which can dynamically generate compact representations of the huge state space. Unlike other nonlinear feature-extraction approaches such as artificial neural network, the proposed method is adaptive and nonheuristic in the sense that it allows us to quantitatively characterize the significance or the desirability of state vectors with considerations of interactions among different state variables. Therefore, new feature-based approximate dynamic programming can be developed based on the adaptive feature extraction and compact representation.

We consider a large-scale dynamic programming problem defined on a finite state space . Let denote the cardinality of ; thus we have , and , where is the number of control actions for the parameter . Our goal is to quantitatively characterize the significance effect of parameters on the cost-to-go function .

4.2.2. Feature-Based Compact Representation

In the context of dynamic programming, the cost-to-go vector is defined as a vector whose components are the cost-to-go values of various states. The cost-to-go function specifies the mapping from states to cost-to-go values. Therefore, the optimal cost-to-go vector of policy with initial state is defined byand the policy at state is defined by The dynamic programming problem is to seek the optimal policy to achieve

In large-scale dynamic programming problems, the size of state space normally increases exponentially with the number of state variables, making it extremely difficult to compute and store each component of the cost-to-go function. Therefore, the most sensible way is to map a huge state space to a much smaller feature space (). Formally, a compact representation can be described as a scheme for recording a high-dimensional cost-to-go vector using a lower-dimensional parameter vector . So, if we can obtain an approximation of to , we may still generate a near optimal control policy but with significant computational acceleration satisfying

In the context of approximate dynamic programming, we would like to see that when approaches , is getting close to . Therefore, a compact representation can be described as a mapping of to associated with a cost-to-go vector. Each component of of the mapping is the th component of a cost-to-go vector represented by the parameter vector .

Formally, a feature is defined as a function from the state space into a finite set of feature values. In stochastic multistage decision processes, we might need several features, , forming a feature vector for each state . The feature vector indicates the desirability or significance of the associated state . Therefore, for a feature-based compact representation, the component of can be written as .

For approximate dynamic programming using feature-based compact representation, the approximate cost-to-go function is where is defined as an approximation architecture with , meaning that will only cover the most significant finite region of . In order to achieve the best quality of approximation, it would be highly desirable to have effective and efficient parameter-selection and feature-extraction algorithms. Unfortunately, the existing feature-extraction and parameter-selection algorithms are mainly based on heuristics such as Q-learning and neural network, but those methods lack for sound engineering judgement.

4.2.3. Feature Extraction and Parameter Selection Based on Significance Measure

Feature extraction requires us to catch the “dominant nonlinearities" in the optimal cost-to-go function . Then, based on the extracted features, the parameter vectors can be determined and so can be .

In our preliminary study [53], a new method for feature extraction, called significance measure, has been proposed based on nonadditive measure theory [40]. The unique feature of significance measure is that the nonlinear interactions among state variables on the cost-to-go function can be quantitatively measured by solving a generalized nonlinear Choquet integral. As shown in our preliminary study [53], the feature-based approximation can be expressed aswhere is state variable, is the cost-to-go function, and is observation of state variable. The impact of interactions among state variables on the cost-to-go function is described by a set function defined on the power set of state variables satisfying the condition of vanishing at the empty set, that is, with . The set function is called nonadditive measure [40]. There has been a lot of research done to find the optimal by solving the nonlinear integral equation such as Choquet integral [48, 54] based on a set of observation data. An advantage of the proposed significance measure method described above is that it only needs system operation data (simulations), which can be easily acquired from the device drivers. Therefore, it is fairly efficient in terms of computation and storage. Significant measure and sensitivity analysis.

Once, we determine the significance measure of state variables corresponding to different parameter sets. Then, the parameter set with the largest can be directly used for parameter selection. Furthermore, the value of each parameter set can be interpreted as feature, since it reflects the parameter significance towards the cost-to-go function. We can choose the parameter set having the largest value of to be the compact representation of the high-dimensional cost-to-go vector . Therefore, various approximate dynamic programming approaches using feature-based compact representation can utilize the new method for compact representation, feature extraction, and parameter selection. For example, if we adopt feature-based look-up table approximate dynamic programming architecture, the approximated cost-to-go function is , or we can use if using linear approximate architecture.

4.3. Numerical Results

As discussed earlier, based on the significance measure and sensitivity analysis, we can derive a new method for feature extraction and compact representation for approximating the original large-scale dynamic programming. Using the same problem setting as of Figure 1, a simple example to illustrate the basic idea of the proposed approach is devised. First, an operational data set in the format of [QP, Path, AMC, Value of cost-to-go function] has been collected by uniformly sampling the dynamic programming state space. Then, the significance measure algorithm, as presented in [53, 55] was applied to the collected data. The derived significance measure of control variables and their interdependencies can be derived as shown in Tabl @ IV-B3, where columns 1–3 represent significance measure of control variables, where (column 1) indicates the significant impact of each subset of control variables () (column 2) on the cost-to-go function (column 3) based on the collected measurements. The original three-dimension () DP problem can be approximated by a two-dimension () ADP problem. Columns 4-9 represent MSE distortion of DP versus ADP versus the best baseline under different frame delay budgets (T), where three ADP values are corresponding to adopt different fixed paths (, , or ) in the approximation.

In this simulation, based on the significance measure, the interaction between QP and AMC has the most significant impact on the cost-to-go function, meaning that “path" is not as significant as the other variables. So, it could be excluded from the optimal search. This way, the cardinality of the approximated state space can be reduced by three times. Compared with the global optimal performance, the maximum approximation error caused by excluding path from the DP search is , corresponding to the shortest delay budget; however, in this case, the result of ADP-based solution still outperforms the best baseline H.264 performance by .

Remark 2. In this section, we propose a new method for feature extraction and compact representation of approximate dynamic programming, which is based on the significance measure of each set of design variables. We discuss a novel feature-based approximate dynamic programming approach for solving the large-scale dynamic programming problem in support of real-time multimedia applications. Furthermore, since all the significant measures of a power set of design variables are available, a scalable complexity framework by exploring the tradeoff between the quality of approximation (QoA) and the quality of service (QoS) could be developed in future. Note that the proposed significance measure method and the feature-based approximate dynamic programming approach are fairly generic and are applicable for any large-scale design optimization and real-time control scenarios.

5. Conclusion

The major challenges of current cross-layer design for multimedia communications over wireless networks are (1) lacking of understanding of cross-layer behaviors, (2) simplifying cross-layer design at the system modeling phase, and (3) relying on heuristic approaches. We argue that all these challenges are caused by lacking of a new methodology for cross-layer design of multimedia communications over wireless networks. This has motivated us to propose a new methodological foundation for cross-layer design of multimedia communications over wireless networks, which has made two major contributions to the research area: (1) the theoretical framework with major design variables spanning from application layer to physical layer for cross-layer design of multimedia communications over wireless networks, and (2) the novel feature-based approximate dynamic programming approach based on a new significance measure method to understand cross-layer behaviors and speed up large-scale cross-layer optimization. The proposed methodological foundation is fairly general and can be applicable to other applications in multimedia communications. However, we are not trying to solve all the problems in this paper; rather, we are trying to look into this challenging problem from a different angle and open up a new research direction for future studies in the field of wireless multimedia communications. We believe that the proposed methodological foundation will significantly contribute to the emerging research areas such as service- and application-oriented QoS provisioning in the future Internet.