Abstract

One of the most important challenges for next generation all-IP networks is the convergence and interaction of wireless and wired networks in a smooth and efficient manner. This challenge will need to be faced if broadcast transmission networks are to converge with IP infrastructure. The 2nd generation of DVB standards supports the Generic Stream, allowing the direct transmission of IP-based content using the Generic Stream Encapsulation (GSE), in addition to the native Transport Stream (TS). However, the current signalling framework is based on MPEG-2 Tables that rely upon the TS. This paper examines the feasibility of providing a GSE signalling framework, eliminating the need for the TS. The requirements and potential benefits of this new approach are described. The paper reviews prospective methods that may be suitable for network discovery and selection and analyses different options for the transport and syntax of this signalling metadata. It is anticipated that the design of a GSE-only signalling system will enable DVB networks to function as a part of the Internet.

1. Introduction

The first generation of DVB standards [1–3] uses a time-division transmission multiplexing method derived directly from the Moving Picture Experts Group-2 Transport Stream (MPEG-2 TS) standards [4]. The MPEG-2 specifications define the Program Specific Information (PSI), a Table-based signalling system that is multiplexed with the content and allows a receiver to identify MPEG-2 Programs and to demultiplex their Program Elements from the TS. These Tables are segmented into Sections and directly encapsulated into MPEG-2 TS packets, as shown in Figure 1. The Digital Video Broadcasting (DVB) project specified additional types of Table, DVB-Service Information (SI) [5], while the Advanced Television Systems Committee (ATSC) also defined a set of Tables for the US market [6].

Current signalling metadata relies on this TS packet format [4]. The 2nd generation of DVB systems, DVB-S2/C2/T2 [7–9], preserved this signalling framework utilising MPEG-2 encoded Tables. Although some transmission systems use IP-based Service Discovery and Selection (SD&S) procedures to obtain content metadata (for example, acquisition of an Electronic Service Guide (ESG)), the network signalling necessary for the initial bootstrapping is still sent using MPEG-2 encoded Tables, for example, in DVB-Handheld (DVB-H) and DVB-Satellite to Handheld (DVB-SH) systems [10, 11].

SD&S is a generic term that has been used to describe various discovery and selection procedures for, mainly, IP-based content metadata. The term Network Discovery and Selection (ND&S) is defined in this paper to describe the discovery and selection of network signalling metadata such as the acquisition of PSI/SI.

Figure 2 illustrates the process of ND&S and SD&S for an example transmission system. ND&S procedures are separated into two logical parts: network discovery starts by acquiring the network bootstrap information from a well-known link stream, followed by selection of the required network service.

First, when a multiplex has been identified at a receiver, the receiver will need to perform a bootstrap of the signalling system (network bootstrapping). The same transmission multiplex may carry bootstrap information for more than one network, if the multiplex supports multiple logical networks. Bootstrap information could also relate to other network services transmitted over other multiplexes, possibly using different transmission technologies. Once the bootstrap has been performed, the receiver has the basic information required to discover the signalling stream, that is, the logical flow of signalling information relating to a network service from which it wishes to receive content. The receiver then needs to set a filter that extracts the appropriate signalling from the selected multiplex, based on the required network service. The extracted signalling can then be used to perform address/service resolution, identifying the required elementary streams or an IP stream that can be used to locate content.

In an all-IP system, the content may be directly accessed, or may be accessed via a content guide such as an ESG. The content guide may be discovered from content bootstrap information provided in a well-known IP stream. More than one network may reference the same content stream as in the case of duplicate multicast content. Similarly, more than one content guide may be active within a network service and the content bootstrap can then be used to select the appropriate content guide. DVB-H and DVB-SH systems follow this two-stage procedure once the PSI/SI information has been extracted.

Current broadcast transmission networks using the TS format can provide platforms for high-speed unidirectional IP transmission, not just for TV-based services. A convergent IP-oriented architecture will ease integration of transmission systems and enable development of multi-network service delivery platforms. The benefits of a DVB IP-based signalling architecture are discussed in Section 3.

The remainder of the paper is organised as follows. A brief description of the current DVB signalling is given in Section 2. GSE suitability, the envisaged IP/GSE signalling framework, and its potential benefits are discussed in Section 3. The requirements of the GSE-only signalling architecture are identified in Section 4. The different areas that comprise the GSE-only signalling system are then analysed, and methods that may address them are discussed, in Section 5. Finally, conclusions and future work are stated in Section 6.

2. Current Signalling Framework

In current DVB systems, key signalling information is sent in the Program Association Table (PAT) of the PSI, using its well-known 13-bit Packet Identifier (PID) value in the TS packet header. This allows a receiver to readily extract this PID from a received TS multiplex. Figure 3 shows a schematic diagram of this PID acquisition procedure.

In many cases, equipment has hardware support to filter PID values, initially set to well-known values defined by the MPEG standard, that is, the fixed PIDs of the PAT, the Conditional Access Table (CAT), and the Transport Stream Description Table (TSDT). Once the PAT has been received and the respective PID of the Network Information Table (NIT) has been extracted (step 1 in Figure 3), the receiver filters this PID, accesses the NIT and re-tunes, if necessary (step 2). Next, the terminal can access the appropriate PAT from where the PIDs of the MPEG-2 Programs’ Program Map Table (PMT) can be found (step 3). The receiver can then set up filters to receive other PIDs to acquire a full set of relevant signalling information. The receiver can acquire Audio/Video (A/V) Program Elements through their PIDs, which are also advertised in the PMT (step 4). The PID of the Forward Link Signalling (FLS) [12] is also advertised in the PMT.
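
The PID filtering step described above can be sketched in software as follows. This is a minimal illustration rather than a receiver implementation: it assumes 188-byte TS packets with the standard 0x47 sync byte and the 13-bit PID spanning the second and third header bytes. The helper `make_packet` is hypothetical and exists only to build test input.

```python
SYNC_BYTE = 0x47
TS_PACKET_SIZE = 188

# Fixed PIDs defined by MPEG-2 Systems for the PAT, CAT and TSDT.
PID_PAT, PID_CAT, PID_TSDT = 0x0000, 0x0001, 0x0002


def ts_pid(packet: bytes) -> int:
    """Return the 13-bit PID carried in a TS packet header."""
    if len(packet) != TS_PACKET_SIZE or packet[0] != SYNC_BYTE:
        raise ValueError("not a valid 188-byte TS packet")
    # Low 5 bits of byte 1 are the PID's upper bits; byte 2 is the lower 8.
    return ((packet[1] & 0x1F) << 8) | packet[2]


def filter_pids(packets, wanted):
    """Keep only the packets whose PID is in the set `wanted`."""
    return [p for p in packets if ts_pid(p) in wanted]


def make_packet(pid: int) -> bytes:
    """Hypothetical helper: build an empty TS packet with the given PID."""
    header = bytes([SYNC_BYTE, (pid >> 8) & 0x1F, pid & 0xFF, 0x10])
    return header + bytes(TS_PACKET_SIZE - len(header))


# A receiver starts by filtering only the well-known PIDs.
stream = [make_packet(0x0100), make_packet(PID_PAT), make_packet(0x1FFF)]
bootstrap = filter_pids(stream, {PID_PAT, PID_CAT, PID_TSDT})
```

After the PAT has been parsed, the same filter would be re-armed with the PIDs of the NIT and PMT that the PAT advertises.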

As shown in Figure 1, alongside these signalling Tables directly encapsulated into TS packets, A/V and data services are adapted to the TS using adaptation protocols such as Packetised Elementary Stream (PES) and Multiprotocol Encapsulation (MPE). If required, data can be placed directly in TS packets using the Unidirectional Lightweight Encapsulation (ULE) [13] protocol.

3. An All-IP Second Generation Transmission Network

The 2nd generation of DVB standards [7–9] foresees the possibility of converged IP-based transmission that supports both broadcast applications and broadband access service by adopting a common IP-based infrastructure. This converged network would bridge the gap between broadcast transmission and traditional networks.

To support a converged approach, the 2nd generation of DVB transmission standards introduced the Generic Stream (GS) in addition to the TS. The GS may be used to carry packets of different sizes, eliminating the TS packet format. The GS is primarily expected to be used for network services, where IP packets and other network-layer protocols can be efficiently encapsulated using the Generic Stream Encapsulation (GSE) protocol [14, 15].

GSE provides a network-oriented adaptation layer. Each network layer Protocol Data Unit (PDU) is prefixed by a GSE header, which is shown in Figure 4. GSE supports flexible fragmentation, adapting the encapsulated data to a range of possible physical-layer frame sizes. GSE offers a higher encapsulation efficiency (2%–5% better than the TS counterpart when padding is used for data packets [14]). In addition, GSE is extensible, which allows implementing additional features through its extension headers [16], for example, security, header compression, and timestamps. The base header, present in every encapsulated packet, is 4 Bytes. The additional fields, present only in some packets, are shown shadowed in Figure 4.

While GSE defines the adaptation needed to support data transmission, there is no current specification for a signalling system that could replace the MPEG-2 TS signalling by a system using IP over GSE.

A transition to an IP-based content and signalling will enable common use of IP delivery techniques at the receiver, presenting new opportunities for integrating broadcast content with standard IP applications, and the introduction of value-added services. An IP-based transmission network design also enables the use of data networks (e.g., using wired/wireless Ethernet or mobile platforms) for onward delivery to the TV receiver. An IP-based approach allows reuse of existing techniques and protocol machinery (for configuration, management, accounting, encryption, authentication, etc.). This can support evolution of the services and be used to manage the network and monitor performance.

IP-based transmission products are already available for TV contribution networks and digital satellite news gathering. For example, IP satellite news gathering can significantly benefit from the improved efficiency of DVB-S2 while also utilising standards-based IP-based media codecs.

Broadcast transmission can supplement existing wireless infrastructure where sufficient capacity is not available, provide a resilient alternate path, or be used to roll out new services. Broadcast networks are especially suited to services that can exploit cost-efficient wide-area delivery using IP multicast.

This paper proposes a framework based on the reference model shown in Figure 5, where we replace TS-L2 (in Figure 1) by the adaptation layer, GS-L2. The signalling metadata is placed at the application layer level, GS-L5, while IP at GS-L3 allows convergence with the Internet. ND&S procedures refer to an IP signalling system associating IP addresses and services with a stream and a specific transmission multiplex. The MPEG-2 TS format is also included to support legacy services.

One simple solution is to encapsulate TS packets in GSE through its TS-Concat extension header [16]. This format allows one or more TS packets to be sent within one GSE packet by combining the group of TS packets with a 4B GSE base header (Figure 4). For a single TS packet this additional overhead is about 2% (4B added to a 188B TS packet), and it decreases as more TS packets are concatenated.

Encapsulating the current TS-packed Tables into GSE packets could be an attractive transition method while both TS and GS multiplexes are in use. However, it is likely to constrain the evolution towards an all-IP network and it does not provide an efficient way to transmit PSI/SI Tables. The total overhead would consist of GSE and TS packet headers, and the TS packet padding. For example, if a 30B Table were sent in one TS Packet, the overhead comprised by the TS headers (5B), padding (153B), and the GSE base header (4B) would be 162B, if a Label field (Figure 4) is not used. The impact of overheads on the system efficiency should be considered (preliminary overhead analysis of different encapsulation methods is provided in Section 5.5.2).
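
The arithmetic of this example can be checked with a short sketch. It assumes, as in the text, a 188B TS packet, 5B of TS headers (taken here to be the 4B packet header plus a 1B pointer field, which is an interpretation), a 4B GSE base header and no Label field. The function name is illustrative only.

```python
TS_PACKET_SIZE = 188   # bytes in one TS packet
TS_HEADERS = 5         # assumed: 4B TS header + 1B pointer field
GSE_BASE_HEADER = 4    # GSE base header, no Label field


def ts_in_gse_overhead(table_bytes: int) -> dict:
    """Overhead of sending one Table in a single TS packet inside GSE."""
    capacity = TS_PACKET_SIZE - TS_HEADERS
    if not 0 < table_bytes <= capacity:
        raise ValueError("Table must fit in a single TS packet")
    padding = capacity - table_bytes
    total = TS_HEADERS + padding + GSE_BASE_HEADER
    return {
        "ts_headers": TS_HEADERS,
        "padding": padding,
        "gse_header": GSE_BASE_HEADER,
        "total": total,
    }


result = ts_in_gse_overhead(30)
# result["padding"] is 153 and result["total"] is 162, matching the text.
```

For the 30B Table, 162B of overhead are sent to deliver 30B of signalling, which illustrates why this encapsulation is unattractive for small Tables.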

However, this is only a partial solution. If signalling were transported using GSE packets instead of TS packets, there would not be a direct equivalent to the PID filters used for TS, that is, GSE does not contain a PID field. Thus, the receiver will need to identify which physical-layer frames or GSE packets carry the required network signalling information. Potential procedures to recognise GSE packets conveying signalling are proposed and discussed in this paper.

4. GSE-Only Framework Requirements

This section derives a set of requirements for transition to a GSE-only signalling framework.

4.1. IP Interoperability

The signalling system needs to support the IP protocol stack, as in the envisaged system depicted in Figure 5. It must be able to coexist with and provide metadata for IP-based protocols such as the Real-time Transport Protocol (RTP) [17] or the File Delivery over Unidirectional Transport (FLUTE) [18]. As DVB networks become an integral part of the Internet, the use of IP network signalling will allow all-IP delivery of services such as IPTV. Importantly, other supporting functions (including network management and related content) may utilise well-known IP-based tools, which may potentially reduce the cost of development and operation.

4.2. Coexistence with MPEG-2 TS Services

During transition, there is a need to allow the exchange of TS signalling information over a GS transmission network. Various options exist that may enable this transport, including the transmission of MPEG-2 Sections over UDP/IP using GSE encapsulation, or a direct mapping between MPEG-2 SI/PSI and GSE, for example, using the GSE TS-Concat extension [16]. In considering the need for coexistence, the cost of translation and the additional cost (if any) of transmission must be analysed.

4.3. Efficiency Similar to or Higher than Current TS Signalling

The overhead arising from the protocols used in the envisaged IP-based signalling framework of Figure 5 must be mitigated. Although the signalling traffic for typical MPEG-2 TS PSI/SI use-cases typically contributes a small fraction of the total available bandwidth, the performance of the system needs to be evaluated and compared to the efficiency achieved by current MPEG-2 TS SI/PSI systems. Methods must be examined to reduce the additional transmission overhead, such as header compression (e.g., techniques based on Robust Header Compression (ROHC) [19, 20]) or the use of link mechanisms, such as the GSE PDU-Concat extension [16]. This extension allows several IP packets to be delivered to the same destination (GS-L2 address) using a single GSE packet, up to the maximum GSE payload length of 64000B. For example, if ten IP packets were sent in a single GSE packet, this would save 35% of the GS-L2 overhead.
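
The 35% figure can be reproduced with a simple byte count. The sketch below is one accounting that yields this value, under stated assumptions that should be checked against [16]: no Label field is used, each separately sent PDU costs one 4B base header, and a concatenated packet costs one 4B base header plus a 2B Type field for the concatenated PDUs plus a 2B length field per PDU. The function names are illustrative.

```python
BASE_HEADER = 4      # GSE base header without Label (bytes)
CONCAT_TYPE = 2      # assumed: Type of the concatenated PDUs (bytes)
PER_PDU_LENGTH = 2   # assumed: length field preceding each PDU (bytes)


def separate_overhead(n_pdus: int) -> int:
    """GS-L2 overhead when every PDU is sent in its own GSE packet."""
    return n_pdus * BASE_HEADER


def concat_overhead(n_pdus: int) -> int:
    """GS-L2 overhead when all PDUs share one PDU-Concat GSE packet."""
    return BASE_HEADER + CONCAT_TYPE + n_pdus * PER_PDU_LENGTH


def saving(n_pdus: int) -> float:
    """Fraction of GS-L2 overhead saved by concatenation."""
    return 1 - concat_overhead(n_pdus) / separate_overhead(n_pdus)


# Ten IP packets: 40B when sent separately, 26B when concatenated.
ten_packet_saving = saving(10)
```

Under these assumptions concatenation only pays off for three or more PDUs, and the saving approaches 50% as the number of PDUs grows.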

4.4. Signalling Security

When desired, signalling may be secured in an all-IP solution. The security requirements can be different for discovery functions (where all receivers may initially need access during bootstrap), and for individual signalling streams (which may be authorised to specific groups of users). Security of the signalling stream may be provided using a GSE security extension [16]. Alternatively, or in combination, the signalling information may be directly protected by authentication and encryption of the metadata.

4.5. Enabling Service Discovery and Service Description Metadata

The new signalling system should enable a receiver to perform a “network scan” to discover the network and content, equivalent to the current PAT functionality. That is, it would allow a receiver to determine which networks and what content is available by decoding the GS without a priori information. The network discovery methods should identify the multiplex and resolve to a Network Point of Attachment/Medium Access Control (NPA/MAC) address at the GSE level. Supporting a “network scan” will place requirements on the repetition rates of the network signalling stream.

4.6. Providing Easy Identification of Signalling in GSE Streams

A receiver must quickly and efficiently identify the GSE packets carrying network signalling information within the GS. This is needed to provide fast service acquisition and may help in changing to a different service (e.g., to provide fast acquisition of signalling information when zapping between channels). The chosen mechanisms also need to ensure this procedure is not processing intensive at the receiver.

4.7. Quality of Service (QoS) and Timing Reconstruction

The delivery requirements for network signalling need to be considered. It is assumed that packet loss due to link corruption may be disregarded, since in most cases the physical-layer waveform will provide a quasi-error-free service using a combination of physical-layer parameters and Forward Error Correction (FEC) coding (e.g., a certain ModCod in DVB-S2). The repetition of signalling also improves robustness and allows fast PSI/SI bootstrap acquisition. The description syntax should allow easy inclusion of QoS descriptors for the network service. A/V timing needs to be synchronised, requiring mechanisms equivalent to the Program Clock Reference/Network Clock Reference (PCR/NCR), for example, using RTP timestamps. The GSE timestamp extension header [16] does not provide the required resolution for synchronisation, since it was designed to support functions with less stringent timing accuracy, such as monitoring and management operations.

4.8. Extensible Syntax

The network signalling metadata syntax should provide a “user-friendly” description to facilitate modification, extension and/or enhancement of the signalling to support new formats and methods from a network/content provider. It should also enable easy addition of new signalling schemes that may be needed to support new applications and new services (requiring new descriptors or Tables).

4.9. Separation of Network and Content Signalling

Network and content signalling should be organised and sent independently from each other, so that a receiver can acquire network signalling faster than that of its content counterpart. This can also permit a receiver to acquire appropriate signalling without the need to parse the entire GS. That is, the identification of GSE packets carrying network signalling should not involve the filtering of all frames at levels GS-L1 or GS-L2. In addition, the method to achieve this separation should be applicable to any DVB standard, so that network signalling can be sent with the same technique over any DVB physical frame, making it bearer-agnostic.

4.10. Requirements for ND&S and/or SD&S

Together these requirements may be used to derive a new signalling framework. Requirements 4.2, 4.3, 4.6 and 4.9 involve network discovery procedures, while requirement 4.1 includes network selection and SD&S techniques. For a better understanding, Table 1 identifies requirements applicable to network discovery, network selection and service discovery and selection.

5. GSE-Only Signalling Framework

This section analyses methods to provide GSE signalling identification and ND&S procedures. The signalling transport protocol and signalling syntax are also studied to identify which may be suitable for a GSE-only signalling framework and may meet the requirements stated in the previous section. Some methods are already used for IP-based signalling of content metadata; however, all current DVB systems use network signalling based on MPEG-2 encoded Tables.

5.1. GSE Signalling Identification

Since there are no PIDs in a GSE-only signalling architecture, the first step towards this framework is to provide ND&S by filtering of signalling information at the GS-L1 or GS-L2 layers, to identify which GSE packets convey signalling information. Procedures for identification of packets carrying signalling metadata are needed to minimise receiver processing. Appropriate techniques can also assist in meeting the requirements for separation of network and content signalling.

A range of techniques is available, as presented in Table 2 and described in detail below. This includes use of fields in the frame header and the allocation of protocol codepoints. Some of these procedures may be jointly used, for example, methods 5.1.4 and 5.1.5. The methods are organised by increasing amounts of information that would need to be parsed by a receiver joining the network. The final solution should preserve flexibility to use different higher layer protocols, introduce security when required, and provide flexibility to optimise the overhead (e.g., use of header compression).

5.1.1. Assignment of a Dedicated Transmission Stream

It is possible to reserve entire transmission frames at the physical-layer for use by a separate signalling stream. This stream could be identified by a physical-layer identifier, for example, a well-known Input Stream Identifier (ISI) value in DVB-S2/T2. A receiver performing a bootstrap may skip all frames with a different ISI, reducing the receiver information processing load. However, this method could reduce overall system efficiency when the frame size is large. Receivers need to be set up to process more than one ISI; this approach is being tested in some present systems.

5.1.2. Assignment of Fields in the Physical Frame Header

Rather than dedicate a specific channel to signalling, the control information in the physical-layer header may be extended to carry network signalling information. This approach resembles the use of the Fast Information Channel (FIC) in ATSC Mobile Digital Television systems [21]. The FIC provides a network bootstrap method that is specified outside the normal frame payload, and hence is independent of the data channel carrying Reed-Solomon (RS) FEC frames, shown in Figure 6. Its data unit is the FIC-Chunk, which provides the binding information between the Mobile/Handheld (M/H) services and the M/H ensembles. An M/H ensemble is a set of consecutive RS frames with the same FEC coding. Information such as the ensemble ID, the Tables carried by the ensemble, the number of services carried and the service ID is carried by the FIC-Chunk. This approach is an optimisation of the physical-layer, which may be processed independently of the content. This enables fast tuning and simpler processing at the receiver.

The DVB physical-layer specifications do not provide an equivalent physical-layer signalling channel, although the DVB-S2/T2 frame headers currently have unallocated bits. A single bit in these frames could signal if a GSE packet conveying signalling is present in the frame, otherwise a receiver seeking signalling may ignore the frame. Additional bits could be used, if appropriate and available, to help define the type of signalling, for example, bootstrap or network services signalling. This would require an update to the present DVB transmission standards.

5.1.3. Alignment of Signalling Transmission to a Time-Slicing Frame

Time-slicing is a well-known method used for power-saving in DVB-H and DVB-SH systems. This technique could be applied to signalling to allow a receiver to know which frames may contain signalling information and to skip processing of frames that are known not to contain signalling PDUs. Time-slicing information (i.e., prior knowledge of times when signalling data is to be sent) would allow a synchronised receiver to disregard a proportion of physical-layer frames. Such an approach may be desirable for mobile applications, and could be extended to all signalling messages in any new system.

5.1.4. Placement of a GSE Packet at a Known Position in a Frame

Transmission frames are typically long compared to the PDUs (signalling or data) that they carry. Processing of a frame that may contain one or more signalling PDUs could be simplified if the signalling information was inserted at a known position within a frame. The flexibility in the fragmentation algorithm of GSE would allow signalling packets to always be placed at the start of the frame payload. Although a receiver would need to inspect all frames, it may then skip any remaining payload after finding the first GSE packet in the S2/T2 frame that does not contain signalling information. This method does not require any change to the present physical-layer or GSE standards.
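
The receiver behaviour implied by this method can be sketched as follows. The `GsePacket` class and its `is_signalling` flag are hypothetical stand-ins for whatever identification is used at GS-L2 (for example, a dedicated Type value as in Section 5.1.5); the sketch only illustrates the early exit from a frame.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class GsePacket:
    is_signalling: bool  # hypothetical flag, e.g. derived from the Type field
    payload: bytes


def signalling_from_frame(packets: Iterable[GsePacket]) -> List[GsePacket]:
    """Collect signalling packets placed at the start of a frame payload.

    Scanning stops at the first packet that does not carry signalling,
    so the remainder of the frame is never parsed.
    """
    found = []
    for pkt in packets:
        if not pkt.is_signalling:
            break
        found.append(pkt)
    return found


frame = [
    GsePacket(True, b"bootstrap"),
    GsePacket(True, b"network-service"),
    GsePacket(False, b"ip-data"),
    GsePacket(False, b"more-ip-data"),
]
leading = signalling_from_frame(frame)
```

A frame that starts with a data packet is abandoned after inspecting a single GSE header, which is where the processing saving comes from.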

5.1.5. Allocation of a Dedicated GSE Type Field Value

A receiver needs a simple way to demultiplex GSE signalling packets from data packets. One option is to use the GSE Type field. This may be performed in two ways:

(a) Assign a Well-Known Mandatory Type Field [13, 14].
A mandatory Type field directly precedes the GSE PDU (Figure 4). One mandatory Type value is required for each format of signalling information: for example, separate values would be needed to signal whether IPv4 or IPv6 is used, or whether header compression is applied. GSE-level encryption would prevent visibility of a mandatory Type field prior to decryption.

(b) Assign a Well-Known Optional Type Field.
An optional Type field [13] is a separate tag inserted after the GSE base header. In this case, the tag would be used to indicate that the encapsulated PDU carries signalling data. The original Type field would also be present (indicating the version of IP, use of encryption, etc.), so operation would resemble the Router Alert option in the IP header. This is the simplest method, but will add 2B of overhead to the GSE header.

The Internet Assigned Numbers Authority (IANA) assigns Mandatory and Optional Type values. The Institute of Electrical and Electronics Engineers (IEEE) also registers EtherTypes that can be used as mandatory Type values.
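
The difference between options (a) and (b) can be illustrated with a small sketch. No codepoints have been assigned for signalling, so the signalling-related members of `GseType` below are hypothetical and are represented symbolically rather than by numeric values. A packet is modelled only by the chain of Type fields a receiver encounters, ending with the Type that identifies the PDU.

```python
from enum import Enum, auto
from typing import Sequence


class GseType(Enum):
    IPV4 = auto()
    IPV6 = auto()
    # Hypothetical values, option (a): one mandatory Type per format.
    SIG_IPV4 = auto()
    SIG_IPV6 = auto()
    # Hypothetical value, option (b): a single optional signalling tag.
    SIG_TAG = auto()


MANDATORY_SIGNALLING = {GseType.SIG_IPV4, GseType.SIG_IPV6}


def is_signalling_mandatory(type_chain: Sequence[GseType]) -> bool:
    """Option (a): the final (mandatory) Type itself marks signalling."""
    return type_chain[-1] in MANDATORY_SIGNALLING


def is_signalling_optional(type_chain: Sequence[GseType]) -> bool:
    """Option (b): a tag precedes the unchanged final Type."""
    return GseType.SIG_TAG in type_chain[:-1]


option_a = is_signalling_mandatory([GseType.SIG_IPV6])
option_b = is_signalling_optional([GseType.SIG_TAG, GseType.IPV6])
```

Option (a) needs a new value for every PDU format, whereas option (b) needs a single tag and leaves the original Type untouched, at the cost of the extra 2B noted above.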

5.1.6. Allocation of a Dedicated Label/NPA or IP Address

The demultiplexing of signalling from data packets may be aided by using well-known values of other protocol fields. Two methods have been identified at the GSE and IP levels that may assist in this process:

(a) Assignment of a Well-Known Multicast NPA Address.
It is attractive to use well-known L2 addresses for bootstrapping, for example, an IANA DVB multicast IP address that maps to a MAC/NPA address [22], but this has limited use for discovery. After bootstrap, a receiver may move to one of several network services, and it may be natural to assign a different address to each service. If this method were used to identify signalling, this would prevent suppression of the NPA address/label field in GSE. This suggests there can be no single address binding that applies to all scenarios.

(b) Assignment of a Well-Known IP Address.
This method would allow suppression of the NPA/MAC address but requires an IP packet format. Filtering using the IP address is not recommended, since it would preclude the use of link header compression or encryption of the GSE packet payload (since all packets would have to be decompressed and/or decrypted before filtering). The system would also increase complexity when other GSE extensions are present (e.g., timestamps). Well-known IP multicast destination addresses are used in many IP bootstrap procedures, and when present, these would normally result in a mapping to well-known MAC addresses [22].

5.1.7. Assignment of a Well-Known UDP Port

This method requires an IP packet format, as in Section 5.1.6, and deep packet inspection (i.e., parsing of the IP and transport headers). It is not compatible with header compression or with other extension headers. This method is not recommended, since the receiver would have to process all packets at the GS and IP levels before it could filter those conveying signalling at the transport layer.

5.2. Network Discovery and Selection

This section proposes a two-stage approach that could be used for ND&S, in common with other IP-based systems that provide content discovery and selection. Once the GSE packets carrying signalling metadata are filtered at the GS-L1 or GS-L2 layers, a bootstrap will be performed to select the appropriate network signalling information. The network signalling information can then be used to select the required network service. The procedures below are based on IP, satisfying the requirement for IP interoperability when enabling service discovery.

A bootstrap method eliminates the need to manually enter a bootstrap entry point, for example, the need to configure IP/NPA addresses out of band or using device configuration. Instead the device only has to be configured with the logical name for the network to which it is attached.

The format of network bootstrap information could be a Table structure that maps logical names to appropriate discovery entry points, that is, IP addresses where the discovery information can be found. Such a Table may be equivalent to the IP/MAC Notification Table (INT) used by DVB-H systems to signal the availability and location of IP streams. Another format could use a multicast Domain Name System Service (mDNS SRV) record [23] to specify the network service discovery entry points, similar to the procedure recommended for DVB-IPTV [24]. SRV records convey information about the service, such as the transport protocol used, its priority and the IP address of the server providing the service.
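
A Table of this kind can be pictured as a simple mapping from logical network names to entry points. The sketch below is purely illustrative: the network names, the `EntryPoint` fields and the port number are hypothetical, and the multicast addresses are taken from the range reserved for documentation.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class EntryPoint:
    address: str   # IP address where the discovery information is sent
    port: int      # hypothetical transport port
    protocol: str  # transport protocol used, e.g. "udp"


# Hypothetical bootstrap Table: logical network name -> entry point.
BOOTSTRAP_TABLE: Dict[str, EntryPoint] = {
    "example-network-a": EntryPoint("233.252.0.1", 5000, "udp"),
    "example-network-b": EntryPoint("233.252.0.2", 5000, "udp"),
}


def resolve_entry_point(network_name: str) -> EntryPoint:
    """Return the discovery entry point for a configured logical name."""
    try:
        return BOOTSTRAP_TABLE[network_name]
    except KeyError:
        raise LookupError(f"no bootstrap entry for {network_name!r}") from None


entry = resolve_entry_point("example-network-a")
```

The receiver only needs to be configured with the logical name; every address it subsequently uses is learned from the bootstrap.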

For broadcast networks, the bootstrap could be sent using a well-known IP multicast address. This approach is similar to that for DVB service discovery (dvbservdsc) information, that is, dvbservdsc information is provided, by default, on the IANA-registered well-known dvbservdsc multicast address of 224.0.23.14 for IPv4 and FF0X:0:0:0:0:0:0:12D for IPv6, and on the IANA-registered well-known dvbservdsc port 3937 via TCP and UDP [25].
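
As an illustration of how a receiver's IP stack might listen for such a bootstrap, the following sketch joins the IPv4 dvbservdsc group quoted above using the standard socket API. Reusing this address and port for network bootstrap is an assumption made for the example, not something the text specifies, and the function names are illustrative.

```python
import socket

DVBSERVDSC_GROUP_V4 = "224.0.23.14"  # IANA-registered dvbservdsc group
DVBSERVDSC_PORT = 3937               # IANA-registered dvbservdsc port


def membership_request(group: str, interface: str = "0.0.0.0") -> bytes:
    """Build the ip_mreq structure used with IP_ADD_MEMBERSHIP."""
    return socket.inet_aton(group) + socket.inet_aton(interface)


def open_bootstrap_socket(group: str = DVBSERVDSC_GROUP_V4,
                          port: int = DVBSERVDSC_PORT) -> socket.socket:
    """Open a UDP socket and join the multicast group on any interface."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    membership_request(group))
    return sock


mreq = membership_request(DVBSERVDSC_GROUP_V4)
```

A receiver would call `open_bootstrap_socket()` and then read datagrams with `recvfrom`; no return channel is needed, which suits unidirectional broadcast.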

For bidirectional networks, ND&S entry point addresses may be found through the following three options: the Simple Service Discovery Protocol (SSDP) over UDP, SRV records via DNS over UDP, or SRV records via DHCP option 15 over UDP. SSDP, defined by Microsoft and Hewlett-Packard, is specified as the Universal Plug and Play (UPnP) discovery protocol [26]. It uses part of the header field format of HTTP/1.1. Since it is only partially based on HTTP/1.1, it is carried by UDP instead of TCP. A drawback is that SSDP is a proprietary standard. SRV records via DNS or via DHCP option 15 are SD&S procedures (for content metadata) recommended by DVB-IPTV [24] and also used by the Open IPTV Forum (OIPF) framework [27] as well as for the signalling of DVB interactive applications [28]. The methods described for bidirectional networks are not suited to unidirectional broadcast since they rely on the existence of a return channel. A unidirectional solution applicable to both scenarios, broadcast and interactive, is desirable.

5.3. Signalling Transport Protocol

Selection of a transport protocol for the signalling metadata needs to take into consideration the requirements (efficiency similar to that of TS signalling) and characteristics (high repetition rates) of the metadata.

For unicast scenarios with bidirectional connectivity, HTTP over TCP is a commonly chosen method for content metadata transport; it is used by the DVB-IPTV, DVB-H, DVB-SH and OIPF architectures.

A/V data is transmitted over RTP via UDP/IP in DVB-IPTV and DVB-H systems. RTP with an extension header [17] carrying timestamps could be used for synchronisation, as the equivalent to PCR/NCR. Signalling metadata could potentially be sent in a newly defined payload format for RTP. RTP can open up a set of media-related services, such as source identification, packet loss measurement, jitter control, and reliability techniques. The extension header of RTP may also provide a means of performing discovery, although it would also add an overhead of 12 or 16B per Section.

The DVB SD&S Transport Protocol (DVBSTP) [24] over UDP has been specified for reliable multicast SD&S content metadata delivery in architectures compliant with DVB-IPTV [24] and OIPF [27]. It transports eXtensible Markup Language (XML) [29] records and defines the type of payload carried through its Payload ID field (e.g., Content on Demand, Broadcast discovery information). A Compression field indicates the type of compression encoding, if any. A DVBSTP header adds an overhead of at least 12B per Section. The redundancy for network signalling may not be needed when Tables are transmitted at high repetition rates. Since the DVBSTP header provides signalling identification through its Payload ID field, it would allow the receiver to determine whether a signalling stream contains replicated metadata that has already been received or metadata that the receiver does not wish to receive.

The FLUTE [18] protocol has been used for content guide transport over UDP in DVB-H, DVB-SH and ATSC Mobile DTV [21]. FLUTE builds on the Asynchronous Layered Coding (ALC) specification to provide scalable, unidirectional, multicast distribution of objects. ALC/FLUTE was also recommended for the design of a new transport protocol for the delivery of Internet Media Guides (IMGs) by the IETF Multiparty Multimedia Session Control (MMUSIC) group, when seeking to provide a format for content metadata over the Internet [30].

Since the requirements for transport of network signalling metadata differ from those for content metadata, the transport protocols listed above may not be suitable. For example, ALC/FLUTE offers support for FEC-based reliability, although this may increase processing overhead and is not required when data is repeated frequently. It also increases transmission cost. DVBSTP adds an overhead of at least 12B and provides reliability (also not required). DVBSTP does provide an indication of the type of XML record carried through its 1B Payload ID field and the type of compression used through a 3-bit Compression field. These features, together with the ability to determine whether content (a Table) is encrypted before processing the payload, are attractive for a transport protocol. Further work is needed to determine whether the overhead is justified and whether this choice of transport can be efficiently combined with the metadata encoding to optimise overall performance, or whether a new lightweight protocol is preferable.
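The header-based filtering discussed above can be sketched as follows. This is an illustrative example only: it assumes a fixed 12B header in which a 1B Payload ID is followed by segment identification and version fields, and a 3-bit Compression field sits in the last 32-bit word. Only the 12B minimum size, the 1B Payload ID and the 3-bit Compression field are taken from the text; the remaining field positions and all names are assumptions that should be checked against [24].

```python
import struct
from typing import NamedTuple, Set, Tuple

HEADER_LEN = 12  # minimum DVBSTP header size quoted in the text


class StpHeader(NamedTuple):
    payload_id: int           # type of XML record carried (1B)
    segment_id: int           # assumed 16-bit segment identifier
    segment_version: int      # assumed 8-bit version
    section_number: int       # assumed 12-bit field
    last_section_number: int  # assumed 12-bit field
    compression: int          # 3-bit compression indication


def parse_header(datagram: bytes) -> StpHeader:
    """Parse the assumed fixed header from the start of a UDP payload."""
    if len(datagram) < HEADER_LEN:
        raise ValueError("datagram shorter than the fixed header")
    _w0, payload_id, segment_id, segment_version, w2 = struct.unpack(
        "!IBHBI", datagram[:HEADER_LEN]
    )
    return StpHeader(
        payload_id,
        segment_id,
        segment_version,
        section_number=w2 >> 20,
        last_section_number=(w2 >> 8) & 0xFFF,
        compression=(w2 >> 5) & 0x7,
    )


def should_process(
    header: StpHeader,
    wanted_payload_ids: Set[int],
    seen: Set[Tuple[int, int, int, int]],
) -> bool:
    """Return True only for wanted Sections that have not been received yet."""
    if header.payload_id not in wanted_payload_ids:
        return False  # metadata the receiver does not wish to receive
    key = (header.payload_id, header.segment_id,
           header.segment_version, header.section_number)
    if key in seen:
        return False  # replicated Section from a previous repetition cycle
    seen.add(key)
    return True
```

With Tables repeated at high rates, most datagrams would be rejected by `should_process` after inspecting 12B, before any decompression or XML parsing takes place.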

5.4. Signalling Syntax

This section reviews a set of candidate methods for representing the metadata. It discusses existing SI/PSI, the Session Description Protocol (SDP) [31] and SDP with negotiation capabilities (SDPng) [32], and finally use of XML [29].

5.4.1. Direct Encapsulation of PSI/SI

The PSI, SI and FLS syntax has been standardised in MPEG-2 [4, 5, 12] and was outlined in Section 2. Even though this Table-based format is expected to continue for backwards compatibility, a transition to a more flexible syntax is desirable to allow extensibility and evolution of signalling. Any new method should support MPEG-2 PSI/SI to satisfy the requirement for coexistence.

5.4.2. SDP and SDPng

The IETF MMUSIC group standardised SDP [31] for multimedia session description over IP. SDP defines a format for session description to announce sessions and their parameters to prospective receivers; it does not specify a transport protocol. In bidirectional networks, SDP is commonly transported using the Session Initiation Protocol (SIP), specified in RFC 3261, or the Real Time Streaming Protocol (RTSP), specified in RFC 2326. In a multicast IP network, RFC 2974 specifies how SDP may be transported over the Session Announcement Protocol (SAP) using a set of well-known multicast addresses.

The ESG in DVB-H, the OIPF framework and multicast sessions in ATSC use SDP records. Even though SDP is an IP-level method, it does not provide link-specific information to identify a network service or physical-layer tuning parameters for the transmission multiplex (e.g., frequency, transmission mode, and ISI). Hence, it would need to be extended to be suitable for network signalling.
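The limitation noted above can be seen in a minimal SDP record. The sketch below parses an illustrative session description (the addresses, port and session name are invented for the example) and extracts the IP-level parameters it carries. No field of the record conveys frequency, transmission mode or ISI.

```python
EXAMPLE_SDP = """\
v=0
o=- 2890844526 2890842807 IN IP4 192.0.2.10
s=Example broadcast service
c=IN IP4 233.252.0.1/127
t=0 0
m=video 49170 RTP/AVP 96
a=rtpmap:96 H264/90000
"""


def parse_sdp(text: str) -> list:
    """Split an SDP record into (type, value) pairs, one per line."""
    fields = []
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition("=")
        fields.append((key, value))
    return fields


def ip_level_summary(fields: list) -> dict:
    """Collect the connection address and media port/protocol from the record."""
    summary = {}
    for key, value in fields:
        if key == "c":
            _nettype, _addrtype, address = value.split()
            summary["address"] = address.split("/")[0]  # drop the TTL suffix
        elif key == "m":
            media, port, proto, *_formats = value.split()
            summary.update(media=media, port=int(port), proto=proto)
    return summary


summary = ip_level_summary(parse_sdp(EXAMPLE_SDP))
```

Any tuning parameters would therefore have to be added, for example as new attribute (`a=`) lines, which is the kind of extension the text refers to.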

The IETF started to develop an updated SDP protocol, SDPng. This was intended to address the lack of negotiation capabilities in SDP by providing alternatives for session parameter configurations. That is, an IP host would be able to negotiate session parameters according to its system capabilities. Proposals for SDPng used the XML syntax, Document Type Definitions (DTDs) and Schemas, to allow extensibility. It was one candidate method to convey session parameters for an IMG [30]. However, work on SDPng has not continued since 2003 and no specifications were defined; it is therefore not applicable for a GSE-only signalling framework.

5.4.3. XML

The eXtensible Markup Language, XML [29], has been standardised by the World Wide Web Consortium, W3C. It is now a common syntax for network control information and content metadata. The DVB-IPTV, DVB-H, DVB-SH, OIPF, UPnP and ATSC systems define XML Schemas, while DVB interactive application metadata is defined as XML DTDs [28]. XML Schemas have been developed to make it easier to create and enhance the encoded information and are preferred over DTDs. For example, XML Schemas defined for DVB-IPTV can also be used in the OIPF framework. In contrast to DTDs, XML Schemas provide support for namespaces, can constrain data based on common data types, and present object-oriented features such as type derivation. In addition, XML allows encryption, which could be used to provide signalling security.

A Uniform Resource Name (URN) namespace has been defined in [25] for naming resources defined within DVB standards. DVB specifies XML Schemas and DTDs, namespaces and other types of resource [33]. XML network signalling, in parallel with classical SI/PSI Tables, may be used in interactive DVB applications for hybrid broadcast/broadband environments [28].

In a GSE-only signalling framework, metadata syntax could be converted to XML. A simple but effective method could retain the segmentation of the PSI/SI Tables, since the Section mechanism is an important element of the PSI/SI structure that allows easy access to parts of the Table. In the XML encoding, the PID may be substituted by the IP destination address and UDP port number, similar to the approach proposed in [34]. This substitution also allows reuse of Tables after the PIDs are mapped to these IP addresses/ports within the PSI/SI.
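As a minimal sketch of this substitution, the following example encodes one Section of a program map Table (table_id 0x02) as XML, with each elementary stream identified by an IP destination address and UDP port rather than a PID. The element and attribute names, addresses and ports are hypothetical; they do not correspond to any published DVB Schema or to the encoding in [34].

```python
import xml.etree.ElementTree as ET


def section_to_xml(table_id, version, section_number, last_section_number, streams):
    """Encode one Section as XML; `streams` is a list of (type, address, port)."""
    section = ET.Element("Section", {
        "tableId": f"0x{table_id:02X}",
        "version": str(version),
        # The Section mechanism is retained so parts of a Table stay addressable.
        "sectionNumber": str(section_number),
        "lastSectionNumber": str(last_section_number),
    })
    for stream_type, address, port in streams:
        # The IP destination address and UDP port replace the elementary PID.
        ET.SubElement(section, "Stream", {
            "type": stream_type,
            "address": address,
            "port": str(port),
        })
    return ET.tostring(section, encoding="unicode")


xml_section = section_to_xml(
    table_id=0x02, version=1, section_number=0, last_section_number=0,
    streams=[("video", "233.252.0.1", 5004), ("audio", "233.252.0.1", 5006)],
)
```

Keeping `sectionNumber` and `lastSectionNumber` as attributes means a receiver can still reassemble a multi-Section Table in the same way as for MPEG-2 encoded Sections.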

Encoding the signalling metadata in XML significantly increases the information rate (due to its inherent verbosity) and therefore decreases bandwidth efficiency. However, XML data may be readily compressed. For example, two compression algorithms are recommended for DVB-H content metadata: GZIP [35] and BiM [36]. The GZIP format (RFC 1952) uses the DEFLATE algorithm (RFC 1951), which combines a dictionary (LZ77) approach with Huffman coding. GZIP is effective on streams with recurring patterns of data, especially when used with large data sets. The ISO MPEG-7 group defined the Binary MPEG format for XML, BiM, as an alternative to text representation. BiM was proposed for TV-Anytime content metadata and afterwards recommended for DVB-IPTV and DVB-H ESG.

BiM compression can reduce the transmission cost to about 60% of that of the MPEG-2 encoded PSI/SI Tables [34]. However, this adds complexity to the system, since XML Schemas are needed at the receiver to decompress the encoded Sections.

GZIP presents lower complexity than BiM, since no Schemas are needed before decompression at the receiver. This makes it attractive for handheld terminals, where processing requirements need to be minimised. However, its compression gain is typically much lower than that of BiM; PSI/SI Sections converted into XML and compressed with GZIP can increase the overall volume of data by 30% compared to the original binary encoded size [34]. Section 5.5.2 provides an example comparison of overhead. As with many compression technologies, patents need to be considered. Patents have already been registered for a tool for BiM compression.
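The dependence of GZIP efficiency on data size and repetition can be illustrated with the Python standard library. The XML fragments below are invented for the example, and the ratio is measured against the uncompressed XML text, not against the MPEG-2 binary encoding used as the baseline in [34].

```python
import gzip


def gzip_ratio(xml_text: str) -> float:
    """Return compressed size divided by uncompressed size for an XML string."""
    raw = xml_text.encode("utf-8")
    return len(gzip.compress(raw)) / len(raw)


# A single short Section: little repetition, so the fixed GZIP header and
# trailer take up a large share of the compressed output.
small = '<Section tableId="0x02"><Stream port="5004"/></Section>'

# A large Table with many similar entries: recurring patterns compress well.
large = "<Table>" + "".join(
    f'<Stream address="233.252.0.1" port="{5000 + i}"/>' for i in range(200)
) + "</Table>"

small_ratio = gzip_ratio(small)
large_ratio = gzip_ratio(large)
```

No Schema is involved at either end: `gzip.decompress` recovers the original text from the compressed bytes alone, which is the lower receiver complexity referred to above.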

Other XML compression algorithms are in the process of being developed. One is the Efficient XML Interchange (EXI) by W3C [37]. EXI not only achieves higher compression gains than GZIP, but also presents a lower decoding complexity since Schemas are not necessarily needed at the receiver when performing a network scan.

5.5. GSE/IP Signalling System Prospective Methods

Table 3 presents a summary of the prospective methods described in this section. For completeness, it includes methods currently specified for bidirectional links, which are indicated by an asterisk. While techniques may be combined, it is recommended that at least one technique is used at GS-L1 to identify signalling, in order to reduce processing requirements at the receiver.

Table 4 relates these methods to the requirements identified in Section 4. Overall, the processing cost of decoding at the receiver is important when analysing the use of any of the potential methods listed in Tables 3 and 4, in particular those for identification of signalling in GSE streams. We suggest using the extensible syntax XML for network service description metadata. XML encryption and compression would enable signalling security and are expected to result in similar bandwidth efficiency to that of TS, respectively.

5.5.1. Encapsulation

Several network signalling encapsulation options exist for a GSE-only system.

(1) The GSE TS-Concat extension [16] may be used to enable coexistence with TS services, but it increases overhead above that of the current MPEG-2 TS by adding GSE headers. A method is, however, needed to relate the metadata to the IP address.

(2) To reduce overhead, the SI Table may be directly encapsulated as a PDU in the GSE payload. Since a Section should not be larger than 1024B [5], a GSE payload may be able to carry more than one Section. A method is, however, needed to relate the metadata to the IP address.

(3) Network metadata may be encapsulated as UDP datagrams over IP, similar to the current encapsulation performed in DVB-H systems, where ESG XML records are sent over FLUTE via UDP/IP. Recent techniques for IP/UDP header compression, such as ROHC [19, 20], may in future further reduce IP overhead.

(4) The PDU-Concat extension [16] can improve system efficiency when transmitting small IP packets by combining several in a single GSE payload, subject to the maximum payload length of 64000B.

5.5.2. Overhead Analysis

This section compares the transmission cost for sending network signalling. The overhead with respect to the Table size resulting from the candidate techniques is shown in Table 5. Three Table sizes were analysed: one comprising a small Section of 30B, a second with a 1024B Section and a final Table comprising four 1024B Sections. The overhead was calculated for a set of DVB-S2/T2 frame sizes (data field lengths, DFL). In DVB-S2, network signalling is expected to be sent using the most robust ModCod supported in the network to reduce the probability of loss and to allow signalling acquisition in all channel conditions. Hence, this will typically result in frames with a small DFL.

The methods were compared to native transmission of MPEG-2 encoded Sections using the TS, as in current DVB signalling. Padding is added to each TS packet, as necessary. Table 5 shows that this padding results in an overhead for a 30B Table that is more than fifty times higher than the corresponding value for the 1024B and 4096B Tables. Section overhead, only considering the TS headers, is 16.6% for the 30B Table and 2.4% for the 1024B and 4096B Tables, but since a whole TS packet needs to be sent, the overhead becomes 526% and 10.1%, respectively, as shown in Table 5.
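The native TS figures quoted above can be reproduced, to within rounding, with the short calculation below. It assumes 188B TS packets with a 4B header, a 1B pointer field in the first packet of each Section, and that each Section starts in a new TS packet with the remainder of its last packet padded. These assumptions are not stated explicitly in the text but are consistent with the quoted percentages.

```python
import math

TS_PACKET = 188     # TS packet size in bytes
TS_HEADER = 4       # TS packet header in bytes
POINTER_FIELD = 1   # assumed: one pointer field at the start of each Section


def ts_overhead(section_len: int, sections: int = 1):
    """Return (header-only %, total %) overhead relative to the Table size."""
    capacity = TS_PACKET - TS_HEADER                        # 184B per packet
    packets = math.ceil((section_len + POINTER_FIELD) / capacity)
    table_len = section_len * sections
    header_bytes = (packets * TS_HEADER + POINTER_FIELD) * sections
    sent_bytes = packets * TS_PACKET * sections             # includes padding
    header_pct = 100 * header_bytes / table_len
    total_pct = 100 * (sent_bytes - table_len) / table_len
    return header_pct, total_pct


for label, length, count in [("30B", 30, 1), ("1024B", 1024, 1), ("4x1024B", 1024, 4)]:
    header_pct, total_pct = ts_overhead(length, count)
    print(f"{label}: headers {header_pct:.2f}%, with padding {total_pct:.2f}%")
```

Because each Section is assumed to be padded independently, the four-Section Table has the same relative overhead as the single 1024B Section.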

Section 5.1.1 proposed a method in which the TS packets were sent on a dedicated stream (identified by its ISI value). The fixed size of the frame results in a significant overhead given that there is insufficient signalling data to fill the frame. It is assumed that the overall transmission rate allocated to signalling does not result in any empty frames (although the bursty nature of signalling data may be hard to match to a fixed transmission rate).

Encapsulation of TS packets in GSE, as a transition method, is also analysed. GSE TS-Concat is considered for Tables with multiple Sections. As expected, the overhead is higher than that for native TS transmission. It is also higher than that of the IP-based encapsulation methods.

The next set of methods considers IP-based protocols and XML-translated Sections. Each Section is encapsulated by DVBSTP/UDP/IP or FLUTE/ALC/UDP/IP, where the additional headers contribute 40B. The GSE PDU-Concat extension is used for the Table with four Sections. It is assumed that signalling identification is carried at GS-L1 (e.g., optional GSE Type fields are not considered).

The overhead for the medium and large Tables represents a trade-off with the benefits provided by an IP-based signalling system. Small Tables negatively impact the efficiency of an IP-based signalling framework regardless of the encapsulation technique and frame size; however, the overhead is always lower than that of native MPEG-2 TS. This overhead is further reduced when header compression (HC) is considered. Estimates of the compressed size using either the GZIP or BiM algorithm are provided. These assume that BiM compression of the XML-encoded Section results in a reduction of 40% with respect to the size of the MPEG-2 encoded Section [34]. In contrast, applying GZIP to an XML-encoded Section results in an increase of 30% with respect to the size of the MPEG-2 encoded Section [34]. Despite this, using XML with GZIP results in less than half the overhead of the native TS method for the 1024B and 4096B Tables.

DVBSTP and FLUTE resulted in the same overhead. Although DVBSTP was designed to ease processing of SD&S XML records at the receiver, it results in significant overhead for small PDUs. The 12B of overhead introduced above the UDP layer is seen as an upper bound. This overhead could be reduced further by design of a lightweight transport protocol to replace the DVBSTP header, or by combined optimisation of content-encoding and transport protocol.

The UDP/IP headers are assumed to be compressed to 3B when using a form of header compression, although no method has currently been specified for use with DVB. The use of header compression for signalling should be analysed further given the positive effect on reducing the overhead.
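A rough sketch of the IP-based header overhead is given below, using only the figures stated in this section: 12B above UDP, 28B of UDP/IP headers (compressed to 3B with HC), and encoded Section sizes of 60% (BiM) and 130% (GZIP) of the MPEG-2 encoded size. The overhead is expressed relative to the encoded Section size, which is an assumption about the baseline; GSE headers and baseband frame padding are ignored, so the values are not expected to match Table 5.

```python
TRANSPORT_HEADER = 12       # header above UDP (DVBSTP), in bytes
UDP_IP_HEADERS = 28         # 40B total minus the 12B transport header
UDP_IP_COMPRESSED = 3       # assumed size with header compression (HC)

# Encoded size relative to the MPEG-2 encoded Section, as assumed from [34].
SIZE_FACTOR = {"bim": 0.6, "gzip": 1.3}


def ip_header_overhead(section_len: int, codec: str, hc: bool = False) -> float:
    """Header bytes as a percentage of the encoded Section size.

    Ignores GSE headers and frame padding (simplifying assumption).
    """
    encoded_len = section_len * SIZE_FACTOR[codec]
    udp_ip = UDP_IP_COMPRESSED if hc else UDP_IP_HEADERS
    return 100 * (TRANSPORT_HEADER + udp_ip) / encoded_len


for length in (30, 1024):
    for codec in ("bim", "gzip"):
        plain = ip_header_overhead(length, codec)
        compressed = ip_header_overhead(length, codec, hc=True)
        print(f"{length}B {codec}: {plain:.1f}% without HC, {compressed:.1f}% with HC")
```

Under these assumptions, header compression removes 25 of the 40 header bytes, which matters most for the 30B Section, where the headers are of the same order as the payload.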

6. Conclusions and Future Work

The convergence of DVB networks with IP infrastructure bridges the gap between broadcast transmission and traditional networks. Current MPEG-2 systems are already used to transmit IP packets, mostly using MPE or ULE, and it is expected that future DVB transmission networks will adopt an all-IP approach by gradually replacing the TS with the GS. Transition to IP-based content and signalling will enable common use of IP delivery techniques at the receiver, presenting new opportunities for integrating broadcast content with standard IP applications, and the introduction of value-added services.

One major challenge to transitioning broadcast services to the GS is the lack of a GSE-only signalling framework. IP-based procedures for content metadata exist in DVB systems, but current signalling is implemented through MPEG-2 TS Tables. This paper has explained the need for a GSE-only signalling framework, formulated a set of requirements, reviewed a range of candidate methods, including current IP-based methods, and derived their potential benefits.

The proposed methods can identify GSE packets carrying signalling and replace the role of PIDs. In addition, current IP-based methods may be used as prospective techniques for ND&S procedures and for the signalling syntax. Options were also presented for a signalling transport protocol. Methods for encoding metadata that allow extensibility and easy modification were examined. XML Schemas are strong candidates because of their extensibility characteristics and current common use for content metadata. Indicative performance data is used to compare the anticipated overhead for the various approaches.

This work is intended to guide and inform future standardisation work. As future work, we intend to select the optimal candidate methods and propose a GSE-only signalling architecture. The high-level signalling requirements for different scenarios (for example, fixed broadcast and interactive) will also be defined, as well as the specification for mapping current SI/PSI/FLS MPEG-2 encoded Tables to their XML-based counterparts.

Acknowledgment

The authors acknowledge the support of the European Space Agency (ESA) Contract 22471/09/NL/AD.