Guidelines for implementing a Web TV Service

25 Mar 2013 by David Corvoysier

The HTML5 specification has reached a level of maturity that allows TV services to be delivered in a Web browser. This article provides a set of guidelines to implement a typical TV service using Web Technologies, and gives details about the level of support to be expected for each feature.

The HTML5 video tag allows audio and video files to be rendered directly by the browser, although most implementations will actually delegate most of the multimedia processing to underlying components.

The HTML5 video element is supported by all recent desktop and mobile browsers. Please refer to caniuse/video or “The State Of HTML5 Video” for details.

The following paragraphs provide a description of the Media element features that are the most relevant from a TV service perspective, focusing on the features included in the HTML 5.0 specification.

Newer specifications, such as Media Source Extensions (MSE) and Encrypted Media Extensions (EME), are being developed at the time this article is written, and will not be detailed here.

Even though early implementations already exist in Google Chrome, these specifications are not mature yet and it is too early to rely on them to develop a mainstream TV service, unless you are able to control both the user-agent and the server.

Controls

Setting the controls property of the Media element to true will activate the user-agent's native multimedia controls.

It is very likely, however, that a web TV application will require a level of interaction with the user that leads to overriding at least some of the default behaviors assumed by these controls, which would therefore only be useful in debug mode.
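
A minimal sketch of that approach, assuming a hypothetical DEBUG flag and a custom play/pause button provided by the application:

    // Only show the native controls when debugging the service
    var video = document.querySelector('video');
    video.controls = Boolean(window.DEBUG);

    // Wire a custom play/pause button instead (the element id is an assumption)
    document.getElementById('playpause').addEventListener('click', function (e) {
        if (video.paused) {
            video.play();
        } else {
            video.pause();
        }
    }, false);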

Media formats

Due to the lack of consensus on this subject, the HTML5 specification doesn’t mandate any specific audio or video format: it is up to the user-agent (i.e. the browser) to define which formats should be supported, the decision being mainly driven by licensing terms.

As of today, there are still two competing sets of Media formats:

  • MP4/H264/AAC
  • WebM/VP8/Vorbis

In the past, there was a clear split between browser vendors, with Apple and Microsoft backing MP4/H264/AAC (for which they hold patents) and facing strong opposition from Opera and Firefox, with Chrome mostly remaining neutral on the subject.

The situation has evolved a bit, since H264/AAC decoding is often supported either by the underlying hardware (especially on mobile chipsets) or by a system-wide multimedia framework (like gstreamer, for instance), thus mitigating the licensing issues.

Firefox therefore now supports what they call ‘patent-encumbered’ media formats if the corresponding decoders are already available on the system.

In the meantime, VP8 failed to gain real momentum, probably due to the lack of proven improvements over H264.

As a consequence, the most sensible option today is to choose MP4/H264/AAC as the main (if not only) codec combination for encoding your content, as it has the widest level of support.
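
If needed, support for this combination can be checked at runtime through the canPlayType method of the Media element; a minimal sketch (the codec string matches the examples further below):

    var video = document.createElement('video');
    // canPlayType returns '', 'maybe' or 'probably'
    var support = video.canPlayType('video/mp4; codecs="avc1.42E01E, mp4a.40.2"');
    if (support === '') {
        // This browser cannot play H264/AAC in an MP4 container
        console.log('MP4/H264/AAC playback is not supported');
    }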

Adapting content to the target device

Even if you restrict yourself to a single combination of container and codecs, it is highly recommended to be able to adapt the video content you deliver to the device that will render it.

The HTML5 video tag allows multiple Media sources to be specified for a specific content, and it is up to the browser to select the most appropriate one based on the Media resource selection algorithm.

You can find various encoding recommendations on the web to address multiple devices. This article provides a detailed list of encoding profiles for desktop, mobile and other embedded devices.

The list of encodings you would typically need to support is to be defined on a per-service basis, but as a rule of thumb, for a general purpose TV service, supporting at least the three ‘standard’ resolutions is recommended:

  • Low Definition: 480x360
  • Standard Definition: 1280x720
  • High Definition: 1920x1080

Alternative media resources for a single multimedia content are specified using source elements as children of the video element.

The source element has two attributes that are used by the browser to select the appropriate resource:

  • the type attribute defines the Media format of the content,
  • the media attribute can be used by the service to describe the device the resource is intended for using the Media Query syntax.

The type attribute comprises a mandatory MIME type and an optional codecs parameter, using the syntax described in RFC 4281.

It is recommended to use the codecs parameter to explicitly specify the audio and video codecs of a specific resource.

In the example below, three alternative resources are provided with an increasing level of video complexity (baseline, extended, main):

    <video>
      <source src='video.mp4' 
              type='video/mp4; codecs="avc1.42E01E, mp4a.40.2"'>
      <source src='video.mp4' 
              type='video/mp4; codecs="avc1.58A01E, mp4a.40.2"'>
      <source src='video.mp4' 
              type='video/mp4; codecs="avc1.4D401E, mp4a.40.2"'>
    </video>

Unfortunately, the codecs parameter is limited to the description of the codecs, and cannot be used to describe Media features, such as spatial resolution. It is however possible to work around this limitation using the media attribute.

The media attribute comprises a media type optionally followed by one or more media expressions.

Although several types have been defined in the legacy HTML and CSS specifications, only all, screen and print are actually supported.

In particular, it is not advised to use the handheld type to specify that a specific Media resource is intended to be rendered on a mobile device: use the widely supported device-width and device-height based expressions instead:

    <video>
      <source src='video.mp4' 
              type='video/mp4; codecs="avc1.42E01E, mp4a.40.2"'
              media='screen and (max-device-width:480px)'>
      <source src='video.mp4'
              type='video/mp4; codecs="avc1.58A01E, mp4a.40.2"'
              media='screen and (min-device-width:480px) and (max-device-width:1280px)'>
      <source src='video.mp4'
              type='video/mp4; codecs="avc1.4D401E, mp4a.40.2"'
              media='screen and (min-device-width:1280px)'>
    </video>

Content playback

Content can be set to play automatically by setting the autoplay attribute:

    <video src="video.mp4" autoplay></video>

Alternatively, the application can explicitly call the play method of the Media element:

    video.play();

Special case: Unlock playback on iOS devices

On iOS devices, where the user may be on a cellular network, no data can be fetched from the network until the user initiates it (please refer to this article for details).

Unlocking the playback can then only be achieved:

  • through the native video controls,
  • by calling the play() method in a user event callback (i.e. a key, touch or mouse event), as sketched below.
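
A minimal sketch of the second option, assuming a hypothetical 'start' button in the page:

    var video = document.querySelector('video');
    // play() must be called synchronously from within the user event callback
    document.getElementById('start').addEventListener('click', function (e) {
        video.play();
    }, false);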

During playback, the user-agent provides feedback to the application by:

  • updating the currentTime attribute,
  • generating timeupdate events at regular intervals (typically 15 to 250ms), as illustrated in the sketch below.
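
A minimal sketch using that feedback to update a progress bar (the progress element is an assumption):

    var video = document.querySelector('video');
    var progress = document.getElementById('progress');
    video.addEventListener('timeupdate', function (e) {
        // duration is NaN until the metadata has been loaded
        if (!isNaN(video.duration)) {
            progress.value = video.currentTime / video.duration;
        }
    }, false);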

If no specific playback position has been specified, the user-agent will start the playback at the initial playback position defined in the stream.

The application can seek programmatically in the media timeline by setting the currentTime attribute to a new playback position:

    video.currentTime = 10;

Alternatively, a playback position can be specified declaratively using a Media fragment URI:

    <video src="video.mp4#t=10'></video>

For some Media resources, seeking may be limited to only some parts of the content: the user-agent therefore exposes the time ranges within which it is possible to seek through the seekable attribute of the Media element. The start of the first range in the seekable attribute represents the earliest possible playback position.

Example: a live broadcast would always have an earliest possible playback position that is equal to the current playback position.
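
A minimal sketch of a seek helper that clamps the requested position to the seekable range (the function name is an assumption):

    function safeSeek(video, position) {
        var ranges = video.seekable;
        if (ranges.length === 0) {
            return; // nothing is seekable yet
        }
        // Clamp the requested position to the overall seekable window
        var earliest = ranges.start(0);
        var latest = ranges.end(ranges.length - 1);
        video.currentTime = Math.min(Math.max(position, earliest), latest);
    }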

Trick-modes are achieved by altering the value of the playbackRate attribute:

  • normal: rate = 1.0
  • pause: rate = 0
  • slow-forward: 0 < rate < 1.0
  • fast-forward: 1.0 < rate
  • slow-rewind: -1.0 <= rate < 0
  • fast-rewind: rate < -1.0

When playing content backwards, the audio is muted. It may also be muted by the user-agent when playing content forward at a rate that is not 1.0.

If the earliest playback position is reached when playing backwards, the playback stops.
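
As an illustration, a fast-forward button could simply double the playback rate each time it is pressed, cycling back to normal speed (the button id is an assumption):

    var video = document.querySelector('video');
    document.getElementById('ffwd').addEventListener('click', function (e) {
        // Cycle through 2x, 4x, 8x and back to normal speed
        video.playbackRate = (video.playbackRate < 8) ? video.playbackRate * 2 : 1.0;
    }, false);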

Buffering strategies

This paragraph assumes that the reader is familiar with the Media element state machine.

When presenting multimedia content for playback, a TV application must find the best buffering strategy to address two contradictory user expectations:

  • to be able to watch the content as soon as possible (“low latency”),
  • to be able to watch the content without any interruption (“play through”).

In addition, the TV application will usually want to optimize network bandwidth and memory consumption, typically by avoiding unnecessary downloads.

Control buffering before playback

By default, the user-agent will apply an automatic strategy to aggressively preload a content as soon as a valid source has been identified for the presented media.

The HTML5 Media element however exposes a preload attribute to allow the web application to define the amount of data that can safely be preloaded by the user-agent before the content playback is explicitly started.

Note: The value of the preload attribute is ignored when a content is in autoplay.

By setting the preload attribute to none, the application can prevent the user-agent from downloading any data before the Media element is explicitly requested to play the content.

Setting the preload attribute to metadata will tell the user-agent to download only the amount of data required to identify the content duration and dimensions.

Reverting the preload attribute to its default auto value will tell the user-agent to aggressively preload the content, as if the content was about to be played.

Note: in terms of buffering, setting preload to auto before requesting a content to play is equivalent to calling the play method directly.

As a rule of thumb:

  • all contents should be inserted by default with preload explicitly set to none,
  • the contents that are likely to be played should have preload set to metadata, as sketched below.
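
A minimal sketch of that rule of thumb, assuming that hovering a content thumbnail is what makes it ‘likely to be played’ (the class name is an assumption):

    // All contents start with preload explicitly set to none
    var videos = document.querySelectorAll('video.thumbnail');
    for (var i = 0; i < videos.length; i++) {
        videos[i].preload = 'none';
        // When the user shows interest, allow the metadata to be fetched
        videos[i].addEventListener('mouseover', function (e) {
            e.currentTarget.preload = 'metadata';
        }, false);
    }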

The preload attribute can also be used once playback has started, as explained in the next paragraph.

Control buffering during playback

During playback, the preload attribute allows the web application to control how much data is being buffered in advance:

  • setting preload to auto allows the user-agent to aggressively download content, up to having it entirely stored in memory,
  • setting preload to metadata tells the user-agent to limit its internal buffers to the amount of data required to play the content without interruption.

The amount of data buffered can be queried using the buffered attribute, allowing the application to dynamically adjust preload for finer-grained control over the buffering policy.

An application willing to limit buffering during playback would typically:

  • start with preload = auto,
  • on download progress, set preload = metadata if we are above the buffering threshold,
  • on playback timeupdate events, set preload = auto if we are below the buffering threshold.

See example code below:

    // Returns the amount of data (in seconds) buffered ahead of the playback position
    function getBufferedRange(video) {
        var i = video.buffered.length - 1;
        if (i < 0) {
            return 0;
        }
        // Find the buffered range that contains the current playback position
        while ((i > 0) &&
               (video.buffered.start(i) > video.currentTime)) {
            i--;
        }
        return (video.buffered.end(i) - video.currentTime);
    }
    video.ontimeupdate = function (e) {
        if (getBufferedRange(video) < THRESHOLD) {
            video.preload = 'auto';
        }
    };
    video.onprogress = function (e) {
        if (getBufferedRange(video) >= THRESHOLD) {
            video.preload = 'metadata';
        }
    };

Avoid interruptions in playback

The default behaviour of a Media player in ‘autoplay’ mode is to wait until enough data has been retrieved to be able to play the content through before starting to render it on the screen.

Using the same terminology as the HTML5 specification: in autoplay mode, the playback doesn’t start until the HAVE_ENOUGH_DATA state has been reached.

There is therefore no specific configuration to apply to achieve that behaviour other than setting the autoplay attribute.

    <video src='video.mp4' autoplay></video>

An alternative would be to listen to the canplaythrough event and call the play method explicitly.

    video.addEventListener('canplaythrough',
        function (e) {
            video.play();
        }, false);

Minimize latency

The playback of a content cannot start before the Media player has received enough data to decode at least a few frames.

Using the same terminology as the HTML5 specification: the playback cannot start before the HAVE_FUTURE_DATA state has been reached.

In order to start the playback of a content as soon as possible, a web application can detect the transition to the HAVE_FUTURE_DATA state by listening to the canplay event, and call the play method explicitly:

    video.addEventListener('canplay',
        function (e) {
            video.play();
        }, false);

In-band tracks

The HTML5 Media element supports multiple in-band tracks for a specific media content: for example, in addition to the primary video and audio tracks, a media resource could have foreign-language dubbed dialogues, director’s commentaries, audio descriptions, alternative angles, or sign-language overlays.

In-band media tracks

In-band media tracks are exposed through the audioTracks and videoTracks attributes, and become available as soon as the Media player has reached the HAVE_METADATA state.

To select programmatically a specific media track, an application would thus typically listen to the loadedmetadata event and select the relevant track from the track lists.

Each media track is identified by the following parameters:

  • id : typically mapped to the format used in the media container,
  • kind : particularly relevant are ‘main’ and ‘captions’,
  • label : to be presented to the user,
  • language.

Only a single video track can be active at a given time: the currently active video track can be set programmatically using the selected attribute:

    video.onloadedmetadata = function (e) {
        for (var i = 0; i < video.videoTracks.length; i++) {
            // Selecting a track automatically deselects the others in the list
            if (video.videoTracks[i].kind == 'alternative') {
                video.videoTracks[i].selected = true;
            }
        }
    };

Alternatively, it can be selected declaratively using a Media fragment URI in the form track=label:

    <video src="myvideo#track=Alternative"></video> 

Multiple audio tracks can be active at the same time: in that case, their audio will be mixed.

A specific audio track can be selected programmatically by setting its enabled attribute to true.

The example below illustrates how a single audio language can be selected programmatically:

    video.onloadedmetadata = function (e) {
        for (var i = 0; i < video.audioTracks.length; i++) {
            var track = video.audioTracks[i];
            // Enable only the main french audio track, disable all the others
            track.enabled = (track.kind == 'main') && (track.language == 'fr');
        }
    };

In-band media tracks are however not supported today by any desktop browser, and even worse, the corresponding bindings into WebKit and Gecko are yet to be implemented.

In-band text tracks

In-band text tracks are exposed through the textTracks attribute of the Media element, and become available as soon as the Media player has reached the HAVE_METADATA state.

Each text track is composed of a list of cues that represent individual pieces of timed metadata. If a text track is active (see below), the user-agent will generate cue events (a cuechange event on the track, and enter/exit events on the cues) every time it reaches a point in the timeline that corresponds to a cue.
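
A minimal sketch listening to those events on the first text track, assuming its mode has been set to hidden or showing (see below) and that its cues carry a text payload:

    var track = video.textTracks[0];
    track.mode = 'hidden'; // activate the track without displaying its cues
    track.oncuechange = function (e) {
        // activeCues lists the cues overlapping the current playback position
        var cues = this.activeCues;
        for (var i = 0; i < cues.length; i++) {
            console.log(cues[i].text);
        }
    };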

A specific text track can have three different states that are controlled by its mode attribute:

  • disabled: the track is simply ignored by the user-agent,
  • hidden: the track is active, the user-agent generates events for the track cues, but nothing is displayed on screen,
  • showing: the track is active, the user-agent generates events for the track cues, the cues are displayed on screen (if the track is a subtitle track).

The example below illustrates how to activate and display a French subtitle text track as an overlay to the video content:

    var tracks = video.textTracks;
    for (var j = 0; j < tracks.length; j++) {
        var track = tracks[j];
        if (track.kind === "subtitles") {
            // Show the french subtitles, disable all the other subtitle tracks
            if (track.language == 'fr') {
                track.mode = 'showing';
            } else {
                track.mode = 'disabled';
            }
        }
    }

In-band text tracks are supported only by Safari on desktop.

Out-of-band text tracks

In addition to in-band media tracks, the HTML5 Media element supports out-of-band text tracks that can be used to complement a media with subtitles, audio descriptions, chapters or any kind of metadata to be synchronized with the multimedia content.

Out-of-band Text Tracks can be specified declaratively as children of a media element using the track element:

    <video src="sintel.mp4">
      <track kind="subtitles" 
             label="English subtitles" 
             src="sintel_en.vtt" srclang="en" default></track>
      <track kind="subtitles"
             label="Sous-titres français"
             src="sintel_fr.vtt" srclang="fr"></track>
    </video>

Alternatively, they can be built entirely from JavaScript using the addTextTrack method. Please refer to the specification for details.
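
A minimal sketch, assuming a browser that exposes the VTTCue constructor (older implementations used TextTrackCue with the same start time, end time and text arguments):

    // Create a hidden chapters track and populate it with a single cue
    var track = video.addTextTrack('chapters', 'Chapters', 'en');
    track.mode = 'hidden';
    // A cue spanning the first ten seconds of the content
    track.addCue(new VTTCue(0, 10, 'Chapter 1'));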

Once registered, the out-of-band text tracks are available like any other text track through the textTracks attribute of the Media element.

Out-of-band text tracks are supported by Safari and Google Chrome on desktop. Both will however only display subtitles using the WebVTT format.

For a more detailed introduction to Out-of-band Text Tracks, please refer to this article.
