The HTML5 specification has reached a level of maturity that allows TV services to be delivered in a Web browser. This article provides a set of guidelines to implement a typical TV service using Web Technologies, and gives details about the level of support to be expected for each feature.
The HTML5 video tag allows audio and video files to be rendered directly by the browser, although most implementations actually delegate the bulk of the multimedia processing to underlying platform components.
The following paragraphs provide a description of the Media element features that are the most relevant from a TV service perspective, focusing on the features included in the HTML 5.0 specification.
Newer specifications are being developed at the time this article is written, and will not be detailed here:
- Encrypted Media Extensions: adds support for Digital Rights Management (very controversial)
Even though early implementations already exist in Google Chrome, these specifications are not mature yet and it is too early to rely on them to develop a mainstream TV service, unless you are able to control both the user-agent and the server.
Setting the `controls` property of the Media element to true activates the user-agent's native multimedia controls.
It is very likely, however, that a web TV application will require a level of user interaction that implies overriding at least some of the default behaviors of these controls, which are therefore mostly useful in debug mode.
Due to the lack of consensus on this subject, the HTML5 specification doesn’t mandate any specific audio or video format: it is up to the user-agent (i.e. the browser) to define which formats should be supported, the decision being mainly driven by licensing terms.
As of today, there are still two competing sets of Media formats.
In the past, there was a clear split between browser vendors, with Apple and Microsoft backing MP4/H264/AAC (for which they have patents) and facing a strong opposition coming from Opera and Firefox, Chrome mostly remaining neutral on the subject.
The situation has evolved a bit, since H264/AAC decoding is often either supported by the underlying hardware (especially on mobile chipsets), or a system-wide multimedia framework (like gstreamer for instance), thus mitigating the licensing issues.
Firefox therefore now supports what they call ‘patent-encumbered’ media formats if they are already available on the system.
In the meantime, VP8 failed to gain real momentum, probably due to its lack of proven improvement over H264.
As a consequence, the most sensible option today is to choose MP4/H264/AAC as the main (only?) codec combination for encoding your content, as it has the widest level of support.
Adapting content to the target device
Even if you restrict yourself to a single combination of container and codecs, it is highly recommended to adapt the video content you deliver to the device that will render it.
The HTML5 video tag supports multiple Media sources to be specified for a specific content, and it is up to the browser to select the one that is the most appropriate based on the Media resource selection algorithm.
You can find various encoding recommendations on the web to address multiple devices. This article provides a detailed list of encoding profiles for desktop, mobile and other embedded devices.
The list of encodings you would typically need to support is to be defined on a per-service basis, but as a rule of thumb, for a general-purpose TV service, supporting at least the three ‘standard’ resolutions is recommended:
- Low Definition: 480x360
- Standard Definition: 1280x720
- High Definition: 1920x1080
Alternative media resources for a single multimedia content are specified using `source` elements as children of the Media element. The `source` element has two attributes that are used by the browser to select the appropriate resource:
- the `type` attribute defines the Media format of the content,
- the `media` attribute can be used by the service to describe the device the resource is intended for, using the Media Query syntax.
The `type` attribute comprises a mandatory MIME type and an optional `codecs` parameter using the syntax described in RFC 6381. It is recommended to use the `codecs` parameter to explicitly specify the audio and video codecs of a specific resource.
In the example below, three alternative resources are provided with an increasing level of video complexity (baseline, extended, main):
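A minimal sketch of such a declaration (the file names are hypothetical; the `avc1` identifiers encode the H264 baseline, extended and main profiles respectively, per the RFC 6381 syntax):

```html
<video controls>
  <source src="movie-baseline.mp4" type='video/mp4; codecs="avc1.42E01E, mp4a.40.2"'>
  <source src="movie-extended.mp4" type='video/mp4; codecs="avc1.58A01E, mp4a.40.2"'>
  <source src="movie-main.mp4" type='video/mp4; codecs="avc1.4D401E, mp4a.40.2"'>
</video>
```

The browser will pick the first resource whose type it believes it can play.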
The `codecs` parameter is limited to the description of the codecs, and cannot be used to describe other Media features, such as spatial resolution. It is however possible to work around this limitation using the `media` attribute.
The `media` attribute comprises a media type optionally followed by several media expressions.
Although several media types have been defined in the legacy HTML and CSS specifications, only a few of them are reliably supported across browsers. It is in particular not advised to use the `handheld` type to specify that a specific Media resource is intended to be rendered on a mobile device: use instead the widely supported `device-width` and `device-height` based expressions:
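For instance, a sketch using hypothetical file names, where the HD resource is only proposed to devices that are at least 1280 pixels wide:

```html
<video controls>
  <source src="movie-hd.mp4" type="video/mp4" media="screen and (min-device-width: 1280px)">
  <source src="movie-sd.mp4" type="video/mp4">
</video>
```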
A content can be set to play automatically by setting the `autoplay` attribute to true.
Alternatively, the application can explicitly call the `play` method of the Media element:
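Both approaches can be sketched as follows (the element id and file name are hypothetical, and the two forms are alternatives, not meant to be combined):

```html
<!-- Declarative: the content starts playing as soon as possible -->
<video id="player" src="movie.mp4" autoplay></video>

<!-- Programmatic alternative -->
<script>
  document.getElementById("player").play();
</script>
```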
Special case: Unlock playback on iOS devices
On iOS devices, where the user may be on a cellular network, no media data can be fetched from the network until the user initiates playback (please refer to this article for details).
Unlocking playback can then only be achieved:
- through the video native controls,
- by calling the `play()` method in a user event callback (i.e. a key, touch or mouse event).
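A minimal sketch (element ids and file name are hypothetical): playback is started from a click/touch handler, which satisfies the user-initiation requirement:

```html
<video id="player" src="movie.mp4"></video>
<button id="watch">Watch now</button>
<script>
  // play() is called from within a user event callback
  document.getElementById("watch").addEventListener("click", function () {
    document.getElementById("player").play();
  });
</script>
```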
During playback, the user-agent provides feedback to the application by:
- updating the `currentTime` attribute of the Media element,
- firing `timeupdate` events at regular intervals (typically 15 to 250ms).
If no specific playback position has been specified, the user-agent will start the playback at the initial playback position defined in the stream.
The application can seek programmatically in the media timeline by setting the `currentTime` attribute to a new playback position:
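For instance (the target position is arbitrary):

```html
<script>
  var video = document.querySelector("video");
  video.currentTime = 60; // seek to one minute into the media timeline
</script>
```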
Alternatively, a playback position can be specified declaratively using a Media fragment URI:
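For instance, to start playback 60 seconds into the content (file name is hypothetical):

```html
<video src="movie.mp4#t=60" controls></video>
```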
For some Media resources, seeking may be limited to only some parts of the content: the user-agent therefore exposes the time ranges in which it is possible to seek through the `seekable` attribute of the Media element.
The start of the first element in the seekable time range represents the
earliest possible playback position.
Example: a live broadcast content would always have an earliest playback position that is equal to the current playback position.
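The earliest possible playback position can be derived from the `seekable` attribute; a sketch follows, using a minimal stand-in for the browser's TimeRanges object so the logic can run outside a browser (in a browser, pass `video.seekable` directly):

```javascript
// Derive the earliest possible playback position from a TimeRanges-like object.
function earliestSeekablePosition(seekable) {
  if (seekable.length === 0) return 0; // nothing is seekable yet
  return seekable.start(0); // start of the first seekable time range
}

// Minimal stand-in for a TimeRanges object, for illustration only.
function makeTimeRanges(ranges) {
  return {
    length: ranges.length,
    start: function (i) { return ranges[i][0]; },
    end: function (i) { return ranges[i][1]; }
  };
}

// A live stream whose seekable window starts 120 seconds into the timeline:
console.log(earliestSeekablePosition(makeTimeRanges([[120, 3600]]))); // 120
```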
Trick-modes are achieved by altering the value of the `playbackRate` attribute of the Media element:
- normal: rate = 1.0
- pause: rate = 0
- slow-forward: 0 < rate < 1.0
- fast-forward: 1.0 < rate
- slow-rewind: -1.0 <= rate < 0
- fast-rewind: rate < -1.0
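The mapping above can be sketched as a small helper (the function name is ours, not part of any API; in a browser the mode would be applied by assigning `video.playbackRate`):

```javascript
// Map a playbackRate value to the trick-mode it corresponds to,
// following the ranges listed above.
function trickMode(rate) {
  if (rate === 0) return "pause";
  if (rate === 1.0) return "normal";
  if (rate > 1.0) return "fast-forward";
  if (rate > 0) return "slow-forward";  // 0 < rate < 1.0
  if (rate >= -1.0) return "slow-rewind"; // -1.0 <= rate < 0
  return "fast-rewind";                   // rate < -1.0
}

// e.g. video.playbackRate = 2.0; would be a fast-forward:
console.log(trickMode(2.0)); // "fast-forward"
```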
When playing content backwards, the audio is muted. It may also be muted by the user-agent when playing content forwards at a rate other than 1.0.
If the earliest playback position is reached while playing backwards, playback stops.
This paragraph assumes that the reader is familiar with the Media element state machine.
When presenting multimedia content for playback, a TV application must find the best buffering strategy to address two contradictory user expectations:
- to be able to watch the content as soon as possible (“low latency”),
- to be able to watch the content without any interruption (“play through”).
In addition, the TV application will usually want to optimize network bandwidth and memory consumption, typically by avoiding unnecessary downloads.
Control buffering before playback
By default, the user-agent will apply an automatic strategy to aggressively preload a content as soon as a valid source has been identified for the presented media.
The HTML5 Media element however exposes a `preload` attribute to allow the web application to define the amount of data that can safely be preloaded by the user-agent before the content playback is explicitly started.
Note: the value of the `preload` attribute is ignored when a content is in autoplay mode.
By setting the `preload` attribute to `none`, the application can prevent the user-agent from downloading any data before the Media element is explicitly requested to play the content.
Setting the `preload` attribute to `metadata` will tell the user-agent to download only the amount of data required to identify the content duration and dimensions.
Setting the `preload` attribute to its default `auto` value will tell the user-agent to aggressively preload the content, as if the content was about to be played.
Note: in terms of buffering, setting `preload` to `auto` before requesting a content to play is equivalent to calling the `play` method directly.
As a rule of thumb:
- all contents should be inserted by default with `preload` explicitly set to `none`,
- the contents that are likely to be played should have `preload` set to `metadata`.
The `preload` attribute can also be used once playback has started, as explained in the next paragraph.
Control buffering during playback
During playback, the `preload` attribute allows the web application to control how much data is buffered in advance:
- `auto` allows the user-agent to aggressively download the content, up to having it entirely stored in memory,
- `metadata` tells the user-agent to limit its internal buffers to the amount of data required to play the content without interruption.
The amount of data buffered can be queried using the `buffered` attribute, allowing the application to dynamically adjust `preload` for a finer-grained control over the buffering policy.
An application willing to limit buffering during playback would typically:
- start with `preload = auto`,
- on download progress, switch to `preload = metadata` when above the buffering threshold,
- on playback progress, switch back to `preload = auto` when below the buffering threshold.
See example code below:
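A sketch of this strategy; the threshold value is arbitrary, and the policy is kept in pure helpers so it can be tested outside a browser:

```javascript
// Arbitrary threshold: seconds buffered ahead of the playback position.
var BUFFER_THRESHOLD = 30;

// "metadata" stops aggressive downloading, "auto" resumes it.
function choosePreload(secondsAhead, threshold) {
  return secondsAhead >= threshold ? "metadata" : "auto";
}

// Seconds of content buffered ahead of currentTime, given a TimeRanges-like
// object (video.buffered in a browser).
function bufferedAhead(buffered, currentTime) {
  for (var i = 0; i < buffered.length; i++) {
    if (buffered.start(i) <= currentTime && currentTime <= buffered.end(i)) {
      return buffered.end(i) - currentTime;
    }
  }
  return 0;
}

// Browser-only wiring, skipped when no DOM is available:
if (typeof document !== "undefined") {
  var video = document.querySelector("video");
  video.preload = "auto"; // start with aggressive buffering
  var update = function () {
    video.preload = choosePreload(
      bufferedAhead(video.buffered, video.currentTime),
      BUFFER_THRESHOLD
    );
  };
  video.addEventListener("progress", update);   // download progress
  video.addEventListener("timeupdate", update); // playback progress
}
```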
Avoid interruptions in playback
The default behaviour of a Media player in ‘autoplay’ mode is to wait until enough data has been retrieved to play the content through before starting to render it on screen.
Using the same terminology as the HTML5 specification: in autoplay mode, the playback doesn’t start until the `HAVE_ENOUGH_DATA` state has been reached. There is therefore no specific configuration to apply to achieve that behaviour other than setting the `autoplay` attribute.
An alternative would be to listen to the `canplaythrough` event and call the `play` method explicitly.
The playback of a content cannot start before the Media player has received enough data to decode at least a few frames.
Using the same terminology as the HTML5 specification: the playback cannot start before the `HAVE_FUTURE_DATA` state has been reached.
In order to start the playback of a content as soon as possible, a web application can detect the transition to the `HAVE_FUTURE_DATA` state by listening to the `canplay` event, and call the `play` method explicitly.
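Both start-up strategies can be sketched as follows (element id and file name are hypothetical):

```html
<video id="player" src="movie.mp4"></video>
<script>
  var video = document.getElementById("player");
  // Low latency: start as soon as a few frames can be decoded
  video.addEventListener("canplay", function () { video.play(); });
  // Play through: wait until the user-agent estimates it can play to the end
  // video.addEventListener("canplaythrough", function () { video.play(); });
</script>
```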
The HTML5 Media element supports multiple inband tracks for a specific media content: for example, in addition to the primary video and audio tracks, a media resource could have foreign-language dubbed dialogues, director’s commentaries, audio descriptions, alternative angles, or sign-language overlays.
In-band media tracks
Inband media tracks are exposed through the `audioTracks` and `videoTracks` attributes, and become available as soon as the Media player has reached the `HAVE_METADATA` state.
To select a specific media track programmatically, an application would thus typically listen to the `loadedmetadata` event and select the relevant track from the track lists.
Each media track is identified by the following parameters:
- `id`: typically mapped to the format used in the media container,
- `kind`: particularly relevant are ‘main’ and ‘captions’,
- `label`: to be presented to the user,
- `language`: the language of the track.
A single video track can be active at a given time: the currently active video track can be set programmatically by setting the `selected` attribute of the chosen track in the `videoTracks` list.
Alternatively, it can be selected declaratively using a Media fragment URI in the form `#track=<id>`.
Multiple audio tracks can be active at the same time: in that case, their audio will be mixed.
A specific audio track can be selected programmatically by setting its `enabled` attribute to true.
The example below illustrates how a single audio language can be selected programmatically:
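A sketch of such a selection; a plain array stands in for the `audioTracks` list so the example can run outside a browser (in a browser, pass `video.audioTracks`):

```javascript
// Enable only the audio tracks matching the requested language.
function selectAudioLanguage(audioTracks, language) {
  for (var i = 0; i < audioTracks.length; i++) {
    audioTracks[i].enabled = (audioTracks[i].language === language);
  }
}

// Mock track list standing in for video.audioTracks:
var tracks = [
  { language: "en", kind: "main", enabled: true },
  { language: "fr", kind: "main", enabled: false }
];
selectAudioLanguage(tracks, "fr");
console.log(tracks[1].enabled); // true
```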
Inband media tracks are however not supported today by any desktop browser, and even worse, the corresponding bindings in WebKit and Gecko are yet to be implemented.
In-band text tracks
Inband text tracks are exposed through the `textTracks` attribute of the Media element, and become available as soon as the Media player has reached the `HAVE_METADATA` state.
Each text track is composed of a list of cues that represent individual pieces of timed metadata. If a text track is active (see below), the user-agent will fire a `cuechange` event every time it reaches a point in the timeline that corresponds to a cue.
A specific text track can have three different states, controlled by its `mode` attribute:
- `disabled`: the track is simply ignored by the user-agent,
- `hidden`: the track is active, the user-agent generates events for the track cues, but nothing is displayed on screen,
- `showing`: the track is active, the user-agent generates events for the track cues, and the cues are displayed on screen (if the track is a subtitle track).
The example below illustrates how to activate and display a French subtitles text track as an overlay to the video content:
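A sketch of such an activation; a plain array stands in for the `textTracks` list so the example can run outside a browser (in a browser, pass `video.textTracks`):

```javascript
// Show the subtitle track for a given language and disable the others,
// using the TextTrack mode values described above.
function showSubtitles(textTracks, language) {
  for (var i = 0; i < textTracks.length; i++) {
    var track = textTracks[i];
    if (track.kind === "subtitles") {
      track.mode = (track.language === language) ? "showing" : "disabled";
    }
  }
}

// Mock track list standing in for video.textTracks:
var tracks = [
  { kind: "subtitles", language: "en", mode: "disabled" },
  { kind: "subtitles", language: "fr", mode: "disabled" }
];
showSubtitles(tracks, "fr");
console.log(tracks[1].mode); // "showing"
```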
Inband text tracks are supported only by Safari on desktop.
Out-of-band text tracks
In addition to in-band media tracks, the HTML5 Media element supports out-of-band text tracks that can be used to complement a media with subtitles, audio descriptions, chapters or any kind of metadata to be synchronized with the multimedia content.
Out-of-band Text Tracks can be specified declaratively as children of a media element using the `track` element, or created dynamically using the `addTextTrack` method. Please refer to the specification for details.
Once registered, the out-of-band text tracks are available like any other text track through the `textTracks` attribute of the Media element.
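For instance, declaring two hypothetical WebVTT subtitle files with `track` elements:

```html
<video src="movie.mp4" controls>
  <track kind="subtitles" src="subtitles-fr.vtt" srclang="fr" label="Français">
  <track kind="subtitles" src="subtitles-en.vtt" srclang="en" label="English">
</video>
```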
Out-of-band text tracks are supported by Safari and Google Chrome on desktop. Both will however only display subtitles using the WebVTT format.
For a more detailed introduction to Out-of-band Text Tracks, please refer to this article.