Delivering closed captions or subtitles for video is essential to ensure that the content is available to users who are deaf or hard of hearing. In HTML5, the <track> element, a child of the <video> element, represents a standardized way of adding captions, subtitles, chapters or metadata information associated with a video. The HTML5 specification incorporates details on how the <track> element interacts with the rest of the media elements.
WebVTT is one of the options for the file format used for shipping out-of-band text tracks. The specification covers parsing and rendering rules of the text track file, including precise details on how the user agent should apply the display settings for each text track cue. Other formats are available and important, including SMPTE-TT and the W3C’s TTML.
WebVTT support within browsers is growing in part due to Adobe’s contributions to WebKit. Here’s some of the work coming out of this effort and how you can try it out yourself.
Support for more advanced display features is an ongoing effort. In this blog post I will highlight how developers will be able to use the <track> element to position text anywhere on top of a video.
In the following short video you can see multiple text track cues positioned at different coordinates over the video.
Video Copyright (c) 2006 Blender Foundation / Netherlands Media Art Institute / www.elephantsdream.org.
Below is a snippet of the out-of-band text track file used in the screencast above:
00:00:05.250 --> 00:00:11.500
This is a demo track to illustrate positioning features for cues.

00:00:13.000 --> 00:00:19.000 line:50% position:50%
Using HTML5 captions, you can position the cues anywhere.

00:00:20.000 --> 00:00:23.000 line:0% position:0%
For example, here is a cue on the top-left corner.

00:00:23.000 --> 00:00:26.000 line:0% position:100%
Or on the top-right corner.

00:00:26.000 --> 00:00:29.000 line:100% position:100%
Or on the bottom-right corner.
As you can see, the position of each text track cue is specified through the “line” and “position” settings. The pair (position, line), given in percentages, represents coordinates over the video viewport.
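To make the role of the two settings concrete, here is a minimal Python sketch that splits a cue timing line into its start time, end time and settings. The function names parse_timing_line and percentage are hypothetical, and the sketch only handles the hh:mm:ss.ttt timestamp form used in the snippet; it is not a full WebVTT parser.

```python
import re

# Matches "HH:MM:SS.mmm --> HH:MM:SS.mmm" followed by optional cue settings.
# Simplification: only the two-digit-hours form used in this article is accepted.
TIMING_RE = re.compile(
    r"^(\d{2}:\d{2}:\d{2}\.\d{3})\s+-->\s+(\d{2}:\d{2}:\d{2}\.\d{3})(.*)$"
)


def parse_timing_line(text):
    """Split a cue timing line into (start, end, settings dict)."""
    match = TIMING_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a cue timing line: {text!r}")
    start, end, rest = match.groups()
    settings = {}
    for token in rest.split():
        # Each setting is written as name:value, for example line:50%.
        name, sep, value = token.partition(":")
        if sep and name and value:
            settings[name] = value
    return start, end, settings


def percentage(value):
    """Convert a setting value such as '50%' to the float 50.0."""
    if not value.endswith("%"):
        raise ValueError(f"not a percentage: {value!r}")
    number = float(value[:-1])
    if not 0 <= number <= 100:
        raise ValueError(f"percentage out of range: {value!r}")
    return number
```

For the second cue of the snippet, parse_timing_line returns the two timestamps and a dictionary holding the raw “line” and “position” values, which percentage can then convert to numbers.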
The cue is positioned so that it always stays within the viewport area of the video. With a value of (0%, 0%), the upper-left corner of the cue display box is aligned with the upper-left corner of the video. Specifying the pair (100%, 100%) aligns the bottom-right corner of the cue display box with the bottom-right corner of the video.
More generally, the coordinates (position%, line%) ensure that the point position% across and line% down the cue display box is placed at the point position% across and line% down the viewport area of the video.
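The alignment rule can be written out as a small calculation. The sketch below is an illustration of the rule as described in this post, not the rendering code of any browser; the function name cue_box_origin and the pixel sizes in the usage note are made up for the example. Requiring that the point position% across the box coincide with the point position% across the video gives left + p * box_width = p * video_width, so left = p * (video_width - box_width), and the same reasoning applies vertically.

```python
def cue_box_origin(video_w, video_h, box_w, box_h, position_pct, line_pct):
    """Return (left, top) of the cue box, in pixels from the video's top-left.

    The point position_pct across and line_pct down the cue box is made to
    coincide with the point position_pct across and line_pct down the video.
    """
    px = position_pct / 100.0
    py = line_pct / 100.0
    # left + px * box_w == px * video_w  =>  left == px * (video_w - box_w)
    left = px * (video_w - box_w)
    # top + py * box_h == py * video_h   =>  top == py * (video_h - box_h)
    top = py * (video_h - box_h)
    return left, top
```

Assuming a 640x360 video and a 200x40 cue box, (0%, 0%) yields (0.0, 0.0), (100%, 100%) yields (440.0, 320.0), and (50%, 50%) centers the box at (220.0, 160.0). As long as the box is no larger than the video, the result always keeps the box inside the viewport.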
This article covers only one particular, still experimental, way of specifying rendering settings, which you can test now using a nightly build of WebKit or Google Chrome Canary (the <track> flag needs to be enabled). Other basic aspects of using the <track> element are excellently highlighted here.
More advanced features are to be implemented in the near future. For example, the “line” setting does not have to be a percentage value; it can also be a line number, giving the position of the cue’s line on the video (counting from top to bottom or from bottom to top, depending on the directionality of the text).
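A tool that reads cue settings would need to tell the two forms of the “line” value apart. The following Python sketch shows one way to do that; parse_line_setting is a hypothetical helper, and how a line number is then mapped to an actual place on the video is left to the renderer. My understanding of the WebVTT draft is that a negative line number counts from the opposite edge, but treat that as an assumption here.

```python
def parse_line_setting(value):
    """Classify a 'line' setting value as a percentage or a line number.

    Returns ("percentage", float) for values such as "50%" and
    ("line_number", int) for values such as "2" or "-1".
    Raises ValueError for anything else.
    """
    if value.endswith("%"):
        return ("percentage", float(value[:-1]))
    # int() raises ValueError for non-numeric input, which is what we want.
    return ("line_number", int(value))
```

With this helper, "50%" is reported as a percentage of 50.0, while "2" and "-1" are reported as line numbers.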
The WebVTT specification covers many other rendering flavors, such as complete support for bi-directionality and paint-on captions (appearance times are associated with fragments of text within the cue). Integrating a roll-up captions display mode is currently under debate.
Developers interested in learning more about WebVTT are encouraged to check out the Web Media Text Tracks Community Group at the W3C.