Closed captioning

Closed captioning (CC) (commonly known as subtitles, and also called subtitles for the hearing impaired) allows people who are deaf or hard of hearing, learning a new language, beginning to read, in a noisy environment, or otherwise disadvantaged to read a transcript or dialog of the audio portion of a video, film, or other presentation. As the video plays, text captions are displayed that transcribe (although not always verbatim) speech and often other relevant sounds.

The term "closed" in closed captioning means that not all viewers see the captions, only those who decode or activate them. This distinguishes them from "open captions," which all viewers see. Captions that are permanently visible in a video, film, or other medium are called "open," "burned-in," or "hardcoded" captions.

Most of the world, including the United Kingdom and Ireland, does not distinguish captions from subtitles and uses "subtitles" as the general term. In the United States and Canada, however, the terms have different meanings. "Subtitles" assume the viewer can hear but cannot understand the language, so they only translate dialogue and some on-screen text. "Captions" aim to describe all significant audio content and "non-speech information," such as the identity of speakers and their manner of speaking, along with music or sound effects, using words or symbols.

In the United States, the National Captioning Institute noted that 'English-as-a-second-language' (ESL) learners were the largest group buying decoders in the late 1980s and early 1990s before built-in decoders became a standard feature of U.S. television sets. This suggested that the largest audience of closed captioning was people whose native language was not English. In the United Kingdom, of 7.5 million people using TV subtitles (closed captioning), 6 million have a hearing disability.

Television and video
For live programs, the spoken words of the television program's soundtrack are transcribed by an operator using a stenotype or stenomask machine, whose phonetic output is instantly translated into text by a computer and displayed on the screen. This technique was developed in the 1970s as an initiative of the BBC's Ceefax teletext service. In collaboration with the BBC, a university student took on the research project of writing the first phonetics-to-text conversion program for this purpose. Automatic computer speech recognition now works well when trained to recognise a single voice, and so since 2003 the BBC has done live subtitling by having someone re-speak what is being broadcast.

In some cases the transcript is available beforehand and captions are simply displayed during the program after being edited. For programs that have a mix of pre-prepared and live content, such as news bulletins, a combination of the above techniques is used.

For prerecorded programs and home videos, audio is transcribed and captions are prepared, positioned, and timed in advance.

For all types of NTSC programming, captions are "encoded" into Line 21 of the vertical blanking interval – a part of the TV picture that sits just above the visible portion and is usually unseen. For ATSC (digital television) programming, three streams are encoded in the video: two are backward compatible Line 21 captions, and the third is a set of up to 63 additional caption streams encoded in EIA-708 format.

Captioning is transmitted and stored differently in PAL and SECAM countries, where teletext is used rather than Line 21, but the methods of preparation are similar. Note that, for home videotapes, a variation of the Line 21 system is used in PAL countries. Teletext captions can't be stored on a standard VHS tape (due to limited bandwidth), although they are available on S-VHS tapes.

For older televisions, a set-top box or other decoder is usually required. In the U.S., since the passage of the Television Decoder Circuitry Act, manufacturers of most television receivers sold there have been required to include closed caption decoding. High-definition TV sets, receivers, and tuner cards are also covered, though the technical specifications are different. (High-definition display screens, as opposed to high-definition TVs, may lack captioning.) Canada has no similar law, but receives the same sets as the U.S. in most cases.

There are three styles of Line 21 closed captioning:


 * Roll-up or scroll-up or scrolling: The words appear from left to right, up to one line at a time; when a line is filled, the whole line scrolls up to make way for a new line, and the line on top is erased. The captions usually appear at the bottom of the screen, but can be placed anywhere to avoid covering graphics or action. This method is used for live events, where a sequential word-by-word captioning process is needed.


 * Pop-on or pop-up or block: A caption appears anywhere on the screen as a whole, followed by another caption or no captions. This method is used for most pre-taped television and film programming.


 * Paint-on: The caption, whether it be a single word or a line, appears on the screen letter-by-letter from left to right, but ends up as a stationary block like pop-on captions. Rarely used; most often seen in very first captions when little time is available to read the caption or in "overlay" captions added to an existing caption.
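
The roll-up behaviour in the first item can be sketched as a small fixed-height buffer. This is a toy model rather than decoder logic: the class name `RollUpWindow`, the default of three rows, and the 32-character line width are illustrative assumptions, not values taken from the text above.

```python
from collections import deque


class RollUpWindow:
    """Toy model of a roll-up caption window (hypothetical, not decoder code)."""

    def __init__(self, rows=3, width=32):
        # rows and width are illustrative assumptions.
        self.width = width
        # Oldest (top) line first; maxlen makes the deque drop the top
        # line automatically when a new bottom line is appended.
        self.lines = deque([""], maxlen=rows)

    def add_word(self, word):
        """Append one word to the bottom line, rolling up when it is full."""
        current = self.lines[-1]
        candidate = word if not current else current + " " + word
        if len(candidate) <= self.width:
            self.lines[-1] = candidate
        else:
            # The bottom line is full: start a new line below it.
            self.lines.append(word)

    def visible(self):
        """Return the lines currently on screen, top to bottom."""
        return list(self.lines)
```

Feeding words one at a time mimics live captioning: each call to `add_word` either extends the bottom line or pushes the existing lines up by one, erasing the top line once the window is full.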

A single program may include scroll-up and pop-on captions (e.g., scroll-up for narration and pop-on for song lyrics). A musical note symbol (hash sign in UK and Australia) is used to indicate song lyrics or background music. Generally, lyrics are preceded and followed by music notes (or hash signs), while song titles are bracketed like a sound effect. Standards vary from country to country and company to company.

For live programs, some soap operas, and other shows captioned using scroll-up, Line 21 caption text includes the symbols '>>' to indicate a new speaker (the name of the new speaker sometimes appears as well), and '>>>' in news reports to identify a new story. In some cases, '>>' means one person is talking and '>>>' means two or more people are talking. Capitals are frequently used because many older home caption decoder fonts had no descenders for the lowercase letters g, j, p, q, and y, though virtually all modern TVs have caption character sets with descenders. Text can be italicized, among a few other style choices. Captions can be presented in different colors as well. Coloration is rarely used in North America, but can sometimes be seen on music videos on MTV or VH-1, and in the captioning's production credits. More often, coloration is used in the United Kingdom, Australia and New Zealand for speaker differentiation.
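
As an illustration of the '>>' and '>>>' convention, the sketch below splits a run of roll-up caption text into segments and labels each one. It assumes the more common reading described above ('>>' for a new speaker, '>>>' for a new story); the function name `split_segments` and the labels are hypothetical, and captioners who use the markers differently would need different labels.

```python
import re

# '>>>' is listed first so a story marker is never read as a speaker marker.
MARKER = re.compile(r"(>>>|>>)")


def split_segments(text):
    """Return (kind, text) pairs; kind is 'story', 'speaker', or 'continuation'."""
    segments = []
    kind = "continuation"  # text that appears before any marker
    for part in MARKER.split(text):
        if part == ">>>":
            kind = "story"
        elif part == ">>":
            kind = "speaker"
        else:
            body = part.strip()
            if body:
                segments.append((kind, body))
            kind = "continuation"
    return segments
```

For example, `split_segments(">>> TOP STORY TONIGHT. >> THANKS, JIM.")` yields one segment labelled `'story'` followed by one labelled `'speaker'`.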

There were many shortcomings in the original Line 21 specification from a typographic standpoint, since, for example, it lacked many of the characters required for captioning in languages other than English. Since that time, the core Line 21 character set has been expanded to include quite a few more characters, handling most requirements for languages common in North and South America such as French, Spanish, and Portuguese, though those extended characters are not required in all decoders and are thus unreliable in everyday use. The problem has been almost eliminated with the EIA-708 standard for digital television, which boasts a far more comprehensive character set.

Captions are often edited to make them easier to read and to reduce the amount of text displayed onscreen. This editing can range from very minor, with only a few occasional unimportant lines omitted, to severe, where virtually every line spoken by the actors is condensed. The measure used to guide this editing is words per minute, commonly varying from 180 to 300, depending on the type of program. Offensive words are also captioned, but if the program is censored for TV broadcast, the broadcaster might not have arranged for the captioning to be edited or censored as well. A television set-top box is available to parents who wish to censor offensive language in programs: the video signal is fed into the box, and if it detects an offensive word in the captioning, the audio signal is bleeped or muted for that period of time.
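
The words-per-minute measure is simple arithmetic: the number of words in a caption divided by the time it stays on screen, scaled to a minute. The sketch below assumes each caption is a hypothetical `(text, start_seconds, end_seconds)` tuple and counts whitespace-separated tokens as words; the function names and the default limit of 180 (the low end of the range quoted above) are illustrative choices.

```python
def words_per_minute(text, start_s, end_s):
    """Reading rate of one caption, in words per minute."""
    duration = end_s - start_s
    if duration <= 0:
        raise ValueError("caption must have a positive duration")
    return len(text.split()) * 60.0 / duration


def too_fast(captions, limit_wpm=180):
    """Return the captions whose reading rate exceeds limit_wpm."""
    return [c for c in captions if words_per_minute(*c) > limit_wpm]
```

A six-word caption shown for two seconds works out to 6 × 60 / 2 = 180 words per minute, so an editor working to a 180 wpm limit would leave it alone but condense a seven-word caption of the same duration.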

Caption channels
The Line 21 data stream can consist of data from several data channels multiplexed together. Field 1 has four data channels: two Captions (CC1, CC2) and two Text (T1, T2). Field 2 has five additional data channels: two Captions (CC3, CC4), two Text (T3, T4), and Extended Data Services (XDS). The XDS data structure is defined in EIA-608.

As CC1 and CC2 share bandwidth, if there is a lot of data in CC1, there will be little room for CC2 data. Similarly, CC3 and CC4 share the second field of Line 21. Since some early caption decoders supported only CC1 and CC2, captions in a second language were often placed in CC2. This led to bandwidth problems, however, and the current FCC recommendation is that bilingual programming should have the second caption language in CC3.
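
The channel layout above can be written down as a small lookup table, which makes the reasoning behind the CC3 recommendation easy to check. This is only a sketch: the names `LINE21_CHANNELS`, `field_of`, and `share_bandwidth` are hypothetical, and treating every channel in a field as drawing on one shared budget is a simplification of how the multiplexing works.

```python
# Line 21 data channels grouped by the field that carries them.
LINE21_CHANNELS = {
    1: ["CC1", "CC2", "T1", "T2"],
    2: ["CC3", "CC4", "T3", "T4", "XDS"],
}


def field_of(channel):
    """Return the field (1 or 2) that carries the given channel."""
    for field, channels in LINE21_CHANNELS.items():
        if channel in channels:
            return field
    raise KeyError(channel)


def share_bandwidth(a, b):
    """True if two channels are multiplexed into the same field."""
    return field_of(a) == field_of(b)
```

Here `share_bandwidth("CC1", "CC2")` is true while `share_bandwidth("CC1", "CC3")` is false, which is why moving the second language to CC3 keeps it from competing with the primary captions in CC1.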

DVD
NTSC DVDs may carry closed captions in the Line 21 format. They are sent to the TV by the player and can be displayed with a TV's built-in decoder or a set-top decoder as usual. Independent of Line 21, video DVDs may also carry captions as a bitmap overlay which can be turned on and off via the DVD player, just like subtitles. This type of captioning is usually carried in a subtitle track labeled either "English for the hearing impaired" or, more recently, "SDH" (Subtitled for the Deaf and Hard of hearing). On some DVDs, the Line 21 captions may contain the same text as the subtitles; on others, only the Line 21 captions include the additional non-speech information needed for deaf and hard of hearing viewers.

HD DVD and Blu-ray disc media cannot carry Line 21 closed captioning because of the design of the High-Definition Multimedia Interface (HDMI) specification, which was designed to replace older analog and digital standards such as VGA, S-Video, and DVI. Both Blu-ray disc and HD DVD can use either DVD bitmap subtitles (with extended definition) or "advanced subtitles" to carry SDH-type subtitling, the latter being an XML-based textual format which includes font, styling, and positioning information as well as a Unicode representation of the text. Advanced subtitling can also include additional media accessibility features such as "descriptive audio".

Movies
There are several competing technologies used to provide captioning for movies in theaters. Just as with television captioning, they fall into two broad categories: open and closed. The definition of "closed" captioning in this context is a bit different from television, as it refers to any technology that allows some of the viewers to use captions while others in the same theater at the same time do not see captions.

Open captioning in a theater can be accomplished through burned-in captions, projected bitmaps, or (rarely) a display located above or below the movie screen. Typically, this display is a large LED sign.

Probably the best-known closed captioning option for theaters is the Rear Window Captioning System from the National Center for Accessible Media. Upon entering the theater, viewers requiring captions are given a panel of flat translucent glass or plastic on a gooseneck stalk, which can be mounted in front of the viewer's seat. In the back of the theater is an LED display that shows the captions in mirror image. The panel reflects the captions for the viewer, but is nearly invisible to surrounding patrons. The panel can be positioned so that the viewer watches the movie through the panel and captions appear either on or near the movie image. A company called Cinematic Captioning Systems has a similar reflective system called Bounce Back.

DTS (Digital Theater Systems), a company known for its cinema surround sound systems, has a digital captioning device called the DTS-CSS, or Cinema Subtitling System. It combines a laser projector, which places the captioning (words, sounds) anywhere on the screen, with a thin playback device whose CD holds many languages.

Other closed captioning technologies for movies include hand-held displays similar to a PDA (personal digital assistant); eyeglasses fitted with a prism over one lens; and projected bitmap captions. The PDA and eyeglass systems use a wireless transmitter to send the captions to the display device.

Video games
Closed captioning of video games is becoming more common. One of the first video games to feature true closed captioning was Zork Grand Inquisitor in 1997. Many games since then have at least offered subtitles for spoken dialog during cutscenes, and many include significant in-game dialog and sound effects in the captions as well; for example, with subtitles turned on in the Metal Gear Solid series of stealth games, not only are subtitles available during cutscenes, but any dialog spoken during real-time gameplay will be captioned as well, allowing players who can't hear the dialog to know what enemy guards are saying and when the main character has been detected. Also, in the video game Half-Life 2, when closed captions are activated, dialogue and nearly all sound effects either made by the player or from other sources (e.g. gunfire, explosions) will be captioned.

Video games do not offer Line 21 captioning, which is decoded and displayed by the television itself. Instead, they use a built-in subtitle display, more akin to that of a DVD. The game systems themselves have no role in the captioning either: each game must have its subtitle display programmed individually.

Currently there is a big push from Reid Kimball, a game designer who is hearing impaired, to educate game developers about closed captioning for games. Kimball started the Games[CC] group to closed-caption games and to serve as a research and development team that aids the industry any way it can. He writes articles, designed the Dynamic Closed Captioning system, and speaks at developer conferences. The first closed captioning project from Games[CC], called Doom3[CC], was nominated for an award as Best Doom3 Mod of the Year for IGDA's Choice Awards 2006 show.

Theatre
While opera houses have used captioning for their productions since 1983, live theatre captioning has only recently begun appearing. Display techniques vary, with subtitles, surtitles and individual displays being used.

Telephones
Telephones are starting to apply closed captions for people who are deaf and hard-of-hearing.

Media monitoring services
In the United States especially, most media monitoring services capture and index closed captioning text from news and public affairs programs, allowing them to search the text for client references.

The use of closed captioning for television news monitoring was pioneered in 1993 by Tulsa-based NewsTrak of Oklahoma (later known as Broadcast News of Mid-America, acquired by video news release pioneer Medialink Worldwide Incorporated in 1997). US patent 7,009,657 describes a "method and system for the automatic collection and conditioning of closed caption text originating from multiple geographic locations" as used by news monitoring services.
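
The capture-and-index step used by monitoring services can be sketched as an inverted index from words to the programs and times at which they were captioned. This is a minimal illustration under stated assumptions, not a description of any particular service or of the patented system: the function names, the `(program, timestamp, caption_text)` input shape, and the single-word matching are all hypothetical.

```python
import re
from collections import defaultdict


def build_index(segments):
    """Map each lowercased word to the (program, timestamp) pairs where it appears.

    segments: iterable of (program, timestamp, caption_text) tuples.
    """
    index = defaultdict(list)
    for program, timestamp, text in segments:
        # set() avoids recording the same segment twice for a repeated word.
        for word in set(re.findall(r"[a-z0-9']+", text.lower())):
            index[word].append((program, timestamp))
    return index


def find_mentions(index, term):
    """Return the sorted (program, timestamp) pairs that mention a single-word term."""
    return sorted(index.get(term.lower(), []))
```

Because captions are frequently transmitted in capitals, the index lowercases everything before matching, so a search for a client name succeeds regardless of case. Multi-word client names would need phrase matching, which this sketch leaves out.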

Americas
The US ATSC HDTV system originally specified two different kinds of closed captioning datastream standards: the original (carried on Line 21) and a more modern version associated with MPEG.

As a result, perhaps 10% to 20% or more of US ATSC HDTV sets cannot decode the embedded closed captioning signal.

Neither the US FCC nor the Canadian CRTC has intervened by mandating that broadcasters either broadcast both datastream formats or broadcast exclusively in one format.

50 Hz zone
In the 50 Hz television zone, that is, wherever PAL and SECAM are broadcast (excluding the PAL-M system), captioning interoperability is not as big an issue with HDTV broadcasting.

The European teletext systems are the source of closed captioning signals, so when teletext is embedded into DVB-T or DVB-S (and possibly DVB-H), the closed captioning signal rides along.

As the whole of Europe has settled on perhaps two teletext standards, and as these standards share a common encapsulation mechanism, closed captioning interoperability problems are much less severe.

DTV standard captioning improvements
The EIA-708 specification provides for dramatically improved captioning:
 * An enhanced character set with more accented letters and non-English letters, and more special symbols
 * Viewer-adjustable text size, allowing individuals to adjust their TVs to display small, normal, or large captions
 * More text and background colors, including see-through backgrounds to optionally replace the big black block
 * More text styles, including edged or drop-shadowed text rather than the letters on a solid background
 * More text fonts, including monospaced and proportional spaced, serif and sans-serif, and some playful cursive fonts
 * Higher bandwidth, to allow more data per minute of video

History
The first use of closed captioning on American television was on March 16, 1980. Sears had developed and sold the Telecaption adaptor, a decoding unit that could be connected to a standard television set. According to the National Captioning Institute, the first programs seen with captioning that Sunday evening were the ABC Sunday Night Movie, Disney's Wonderful World on NBC, and Masterpiece Theatre on PBS. The captioned Disney feature, showing at 7:00 pm EST, was the film Son of Flubber, while the ABC movie at 9:00 pm EST was Semi-Tough.