The Rogue Captioner: 2015

Wednesday 11 February 2015

Failure to Communicate

So the Senate Committee Report into captioning reform in Australia has been squeezed out. To the surprise only of those who think the world is comprised of caring and fundamentally good people trying to do the right thing, the report is a snivelling apologia for the broadcast industry, the committee politely rolling over, licking their pointy boots, thanking them for its beatings (how else will it learn?) and asking polite permission to be allowed to please put the lotion on its skin.

Right away, Rupert.

It breezes right the fuck past all submissions from consumer groups. Weirdly it quotes them, provides no particular counter-arguments, then decrees whatever free ride the broadcasters were going to get anyway. I’m not going to go through it in detail, because honestly it’s too inane and that’s an exercise that would endumben us both. But for a sampler, the Minister (a man more renowned for Not Being Leader and for an admittedly fabulous leather jacket than for any particular intellect) concludes that:

I want to make it quite clear that broadcasting licensees will still be required to meet the same specified level of captioning for television programs to assist viewers with hearing impairment.

Now I don’t want to give you a sense of déjà vu, learned reader, but I went into some detail about how that was not actually, in the tangible world to which we are confined, the case. As a random sample, the ability to average the captioning output across linked sports channels (a measure which in overseas markets has led to reduced output), a get-out-of-jail clause for technical faults (news flash – the regulated free market can incentivise really clever measures to reduce those technical faults), the expanded exemptions for new channels (such a missed opportunity to build captions into the foundations) and the reduced standards for live captions (keep me honest, ffs!), ALL represent a drop in the Minister’s “specified level of captioning”.

Ah, but in your coat pocket...your card!

…But I don’t want to retread old ground. Indeed, here would be the point, in a well-conducted public debate, where I look at the rebuttals to points such as mine*, develop my defences for that which remains defensible, and abandon that which is not. But there’s very little scope for it. The views of the deaf fell on deaf ears, while erstwhile bastions of fairness among the ranks of broadcasters, like ABC and SBS, proved fair-weather friends. Arguments such as mine were raised in the same way that a budgie smuggled into a coal mine squawks about climate change – unheard, disregarded, directly inactive and dead on arrival. They were not engaged with, but instead placed side-by-side with the submissions of the Grown-Ups and talked past, as if tacitly admonished to run along and play outside.

* In fact, not only were my arguments mirrored in many submissions to the committee, my own words were quoted as part of one group’s submission, so the debate is discombobulatingly direct.

If there is one new argument to be made, it is this. The requirement for consumers to report, rather than broadcasters to annually self-report, was justified on the basis that between 6:00am and midnight, primary channels require 100% captioning, so it’s “easy” for consumers to see when it’s erroneously missing. That argument holds some water. Not enough, but some. But that exact reasoning is also a really good argument against averaging captioning quotas across sport channels. The committee effectively acknowledges that reliable captioning quotas are the easiest kind for consumers to help enforce. Fair enough. Yet the committee then specifically endorses making it harder to report captioning problems on sport channels, because there that reliability gives way to “flexibility” (to offer less, of course – Murdoch’s sport networks already have the flexibility to exceed requirements).
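To put rough numbers on the averaging problem, here’s a toy calculation in Python. Every figure in it is hypothetical (the bill’s actual quota levels aren’t quoted here), and `meets_averaged_quota` is just an illustrative helper, not anything from the legislation.

```python
def meets_averaged_quota(captioned_hours, broadcast_hours, target):
    """Check a quota averaged across a group of linked channels.

    captioned_hours / broadcast_hours: per-channel hours (hypothetical figures).
    target: required captioned fraction for the group as a whole, e.g. 0.5.
    Returns (passes, per_channel_rates).
    """
    rates = [c / b for c, b in zip(captioned_hours, broadcast_hours)]
    group_rate = sum(captioned_hours) / sum(broadcast_hours)
    return group_rate >= target, rates

# Hypothetical: channel A fully captioned, channel B barely captioned at all.
passes, rates = meets_averaged_quota([18, 2], [18, 18], target=0.5)
print(passes, [round(r, 2) for r in rates])
```

With those made-up numbers the group clears a 50% averaged target even though channel B is only about 11% captioned, and a viewer watching channel B has no way of knowing, from the couch, whether the group as a whole is in breach.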

He’s a flexible guy.

We also apparently have “no evidence” that viewers will be less effective at enforcing compliance than comprehensive industry record-keeping. Whereas, you see, we have lots of evidence from broadcasters that it will reduce the regulatory burden. That sound you can hear is a captioner screaming something about unequal consultation, in the darkness, after finding himself tragically unable to shake shake shake it off.

Must highlight this little gem:

The committee agrees that the breadth of consultation in relation to this bill has been insufficient. As a consequence, the effect of some proposed amendments appear to have been misunderstood…

Catch that? Consultation was too short. Not, however, because of any shortcoming in gathering the views of stakeholders and forming a broad consensus model for the legislation. No, “consultation” here means unilaterally talking, and too little consultation means they haven’t browbeaten people sufficiently into submission. For the record, my colleagues, my viewers and I understand. We do. We can’t help but notice it’s just a bit shithouse, is all.

It should be noted the committee represented the depressing bipartisanship for which our Senate is of course so celebrated. Which is to say, weak-as-piss Labor rubber stamping. The committee consists of three Coalition Senators, two Labor and one Green, along with two further Greens listed as “participating members for this inquiry”. The Greens issued a strong dissenting report, calling for a host of sensible amendments and rejecting large parts of what must, therefore, have otherwise been a consensus view (between Rupert Murdoch and his navel).

Bipartisanship.

So what do we do now? Well, unless I’m misreading it, the marginally amended bill still needs to pass the Senate, so if you hear of any Labor Senators ambling in the direction of Damascus, consider rigging up some extra-flashy pyrotechnics and a generous supply of peyote. And if and when we lose, keep on fighting in the form of complaints. Whether you’re a viewer or even a captioner (why not?), complain officially about every error you see, every loss, every time we cover up important information, every program or channel or instant without comprehensive captions. If it falls to you to enforce compliance, do it mercilessly. Don’t ask yourself whether it was a reasonable live error, whether it was a show which maybe didn’t require captions, if it was a technical hiccup at the network. Register every complaint. Don’t feel sorry for captioners. Keeping us honest makes us do better work, and more importantly makes our clients and employers provide us the resources to do better work. We’re on your side, so punish us good.

Disclaimer.

Monday 9 February 2015

Miss Happens

You know, there has been some weird reporting about live captions lately. The regulator Ofcom in the UK are being reported in various tabloids as suggesting that lots of captioning errors make captions harder to understand. In other news, the sky is suspected of being blue, the Pope is showing signs of leaning towards Catholicism and not all bears fully utilise available public restrooms. I’m willing to assume the tabloids buried the lede somewhere and Ofcom went into more detail about error rates, delays, losses and such, which are the nuts and bolts of caption quality, but the tabloid reports seem to leave it at errors=bad, which is a no-brainer. The Mirror at least takes a whimsical approach to it, using errors=bad as a springboard to segue into a compiled listicle of funny mishaps. But the Sunday Express went bewilderingly fire-and-brimstone with it, using what appeared to be around four errors (ironically enough during a discussion of accessibility services) as a cudgel to attack the BBC, without the slightest discussion of what live captioning involves, or what is an acceptable error rate. They suggest that “Western Mark there is an answer” emerged where the caption should have just read “enough”, which I find pretty implausible because I recognise some of the brushwork. “Western Mark” is definitely an unlucky side-effect of that captioner, whoever they may be, saying “question mark”. I’m guessing the “an answer” part was a mishearing, whether human or software, of “enough”. As for “there is”, two possibilities are equally likely. Either they said it, and the Sunday Express had an uncharacteristic lapse in their usually stellar journalistic thoroughness, or they didn’t say it and it was added in an attempt (futile though it may have been) to trigger their voice software’s syntactical context-recognition. “There is enough…” is less likely to be misheard than simply “Enough”. Anyway, as I’ve said before, we’re in the business of the inexact.
An error every minute is great, an error every two minutes makes you that Jedi-on-amphetamines captioner who we all hate a little bit, and if they think three or four errors is newsworthy, then boy howdy do I have some scoops for them.
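For a sense of scale, errors per minute convert fairly directly into an accuracy percentage. This Python sketch assumes a speaking rate of 150 words per minute (an assumption for illustration; real rates vary a lot between a news bulletin and a sports commentary) and treats every error as exactly one wrong word, which simplifies how accuracy is actually scored.

```python
def accuracy_from_error_rate(errors_per_minute, words_per_minute=150):
    """Word accuracy (%) if each error affects exactly one word.

    words_per_minute=150 is an assumed speaking rate, not a standard.
    """
    return 100 * (1 - errors_per_minute / words_per_minute)

def errors_per_minute_from_accuracy(accuracy_percent, words_per_minute=150):
    """Inverse: how many one-word errors per minute a given accuracy implies."""
    return (1 - accuracy_percent / 100) * words_per_minute

for epm in (1, 0.5):
    print(f"{epm} error(s)/min -> {accuracy_from_error_rate(epm):.2f}% accurate")
for acc in (97.5, 99):
    print(f"{acc}% accurate -> {errors_per_minute_from_accuracy(acc):.2f} errors/min")
```

Under those assumptions, an error a minute works out to about 99.3% accuracy, and the 97.5 to 99 percent range I quote for live captions in my January post comes to roughly 1.5 to 3.75 errors a minute. So three or four errors across a whole segment isn’t much of a scoop.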

On with the mishaps! In the lead-up to BAFTAs and Oscars and such, there’s been some tricky film chatter to caption. I know it’s a cliché to joke about his bafflingly homophonous name, but there was a strange, naughty, zeitgeisty joy in realising “an addict Cumberbatch” had gone to air. An award recipient wanted to “thank these gorges, gorgeous women,” because cliff faces are the true unsung heroes of the film industry. The Clint Eastwood classic Letters From Iwo Jima has a certain stoic profundity. “Letters from your Gmail”… does not. Finally, “a noir writer” emerged as “on a wire writer”, which has a nice kind of poetry.

There’s always a little bit of gambling when place names come up unexpectedly. Dragon has a large native database of place names, and we’ve all added reasonably comprehensive wordlists for the places our captions go to air. Still, there are always a few in need of refinement. Thus we had “pill bra”, which sounds less like an Australian mining region and more like a place you stash your ecstasy so the bouncer won’t find it. Similarly I’m not sure “Albury wood donger” is a real place. Nor are countries spared, with Guatemala finding its way to air as “quite a Mahler”.

Quite.

Politics also remains a source of endless hilarity, and not just in terms of entertaining “captain’s calls”. Who knows what permutation may govern in coalition in the next UK parliament, but I hope no-one sides with the “glib dams” – their arguments, while pithy, don’t hold water. The NHS has been a hot political topic, but while increasing funding for NHS Blood and Transplant sounds like sound policy, “NHS Lard and Transplant” sounds like a Menulog regret waiting to happen. Voting “by conscience” seems laudable; voting “icon sheds” seems incomprehensible. One politician speaking “in relation to debt” spoke instead about “immolation debt”. And here’s me thinking setting fire to stuff isn’t all that pricey. And politics intruded unwanted when a visit to a toy fair became a visit to a “Tory affair”. Political adversaries became political “at the ferries”, which sounds altogether more pleasant. A colleague’s unfortunate pause made Indonesia’s leader “President Joker”. I guess we all appreciate the success of three-word slogans in politics, like “Yes we can” and “Stop the boats,” so I feel “Kill the Batman” is in with a shot. And I had Syria’s leader down as “resident aside”, which I guess given the displaced people in that area isn’t so much funny as not funny.

The owls are not what they seem.

Nature documentaries – still fun. Usually there’s time to nix these before they go to air, as they’re captioned offline, but they give me a guffaw in my booth. So only I got to see that instead of “our aquatic friends,” the humble fish became “power quantic friends”. And an expert, when asked how fast crocodiles grow, apparently replied “Well, it depends what they’re reading.” Finnegans Wake is, I grant, a challenge to digest. “Wobbegong” is a fun word at the best of times, but Dragon decided “wobbly goal” was a better fit.

Don’t let it get the best of you.

Captioning church remains an error-spotter’s delight. “Liquid myrrh”, admittedly an obscure phrase, came out as “liquid murder”, which would have made for an undeniably more badass (if less pretty smelling) Messiah. And while “coheirs to eternal life” has a stronger scriptural foundation, “co-eds to eternal life” sounds like more fun.

Forgive me.

The gravitas and dignity of history makes captioning errors in historical features all the more starkly silly. Thus I could only giggle as one colleague marked 10 years since a “50-foot salami” buffeted South-East Asia. And again when combat veterans were said to be suffering from “post-dramatic stress” (I guess they method). And I could only gasp, then giggle as another colleague, in a feature on Auschwitz survivors, made the classic “I scream”/“Ice-cream” switcheroo.

Finally, a couple of random mishaps from lifestyle shows. Recreational archery is made decidedly more challenging when the apparatus becomes “bow and error”. The trade-off between cardiovascular fitness and fun seemed reasonable when “aerobic routine” became “erotic routine”. The tips on where an eligible young gentleman can find the best “rattler pad” may come in handy if I want to have my snake bros over to play X-Box. A celebrity was described as being “no shrieking violet,” which is a good thing as that sounds abjectly terrifying. A sport analyst saying “I sense optimism” unintentionally reaffirmed his commitment to Sparkle Motion when it came out as “ice dance optimism”. “Be with you in a sec” and “be with you in a sack” are two very, very different things. And an interactive segment opening with “we’re back to hear all your thoughts” emerged as “wear bacterial thoughts”.

And that, tabloids, is how you make some gosh darn captioning errors. Just a couple of last things before I go. I mentioned the proposed changes to captioning regulation. Well now there is an opportunity to comment on Australia’s captioning regulations. Go do that! And I’ve talked about how much more smoothly the captioning process goes when you’re part of the process rather than an afterthought. This was recently explored in much more detail in a post on iheartsubtitles. And finally, another interesting post on the educational benefits of closed captions. They… uh, exist.


Thursday 8 January 2015

Quality and Accuracy Part Three: Style and the Great Offline Caper

Happy New Year, captioning enthusiasts! We swan-dive into the thick, opaque molasses of 2015 in interesting captioning times. Australia may or may not have exorcised the spectre of deregulation and quality-cutting. Industry-wide security is being tweaked against the threat of hacking (as many have mentioned, captions comprise valuable metadata, and a corollary is that it’s data which unsavoury types may covet), albeit at the cost of some convenience and productivity for captioners. And I discovered on New Year’s Day that captioning live fireworks instead of news headlines at the top of each hour is really quite fun (even if “auld” isn’t the most Dragon-friendly word). So I basically hope 2015 will see continued public support for high-quality captioning, smooth and user-friendly security protocols, and colourful explosions just about every day.

If fireworks persist, see your doctor.

Another development over the past few months has been a diversifying of skills for your friendly neighbourhood Rogue Captioner. I mentioned in the very beginning that there are two basic strands of TV captioning – live and offline. Live captioning involves either stenography or respeaking in real time as things go to air like an uncommonly literate hamster, sometimes combined with cueing out prescripted elements. While I remain primarily a live captioner, I’ve been gradually learning most of the steps in the shadowy world of offline captioning, and filling in the sometimes-unpredictable gaps between manic live times with the more methodical offline work. I thought I’d share a little about how it’s done. As it makes sense here to go into style and standards, this post also forms the much-belated third part to my “quality and accuracy” series. The other parts covered losses and errors.

You may be surprised by how much more time goes into offline than live captioning. Live captioners typically work at a ratio of around 5:2 (five captioner-hours for every two hours of content), since they require some prep beforehand (and of course a sandwich after), and then usually share the load 50-50 with a co-pilot. So fully captioning a two-hour segment requires two captioners to each prep for roughly half an hour, then alternate 15-minute or 30-minute slots on air, for a total of five captioner-hours. But offline is a really different, and much more chronovorous, ballgame. It’s rarely less than 10:1 – that’s 10 hours of work for one hour of captioned content – and even that kind of efficiency may only happen if you’re an unusually dextrous millipede with opposable thumbs.
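Here’s the arithmetic behind those ratios as a small Python sketch. The five-captioner-hour figure adds up if both captioners count for the full two hours (one on air, one co-piloting) on top of their half-hour of prep each; that reading, and both function names, are assumptions for illustration.

```python
def live_captioner_hours(program_hours, captioners=2, prep_hours_each=0.5):
    """Live: every captioner preps, then stays on shift for the whole program,
    taking turns on air. Assumes the co-pilot's off-air time still counts."""
    return captioners * (prep_hours_each + program_hours)

def offline_captioner_hours(program_hours, ratio=10):
    """Offline: roughly `ratio` hours of work per hour of content (10:1 or worse)."""
    return program_hours * ratio

live = live_captioner_hours(2)        # 2 * (0.5 + 2) = 5.0 hours, i.e. 5:2
offline = offline_captioner_hours(2)  # 20 hours for the same two hours of content
print(live, offline, offline / live)
```

So at those rates, offline takes about four times the labour of live for the same content, before any millipedes get involved.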

So where does the extra time come from? Well, the glib answer is ~~offliners are lazy~~ perfection and timing. Live captions slither and snake their way onto the air, a word at a time, with around a five-second delay and an accuracy rate of between 97.5 and 99 percent. Offline captions appear in carefully sculpted blocks, adhering to a long list of style guidelines, exactly as the speaker is talking. This post will take you through the process of getting a captioned program ready for broadcast, up to the point of a final edit (that part isn’t yet among my responsibilities, so there be sea monsters), and let you in on the sorts of things we need to keep in mind. Two main processes need to happen – first scripting, and then file/fix-up – with some inevitable overlap between them.

So we first receive an episode, in the form of an MPEG file, from a broadcaster. Captioners assign themselves all or part of the runtime of the episode, depending on how much time and caffeine they have available, then get to work creating a script. The first step in scripting is to import the file into our offline captioning software. This software is designed to stop, collaborate and listen with both Dragon, the speech-recognition software for respeakers, and the shorthand software used by stenographers. It combines video-navigation functions like play, stop, slow-motion, or (very usefully) jump-back-one-second-and-play, aka the “what was that?” button, with captioning functions like colour change, positioning on screen, and adding, deleting and combining captions. It adds up to an absolutely dizzying array of keyboard shortcuts, and watching someone really experienced use it can be quite baffling. I ain’t there yet, so the shortcut I’ve most mastered is Ctrl-Z, to undo. Before you get started, you also need to make sure the timecode on the video matches that in the caption file, which can be thought of as the captioning equivalent of the clapper used to synchronise audio and visuals in film.
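The timecode check boils down to converting two HH:MM:SS:FF stamps to frame counts and comparing them. This Python sketch assumes 25 frames per second (the PAL-family rate used in Australian broadcast, though check your file) and non-drop-frame timecode; the function names and example start times are hypothetical, and it’s a sketch of the idea rather than our software’s code.

```python
FPS = 25  # assumed frame rate; non-drop-frame timecode only

def timecode_to_frames(tc, fps=FPS):
    """Convert 'HH:MM:SS:FF' to an absolute frame count."""
    hours, minutes, seconds, frames = (int(part) for part in tc.split(":"))
    if frames >= fps:
        raise ValueError(f"frame field {frames} out of range for {fps} fps")
    return ((hours * 60 + minutes) * 60 + seconds) * fps + frames

def frames_to_timecode(n, fps=FPS):
    """Inverse of timecode_to_frames for non-negative frame counts."""
    seconds, frames = divmod(n, fps)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}:{frames:02d}"

def timecode_offset(video_start, caption_start, fps=FPS):
    """Frames to add to caption times so they line up with the video."""
    return timecode_to_frames(video_start, fps) - timecode_to_frames(caption_start, fps)

# Hypothetical: the video starts at 10:00:00:00 but the caption file assumes 00:00:00:00.
offset = timecode_offset("10:00:00:00", "00:00:00:00")
print(offset, frames_to_timecode(offset))
```

A non-zero offset is the clapper not lining up: every caption would land that many frames early or late until it’s corrected.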

The software then takes a moment to create some invaluable metadata (I barely knew her data!) which maps out the audio track, calculating the “shape” of the sound in a way which will help it to guess where each caption should fall. Interestingly though, it also maps out the visuals, marking out where all the shot changes fall. When I discuss the second process, file/fix-up, that will come in handy.
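How the software actually spots shot changes is its own business, but one simple, common approach is to flag a cut wherever consecutive frames differ sharply. This Python sketch treats each frame as a flat list of greyscale pixel values and uses a hypothetical threshold; it illustrates the general idea, not our software’s method.

```python
def mean_abs_diff(frame_a, frame_b):
    """Average absolute difference between two equal-sized greyscale frames."""
    return sum(abs(a - b) for a, b in zip(frame_a, frame_b)) / len(frame_a)

def detect_shot_changes(frames, threshold=50.0):
    """Return indices of frames that start a new shot.

    frames: list of frames, each a flat list of 0-255 pixel values.
    threshold: hypothetical cut-off; real detectors tune this (or use histograms).
    """
    cuts = []
    for i in range(1, len(frames)):
        if mean_abs_diff(frames[i - 1], frames[i]) > threshold:
            cuts.append(i)
    return cuts

# Toy clip: three dark frames, then a bright shot with a slight flicker within it.
clip = [[10] * 4, [12] * 4, [11] * 4, [200] * 4, [205] * 4]
print(detect_shot_changes(clip))  # -> [3]
```

Divide those frame indices by the frame rate and you have shot-change timestamps (frame 3 at an assumed 25 fps is 0.12 seconds in), which is the form the timing work later needs.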

Metadata: Nothing Whatsoever to do with Envelopes.

So now the main work of scripting can begin (at the very beginning, which Julie Andrews tells me is a bonza place to kick off). On the first play-through of the file, or “first pass”, we respeak it in much the same way we would for live content, with a few differences. Firstly, since we can pause and go back, there is no sense in paraphrasing to avoid words we don’t know or can’t catch, or to get around cross-talk (characters speaking over each other) or fast dialogue. The first pass can be far from perfect, but it must at least be reasonably complete. While in live captioning it sometimes makes sense to skim or just convey the gist, offline has no place for that. Secondly, the first pass is where we begin to create the timing. We do this by setting markers (more keyboard shortcuts) where a conversation or section of narration or whatever begins and ends. The software then gets fancy. It guesses the breakdown of the captions, based on colour changes, punctuation and a two-line limit, then looks at the shape of the audio from that metadata I mentioned, and roughly matches up each caption to that like an audiovisual OkCupid. So let’s say I’m captioning the scene in The Avengers where Steve Rogers and Tony Stark face off.

I would put a section opener before “Big man in a suit of armour. Take that off, what are you?” and a section closer after “Genius, billionaire, playboy, philanthropist.” The software would recognise three sentences, each comfortably within two lines in length, and would split it accordingly into three captions. Then it would read the audio track metadata and see three little corresponding spikes at frequencies consistent with human speech. I’ve told it where the first caption begins and the third one ends, so it should make an educated guess that the second caption begins after Steve pauses, and the third begins where the voice soundwave shifts to Tony’s taunting tone. Those sentences were all similar in length, but if they vary, the software can include sentence length in its calculations. It often gets it wrong (single words which are held a bit long like “Stella!”, rapid-fire sentences, and lyrics cause particular problems), but it gives us something to work with when I eventually come back and fix the timing.
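Here’s a crude Python sketch of the sentence-length half of that guess: given the opener and closer times I’ve marked, share the span out across the captions in proportion to their length. The real software also reads the audio shape, which this ignores entirely, and the marker times and function name are made up.

```python
def distribute_times(captions, start, end):
    """Split [start, end] seconds across captions in proportion to character count.

    A crude stand-in for the software's guess when it only has the section
    markers and sentence lengths to go on (no audio analysis here).
    """
    total_chars = sum(len(c) for c in captions)
    timed = []
    cursor = start
    for i, text in enumerate(captions):
        share = (end - start) * len(text) / total_chars
        # Pin the last caption's end to the closer so rounding can't drift.
        caption_end = end if i == len(captions) - 1 else cursor + share
        timed.append((round(cursor, 2), round(caption_end, 2), text))
        cursor = caption_end
    return timed

scene = [
    "Big man in a suit of armour.",
    "Take that off, what are you?",
    "Genius, billionaire, playboy, philanthropist.",
]
for caption in distribute_times(scene, start=12.0, end=20.0):  # hypothetical marker times
    print(caption)
```

The long third caption gets the biggest slice; the audio-shape part of the real calculation is what would nudge the second caption to start where Steve actually pauses.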

So after the first pass we have a rough script, roughly timed. The aim of the second pass is to get it word-perfect. We watch through again, pausing to fix any Dragon errors, verify any proper nouns, and standardise spelling to our style guides. For Dragon errors, there’s a handy keyboard shortcut which cycles through homophones of a selected word, so you can quickly turn nay into neigh if the politician voting against the motion turns out to be Mr Ed. For proper noun verification we can use the credits, IMDb, or else my company maintains an enviable database filled with soap opera family trees, the baffling names of reality TV contestants who Didn’t Come Here To Make Friends, and street directories for places that never were. I won’t subject you to all of our spelling standards (with some exceptions, reciting the dictionary isn’t the best way to dazzle readers), but one of my favourite documents is our official spelling list of non-verbal sounds. So “eugh” expresses disgust, but “ew” is used to “express disgust, Valley Girl-style”. “Ah” always expresses discovery and “uh” uncertainty (except where it’s within “uh-oh,” “uh-uh” or “uh-huh”), even though what you actually hear may sometimes be the other way around. “Oh” is surprise or an interjection, but “Ohh” is emotional pain (“O” sometimes comes up as a religious invocation, but usually needs specific verification). If it’s quizzical or contemplative it’s “hmm”, but if it conveys agreement or pleasure it’s “mmm”.
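The homophone shortcut is essentially a lookup plus a cycle. Here’s a toy Python version with a tiny hand-built table; the real tool’s word lists are far bigger, and the `HOMOPHONES` groups and function name are illustrative assumptions.

```python
HOMOPHONES = [
    ["nay", "neigh"],
    ["there", "their", "they're"],
    ["to", "too", "two"],
]

# Map each word to its group so a lookup is a single dictionary hit.
GROUP_OF = {word: group for group in HOMOPHONES for word in group}

def next_homophone(word):
    """Return the next spelling in the word's homophone group, cycling round.

    Words with no known homophones come back unchanged. Only a capitalised
    first letter is preserved.
    """
    group = GROUP_OF.get(word.lower())
    if group is None:
        return word
    candidate = group[(group.index(word.lower()) + 1) % len(group)]
    return candidate.capitalize() if word[:1].isupper() else candidate

print(next_homophone("nay"), next_homophone("Neigh"), next_homophone("two"))
```

Pressing the shortcut repeatedly just calls this again on the result, so a three-way group like there/their/they’re comes back round to where it started.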

Another acceptable usage.

We also resplit captions at this point to be as readable as ephemeral text on a glowy rectangle can be. Here there can be some trade-offs. We try and keep sentences, or clauses, or concepts, or individual speakers, together. We try not to end either a line or a caption with a preposition (of, for, under…), a conjunction (and, but, so…), an article (the, a, an) or a verb – anything, basically, which belongs with the word that follows. We try not to have a book or movie title go over a line or caption break. We don’t use semicolons as they’re difficult to read without the ability to glance back over the first part. We avoid colons in most cases as when captioning they specifically mean “read the screen”. So if a phone number is printed onscreen, we might caption “For a free trial appendectomy, call:”. And more generally we err on the side of shorter sentences as they’re more readable onscreen.

So once we finish the second pass and a quick spell-check, that wraps up the scripting phase. Next comes a grab bag of chores under the heading “file/fix-up”. Now that we have a word-perfect script, the next big task is timing. A handy keyboard shortcut which moves the video to the beginning of the caption you’re editing becomes your friend at this point. Hopefully many of the captions will be sitting roughly where they need to be, but we go through and meticulously adjust the start and end times of each caption to correspond with when the speaker is doing their speak thing. There are a few exceptions, though. Captions should be no shorter than one second, even if the utterance is, because while a very short caption doesn’t take long to read, it might take a moment to notice. A caption can also linger after the speaker stops if its reading speed would otherwise exceed 300 words per minute. If a pause is very short, like on a dachshund, we don’t put a gap between captions. It looks more polished, and it also means that when the captions do cease, the viewer will subconsciously know the conversation has paused and they can safely look around the rest of the mise-en-scène without having to be immediately yanked right back to the captions. For similar reasons, we try and align the beginning and end of captions with any relevant shot changes, even if the speaker begins talking slightly before or after the cut. And they often do – there’s a film editing technique called a “sound bridge” which involves using sound to smooth over a visual cut. Sound bridges can also mimic the way our senses work – we begin hearing a sound and then look up. But a cut involves a whole new slab of (visual) information to take in. If a caption comes just before, the viewer might be intently reading it and miss something visual. If it comes just after, they might be taking in the visuals and run out of time to read the caption. If it’s simultaneous, it maximises the time to take both in.
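The duration rules can be sketched in Python as a clean-up pass over (start, end, text) captions in seconds. The one-second minimum and 300-words-per-minute ceiling are the figures from the timing paragraph; what counts as a “very short” pause isn’t pinned down, so the 0.5-second `SHORT_GAP` is a hypothetical placeholder, as is the function name.

```python
MIN_DURATION = 1.0  # seconds: no caption shorter than this
MAX_WPM = 300       # reading-speed ceiling
SHORT_GAP = 0.5     # hypothetical: gaps under this get closed up

def apply_timing_rules(captions):
    """captions: list of (start, end, text) tuples in seconds, in order.

    1. Extend any caption shorter than MIN_DURATION.
    2. Extend any caption whose reading speed would exceed MAX_WPM.
    3. Close short gaps by running the caption up to the next start.
    An extension that would overlap the next caption is trimmed back to its start.
    """
    fixed = []
    for i, (start, end, text) in enumerate(captions):
        words = len(text.split())
        needed = max(MIN_DURATION, words * 60 / MAX_WPM)
        end = max(end, start + needed)
        if i + 1 < len(captions):
            next_start = captions[i + 1][0]
            if next_start - end < SHORT_GAP:
                end = next_start  # close the gap (or trim an overrun)
        fixed.append((start, round(end, 2), text))
    return fixed

caps = [
    (0.0, 0.4, "Stella!"),                        # too short: stretched, then gap closed
    (1.2, 2.0, "Big man in a suit of armour."),  # 7 words in 0.8s is 525 wpm
    (5.0, 7.0, "Take that off, what are you?"),
]
for c in apply_timing_rules(caps):
    print(c)
```

The first caption ends up running straight into the second, and the second is stretched to 1.4 seconds so it reads at 300 words per minute.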

So back to the above Avengers clip, let’s look at the timing considerations. No-one is talking fast enough to present serious problems with reading speed, and not many of the pauses would be long enough to justify a gap in the captions. Tony’s “Why shouldn’t the guy let off a little steam?” gives a nice example of a sound bridge as the cut from a long shot to a mid shot happens just before he finishes talking. So we’d probably clear it right on the cut, which frees up the viewer to take in the visual tension between Steve and Tony. Whereas for Tony’s last close-up, he starts saying “Genius, billionaire, playboy, philanthropist” just after the cut, but close enough that you’d probably start the caption on the cut. As an added bonus, this makes it easy to see who is talking, as it ties the shot and the caption together, like a comic panel and a speech bubble.
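Snapping to cuts can be sketched in Python like this: if a caption boundary falls within some tolerance of a shot change, move it onto the cut. The tolerance isn’t specified anywhere here, so the half-second `SNAP_TOLERANCE` is hypothetical, and so are the cut and caption times in the example.

```python
SNAP_TOLERANCE = 0.5  # hypothetical: how far (in seconds) a boundary may move to meet a cut

def snap(time, shot_changes, tolerance=SNAP_TOLERANCE):
    """Move `time` onto the nearest shot change if one is within tolerance."""
    nearest = min(shot_changes, key=lambda cut: abs(cut - time), default=None)
    if nearest is not None and abs(nearest - time) <= tolerance:
        return nearest
    return time

def snap_captions(captions, shot_changes, tolerance=SNAP_TOLERANCE):
    """Snap both the start and end of each (start, end, text) caption to nearby cuts."""
    snapped = []
    for start, end, text in captions:
        new_start = snap(start, shot_changes, tolerance)
        new_end = snap(end, shot_changes, tolerance)
        if new_end <= new_start:  # never let snapping collapse a caption
            new_start, new_end = start, end
        snapped.append((new_start, new_end, text))
    return snapped

cuts = [4.0, 9.6]  # hypothetical shot-change times in seconds
captions = [
    (2.1, 4.3, "Why shouldn't the guy let off a little steam?"),   # speech runs past the cut
    (9.8, 12.0, "Genius, billionaire, playboy, philanthropist."),  # starts just after a cut
]
print(snap_captions(captions, cuts))
```

The first caption clears on the cut at 4.0 even though Tony is still talking, and the second starts on the cut at 9.6, tying the caption to the close-up.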

We also do caption positioning at this point. I’ve mentioned the principles of this before. The main things are avoiding speakers’ mouths and important visual information. By default we hug line 20 at the bottom of the screen, raising over any supers when necessary. Interestingly, we also have to raise for 10 seconds after ad breaks in case network promos are added in post-production, after we’ve done our thing (or when the show is repeated). We have to make sure there’s at least a second clear just before and after ad breaks, as captions running right up to a break can cause “hanging caption” glitches. For the same reason, a blank caption is needed at the beginning so any late-running ad captions don’t get stuck. We insert labels where it isn’t clear who is speaking, sound effects where relevant, and a captioning company credit at the end. We run a battery of tests which check for errors, short gaps, minimum and maximum lengths, word rates, overlaps, captions too close to shot changes, spelling, homophones and invalid characters (some text we copy in comes with the wrong kinds of apostrophes, which is a headache), and then we…uh, watch the show. We watch it as it will air, or at double speed if we’re running short of time, and look for anything that seems wrong or unclear.
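A few of those automated checks are easy to sketch in Python. This toy checker covers minimum and maximum durations, word rate, overlaps, short gaps, the clear second around ad breaks, and invalid characters; shot-change, spelling and homophone checks are left out. The maximum duration and gap threshold are hypothetical, and since which apostrophe counts as the wrong kind depends on the house system, curly quotes are flagged here purely as an example.

```python
MIN_DURATION = 1.0   # seconds, from the timing rules
MAX_DURATION = 7.0   # hypothetical upper limit
MAX_WPM = 300        # reading-speed ceiling
MIN_GAP = 0.5        # hypothetical: non-zero gaps shorter than this get flagged
AD_CLEARANCE = 1.0   # at least a second clear either side of an ad break
INVALID_CHARS = {"\u2018", "\u2019", "\u201c", "\u201d"}  # example only: curly quotes

def qc_report(captions, ad_breaks=()):
    """Return (index, problem) pairs for (start, end, text) captions in seconds.

    ad_breaks: times (in seconds) where ad breaks fall.
    """
    problems = []
    for i, (start, end, text) in enumerate(captions):
        duration = end - start
        if duration < MIN_DURATION:
            problems.append((i, "too short"))
        if duration > MAX_DURATION:
            problems.append((i, "too long"))
        if duration > 0 and len(text.split()) * 60 / duration > MAX_WPM:
            problems.append((i, "reading speed over limit"))
        if any(ch in INVALID_CHARS for ch in text):
            problems.append((i, "invalid character"))
        # Flag captions that intrude on the clear second either side of a break.
        if any(start < brk + AD_CLEARANCE and end > brk - AD_CLEARANCE for brk in ad_breaks):
            problems.append((i, "too close to ad break"))
        if i + 1 < len(captions):
            gap = captions[i + 1][0] - end
            if gap < 0:
                problems.append((i, "overlaps next caption"))
            elif 0 < gap < MIN_GAP:
                problems.append((i, "short gap before next caption"))
    return problems

sample = [
    (0.0, 0.6, "Ah!"),
    (0.5, 2.0, "Shake it off."),
    (2.2, 4.0, "Don\u2019t feel sorry for captioners."),
]
for index, problem in qc_report(sample):
    print(index, problem)
```

Anything the checker flags still gets a human look; it just saves watching the whole show hunting for overlaps by eye.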

Nup, all seems in order.

And then we send to the editor, and go get a sandwich.
