>> THOMAS: Hello, everyone. Welcome to Accessibility New York City. Thank you for braving the cold; it's cold out there. Happy to have everyone here. This is a very special meetup for us. We have worked with Mirabai Knight from the very beginning of our meetup here in New York City. So on the topic of why human captioning: we have definitely benefited from it the entire time we have been having our meetup, and Mirabai and someone from her team at White Coat Captioning >> Lindsay: Hi! >> THOMAS: is captioning the talk tonight. And of course, we always appreciate, and we recommend, and we advocate for always having human captioning and realtime captioning; we always have it at the Accessibility New York City events. So that is something we do for every event that we do; you can count on us doing that. For those of you who have not met me, I'm Thomas Logan. I'm one of the organizers of the group. It is nice to have Sean here in the back, one of the organizers, and Tyson; Cameron is also one of the organizers, but he is not here tonight. All four of us love having these events, and we have them the first Tuesday of the month, generally. Thoughtbot has always been our generous host. Thank you for hosting us and providing this space. We also want to thank Level Access and Adobe, who are both sponsors of our meetup and help us put on events monthly. So thank you, everyone, for being here. With that said, we will turn it over to Mirabai and learn about why human captioning. >> MIRABAI: Thank you, this is really an honor. I got to speak at Accessibility Camp NYC in 2013 or something like that, and that was a little bit more specific. It was sort of about why I thought we should be making outreach to the blind and low-vision community to get them into professional captioning, which I still believe very firmly. But this is going to be more about some of the recent advances in auto-captioning, and talking about what that means for the current state of human captioning, and what live realtime captioning will look like in the future. I tend to be a very fast speaker, but I am going to try to speak at a moderate and reasonable pace tonight, because I gave a version of this talk a couple of weeks ago for an audience of stenographers, and I was going a million miles a minute, and most of the live chat commentary was about how they would hate to take me down as a witness in court. [ Laughter ]. So I'm trying to take that to heart, and give Lindsay a bit of a break. >> Lindsay: Yay! =) Thank you. >> MIRABAI: My amazing colleague from White Coat Captioning. And I will just try to address the most salient points, from a slightly different perspective. So rather than just speaking to fellow captioners about what our profession might look like going forward, this talk is more addressed to users of captioning, and people who might be in a position to arrange for captioning: either for an event that they're hosting, or for an employee that they're hiring, or perhaps they work as a disability director at a university, and their responsibilities involve coordinating captioning for students or employees or public events. So that's going to be more of my focus, and I think specifically because there has been so much changing recently, in the last six or seven years, in terms of the accuracy of auto-captioning, and in terms of the accessibility, with respect to just being able to turn on your phone and push a button; you know, the ability to get realtime captions from an automated source has gotten much easier.
And, at this point, it is very affordable. We might talk a little later about whether the business model involves, sort of, creating the market and then starting to charge for it. Very few companies are making any money at providing automatic captioning, but we will see what happens in the future when it comes to the actual market. But I'm going to get started by primarily addressing the differences, just the actual technical differences, both in the production of human captions versus auto-captions, and in the results, and what each might be used for, and in which circumstances. Just a little about myself: I have been a professional stenocaptioner since 2007, both here in New York City and remotely, across the country, both onsite and through the internet. I have my RDR, which means I'm certified at up to 260 words per minute. I prefer doing really, sort of, thick, thorny, medical and technical jargon. So my colleagues and I at White Coat Captioning do a lot of tech events and programming conferences; I like medical stuff, so I have done medical, pharmacy, dental, and veterinary school. That stuff is really fun for me. And I also founded something called the Open Steno Project back in 2010, which was an attempt to break machine stenography away from the, sort of, proprietary locks upon it that had concentrated it exclusively in the professional court reporting industry. Because I really love steno as a technology, and I feel like it has so much potential to be used for so many more things than just professional court reporting. So the Open Steno Project was an attempt to make steno more usable by amateurs or, honestly, anyone who spends a lot of time in front of a computer on a daily basis, by making the software free, making the hardware less than a hundred dollars-ish, and having a suite of fun, free, gamified materials to learn this technology and ease the ergonomics of their text input. So this talk will first address the current state of live captioning by humans versus live captioning by automated systems. And then I will go a little bit into speculation about how I think it might progress in the future, and what I think the industries might look like overall, and how they might intersect. So, if anyone here has seen the talk that I gave at Accessibility Camp New York City some years back, I addressed auto-captioning but, at the time, it was really basically unusable. People referred to them as autocraptions, because YouTube was getting a 70 to 80 percent accuracy rate, which is three errors per sentence, more or less, and people were having all sorts of fun making caption fail videos, and it was terrible. I mean, it was just legitimately terrible back in 2010 or 2011-ish. And it has absolutely, legitimately gotten better. That must be acknowledged and, throughout this presentation, I will acknowledge my own bias as a human captioner, as a professional captioner who is paid to do what I do, who loves my job and wants to keep doing it. So I will make a case for why humans will always have a place in the captioning industry, but it is disingenuous not to acknowledge that, to a significant degree, auto-captions have improved. The question is: first of all, as they are right now, what situations are they appropriate for? And then the much more difficult-to-quantify question is: where are they going in the future? I think it is still an open question.
I think a lot of companies are banking on auto-captions just getting better than humans, in all circumstances, and being cheaper and easier to disseminate and much easier to, you know, turn into a viable business stream. And they are just assuming that human captioning will no longer be necessary in, who knows, 5, 10, 15 years. I think that's unlikely and, again, perhaps this is my bias speaking. But I think that might be a little bit overoptimistic. And so the question, really, is: what is the trajectory of auto-captioning? We have to wait and see. I will talk about some of the reasons why human captioning has certain advantages over auto-captioning that I think are going to be very difficult to supplant. First, let's address the indisputable advantages that auto-captioning has over human captioning: scaling and commodification is, of course, the big one. You know, just logistically, being able to press a button and instantly have an algorithm transcribe your speech for you. You don't have to coordinate with a human, arrange payment, or vet and filter and figure out which captioners are good versus which are mediocre, and it is something that can be done 24 hours a day, at basically any point in time, and for as long as you want. You know, humans get tired; they have to switch out, they have to rest, they have to eat. Robots don't need to. They won't make little random finger slips; they are not going to make what human captioners call misstrokes, you know? Their output is going to be very consistent, no matter what. And now, as I will address a little later, the input is really what we're talking about. But let's leave that for a second. And just the fact that you can have it accessible on your smartphone, on your laptop, you know, anywhere that you are connected to the internet, is a huge, huge, undeniable advantage and, also, a very powerful business model for companies that want to make money at realtime captioning. These are more the, sort of, mythical or dubious advantages that have a lot of power in the narrative around auto-captioning, but I think it is important to confront the fact that they might not be entirely realistic. So one thing that is pretty famous now is what's called the 90 percent equals 100 percent fallacy, which means that humans are really bad at understanding probabilities that are not either 1 percent, 50 percent, or 100 percent. And when you tell someone the accuracy of the captioning is 90 percent, it is difficult for a human not to round it up to a hundred: that is practically perfect, where can we go from there? But language is granular, and 90 percent accuracy means that you have one wrong word per sentence, and that might not be a big deal, depending on the word. If it is the most important word in the sentence, it can be a very big deal, you know. And when you magnify that out, so you've got 10 wrong words per paragraph, it can suddenly escalate to a point where the captions might be either useless or worse than useless, and that is at 90 percent, which some auto-captioning algorithms, under some circumstances, can get substantially higher than, and some lower. So 90 percent accuracy is one wrong word per sentence. 99 percent accuracy is one wrong word per paragraph. 99.99 percent accuracy, which is what I try to hold myself to (if I'm getting less than this, I feel like I'm having a bad day), is one wrong word or omission per page. So even though these sound like similar numbers that you can group together, they are not at all.
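[To make that arithmetic concrete, here is a rough back-of-the-envelope sketch in Python. The words-per-sentence, per-paragraph, and per-page counts are assumed round numbers for illustration, not figures from the talk.]

```python
# Rough illustration of the accuracy figures discussed above.
# The unit sizes are assumptions for the sake of the arithmetic.
UNITS = {"sentence": 10, "paragraph": 100, "page": 250}  # approx. word counts

def expected_errors(accuracy: float) -> dict:
    """Expected number of wrong or omitted words per unit at a given accuracy."""
    return {unit: (1 - accuracy) * words for unit, words in UNITS.items()}

for acc in (0.90, 0.99, 0.9999):
    errs = expected_errors(acc)
    print(f"{acc:.2%} accurate -> "
          + ", ".join(f"~{n:.2g} errors per {u}" for u, n in errs.items()))

# e.g. 99.99% accurate -> ~0.001 errors per sentence,
#      ~0.01 errors per paragraph, ~0.025 errors per page
```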
Those percentages are quite different when it comes to the actual accuracy of what you are trying to understand. There's also what I call the Star Trek effect, which is that people have been watching people talk to computers in an easy, conversational manner on screen for the last 50 years, and they are always perfectly understood; you know, the computer understands their commands, and their conversation, and whatever they want it to do, and it just carries it out as a matter of course. And we are sort of replicating that in real life: people are talking to their phones more, dictating text messages, instructing their smart speakers to play music. But people are starting to realize that even a slight margin of error in these kinds of conversations can be intensely frustrating, sometimes to the point where certain people might feel like it is not worth it. So I don't think that voice as an input method is quite the slam dunk that it was thought to be five years ago. And then this is the one that stuns me, because I have been doing steno now for, what, 12, 13, 14 years. And people still come up to me at conferences saying, "I thought that had to be an AI, because that is so fast and no human can possibly type that fast! And then I saw you were making corrections and fixing errors, and I was like, hold on! Is that a human!? How is this possible!?" And the thing that I think stenographers have not yet been successful in carrying across is how powerful a technology steno actually is. It is about six times more efficient, more ergonomic, and faster than QWERTY typing. It was invented in 1911, and it is still the fastest method of text entry that has ever been invented. I can basically write at 200 words a minute with a really high degree of accuracy; I would say at that speed, probably over 99.9 percent for most subject matter. I can do that 8 hours a day without a break, and my hands don't even feel it. That is how powerful steno is. So it is really not the input method that is the bottleneck for stenographers; as with AI, it is more a matter of hearing things correctly, being able to parse them and comprehend them, and then being able to filter them through our steno code system to get the correct output. It is not the fingers that are working hard when a captioner is working, it is the brain. So now I'm going to talk a little bit about the advantages that well-qualified, well-trained human steno captioners have over AI captioning. And the main thing, the really crucial, crucial thing, probably the most important part of this presentation, is that transcription is always better when you understand what people are saying, which, so far, computers just don't! They are matching very, very complicated and vast databases of probabilistically and fuzzy-logically interconnected words and phrases and sound bites and Markov chains and things that I have a vague comprehension of, but they don't really understand what people are saying, in the sense that they can't guess what someone is going to say next, they can't use what someone said in the last paragraph to inform what they are going to say in the next paragraph, and they can't fix errors; you know, when a computer has made an error, it has no idea that it made an error. It can't... well, this is what I call the sniff test. It cannot run every single word that it is outputting through a filter of: does this make sense? Is it likely that this person said this? Does it fit with what they are saying, does it contribute to the overall message?
And, because a human can do that, in a really granular fashion, with every single word that they choose to output, that is a tremendous advantage. I think that people, even when I explain my stenographic system to them, assume that there is some sort of fuzzy logic or autocorrect or autocomplete or something that the computer is doing that I'm not doing, but that is not true at all. Steno is 100 percent deterministic: the input that I give it will produce the same output every time, with absolutely no variation. So it is really a matter of me understanding what the speaker is saying, and then deciding what that is supposed to look like in text. And that is something that computers can't really do; they can match the sound to the letters, but they can't match the meaning to the meaning. There are also, you know, secondary considerations, like punctuation, which might not sound like a big deal, but it can be surprisingly complicated and surprisingly important to ease of reading comprehension. When people make up words, when people use unfamiliar terminology, when Kelly with a Y is sitting over there and Kelli with an I is sitting over there, that is really difficult for a computer to get a handle on. In some circumstances, that is not a big deal. In others, it is. This is a little bit of a pun, and I Googled a bit to see if I could find this phrase anywhere, and I didn't find anything. I personally don't understand the difference between machine learning and deep learning. But something that humans can do that computers can't, as I've seen in reviewing the output of a lot of different auto-captioning algorithms, is what I call steep learning: an inductive loop from non-comprehension to comprehension, the farther you go along. When you start off, and someone is speaking on a subject matter that perhaps you don't have a lot of general grounding in, you can take it in and take it in and take it in, and suddenly it clicks, and the gestalt of what they are saying increases for you, and it vastly increases the accuracy of your output, because you understand the story they are telling, even though you are not well-versed in the material itself. That is something that is very difficult to teach a computer to do; these algorithms rely on vast, vast inputs of training. And it takes quite a long time for them to churn through that and to come to increasingly more and more accurate algorithms. And, you know, I think the argument is always, well, they are getting better and better at the core, and soon all that will be left are the edge cases. But the thing is that language has a lot of edge cases. I don't know if you have ever tried to see whether a document you are reading is online: you pick a two- or three-word phrase that seems relatively unusual, type that into Google, and that's the only hit, for that one document. The recombination of individual words in this language can be so multivarious and so unique in each particular circumstance that it is very difficult to map it probabilistically, and it is difficult for an algorithm to get a better sense of what it is captioning over the course of a single, half-hour session. But a human can, and that is pretty important. And now, of course, this could be an entire other lecture, and I think I probably will give a lecture on this at some point: finding a captioner who is qualified, what you need to look for, what qualifications they need to have.
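[A minimal sketch of the deterministic, dictionary-driven lookup described a moment ago, in the spirit of a Plover-style steno dictionary where keys are stroke sequences and multi-stroke entries are joined by "/". The strokes and entries are made up for illustration and do not reflect any real captioner's theory.]

```python
# Minimal sketch of a deterministic, Plover-style steno lookup.
# Same strokes in, same words out, every time; no fuzzy matching.
# The entries below are invented examples, not a real dictionary.
STENO_DICT = {
    "KAT": "cat",
    "KAPGS": "captions",                 # an idiosyncratic "brief" for a longer word
    "A*URBGS": "area under the curve",   # a one-stroke stock phrase
    "STEPB/OG/RA/TPER": "stenographer",  # a multi-stroke, mostly phonetic entry
}

def translate(strokes: list[str]) -> str:
    """Greedy longest-match translation of a stroke sequence."""
    words, i = [], 0
    while i < len(strokes):
        for j in range(len(strokes), i, -1):   # try the longest run of strokes first
            key = "/".join(strokes[i:j])
            if key in STENO_DICT:
                words.append(STENO_DICT[key])
                i = j
                break
        else:                                  # no entry: show the raw steno untranslated
            words.append(strokes[i])
            i += 1
    return " ".join(words)

print(translate(["KAT", "A*URBGS"]))   # -> "cat area under the curve"
```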
There is, for sure, great variability in quality between captioners, partly because many captioners are trained as court reporters, and their whole training is to get down phonetic approximations of what is being said in the moment, and then go back and edit and fix and clean up the transcript for later publication. Realtime captioners don't do that. Our product is the realtime feed, and any transcript is just a byproduct. So you need to make sure that whoever is captioning is trained as an actual realtime captioner and not as a court reporter. But that's a side note. The fact is, if you have a really good, well-qualified captioner with good general knowledge about a variety of subjects, they are going to be able to produce high-quality captions in a very wide variety of circumstances. Whereas algorithms, you know, you can pick one, you can say, okay, this has the best overall rating compared to the other handful of algorithms on the market. But when you start tweaking the circumstances, when you dip the audio quality down a little bit, when you increase the amount of unfamiliar jargon that comes maybe fifth or sixth in its list of possible matches rather than first or second, when you introduce an accent, suddenly that accuracy can dip in a really unpredictable way, and so you are never really going to know whether the algorithm is going to be performing at the top of its game at any particular conference, or even for any particular speaker within a conference. And that lack of predictability can cause real problems. The fundamental issue is that sound is an imperfect method of communication. Humans misspeak, they don't articulate, there can be background noise or music, and people pronounce words in a variety of ways, depending on where they are from. There are many homonyms in English. There are so many ways in which language is a difficult thing to nail down in a single, binary output. And humans are so good at doing this, at listening to this lossy, fuzzy, analog data stream and turning it into comprehensible ideas, that they don't understand the subconscious processing that the brain is doing, and they tend to think that the signal is a lot clearer than it is, and the noise is a lot less significant than it is. But if you talk to someone who has hearing loss, or if you are a person with hearing loss, you will know that there is a tremendous amount of cognitive fatigue involved in trying to take imperfect audio and synthesize it with your human semantic understanding of what the speaker is saying, and come out with a clear and complete idea stream from that lossy audio stream. And it is something that captioners know, because they do a lot of that thinking on the job, and people with hearing loss know, because they do a lot of that thinking throughout their lives. But people who haven't had to deal with either of those circumstances tend to just let it slide under the radar. So these next two slides are going to be just four or five, I guess, simple questions that you can ask yourself if you are tempted to use auto-captioning rather than human captioning for any given event, especially, I think, if you are talking about open captions at a public event. Can you assure yourself, and your audience, that all of your speakers will have standard American accents? Male speakers are preferred, it turns out.
A lot of the algorithms are trained on the baritone male voice and have some problems with female speakers; there is work on getting non-American accents into the corpus, but that is lagging behind American transcription. And do you have someone who is able to give the algorithm crystal-clear audio? Can you assure yourself that all of the speakers have good mic skills, that they will articulate, that there will not be background noise or music, and that the audio stream and the connectivity, the internet connectivity, will be top-notch? Those are really the two big ones; if the answer is no to either of those, you absolutely should not use auto-captioning. And then there are the questions of: what are the stakes of failure? How important is it that the people in your audience, or the person you are captioning for in a job interview, or a person sitting in a college lecture, gets full and accurate text? If they miss one word because the algorithm, you know, is 99 percent correct, but that one word in the paragraph is wrong, and that's the most important word in the paragraph, does that mean the student fails the test, does that mean they don't get the job? Sometimes yes, sometimes no; this is a question that you have to answer. What if you are getting really good speakers and you are getting really high-quality auto-captioning, and then suddenly someone comes in with a slightly unorthodox accent, and the accuracy goes from 99 down to 80 percent? What do you do? Do you toss the whole speech out, and say the person got 8 out of 9 talks out of the event, so that one talk that is incomprehensible and confusing doesn't really matter? And is lag a factor? Is the person using the captions using them for interaction? Are they able to answer back to the speaker, are they expected to answer questions, are they expected to take notes, and then maybe, you know, complete a pop quiz at the end of the hour? It is really important to consider that realtime interactivity is something where a human captioner sometimes has a real advantage over auto-captioning. I wanted to make my slides really clean and pretty, but I couldn't figure out a way to make this one less busy or weird-looking than it was. But I thought it was important to put it up here, all the same. This is from the Accessibility Camp NYC that we did some years ago, that I spoke at. This was a great guy who goes by Disabled Foodie, David Friedman; I loved his presentation. He was talking about wheelchair accessibility at New York City restaurants. A lovely guy; he has cerebral palsy and has a good standard American accent with good articulation and a good baritone voice, for what it is worth. And the video, it is a little small. I'm not going to play the video. I actually used two different runs of auto-captioning, one where I ran the YouTube algorithm three or four months ago, and one from today, and they make different errors, but about the same number of errors. And so, you can just see at a glance that there are real problems here: 81 percent accuracy. And now, it was not 81 percent accuracy throughout his talk. I did, admittedly, cherry-pick this particular paragraph because, for one thing, I thought that the hilarity of replacing special education with sexual education pointed out that auto-captions can be not just confusing, but outright embarrassing, and an outright liability issue if they are representing your organization.
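[For context on where a figure like that 81 percent typically comes from: word-level accuracy is usually reported as one minus the word error rate, that is, the word-level edit distance between a reference transcript and the hypothesis divided by the number of reference words. A minimal sketch follows; the reference and hypothesis strings are invented for illustration, not taken from the slide.]

```python
# How a word-level accuracy figure is typically derived: 1 - WER,
# where WER = word-level edit distance / number of reference words.
# The strings below are invented for illustration.
def word_error_rate(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    # Standard dynamic-programming edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution / match
    return d[len(ref)][len(hyp)] / len(ref)

ref = "I am a special education teacher"
hyp = "I am a sexual education teacher"
wer = word_error_rate(ref, hyp)
print(f"WER {wer:.0%}, accuracy {1 - wer:.0%}")   # one substitution in six words
```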
A human captioner, who got "I'm a special education teacher" correct earlier, just as the auto-captioning did, would know that he was not saying "sexual education" later in the paragraph. That's where semantic understanding comes in. A human captioner would also probably have some knowledge about the ADA and the IDEA, if they have worked in accessibility before. And hopefully, if they're a New Yorker, they will also know what Jewish kvetching is, and not transcribe it as "huge fetching," or some other version; I don't remember exactly, but it did not come out correctly. So I have about 10 or 15 minutes left. And this is where I'm going to shift into speculation about how the industry might look in the future. I think it is pretty undeniable that the more auto-captioning becomes ubiquitous on smartphones and tablets and in public places, the more people start using it, and the more they will want to use it. I think that is happening with offline captioning; I know a lot of people in my generation with no hearing loss who have the captions on all the time when they are watching TV. Because it is useful as multimodal input, you know; being able to read something and listen to it at the same time can be more enjoyable, as well as more comprehensible and less fatiguing, even if you have no hearing loss whatsoever. But also, you know, people are going to find out how useful even bad captioning is, and I have a feeling that will create a demand for good captioning. Plus, and this is pretty significant, the baby boomers are coming into their senior years. Especially since many of them have, you know, gone to rock concerts in their youth, and since many of them are staying in the workforce longer and wanting to participate in public life longer than, perhaps, the previous generation did, and since there are just so dang many of them, I really think that, as age-related hearing loss has more and more of an effect on this particular generation, which, I think, is also more comfortable with technology in general, and with computers and smartphones in particular, than the previous generation, there is going to be a push from them to have more, sort of, ubiquitous, always-on (not on-demand, but proactive) captioning for public events and public spaces. But, and this is the question: what about the economic issue? One consideration is that there are just FAR too few steno captioners to provide good-quality captioning to everyone who needs it. I mean, there are around 45 million people with some degree of hearing loss now, and that will go up by at least 10 million in the next 10 years. There are 400 certified realtime captioners in this country, and far fewer in the rest of the world. The disparity is just staggering; even if I were a robot and could work 24 hours a day, there is no way that I would be able to caption for everyone who wanted good captioning. And I'm also expensive; I'm going to be more expensive than an algorithm, no matter what. I need to feed my family. And there is not necessarily the money available to pay a human captioner for absolutely every circumstance and for every human being who might want captioning.
There is also a possibility that cuts against the current trend of more places offering open captioning, you know, to anyone who might be able to enjoy it and appreciate it, but who does not necessarily self-identify as having hearing loss, or necessarily know their rights under the ADA, that they have the ability to request captioning. I see a trend of companies and events closing the captioning as much as possible, saying: sure, you can have it, but only on your smartphone. We are not going to put it on the big screen. Sure, you can have it, but only on request, only on demand. We're not going to offer it proactively. And I can absolutely understand the reasons for them doing that, but I think it is very detrimental, not just for me as a professional captioner who wants my product out there and available for anyone who can use it, but also in terms of spreading the word about how important and useful this technology is for a large number of people, and not just people who require communication access in all circumstances, you know. People with severe to profound hearing loss, who use sign language to communicate with their nearest and dearest and interpreters to communicate with those who don't sign, know their rights under the ADA and ask for accommodations when they need them. People who have a hearing loss that is in the mild to moderate range, especially when it is acquired later in life, after they learned their primary language, might not need it for a conversation between friends in a quiet cafe, but they might absolutely need it for a public event in a darkened auditorium when they are 20 feet away from the speaker and cannot read their lips. So I think that open captioning is really going to be important to provide that widespread, broadband access to as many people as possible. One in 50 people have hearing loss; when you talk about people over 65, it is one in seven. If companies don't want the auto-captioning algorithms saying sexual education instead of special education and making them the laughingstock of the internet, they might have the tendency to try to close that captioning down and not put it on public display. So that is something that I think we need to be aware of; that trend is something that many accessibility advocates and professionals will want to push back against. There is also an even more dire and scary possibility, which is that, legislatively, the ADA could just be cut out from under us. I'm not going to talk about it too much, but it's nothing we should take for granted. We all need to stand together, disabled and not disabled, deaf and hard of hearing alike. We all need to be aware of what is going on politically in this country, and we need to be very loud in our advocacy for ourselves and our fellow people. There is also something, you know, that might be good for me, but not necessarily so good for deaf and hard of hearing people. I mean, who knows. Perhaps, perhaps if auto-captioning is good enough, a sort of stratification might make sense, a little bit like the demonstration that we saw some months ago, where blind and low-vision people can have, sort of, automatic image recognition on their glasses to help them read medicine labels and street signs and whatever.
But if they are in a space where they need navigation tips, or they need to, sort of, really appreciate something visual about an event that they are experiencing, they have a certain number of hours included in the service that gets them a live, human guide describing the scene to them. And I can kind of see that two-level system happening with an auto-caption service, too: for some people, in some circumstances, they are fine with auto-captions, but if the stakes are higher, or if the auto-captioning tanks, they can go to a human captioner who can increase the accuracy. I wouldn't mind working in that circumstance, but I do worry that, if the model becomes "you get human captions if you can pay for them, and you get auto-captions if you can't," that is a real problem, where the haves get good captions and the have-nots don't. So that can be a good or a bad thing, depending on how it shakes out in practice. Or, another thing that I have seen, again, sort of working its way through the current trends of the industry: companies are saying, you don't really need live captioning. I mean, just record the audio and then send out little 30-second snippets to four million people, and they will either correct an auto transcript or transcribe it themselves from scratch, and then you stitch it together, and voila, you have a beautiful, low-cost transcript. They are paid 5 cents a snippet, they submit it, and whatever, and you have this transcript for cheap in an hour or two. But that is not realtime communication access, and there are also quality control issues with that modality as well. Not an ideal solution, but it is scalable and a really tasty business model for a certain type of company. So it is something that needs to be considered in how everything is going to fit together going forward. Or, some companies might, you know, leap feet-first into the whole algorithm biz and, you know, enjoy it for a while, and then the first big crucial moment comes, the auto-captioning fails, and they realize what it is like to be working without a net, and they go back to human captioning. This could happen; I think it is quite likely, honestly. And I also see the possibility of human and auto-captioning working, to some degree, together. There are certain unfamiliar terms where I might be, like, I wish I could stop time for two seconds so I could Google that. I know I would be able to put it in my steno dictionary if I could. If I could have an auto-caption stream just sort of running as a byline on my computer monitor, and be able to see what it is able to access in its vast database of words and terms, and just use it for reference in producing my own proactive, word-by-word transcript, I could see that becoming really useful. I could see a hybrid system where a steno captioner is standing by and reviewing the auto-captions, and once they dip below a certain level of accuracy, the steno captioner can jump in and save the day. That's a little bit riskier, I think, for a number of reasons. But it is possible that there can be a good fusion of the two that can be to the benefit of everyone, and especially, like I said, to deaf and hard of hearing people, who are not going to have full coverage if they rely only on human captioners for everything, unless we get a lot more human captioners, which I'm working on, but it is a work in progress. Basically, I think the fundamental issue is that trying to repair a broken thing after the fact is always trickier than being able to get it right the first time, and then maybe, you know, clean it up a little.
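[A minimal sketch of the kind of after-the-fact spot check described here (the next paragraph gives the concrete example of a few dropped words): align the human transcript against an auto transcript with Python's difflib and flag stretches the auto transcript has that the human feed lacks, as candidate sites for a quick review. This is purely illustrative, not any captioning company's actual tooling.]

```python
# Flag words present in the auto transcript but missing from (or differing
# in) the human transcript, as candidate drop sites for a quick human review.
import difflib

def flag_gaps(human: str, auto: str) -> list[str]:
    h, a = human.split(), auto.split()
    notes = []
    for tag, h1, h2, a1, a2 in difflib.SequenceMatcher(None, h, a).get_opcodes():
        if tag in ("insert", "replace"):
            # Words the auto transcript has that the human feed lacks (or differs on).
            notes.append(f"check near human word {h1}: auto has {' '.join(a[a1:a2])!r}")
    return notes

human = "the committee approved the budget for the fiscal year"
auto  = "the committee approved the revised budget for the next fiscal year"
for note in flag_gaps(human, auto):
    print(note)
# -> check near human word 4: auto has 'revised'
# -> check near human word 7: auto has 'next'
```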
I can see that, if my transcript is almost perfect but I dropped four words in there, the auto-captioning algorithm could scan through it, detect the four missing words, and I could quickly jump there, spot-check, fill in the words, and have a 100 percent accurate transcript within 5 minutes of the event. I think, economically and in terms of usability, that's a powerful way of doing things, rather than the current paradigm, where you have the auto-captioning putting out mistakes and messiness, and then you throw a swarm of low-paid humans at it to fix it. That is less efficient and, even though it might be cheaper in some ways, depending on how you figure it, it is not as workable in the long run. And that's basically what I've got to say. I mean, I have a TON of thoughts on the subject, and I would be very happy to talk to anyone about this. So basically, you can start me going, and I will never shut up. This is my attempt to boil down the most important and most salient points of how humans just work differently than auto-captioning algorithms, and what advantages we're able to offer. So far (who knows, maybe the singularity is coming tomorrow, who knows), we do things that auto-captioning can't and, um, that's my argument for why we're still relevant and useful in this day and age, and hopefully in the future going forward. Thank you so much. [ Applause ]. >> Lindsay: I can hear you! >> Thank you very much, this was awesome. As someone who relies on captions a lot of the time, it is cool to see behind the scenes. Something that you brought up about American accents got me wondering about how to make it easier for human captioners who have to deal with content in more than one language, or where there are switchovers. So, for example, I go to a lot of Jewish events where the speakers will just say things in untranslated Hebrew, or the translation into English is sticky. I wondered how you deal with that, and how you see technology working with that, or not working with that. >> MIRABAI: Yeah, that's a fascinating question. And, um, I have to confess, my parents speak six languages between them, and I only speak one, and it is the greatest regret of my life that they didn't teach me other languages. There are a few multilingual captioners, and they can switch on the fly. The open source software that I developed can support language and dictionary switching. So this is technologically something that is possible to have, but you need someone who has fluency in multiple languages, plus stenographic ability, and not all languages have as developed a steno theory as others. There's always the question of automatic translation which, of course, has many problems and will continue to have many problems going forward. You can make the argument that, if you are going to use one or the other, you might want to get the, you know, the English stream perfectly accurate, done by a human, and then let the algorithm translate it into another language. If you are talking about the same person translating two languages with imperfect knowledge of one or the other, that's a hard question. And it is not one that we have come up with a good solution for. >> Thanks. >> We had a couple of questions in the live stream. So I will do this and keep going around the room here. So a question from, let's see, Travis Hopkins: how often do you add new words to your dictionaries, and how often do you run into words that don't work in steno, if ever? >> MIRABAI: Okay. So, like I said, I have been doing this for about 12 years now professionally.
And, in the beginning, I would add hundreds a day, honestly. These days, depending on what I'm doing, in terms of words that just aren't in there at all, it is probably fewer than five a day. I mean, sometimes even fewer than that. Often, I find myself adding stock phrases, because I don't necessarily want to write "area under the curve" in four strokes when I can do it in one. So I have found myself making my steno dictionary more efficient in terms of using more phrases, but I don't tend to add a lot of extra definitions these days, unless I'm working in a subject that I have never captioned before. And then the question was whether I encounter stuff that just is not possible to render into steno at all. Well, one thing that not everybody knows about steno is that it is not a strictly phonetic system; it is a phonetic-mnemonic system. Some entries and definitions are entirely phonetic, a fair number, and a bunch of them are idiosyncratic code words that the individual stenographer makes up. A good portion of learning how to do steno well and efficiently is learning your own little inner language, your mnemonic hooks, and the ability to take a word and push it into a memorable piece of gobbledygook that you will remember the next time it comes up. How did I write that? Oh, like this. So a lot of it is knowing your own brain, and that is why every stenographer's dictionary has some overlap with other dictionaries, but the farther you get into it, the more it is just a reflection of your own brain's inner mnemonic structure. So there is nothing you can't turn into steno, because you are making up your own language. >> Awesome, and the second question from the stream, from Adam Goodkind. Hey, Adam, good to hear from you again. >> Hi, I love you! >> Live reaction! As a regular CART user, the most inconvenient part is having to look at a screen rather than having eye contact with the speaker. Is there any progress being made on that front, like Google Glass? >> MIRABAI: I wish! Yeah, I mean, that is something that I like to address in the whole, like, "demand for captioning skyrockets" slide: smartphones are ubiquitous, but it is annoying to take them out of your pocket. Once everyone has a heads-up display, like glasses, and they are not dorky... I bought the Google glasses, but they were not good for captioning, and they made me look like a tool. [ Laughter ]. But it is my dream that everyone will have screens in front of their eyes, it won't be a big deal, and that's when captioning will be everywhere. >> Didn't want to look like a Glasshole. >> Yeah. >> I didn't want to use the term. >> JOLY: So, um, I probably use AI captions more than anybody else here. >> Yeah. >> JOLY: And I use it quite a lot. I use Web Captioner, which uses the Google translate API, and people are very impressed by it. One of the things that I like about it, when I'm using it on the streams, is that it is very realtime, you know; it just pulls the words as they come out of people's mouths. So even the best human captioner has a little bit of delay, so that is an advantage. And I can replace words, I can set it for certain accents, and I find the accents are the main inhibitor, and I have learned the hard way the same lessons that you have put in there, you know: that good audio, and people speaking in standard accents, is really what is most important. But generally, the choice is between, you know... there is no budget for the human captioner, so the choice is between no captions and AI.
>> MIRABAI: It is a hard choice, and I have spoken to several deaf friends and colleagues about how they feel, and some of them are like: yeah, I do want to be able to use auto-captions, I don't want to have to haggle with a stenographer every time I need access, and not just for the buying-a-coffee-at-the-coffee-shop situation, which is one thing you can use auto-captions for, but for conversations, for meetings, for phone calls. And I think the issue is that people who don't rely on captions will decide that auto-captions work in every circumstance and that paying for captions is a fool's game; that is what my deaf friends are concerned about. If that becomes the standard procedure, they are not able to convince their employers to pay for captions for important business meetings, or university classes. It is a fine line to walk, and it is not something that I necessarily should be the largest voice in, because I'm not a caption user; I'm speaking to some degree out of self-interest. I want to do my job and I want to be compensated for it. But I... I do worry about people deciding that this level of captioning is good enough. Especially since it is a novelty, for sure: people can press a button and, oh my god, it is getting so much right. And if you are not relying on the captions, what they are missing is just funny or amusing. I think there are a lot of ways that the intersection of auto-captioning and human captioning can go, going forward, and there's a pretty good middle ground where there are reasons for each, but it will be interesting to see whether one supplants the other. >> JOLY: What I see is a function for it, and the way I do it is, people are multitasking, and they have it on the screen, but they are not listening, and they can kind of follow and see if there is something going on where they want to, like, put the sound up. And, especially if I'm running stuff on (indiscernible) a lot, and the captions come up, I find, really, that is where AI works well. People want to have a general clue of what is going on. >> MIRABAI: I agree, and I use it myself for that purpose. If I want the gist of a video, I know that something is probably an error, but I can get what they are saying, and I will listen to the audio later. And it is useful for search; it is useful to have an auto-caption corpus for search. So it is definitely going to become more prominent, no matter what, and it has made advances that I could not make, and I have to give credit to the companies that are producing these algorithms for that. But we have yet to see what level of accuracy, and in what circumstances, it is able to attain in the future. >> And in terms of the hybrid thing, you know, I agree with you that, with fixing something that is broken, it is easier to do it over and to do it properly. But it seems that it is easy to spot those errors when they are happening, and if it were built into the system that you could touch anything on the screen that was wrong, and crowdsource it, even if you have an audience of 20 people, it would be pretty easy to then produce a fairly accurate transcript out of it. >> MIRABAI: That is true up to a certain threshold, but when the errors multiply and they are all over the place, you have to scramble if you don't have a backup system. >> JOLY: My thing is, you can see something wrong, and you can fix it, so it is fixed the next time. But it is difficult, for something that is common; you have this in your dictionary, too.
So if it is something where you know that, if you put in this correction, it is going to screw up when that word comes up the next time... it is a very difficult thing to do. Okay. I think that's all I've got for now. >> MIRABAI: I actually really appreciate your input, because I personally have not used a lot of auto-captioning in live situations; I have seen some after-the-fact auto-captioning. >> JOLY: And if you go to my live stream channel, livestream.com/internetsociety, you will see a bunch of events where I have used it and, you know, good and bad, haha. >> MIRABAI: That is definitely useful to have. >> SHAUN: So, this is Shawn, for those who don't know me. Full disclosure, I work at Google, and the team that I was on for a number of years made voice typing in Docs, as well as, more recently, auto-captions in Slides. That said, ha, having you do the captioning for our meetup, along with the other folks at White Coat Captioning, has turned me into an absolute captioning snob. And so, the question that I'm eventually leading to: I pay a great deal of attention to captions when I go to events now. And, by and large, most of them are kind of not great. And some I can see why: some are systems using predictive typing, so it is people typing on QWERTY keyboards and the system is offering corrections and they are selecting them, which leads to massive amounts of omissions and errors. But I'm also seeing people who are using steno, but they are just not very good at it. >> MIRABAI: Yeah. That is a VERY important issue to acknowledge, for sure. >> SHAUN: And that's one of the biggest challenges that I see in making the argument for live captioning with a human being: most people are exposed to lousy captioners. So when the error rate of the average human that they see is worse than the auto-captions, it is really hard to make that argument. So I wanted to get your take on that, and kind of your vision on how we can fix that. >> MIRABAI: I cannot possibly agree with you more. My stock phrase with respect to this is: my best competitors are my best friends, and my worst competitors are my worst enemies. People who try to do this job and do it poorly, like, poison the field for ALL the rest of us. And for people who, you know, do it well, there is plenty of work to go around. I'm not worried about people poaching my work; I want to increase the prominence of captioning and the standards that people hold it to, so that the top-quality captioners can get as much work as they want, and the people who read captions get good captions. I absolutely agree with you. As I said, there are only 400 of us with a certification, and the certification is 96 percent accuracy, which is not good, at 180 words per minute, which is QUITE slow. So there need to be higher standards, and there need to be more good captioners, and I'm trying to make them with the Open Steno Project. And, by the way, stenoknight is my main Twitter account; Plover is my Open Steno-specific Twitter account. The Open Steno Project is trying to get more people to use steno for whatever they want, in the hope that a few of them will catch fire, love it, rise to the top, and become highly qualified professionals, but it is a slow process and we need more exposure. So please get the word out. >> SHAUN: And props to the project: I downloaded it and installed it on my laptop.
It was free, it was easy, I got a 3D-printed case for a steno machine, and I soldered it together. So you should try it, it is really fun. Other questions? >> JOLY: This is something that I wondered about. What happens when you get an itch? >> MIRABAI: Well, oddly enough, about a dozen or so of the most common words in the English language can be done with the left hand, so you wait for that hand to come up, you scratch with the other hand, and you are solid. >> JOLY: Seriously, though, if this becomes common, do you think it might affect how people speak? Will they learn to speak more slowly, recognizing that there are stenographers? I have seen this; I have done global conferences where they have translators, and people learn to speak slowly and firmly so that the translators can keep up, and so on. Do you think that might happen? >> MIRABAI: It is possible, and it would be great. As with the mic skills question, it is hard to teach every individual that you need to hear from. In the aggregate, I hope so. Some people just do not care, or cannot; I was fairly successful in modulating my rate of speech, but it took effort to do so. >> Lindsay: Yep, you did great! >> MIRABAI: Sorry, Lindsay. Thanks. >> And on being aware and making sense, the idea that a human can learn in realtime in the domain of what your particular presentation is about: I guess I'm curious about etiquette and efficiency, as far as someone like myself being aware that it is a human captioner and communicating if I see a mistake; just pointers and practices for getting that out quickly. >> MIRABAI: That almost never happens, but it is welcome; if you see a word that is mistranslating, getting the spelling of that word to us is fantastic. I cannot think of many people who do that, but please, more of that. >> I keep half an eye on it, because my name is Shawn, and it is spelled in five ways that I know of. So I keep an eye on things like this, and say that mine is the spelling that is Shawn rather than Sean. >> MIRABAI: That is very helpful. >> To add to that, something that is done in Jewish communities when you request CART or an ASL interpreter is to provide a list of terms that might come up. I don't know; we have been talking about doing CART in a Facebook group that I'm in. But when we have ASL interpreters at Jewish events, we have given them translations, or someone has hovered in the background and fingerspelled the Hebrew terms in transliteration for the person using them, which is not the most efficient way to do it. But is that something that, I don't know, you would like to see more of? >> MIRABAI: Yes, absolutely. We love it when we can get stuff in advance, whether it is PowerPoints or someone coming up to us. I did the Polyglot conference some years ago, and a guy came up to me and said, hey, "Welcome to the Polyglot conference, I'm going to speak to you about Yiddish in New York," in Esperanto, and I put it in my dictionary, and I had it, and I was so happy when it came up. And I still have that in my dictionary. He is fabulous. So, yeah, anything that you can give to us is very helpful, for sure. >> And I think this is probably related to that, but Travis Hopkins had a second question, on tips for getting good audio for machine translation and, if you hire realtime steno captioners, the best way to accommodate them. >> MIRABAI: I'm not an audio tech, so I cannot speak to the details of that, but as clear and clean as possible, please. You might know more than I would. >> Yeah... >> (Off-mic comments). >> MIRABAI: Cardioid (indiscernible)? I don't know what that means. Not my field of expertise.
>> The mic that you are speaking into is a cardioid mic; it picks up in front of it, for a limited area. So if you are holding a mic like this, it will just pick you up and nothing else. When you talk about mic technique, you see people stand like this, they hold the mic like this, and when you have speakers in the ceiling, that is, like... it is the bane of my life. I hate them. [ Laughter ]. >> And if you are in that situation, a human captioner will usually do a better job than an auto-captioner, not that we like it. >> We have had people who are speech impaired that you have captioned, and it blew my mind, you know. >> MIRABAI: Thank you. >> Oops, here we go. >> Hi, this is fascinating. I learned so much, and I'm just wondering: has there been a study done coming from a user's point of view about how they feel about auto-captioning versus human captioning, either from the audience, or some kind of larger data? >> MIRABAI: Nothing formal that I'm aware of, but I would love to read any research that somebody has on that subject. In preparing for this talk, I asked my half dozen deaf acquaintances, and the gist was: I would love to use auto-captioning when practical, but not when that means I will never have human captioning paid for by employers, or by organizers of events that I attend. Which seems entirely reasonable to me. >> All right, one more, and then you are cut off. [ Laughter ]. >> I would like to give a plug to the Dynamic Coalition on Accessibility and Disability, which is part of the ITU, as part of the UN. And I posted something to them, which was captioned, and I got grief back, because the default on Web Captioner is capital letters, and they pointed me to studies which say that lowercase is easier, and there is actually a dyslexic font in Web Captioner that I switched over to. Do you have any comments about why, I mean, the default on television is all capital letters; why don't they change, do you think? >> MIRABAI: With respect to the dyslexic font, the advantages of that have not been proven by research. >> I can relate to that. >> MIRABAI: Some people say that it is less advantageous than it is purported to be, but that is not my field of expertise. In terms of all caps versus mixed case, I recommend mixed case in all circumstances. All caps is very tiring to the eye and less comprehensible. >> I like caps because they show up better when they are small. But I have gone bigger and then switched to lowercase. >> MIRABAI: I'm sure that there are different preferences among different caption users but, by and large, I think that people prefer mixed case. >> All right. Any last questions? All right. >> Just in response to the fonts thing: the research largely just says, like, use sans serif fonts, but the dyslexic fonts have not been proven. This font is fine; is it Verdana? >> This is fine. I usually use (indiscernible), but it depends. >> I think we have an announcement. >> Okay, great. Here comes the mic. >> Hi, everybody. There is an event coming up this Friday that you all might be interested in: making open data accessible to people with disabilities, this Friday, March 8, 12:30 to 2:00 PM. If you are in New York, it is going to be at the Andrew Heiskell Braille and Talking Book Library on West 40th Street. The blurb about it: there are one million people with disabilities in New York City; join Sarah and (indiscernible) from the Mayor's Office as they share ways to make open data accessible to the disability community.
We will learn about making digital content accessible, such as the hurricane evacuation zone finder. So you can look into that, and I will post it on the meetup page, too, with a link to register and RSVP. >> And if I can make one more comment about something that you can do to help captioners: when you read something aloud, you increase your speed by 30 to 40 percent versus when you are speaking off the cuff. So it is very difficult to caption material that is being read aloud, as opposed to spoken conversationally. So it is very useful for captioners to have that material in front of them, if possible, and for people reading material aloud to consciously slow down their rate of speech. >> Thank you. And, on that note, thank you for that awesome talk! >> MIRABAI: Thank you! [ Applause ]. Live captioning by Lindsay @stoker_lindsay at White Coat Captioning @whitecoatcapxg. >> So thank you, everybody, for joining. Thank you, again, to Thoughtbot for hosting us, to Level Access and Adobe, to the Internet Society for streaming and recording and awesome tips on microphone holding, and to White Coat Captioning for captioning. All right, thank you so much. We will have our next meetup the first Tuesday of the month, which is April... something. The first Tuesday in April, April 2nd, thank you. We will see you then, thanks! >> Lindsay: Thanks! =). Live captioning by Lindsay @stoker_lindsay at White Coat Captioning @whitecoatcapxg