Leveraging the Power of AI Transcription for Live Captioning

“As more of our interactions are now virtual and take place on a video platform, transcription is ever so much more important to ensure that comprehension and accessibility are available to all.” —Marie Abesamis, Director, Strategic Partnerships, Otter

I recently transitioned a large international conference with more than 100 education sessions to a virtual experience on a limited budget and tight timeline (eight weeks). Finding a platform robust enough within budget was a challenge, but an even bigger challenge was making the educational programming accessible to all attendees within that limited budget. In other words, I needed a low-cost, high-accuracy captioning solution right away.

Having worked in the virtual event/learning space for more than 20 years, I have had my fair share of successes and failures with captioning tools for both live and recorded learning experiences. I have budgeted ample sums of money for live captioning for online events, and worked with some stellar companies that provided highly skilled people who could type really fast.

Live transcription is an art. With the advent of voice-to-text, that art form has become much less expensive to procure, with surprisingly accurate results.

The Odyssey Begins

Lucky for me, or so I thought, one of the event speakers worked on the IBM Watson team. The Watson team must have something I could play with, I thought. There is some fantastic work going on with speech-to-text, but no one could offer me the nifty (and cheap) solution I sought.

I reached out to my network to see what other folks were experimenting with, and a colleague introduced me to the folks at Thisten. Hurrah! Or so I thought. Thisten looked to be exactly what I needed. After I explained what I wanted to do (provide live transcription of more than 100 sessions, with five happening at the same time over three days), I was told they were holding off on transcribing large virtual events. They recommended I look at a product called Otter.

If you have been reading my articles, you may remember my friend, EVA, the voice assistant. I was sad when she was acquired by a large corporation, but my tears dried and I rushed with open arms to my new Otter friend.

Otter to the Rescue

To start, Otter gave me a free account. I always like a free account. It took me less than one minute to figure out how to use it. The simplicity is lovely. I open my account in Otter.ai, click on “begin recording,” and watch as Otter transcribes what I’m saying as I’m saying it. Word for word. When I stop recording in Otter, the transcription is immediately available for playback and synced up with the audio recording.

While doing all this, I noticed a little “share” button in the lower left corner. Yes! Exactly what I was looking for. Otter lets me generate a link to share out, or designate individuals (via e-mail) to join me on the recording page.

And there it was! Live transcription the session producers could share out to conference attendees. With a minimal budget, I was able to secure 6,000 minutes of transcription over five accounts in Otter Teams, and provide live “captioning” for the entire event. On top of that, Otter provided a human touch as a service to go into the keynote talks and clean up the transcription.

Otter Teams and Live Notes have great collaborative features that now are being extended into Otter for Education. The features that resonate most with me are those that enable collaboration on a living document, and leverage individual and team vocabulary.

With Otter Teams (and Education), you can highlight and annotate the transcription during and after recording. You can search, edit, highlight, and comment. And you can insert images! Imagine being able to go back to your meeting notes and easily integrate a sketch you created or include relevant images that support ideas or actions.

You can upload an existing video/audio file for transcription in Otter, and then annotate, highlight, and add images. I uploaded a one-hour Webinar, and it took less than 10 minutes to transcribe. Once the transcription was done, I was able to go back in and add speaker names. Otter then identified, processed, and tagged all matching voiceprints, distinguishing between speakers with no errors. Building out a vocabulary means that acronyms, product names, and people’s names will be correct in the transcript.

And Otter lets you export all transcripts as text, docx, pdf, or srt files. That srt file from Otter is a fantastic way to provide highly accurate artificial intelligence-driven closed captioning.
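If you have never opened an srt file, it is plain text: each caption is a numbered block with a start and end time (written as hours:minutes:seconds,milliseconds) followed by the caption text, and a blank line separates the blocks. The short Python sketch below shows that general layout by building one from a few timed segments. It is only an illustration of the format, not how Otter produces its export; the Segment class, the function names, and the sample captions are all made up for this example.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One caption: hypothetical structure, not an Otter data type."""
    start: float  # seconds from the start of the recording
    end: float    # seconds from the start of the recording
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm timestamp used in srt files."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments) -> str:
    """Build srt text: numbered blocks separated by a blank line."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        timing = f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}"
        blocks.append(f"{index}\n{timing}\n{seg.text}\n")
    return "\n".join(blocks)


if __name__ == "__main__":
    # Sample captions invented for illustration.
    captions = [
        Segment(0.0, 2.5, "Welcome to the session."),
        Segment(2.5, 6.0, "Let's get started."),
    ]
    print(to_srt(captions))
```

Most video players and platforms that accept caption uploads can read a file in this shape, which is why the srt export is so handy for recorded sessions.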

Integrations are robust, as well. Otter integrates beautifully with Zoom meetings, and by the time this article is posted, Otter will integrate with Zoom Webinars. This lets you livestream directly from Zoom to Otter Live Notes.

There are analytics, calendar integrations, team collaborations, single sign-on, and two-factor authentication capabilities in the mix. The deal maker for me? Otter Live Notes can capture all conversation, even when participants are wearing headsets or earbuds. Nice!

They say otters are an essential keystone species—a species that is critical to how ecosystems function; the glue that holds an ecosystem together.

I forecast Otter will become a keystone application in my learning technology ecosystem!

Phylise Banner is the director of Education for the Society for Technical Communication (STC). A pioneer in learning experience design, she has more than 25 years of vision, action, and leadership experience in transformational learning and development approaches. She is an Adobe Education Leader, Certified Learning Environment Architect, STC Fellow, performance storyteller, avid angler, and private pilot.

what the tech?

