The A.I. Transcriptionist Invasion!
The title says it all: it is not a question of “if” but “when”. Current A.I. transcriptionists are not yet as good as their human counterparts, and you might still see some out-of-context words here and there. Still, there is no denying that A.I.-based transcriptionists have been improving at an exponential pace. We humans have a natural tendency to underestimate the magnitude of exponential growth, because our brains are tuned to think in terms of linear growth. If you are curious to learn about this natural bias, you might find this article really interesting.
Ray Kurzweil defined the “Singularity” as the point in time when A.I. will catch up with humans. Since the 1970s, Ray’s predictions have been unfolding with amazing accuracy. He predicted that the singularity will arrive in 2045. Elon Musk, on the other hand, has been warning that the singularity will become a reality much sooner, estimating that it might be here as early as 2025. That is only 4 years from now and 20 years earlier than Ray’s prediction. We might attribute this 20-year gap to the natural linear bias mentioned in the last paragraph.
Currently, A.I.-based transcription accuracy can reach 80% to 90%, depending on the audio quality. The question is when this accuracy will increase to 99%.
In this article, we are going to cover the specific gaps that need to be bridged between A.I.-based transcriptionists and human transcriptionists. Will 4 years be enough time to bridge these gaps? Let’s begin our analysis!
Gaps between human and A.I. Transcriptionists
1. Speed (A.I. Transcriptionists win)
As for speed, I don’t anticipate any debate here. For every minute of audio, the fastest turnaround that a human transcriptionist can provide is 2 minutes at best. A.I. models can transcribe audio almost in real time. Furthermore, for a prerecorded audio file, the turnaround time for many A.I. transcriptionists is less than half the length of the audio. For example, SmartScribe can transcribe 30 minutes of conversational audio in less than 10 minutes!
The winner, by a wide margin, is the A.I. transcriptionist.
2. Accuracy (Human Transcriptionists win)
The leading A.I. transcriptionist engines provide 80% to 90% accuracy for the English language, depending on the sound quality. Human transcriptionists usually guarantee 99% accuracy. Therefore, from the accuracy perspective, human transcriptionists are currently the winners. Having said that, the accuracy gap between human and A.I. transcriptionists is closing noticeably, and we anticipate that it will close sooner than we think.
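The figures above do not say how accuracy is measured. One common convention, assumed here, is accuracy = 1 - word error rate (WER), where WER is the word-level edit distance between a trusted reference transcript and the machine transcript, divided by the number of reference words. The sketch below is only an illustration of that convention; the function names and sample sentences are made up for this example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def accuracy(reference: str, hypothesis: str) -> float:
    """Assumed convention: accuracy = 1 - WER, floored at zero."""
    return max(0.0, 1.0 - word_error_rate(reference, hypothesis))


# Hypothetical sample: one wrong word out of ten.
reference = "please send the signed contract to our legal team today"
hypothesis = "please send the sign contract to our legal team today"
print(f"{accuracy(reference, hypothesis):.0%}")  # prints 90%
```

Under this convention, 99% accuracy means about one wrong, missing, or extra word per hundred words of the reference, while 80% to 90% means ten to twenty.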
3. Crosstalk (Human Transcriptionists win)
Transcribing audio with a lot of crosstalk is challenging for both human and A.I. transcriptionists. When the voices of two or more speakers overlap and the speakers are not recorded on separate channels, it is difficult for A.I. to isolate each speaker. Humans encounter such scenarios every day, so the brain is naturally trained to isolate different sounds. In this case, the human is the winner as well. However, it is just a matter of time before A.I. catches up and does a better job than humans.
4. Background noise (Tie)
Background noise mixes into the speech signal, diluting the signal that the A.I. relies on for accurate speech recognition. Similarly, background noise makes it difficult for humans to hear consonants accurately. We have found that, depending on the dynamics of the noise, humans and A.I. are challenged to varying degrees in recognizing the correct words.
For this one, we call the match a tie.
5. Multilingual (A.I. Transcriptionists win)
There is no competition in terms of the number of languages an A.I. can transcribe. SmartScribe’s A.I. model can transcribe 31 different languages and doesn’t mix up words between languages the way a human does.
The winner here is an A.I. transcriptionist by a large margin.
6. Multilingual within the same dialogue (Human Transcriptionists win)
What A.I. currently lacks is the ability to identify multiple languages in the same dialogue. For example, if one person speaks in English and another replies in Spanish, we don’t know of any A.I. that can transcribe the conversation.
On the other hand, a bilingual human transcriber who knows English and Spanish can easily recognize both languages within the same dialogue and get the job done.
Again, we think it is just a matter of time before A.I. surpasses humans at this ability, but at this moment, a multilingual human transcriber is the winner.
7. Foreign accent (A.I. Transcriptionists win)
Considering the diversity of accents a single service can cover, we can confidently say that A.I. transcriptionists such as SmartScribe are the winners. The SmartScribe model is capable of understanding a variety of English accents, such as UK, Australian, Scottish, and Indian. Finding a human transcriptionist who has experience in transcribing numerous foreign accents is probably not easy, and human transcriptionists often charge extra for transcribing foreign accents.
Considering the cost-effectiveness and the ease of finding one, we pick A.I. transcriptionists (SmartScribe) as the winner.
8. Speaker identification (Human Transcriptionists win)
Speaker identification means identifying the individuals in the audio and assigning the transcribed text to the correct speaker. When each speaker has their own channel, identifying the speakers is a trivial task. However, speaker identification is challenging when the whole dialogue is recorded on one channel.
A.I. transcriptionists identify the speakers by analyzing the tonal characteristics of the participants’ voices. Often, A.I. algorithms are not sensitive enough to distinguish between speakers whose voices are similar. Sometimes, the algorithms also tag a single speaker as two or more different individuals. This usually happens with speakers who have a dynamic intonation style.
Unlike A.I. transcriptionists, humans understand the meaning of the dialogue and are better at assigning the voices to the correct speakers. That’s because, in addition to the vocal characteristics, humans can follow the conversation. When the voices are similar, they can leverage the natural flow of the dialogue to guess the correct speaker.
For speaker identification, the clear winners are human transcribers!
9. Timestamping (A.I. Transcriptionists win)
Timestamps in a transcription are a really helpful feature. Often, when a segment of a transcription looks odd, referring to the original audio becomes a necessity. Timestamps that appear at regular intervals help you find the audio segment quicker. A.I. transcriptionists do an excellent job of placing periodic timestamps in the transcriptions. Furthermore, apps such as SmartScribe allow users to instantly access the audio segment corresponding to a piece of text. For human transcriptionists, timestamping is a time-consuming and tedious task, and they usually charge a premium for it.
The A.I. transcriptionist is the winner for this category!
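To make the idea of periodic timestamps concrete, here is a minimal sketch. It assumes the speech engine returns each word together with its start time in seconds, which is a common output shape but not something this article confirms for any particular product. The function names, the 30-second interval, and the sample words are all hypothetical.

```python
def format_timestamp(seconds: float) -> str:
    """Render a time offset in seconds as [MM:SS]."""
    minutes, secs = divmod(int(seconds), 60)
    return f"[{minutes:02d}:{secs:02d}]"


def add_periodic_timestamps(words, interval: float = 30.0) -> str:
    """Insert a timestamp before the first word of each new interval.

    `words` is a list of (start_time_in_seconds, word) pairs sorted by time.
    """
    pieces = []
    next_mark = 0.0
    for start, word in words:
        if start >= next_mark:
            pieces.append(format_timestamp(start))
            # Skip ahead so a long silence does not produce a burst of markers.
            while next_mark <= start:
                next_mark += interval
        pieces.append(word)
    return " ".join(pieces)


# Hypothetical word timings, as an engine might report them.
words = [
    (0.0, "Hello"), (0.5, "everyone"),
    (31.2, "welcome"), (32.0, "back"),
    (95.0, "goodbye"),
]
print(add_periodic_timestamps(words))
# prints: [00:00] Hello everyone [00:31] welcome back [01:35] goodbye
```

Because the word timings already come from the recognizer, the markers cost the software nothing extra, whereas a human has to pause and note the time by hand.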
10. Cost (A.I. Transcriptionists win)
There is no room for doubt that the winner here is the A.I. transcriptionist. Human transcription services charge at least a dollar per minute, which is much more expensive than A.I. transcriptionists that go as low as single-digit cents per minute.
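A quick back-of-the-envelope calculation shows how large the difference is for a one-hour recording. The human rate below is the one-dollar-per-minute floor mentioned above. The 5 cents per minute for A.I. is just one assumed example of a single-digit-cent rate, not a quoted price from any service.

```python
def transcription_cost_dollars(audio_minutes: int, rate_cents_per_minute: int) -> float:
    """Total cost in dollars for a recording at a per-minute rate given in cents."""
    return audio_minutes * rate_cents_per_minute / 100


HUMAN_RATE_CENTS = 100  # from the article: at least a dollar per minute
AI_RATE_CENTS = 5       # assumption: an illustrative single-digit-cent rate

audio_minutes = 60
human_cost = transcription_cost_dollars(audio_minutes, HUMAN_RATE_CENTS)
ai_cost = transcription_cost_dollars(audio_minutes, AI_RATE_CENTS)

print(f"Human: ${human_cost:.2f}")  # prints Human: $60.00
print(f"A.I.:  ${ai_cost:.2f}")     # prints A.I.:  $3.00
print(f"Human costs {human_cost / ai_cost:.0f}x more")  # prints Human costs 20x more
```

With these assumed rates, the human service costs about twenty times as much, and the ratio only grows if the human rate is above the one-dollar floor.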
A.I. transcriptionists are the winners for speed, multilingual, foreign accent, timestamping, and cost. Score: 5
Human transcriptionists are the winners for accuracy, crosstalk, multilingual dialogues, and speaker identification. Score: 4
Background noise is a tie.
With the tally this close, there is no decisive winner here. It really boils down to your preference. For example, if you care about accuracy the most and you don’t mind spending more and waiting longer, a human transcriptionist is what you need. On the other hand, if your budget is tight and a few errors here and there are acceptable, pick an A.I. transcriptionist.
There are numerous A.I.-based transcriptionist apps available these days. This article ranks the best transcriptionist apps of 2021. Depending on the use case, some transcriptionist apps are more suitable than others. For example, some are subscription-based and some are pay-as-you-go. Furthermore, the platform and the app itself make some apps easier to use compared to others for certain workflows.
SmartScribe is an A.I.-based transcriptionist that is multilingual and capable of medical scribing. If you need a human transcriptionist, you can also hire one directly from the app at an unbeatable rate. SmartScribe’s mobile app combines a voice recorder and a transcriber. The online web application is a convenient tool for transcribing prerecorded audio files.
Download SmartScribe for iOS and contact customer service via the app to receive a free trial credit.
Read the interesting journey of how SmartScribe was created here.