Automation is everywhere, including in the English transcription field. I recently followed a discussion about the emergence of auto transcription services that could take over this work niche I have come to love. Auto transcription, a technology that converts spoken language into text, has advanced remarkably in recent years.
The demand for transcription services has skyrocketed in today’s fast-paced digital world.
From interviews and meetings to podcasts and lectures, the need to convert spoken content into written form is pervasive. However, with the advent of machine learning and artificial intelligence, the landscape of transcription technology has undergone a significant transformation.
Machine learning algorithms have revolutionized the accuracy and efficiency of auto transcription systems. These technological advancements have made auto transcription tools a convenient solution. Recent developments in auto transcription offer real-time transcription capabilities that aim to provide instantaneous conversion of spoken language into text. Features like this can enable live captioning for events, meetings, and conversations with minimal delay.
The Challenges of Auto Transcription
While these tools offer speed and efficiency, they come with their own set of disadvantages. One of the key challenges in auto transcription development has been improving the accuracy of transcriptions, especially in noisy environments or with speakers exhibiting various accents and speech patterns. Currently, researchers are trying to train sophisticated machine learning models on vast amounts of audio data to recognise and interpret speech more accurately.
Additionally, models fine-tuned on large datasets for specific transcription tasks are said to have shown promising results in improving transcription accuracy across different domains.
As I write this musing, I realise how biased I may sound to readers. I don’t think it’s proper to sugar-coat it either. Although I’m somewhat of a Luddite on this matter, I took a chance and tried several services offering auto transcription. Initially, I thought they would make my work faster and easier, just as they claimed. But alas, I came to the conclusion that I still prefer doing my transcription work the traditional way. Don’t be mistaken: I do use software to support my transcription work, such as FTW Transcriber and FastFox Text Expander. I can attest that these two programs really do make me work faster, just as advertised.
After numerous attempts and experiments, I’ve come to a firm conclusion: not all auto transcription tools are created equal. Some tools show promise and perform admirably in specific contexts, but the majority ultimately fall short of meeting my particular needs. I think it will be a while until I’m using auto transcription tools to support my tasks.
Even though these tools are marketed as reliable, efficient, and capable of handling a wide range of audio inputs, the reality often falls short of these promises. I’ve summarised the main issues I found in my experiments with several auto transcription tools.
Inconsistent level of accuracy
The most common complaint I hear about auto transcription is its level of accuracy. One case to observe is the YouTube auto transcription feature. While it does transcribe the speech in uploaded videos, I’m often amused by the results. I’ve tried several automation services (I won’t mention any brands here), and I don’t find satisfaction in reading their output or even working with it (i.e. editing automated transcripts).
Auto transcription tools often struggle to decipher speech in recordings with medium to heavy background noise, resulting in misheard words that make no sense in context. Furthermore, these tools cannot differentiate between speakers when multiple people speak simultaneously. They often leave a blank at that timestamp or skip the part altogether. In my opinion, auto transcription tools are highly unreliable for transcribing recordings of focus group discussions. None of the brands I tried managed to prove otherwise.
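For readers who would like to put a number on "accuracy", one widely used measure is word error rate (WER): the number of substituted, inserted, and deleted words needed to turn the machine transcript into a human-checked one, divided by the length of the human-checked transcript. The snippet below is only a minimal sketch of that idea. The function name and the two sample sentences are made up for illustration, and it is not how I or any particular service actually scores transcripts.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word dropped by the machine
                curr[j - 1] + 1,     # extra word inserted by the machine
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical sentences, not output from any real tool.
human = "we met the focus group in the morning"
machine = "we met the folk is group in morning"
print(f"WER: {word_error_rate(human, machine):.2f}")
```

A lower score means fewer corrections for the editor. Note that this simple version treats every word as equally important and ignores punctuation and speaker labels, which are exactly the things that take time to fix by hand.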
Limited linguistic ability
a. Poor recognition of linguistic nuances, accents, and dialects.
While all those sophisticated algorithms and artificial intelligence are great and ground-breaking, a machine still lacks knowledge of linguistic nuances, colloquialisms, accents, and dialects.
Unlike a human listener, auto transcription also still lacks an understanding of the context of speech. The machine struggles to grasp the nuances of a conversation, including humour, sarcasm, or cultural references. The end product may fail to capture the intended meaning, which may lead to misunderstandings or misrepresentation of content.
b. Inability to correctly recognise speakers using English as their second or third language.
I have found that auto transcription tools handle only standard accents well. They still struggle to recognise English spoken as a second or third language, especially if the speaker has a heavy accent. Even native regional accents trip them up: when I ran a sample audio with a Geordie accent, the tool got only some of the words correct and missed the rest.
c. Below average ability to transcribe mixed language in an audio recording.
Unfortunately, there aren’t many auto transcription tools equipped with the ability to do this precise task. If I were to use an auto transcription tool, I would require this feature to be top-notch because the files I receive sometimes contain a mixture of English and other languages.
However, when I tried to transcribe one of those mixed language files, the results were far from satisfactory. To me, it seems like the machine could only render foreign words, or heavily accented ones, as the closest-sounding English words. Often, the interpretations have no meaning at all, and reading them can be quite comedic.
It takes longer to work on
Auto-transcribed texts are always amusing to read, but a pain in the ass to edit. Most of the time, editing one takes me longer than transcribing the audio manually. Indeed, auto transcription offers a quick and easy way to convert audio to text, but most transcribers I know dislike the heavy editing and proofreading those transcripts need to reach high accuracy.
A transcript produced using auto transcription will almost certainly contain many errors, misinterpretations, and omissions of information. Left uncorrected, such documents cannot fulfil their purpose as research data and will surely be deemed useless. Here is why editing auto-transcribed text often proves more time-consuming than manually transcribing an audio file:
- Auto-transcribed text tends to contain frequent errors, ranging from minor spelling mistakes to significant misinterpretations of spoken language. These errors may result from various factors, including background noise, overlapping dialogue, accents, and speech impediments, all of which reduce the accuracy of the transcription. When this happens, I must meticulously review and revise the text so that it accurately reflects the intended meaning and context.
- Auto transcription tools may struggle with punctuation, formatting, and speaker identification, further complicating the editing process. Missing or misplaced commas, periods, or quotation marks can significantly impact readability and comprehension. Speaker changes are also often misidentified, which makes it difficult to follow the flow of the conversation. Fixing this takes additional time and effort to re-identify the speakers as well as to organise and structure the text effectively.
- Recognising and refining linguistic nuances also makes the proofreading task more tedious. This may involve rewriting sentences, double-checking terminology and misheard words, paraphrasing complex phrases, and harmonising the tone to align with the intended audience or purpose.
- Last but not least, the editing process for auto-transcribed text often requires frequent toggling between the audio recording and the transcript to verify accuracy and context. This is my most hated element of editing auto-transcribed texts. Not to mention, it’s also cumbersome and time-consuming to toggle back and forth.
Privacy and security concerns
I’ve said a lot about the accuracy of auto transcription. But there is one more thing that I seldom think about: privacy and security. We exchange files using online services available in the market, but I think we often forget that we are entrusting these materials (some more confidential than others) to the mighty cloud operated by third parties.
I must admit I never really gave it a second thought when exchanging files. I guess I relied so much on the client’s comfort in sharing them with me using the commonly used providers. However, arguments about privacy, and especially data ownership, are something to bear in mind in the future.
Conclusion
I have yet to adopt this technology to support my transcription tasks because I like to be in control of the files I’m working on. In manual transcription, or white-glove as they call it, I have direct and complete control over the transcription process, and I can adjust my focus or pace to my liking. I can also instinctively switch my brain to any non-English language I’m familiar with whenever necessary. Plus, I don’t need to toggle and edit misheard words or sentences.
While I am certainly interested in promoting my transcription service, I also believe it’s crucial to approach the use of auto transcription with caution. The potential implications of errors and the importance of accuracy cannot be overstated. If I were to consider using an auto transcription tool, it would need to address all of the concerns I’ve outlined above.
But once auto transcription trains itself to perfection, I realise the role of a transcriber will be reduced to curating and editing the raw text produced by automated transcribers. Where’s the fun in that? Look, I want to be uplifting, happy, dandy, smiley and full of spirit. But let’s be real; as the years roll on, I feel that the day when the role of humans is removed or replaced will come soon enough. Geez, that took a dark turn.
Automation will advance; that is certain and unavoidable. But to me, human ability goes far beyond what machines can do. I am aware of the impending doom of my line of work once auto transcription replaces my role. In the meantime, I’m not too worried because producing transcriptions with high accuracy requires a balanced approach without forgetting quality, privacy, and accessibility.
What I am trying to convey is how grateful I am to be living in a time when AI is still imperfect. Some people favour auto transcription because it offers convenience and efficiency, but its limitations must also be considered.
All that said, if you’re looking for expertly crafted transcripts that meet your specific needs, then look no further! Just reach out to me and let’s discuss your requirements in detail.