Speech Recognition (ASR)

What is ASR?

Automatic Speech Recognition (ASR) is the process of converting spoken audio into written text. Modern ASR systems are powered by AI models whose accuracy has improved significantly in recent years, including in challenging conditions such as noisy backgrounds or heavily accented speech. These models are under active development, with new versions released every few months.

However, ASR can still struggle with overlapping speakers, and its accuracy may be lower than that of professional human captioners. Despite these limitations, ASR remains fast, cost-effective and widely accessible, making it a strong option for many use cases.

ASR Engines

Line 21 offers several ASR engines and automatically recommends the best fit based on the language spoken in your audio. Each engine has its own capabilities, such as speaker diarisation, custom vocabulary support, or audio event detection (e.g., music or laughter).

Because every use case is different, Line 21 lets you choose the engine that matches your needs and helps you get the most out of speech recognition.

Here is a list of the engines and their capabilities:

ASR Engine	Diarisation	Code Switching	Audio Events	Supported Languages
Speechmatics				55
Gladia				104
Deepgram Nova				68
AWS Transcribe				43
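Line 21 makes this recommendation for you, but the underlying idea is easy to illustrate. The sketch below is an assumption-based illustration, not Line 21's actual selection logic: it describes each engine by a set of language codes and feature flags, keeps the engines that support the requested language and every required feature, and ranks them by how many features they offer. The engine names (engine_a, engine_b, engine_c) and their capabilities are hypothetical placeholders, not the real values for the engines in the table.

```python
from dataclasses import dataclass, field


@dataclass
class Engine:
    name: str
    languages: set[str]  # language codes, e.g. "EN", "FR"
    features: set[str] = field(default_factory=set)  # e.g. "diarisation"


def recommend(engines, language, required=frozenset()):
    """Return names of engines that support `language` and every feature in
    `required`, ranked by how many features they offer (most first)."""
    candidates = [
        e for e in engines
        if language in e.languages and set(required) <= e.features
    ]
    candidates.sort(key=lambda e: (-len(e.features), e.name))
    return [e.name for e in candidates]


# Hypothetical engines and capabilities, for illustration only.
ENGINES = [
    Engine("engine_a", {"EN", "FR", "DE"}, {"diarisation", "code_switching"}),
    Engine("engine_b", {"EN", "FR"}, {"diarisation"}),
    Engine("engine_c", {"EN", "JA"}, {"audio_events"}),
]

print(recommend(ENGINES, "FR", {"diarisation"}))  # ['engine_a', 'engine_b']
```

Ranking by feature count is just one simple tie-breaker; a real recommendation would also weigh accuracy for the specific language.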

Languages

ASR technology does not cover every language, but Line 21 currently supports 126 languages, covering most global use cases.

We also support limited code switching for scenarios where multiple languages appear in the same audio.

Supported languages

Language	Code	ASR Engines
Afrikaans	AF
Albanian	SQ
Amharic	AM
Arabic	AR
Arabic (United Arab Emirates)	AR_AE
Armenian	HY
Assamese	AS
Azerbaijani	AZ
Bashkir	BA
Basque	EU
Belarusian	BE
Bengali	BN
Bosnian	BS
Bulgarian	BG
Catalan	CA
Chinese	ZH
Croatian	HR
Czech	CS
Danish	DA
Dutch	NL
English	EN
Esperanto	EO
Estonian	ET
Faroese	FO
Finnish	FI
French	FR
Galician	GL
Georgian	KA
German	DE
Greek	EL
Gujarati	GU
Haitian Creole	HT
Hausa	HA
Hawaiian	HAW
Hebrew	HE
Hindi	HI
Hungarian	HU
Icelandic	IS
Indonesian	ID
Italian	IT
Japanese	JA
Javanese	JV
Kannada	KN
Kazakh	KK
Khmer	KM
Korean	KO
Lao	LO
Latin	LA
Latvian	LV
Lingala	LN
Lithuanian	LT
Luxembourgish	LB
Macedonian	MK
Malagasy	MG
Malay	MS
Malayalam	ML
Maltese	MT
Maori	MI
Marathi	MR
Mongolian	MN
Myanmar (Burmese)	MY
Nepali	NE
Norwegian	NO
Nynorsk	NN
Occitan	OC
Pashto	PS
Persian	FA
Polish	PL
Portuguese	PT
Punjabi	PA
Romanian	RO
Russian	RU
Sanskrit	SA
Serbian	SR
Shona	SN
Sindhi	SD
Sinhala (Sinhalese)	SI
Slovak	SK
Slovenian	SL
Somali	SO
Spanish	ES
Sundanese	SU
Swahili	SW
Swedish	SV
Tagalog (Filipino)	TL
Tajik	TG
Tamasheq	TAQ
Tamil	TA
Tatar	TT
Telugu	TE
Thai	TH
Tibetan	BO
Turkish	TR
Turkmen	TK
Ukrainian	UK
Urdu	UR
Uyghur	UG
Uzbek	UZ
Vietnamese	VI
Welsh	CY
Wolof	WO
Yiddish	YI
Yoruba	YO
Zulu	ZU

Code switching

In some situations, multiple languages are spoken within the same conversation. Code switching allows an ASR engine to expect and recognise more than one language.
When enabled, the engine can detect which language is being used at any moment. Line 21 can also cross-translate between the languages detected.
To use this feature, create a multi-language input; code switching is then enabled automatically.
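Conceptually, a code-switched transcript is a sequence of segments, each tagged with the language detected for it. The minimal sketch below assumes a hypothetical list of (language code, text) pairs, which is not Line 21's actual output format, and merges consecutive segments in the same language into runs, the kind of structure a later step such as cross-translation could work from.

```python
from itertools import groupby


def language_runs(segments):
    """Merge consecutive (language_code, text) segments that share a language."""
    return [
        (lang, " ".join(text for _, text in group))
        for lang, group in groupby(segments, key=lambda seg: seg[0])
    ]


# Hypothetical segments tagged by a code-switching engine.
segments = [
    ("EN", "Welcome everyone"),
    ("EN", "to the summit."),
    ("FR", "Bonjour à tous."),
    ("EN", "Let's begin."),
]
for lang, text in language_runs(segments):
    print(f"[{lang}] {text}")
```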

Diarisation

Diarisation is the process of separating audio by speaker.
When diarisation is supported by the engine and enabled, each speaker's lines are identified and separated. Captions then clearly reflect speaker changes, helping viewers follow who is talking at any moment.
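To illustrate how speaker labels translate into captions, here is a minimal sketch that assumes the engine returns each word with a speaker label (the Word type and the "S1"/"S2" labels are hypothetical). It starts a new caption line whenever the speaker changes or the line would exceed a character limit, and prefixes each line with the speaker label.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    speaker: str  # hypothetical speaker label from diarisation, e.g. "S1"


def caption_lines(words, max_chars=32):
    """Group diarised words into caption lines.

    A new line starts when the speaker changes or when adding the next word
    would push the line text (excluding the label) past max_chars.
    """
    lines = []
    speaker = None
    current = []
    for word in words:
        too_long = len(" ".join(current + [word.text])) > max_chars
        if current and (word.speaker != speaker or too_long):
            lines.append(f"[{speaker}] " + " ".join(current))
            current = []
        speaker = word.speaker
        current.append(word.text)
    if current:
        lines.append(f"[{speaker}] " + " ".join(current))
    return lines


words = [
    Word("Hello", "S1"), Word("everyone", "S1"),
    Word("Thanks", "S2"), Word("for", "S2"), Word("having", "S2"), Word("me", "S2"),
]
for line in caption_lines(words):
    print(line)
```

Real caption formats use their own speaker-change conventions (for example a leading ">>" or a speaker name); the bracketed label here is only a placeholder.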

AV Feeds

ASR can process either direct audio inputs or audio extracted from video.
If you're using one of the video ingestion methods available in Line 21, the system will automatically handle the audio for ASR.
See the documentation on Audio feeds and Audio/Video feeds for more details.

Context

Line 21's recognition quality is enhanced by our AI context system.
The context contains all known information about your event: names, terminology, keywords, technical concepts, and more.
When the context is processed, Line 21 generates a custom vocabulary for the ASR engine, helping it recognise specific terms correctly and improving accuracy beyond that of standard ASR models.
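Line 21's AI context system generates this vocabulary for you. Purely to illustrate the idea of turning context into a term list, the sketch below uses a naive regular-expression heuristic, which is an assumption and not how Line 21 works: it collects runs of two or more capitalised words (such as personal names) and all-caps acronyms optionally followed by digits, then removes duplicates while keeping the first spelling. It misses lowercase jargon and single-word names and can pick up sentence-initial words, so treat it only as an illustration.

```python
import re

# Runs of two or more capitalised words ("Maria Lopez"),
# or acronyms optionally followed by digits ("UNHCR", "COP28").
TERM_PATTERN = re.compile(r"\b(?:[A-Z][a-z]+(?:\s+[A-Z][a-z]+)+|[A-Z]{2,}\d*)\b")


def vocabulary_from_context(context: str) -> list[str]:
    """Extract candidate vocabulary terms, de-duplicated case-insensitively."""
    seen = set()
    vocab = []
    for match in TERM_PATTERN.finditer(context):
        term = match.group()
        key = term.lower()
        if key not in seen:
            seen.add(key)
            vocab.append(term)
    return vocab


context = "Keynote by Maria Lopez of UNHCR on COP28 outcomes. Maria Lopez will take questions."
print(vocabulary_from_context(context))
```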

Troubleshooting

If speech recognition accuracy is lower than expected, review the common issues below.
Most ASR problems fall into one of these categories.

Audio Quality Issues

Poor audio is the main cause of ASR errors.

Symptoms:

Words sound distorted, muffled, or clipped
Volume is too low or too high
Background noise overwhelms speech
Audio drops out or is interrupted

How to fix:

Make sure your microphone or audio source is correctly placed and not too far from the speaker.
Avoid using built-in laptop microphones for important events.
Reduce background noise (fans, music, chatter, AC units).
If using AV feeds, verify the audio track being ingested is the cleanest available.
Check your audio format and ensure it matches the engine's requirements.
Check that your internet connection is stable.
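If you can record a short sample of your feed as a WAV file, a quick script can check its format and levels before the event. The sketch below uses only Python's standard library and assumes 16-bit PCM audio. The thresholds in audio_warnings (16 kHz minimum sample rate, -40 dBFS average level, 0.1% clipped samples) are illustrative assumptions, not the requirements of any particular engine; check your engine's documentation for its actual input requirements.

```python
import array
import math
import sys
import wave

FULL_SCALE = 32768.0  # largest magnitude of a 16-bit sample


def to_dbfs(value):
    """Convert a linear 16-bit amplitude to dB relative to full scale."""
    return 20 * math.log10(value / FULL_SCALE) if value > 0 else float("-inf")


def inspect_wav(source):
    """Return format and level statistics for a 16-bit PCM WAV file.

    `source` may be a file path or a binary file object.
    """
    with wave.open(source, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM audio")
        rate = wav.getframerate()
        channels = wav.getnchannels()
        samples = array.array("h", wav.readframes(wav.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    if not samples:
        raise ValueError("the file contains no audio")
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    clipped = sum(1 for s in samples if abs(s) >= 32767)
    return {
        "sample_rate": rate,
        "channels": channels,
        "peak_dbfs": to_dbfs(peak),
        "rms_dbfs": to_dbfs(rms),
        "clipped_fraction": clipped / len(samples),
    }


def audio_warnings(stats, min_rate=16000, quiet_dbfs=-40.0, max_clipped=0.001):
    """Flag common problems. The thresholds are illustrative assumptions."""
    warnings = []
    if stats["sample_rate"] < min_rate:
        warnings.append(f"sample rate {stats['sample_rate']} Hz is below {min_rate} Hz")
    if stats["rms_dbfs"] < quiet_dbfs:
        warnings.append(f"average level {stats['rms_dbfs']:.1f} dBFS is very low")
    if stats["clipped_fraction"] > max_clipped:
        warnings.append("audio is clipping (distortion likely)")
    return warnings
```

For example, audio_warnings(inspect_wav("rehearsal.wav")) returns an empty list when none of these checks fail (the file name is hypothetical).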

Missing Vocabulary

If specific words are not recognised (names, cities, products, institutions), the issue is usually the AI Context.

How to fix:

Verify that your AI Context includes all relevant names, technical terms, abbreviations, and jargon.
Add difficult words with phonetic hints if needed.
Re-upload updated context before your event starts.
See AI Context documentation for best practices.
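After a rehearsal, you can check which key terms the engine actually recognised. This minimal sketch assumes you have a plain-text transcript and your list of context terms; the missing_terms function is hypothetical and not a Line 21 feature. Any term it reports is a candidate for adding to the AI Context, with a phonetic hint if needed.

```python
import re


def missing_terms(terms, transcript):
    """Return the terms that do not appear in the transcript as whole words.

    Matching is case-insensitive. The lookarounds treat a term as whole when
    it does not directly touch other letters or digits, which also works for
    terms such as "C++" where a plain \\b boundary would fail.
    """
    missing = []
    for term in terms:
        pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
        if not re.search(pattern, transcript, flags=re.IGNORECASE):
            missing.append(term)
    return missing


terms = ["Maria Lopez", "UNHCR", "Kubernetes", "C++"]
transcript = "thank you maria lopez, the un hcr delegate, and our c++ team"
print(missing_terms(terms, transcript))
```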

Speaker Overlapping

ASR engines still struggle when people talk simultaneously.

Symptoms:

Words become jumbled
Parts of sentences disappear
Accuracy drops only during fast interactions or debates

How to fix:

Encourage speakers to avoid overlapping speech.
If possible, normalise the audio so speakers have similar volume.
Enable Diarisation (if supported by the chosen engine) to help distinguish speakers.
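Normalisation is usually done in your mixer or audio software. To show what "similar volume" means in numbers, here is a minimal sketch that scales each speaker's track to a common RMS level, assuming each speaker is on a separate track of 16-bit integer samples and an arbitrary target of -20 dBFS. It caps the gain so the loudest sample does not clip. Production loudness normalisation often uses perceptual measures such as LUFS (ITU-R BS.1770) rather than plain RMS.

```python
import math

FULL_SCALE = 32768.0  # largest magnitude of a 16-bit sample


def rms_dbfs(samples):
    """Root-mean-square level of 16-bit samples, in dB relative to full scale."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms / FULL_SCALE) if rms > 0 else float("-inf")


def normalise(samples, target_dbfs=-20.0):
    """Scale samples so their RMS level is target_dbfs, capping the gain to avoid clipping."""
    current = rms_dbfs(samples)
    if current == float("-inf"):
        return list(samples)  # silence or no data: nothing to scale
    gain = 10 ** ((target_dbfs - current) / 20)
    peak = max(abs(s) for s in samples)
    gain = min(gain, (FULL_SCALE - 1) / peak)  # keep the loudest sample within 16-bit range
    return [round(s * gain) for s in samples]


# Hypothetical tracks: a quiet speaker and a loud one.
quiet_speaker = [200, -200] * 10
loud_speaker = [5000, -5000] * 10
for track in (quiet_speaker, loud_speaker):
    print(round(rms_dbfs(normalise(track)), 1))
```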

Wrong Language

If parts of the transcript appear in the wrong language, check your language settings.

How to fix:

Confirm you selected the correct language for the project.
If the content includes multiple languages, enable code switching by creating a multi-language input.
See Languages for configuration guidance.

Engine Limitations

Different engines have different strengths.

How to fix:

Try switching to another ASR engine optimised for your language or use case.
Review which engines support diarisation, custom vocabulary, or audio event detection.
See ASR Engines for recommendations.

Still having issues?

If the problem persists, contact us and we'll be happy to help you troubleshoot.
