Speech Recognition (ASR)

What is ASR?

Automatic Speech Recognition (ASR) is the process of converting spoken audio into written text. Modern ASR systems are powered by AI models whose quality and accuracy have improved significantly over the last few years, including in challenging conditions such as noisy backgrounds or heavily accented speech. These models are under active development, and new versions are released every few months.

However, ASR can still struggle with overlapping speakers, and accuracy may be lower than that of professional human captioners. Despite these limits, ASR remains highly accessible, fast and cost-effective, making it a strong option for many use cases.

ASR Engines

Line 21 offers several ASR engines and automatically recommends the best fit based on the language spoken in your audio. Engines differ in their capabilities, such as speaker diarisation, custom vocabulary support, or audio event detection (e.g., music or laughter).

Because every use case is different, Line 21 lets you choose the engine that matches your needs and helps you get the most out of speech recognition.

Capabilities such as diarisation, code switching and audio event detection vary by engine. The engines currently available, with the number of languages each supports:

ASR Engine        Supported Languages
Speechmatics      55
Gladia            104
Deepgram Nova     41
AWS Transcribe    43

Languages

ASR technology does not yet cover every language, but Line 21 currently supports 110 languages, covering most global use cases.

We also support limited code switching for scenarios where multiple languages appear in the same audio.

Supported languages

Language              Code   Number of ASR engines
Afrikaans             AF     2
Albanian              SQ     1
Amharic               AM     1
Arabic                AR     3
Armenian              HY     1
Assamese              AS     1
Azerbaijani           AZ     1
Bashkir               BA     2
Basque                EU     3
Belarusian            BE     2
Bengali               BN     1
Bosnian               BS     1
Bulgarian             BG     3
Catalan               CA     4
Chinese               ZH     4
Croatian              HR     3
Czech                 CS     4
Danish                DA     4
Dutch                 NL     4
English               EN     4
Esperanto             EO     1
Estonian              ET     3
Faroese               FO     1
Finnish               FI     4
French                FR     4
Galician              GL     3
Georgian              KA     1
German                DE     4
Greek                 EL     4
Gujarati              GU     1
Haitian Creole        HT     1
Hausa                 HA     1
Hawaiian              HAW    1
Hebrew                HE     3
Hindi                 HI     4
Hungarian             HU     3
Icelandic             IS     1
Indonesian            ID     4
Italian               IT     4
Japanese              JA     4
Javanese              JV     1
Kannada               KN     1
Kazakh                KK     1
Khmer                 KM     1
Korean                KO     4
Lao                   LO     1
Latin                 LA     1
Latvian               LV     4
Lingala               LN     1
Lithuanian            LT     3
Luxembourgish         LB     1
Macedonian            MK     1
Malagasy              MG     1
Malay                 MS     4
Malayalam             ML     1
Maltese               MT     1
Maori                 MI     1
Marathi               MR     2
Mongolian             MN     2
Myanmar (Burmese)     MY     1
Nepali                NE     1
Norwegian             NO     4
Nynorsk               NN     1
Occitan               OC     1
Pashto                PS     1
Persian               FA     2
Polish                PL     4
Portuguese            PT     4
Punjabi               PA     1
Romanian              RO     4
Russian               RU     4
Sanskrit              SA     1
Serbian               SR     2
Shona                 SN     1
Sindhi                SD     1
Sinhala (Sinhalese)   SI     1
Slovak                SK     3
Slovenian             SL     2
Somali                SO     1
Spanish               ES     4
Sundanese             SU     1
Swahili               SW     2
Swedish               SV     4
Tagalog (Filipino)    TL     2
Tajik                 TG     1
Tamasheq              TAQ    1
Tamil                 TA     3
Tatar                 TT     1
Telugu                TE     1
Thai                  TH     4
Tibetan               BO     1
Turkish               TR     3
Turkmen               TK     1
Ukrainian             UK     4
Urdu                  UR     1
Uyghur                UG     1
Uzbek                 UZ     1
Vietnamese            VI     4
Welsh                 CY     2
Wolof                 WO     1
Yiddish               YI     1
Yoruba                YO     1
Zulu                  ZU     1

Code switching

In some situations, multiple languages are spoken within the same conversation. Code switching allows an ASR engine to expect and recognise more than one language.
When enabled, the engine can detect which language is being used at any moment. Line 21 can also cross-translate between the languages detected.
To use this feature, create a multi-language input to enable automatic code switching.

Diarisation

Diarisation is the process of separating audio by speaker.
When diarisation is supported by the engine and enabled, each speaker's lines are identified and separated. Captions then reflect speaker changes clearly, helping viewers follow who is talking at any moment.
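To illustrate how diarised output can become speaker-labelled captions, here is a minimal Python sketch. The `Segment` shape, speaker labels such as "S1", and the `to_captions` helper are hypothetical; each engine returns diarisation results in its own format, and this is not a description of how Line 21 builds captions internally.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    """One recognised chunk of speech (hypothetical shape, not a real engine schema)."""
    speaker: str  # diarisation label assigned by the engine, e.g. "S1"
    text: str


def to_captions(segments: List[Segment]) -> List[str]:
    """Join consecutive segments from the same speaker into one caption line,
    starting a new line (prefixed with the speaker label) whenever the speaker changes."""
    captions: List[str] = []
    current_speaker: Optional[str] = None
    current_text: List[str] = []
    for seg in segments:
        if seg.speaker != current_speaker and current_text:
            captions.append(f"{current_speaker}: {' '.join(current_text)}")
            current_text = []
        current_speaker = seg.speaker
        current_text.append(seg.text)
    if current_text:
        captions.append(f"{current_speaker}: {' '.join(current_text)}")
    return captions


segments = [
    Segment("S1", "Welcome everyone."),
    Segment("S1", "Let's begin."),
    Segment("S2", "Thanks for having me."),
]
for line in to_captions(segments):
    print(line)
# S1: Welcome everyone. Let's begin.
# S2: Thanks for having me.
```

Grouping only on speaker changes keeps the example short; a real captioning pipeline would also split long turns by duration or line length.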

AV Feeds

ASR can process either direct audio inputs or audio extracted from video.
If you're using one of the video ingestion methods available in Line 21, the system will automatically handle the audio for ASR.
See the documentation on Audio/Video feeds for more details.

Context

Line 21's recognition quality is enhanced by our AI context system.
The context contains all known information about your event: names, terminology, keywords, technical concepts, and more.
When the context is processed, Line 21 generates a custom vocabulary for the ASR engine, ensuring that specific terms are understood correctly and improving accuracy beyond standard ASR models.

Troubleshooting

If speech recognition accuracy is lower than expected, review the common issues below.
Most ASR problems fall into one of these categories.

Audio Quality Issues

Poor audio is the main cause of ASR errors.

Symptoms:

  • Words sound distorted, muffled, or clipped
  • Volume is too low or too high
  • Background noise overwhelms speech
  • Audio drops out or is interrupted

How to fix:

  • Make sure your microphone or audio source is correctly placed and not too far from the speaker.
  • Avoid using built-in laptop microphones for important events.
  • Reduce background noise (fans, music, chatter, AC units).
  • If using AV feeds, verify the audio track being ingested is the cleanest available.
  • Check your audio format and ensure it matches the engine's requirements.
  • Check that your internet connection is stable.
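Some of the level and format checks above can be approximated with a short Python sketch that uses only the standard library. It assumes a 16-bit PCM WAV recording. The helper names and thresholds (16 kHz minimum sample rate, -40 dBFS RMS, more than 0.1% clipped samples) are illustrative assumptions, not Line 21 or engine requirements; check your chosen engine's documentation for its actual audio requirements.

```python
import array
import math
import sys
import wave
from typing import Dict, List, Sequence, Tuple

FULL_SCALE = 32768.0  # magnitude of the most negative 16-bit sample


def read_wav_16bit(path: str) -> Tuple[List[int], int, int]:
    """Return (interleaved samples, sample rate, channel count) for a 16-bit PCM WAV."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        rate, channels = wf.getframerate(), wf.getnchannels()
        raw = wf.readframes(wf.getnframes())
    samples = array.array("h", raw)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is stored little-endian
    return samples.tolist(), rate, channels


def to_dbfs(level: float) -> float:
    """Convert a sample magnitude to decibels relative to full scale."""
    return 20 * math.log10(level / FULL_SCALE) if level > 0 else float("-inf")


def level_report(samples: Sequence[int], rate: int) -> Dict[str, float]:
    """Measure peak level, RMS level and the share of clipped samples."""
    if not samples:
        raise ValueError("no audio samples")
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    clipped = sum(1 for s in samples if abs(s) >= 32767)  # at or beyond full scale
    return {
        "sample_rate": rate,
        "peak_dbfs": to_dbfs(peak),
        "rms_dbfs": to_dbfs(rms),
        "clipped_fraction": clipped / len(samples),
    }


def audio_warnings(report: Dict[str, float]) -> List[str]:
    """Flag likely problems. Thresholds are illustrative assumptions only."""
    issues = []
    if report["sample_rate"] < 16000:
        issues.append("sample rate below 16 kHz")
    if report["rms_dbfs"] < -40:
        issues.append("speech level very low")
    if report["clipped_fraction"] > 0.001:
        issues.append("clipping detected")
    return issues


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_audio.py recording.wav
    samples, rate, _ = read_wav_16bit(sys.argv[1])
    print(audio_warnings(level_report(samples, rate)))
```

Keeping the measurement (`level_report`) separate from file reading makes it easy to run the same checks on audio captured by other means.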

Missing Vocabulary

If specific words are not recognised (names, cities, products, institutions), the issue is usually the AI Context.

How to fix:

  • Verify that your AI Context includes all relevant names, technical terms, abbreviations, and jargon.
  • Add difficult words with phonetic hints if needed.
  • Re-upload updated context before your event starts.
  • See the AI Context documentation for best practices.
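Before an event, it can help to compare your context terms against a transcript from a rehearsal to see which ones the engine is missing. The hypothetical `missing_terms` helper below is a minimal sketch and not part of Line 21: it reports every expected term that never appears in the transcript, matching case-insensitively on whole words. The glossary and transcript are made-up examples.

```python
import re
from typing import Iterable, List


def missing_terms(expected: Iterable[str], transcript: str) -> List[str]:
    """Return the expected terms that never appear in the transcript.

    Matching is case-insensitive and on whole words: a term must not be
    directly preceded or followed by a letter, digit or underscore.
    """
    missing = []
    for term in expected:
        pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
        if not re.search(pattern, transcript, flags=re.IGNORECASE):
            missing.append(term)
    return missing


glossary = ["Line 21", "Speechmatics", "diarisation", "Kubernetes"]
rehearsal = (
    "Welcome to line 21. Today we compare speech matics "
    "with other engines and discuss diarisation."
)
print(missing_terms(glossary, rehearsal))  # ['Speechmatics', 'Kubernetes']
```

Terms it reports, such as "Speechmatics" transcribed as "speech matics" here, are good candidates for phonetic hints in your context. Lookarounds are used instead of `\b` so that terms beginning or ending with punctuation, such as "C++", still match.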

Speaker Overlapping

ASR engines still struggle when people talk simultaneously.

Symptoms:

  • Words become jumbled
  • Parts of sentences disappear
  • Accuracy drops only during fast interactions or debates

How to fix:

  • Encourage speakers to avoid overlapping speech.
  • If possible, normalise the audio so speakers have similar volume.
  • Enable Diarisation (if supported by the chosen engine) to help distinguish speakers.
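If speakers are recorded on separate channels or microphones, one simple way to bring them to a similar volume is to scale each clip toward a common RMS level. The sketch below assumes float samples between -1.0 and 1.0; `normalise`, its default target level and its peak cap are hypothetical illustrations. Production audio tools typically measure loudness in LUFS and apply gain continuously rather than once per clip.

```python
import math
from typing import List, Sequence


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square level of a clip (0.0 for an empty clip)."""
    return math.sqrt(sum(s * s for s in samples) / len(samples)) if samples else 0.0


def normalise(samples: Sequence[float], target_rms: float = 0.1,
              max_peak: float = 0.99) -> List[float]:
    """Scale a clip toward target_rms, capping the gain so no sample exceeds max_peak."""
    current = rms(samples)
    if current == 0.0:
        return list(samples)  # silence: nothing to scale
    gain = target_rms / current
    peak = max(abs(s) for s in samples)
    gain = min(gain, max_peak / peak)  # never push the loudest sample into clipping
    return [s * gain for s in samples]


quiet_speaker = [0.01, -0.01] * 100
loud_speaker = [0.5, -0.5] * 100
print(round(rms(normalise(quiet_speaker)), 3), round(rms(normalise(loud_speaker)), 3))
# 0.1 0.1
```

Capping the gain by the peak trades a slightly lower level for no added distortion, since clipped audio is itself a common cause of ASR errors.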

Wrong Language

If parts of the transcript appear in the wrong language, check your language settings.

How to fix:

  • Confirm you selected the correct language for the project.
  • If the content includes multiple languages, configure Code Switching and create a multi-language input.
  • See Languages for configuration guidance.

Engine Limitations

Different engines have different strengths.

How to fix:

  • Try switching to another ASR engine optimised for your language or use case.
  • Review which engines support diarisation, custom vocabulary, or audio event detection.
  • See ASR Engines for recommendations.

Still having issues?

If the problem persists, contact us and we'll be happy to help you troubleshoot.

Last updated: January 13, 2026 at 09:22 AM
