Novelty and ambition

The main objective of the project is to develop a high-quality auto-transcriber for conversational Norwegian speech. No such system currently exists, and major knowledge and technology components needed to build one are still missing.

The project will produce theoretical insights and algorithmic developments, both of a generic nature and specifically for Norwegian language technology. Transcribed conversational speech data are scarce in all languages, which necessitates the development of semi- or self-supervised methods, as well as hybrid data-driven/knowledge-based learning approaches whose principles will be applicable to the general field of machine learning. These developments will enable us to generalize the concept of attention and create a system that understands the context and meaning of the conversation, which in turn translates into better performance and more coherent transcriptions.

Dialects and accents create problems for ASR systems in most languages. Our approaches to lexical and pronunciation modeling will help improve the state of the art in ASR systems for natural speech transcription. Furthermore, for Norwegian, the approaches for mapping spoken dialect to an appropriate written form are unique. In addition, interesting linguistic information may come to light by analyzing the acoustic-phonetic patterns of the different dialects. This contributes to new knowledge about the Norwegian language itself, to improved transcription performance, and to the inclusion of a greater variety of groups using Norwegian dialects.

The project will introduce new metrics that better attend to the contextual and semantic information of the conversation, in contrast to conventional metrics, which give equal weight to all transcription errors. These metrics will provide a novel approach both to assessing system performance in a way that is closer to human judgment and to improving the learning and optimization of ASR systems.
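The contrast between uniform and context-sensitive error weighting described above can be illustrated with a small sketch. It is not the metric the project will develop: the function names, the example utterance, and the hand-set word weights are all hypothetical and chosen only to show how a conventional word error rate treats every error alike, while a weighted variant can penalize a meaning-changing error more than a dropped filler word.

```python
from typing import Callable, Sequence


def weighted_edit_cost(
    ref: Sequence[str], hyp: Sequence[str], weight: Callable[[str], float]
) -> float:
    """Minimum weighted edit cost of turning `ref` into `hyp` (word level)."""
    n, m = len(ref), len(hyp)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = d[i - 1][0] + weight(ref[i - 1])  # delete a reference word
    for j in range(1, m + 1):
        d[0][j] = d[0][j - 1] + weight(hyp[j - 1])  # insert a hypothesis word
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if ref[i - 1] == hyp[j - 1]:
                diag = d[i - 1][j - 1]  # match, no cost
            else:
                # Substitution costs as much as the heavier of the two words.
                diag = d[i - 1][j - 1] + max(weight(ref[i - 1]), weight(hyp[j - 1]))
            d[i][j] = min(
                diag,
                d[i - 1][j] + weight(ref[i - 1]),
                d[i][j - 1] + weight(hyp[j - 1]),
            )
    return d[n][m]


def error_rate(
    ref: Sequence[str],
    hyp: Sequence[str],
    weight: Callable[[str], float] = lambda word: 1.0,
) -> float:
    """With the default uniform weight this is the conventional word error rate."""
    return weighted_edit_cost(ref, hyp, weight) / sum(weight(w) for w in ref)


# Hypothetical weights (an assumption for illustration only): the filler "eh"
# carries little meaning, the negation "ikke" carries a lot.
ILLUSTRATIVE_WEIGHTS = {"eh": 0.1, "ikke": 2.0}


def illustrative_weight(word: str) -> float:
    return ILLUSTRATIVE_WEIGHTS.get(word, 1.0)


reference = "eh jeg kommer ikke i morgen".split()
drops_filler = "jeg kommer ikke i morgen".split()  # loses only "eh"
drops_negation = "eh jeg kommer i morgen".split()  # loses "ikke", meaning flips

uniform_filler = error_rate(reference, drops_filler)
uniform_negation = error_rate(reference, drops_negation)
weighted_filler = error_rate(reference, drops_filler, illustrative_weight)
weighted_negation = error_rate(reference, drops_negation, illustrative_weight)
```

Under uniform weighting the two hypotheses receive the same score, since each contains exactly one deleted word. Under the illustrative weights, the hypothesis that drops the negation scores clearly worse than the one that drops the filler, which is closer to how a human reader would rank them.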

This project has a strong focus on releasing databases of transcribed speech that will allow even small teams to make a real contribution to speech technology. In addition, we will open-source our pre-trained models so that the community can use them and contribute further improvements, for the benefit of Norwegian society.

The topics described above can be studied almost independently, but our main goal is reached only when they are all combined in the same project. This ensures that our work packages (WPs) enjoy a certain degree of independence, avoiding bottlenecks, while at the same time making possible tight collaboration between the different groups and areas of research discussed in this proposal.


  • WP1 - Modelling conversational speech

  • WP2 - Characterization and handling of dialects

  • WP3 - Metrics and system integration

  • WP4 - Linguistic Resources and Data Management