MSTRE-Net: Multistreaming Acoustic Modeling for Automatic Lyrics Transcription
Royal College of Music in Stockholm, Department of Folk Music. Kungliga Musikhögskolan, Stockholm. ORCID iD: 0000-0002-4756-1441
Number of Authors: 3
2021 (English) Conference paper, Poster (with or without abstract) (Refereed)
Abstract [en]

This paper makes several contributions to automatic lyrics transcription (ALT) research. Our main contribution is a novel variant of the Multistreaming Time-Delay Neural Network (MTDNN) architecture, called MSTRE-Net, which processes temporal information using multiple parallel streams with varying resolutions. This keeps the network more compact, yielding faster inference and a better recognition rate than an architecture with identical TDNN streams. In addition, we propose two novel preprocessing steps applied before training the acoustic model. First, we suggest using recordings from both the monophonic and polyphonic domains when training the acoustic model. Second, we tag monophonic and polyphonic recordings with distinct labels to discriminate non-vocal silence and music instances during alignment. Moreover, we present a new test set that is considerably larger and has higher musical variability than the existing datasets used in the ALT literature, while maintaining the gender balance of the singers. Our best-performing model sets the state of the art in lyrics transcription by a large margin. For reproducibility, we publicly share the identifiers needed to retrieve the data used in this paper.

Place, publisher, year, edition, pages
2021.
Keywords [en]
automatic lyrics transcription, music information retrieval, computational linguistics, automatic speech recognition
National Category
Other Computer and Information Science
Identifiers
URN: urn:nbn:se:kmh:diva-4324
OAI: oai:DiVA.org:kmh-4324
DiVA, id: diva2:1622917
Conference
ISMIR2021, International Society for Music Information Retrieval
Available from: 2021-12-26 Created: 2021-12-26 Last updated: 2022-01-03 Bibliographically approved

Open Access in DiVA

No full text in DiVA

Search in DiVA

By author/editor
Ahlbäck, Sven