Marco Dinarelli


LIG (UMR 5217)
Office 327
700 avenue Centrale
Campus de Saint-Martin-d’Hères, France


Email:
marco [dot] dinarelli [at] univ-grenoble-alpes [dot] fr
marco [dot] dinarelli [at] gmail [dot] com

Curriculum Vitae | Profile of Marco Dinarelli on LinkedIn


Latest news

2021 / 04 / 19:
French wav2vec 2.0 models available here. Associated SLU (and other benchmark) systems available here.

2020 / 11 / 10:
TarcMTS, the Fairseq multi-task system associated with our WANLP 2020 paper, is available.

Data for the system described in the paper submitted to Interspeech 2021

On this page I provide the input features used for the experiments described in the paper submitted to Interspeech 2021 (link coming soon, whether the paper is accepted or not...). For a description of the system, please see our git repository for Interspeech 2021.

Features

The features must be passed to the system with the option --serialized-corpus data-prefix, where data-prefix is the prefix common to all filenames (train, dev, test and dict). For example, to use the spectrogram features of the MEDIA corpus (currently the only corpus provided here), the option is --serialized-corpus MEDIA.user+machine.spectro-Fr-Normalized.data.
All splits plus the dictionary must be downloaded for the system to work.
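The option above can be sketched as follows; a minimal illustration only, since the actual entry-point script is not named on this page (see the Interspeech 2021 git repository for the real command):

```python
# A minimal sketch of passing the serialized-feature prefix to the system.
# The entry point is hypothetical; only the --serialized-corpus option and
# the MEDIA prefix below come from this page.
prefix = "MEDIA.user+machine.spectro-Fr-Normalized.data"

# All four files (train, dev, test and dict) must share this prefix and
# be present on disk for the system to work.
args = ["--serialized-corpus", prefix]
print(" ".join(args))
```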


Feature description
Type | Train | Dev | Test | Dict | SLU Model
Spectrogram | Download | Download | Download | Download | Download
W2V2-En-base | Download | Download | Download | Download | Download
W2V2-En-large | Download | Download | Download | Download | Download
W2V2-Fr-S-base | Download | Download | Download | Download | Download
W2V2-Fr-S-large | Download | Download | Download | Download | Download
W2V2-Fr-M-base | Download | Download | Download | Download | Download
W2V2-Fr-M-large | Download | Download | Download | Download | Download
XLSR53 | Download | Download | Download | Download | Download

Results

In the following tables we report the results obtained on the MEDIA corpus with the system described in the paper and in the repository.

Token decoding (Word Error Rate)

Model | Input Features | DEV ER | TEST ER

Comparison to our previous work:
ICASSP 2020 Seq | Spectrogram | 29.42 | 28.71

Interspeech 2021:
Kheops+Basic | Spectrogram | 36.25 | 37.16
Kheops+Basic | W2V2-En-base | 19.80 | 21.78
Kheops+Basic | W2V2-En-large | 24.44 | 26.96
Kheops+Basic | W2V2-Fr-S-base | 23.11 | 25.22
Kheops+Basic | W2V2-Fr-S-large | 18.48 | 19.92
Kheops+Basic | W2V2-Fr-M-base | 14.97 | 16.37
Kheops+Basic | W2V2-Fr-M-large | 11.77 | 12.85
Kheops+Basic | XLSR53-large | 14.98 | 15.74
Concept decoding (Concept Error Rate)

Model | Input Features | DEV ER | TEST ER

Comparison to our previous work:
ICASSP 2020 Seq | Spectrogram | 28.11 | 27.52
ICASSP 2020 XT | Spectrogram | 23.39 | 24.02

Interspeech 2021:
Kheops+Basic | Spectrogram | 39.66 | 40.76
Kheops+Basic +token | Spectrogram | 34.38 | 34.74
Kheops+LSTM +SLU | Spectrogram | 33.63 | 34.76
Kheops+Basic +token | W2V2-En-base | 26.79 | 26.57
Kheops+LSTM +SLU | W2V2-En-base | 26.31 | 26.11
Kheops+Basic +token | W2V2-En-large | 29.31 | 30.39
Kheops+LSTM +SLU | W2V2-En-large | 28.38 | 28.57
Kheops+Basic +token | W2V2-Fr-S-base | 27.18 | 28.27
Kheops+LSTM +SLU | W2V2-Fr-S-base | 26.16 | 26.69
Kheops+Basic +token | W2V2-Fr-S-large | 23.34 | 23.75
Kheops+LSTM +SLU | W2V2-Fr-S-large | 22.53 | 23.03
Kheops+Basic +token | W2V2-Fr-M-base | 22.11 | 21.30
Kheops+LSTM +SLU | W2V2-Fr-M-base | 22.56 | 22.24
Kheops+Basic +token | W2V2-Fr-M-large | 21.72 | 21.35
Kheops+LSTM +SLU | W2V2-Fr-M-large | 18.54 | 18.62
Kheops+Basic +token | XLSR53-large | 21.00 | 20.67
Kheops+LSTM +SLU | XLSR53-large | 20.34 | 19.73
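
To put the numbers above in perspective, a short sketch computing relative error-rate reduction (the error-rate values are copied from the tables above; the pairings chosen for comparison are illustrative, not from the paper):

```python
# Relative error-rate reduction on the MEDIA test set, using values
# copied from the tables above.
def relative_reduction(baseline: float, system: float) -> float:
    """Relative reduction of an error rate, as a percentage."""
    return 100.0 * (baseline - system) / baseline

# Token decoding (WER): Kheops+Basic with spectrogram input vs. the same
# model with W2V2-Fr-M-large features (TEST ER: 37.16 vs. 12.85).
print(f"WER reduction: {relative_reduction(37.16, 12.85):.1f}%")

# Concept decoding (CER): ICASSP 2020 XT vs. Kheops+LSTM +SLU with
# W2V2-Fr-M-large features (TEST ER: 24.02 vs. 18.62).
print(f"CER reduction: {relative_reduction(24.02, 18.62):.1f}%")
```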