AISHELL-5
Identifier: SLR159
Summary: The First Open-Source In-Car Multi-Channel Multi-Speaker Speech Dataset for Automatic Speech Diarization and Recognition, provided by Beijing AISHELL Technology Co., Ltd.
Category: Speech
License: CC BY-SA 4.0
Downloads (use a mirror closer to you):
train.tar.gz [52G] (Training set, far-field and near-field microphone speech and transcripts) Mirrors:
[US]
[EU]
[CN]
Dev.tar.gz [2.1G] (Development set) Mirrors:
[US]
[EU]
[CN]
Eval1.tar.gz [1.8G] (Evaluation set) Mirrors:
[US]
[EU]
[CN]
Eval2.tar.gz [2.1G] (Evaluation set) Mirrors:
[US]
[EU]
[CN]
noise.tar.gz [13G] (Noise set) Mirrors:
[US]
[EU]
[CN]
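Once an archive from the list above has been downloaded, it can be unpacked with Python's standard `tarfile` module. The following is a minimal sketch, not an official tool: the helper name `extract_archive` and the destination directory are hypothetical, and the internal layout of the archives is not described here, so the function only reports the top-level entries it finds. It performs a basic path check and does not inspect symbolic links.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path, dest_dir):
    """Extract a .tar.gz archive into dest_dir.

    Returns the sorted names of the top-level entries in the archive.
    Hypothetical helper for illustration only.
    """
    dest_root = Path(dest_dir)
    dest_root.mkdir(parents=True, exist_ok=True)
    dest_root = dest_root.resolve()

    top_level = set()
    with tarfile.open(archive_path, "r:gz") as tar:
        # Iterate member by member so a large archive is read in one pass.
        for member in tar:
            target = (dest_root / member.name).resolve()
            # Refuse entries that would be written outside dest_dir.
            if target != dest_root and dest_root not in target.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")
            tar.extract(member, path=dest_root)
            parts = Path(member.name).parts
            if parts:
                top_level.add(parts[0])
    return sorted(top_level)
```

For example, `extract_archive("Dev.tar.gz", "aishell5")` would unpack the development set into a local `aishell5` directory (an assumed name). The training archive is 52G, so check free disk space before extracting it.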
About this resource:
You can cite the data using the following BibTeX entry:
@inproceedings{AISHELL-5_2025,
  title={AISHELL-5: The First Open-Source In-Car Multi-Channel Multi-Speaker Speech Dataset for Automatic Speech Diarization and Recognition},
  author={Yuhang Dai and He Wang and Xingchen Li and Zihan Zhang and Shuiyuan Wang and Lei Xie and Xin Xu and Hongxiao Guo and Shaoji Zhang and Hui Bu and Wei Chen},
  booktitle={Interspeech},
  url={https://arxiv.org/pdf/2505.23036},
  year={2025}
}
External URL: https://www.aishelltech.com/AISHELL_5 (full description from the company website)