LibriSpeech-PC
Identifier: SLR145
Summary: LibriSpeech text with Punctuation and Capitalization
Category: Text
License: CC BY 4.0
Downloads (use a mirror closer to you):
manifests.tar.gz [25M] (Manifest files that match the original LibriSpeech splits) Mirrors:
[US]
[EU]
[CN]
About this resource:
LibriSpeech-PC: A dataset based on LibriSpeech* with restored punctuation and capitalization.
- The dataset includes ONLY .json manifests and NO audio files. Audio files can be taken from the original LibriSpeech: https://www.openslr.org/12
- Subsets' structure is preserved.
- Some samples were dropped during punctuation and capitalization restoration; see STATISTICS for details.
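Because the manifests ship without audio, each entry has to be paired with a file from the original LibriSpeech download. The following is a minimal sketch of that pairing, not a documented loader. It assumes each manifest holds one JSON object per line and stores a relative audio path under a key named `audio_filepath`. The key name, the line-per-object layout, and the helper `read_manifest` are assumptions for illustration, so check an actual manifest before relying on them.

```python
import json
from pathlib import Path


def read_manifest(manifest_path, audio_root, path_key="audio_filepath"):
    """Yield manifest entries with the audio path resolved against audio_root.

    Assumptions (not stated on this page): one JSON object per line, and a
    relative audio path stored under `path_key`.
    """
    audio_root = Path(audio_root)
    with open(manifest_path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            entry = json.loads(line)
            # Point the entry at the audio inside the original LibriSpeech tree.
            entry["resolved_audio_path"] = str(audio_root / entry[path_key])
            yield entry
```

Since some samples were dropped, iterating over the manifest rather than over the audio directory keeps only the utterances that have restored text.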
You can cite the data using the following BibTeX entry:
@article{meister2023librispeechpc,
  title={LibriSpeech-PC: Benchmark for Evaluation of Punctuation and Capitalization Capabilities of end-to-end ASR Models},
  author={A. Meister and M. Novikov and N. Karpov and E. Bakhturina and V. Lavrukhin and B. Ginsburg},
  journal={arXiv preprint arXiv:2310.02943},
  year={2023},
}