PTIEN-NY dataset

Collection process

The audio was collected using the Public Transport Information service for New York, provided in English (PTIEN-NY). PTIEN-NY is an automated service built on the Alex spoken dialogue system framework. See the project website.

Release Notes

The data are released in raw form and have not been thoroughly checked or validated. The audio transcriptions were obtained through crowdsourcing.

Audio

Storage format: 16 kHz, signed-integer, 16-bit, little-endian WAV
Recorded via an HTML5 audio element or a VoIP telephone channel
The directory all contains the audio from the whole dialogues, but with the customer's voice only (no TTS).
The directory recorded contains the customers' audio: 4166 files with a total duration of 03:13:35.06.
- Contains key_transcriptions.scp, which stores the transcription of each recorded wav file in the folders
The directory ptien-ny-extracted-flat contains a subset of recorded: 1328 files with a total duration of 00:58:49.09.
- Contains scp files with the transcriptions; the train, dev, and test sets are split disjointly.
- all-trns.scp - 1328 utterances
- dev-trns.scp - 200 utterances
- test-trns.scp - 400 utterances
- train-trns.scp - 729 utterances
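
Because the data are released unvalidated, it may be worth checking the audio format and the split sizes locally. The listed dev, test, and train counts add up to 1329, while all-trns.scp is listed with 1328 utterances, so counting the keys is a quick way to see which figures hold. The following is a minimal sketch, not part of the repository. It assumes that each non-empty line of an scp file starts with an utterance key followed by whitespace and the transcription; that layout is not documented here, so adjust read_scp_keys if the files differ. The function names are hypothetical.

```python
import wave
from pathlib import Path


def wav_summary(path):
    """Return (sample_rate_hz, bits_per_sample, duration_seconds) for a WAV file."""
    with wave.open(str(path), "rb") as wav:
        rate = wav.getframerate()
        bits = 8 * wav.getsampwidth()
        duration = wav.getnframes() / rate
    return rate, bits, duration


def read_scp_keys(path):
    """Collect the first whitespace-separated token of every non-empty line.

    ASSUMPTION: each line looks like "<key> <transcription>".
    """
    keys = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        if line.strip():
            keys.append(line.split(maxsplit=1)[0])
    return keys


def split_overlaps(splits):
    """Map each pair of split names to the keys they share (empty dict if disjoint)."""
    names = sorted(splits)
    overlaps = {}
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            shared = set(splits[first]) & set(splits[second])
            if shared:
                overlaps[(first, second)] = shared
    return overlaps
```

A file matching the stated storage format should report a rate of 16000 and 16 bits from wav_summary. Passing the key lists of train-trns.scp, dev-trns.scp, and test-trns.scp to split_overlaps should return an empty dict if the sets are disjoint as described, and summing the durations over a directory gives a figure to compare with the totals listed above.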

Scripts & metadata

The metadata were created automatically and may contain errors.
The file asr_transcribed_concatenated.xml contains metadata about the dialogues.
The extract_trans.py script extracts the transcriptions, given asr_transcribed_concatenated.xml and the all directory.
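
The structure of asr_transcribed_concatenated.xml is not described here, so a reasonable first step before relying on the metadata is to survey which elements the file contains. The sketch below is an illustration only and is not how extract_trans.py works. It makes no assumption about the schema and simply counts element tags with the standard library's streaming parser; count_tags is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def count_tags(xml_path):
    """Count how often each element tag occurs, streaming through the file."""
    counts = Counter()
    for _event, element in ET.iterparse(xml_path, events=("end",)):
        counts[element.tag] += 1
        element.clear()  # drop the element's children and text to keep memory low
    return counts
```

Calling count_tags("asr_transcribed_concatenated.xml").most_common() lists the tags by frequency, which gives a starting point for locating the dialogue and transcription elements.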

Contributors

Martin Vejman
Filip Jurcicek
Ondrej Dusek
Ondrej Platek

License

Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0)

