The URDU dataset contains emotional utterances of Urdu speech gathered from Urdu talk shows. It comprises 400 utterances covering four basic emotions: Angry, Happy, Neutral, and Sad, spoken by 38 speakers (27 male and 11 female).
A Korean read-speech corpus of about 120 hours from the National Institute of Korean Language (NIKL).
Transcribed speech for use in Speech Recognition engines; the project categorizes and makes available all submitted audio files (the Speech Corpus) and the Acoustic Models built from them.
A Twitter Corpus built with the aim of representing and analyzing hate speech against some minority groups in Italy: immigrants in particular, but also Muslims and Roma. The dataset contains the tweets' IDs and their annotations.
Audio-visual dataset consisting of short clips of human speech, extracted from interview videos uploaded to YouTube. This dataset contains over 7,000 speakers, over 1 million utterances, and over 2,000 hours of both audio and video.
The dataset contains 71,358 words in total (13,311 of them distinct) and approximately 10 hours and 28 minutes of speech from a single speaker, recorded at 48 kHz across 3,632 audio files in WAV format. Audio files range from 0.67 to 50.08 seconds.
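Per-clip statistics like the durations quoted above can be computed directly from the WAV headers with Python's standard library. A minimal sketch (the demo file is synthetic, generated in place so the snippet runs without the actual dataset):

```python
import wave

def clip_duration_seconds(path):
    """Duration of a WAV file in seconds: frame count divided by sample rate."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()

# Demo on a synthetic 1-second, 48 kHz mono file (no real dataset needed).
with wave.open("demo_48khz.wav", "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)          # 16-bit samples
    w.setframerate(48000)
    w.writeframes(b"\x00\x00" * 48000)

print(clip_duration_seconds("demo_48khz.wav"))  # 1.0
```

Summing this over all 3,632 files would reproduce the corpus-level duration figure.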
Built from audio talks and their transcriptions, the dataset contains 1,495 audio talks in NIST SPHERE format (SPH), 1,495 transcripts in STM format, a pronunciation dictionary (159,848 entries), and selected monolingual data for language modeling.
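STM transcripts are plain text, one utterance per line. A sketch of splitting one record into its fields, assuming the standard NIST STM layout (waveform id, channel, speaker, start time, end time, an optional bracketed label block, then the transcript); the example line is illustrative, not taken from the corpus:

```python
def parse_stm_line(line):
    """Split one STM record into its fields.

    Assumed layout (NIST STM convention): waveform id, channel, speaker,
    start time, end time, optional <label> block, then the transcript.
    """
    parts = line.split(None, 5)
    rec = {
        "waveform": parts[0],
        "channel": parts[1],
        "speaker": parts[2],
        "start": float(parts[3]),
        "end": float(parts[4]),
    }
    rest = parts[5] if len(parts) > 5 else ""
    if rest.startswith("<"):
        label, _, transcript = rest.partition(">")
        rec["labels"] = label.lstrip("<")
        rec["transcript"] = transcript.strip()
    else:
        rec["labels"] = ""
        rec["transcript"] = rest.strip()
    return rec

line = "talk_0001 1 speaker_a 17.82 28.81 <o,f0,male> last year i showed these two slides"
rec = parse_stm_line(line)
print(rec["speaker"], rec["start"], rec["transcript"])
```

Lines beginning with `;;` are comments in STM files and would be skipped before calling this.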
A large database of sentences and their translations, useful for seeing how words are used in the context of a sentence.
Common Voice dataset, an open-source dataset of voices, currently consists of over 7,000 validated hours in 60 languages and includes demographic metadata like age, sex, and accent that can help improve the accuracy of Speech Recognition engines. Each entry in the dataset consists of a unique MP3 file and a corresponding text file.
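Common Voice releases ship the per-clip metadata as tab-separated files alongside the MP3s, which can be read with the standard `csv` module. A sketch on inline sample rows; the column names used here (path, sentence, age, gender, accent) follow the layout I recall from recent releases, so check the header of your actual download:

```python
import csv
import io

# Illustrative rows mimicking Common Voice's tab-separated metadata.
tsv = (
    "path\tsentence\tage\tgender\taccent\n"
    "clip_0001.mp3\tHello world.\ttwenties\tfemale\tus\n"
    "clip_0002.mp3\tGood morning.\tfourties\tmale\tengland\n"
)

rows = list(csv.DictReader(io.StringIO(tsv), delimiter="\t"))
for row in rows:
    print(row["path"], "->", row["sentence"], f"({row['gender']}, {row['age']})")
```

For a real release, replace `io.StringIO(tsv)` with an open file handle to the TSV on disk.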
A central place for speech resources, OpenSLR hosts speech and language resources, such as training corpora for Speech Recognition, as well as related software.
Explore over 500 datasets in the Machine Learning Repository from UC Irvine through a searchable interface. Datasets range across many topics and vary in size, from only a few cases (or “instances”) up to over 43 million, and from only 1 or 2 variables (or “attributes”) to over a million.
Designed to provide speech data for the acquisition of acoustic-phonetic knowledge and for the development and evaluation of Automatic Speech Recognition systems. Contains a total of 6,300 sentences: 10 sentences spoken by each of 630 speakers from 8 major dialect regions of the United States.
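The corpus encodes the split, dialect region, and speaker directly in its directory layout (e.g. `TRAIN/DR4/MABC0/SX127.WAV`, where the speaker ID's first letter gives the sex and `DR1`–`DR8` the dialect region). A sketch of decoding such a path; the layout follows the corpus documentation, but the example path itself is made up:

```python
def describe_timit_path(path):
    """Decode split, dialect region, speaker sex/ID, and sentence from a path."""
    split, dialect, speaker, sentence = path.strip("/").split("/")
    return {
        "split": split,                       # TRAIN or TEST
        "dialect_region": int(dialect[2:]),   # DR1..DR8
        "sex": "male" if speaker[0] == "M" else "female",
        "speaker": speaker[1:],
        "sentence": sentence.split(".")[0],   # SA/SI/SX sentence id
    }

info = describe_timit_path("TRAIN/DR4/MABC0/SX127.WAV")
print(info)
```

Walking the corpus tree with this decoder makes it easy to balance training data by sex or dialect region.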
These datasets cover diverse topics, from recognizing objects to reconstructing a 3D room, from finding a person in a video to identifying a shirt in a photo. The datasets can be sorted by published date or topic, and users can search with keywords to locate images appropriate to their needs.
More than 3,000 Machine Learning Datasets. Find datasets by task and modality, compare usage over time, browse benchmarks and more.
Open-source datasets for Computer Vision Machine Learning models across a wide array of domains: animals, board games, self-driving cars, medicine, thermal imagery, aerial drone images, and even synthetically generated data. You can freely download images and annotations in any format: VOC XML, COCO JSON, YOLOv3 flat text files, even TFRecords.
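Of the annotation formats listed above, Pascal VOC XML is easy to consume with the standard library alone. A sketch that extracts labeled bounding boxes from a minimal annotation (the file content here is a hand-written example following the VOC structure: one `<object>` per labeled box, with pixel coordinates in `<bndbox>`):

```python
import xml.etree.ElementTree as ET

# A minimal Pascal VOC annotation, hand-written for illustration.
voc_xml = """
<annotation>
  <filename>dog_001.jpg</filename>
  <size><width>640</width><height>480</height><depth>3</depth></size>
  <object>
    <name>dog</name>
    <bndbox><xmin>48</xmin><ymin>240</ymin><xmax>195</xmax><ymax>371</ymax></bndbox>
  </object>
</annotation>
"""

root = ET.fromstring(voc_xml)
boxes = [
    (obj.findtext("name"),
     tuple(int(obj.find("bndbox").findtext(tag))
           for tag in ("xmin", "ymin", "xmax", "ymax")))
    for obj in root.iter("object")
]
print(boxes)  # [('dog', (48, 240, 195, 371))]
```

COCO JSON is similarly self-describing and can be loaded with the `json` module, while TFRecords require the TensorFlow library.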
Use these open datasets to build facial recognition applications, virtual reality gadgets, sensory detection, holographic imaging and much more.