corpora.io

teaching computers indigenous languages

Corpora.io is an open-source app designed to collect spoken corpora (voice recordings) used to train computers to understand spoken languages through machine learning. The project started as an effort to automate the transcription of thousands of hours of te reo Māori archives available at tehiku.nz and to improve access to Māori media. We also wanted to bring te reo Māori, and other indigenous languages, to voice-operated digital assistants like Siri.

Available Languages

Translations for corpora.io are provided in the following languages:

  • Māori

Add another language by translating this app.