X LEEL International Workshop

Spontaneous speech corpora compilation and segmentation methodologies

Belo Horizonte (UFMG), March 23rd and 24th, 2017

The compilation and segmentation of speech corpora have become crucially relevant themes in linguistic studies.

Corpus Linguistics applied to speech data requires input from phonetic studies to better understand how speech should be segmented; that is, it needs a precise understanding of the acoustic parameters that mark the perception of boundaries. One of the biggest challenges in compiling large-scale spoken corpora is the task of speech segmentation, and automating it, at least partially, would be enormously beneficial methodologically. Conversely, spontaneous speech corpora perceptually segmented by humans are an important source of data for phoneticians developing and training software to detect the acoustic parameters relevant to prosodic boundaries and their complex typology (terminal, non-terminal, non-terminal with continuity marks, etc.).

The X LEEL Workshop gathers both expert researchers working on the compilation of spoken corpora of diverse languages, namely Brazilian Portuguese, Afro-Asiatic and Angolan languages, and those engaged in the study of the acoustic parameters that carry boundary perception.