Course 1. Speech and Multimodality in Human-Human and Human-Machine


March 8, 15, 22 and 24. From 2 PM to 4 PM (Brasília time)

Nicolas OBIN. Science and Technology of Music and Sound (STMS Lab - Ircam, CNRS, Sorbonne Université)

Thomas PELLEGRINI. University of Toulouse, IRIT

Catherine PELACHAUD. Institut des Systèmes Intelligents et de Robotique (ISIR), CNRS, Sorbonne Université

The course is divided into six parts. Parts 1 and 2 will introduce the fundamentals of speech processing and speech synthesis. The first session will go from the mechanisms and acoustics of speech production to the digital representation of speech as a signal and its time-frequency representation as a spectrogram. The source-filter representation of speech will then serve to present the parametric decomposition of the speech signal (F0, intensity, noise, resonance filter). After discussing the various domains and levels of speech communication, this session will end by detailing the functional aspects of speech communication, providing some elements of phonetics and prosody.

The second session will give an overview of speech synthesis and text-to-speech (TTS) synthesis. It will start by presenting speech coding (vocoders and linear predictive coding) as historically used in speech synthesis, and then trace the evolution of TTS from unit-selection concatenative synthesis (Festival) to parametric speech synthesis (HTS and hybrid TTS) and modern neural TTS (Tacotron and WaveNet). Particular attention will be paid to the evolution of the NLP preprocessing tools required in TTS, from raw-text processing to grapheme-to-phoneme conversion, syntactic tagging, and semantic and sentiment analysis. This session will also discuss and illustrate the differences between symbolic-AI TTS (sets of rules defined by human experts on the basis of linguistic knowledge) and connectionist-AI TTS (where the “rules” are learned directly from data).

Parts 3 and 4 will present an overview of the automatic tools and models used to transcribe speech at various levels: phonemes (referred to as phones), characters and words. A historical perspective will be given. More practical aspects will also be covered, namely the existing tools, in particular open-source ones, in line with the scientific community's trend toward sharing resources and tools. Applications will be discussed, as will the limitations of current technologies.

Parts 5 and 6 will focus on multimodal communication, in particular on expressive socially interactive agents (SIAs), which are virtual entities able to interact with humans through verbal and nonverbal behaviors. We will start by introducing the basics of affective computing and nonverbal behaviors, their communicative functions and signals. Then we will present a general architecture for human-agent interaction. Different computational models of communicative and emotional behaviors (facial expressions, hand gestures, gaze) will be presented. SIAs must be able to act as both speaker and listener; models of turn-taking mechanisms and backchannels will therefore be discussed.


Course 2. Understanding and using scripts for Prosody Research

April 4, 6, 8, 11 and 13. From 2 PM to 4:30 PM (Brasília time)

Plínio A. Barbosa, UNICAMP

Gustavo C. P. Silveira, UNICAMP

Day 1

Elements of programming

What is a program?

Types of programming languages

Basic elements of a programming language

  • Variables and data structures
  • Operators
  • Flow control
  • Inputs and outputs

Praat scripting

  • Integrating Praat commands with programming-language constructs
  • Examples with a script for prosody research

Day 2

Introducing prosody and prosody research

Rhythm, intonation, and voice quality: what are they?

Prosody, function, and form

Measuring prosody:

  • Prosodic units: between manual and automatic segmentation and labeling
  • Using the AlinhaPB and BeatExtractor scripts.

Day 3

Understanding and using scripts for prosody research

  • Automatic segmentation of stress groups and their hierarchy (SGdetector and Salience detector).

Day 4

Obtaining prosodic measures with the Prosody Descriptor Extractor script

Presentation of the measures and their meaning

            Melodic measures

            Rhythmic measures

            Pause-related measures

            Voice quality measures


Day 5

Fully dedicated to exercises


Course 3. Experimental design and analysis in speech and language sciences

May 9, 11, 13, 16 and 17. From 2 PM to 4:30 PM (Brasília time)

Daniel SILVA (Universidade Estadual de Minas Gerais)

Day 1. Introduction

- Hypothesis testing and the reasoning behind the experimental method: how do experimental studies differ from non-experimental studies?

- Terminology and crucial distinctions: manipulation; control; effect; error; types of variables according to their role in the study; types of variables according to their level of measurement.

- Internal and external validity of experiments.


Day 2. Simple experimental designs in speech sciences 

- Planning, implementing and analyzing single-factor, two-level designs with either continuous or discrete dependent variables.

- Another crucial distinction: comparing independent samples versus comparing repeated measurements.


Day 3. Multi-level factor designs and dose-response designs

- How to deal with independent variables with more than two levels? Analysis and interpretation: the reasoning behind ANOVA and multiple comparisons.

- How to deal with dichotomous dependent variables? The logistic-regression approach.


Day 4. Multi-factor designs 

- Experimental designs with more than one independent variable. 

- Exploring and interpreting main effects and factor interactions.


Day 5. A slightly more advanced concluding session

- Fixed effects, random effects, and the “language-as-fixed-effect fallacy”: how can mixed-effects regression models help us?

- A brief word about non-parametric tests and Bayesian approaches. 


Course 4. L1 and L2 phonological acquisition: interaction between segments and syllable structures
June 20, 22, 24, 27 and 29. From 2 PM to 4:30 PM (Brasília time)

Maria João FREITAS, Chao ZHOU (Universidade de Lisboa, CLUL)

Nonlinear phonology assumes that segments interact with prosodic structures in language processing. More specifically, segmental distribution is licensed by syllabic constituents: for example, in Portuguese, all [+consonant] nodes are possible as singletons, while only /s, ɾ, l/ are licensed in coda position. In this course, we will focus on the syllable-segment interaction, which has frequently been reported in studies on L1 and L2 phonological acquisition. In each module (L1 and L2), we will:

(i) introduce the main research questions that have been explored in the literature;

(ii) highlight the relevance of production data (less explored in recent years) for research on language acquisition (the relationship with perception data will be explored in the L2 module);

(iii) demonstrate how the use of theoretical models can contribute to the understanding of language acquisition and of the nature of phonological representations.


Course 5. Intonational analysis: phonetics, phonology and communicative functions
October 10, 12, 17 and 19. From 2 PM to 5 PM (Brasília time)

Luciana LUCENTE (Universidade Federal de Minas Gerais)

Day 1. Intonation and prosody; f0 measures; acoustic analysis of f0; spontaneous and laboratory speech in intonational analysis.

Day 2. The phonology of intonation; methodological issues in intonational annotation; annotation practice.

Day 3. Intonation and communicative functions, pragmatics, discourse and syntax; data analysis; practical exercises.

Day 4. Discussion of the data analysis.


Course 6. Applied Phonetics: Between Speaking a Second Language, Speaking in Society and Solving a Crime
November 3, 4, 8, 9 and 10. From 2 PM to 4:30 PM (Brasília time)

Leônidas J. SILVA Jr., UEPB



This workshop proposes exercises using Acoustic Phonetics techniques to address three applications concerning the characterization of the speech of individuals and minority communities: i) in which aspects an L2 speaker's pronunciation differs from the native pronunciation of the foreign language; ii) in which aspects the speech of internal migrants living in a different dialectal region can be affected by their daily interactions with native lifelong residents; and iii) in which aspects the speech of a criminal differs from, and is similar to, that of a group of suspects. All issues and content will be presented in accessible, didactic language, with the aim of broadening non-experts' horizons regarding the areas of application of Phonetics.

Day 1: Prosody: what is it? Prosodic variation across and within individuals and communities.

Day 2: L2 Prosody research: notions, methodology and examples.

We will present an overview of how Acoustic Phonetics and L2 Prosody interact in the recognition of a foreign accent through correlates such as stress, rhythm, intonation and voice quality in the target L2. We will reinforce this phonetic knowledge with introductory examples applied to L2 pronunciation teaching and L2 speaker verification. As for methodology, we will highlight procedures for using acoustic-prosodic parameters to identify foreign-accented speech.

Day 3: Sociophonetic research: notions, methodology and examples.

We will discuss how Acoustic Phonetics and Labovian Sociolinguistics can be integrated to investigate the social conditioning of prosodic patterns in different speech communities. In particular, we will focus on theoretical and methodological issues concerning the acquisition of new prosodic features by mobile speakers who interact with speakers from other dialectal regions.

Day 4: Forensic phonetic research: notions, methodology and examples.

We will present the main issues concerning Speaker Comparison (SC) as related to the similarity between speakers and the typicality of phonetic phenomena. To this end, a protocol with recommendations and scripts to support the SC task will be proposed.

Day 5: Hands-on day: online exercises.


Course 7. Acquisition and loss of prosodic abilities

December 5, 6, 9, 12 and 16. From 2 PM to 4:30 PM (Brasília time)

Patrizia SORIANELLO (Università degli Studi di Bari, Italy)

The course will focus on two closely linked aspects concerning speech prosody: (1) the role of intonation in language acquisition and (2) the role of intonation in speech disorders.

After a general introduction to intonation, from both phonetic and phonological perspectives, and to its linguistic and paralinguistic functions, the course will introduce and discuss the following topics using empirical data:

1) the development of intonation in first language acquisition (L1)

2) the development of intonation in second language acquisition (L2)

3) the neurological/motor bases of prosodic disorders

4) production and perception of speech prosody in language disorders


Courses in 2023 (seven courses foreseen)

Details about dates and topics will be provided by September 2022. So far, the following scholars have confirmed their availability:

Donna Erickson (Haskins Laboratories, Yale)
Modeling the articulatory system (March)

Anabela Rato (University of Toronto, Canada), Mëi-Lan Mamode (University of Toronto, Canada), Angélica Carlet (Universitat Internacional de Catalunya, Spain) 
Can you train your brain to listen better? The effects of phonetic training on L2 speech learning
May 16, 18, 23 and 25, 2023. 1 PM (Brasília time)

Marina Vigário (Universidade de Lisboa – CLUL)
(June and July)

Plínio Barbosa (UNICAMP)
Experimental Phonetics

Marcelo Vieira (McGill University)
Programming with Praat

Júlio César Cavalcanti (UNICAMP)
A topic in Audiology and/or Speech Pathology

Heliana Mello, Tommaso Raso, Bruno Rocha (UFMG)
Spontaneous spoken corpora compilation and annotation

