Relations between Speech Sciences and Industry

Oliver Niebuhr
CIE - Centre for Industrial Electronics

How to cite
[Niebuhr, O. 2021. Relations between Speech Sciences and Industry. In: Verbetes LBASS. Available at: http://www.letras.ufmg.br/lbass/.]

Engineering, Computer Science, Business/Management, and Law have been among the most attractive fields of study for a long time; and the majority of public research funding nowadays flows into healthcare (e.g., biotechnology), physics (e.g., nanotechnology), engineering (e.g., energy systems), and economics. The speech sciences do not appear on such popularity lists, or if they do, they are often located at the lower end.

This is somewhat hard to understand. Ultimately, it is the symbolic representation of meanings in letters or sound shapes, embedded in a recursive syntactic structure, that makes us what we are: humans. The communication system of human language works, irrespective of its actual language-specific form, as a catalyst for the mind. The system allows us to replace abstract or complex referents with single linguistic symbols. And when we think, we can use these symbols instead of the actual referents. Through this replacement procedure, language systems increase our cognitive performance by orders of magnitude and also accelerate it. The symbols of a language also affect the precision of perceptual performance itself, and the way in which information is processed and filtered in the brain. It is likely that even our sensory organs have adapted to the acoustic and visual requirements of language stimuli in the course of evolution. Thus, language shapes us and our very view of the world.

Language also gives us an identity. We cannot develop any kind of relationship with people or things that have not been named. Marketing experts know that and give every product, service, or manufacturing method a name. And who could imagine a successful scientific theory (or virus) without a name? Letters and especially sound shapes also tell us who we are and to which social or cultural group we belong – and to which we do not. The voice, in particular, is something very intimate. Age, gender, state of health, mood, emotions, social hierarchy, stress, fatigue, personality traits – all that and much more is contained in a speaker's voice. We often only notice how true this is when we (temporarily) lose our voice, when we discover how our pets react to our voices (and we to theirs), or when someone criticizes our voice or tone of voice and asks us to change it. And although machines have learned to speak to us, their voices and speech melodies often make us laugh because they still sound awkward and inappropriate.

Against the background of all these aspects (and one could mention many more), the speech sciences should actually rank a lot higher on the study and funding lists of the world. A main reason why this is not the case is that engineering and the applied sciences – and, thus, the general public as consumers of their products and services – are usually unaware of how rich and complex the communication system of human language actually is (e.g., that it does not end at the word), and how deeply it is woven into our biology, physiology, cognition, perception, and behavior. But, let's admit it, speech scientists haven't really done much to change this lack of awareness. The awareness gap can only be closed if we, as speech scientists, actively seek partnerships with industry, i.e., with providers of products and services, and if we focus our R&D work on questions that are directly or indirectly relevant to these providers. This is not a statement against basic research. Rather, it is a statement in favor of basic research whose individual steps do not get lost in details and end in mere academic sparring exercises. Basic research, too, should always be clearly geared towards solving a problem of everyday life. Based on this mindset, I will give some examples in the following that will hopefully inspire readers and show what are perhaps the most fertile areas for speech scientists to collaborate with industrial partners.

Sound symbolism: The advertising industry is constantly faced with the challenge of giving new products literally "proper" names, or giving existing products other names that work just as well in countries and cultures outside the product's region of origin. Research in the speech sciences has accumulated a considerable amount of know-how in the area of sound symbolism, which can be very effective in solving these naming challenges. What makes a product sound, e.g., robust, strong, fast, and elegant, or round vs. angular, or large vs. small? Appropriate names reflect the core properties of products, and we as speech scientists can help develop such names. We have also begun to understand what underlies the impression of a rhythmical, fluent sound shape, i.e. why expressions like "ping pong" or "ding dong" sound better than "pong ping" or "dong ding", and why Germans say "arrow and bow" (Pfeil und Bogen), whereas English speakers prefer the exact inverse order, "bow and arrow". The same underlying principles determine that "salt and pepper" are preferably referred to in that order in both languages.
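
To make this concrete, here is a minimal, deliberately crude sketch of how sound-symbolic knowledge could be used to screen name candidates. It builds only on the well-replicated association of front vowels with smallness and back vowels with largeness (see Hinton, Nichols & Ohala 2006); the function, scoring scheme, and example names are illustrative assumptions, and real naming work would draw on many more cues (consonant types, syllable structure, stress).

```python
# Crude sound-symbolic screening of name candidates (a sketch, not a
# production tool): front vowels tend to evoke smallness/lightness,
# back vowels largeness/heaviness.
FRONT_VOWELS = set("ie")
BACK_VOWELS = set("oua")

def size_impression(name: str) -> float:
    """Score in [-1, 1]: negative leans 'small/light', positive 'large/heavy'."""
    vowels = [c for c in name.lower() if c in FRONT_VOWELS | BACK_VOWELS]
    if not vowels:
        return 0.0
    back = sum(c in BACK_VOWELS for c in vowels)
    front = len(vowels) - back
    return (back - front) / len(vowels)

# Hypothetical product names: a compact gadget vs. a heavy-duty tool.
for candidate in ["Zippi", "Gordo"]:
    print(f"{candidate}: {size_impression(candidate):+.2f}")
```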

Noise treatment: Noise, or noise pollution, is one of the biggest problems of the modern world. And it just keeps getting bigger. So, how can this challenge be met effectively? Engineers relate noise primarily to a single physical measure: acoustic energy (dB). This goes so far that dB values alone determine and specify the approval and use of products in markets. However, research is providing increasing evidence that noise perception is surprisingly complex, and that dB levels are not its only important factor. One further factor in noise perception is how much the noise interferes with speech communication. Noise that disrupts this communication stresses us much more and is rated as much more annoying than noise that does so less or not at all; and, amazingly, this applies even if those who perceive the noise are not engaged in speech communication at all. That is, recent results from experimental research suggest that the way we assess noise is (also) determined by the phonemes of our language. One and the same noise stimulus receives different stress and annoyance ratings in languages that carry their main functional-acoustic load in frequencies below 2 kHz, e.g., due to rounded vowels or retroflex consonants, than in languages that use many fricative phonemes and thus carry their main functional-acoustic load in frequencies above 2 kHz. Such connections between the sound structures of languages and noise perception open up huge opportunities for smart, innovative, and efficient noise treatments; opportunities from which speech scientists should appropriately benefit. Furthermore, know-how about speech production can prove immensely useful as well. At least one case is known in which an understanding of the links between the articulation and acoustics of fricatives has helped tailor and reduce the noise of a jet engine.
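
A hedged illustration of how such a language-sensitive perspective could enter an engineer's toolbox: the sketch below estimates what fraction of a noise recording's spectral power lies below the 2 kHz split mentioned above. The file name is a placeholder, and a single split frequency is of course a gross simplification of real cross-language differences in functional-acoustic load.

```python
# Sketch: fraction of a noise recording's spectral power below 2 kHz.
# Assumes a WAV file; "noise_sample.wav" is a placeholder name.
import numpy as np
from scipy.io import wavfile
from scipy.signal import welch

def low_band_power_ratio(path: str, split_hz: float = 2000.0) -> float:
    fs, x = wavfile.read(path)
    if x.ndim > 1:                 # mix multi-channel audio down to mono
        x = x.mean(axis=1)
    x = x.astype(np.float64)
    f, pxx = welch(x, fs=fs, nperseg=4096)      # averaged power spectrum
    return pxx[f < split_hz].sum() / pxx.sum()  # bins are evenly spaced

ratio = low_band_power_ratio("noise_sample.wav")
print(f"{100 * ratio:.1f}% of the spectral power lies below 2 kHz")
```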

Sound design and haptic feedback: Sound design and haptic feedback are not only playing an increasing role in the automotive industry; they are a billion-dollar business. In the robotics industry, too, it is becoming increasingly important which sounds the moving parts of robots produce. Do these sounds come across as friendly and trustworthy? Or do they scare us and arouse our suspicions? Research in these areas takes place largely without speech scientists – although we know fairly well from prosody research how the acoustic settings of sounds are associated with emotions and moods, and how this is related to universal biological codes, i.e. to sound-meaning links that apply very generally and, perhaps, across all sound-generating "entities". Exemplary experiments determined the emotions and personality traits that listeners associated with different cello strings, once by playing these strings normally on a cello and once by using them as larynx signals, i.e. as the different "voices" of a speaker. Comparisons between the sounding and the talking cello strings show that it is the same acoustic-prosodic patterns that arouse specific emotions in us and that we associate with specific traits. That is, the acoustic-prosodic patterns of human emotions and personality traits can basically be transferred to things/industrial products (at least to a certain extent). And that actually makes all speech scientists powerful advisors for industrial sound design. It is similar with haptic feedback. How should a pressed button on a touch screen vibrate so that the haptic feedback feels round or edgy, or large or small? The acoustics of speech sounds is a powerful source of ideas for achieving this development goal.
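
As a concrete illustration of this transfer idea, the following sketch computes the same three coarse prosodic descriptors (pitch level, pitch span, loudness variability) for any recording, whether it is a human voice or a product sound. It assumes the Parselmouth interface to Praat; the file names are placeholders, and the descriptor set is a minimal selection, not the feature inventory of the cited cello-string experiments.

```python
# Sketch: one prosodic profile function for voices and product sounds alike.
# Requires the praat-parselmouth package; file names are placeholders.
import numpy as np
import parselmouth

def prosodic_profile(path: str) -> dict:
    snd = parselmouth.Sound(path)
    f0 = snd.to_pitch().selected_array['frequency']  # Praat pitch analysis
    f0 = f0[f0 > 0]                                  # drop unvoiced/aperiodic frames
    intensity = snd.to_intensity().values[0]         # intensity contour in dB
    has_pitch = f0.size > 0
    return {
        'f0_mean_hz': float(f0.mean()) if has_pitch else None,                         # pitch level
        'f0_span_st': float(12 * np.log2(f0.max() / f0.min())) if has_pitch else None, # pitch span (semitones)
        'intensity_sd_db': float(intensity.std()),                                     # loudness variability
    }

print(prosodic_profile("speaker.wav"))       # a human voice
print(prosodic_profile("robot_joint.wav"))   # a moving robot part
```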

Talking machines: The situation in this major area of industrial partnership is actually quite simple. Research on speaker charisma is a representative example of the fact that acoustic-prosodic patterns can be transferred from humans to machines (including non-anthropomorphic ones) and produce similar effects there. For example, in the recent past, the more charismatic prosodic profile of Steve Jobs and the less charismatic prosodic profile of Mark Zuckerberg have been applied to the synthesized speech of robots and navigation devices in cars. The result was always the same: the robots and devices equipped with Jobs' more charismatic profile were able to induce human interaction partners to fill out longer questionnaires, eat healthier foods, take the stairs instead of the elevator, and accept detours with their cars (in the case of the navigation device). Up to now, engineers have always focused on one thing: word intelligibility. However, this is currently changing with devices like Alexa, Siri, etc., which are not only meant to provide us with requested information but also to become our "friends". This new focus forces engineers to enter the huge field of prosodically encoded social functions – a field that is completely new and hard to understand for them, but a familiar playground for speech scientists. This is a great opportunity for speech scientists to help engineers lift talking machines to the next level of their evolution and to make them "friends by design" for us humans. Something similar also applies to the avatars and non-player characters in computer games.
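
To indicate where speech scientists could plug in, here is a minimal sketch of how global prosodic settings can be handed to a speech synthesizer via the standard SSML prosody element, which most commercial TTS services accept. The concrete pitch, rate, and volume values are illustrative placeholders, not the published charisma profiles discussed above.

```python
# Sketch: wrapping an utterance in SSML prosody settings for a TTS engine.
# The parameter defaults are illustrative placeholders, not validated
# "charismatic" settings.
def with_prosody(text: str, pitch="+15%", rate="108%", volume="+2dB") -> str:
    return (
        '<speak>'
        f'<prosody pitch="{pitch}" rate="{rate}" volume="{volume}">{text}</prosody>'
        '</speak>'
    )

print(with_prosody("Take the stairs today; it is only two floors."))
```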

Public speaker training and voice-based personality analyses: There are already countless companies that earn their money by analyzing speech samples and, on this basis, giving HR departments recommendations about the promotion and hiring of employees. The analyzed speech samples are mapped onto personality traits and are thus intended to provide deep insight into employees' value and potential areas of deployment. Such systems are almost without exception AI-based and trained on large amounts of commercially available data. The same applies to assessment and feedback systems that are used in public speaker training. Such developments are potentially dangerous in that all these systems are created by engineers who typically do not have a profound linguistic and phonetic background. Accordingly, common statements from the rhetorical advice literature are taken over without any critical reflection (e.g., a low-pitched voice is often thought to be more charismatic, although phonetic research has shown the opposite for 20 years); and measures for the phonetic analysis of the speech samples are often chosen unfavorably (e.g., speaking rate measured in words per minute, although the morphosyntactic complexity and hence the duration of words vary strongly with speaking style or audience). There is a great need for the know-how of speech scientists in this area of automatic/machine-based speaker assessment, and significant amounts of this know-how already exist. It is just waiting to be skillfully used and implemented.
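
The speaking-rate example can be made tangible with a small sketch: two utterances with the same number of words, and hence the same words-per-minute value for a given duration, can differ hugely in articulation rate. The vowel-group syllable counter below is a rough orthographic heuristic, not a phonetically exact count, and the two-second duration is an arbitrary assumption.

```python
# Sketch: words per minute vs. syllables per second for the same duration.
# The syllable counter is a rough orthographic heuristic.
import re

def syllable_count(word: str) -> int:
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def speaking_rates(transcript: str, duration_s: float) -> tuple:
    words = transcript.split()
    wpm = 60 * len(words) / duration_s
    syll_per_s = sum(syllable_count(w) for w in words) / duration_s
    return wpm, syll_per_s

# Same word count, same WPM - very different articulation rates:
for text in ["We can go", "Electrification necessitates reorganization"]:
    wpm, sps = speaking_rates(text, duration_s=2.0)
    print(f"{text!r}: {wpm:.0f} wpm, {sps:.1f} syll/s")
```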

The bottom line of all examples and explanations provided above is that humans have always had the peculiarity of humanizing things, whether they are anthropomorphic or not. Applied to the speech sciences, one could say that humans unconsciously and inevitably perceive and interpret the signals of everything that vibrates and/or makes sound against the background of the form-function framework of their language. This claim may turn out to be exaggerated or too sweeping. However, researching and determining the limits of this claim will be just as exciting in the future as its implementation as a heuristic guiding principle in the cooperation of speech scientists with industry. Let's be a little pragmatic and see where and how far our pragmatism leads us; hopefully to a point where the speech sciences rank far higher in the study and funding lists of this world than they currently do. 

Which avenues give us access to industrial collaborations and partnerships is a difficult question, and perhaps one that is so culture-specific that the author of this text had better not attempt to make recommendations. However, a few strategies exist that are likely to work well in general. One of the most important strategies is probably also one of the simplest: publish open access. Industry and media representatives do not have access to the publications of the dominant publishers – and do not want to either. However, the path to industry frequently leads through the media, which arouse interest in the obtained findings and explain them and their relevance to industrial partners in simple, richly illustrated words. Open access publications ensure that research can gain attention beyond the academic community. Another effective strategy is open days and industry days, organized by the respective research institution (e.g., the university). Such events are the rule in technical and medical faculties around the world, but they remain an exception in the humanities faculties. This is a wasted opportunity. Yet another strategy is match-making events, at which advanced or graduate students are brought into contact with representatives from industry. Such match-making events, e.g., organized in connection with workshops or conferences, can prompt students to choose industry-related topics for their theses. These students are then often employed by the respective companies later on – and, in this way, create a natural, sustainable cooperation with industry partners. Who could be a better door opener to a company than one's own former student, or the employee at the company who co-supervised the student's thesis? Emails to generic company addresses like contact@... or info@... are, by contrast, usually a very ineffective way of initiating a cooperation with industry partners.

Finally, a word of caution: in the context of an industrial cooperation, questions like "May I publish this?" or "To whom does the obtained intellectual property belong?" can quickly arise and cause unforeseen problems both for the researchers and for the partnership between them and their industry contacts. Universities and other research institutions usually have a legal department for this purpose, which should definitely be involved before the cooperation yields concrete results. The knowledge acquired by the speech sciences over the past decades – and that yet to be acquired – is precious and should be treated as such. That is, it must not be given away for free. Nevertheless, one should not be afraid of industrial cooperation. It often opens up completely new perspectives and ideas for one's own work. And cooperation with industry is different: it is faster, more pragmatic, and overall more result-oriented. It often follows the 80-20 rule. That is, in 20% of the time one can go 80% of the way towards the goal of an R&D project, whereas one has to invest the other 80% of the time to make the last 20% of the way. Companies are typically satisfied with the 80% journey; researchers often are not. As a researcher you should be prepared for this, and you have to accept it. After all, this leaves a lot of room for your own basic research and the next R&D journey with industry partners.


Recommended further readings

Fischer, K., Niebuhr, O., Jensen, L. C., & Bodenhagen, L. (2019). Speech melody matters—How robots profit from using charismatic speech. ACM Transactions on Human-Robot Interaction 9(1), 1-21.

Hinton, L., Nichols, J., & Ohala, J. J. (Eds.). (2006). Sound symbolism. Cambridge: Cambridge University Press.

Hoque, M., Courgeon, M., Martin, J. C., Mutlu, B., & Picard, R. W. (2013). MACH: My automated conversation coach. Proc. 2013 ACM International Joint Conference on Pervasive and Ubiquitous Computing, 697-706.

Jee, E. S., Jeong, Y. J., Kim, C. H., & Kobayashi, H. (2010). Sound design for emotion and intention expression of socially interactive robots. Intelligent Service Robotics 3(3), 199-206.

Jekosch, U. (2005). Assigning meaning to sounds—semiotics in the context of product-sound design. In Communication acoustics (pp. 193-221). Berlin: Springer.

Klink, R. R. (2001). Creating meaningful new brand names: A study of semantics and sound symbolism. Journal of Marketing Theory and Practice 9(2), 27-34.

Lercher, P. (1996). Environmental noise and health: An integrated research perspective. Environment international 22(1), 117-129.

Michalsky, J., & Niebuhr, O. (2019). Myth busted? Challenging what we think we know about charismatic speech. Acta Universitatis Carolinae Philologica 2019(2), 27-56.

Mohammadi, G., & Vinciarelli, A. (2012). Automatic personality perception: Prediction of trait attribution based on prosodic features. IEEE Transactions on Affective Computing 3(3), 273-284.

Moore, D., Martelaro, N., Ju, W., & Tennent, H. (2017). Making noise intentional: A study of servo sound perception. Proc. 12th ACM/IEEE International Conference on Human-Robot Interaction, 12-21.

Niebuhr, O., Tegtmeier, S., & Brem, A. (2017). Advancing research and practice in entrepreneurship through speech analysis – From descriptive rhetorical terms to phonetically informed acoustic charisma metrics. Journal of Speech Science 6, 3-26.

Özcan, E., & van Egmond, R. (2008). Product sound design: An inter-disciplinary approach?

Schuller, B. (2011). Voice and speech analysis in search of states and traits. In Computer Analysis of Human Behavior (pp. 227-253). London: Springer.

Stallen, P. J. M. (1999). A theoretical framework for environmental noise annoyance. Noise and Health 1(3), 69.

Staples, S. L. (1996). Human response to environmental noise: Psychological research and public policy. American Psychologist 51(2), 143.

Tennent, H., Moore, D., Jung, M., & Ju, W. (2017). Good vibrations: How consequential sounds affect perception of robotic arms. Proc. 26th IEEE International Symposium on Robot and Human Interactive Communication, 928-935.

Trinh, H., Asadi, R., Edge, D., & Bickmore, T. (2017). RoboCOP: A robotic coach for oral presentations. Proc. ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 1(2), 1-24.

Yorkston, E., & Menon, G. (2004). A sound idea: Phonetic effects of brand names on consumer judgments. Journal of Consumer Research 31(1), 43-51.
