Silent personal reading emerged in the 18th and 19th centuries, coinciding with the wider circulation of printed books and pamphlets, according to literacy scholars. Before this period, reading was typically performed aloud, even in solitude.
Text-to-speech (TTS) software seems to be returning us to that read-out-loud practice. Our texts can be read to us by smartphones and other devices, at variable speeds and with different voices and intonations — and while we are multitasking or on the move.
Developers of text-to-speech software have recruited AI to synthesise ever more convincing reading styles, mannerisms, and emotions.
I’ve been experimenting with the state of the art. Here’s an excerpt from a blog post I wrote in 2012 called Audience disengagement, on the subject of mood. (I amplified this theme in my 2016 book Mood and Mobility: Navigating the Emotional Spaces of Digital Social Networks.) It is fair to say the following text was written to be read silently. (The term “influencers” is a recent edit.)
Social media influencers who want to inform, persuade, and motivate could do worse than follow Aristotle’s advice in The Art of Rhetoric. To persuade an audience you need to establish rapport, which requires you to understand the emotional condition of the audience, to know how they are feeling. Know your audience and present to them accordingly.
For Aristotle this emotional orientation involves being “disposed in a certain way” — the speaker to the audience, and the audience towards the speaker. Latch on to the mood of the crowd. Get the mood right and the job of persuasion is almost done. Ideas, memes, stories, and arguments exert their influence in the context of the moods people are in. People have to be in the mood to respond to whatever influences come their way, whether going with the crowd or resisting it.
Were I to write a business manual, I would overstate the case thus: put your efforts into mood management and the solutions will follow. Of course, managing moods is a complex, difficult, subtle and contingent process. Mood management focuses on esprit de corps, solidarity, and communities rather than individuals. If you get the mood right then you’ve almost solved the problem. The audience is already persuaded. The team is motivated and near to the solution. The hard work is done. The rest is easy.
SSML (Speech Synthesis Markup Language)
Some TTS software allows the writer to insert tags that direct the software to adjust tone, emphasis, style, mood and emotion. From my brief experimentation, the current iteration of the tools has limits, or at least I’ve found it hard to master the interface to the entry-level versions.
It also takes substantial effort to annotate an entire document with emotion tags. Fortunately, with their access to semantic embedding techniques, ChatGPT and other LLMs can do the tag-work, taking account of the content of the document as a whole.
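As a rough illustration of what such tags look like in standard SSML, here is a minimal Python sketch that wraps a few phrases in the W3C elements for emphasis and pauses. The function name to_ssml and the tuple layout are my own assumptions for the example, and the sketch omits the version and namespace attributes that the full specification expects on the speak element. Whether a particular TTS product accepts this markup is also an assumption; some use their own bracketed shorthand instead.

```python
from xml.sax.saxutils import escape


def to_ssml(phrases):
    """Build a minimal SSML string (illustrative helper, not a library API).

    phrases is a list of (text, emphasise, pause_ms) tuples: text is
    spoken, emphasise wraps it in <emphasis>, and a non-zero pause_ms
    adds a <break> after it.
    """
    parts = ["<speak>"]
    for text, emphasise, pause_ms in phrases:
        spoken = escape(text)  # protect &, < and > in the prose
        if emphasise:
            spoken = f'<emphasis level="moderate">{spoken}</emphasis>'
        parts.append(spoken)
        if pause_ms:
            parts.append(f'<break time="{pause_ms}ms"/>')
    parts.append("</speak>")
    return "".join(parts)


ssml = to_ssml([
    ("Get the mood right ", False, 500),
    ("and persuasion becomes ", False, 0),
    ("so much easier.", True, 0),
])
print(ssml)
```

Even this short passage shows why hand-annotating a whole document is laborious: every pause and emphasis has to be decided and marked phrase by phrase.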
Re-written to be read out loud
The first challenge in adapting written text for listeners is to turn it into something that is easier to read out loud. I asked ChatGPT to revise the text above so that it would sound more conversational. I wanted some hesitancy, as if the speaker were making it up as they go along.
The first iteration had a US flavour, so I asked for a version that was friendly but more British. Then I called on the AI to insert simplified, SSML-style emotion tags of the kind supported by the current version of ElevenLabs TTS software. Here is the AI-revised text.
If your aim is to [calm]inform, persuade, or motivate[/calm] — [pause] whether in business, on social media, or simply [hesitant]in everyday conversation[/hesitant] — it’s worth going back to some rather old advice. [pause] Aristotle’s Art of Rhetoric is ancient — [slight_amused]very ancient[/slight_amused] — but surprisingly relevant, even now.
His point is [slight_emphasis]really[/slight_emphasis] quite straightforward: if you want to persuade people, you first need to [slight_emphasis]connect[/slight_emphasis] with them. That means [reflective]understanding your audience[/reflective] — [pause] particularly their emotional state. How are they feeling? [short_pause] What’s the general mood? [pause] [soft]If you can sense that, and respond to it[/soft], you’ve already made [upbeat]real progress[/upbeat].
Aristotle puts it in terms of being “disposed in a certain way” — [pause] both the speaker towards the audience, [short_pause] and the audience towards the speaker. [hesitant]In other words…[/hesitant] you have to tune in to your listeners, [pause] and ideally, [slight_emphasis]encourage them[/slight_emphasis] to be favourably disposed towards you. [pause] Get the mood right… and persuasion becomes [soft]so much easier[/soft].
It’s the same with ideas, [slight_pause] arguments, [slight_pause] and stories. [pause] They don’t exist in a vacuum — they land [short_pause] or they don’t — depending entirely on the [emotional]mood[/emotional] of the people listening. [pause] [soft]If people aren’t in the mood[/soft], even the [upbeat]best[/upbeat] arguments can fall flat. [short_pause] If they are in the mood, [slight_amused]well[/slight_amused], very little effort may be required.
If I were writing a business book, I might put it like this: [pause][slow]get the tone right… and many problems simply [soft]melt away[/soft].[/slow] [hesitant]I’m slightly overstating it, [pause] of course[/hesitant], but there is [calm]something in that[/calm].
Managing mood isn’t about [slight_emphasis]manipulating[/slight_emphasis] individuals — [pause] it’s about creating [slight_emphasis]a sense of collective purpose[/slight_emphasis], [soft]a shared outlook[/soft].
And when the atmosphere’s right… [long_pause] well, [pause] much of the [upbeat]hard work is already done[/upbeat]. People are more open, [slight_pause] more engaged, [slight_pause] and far more willing to act. [pause] [soft]The rest just follows.[/soft]
Here it is read out loud by the ElevenLabs software. The reader is one of the cloned versions of my own voice, with the accent changed, it seems, from Australian to British academic! Other than pauses, the effects of the emotion tags seem subtle. ElevenLabs state that V3 of their software, the version that supports these tags, is in beta.
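Because the tags were written by the AI rather than by hand, it can help to audit them before passing the text to a TTS engine. The following is a minimal Python sketch, assuming only the bracket syntax visible in the text above ([name] and [/name]); the function name audit_tags is hypothetical and this is not part of any ElevenLabs tool. It strips the tags to recover the plain wording, counts how often each tag is used, and flags closing tags that do not match. Run over the full revised text, a count like this would show, for instance, that both [short_pause] and [slight_pause] appear.

```python
import re
from collections import Counter

# Matches [name] and [/name], where name is lowercase letters and underscores.
TAG = re.compile(r"\[(/?)([a-z_]+)\]")


def audit_tags(text):
    """Return (plain_text, tag_counts, problems) for bracket-tagged text.

    A tag name counts as "paired" if a closing form appears anywhere in
    the text; all other tags (such as [pause]) are treated as standalone.
    Limitation: a paired tag that is never closed anywhere is therefore
    not detected.
    """
    tokens = TAG.findall(text)
    paired = {name for slash, name in tokens if slash}
    counts = Counter(name for slash, name in tokens if not slash)

    stack, problems = [], []
    for slash, name in tokens:
        if not slash:
            if name in paired:
                stack.append(name)
        elif stack and stack[-1] == name:
            stack.pop()
        else:
            problems.append(f"unexpected closing tag [/{name}]")
    problems.extend(f"unclosed tag [{name}]" for name in stack)

    # Remove the tags, then collapse the double spaces they leave behind.
    plain = re.sub(r"\s{2,}", " ", TAG.sub("", text)).strip()
    return plain, counts, problems


sample = (
    "Get the mood right... [long_pause] well, [pause] much of the "
    "[upbeat]hard work is already done[/upbeat]. "
    "[soft]The rest just follows.[/soft]"
)
plain, counts, problems = audit_tags(sample)
print(plain)
print(dict(counts))
print(problems)
```

The stripped text is also a quick way to compare the conversational rewrite with the original excerpt without the markup getting in the way.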
References
- Coyne, Richard. Mood and Mobility: Navigating the Emotional Spaces of Digital Social Networks. Cambridge, MA: MIT Press, 2016.
- Manguel, Alberto. A History of Reading. New York: Viking, 1996. (The Silent Readers)
Note
- Featured image is by ChatGPT: generate an image of a row of empty theatre seats front on, and in a derelict building; wooden seats front on in elevation (not perspective); a longer row of old victorian wooden seating in a derelict building; slatted seats in a defunct victorian music hall at the seaside; change the backdrop to a seaside setting; some chairs are broken. it’s an old disused setting