thrall (a. regina cantatis): Getting the Most Out of the Speech Synthesizer

On the Writing Your Own Scripts page of this tutorial, I explained how to use some of the most useful bits of code - and included this screenshot from an induction I wrote. No doubt it raised a lot of questions in people's minds. For instance, what was up with all those extra commas and hyphens, not to mention the strange spelling of unfocus (You did notice that, didn't you? ;-))? Well, like the note at the top of the screenshot says, it's all about getting the most out of the speech synthesizer.

Once again, let me remind you that the speech synthesizer is a Microsoft product, not something Follow the Watch created for use with Virtual Hypnotist. And the speech synthesizer has its own peculiar quirks.

Now, if you don't really care how robotic the synthesizer sounds when it reads your scripts, then by all means, just ignore this page - or better yet, download a professional recording, or even record your own voice for use with your personalized sessions. A recorded voice is always better than the synthesizer.

On the other hand, it's a lot easier to change a few words or lines of a text script than it is to re-record a whole 30-minute-long custom audio file. And if you're willing to put in some effort, you can get pretty nice results out of the speech synthesizer. You can make it sound like a real, entranced human being; and if you're like me, imagining yourself being turned into a hypnotized drone by another hypnotized drone makes for some pretty intense sessions.

So if that's how you feel, then read on.

I've been keeping notes as I've experimented with VH, and these are the most important things I've learned about the speech synthesizer, in no particular order:

When you're writing or modifying a script, test it a little bit at a time, as I describe on the Writing Your Own Scripts page. Testing is just as important in getting the most out of the speech synthesizer as it is in proofing your code.

The synthesized voice usually goes up before a \pau\ but down before a comma or period.

Ending a sentence with a question mark usually doesn't produce a very realistic-sounding question. I've found that questions sound much more question-like if you just end them with a period.

Using the \emp\ tag even once, anywhere within a "speak" line, causes that whole line to sound more and more robotic with every word spoken. I have not found any way to stop this from happening, so I suggest that you just don't use \emp\ at all, except when invoking triggers (It's not much of an issue there, since triggers are so short). There are other ways to put more emphasis on particular words. For instance....

Very short words like "a" and "to" and "are" usually don't get enough emphasis with the speech synthesizer. However, you can increase the emphasis on these and other words by attaching them to the word before and/or after them with a hyphen (example: you-are-already). Try a few different tests to see what works best in any given instance, because context does matter. Also, I don't recommend connecting more than three words with hyphens; any more than that, and they start to sound robotic.

You can also use hyphens to get rid of unwanted pauses between words. For instance, there's a line in the VH "long induction" that goes, "just as if you were trying to convince somebody that you were absolutely sound asleep." The speech synthesizer kept reading it as, "Just as [pause] if you were trying...." So I connected as and if with a hyphen ("just as-if"), and the speech synthesizer read it properly.
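If you have a lot of lines to adjust, the hyphen trick can be applied with a small helper outside of VH. What follows is only a minimal Python sketch, not part of Virtual Hypnotist; the function name link_words and the max_words limit are hypothetical, with the limit simply mirroring the three-word advice above.

```python
import re

def link_words(line: str, phrase: str, max_words: int = 3) -> str:
    """Join the words of `phrase` with hyphens wherever it appears in `line`.

    Hypothetical helper: it only edits text before you paste it into a script.
    """
    words = phrase.split()
    if len(words) < 2:
        raise ValueError("need at least two words to link")
    if len(words) > max_words:
        raise ValueError(f"linking more than {max_words} words tends to sound robotic")
    linked = "-".join(words)
    # Whole-word match so "as if" is not found inside something like "has if".
    pattern = r"\b" + re.escape(phrase) + r"\b"
    return re.sub(pattern, lambda m: linked, line)

print(link_words("just as if you were trying to convince somebody", "as if"))
```

You would still need to test the result by ear, since the same hyphenation can sound different in a different sentence.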

By the same token, you can force a pause where the speech synthesizer doesn't put one by inserting either a comma (if you want a downward inflection on the last word before the comma) or a \pau\ (if you want an upward inflection). A pause of 150-250 milliseconds will be right for most purposes.
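For anyone who prepares script text in an editor before pasting it into VH, here is a rough Python sketch that builds pause tags in the \pau=N\ form used on this page and drops them between phrases. The helper names pause_tag and join_with_pauses are hypothetical, and the 200 millisecond default is just the middle of the range suggested above.

```python
def pause_tag(ms: int) -> str:
    """Return a pause tag in the \\pau=N\\ form shown in this tutorial."""
    if ms <= 0:
        raise ValueError("pause length must be a positive number of milliseconds")
    return f"\\pau={ms}\\"

def join_with_pauses(phrases, ms: int = 200) -> str:
    """Join phrases with a pause tag between each pair (upward inflection before each pause)."""
    tag = pause_tag(ms)
    return f" {tag} ".join(p.strip() for p in phrases)

print(join_with_pauses(["deeper", "and deeper", "and deeper still"], ms=150))
```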

If you have a really unusual word and can't get the effect you want with a pause or a comma or a hyphen, try a combination. For instance, if you used a comma to create a downward emphasis, but the new punctuation causes the speech synthesizer to pause too long before continuing, try putting a \pau=25\ after the comma to shorten the pause.

Again, context matters. For instance, the speech synthesizer will sometimes read the word perfect as PERfect, and sometimes as perFECT. I have found a workaround for this particular word, which is to spell it as purrfect when I want the emphasis on the first syllable but the speech synthesizer insists on emphasizing the second syllable. With other ambiguous words, sometimes I just have to change to less ambiguous words. This is something you just have to take on a case-by-case basis.

The speech synthesizer pronounces the short "u" sound in words like unconscious and unfocused with too little emphasis. I've found that the best way to correct this is to write all short "u" sounds as "uh". That's why you see unfocus written as "uhn-focus" in the screenshot. So if you wanted to use a word like cut, I'd suggest writing it as "cuht."

Sometimes the speech synthesizer's eccentricities work in your favor. For instance, the words command and obey always come out with a lovely forcefulness. On the other hand, I have not found any way to make the speech synthesizer say the word ate/eight without making it sound harsh and robotic, so I've just stopped using that word in my scripts altogether.

With unusual words that the speech synthesizer doesn't recognize, you generally have to test a few different phonetic spellings (often with hyphens between the syllables, and sometimes with the comma/short pause combination I mentioned above) to make them sound the way they're supposed to sound. For instance, I have discovered that the speech synthesizer pronounces Lady Ru'etha's name more or less correctly when I write it as "Lady Roo-eh-thuh."
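Once you have found phonetic spellings that work, it can help to keep them in one place and apply them consistently. The sketch below is a minimal Python example, assuming you edit your script text outside VH; the RESPELLINGS table and the respell function are hypothetical names, and the entries are only the spellings mentioned on this page.

```python
import re

# Spellings taken from this page; add your own after testing them by ear.
RESPELLINGS = {
    "unfocus": "uhn-focus",
    "cut": "cuht",
    "Ru'etha": "Roo-eh-thuh",
}

def respell(text: str, table=RESPELLINGS) -> str:
    """Replace whole words with their tested phonetic spellings (case-sensitive)."""
    for word, spoken in table.items():
        pattern = r"\b" + re.escape(word) + r"\b"
        text = re.sub(pattern, lambda m, s=spoken: s, text)
    return text

print(respell("Lady Ru'etha tells you to unfocus"))
```

Because it matches whole words only, a related form such as "unfocused" is left alone until you add and test its own entry. Context-dependent words like perfect are deliberately left out of the table, since they only need respelling some of the time.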

If you study the preinstalled scripts on VH, you'll see that many of them have phrases and sentences separated by long series of periods. I'm not sure how the people who wrote these scripts got this to work, because when I tried to do the same thing, the speech synthesizer said "point" as soon as it hit the second period. It doesn't do that with the preinstalled scripts, but it also doesn't consistently recognize long pauses created this way. So stick to the \pau\ command and don't mess with the extra periods.
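If you are adapting a script that relies on long runs of periods, a quick find-and-replace can convert them to pause tags. This is a minimal Python sketch under stated assumptions: the dots_to_pause name is hypothetical, and the 100 milliseconds per period is an arbitrary starting value that you would tune by listening.

```python
import re

def dots_to_pause(text: str, ms_per_dot: int = 100) -> str:
    """Replace each run of two or more periods with a \\pau=N\\ tag.

    ms_per_dot is an assumed conversion rate, not anything defined by VH.
    """
    def to_tag(match):
        dots = match.group(1)
        return f" \\pau={len(dots) * ms_per_dot}\\ "
    # Single periods (normal sentence endings) are left untouched.
    return re.sub(r"\s*(\.{2,})\s*", to_tag, text).strip()

print(dots_to_pause("relax.... and let go"))
```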

Another way to get more oomph out of the speech synthesizer is to change the speed, volume, and/or pitch of various words. You'll find explanations and examples of each effect in my Quick Reference Guide for Coding VH Scripts.

Note that a very small reduction in speed can have a very big effect, so use this code carefully. In my opinion, the best use of it is to slow down a single word that the speech synthesizer tends to pronounce too quickly - for instance, the word very.

You can also make your trigger much, much more effective if you drop the binaural frequency from alpha or theta to delta on the line just above the trigger (refer to my Quick Reference Guide for details), then raise it again just below the trigger - and at the same time, drop and raise the speed, volume, and pitch of the speech synthesizer. Here's an example of what I mean, copied from one of my personal scripts:

And here's the explanation:

To make the trigger enhancement work properly, you need to undo everything you did, immediately after the trigger is spoken. That's what makes the trigger stand out so well in the context of the session. So, for instance, I dropped the volume by 20000, then brought it back up by 20000.

For the session I took this code from, the binaural frequencies started at 128 on the left side and 133 on the right. So as you can see, I'm dropping the binaural beats from theta to delta, just for the duration of the trigger, and bringing them back to their starting point.
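The arithmetic behind those numbers is simple: the binaural beat is the difference between the two sides, so 128 and 133 give a 5 Hz beat. The Python sketch below works that out and names the band. The band boundaries used here (below 4 Hz for delta, 4 to 8 Hz for theta, 8 to 14 Hz for alpha) are commonly quoted approximations and an assumption on my part, and the 130 Hz value in the second example is purely illustrative.

```python
def beat_frequency(left_hz: float, right_hz: float) -> float:
    """The binaural beat is the difference between the left and right tones."""
    return abs(right_hz - left_hz)

def band(beat_hz: float) -> str:
    """Name the band for a beat frequency, using approximate, assumed boundaries."""
    if beat_hz < 4:
        return "delta"
    if beat_hz < 8:
        return "theta"
    if beat_hz < 14:
        return "alpha"
    return "beta"

start = beat_frequency(128, 133)
print(start, band(start))

# Illustrative only: narrowing the gap to 2 Hz would land in the delta range.
dropped = beat_frequency(128, 130)
print(dropped, band(dropped))
```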

Note that the command to reduce the speed is the last code before the trigger, and the command to increase the speed is the first code after the trigger. Trust me, this is for the best. You don't want to stretch the slowdown any further than you have to.

Don't assume that the exact values from my example will be the right ones for your session. You might have to play with them a bit. Go to your "Speech Synthesis" tab and make a note of your starting settings for volume, pitch, and speed. Your goal is to make significant changes to those values without taking the changes too far to be effective. For instance, the maximum volume allowed by Virtual Hypnotist is 65535. If the value you put in for your volume change would take the volume up over 65535, it won't work; the volume will actually go down. And if the number you put in for your change takes the volume too close to 65535 and you play that thing through your earbuds, I promise you won't enjoy the results.
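Before committing to a volume change, it is worth doing the sum on paper. Here is a minimal Python sketch of that check, using the 65535 maximum mentioned above. The function name check_volume_change and the headroom margin are hypothetical; the 5000 default is only a guess at what "too close" might mean, so adjust it to your own ears.

```python
MAX_VOLUME = 65535  # maximum volume stated on this page

def check_volume_change(start: int, change: int, headroom: int = 5000) -> str:
    """Classify a planned volume change before you put it in a script.

    `headroom` is an assumed safety margin below the maximum, not a VH setting.
    """
    target = start + change
    if target > MAX_VOLUME:
        return "over maximum"
    if target < 0:
        return "below zero"
    if target > MAX_VOLUME - headroom:
        return "too close to maximum"
    return "ok"

print(check_volume_change(30000, 20000))
print(check_volume_change(50000, 20000))
```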

With that in mind, here are some general notes:

If your starting volume is somewhere in the middle range and you raise it by 20000, you should get good results.
If your trigger is only one word, whatever your starting speed is, lower and raise it by exactly that number. If your trigger is two or three words, you'll have to play around with the values to find the right adjusted speed.
Don't lower and raise the pitch by very much. A small number can make a huge difference.

Well, those are all the tips I have with regard to the speech synthesizer...so far. As I said above, the best way to get the most out of it is to test your script a few lines at a time. Play with the commas, the pauses, the hyphens. Play with whether to break sentences apart with "speak" commands or whether to line up several of them behind a single "speak." It takes some work, but if you don't have the luxury of recording MP3s for every session, it's definitely worth the effort.

*Virtual Hypnotist is a freeware program created by Follow the Watch, not by me, and the program includes its own help files. You can download Virtual Hypnotist here.