Speech synthesis

Post by **sje** » Sun Aug 18, 2013 4:32 pm

In the Old Days, one could buy a dedicated chess computer with speech synthesis capability. I thought that this was a nice feature and so I added it to my old program Spector back in 1987. At the time, I used a separate speech synthesis box, built around a 6502 CPU and connected via a serial link, to do the actual sound generation.

Today, things are much simpler. Symbolic has a utility routine which calls the system() routine to run a Unix speech generator. On Mac OS X, the speech application is "say". On Linux, the application is "spd-say". In both cases, the command is called with a single argument which is the quoted string to say.

Example:

Code:

say "The white bishop at square c4 takes the black pawn at square f7 giving check."
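A sentence like the one above could be assembled from a move's components roughly as follows. This is a minimal sketch and not Symbolic's actual code: the SpokenMove struct, its field names, and the Verbalize() function are all hypothetical, and the "moves to square" wording for non-captures is an assumption.

```cpp
#include <string>

// Hypothetical move description; the field names are illustrative only.
struct SpokenMove
{
  std::string moverColor;   // "white" or "black"
  std::string moverPiece;   // e.g. "bishop"
  std::string fromSquare;   // e.g. "c4"
  std::string toSquare;     // e.g. "f7"
  std::string victimColor;  // empty if the move is not a capture
  std::string victimPiece;  // empty if the move is not a capture
  bool givesCheck;
};

// Build the verbose sentence to hand to the speech application.
std::string Verbalize(const SpokenMove& m)
{
  std::string s = "The " + m.moverColor + " " + m.moverPiece +
                  " at square " + m.fromSquare;

  if (m.victimPiece.empty())
    s += " moves to square " + m.toSquare;   // assumed wording for quiet moves
  else
    s += " takes the " + m.victimColor + " " + m.victimPiece +
         " at square " + m.toSquare;

  if (m.givesCheck)
    s += " giving check";

  return s + ".";
}
```

The resulting string would then be passed, quoted, as the single argument to "say" or "spd-say".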

Post by **zullil** » Mon Aug 19, 2013 2:05 am

I learn the oddest things on this forum. Wasn't aware of the OS X say command:

Code:

SAY(1)                     Speech Synthesis Manager                     SAY(1)



NAME
       say - Convert text to audible speech

SYNOPSIS
           say [-v voice] [-r rate] [-o outfile [audio format options] | -n name:port | -a device] [-f file | string ...]

DESCRIPTION
       This tool uses the Speech Synthesis manager to convert input text to
       audible speech and either play it through the sound output device
       chosen in System Preferences or save it to an AIFF file.

OPTIONS
       string
           Specify the text to speak on the command line. This can consist of
           multiple arguments, which are considered to be separated by spaces.

       -f file, --input-file=file
           Specify a file to be spoken. If file is - or neither this parameter
           nor a message is specified, read from standard input.

       --progress
           Display a progress meter during synthesis.

       -v voice, --voice=voice
           Specify the voice to be used. Default is the voice selected in
           System Preferences. To obtain a list of voices installed in the
           system, specify '?' as the voice name.

       -r rate, --rate=rate
           Speech rate to be used, in words per minute.

       -o out.aiff, --output-file=file
           Specify the path for an audio file to be written. AIFF is the
           default and should be supported for most voices, but some voices
           support many more file formats.

       -n name, --network-send=name
       -n name:port, --network-send=name:port
       -n :port, --network-send=:port
       -n :, --network-send=:
           Specify a service name (default "AUNetSend") and/or IP port to be
           used for redirecting the speech output through AUNetSend.

       -a ID, --audio-device=ID
       -a name, --audio-device=name
           Specify, by ID or name prefix, an audio device to be used to play
           the audio. To obtain a list of audio output devices, specify '?' as
           the device name.

       If the input is a TTY, text is spoken line by line, and the output
       file, if specified, will only contain audio for the last line of the
       input.  Otherwise, text is spoken all at once.

AUDIO FORMATS
       Starting in MacOS X 10.6, file formats other than AIFF may be
       specified, although not all third party synthesizers may initially
       support them. In simple cases, the file format can be inferred from the
       extension, although generally some of the options below are required
       for finer grained control:

       --file-format=format
           The format of the file to write (AIFF, caff, m4af, WAVE).
           Generally, it's easier to specify a suitable file extension for the
           output file. To obtain a list of writable file formats, specify '?'
           as the format name.

       --data-format=format
           The format of the audio data to be stored. Formats other than
           linear PCM are specified by giving their format identifiers (aac,
           alac). Linear PCM formats are specified as a sequence of:

           Endianness (optional)
               One of BE (big endian) or LE (little endian). Default is native
               endianness.

           Data type
               One of F (float), I (integer), or, rarely, UI (unsigned
               integer).

           Sample size
               One of 8, 16, 24, 32, 64.

           Most available file formats only support a subset of these sample
           formats.

           To obtain a list of audio data formats for a file format specified
           explicitly or by file name, specify '?' as the format name.

           The format identifier optionally can be followed by @samplerate and
           /hexflags for the format.

       --channels=channels
           The number of channels. This will generally be of limited use, as
           most speech synthesizers produce mono audio only.

       --bit-rate=rate
           The bit rate for formats like AAC. To obtain a list of valid bit
           rates, specify '?' as the rate. In practice, not all of these bit
           rates will be available for a given format.

       --quality=quality
           The audio converter quality level between 0 (lowest) and 127
           (highest).

ERRORS
       say returns 0 if the text was spoken successfully, otherwise non-zero.
       Diagnostic messages will be printed to standard error.

EXAMPLES
          say Hello, World
          say -v Alex -o hi -f hello_world.txt
          say -o hi.aac Hello, World
          say -o hi.m4a --data-format=alac Hello, World.
          say -o hi.caf --data-format=LEF32@8000 Hello, World

          say -v '?'
          say --file-format=?
          say --file-format=caff --data-format=?
          say -o hi.m4a --bit-rate=?



1.0                               2010-09-23                            SAY(1)

Post by **sje** » Mon Aug 19, 2013 8:14 am

Here's the calling code:

Code:

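#include <cstdlib>   // std::system
#include <sstream>
#include <string>

// Speak a sentence by handing it to the platform's command-line speech tool.
void Speak(const std::string& str)
{
  std::ostringstream oss;
  std::string app;

#if (HostOsApple)
  app = "say";
#endif

#if (HostOsLinux)
  app = "spd-say";
#endif

  // No speech application is known for this platform: do nothing.
  if (app.empty())
    return;

  // The text goes to the shell inside double quotes, so it must not itself
  // contain double quotes or other shell metacharacters.
  oss << app << " \"" << str << ".\"";
  std::system(oss.str().c_str());
}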

Notes:

1. The call to system() will block the calling thread. This can be avoided by, for example, running the command in the background or calling system() from a separate thread.

2. The Linux speech application might pronounce the ending period explicitly as "dot"; if so, the period should not be appended on Linux.

3. If the Speak() routine were to be called from multiple threads, it would need a mutex.

4. I have never seen this particular use of system() fail, so the return code is not checked. Maybe this should be changed on general principles.

5. The primary use of the routine is for the Move method Speak(), but it's used in other places too, such as at the end of any long processing action, for a mate-in-N announcement, or when an exceptional condition occurs.

6. At present, I am able to resist the temptation to add spoken insults, profanity, and the like.

7. An idea is to add speech recognition and then have two computers in the same room play against each other communicating only by voice. This would surely annoy my cats.
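Notes 1, 3, and 4 might be handled along the following lines. This is only a sketch under stated assumptions, not what Symbolic does: BuildSpeechCommand(), RunSerialized(), and SpeakAsync() are hypothetical names, and the speech application is passed in as a parameter instead of being selected with the HostOs macros.

```cpp
#include <cstdlib>
#include <future>
#include <mutex>
#include <string>

// Hypothetical helper: build the shell command line for a speech application.
// The text must not contain double quotes or other shell metacharacters.
std::string BuildSpeechCommand(const std::string& app, const std::string& text)
{
  return app + " \"" + text + "\"";
}

// Run one command through system(). The mutex lets only one command run at
// a time, so concurrent callers cannot overlap (note 3).
int RunSerialized(const std::string& command)
{
  static std::mutex speechMutex;
  std::lock_guard<std::mutex> lock(speechMutex);
  return std::system(command.c_str());
}

// Start the command on another thread so the caller is not blocked (note 1).
// The future carries the value returned by system(), which the caller can
// compare against zero (note 4).
std::future<int> SpeakAsync(const std::string& app, const std::string& text)
{
  return std::async(std::launch::async, RunSerialized,
                    BuildSpeechCommand(app, text));
}
```

A caller would keep the returned future, for example `auto done = SpeakAsync("say", "Mate in three");`, and call `done.get()` later to learn whether the command reported success. Note that a future obtained from std::async waits in its destructor, so discarding it immediately would make the call block again.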

Post by **sje** » Mon Aug 19, 2013 4:36 pm

There can be a problem with the use of the system() call in some cases. I ran a test on my Linux box with 16 GB RAM and got a crash when calling system(). Apparently, more than half the RAM was in use by the perft() transposition table, and this was just too much for the fork() call buried somewhere in the system() routine. The problem was fixed by deallocating the transposition table prior to emitting speech with a call to system().

An alternative to using system() is to call platform-specific library routines directly to generate speech. I used to do this on a Mac, but I don't know the equivalent Linux routines.
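Another option that still uses the external speech application is the POSIX posix_spawnp() call, which starts a program without going through a shell. On some systems it is implemented without duplicating the parent's address space the way a plain fork() does, so it may cope better with a large transposition table; whether it avoids the crash described above is untested here. The function name SpawnSpeech() is hypothetical, and the sketch assumes a POSIX platform.

```cpp
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <string>

extern char** environ;   // the current process environment (POSIX)

// Run "app text" as a child process and wait for it to finish.
// Returns the child's exit status, or -1 if it could not be started
// or did not exit normally.
int SpawnSpeech(const std::string& app, const std::string& text)
{
  // argv for the child: the program name, the text as ONE argument, and a
  // terminating null pointer. No shell is involved, so no quoting is needed.
  char* const argv[] = {
    const_cast<char*>(app.c_str()),
    const_cast<char*>(text.c_str()),
    nullptr
  };

  pid_t pid = 0;
  if (posix_spawnp(&pid, app.c_str(), nullptr, nullptr, argv, environ) != 0)
    return -1;

  int status = 0;
  if (waitpid(pid, &status, 0) < 0)
    return -1;

  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, `SpawnSpeech("spd-say", "Mate in three")` would run the Linux speech application with the text as its single argument. Like system(), this version blocks until the child exits.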

Post by **Henk** » Mon Aug 19, 2013 6:33 pm

I hope it doesn't sound like the opening line of this song:

Post by **AlvaroBegue** » Mon Aug 19, 2013 7:40 pm

Here's a better story about how someone learned about the `say' command (see the mouse-over text): http://xkcd.com/530/

Post by **sje** » Tue Aug 20, 2013 2:08 pm

More issues:

1. If the speech application is called via system() before a prior invocation completes, then the audio stream from the prior call may be truncated or garbled.

2. Speech output should probably be deactivated if the program is running in batch mode.

3. The old Mac OS X speech routine which I had used was SpeakString(). It has been deprecated in the most recent Mac OS X release. There is a replacement of sorts, but it's in the Objective-C/C++ library and so is not directly callable from C++.

4. In Linux, the only way of avoiding the use of the spd-say application is to steal from its source to write one's own speech utility routines.

5. I have no idea how one might do any of this in Windows.

6. Speech recognition on the Mac is done by supplying a list of strings to a library routine along with an audio snippet; the routine returns the index of the best match. Supplying a list of all possible moves encoded in verbose format would make this work. Note that the free Chess application provided with Mac OS X has speech recognition.

7. To have two programs communicate using audio, there are alternatives to speech. One of these is audible telegraphy code. Another would be musical notes, perhaps like R2D2 talk (although I very much disliked Star Wars). And then there's the possibility of using feline or canine noises which would certainly agitate the household pets.
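As an illustration of the telegraphy idea in item 7, a move could be turned into Morse code before being rendered as tones. This is a minimal sketch: the function name ToMorse() is hypothetical, and the use of coordinate notation such as "e2e4" as the move encoding is an assumption, since no encoding is specified above. Only the text-to-code step is shown; generating the audio is left out.

```cpp
#include <cctype>
#include <map>
#include <string>

// Translate a coordinate move such as "e2e4" into International Morse code.
// Only files a-h and ranks 1-8 are covered; promotion letters are not
// handled in this sketch.
std::string ToMorse(const std::string& move)
{
  static const std::map<char, std::string> table = {
    {'a', ".-"},    {'b', "-..."},  {'c', "-.-."},  {'d', "-.."},
    {'e', "."},     {'f', "..-."},  {'g', "--."},   {'h', "...."},
    {'1', ".----"}, {'2', "..---"}, {'3', "...--"}, {'4', "....-"},
    {'5', "....."}, {'6', "-...."}, {'7', "--..."}, {'8', "---.."}
  };

  std::string out;
  for (char ch : move)
  {
    const char key =
      static_cast<char>(std::tolower(static_cast<unsigned char>(ch)));
    const auto it = table.find(key);
    if (it == table.end())
      continue;            // skip characters outside the table
    if (!out.empty())
      out += ' ';          // one space separates consecutive characters
    out += it->second;
  }
  return out;
}
```

Each dot and dash would then be played as a short or long tone, with the receiving program decoding the tone lengths back into the move text.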
