Inserting tones into a toneless text

tl;dr: Being able to automatically annotate texts goes beyond just morphological tagging. Why not add some "missing" phonetic or phonological data?

As part of my master’s thesis, I’ve been working on a Somali morphological analyzer and a syntactic disambiguator. For anyone who doesn’t know what these are: software that can tell you what function a word has in a sentence and, when multiple possible functions exist, chooses the correct one from context. In English, for instance, the word ‘can’ can be either an auxiliary verb or a noun, but English speakers know which is which when they hear the word in context.

In Somali (as in many languages), some forms that are ambiguous in text would not be ambiguous in speech, thanks to intonation and stress. In Somali specifically, the number of a noun (éy ‘dog’ vs. eý ‘dogs’) and sometimes its gender (masculine vs. feminine) are marked by tone. It is easy to imagine, then, that generating better-sounding (and grammatically sound) Somali speech from text requires knowing where the tones fall. This is where these analytical tools come in handy… And conveniently, tonal patterns in Somali are mostly rule-based.
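To make "knowing where the tones fall" concrete, here is a minimal Python sketch that places an acute accent on a chosen mora, counted from the end of the word. The mora model is a deliberate simplification and an assumption of this sketch, not a description of the actual analyzer: every vowel letter counts as one mora, and a glide (y or w) directly after a vowel counts as the second half of a diphthong. The function names are made up for illustration.

```python
import unicodedata

VOWELS = set("aeiou")
GLIDES = set("yw")
ACUTE = "\u0301"  # combining acute accent


def mora_positions(word):
    """Return string indices of tone-bearing units under a simplified model.

    Assumption: each vowel letter is one mora, and a glide (y/w) directly
    after a vowel is treated as the second mora of a diphthong.
    """
    positions = []
    for i, ch in enumerate(word):
        lower = ch.lower()
        if lower in VOWELS:
            positions.append(i)
        elif lower in GLIDES and i > 0 and word[i - 1].lower() in VOWELS:
            positions.append(i)
    return positions


def mark_tone(word, mora_from_end):
    """Put an acute accent on the mora counted from the end (1 = final)."""
    positions = mora_positions(word)
    if not 1 <= mora_from_end <= len(positions):
        raise ValueError(f"{word!r} has no mora {mora_from_end} from the end")
    idx = positions[-mora_from_end]
    # NFC turns letter + combining accent into a single precomposed character.
    accented = unicodedata.normalize("NFC", word[idx] + ACUTE)
    return word[:idx] + accented + word[idx + 1:]


print(mark_tone("ey", 2))  # accent on the second-to-last mora
print(mark_tone("ey", 1))  # accent on the last mora
```

With this model the two calls reproduce the éy / eý contrast from the paragraph above. Deciding which mora gets the accent is the part that depends on the morphological tags, and the glide rule will overcount in words where y or w starts the next syllable, so this covers only the final, mechanical step.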

Naagta laybreeriga wax ku dhex qoraysa ayaa soo socota.

‘The woman who is writing in the library will come.’

After the morphological analyzer runs, we end up with output like the following, one line per possible reading:

naagta  naag+N+Fem+Sg+Def+Abs+Prox

laybreeriga laybreeri+N+Masc+Sg+Def+Abs+Prox

wax wax+N+Masc+Sg+Indef+Nom
wax wax+N+Masc+Sg+Indef+Gen
wax wax+N+Masc+Sg+Indef+Abs
wax wax+Pron+Indef+Abs

ku  +Nom+Prox
ku  ku+Adp
ku  ku+Pron+Pers+2Sg+Obj

dhex    dhex+N+Fem+Sg+Indef+Gen
dhex    dhex+N+Fem+Sg+Indef+Abs

qoraysa qor+V+Prog+3SgF+Ind+Pres+Red+Abs

ayaa    ayaa+CS+Foc/L+Subj+Null

soo soo+PP+Deic

socota  soco+V+3SgF+Ind+Pres+Red+Abs
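A listing like this is easy to load into a program. The following Python sketch groups the readings by surface form so the ambiguous words stand out. It assumes the layout shown above (surface form, whitespace, then lemma and tags joined by `+`); the function name is made up, and keying by surface form is a simplification that would merge repeated words in a longer text.

```python
def parse_analyses(text):
    """Group analyzer output lines into {surface: [(lemma, [tags]), ...]}."""
    readings = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank separator lines
        surface, analysis = line.split(None, 1)
        lemma, *tags = analysis.strip().split("+")
        readings.setdefault(surface, []).append((lemma, tags))
    return readings


# A few lines copied from the listing above.
SAMPLE = """\
naagta  naag+N+Fem+Sg+Def+Abs+Prox
wax wax+N+Masc+Sg+Indef+Nom
wax wax+N+Masc+Sg+Indef+Gen
wax wax+N+Masc+Sg+Indef+Abs
wax wax+Pron+Indef+Abs
ku  +Nom+Prox
ku  ku+Adp
ku  ku+Pron+Pers+2Sg+Obj
"""

readings = parse_analyses(SAMPLE)
ambiguous = [surface for surface, rs in readings.items() if len(rs) > 1]
print(ambiguous)
```

A reading with nothing before the first `+` (as in one of the `ku` lines) simply comes out with an empty lemma.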

Several of these readings are wrong in this context and need to be removed; that disambiguation is carried out with a constraint grammar. Casting out the readings that do not fit the context leaves us with the following analysis:

"<naagta>"
    "naag" N Fem Sg Def Abs Prox 
"<laybreeriga>"
    "laybreeri" N Masc Sg Def Abs Prox 
"<wax>"
    "wax" Pron Indef Abs 
"<ku>"
    "ku" Adp 
"<dhex>"
    "dhex" N Fem Sg Indef Abs 
"<qoraysa>"
    "qor" V Prog 3SgF Ind Pres Red Abs 
"<ayaa>"
    "ayaa" CS Foc/L Subj Null
"<soo>"
    "soo" PP Deic 
"<socota>"
    "soco" V 3SgF Ind Pres Red Abs
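The idea behind that removal step can be sketched in a few lines of Python. This is a toy, not the actual grammar: the two rules below are invented purely for illustration and are not claims about Somali, and the data structure (a list of `(surface, readings)` pairs) is an assumption of the sketch. The one safeguard it borrows from constraint grammar practice is that a rule is skipped if it would delete every reading of a word.

```python
def remove_readings(cohorts, should_remove):
    """Apply one REMOVE-style rule to every word.

    cohorts: list of (surface, [(lemma, [tags]), ...]).
    should_remove(cohorts, i, reading) -> True to discard that reading.
    """
    result = []
    for i, (surface, readings) in enumerate(cohorts):
        kept = [r for r in readings if not should_remove(cohorts, i, r)]
        # Safeguard: if the rule would leave nothing, keep the word untouched.
        result.append((surface, kept if kept else readings))
    return result


# Hypothetical rule 1: drop readings that have no lemma at all.
def has_empty_lemma(cohorts, i, reading):
    lemma, _tags = reading
    return lemma == ""


# Hypothetical rule 2: drop a personal-pronoun reading when the next word
# has at least one noun reading. Invented for this example only.
def pers_pron_before_noun(cohorts, i, reading):
    _lemma, tags = reading
    if "Pers" not in tags or i + 1 >= len(cohorts):
        return False
    return any("N" in next_tags for _, next_tags in cohorts[i + 1][1])


cohorts = [
    ("ku", [("", ["Nom", "Prox"]),
            ("ku", ["Adp"]),
            ("ku", ["Pron", "Pers", "2Sg", "Obj"])]),
    ("dhex", [("dhex", ["N", "Fem", "Sg", "Indef", "Gen"]),
              ("dhex", ["N", "Fem", "Sg", "Indef", "Abs"])]),
]
for rule in (has_empty_lemma, pers_pron_before_noun):
    cohorts = remove_readings(cohorts, rule)
print(cohorts[0])
```

A real constraint grammar has many such rules with much richer context conditions, and would also need a rule to choose between the two remaining readings of dhex.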

… And these disambiguated forms can then be fed back into the morphological analyzer/generator to get the proper tone marking.

naágta laybreériga wax ku dhéx qóraysá ayaa soo socotá
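The glue between the two tools is mostly string conversion: the disambiguated readings have to be turned back into the lemma+tag strings the analyzer printed in the first place. Here is a minimal Python sketch of that conversion. It assumes the generator accepts the same `lemma+Tag+Tag` notation shown in the analyzer listing earlier; the function name is made up, and the call to the generator itself is left out.

```python
import re

# An indented line holding a quoted lemma followed by space-separated tags.
READING = re.compile(r'^\s+"([^"]*)"\s*(.*)$')


def cohorts_to_generator_input(cg_text):
    """Turn disambiguated readings into lemma+Tag+Tag strings."""
    out = []
    for line in cg_text.splitlines():
        match = READING.match(line)
        if match:  # word-form lines such as "<naagta>" are not indented
            lemma, tags = match.group(1), match.group(2).split()
            out.append("+".join([lemma, *tags]))
    return out


SAMPLE = '''"<naagta>"
    "naag" N Fem Sg Def Abs Prox
"<ayaa>"
    "ayaa" CS Foc/L Subj Null
'''

for line in cohorts_to_generator_input(SAMPLE):
    print(line)
```

If a word still has more than one reading after disambiguation, this produces one string per reading, so something downstream has to pick one.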

I am a little unsure of the tone marking on dhéx (and in fact ayaa should probably carry a stress-tone too, as should soo), but in any case this was all carried out automatically, and these things can be fixed. Providing input like this to a text-to-speech program would result in something a little less monotonous and more pleasing to the ear.

As the analysis progresses, it would even be possible to mark where pauses are necessary, or where the ends of certain clauses are accompanied by boundary tones. There are also other relevant phonological phenomena that could be processed in this manner and included in text-to-speech input.

Now that that’s out of the way, does anyone know of some nice, open-source text-to-speech software that is open for use with any language and not just the largest ones?