Hacking through Irish trad

Does computationally analyzing a set of Irish traditional tunes help identify a learning progression?

The typical way to learn a new thing is to practice it a lot, but identifying where to begin can be tricky. What do you need to know first, and what is less important and can wait until later?

Irish traditional music has long fascinated me, and I think for some of the same reasons as human languages and programming languages. There’s lots of structure, lots of variation, and the process of ‘acquiring’ Irish music is similar to acquiring a new language. Tunes are mainly passed on by ear, though notation is a handy resource. It’s hard to even imagine how many tunes the genre may contain: The Session, a database of Irish tunes, recordings, and a discussion forum, has thousands submitted. Most musicians may actively know a couple hundred tunes by memory, and have passive recall of many more. There’s a tune for everything.

Identifying patterns

While looking around for resources for learning piano accompaniment for Irish trad, I came across Vashon Celtic Tunes. The programmer in me saw one big plaintext file and mused over how I could get something out of it, so I came up with this…

One of the pain points with learning accompaniment in a genre with too many tunes for one person to learn is that it is impossible to know everything. As you learn more, though, you pick up the general patterns and can intuit your way through. Guitarists at sessions do this all the time, but it is a little tricky if you’re more of a beginner: you may not know what patterns to expect. So why not find a way to identify them automatically?

I wrote a quick and dirty script in Python that reads all the chord descriptions and tunes, and for each tune type and key, sorts all the tunes by chords present. Here’s a plain and simple example that’s probably apparent to 99% of Irish trad musicians: jigs in D!
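Before the results, here is a minimal sketch of what such a grouping step could look like. It is not the actual script: the input format (one dictionary per tune, with one chord per bar in a `chords` list), the function names `chord_shares` and `group_tunes`, and the example tunes are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def chord_shares(chords):
    """Return (chord, percent) pairs, most frequent chord first."""
    counts = Counter(chords)
    total = sum(counts.values())
    # Sort by count (descending), then by name so ties come out the same way every time.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(chord, round(100 * n / total)) for chord, n in ranked]


def group_tunes(tunes):
    """Group tunes by (type, key), then by their frequency-ordered chord list."""
    groups = defaultdict(lambda: defaultdict(list))
    for tune in tunes:
        shares = chord_shares(tune["chords"])
        category = ", ".join(chord for chord, _ in shares)
        groups[(tune["type"], tune["key"])][category].append((tune["name"], shares))
    return groups


# Made-up tunes for illustration only (not taken from the real file).
tunes = [
    {"name": "Example Jig A", "type": "jig", "key": "D",
     "chords": ["D"] * 5 + ["A"] * 3 + ["G"] * 2},
    {"name": "Example Jig B", "type": "jig", "key": "D",
     "chords": ["D"] * 6 + ["G"] * 3 + ["A"] * 1},
]

for (tune_type, key), categories in group_tunes(tunes).items():
    print(f"{tune_type}s in {key}")
    for category, members in categories.items():
        print(f"  {category}")
        for name, shares in members:
            summary = ", ".join(f"{chord} ({pct}%)" for chord, pct in shares)
            print(f"  - {name}: {summary}")
```

Because the category name is the chord list ordered by frequency, "D, A, G" and "D, G, A" end up as separate groups, which is the distinction the listings in this post rely on.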

After sorting, we find the two largest subcategories. The second is very similar to the first, but with a less frequent Em chord to spice things up. These may essentially be the same category, really, but the script doesn’t know that.

Jigs in D (57 tunes)

D, A, G

- Behind the Haystack: D (42%), A (40%), G (19%)
- The Black Rogue: D (47%), A (31%), G (22%)
- Dennis Murphy's: D (47%), A (38%), G (16%)
- Gillian's Apples: D (59%), A (28%), G (13%)
- Haste To The Wedding: D (53%), A (34%), G (13%)
- Jerry's Beaver Hat: D (75%), A (16%), G (9%)
- My Darling Asleep: D (63%), A (22%), G (16%)
- The Rambling Pitchfork: D (59%), A (25%), G (16%)
- Sean Bui: D (63%), A (25%), G (13%)

D, A, G, Em

- Gallant Tipperary: D (42%), A (23%), G (20%), Em (14%)
- The Rose In The Heather: D (56%), A (19%), G (19%), Em (6%)
- Sarah's Jig: D (52%), A (31%), G (13%), Em (4%)
- Shandon Bells: D (63%), A (22%), G (9%), Em (6%)
- Smash The Windows: D (66%), A (19%), G (9%), Em (6%)
- When Sick, Is It Tea That You Want?: D (59%), A (25%), G (9%), Em (6%)
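One possible way to let a script treat these two lists as the same category is to ignore chords that fall below some cutoff when naming the category. The sketch below is only an illustration, not something the script does: the function name `core_category` and the 8% cutoff are assumptions, and the percentages are copied from the listings above.

```python
def core_category(shares, min_percent=8):
    """Name a category using only chords at or above min_percent.

    `shares` is a list of (chord, percent) pairs, most frequent first.
    The most frequent chord is always kept, so the name is never empty.
    """
    kept = [chord for i, (chord, pct) in enumerate(shares)
            if i == 0 or pct >= min_percent]
    return ", ".join(kept)


# Percentages copied from the jig listings above.
shandon_bells = [("D", 63), ("A", 22), ("G", 9), ("Em", 6)]
gallant_tipperary = [("D", 42), ("A", 23), ("G", 20), ("Em", 14)]

print(core_category(shandon_bells))      # Em falls under the cutoff
print(core_category(gallant_tipperary))  # Em stays in the category name
```

With this cutoff, the five tunes where Em sits at 6% or less fold into the D, A, G category, while Gallant Tipperary, with Em at 14%, stays separate. Where to put the cutoff is a judgment call: set it too high and the G chord disappears from tunes like Shandon Bells as well.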

There are some other interesting things to be found, though. A healthy subset of these D jigs emphasizes Bm over the more common G, or uses the two almost equally:

D, A, Bm, G

- The Cordal Jig: D (41%), A (31%), Bm (19%), G (9%)
- Paddy's Return: D (44%), A (19%), Bm (19%), G (19%)
- The Three Note Jig: D (44%), A (38%), Bm (13%), G (6%)
- Tripping Up The Stairs: D (38%), A (31%), Bm (25%), G (6%)

For reels in D, there’s a similar pattern, but something interesting emerges: among tunes with three common chords, the second most frequent chord varies. Sometimes A is overwhelmingly the second most common, and sometimes it’s G.

Reels in D

D, A, G

- The Blackthorn Stick: D (66%), A (25%), G (9%)
- The Cameronian: D (59%), A (34%), G (6%)
- Green Mountain: D (66%), A (28%), G (6%)
- Hand Me Down The Tackle: D (59%), A (25%), G (16%)
- Humours Of Westport: D (59%), A (28%), G (13%)
- Katie's Reel: D (47%), A (34%), G (19%)
- Lady Anne Montgomery: D (69%), A (22%), G (9%)
- Last Night's Fun: D (66%), A (22%), G (13%)
- Merry Blacksmith (Paddy on the Railroad): D (56%), A (38%), G (6%)
- Morpeth Rant: D (50%), A (28%), G (22%)
- Mountain Road: D (63%), A (25%), G (13%)
- Paddy on the Railroad (Merry Blacksmith): D (56%), A (38%), G (6%)
- Petronella: D (59%), A (28%), G (13%)
- Providence Reel: D (59%), A (22%), G (19%)
- Roaring Mary: D (59%), A (25%), G (16%)
- The Wild Irishman: D (59%), A (38%), G (3%)
- The Wise Maid: D (63%), A (19%), G (19%)

D, G, A

- The Boyne Hunt: D (69%), G (19%), A (13%)
- Boys of the Lough: D (59%), G (28%), A (13%)
- Danny Meehan's: D (47%), G (28%), A (25%)
- John Brennan's: D (56%), G (25%), A (19%)
- Lady On The Island: D (56%), G (25%), A (19%)
- McDermott's: D (60%), G (33%), A (6%)
- Miss Monahan's: D (59%), G (25%), A (16%)
- Rolling In The Rye Grass: D (63%), G (25%), A (13%)
- Saint Anne's Reel: D (59%), G (22%), A (19%)
- Tom Billy's: D (59%), G (22%), A (19%)
- Wind That Shakes The Barley: D (50%), G (31%), A (19%)

Two additional subsets also include Em: those with D, A, G, Em, and those with D, G, A, Em.


Obviously, 500 tunes is a small sample, considering that this may be only a fifth of what exists, or even less. But the concept scales up: if you were to include another 1,000 tunes, you would likely find the same categories.

Similarly, the use of “keys” is not completely accurate. Keys and modes together would be better and might reveal more understandable and expected patterns, but the data doesn’t always provide a mode (though I’m sure it can be predicted).
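As a rough idea of how a mode might be predicted from the chords alone, here is a small heuristic sketch. It is an assumption-laden illustration, not part of the script: the function name `guess_mode` is hypothetical, the note table only covers sharp spellings, and the rules (a minor tonic chord points to dorian or minor, a major chord a whole step below the tonic points to mixolydian) are common rules of thumb that will misfire on some tunes.

```python
# Sharps-only spelling: enough for common session keys such as D, G, A, and E,
# but flat keys such as Bb would need extra handling.
NOTES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def guess_mode(tonic, chords):
    """Rough heuristic: guess a mode from the tonic and the chords used."""
    root = NOTES.index(tonic)
    flat_seven = NOTES[(root - 2) % 12]  # e.g. C when the tonic is D
    fourth = NOTES[(root + 5) % 12]      # e.g. G when the tonic is D
    used = set(chords)
    if tonic + "m" in used:
        # Minor tonic chord: a major chord on the fourth suggests dorian.
        return "dorian" if fourth in used else "minor"
    if flat_seven in used:
        return "mixolydian"
    return "major"


print(guess_mode("D", ["D", "A", "G"]))   # the D, A, G category from this post
print(guess_mode("D", ["D", "C", "G"]))   # illustrative chord set
print(guess_mode("A", ["Am", "G", "D"]))  # illustrative chord set
```

A guess like this could then be used alongside the key when grouping, so that a tune in D with a C chord lands in a different group from a plain D major tune.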

All the data

Go take a peek at all the data on GitHub, along with the script that generates it. You’ll also find some notes and other things in that repo that I’m writing up as I learn…