Challenges
Beat Generation and Cleaning
Importing the beats into music21 turned out to be reasonably straightforward. When we loaded the Groove dataset, most files were represented like a normal score: Part objects containing Measure objects containing Note objects, and so on. However, some files loaded their content into Voice objects instead of Part objects, and single notes were occasionally parsed as Chord objects. A few Beam objects also lingered on the score after we removed notes, but since they did not affect the final audio output, we did not bother cleaning up the straggling beams.
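As a rough illustration of the cleanup this required, the following sketch (the function name and structure are ours, not part of our actual pipeline) gathers notes whether they sit in Part or Voice objects and unwraps single-pitch Chords back into plain Notes:

\begin{lstlisting}[language=Python]
from music21 import converter, stream, chord, note

def normalize_groove_score(path):
    # Load a Groove MIDI file and flatten the quirks we observed:
    # Voice objects standing in for Parts, and single-pitch Chords
    # standing in for Notes.
    score = converter.parse(path)
    drums = stream.Part()
    # .recurse().notes reaches note-bearing elements inside Parts and Voices alike.
    for el in score.recurse().notes:
        offset = el.getOffsetInHierarchy(score)
        if isinstance(el, chord.Chord) and len(el.pitches) == 1:
            n = note.Note(el.pitches[0])      # unwrap the lone pitch
            n.quarterLength = el.quarterLength
            drums.insert(offset, n)
        else:
            drums.insert(offset, el)
    return drums
\end{lstlisting}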
Another issue was that some beats did not contain a full 4 bars of content; instead, their 4th bar would be missing some number of beats. As a result, the beat part of our composition would drift off rhythm from the chords and melody, since everything after the short bar came in slightly early. We fixed this by scanning for incomplete bars with fewer than 4 beats and padding them with the appropriate amount of rests.
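A minimal sketch of that padding step, assuming the beats live in a single music21 Part (barDuration gives the length each measure should have under its time signature):

\begin{lstlisting}[language=Python]
from music21 import note

def pad_short_measures(part):
    # Append a rest to any measure shorter than its time signature implies.
    for m in part.getElementsByClass('Measure'):
        missing = m.barDuration.quarterLength - m.duration.quarterLength
        if missing > 0:
            r = note.Rest()
            r.quarterLength = missing
            m.append(r)
\end{lstlisting}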
A final issue here was that our beats sometimes had too much groove for their own good. We noticed, especially in the jazz beats, that kicks and snares would fall on odd rhythms. These patterns made sense when all of their percussion was played together, but once we kept only the kick and snare, they sounded off-rhythm. Our solution was to quantize the beats to 16th-note rhythms.
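music21's built-in quantization handles this directly; a one-line sketch, assuming drums is the cleaned drum Part from the sketches above:

\begin{lstlisting}[language=Python]
# Snap offsets and durations to the nearest 16th note:
# quarterLengthDivisors=(4,) means each quarter note is divided into 4.
drums.quantize(quarterLengthDivisors=(4,),
               processOffsets=True,
               processDurations=True,
               inPlace=True)
\end{lstlisting}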
Harmony Detection
The current harmony detection is incapable of matching the right chords under key changes. The subsystem makes the strong assumption of monotonality, i.e. that a whole piece stays in a single key, which is what allows our code to transpose and normalize the pieces. One notable consequence for the extracted harmonies is that, while most melodies fall under the I and V chords, arguably the most prevalent harmonies at least within Western music, we also extracted rather unusual chords, such as Italian augmented sixth chords. These chords do exist, but some pieces reported them where they should not have because we do not yet store the proper bass note for the inversion; adding that feature would make the extracted harmonies more precise. A further refinement would be to keep a primary model for the base harmony and a separate secondary model for choosing different ``flavors'' of that harmony. This would provide more data, since more melodies would map to the same base harmony, and it would improve chord similarity, because more chords would collapse to the same chord instead of remaining disjoint, reducing the number of ``complex'' harmonies in the primary model; we could then select a particular harmony and function and raise the probability of the corresponding melodies. We would need to be careful with this, however: it would map both C7 and C to C major despite their very different harmonic functions, so we might instead join harmonies by function or interval content, treating, say, Db7 and G7 as nearly equivalent in function.
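To make the primary/secondary split concrete, one possible reduction (a sketch of our own; the grouping rule carries exactly the C7-versus-C caveat mentioned above) keys the primary model on root and triad quality and keeps the full pitch-class set as the secondary ``flavor'':

\begin{lstlisting}[language=Python]
from music21 import chord

def split_harmony(pitch_names):
    # base   -> (root, triad quality), fed to the primary Markov model
    # flavor -> the full pitch-class set, fed to the secondary model
    c = chord.Chord(pitch_names)
    base = (c.root().name, c.quality)             # e.g. ('C', 'major')
    flavor = frozenset(p.pitchClass for p in c.pitches)
    return base, flavor

# A dominant seventh and a plain triad collapse to the same base,
# which is exactly the functional ambiguity noted above.
print(split_harmony(['C4', 'E4', 'G4', 'B-4'])[0])  # ('C', 'major')
print(split_harmony(['C4', 'E4', 'G4'])[0])         # ('C', 'major')
\end{lstlisting}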
Some chords, in particular, would emerge because of key changes. Consider the bII6 chord (the Neapolitan chord): it would be detected as a bII major chord that could have arisen from a modal transposition or a specific key change. Because the system currently represents chords as pitch sets (so that they are hashable), it would treat a subdominant-functioning chord the same as a tritone substitution even though the two serve different functions in different contexts (one leads to the dominant chord, while the other acts as a dominant leading back to the tonic). Similarly, a II major chord may emerge from a presumed Lydian mode when it is really the V/V chord, the dominant of the dominant key, which can arise from a key change.
As such, it would be best to use the music21.analysis.floatingKey.KeyAnalyzer tool to handle polytonality. The idea would be to run it on single measures (or even smaller windows for very quick key changes) and then repeat with increasing levels of smoothing to handle varying degrees of harmonic structure and rhythm. Ideally, we would identify the harmonic rhythm as a first step, so that we can choose the right order of smoothing and capture key changes occurring at different rates. We can also use the cadence extractor to help identify modulations.
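A minimal sketch of that windowed analysis (we assume KeyAnalyzer's score-in, one-key-per-measure-out interface; exact attribute names should be checked against the music21 documentation):

\begin{lstlisting}[language=Python]
from music21 import converter
from music21.analysis import floatingKey

score = converter.parse('piece.musicxml')  # placeholder path

# Re-run with progressively larger windows ("smoothing"): small windows
# catch quick key changes, large ones recover the overall tonality.
for window in (1, 2, 4, 8):
    analyzer = floatingKey.KeyAnalyzer(score)
    analyzer.windowSize = window
    keys_by_measure = analyzer.run()   # one Key object per measure
    print(window, [str(k) for k in keys_by_measure])
\end{lstlisting}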
ASATO currently looks for any chord symbol and works on multiple symbols at once if they apply at the same timing, which matters especially when the tonality of the piece is ambiguous due to the use of common chords. We can extend this by determining the different tonalities active at different times and then recovering the transposition for each tonality per timestep to normalize the chords. For example, a chord that is ii in one key is also v in that key's dominant, so we can recover not only proper key changes but also additional harmonic data to train the Markov model on, since we would then support multiple valid interpretations of the harmony, each equally capable of generating melodies for it. (Melodies over the ii chord may act as ``predominant'' preparations for a dominant chord, which is functionally identical to ``dominant'' preparations for a tonic chord when reinterpreted in the dominant key.)
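The dual reading in that example is easy to verify with music21's Roman-numeral tools; a small sketch (the D minor triad and the two keys are purely illustrative):

\begin{lstlisting}[language=Python]
from music21 import chord, key, roman

dm = chord.Chord(['D4', 'F4', 'A4'])

# The same pitch set reads as ii in C major and as v in its dominant key, G major.
print(roman.romanNumeralFromChord(dm, key.Key('C')).figure)  # ii
print(roman.romanNumeralFromChord(dm, key.Key('G')).figure)  # v
\end{lstlisting}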
Modes and Tonality
Another challenge was handling the different modes and the possibility of rather funky scales (imagine the whole-tone scale). Such scales are prevalent in some of the blues and jazz pieces in our dataset, and it would be ideal to support them. Right now, the system assumes only the seven modes of the major key, so we could generalize it by allowing arbitrary key ``signatures'' (such as WWWWWW). We have not confirmed whether \lstinline{music21} supports this directly, but it does provide correlation tools. Regardless, the idea would be to first identify the pitch classes in use to find the scale shape, as sketched below. From that, we can then find the keys/modes within the piece, extracting different (potentially overlapping) subsequences depending on the tonality at each moment. Once found, the ASATO analyzer should work as is, perhaps with a minor change to the chord representation to support and integrate these generalizations.
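A sketch of that first step, matching the pitch-class content of an excerpt against a candidate scale shape (the whole-tone shape here is just the running example; the file path and window are placeholders):

\begin{lstlisting}[language=Python]
from music21 import converter, scale

def pc_shape(pitches):
    # Interval pattern, in semitones, of the pitch classes in use,
    # e.g. (2, 2, 2, 2, 2, 2) for the whole-tone scale (WWWWWW).
    pcs = sorted({p.pitchClass for p in pitches})
    return tuple((pcs[(i + 1) % len(pcs)] - pcs[i]) % 12
                 for i in range(len(pcs)))

# Reference shape taken from music21's own whole-tone scale.
whole_tone_shape = pc_shape(scale.WholeToneScale('C').getPitches('C4', 'B4'))

excerpt = converter.parse('piece.musicxml').measures(1, 4)  # placeholder path
if pc_shape(excerpt.pitches) == whole_tone_shape:
    print('excerpt appears to use the whole-tone scale')
\end{lstlisting}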
Additional features could be added to improve flexibility across datasets. Chords could be generated from non-annotated pieces by looking for ``chordal'' notes within the melody that fall on the downbeat, building a signal for how likely a given note is to belong to a particular chord (a rough sketch follows below). Another feature would be to maintain a separate model per genre, so that we carry genre context through the Markov model; this increases dimensionality but keeps us from borrowing, say, a jazz chord progression in what was intended to be classical. The model currently assumes that all melodies and harmonies belong to a single genre, so we may take a harmony found in one genre and then predict the highest-probability next harmony even though it came from a piece of a different genre. Most genres stick to a subset of pitches and chords in relation to each other, so our approach may have some ``rough'' spots, since it works over the entire space of possible pitches and chords in a given state rather than a genre-specific partition of that space. Maintaining genre would thus make the piece smoother in context instead of mixing and matching across the dataset. However, the dataset we classified was homogeneous in genre; it would be hard to categorize the pieces by genre, but perhaps we can run a clustering algorithm in the future and then annotate the genres manually, or use an existing API that already defines such clusters.
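One way the non-annotated chord extraction could start (a sketch under our own simplifying assumption that downbeat notes are the most ``chordal''; real scoring would also weight duration, metric strength, and neighbouring measures):

\begin{lstlisting}[language=Python]
from collections import Counter
from music21 import chord, converter

def downbeat_chord_guesses(path):
    # For each measure, tally the pitch classes of notes starting on the
    # downbeat and propose a chord built from the most common ones.
    score = converter.parse(path)
    guesses = []
    for m in score.parts[0].getElementsByClass('Measure'):
        counts = Counter()
        for n in m.notes:
            if n.beat == 1.0:  # note begins on the downbeat
                counts.update(p.pitchClass for p in n.pitches)
        if counts:
            top = [pc for pc, _ in counts.most_common(3)]
            guesses.append((m.number, chord.Chord([60 + pc for pc in top])))
    return guesses
\end{lstlisting}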