Steady On Tim

Steady On Tim: a bytebeat journey

by Gasman / Hooy-Program

This is a walkthrough of the process behind creating Steady On Tim, It's Only A Budget Game, my bytebeat entry for Lovebyte 2022. Final version on

There are lots of ways to approach bytebeat. Some people like to experiment with functions and formulae by trial and error in the hopes of stumbling upon something that sounds interesting. Others will focus on the pure audio synthesis aspects, using those functions to shape the waveform. Personally, I prefer to stick to simple chiptune-like sounds - my interest is in building musical structures out of those mathematical foundations - piecing together those building blocks of melodies, harmonies and rhythms with a predetermined goal in mind. I'm definitely not saying it's the one true way to do bytebeat - just the one that fits my mindset.

1. The basics

Let's start with the simplest building block of all - a square wave.
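As a minimal sketch, here is that square wave wrapped in a plain JavaScript function so it can be run outside a bytebeat player. The wrapper name `square` is only for illustration; in the player itself you would type the bare expression `t&128`.

```javascript
// Bit 7 of the sample counter: 0 for 128 samples, then 128 for the next 128.
const square = t => t & 128;

// Samples on either side of each flip.
console.log([0, 127, 128, 255, 256].map(square)); // 0, 0, 128, 128, 0
```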


(Set the sample rate to 44kHz. We've got the quality, we might as well use it.) Here we're just taking bit 7 of the time variable, which will flip on and off every 128 samples, giving a value that alternates between 0 and 128 - in other words, a square wave of amplitude 128. (Or 64 if you want to be pedantic about the definition of 'amplitude' - but since we're working in the range 0 to 255, and adding our waves to a baseline of zero rather than going into negative numbers, it really does make sense to think of the amplitude as the total height of the wave.)

Want to change the frequency? Just divide or multiply t by a constant:
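A rough sketch of that scaling, assuming the multiplier is applied to t before the bit mask. The names `squareAt` and `halfPeriod` are hypothetical helpers for measuring the effect, not part of the tune.

```javascript
// Scaling t before masking changes how quickly bit 7 toggles.
const squareAt = (t, k) => (t * k) & 128;

// Count samples until the output first goes high for a given multiplier.
const halfPeriod = k => {
  let t = 0;
  while (squareAt(t, k) === 0) t++;
  return t;
};

console.log(halfPeriod(0.5), halfPeriod(1), halfPeriod(2)); // 256 128 64
```

Halving the half-period doubles the frequency.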


As you can hear, each doubling takes us up an octave. So since we're working with powers of two, let's make that explicit:


Now steps of less than an octave are easy too - an octave divides equally into 12 semitones, so we increase the exponent in steps of 1/12.
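To make the arithmetic concrete, here is a small sketch of the semitone-to-multiplier mapping. The function names are illustrative; in a bytebeat player the equivalent bare expression would be along the lines of `t*2**(n/12)&128`.

```javascript
// Frequency multiplier for a pitch n semitones above the reference note.
const semitone = n => 2 ** (n / 12);

// Square wave at that pitch.
const note = (t, n) => (t * semitone(n)) & 128;

console.log(semitone(12));           // 2: twelve semitones make one octave
console.log(semitone(7).toFixed(3)); // 1.498: close to a 3:2 perfect fifth
```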


I haven't made any effort to ascertain what these notes correspond to in real-world concert pitch terms, and that doesn't matter. Our tuning can be whatever we want it to be, and we're free to shift things up or down until they sound right. We can arbitrarily define C as the note we get for an exponent of 0, meaning that C sharp corresponds to 1/12, D is 2/12 and so on.

So now we have the makings of a musical scale, the next step is to play a melody in it. At this point, we'll want to settle on a tempo, and it'll be useful to introduce a new variable T to serve as our 'beat counter' - a scaled version of our global time counter t that counts up by one each beat.
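A sketch of how the beat counter can drive an ascending scale, assuming a beat length of 10000 samples (the figure used later in the text). The helper `stepAt` is hypothetical and only exposes the note index for inspection.

```javascript
// One semitone step per beat, wrapping after 8 beats.
const scale = t => {
  const T = t / 10000; // beat counter (a float)
  return (t * 2 ** ((T % 8 | 0) / 12)) & 128;
};

// The note index on its own, so the stepping is easy to see.
const stepAt = t => (t / 10000) % 8 | 0;

console.log([0, 9999, 10000, 79999, 80000].map(stepAt)); // 0, 0, 1, 7, 0
```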


Here (T%8|0) is repeatedly counting up in integers from 0 to 7, giving us an ascending scale - the |0 is a nice compact way to truncate T%8 (which is still a float at this point) to an integer. We'll see this pattern coming up a lot. Actually, in this case we can exploit the fact that 8 is a power of two, and replace %8 with &7, which gives us the integer casting for free...
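A quick check of that equivalence for non-negative beat values (the function names are illustrative only). Bitwise operators convert their operands to 32-bit integers first, which is where the free truncation comes from.

```javascript
const viaModulo = T => T % 8 | 0; // remainder first, then truncate
const viaMask = T => T & 7;       // truncate to an integer, then keep the low 3 bits

const beats = [0, 0.5, 7.99, 8, 13.2, 1000.75];
console.log(beats.map(viaModulo)); // 0, 0, 7, 0, 5, 0
console.log(beats.map(viaMask));   // same values
```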


And while we're on the subject of crunching bytes, we can use 1e4 as a shorthand for 10000:


But let's not get distracted - back to our melody. Rather than just looping over 0...7 as our note values, let's use that as an index into an array of proper note values - again, counted in semitones - that make up a real melody:
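A sketch of the array lookup, assuming the note values implied by the packed constant 0x20447542 that the text goes on to discuss (read lowest hex digit first). The helper `noteAt` is hypothetical.

```javascript
// Semitone offsets for the eight-step melody.
const notes = [2, 4, 5, 7, 4, 4, 0, 2];

// Which note is sounding on beat T.
const noteAt = T => notes[T & 7];

// The full voice: pick a note per beat, play it as a square wave.
const melody = t => {
  const T = t / 1e4;
  return (t * 2 ** (noteAt(T) / 12)) & 128;
};
```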


Now, I don't know about you, but all those commas trigger my "eww, that's a lot of bytes for not much information" reflex. A trick I like to use at this point is to pack those values into one big integer, and use some bit shifting to do the equivalent of an array lookup and pull out the appropriate value:
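A minimal sketch of the packing trick. The array is the one implied by the constant 0x20447542 discussed in the text; `packed` and `lookup` are illustrative names.

```javascript
const notes = [2, 4, 5, 7, 4, 4, 0, 2];

// Pack one note per hex digit, first note in the lowest 4 bits.
const packed = notes.reduce((acc, n, i) => acc + n * 16 ** i, 0);
console.log(packed.toString(16)); // 20447542

// Shift the wanted digit down to the bottom, then mask off everything else.
const lookup = T => packed >> (T & 7) * 4 & 15;
```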


Let's break this down: 0x20447542 is our note sequence, encoded one hex digit per note, lowest bits first (so in reverse order from our original array). T&7 is our loop counter. Multiply that by 4 and that's how many bits we need to shift our magic number right to get the appropriate digit into the lowest bits. Then apply &15 to extract that digit, and we're done. Except all that bit shuffling has ended up two bytes longer than it was before. Not to worry, let's convert our magic number to decimal so we don't have the overhead of the 0x...


Still one byte over? Bah. I'm not giving up yet, though... all our note values are under 8, so there's no real reason (other than familiarity) why we had to encode them as hex digits. Let's crank things up a notch and encode them in base 8 (which means shifting 3 bits at a time):
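The byte counting can be checked with a short sketch. The four strings below are candidate spellings of each variant and are an assumption about how the expression was written; the lengths they give do match the differences described in the text.

```javascript
const notes = [2, 4, 5, 7, 4, 4, 0, 2];

// Pack the notes using a given number of bits per note, first note lowest.
const pack = bits => notes.reduce((acc, n, i) => acc + n * 2 ** (bits * i), 0);
const hexPacked = pack(4); // 541357378, i.e. 0x20447542
const octPacked = pack(3); // 4345698

// Base-8 lookup: shift 3 bits per note and keep the low 3 bits.
const fromOct = T => octPacked >> (T & 7) * 3 & 7;

// Source lengths of each candidate spelling.
const lengths = [
  "[2,4,5,7,4,4,0,2][T&7]",
  "(0x20447542>>(T&7)*4&15)",
  "(541357378>>(T&7)*4&15)",
  "(4345698>>(T&7)*3&7)",
].map(s => s.length);
console.log(lengths); // 22, 24, 23, 20
```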


That's more like it. But we're getting distracted again, since that isn't even the real melody we're going for. Time to pause for a moment and set our eyes on the real goal, which is to steal the descending riff from Tim Follin's LED Storm soundtrack and develop that until we run out of bytes.

2. Melody

In lightly simplified terms, the LED Storm riff is a sequence of C - B - G - E - D which then repeats an octave further down. Being a repeating sequence of five notes has the neat side effect of seeming more complicated than it really is (always a good trait for impressing people with size coding...) because it's out of step with the plain 4/4 time signature - the C isn't going to land on the downbeat each time, which disguises the repeat. Even better, after that initial run completes - 32 notes in all - the following phrases in the music are exactly the same, just shifted down a whole tone.

Let's go back to one of our earlier iterations, slot in that five note sequence (but count our semitones downward from octave 3, because that feels more natural here), and increase the tempo a bit:
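A sketch of this step. The semitone values follow from the riff C - B - G - E - D counted downward from C; the beat length `BEAT` is a hypothetical value, since the exact tempo is not stated here.

```javascript
const riff = [0, 1, 5, 8, 10]; // C, B, G, E, D as semitones below C
const BEAT = 8000;             // hypothetical samples per beat

// Which riff note is sounding on beat T.
const riffNote = T => riff[T % 5 | 0];

// Square wave starting from octave 3, stepping down through the riff.
const lead = t => {
  const T = t / BEAT;
  return (t * 2 ** (3 - riffNote(T) / 12)) & 128;
};
```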


Add an extra modifier to drop down an octave every 5 beats:


We want this to loop after 32 beats, so let's introduce a new 'local' timer variable b that counts from 0 to 31:
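Looking only at the exponent (the pitch in octaves), the octave drop and the 32-beat loop might be sketched like this. The function name `pitch` is illustrative, and the exact form of the original expression is an assumption.

```javascript
const riff = [0, 1, 5, 8, 10];

// Pitch exponent for beat T, starting from octave 3.
const pitch = T => {
  const b = T & 31; // position within the 32-beat phrase, already an integer
  return 3 - riff[b % 5] / 12 - (b / 5 | 0); // one octave lower every 5 beats
};

console.log(pitch(0), pitch(5), pitch(10), pitch(32)); // 3 2 1 3
```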


And change the starting octave to 4 so that it's not dropping into sub-bass territory quite so much:


Now let's add in the shift down that happens on each phrase. The phrase number is given by T divided by 32, or T>>5, and each time we want to descend by a whole tone, or two semitones, which is 1/6 of an octave.


Instead of descending forever, let's just take the bottom two bits of our phrase number with &3 so that it loops after four phrases:


On second thoughts, I don't really like that jump back to the start after four phrases - it feels too abrupt. So instead of descending a tone each time - an offset of 0, -1, -2, -3 - let's go back up, so it's 0, -1, -2, -1. From a bit of fiddling around (the Developer Tools console in the browser is your friend...) the shortest formula I can come up with that takes 0, 1, 2, 3 as input and returns 0, -1, -2, -1 is (1-N)%2-1. So let's plug that in:
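The formula is easy to check in isolation. It relies on JavaScript's `%` keeping the sign of the left-hand operand, so `-1%2` is -1 rather than 1. The names `offset` and `transpose` are illustrative.

```javascript
// Maps phrase number 0, 1, 2, 3 to a whole-tone offset of 0, -1, -2, -1.
const offset = N => (1 - N) % 2 - 1;
console.log([0, 1, 2, 3].map(offset)); // 0, -1, -2, -1

// As an octave fraction for beat T: phrase number is T>>5, a whole tone is 1/6.
const transpose = T => offset(T >> 5 & 3) / 6;
```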


Feels a bit wordy for what it is, but let's move on. One thing that will make this really shine is an echo effect - playing another copy of the melody at a lower volume on a delay. To do that, we'll move our whole formula into a function so that we can pass T as a parameter. These days Javascript has a nice compact notation for function definitions: sum = (a,b)=>a+b - so let's define a function M (for Melody) and immediately call it:


(We're being a bit scrappy by using T as both a local variable in the function and a global variable outside it, but hey, that's demoscene code for you.) And now let's add a second copy of the melody, timeshifted by 3 beats (by all means experiment with other offsets, but I like the sound of this best) and with the result divided by two to halve the volume:
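A standalone sketch of the echo, assuming the melody pieces described so far. Unlike the scrappy global-variable version, `t` is passed explicitly here so the code runs outside a player; `BEAT` is a hypothetical tempo value.

```javascript
const riff = [0, 1, 5, 8, 10];
const BEAT = 8000; // hypothetical samples per beat

// Lead voice: t drives the oscillator phase, T selects the note.
const M = (t, T) => {
  const b = T & 31;
  const shift = ((1 - (T >> 5 & 3)) % 2 - 1) / 6;
  return (t * 2 ** (4 - riff[b % 5] / 12 - (b / 5 | 0) + shift)) & 128;
};

// Dry signal plus a copy delayed by 3 beats at half volume.
const withEcho = t => {
  const T = t / BEAT;
  return M(t, T) + M(t, T - 3) / 2;
};
```

The dry voice contributes 0 or 128 and the echo 0 or 64, so the sum peaks at 192 and stays inside the 0 to 255 range.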


Sounding pretty lovely now. We definitely need to keep that melody contained in a smaller pitch range, though. Let's do what the original tune does, and have it fall then rise. That's pretty easy, since we've got the variable b to keep track of how far through the phrase we are - we just need to tweak it so that rather than going from 0 to 31, it ping-pongs from 16 to 0 and back to 16:
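One possible way to get that ping-pong shape is sketched below. This is an assumption for illustration; the expression actually used in the tune may well be spelled differently (and shorter).

```javascript
// Runs 16, 15, ... 1, 0, 1, ... 15 and then repeats, over each 32-beat phrase.
const pingPong = T => Math.abs(16 - (T & 31));

console.log([0, 8, 16, 24, 31, 32].map(pingPong)); // 16, 8, 0, 8, 15, 16
```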


OK, so we're upside down now, because I've overcompensated for the whole 'descending' thing... easily done. One stint of sign-swapping and list-reversing later, we arrive at:


A couple of size optimisations jump out now - we can do the hex packing trick from before, aided by the fact that the last element of the list (and thus the most significant digit) is 0, and is thus going to give us a smaller magic number:


0x158a converts to 5514 - there's something very satisfying about converting a chunk of note data into a single number like that, isn't there?
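The conversion can be reproduced with a short sketch, assuming the reversed note list is 10, 8, 5, 1, 0 (which is what the digits of 0x158a spell out, lowest first). The name `noteFor` is illustrative.

```javascript
const riff = [10, 8, 5, 1, 0];

// One hex digit per note, first note in the lowest 4 bits.
const packed = riff.reduce((acc, n, i) => acc + n * 16 ** i, 0);
console.log(packed, packed.toString(16)); // 5514 158a

// The trailing 0 costs nothing: shifting past the top digit just yields 0.
const noteFor = b => packed >> b % 5 * 4 & 15;
```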


Now I notice that the |0 in (b%5|0) has just been wasting space ever since we introduced the b variable, since it's already an integer...


We can also move the assignment of the (global) variable T into the place where it's first used:


3. Bassline

Time to move on to something new. The bassline we're aiming for has a rhythm that goes "doot doot, diddle oot doot" where the "doot"s are low C and the "diddle" is a high Bb, C. Or, expressed as semitone increments and dots for rests, the repeating pattern is:

   0 . 0 . . . 10 12

We haven't had to think about rhythm yet, because our melody was "always on", so we'll tackle that next. Starting again from our rising square wave scale, but keeping T as our beat counter:


We want to turn notes on and off at the appropriate beats, which we can do by multiplying the wave by 0 or 1:


And, of course, that packs nicely into binary:


We've turned T&7 back into T%8 because the % operator has higher precedence, ensuring that it happens first, then the >>, then the &. (To keep things moving along I've not gone into operator precedence much here, but when you try this for yourself, be prepared to do a lot of adding and removing of brackets.) Our binary constant becomes 197:
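A sketch of the rhythm mask, built from the on/off pattern given above (beats 0, 2, 6 and 7 sounding). The names `gateArray`, `mask` and `gate` are illustrative.

```javascript
const gateArray = [1, 0, 1, 0, 0, 0, 1, 1];

// Beat 0 goes in the lowest bit.
const mask = gateArray.reduce((acc, on, i) => acc | on << i, 0);
console.log(mask, mask.toString(2)); // 197 11000101

// % binds tighter than >>, which binds tighter than &, so no brackets needed.
// The shift count is truncated to an integer, so a float T is fine.
const gate = T => 197 >> T % 8 & 1;
```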


So that's the rhythm sorted - now, how to get it playing at the right pitches? We need to drop in a proper formula in place of the (T&7)/12. Since all the semitone values we're aiming for (0, 10 and 12) are even, we can simplify things by changing the /12 to /6, so that we're counting in whole tones instead:


Now we're aiming for the pattern: 0 . 0 . . . 5 6. We don't care what value the formula comes up with when the note is turned off, so we can conveniently make everything in the first half (i.e. whenever bit 2 of the beat counter is 0) into zero by applying a logical AND:


And for everything else, we can just subtract 1 from the beat counter, so that beats 6 and 7 give the results 5 and 6.
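Putting those two steps together, one way to write the pitch selector is sketched below. Whether the tune spells it exactly like this is an assumption; the point is that `&&` returns 0 for the first half of the bar and the right-hand value otherwise.

```javascript
// Whole-tone offset for beat T: 0 while bit 2 is clear, otherwise (beat - 1).
const wholeTones = T => T & 4 && (T & 7) - 1;

console.log([0, 1, 2, 3, 4, 5, 6, 7].map(wholeTones)); // 0, 0, 0, 0, 3, 4, 5, 6
```

Only beats 0, 2, 6 and 7 are audible, and those give 0, 0, 5, 6 as required.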


Now, sawtooth waves tend to work nicely for basslines, and there's an easy way of achieving that - change the &128 to &127, so that we're preserving all the lower bits of the time counter and seeing the value ramp up:


Also, bearing in mind that we're shortly going to mix this back in with the melody line - which is already well above 50% of our volume limit - we should reduce the volume so that we don't end up overflowing (resulting in a whole lot of distortion). We could do that by dividing the result by a constant, but a shorter option is to change the &127 to &63, dropping off the most significant bit:


Those last two steps have had the side effect of moving us up a couple of octaves, so let's subtract a few to get us back into bass territory.
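The octave shift can be seen by measuring how often each mask repeats. This is a sketch with a hypothetical brute-force `period` helper, not part of the tune.

```javascript
// Smallest shift p at which the wave repeats exactly over the tested range.
const period = wave => {
  for (let p = 1; p <= 512; p++) {
    let same = true;
    for (let t = 0; t < 1024 && same; t++) same = wave(t) === wave(t + p);
    if (same) return p;
  }
  return -1;
};

const square = t => t & 128;  // 0 or 128
const saw = t => t & 127;     // ramps 0..127
const quietSaw = t => t & 63; // ramps 0..63

console.log(period(square), period(saw), period(quietSaw)); // 256 128 64
```

Each halving of the period is one octave up, so the two mask changes together raise the pitch by two octaves.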


Now we're ready to mix that back in with our lead instrument:


It's getting a bit lost in the mix, so let's reduce the volume on the lead instrument and echo:


To add some progression to the music, we'll let the lead melody run through once on its own - 4 phrases of 32 notes each, so 128 notes total - before bringing in the bass. We can do that by adding a T>128 condition to the bassline expression:


Or equivalently, for one byte less, T>>7:
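A small sketch comparing the two conditions (the function names are illustrative). They differ only at exactly beat 128, and `T>>7` keeps growing past 1, so it is best treated as a truthy/falsy condition rather than a 0-or-1 multiplier.

```javascript
const strict = T => T > 128; // boolean
const shifted = T => T >> 7; // 0 below beat 128, non-zero from 128 onwards

console.log(shifted(127.9), shifted(128), shifted(200), shifted(256)); // 0 1 1 2
console.log("T>128".length, "T>>7".length); // 5 4
```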


4. Drums

Or rather, "drum", because we're going to have to be selective about what we put our bytes towards, and I think a snare drum to fill in that gap in the bassline will be enough to make it feel complete. I think this is actually the first time I've done drums in bytebeat, but from my AY chiptunes, I know that a snare drum is high frequency noise going down to low frequency noise, where frequency means "how often you pick a new random number". That definition is problematic for bytebeat, because everything is computed anew on each sample - we don't get to hold on to previously-chosen random numbers. Instead, we're going to need a pseudo-random function that predictably returns the same output for the same input. I figure we can steal a trick from the shader coders, and scale up a sine function by some huge factor and just take the fractional part:


Yep, that's definitely noise. And if we do an integer division on t before taking the sine, we can get it to return the same result for a certain number of consecutive samples, giving us our lower-frequency noise:
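A sketch of both ideas, assuming a scale factor of 1e5 (a hypothetical choice; any large constant behaves similarly). It uses `Math.sin` and `Math.floor` so it runs standalone, and the helper names are illustrative.

```javascript
// Fractional part, always in the range 0 <= x < 1.
const frac = x => x - Math.floor(x);

// Deterministic pseudo-random value: same input, same output.
const noise = x => frac(Math.sin(x) * 1e5);

// Lower-frequency noise: hold each value for `hold` consecutive samples.
const heldNoise = (t, hold) => noise(Math.floor(t / hold));
```

With `hold` set to 8, samples 0 to 7 all share one value and sample 8 picks up the next one, so a larger `hold` gives a lower-pitched hiss.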


So, we just need to find a way to make that parameter increase over time. We'll take our trusty beat counter variable T and introduce a new variable D that resets to 0 at the start of every snare drum hit - that's every 8 beats. Scale that to something in the range 5...40, and you ought to get something like this:


Except that doesn't seem to be right - the descending frequency gets slower on each repeat. I guess using t as the random seed while also modulating it by a variable derived from t makes the randomness cancel out, or something? Through some very unscientific hacking around, I stumble upon this version that avoids that problem:


We also want the volume to fade out over time, going from 1 to 0 as D increments from 0 to 8, which means multiplying by (1-D/8):


Sounds more like a gunshot than a snare, but let's plough on regardless. Returning to our last version of the bassline on its own:


we can add in our snare drum expression (with its volume divided by 8 to avoid overflowing):


We want the snare to be on the offbeat, though, so let's offset our variable D by 4 beats:


And now let's fold that snare drum expression back in to the main tune, taking care to add it inside the T>>7 clause so that it comes in at the same time as the bass.


5. Interlude: a chance discovery

Playing around with changing parameters and operations - either in an attempt to shave bytes off, or just experimentation - can often take you to unexpected places. I found that if you change the b%5 that selects a note for the lead instrument into a b%4, you can get a whole new melody out of it - since the octave shift is still happening every 5 notes, it ends up out of step with the note selection, and so the effect is more like a musician improvising around the scale we've set up rather than just playing it straight.


This is too good an opportunity to pass up, so let's make that into a conditional expression based on T>>8, so that we switch from %5 to %4 after 256 beats. Combined with the T>>7 from earlier, that means we get four phrases of just the lead melody, then the bassline comes in for four more phrases, after which we switch to the second melody.


That's 9 bytes well spent, I think.

6. Arpeggios

We're getting perilously close to the 256 byte limit now, but there's still one element lacking from the true Follin experience - those chippy fast arpeggios. We're aiming for a C major seventh chord, which will be represented by the notes B - E - G. (In the great chiptune music theory tradition of cutting out everything non-essential, we can leave out the C and rely on our brains to fill it in.) Counting in semitones, that's -1, 4, 7. This will get transposed down then up in the subsequent phrases, just as the lead melody line does.

We'll use the same building blocks as for the melody - the only difference is that when we're picking notes from the sequence, we're cycling through them more rapidly than once per beat. With some experimenting, we find that T*4 (i.e. advancing to the next note four times per beat) gives us the bloppy sound we're looking for:
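A sketch of the faster note selection, using the chord values -1, 4, 7 given above. The names `arpNote` and `arp` are illustrative, and `t` is passed explicitly so the code runs standalone.

```javascript
const chord = [-1, 4, 7]; // B, E, G as semitones relative to C

// Advance through the chord four times per beat.
const arpNote = T => chord[T * 4 % 3 | 0];
console.log([0, 0.25, 0.5, 0.75, 1].map(arpNote)); // -1, 4, 7, -1, 4

// Square-wave voice at the selected pitch.
const arp = (t, T) => (t * 2 ** (arpNote(T) / 12)) & 128;
```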


Rather than playing continuously, we want these to be short 'stabs', so we'll do the same thing as we did for the drums to make the volume fade out - (4-T%4)/4 to count down from 1 to 0 as T%4 increases from 0 to 4:
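The envelope on its own looks like this (the name `stab` is illustrative). Multiplying a voice by it scales the volume from full at the start of each 4-beat window down towards silence at the end.

```javascript
// Volume envelope: 1 at the start of every 4-beat window, falling linearly.
const stab = T => (4 - T % 4) / 4;

console.log([0, 1, 2, 3, 4].map(stab)); // 1, 0.75, 0.5, 0.25, 1
```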


To get it transposing up and down for each phrase, we'll steal the formula that we used for the melody - ((1-(T>>5&3))%2-1)/6:


Now let's mix that into the rest of the tune. We'll insert it as part of the M function, so that the echo is applied to it as well:


Having two copies of that tone-shifting expression is a bit wasteful, so we'll assign the result to a new variable K to be re-used the second time round:


That rhythm definitely needs livening up. Ideally I'd want to do what we did for the bassline, and define a nice syncopated rhythm (One-two-three, One-two-three, One-two...) as a binary sequence, but I can tell we're not going to have space to fit that in. Happily we can get pretty close just by changing the timing to trigger every 12th beat rather than every 4th - the echo effect helpfully contributes some 'ghost' notes to make it sound more interesting.


It makes sense for these arpeggios to become another element that we introduce later on in the song to add a bit of progression, with another of our T>>8 tricks. The sequence is now going to be: lead melody, then bring in the bassline on beat 128 (T>>7), then the arpeggios on beat 256 (T>>8), and finally switching to the second melody on beat 512 (T>>9).


7. Final crunching

We're now at 278 bytes, so we need to claw back 22 bytes somehow. After lots of searching, I failed to find any further optimisations that didn't affect the output (or, more truthfully: I found a bunch of them, but I've already folded them into the narrative above to not make it more long-winded than necessary). One possibility, saving two bytes, is to change the note selector for the arpeggio from T*4%3|0 to T*4&3 - with the side effect of making it sound more 'choppy', as it's periodically fetching a non-existent 4th element from the array:
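Why the missing element produces silence rather than an error can be shown with a short sketch (names illustrative, `t` passed explicitly). Reading index 3 of a three-element array gives `undefined`, which turns the whole pitch calculation into NaN, and NaN converts to 0 under a bitwise operator.

```javascript
const chord = [-1, 4, 7];

// Note selector using &3 instead of %3|0: index 3 falls off the end of the array.
const arp = (t, T) => (t * 2 ** (chord[T * 4 & 3] / 12)) & 128;

console.log(chord[3]);        // undefined
console.log(arp(5000, 0.75)); // 0: the fourth step of each cycle is a rest
console.log("T*4%3|0".length - "T*4&3".length); // 2 bytes saved
```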


This actually doesn't sound terrible, but I slightly prefer the original sound. I'll keep that option in reserve if those two bytes are what it takes to get over the wire... under the wire? Whatever.

No, we're going to have to be more ruthless about what to cut. There are a few elements that could be considered non-essential, like the arpeggios and the second melody, but the final product is riding quite heavily on them and it would be a massive disappointment to drop them. On the other hand, if there's one thing in here that isn't returning enough value for the bytes invested, it's got to be the snare drum. Removing it entirely makes it sound awkwardly empty, but as long as that gap is filled by some sort of percussive sound, it doesn't matter too much whether that's a snare drum or a moose's fart. All that effort on doing a smooth transition from high frequency to low frequency noise is a bit wasted, so let's replace our fiddly sine-based random number generator with a bog-standard random()*255:


Eliminate the now-useless variable D and cancel out a couple of factors:


And we've hit the magic 255 bytes. Ship it!