submitted 4 months ago* (last edited 4 months ago) by ch00f@lemmy.world to c/programming@programming.dev

I originally told the story over on the other site, but I thought I’d share it here. With a bonus!

I was working on a hardware accessory for the OG iPad. The accessory connected to the iPad over USB and provided MIDI in/out and audio in/out appropriate for a musician trying to lay down some tracks in Garage Band.

It was a winner of a product because at its core, it was based on a USB product we had already been making for PCs for almost a decade. All we needed was a little microcontroller to put the iPad into USB host mode (this was in the 30-pin connector days), and then allow it to connect to what was basically a finished product.

This product was so old in fact that nobody knew how to compile the source code. When it came time to get it working, someone had to edit the binaries to change the USB descriptors to reflect the new product name and that it drew <10mA from the iPad's USB port (the original device was port-powered, but the iPad would get angry if you requested more than 10mA even if you were self-powered). This was especially silly because the original product had a 4-character name, but the new product had a 7-character name. We couldn't make room for the extra bytes, so we had to truncate the name to fit it into the binary without breaking anything.

Anyway, product ships and we notice a problem. Every once in a while, a MIDI message is missed. For those of you not familiar, MIDI is used to transmit musical notes that can be later turned into audio by whatever processor/voice you want. A typical message contains the note (A, B, F-sharp, etc), a velocity (how hard you hit the key), and whether it's a key on or key off. So pressing and releasing a piano key generate two separate messages.
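For anyone who wants to see what such a message looks like, here is a minimal sketch of the raw bytes, assuming standard MIDI 1.0 channel-voice messages (status 0x90 for note on, 0x80 for note off, then note number and velocity). The helper names are made up for illustration, and the USB framing around the bytes is left out.

```python
# Standard MIDI 1.0 status bytes; the low nibble carries the channel (0-15).
NOTE_OFF = 0x80
NOTE_ON = 0x90


def note_on(note, velocity, channel=0):
    """Build a 3-byte note-on message (hypothetical helper)."""
    return bytes([NOTE_ON | (channel & 0x0F), note & 0x7F, velocity & 0x7F])


def note_off(note, velocity=0, channel=0):
    """Build a 3-byte note-off message (hypothetical helper)."""
    return bytes([NOTE_OFF | (channel & 0x0F), note & 0x7F, velocity & 0x7F])


def press_and_release(note, velocity):
    """One key press followed by its release: two separate messages."""
    return [note_on(note, velocity), note_off(note)]


if __name__ == "__main__":
    # Note number 60 is middle C.
    for message in press_and_release(60, 100):
        print(message.hex(" "))
```

Losing the second of those two messages is exactly the failure described in the rest of the post.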

Missing the occasional note message wouldn't typically be a big deal except for instrument voices with infinite sustain like a pipe organ. If you had the pipe organ voice selected when using our device, it's possible that it would receive a key on, but not a key off. This would result in the iPad assuming that you were holding the key down indefinitely.

There isn't an official spec for what to do if you receive another key-on of the same note without a key-off in between, but Apple handled this in the worst way possible. The iPad would only consider the key released if the number of key-ons and key-offs matched. So the only way to release this pipe organ key was to hope for it to skip a subsequent key-on message for the same key and then finally receive the key-off. The odds of this happening are approximately 0%, so most users had to resort to force quitting the app.
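A toy model of that counting behavior, just to show why a single lost key-off is permanent. This is a guess at the logic based on the observed behavior, not Apple's actual code, and the class name is invented.

```python
from collections import Counter


class CountingSynth:
    """Hypothetical model: a note sounds while key-ons outnumber key-offs."""

    def __init__(self):
        self.held = Counter()

    def key_on(self, note):
        self.held[note] += 1

    def key_off(self, note):
        # Ignore a key-off for a note that is not held.
        if self.held[note] > 0:
            self.held[note] -= 1

    def is_sounding(self, note):
        return self.held[note] > 0


if __name__ == "__main__":
    synth = CountingSynth()
    synth.key_on(60)   # key pressed
    # ... the matching key-off is dropped by the accessory ...
    synth.key_on(60)   # user presses the same key again
    synth.key_off(60)  # and releases it
    print(synth.is_sounding(60))  # the counts never match, so the note is stuck
```

Under this model, pressing the key again only raises the count, which is why users could not unstick the note themselves.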

Rumors flooded the customer message boards about what could cause this behavior. Maybe it was the new iOS update? Maybe you had to close all your other apps? There were a ton of harebrained theories floating around, but nobody had any definitive explanation.

Well I was new to the company and fresh out of college, so I was tasked with figuring this one out.

First step was finding a way to generate the bug. I wrote a python script that would hammer scales into our product and just listened for a key to get stuck. I can still recall the cacophony of what amounted to an elephant on cocaine slamming on a keyboard for hours on end.

Eventually, I could reproduce the bug about every 10 minutes. One thing I noticed is that it only happened if multiple keys were pressed simultaneously. Pressing one key at a time would never produce the issue.
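The original script is long gone, so this is only a rough sketch of the idea: walk up the keyboard and press several keys at once. The `send` callable is a placeholder for whatever actually writes to the MIDI port, and the note range, velocity, and chord shape are arbitrary choices, not what was used back then.

```python
# Semitone offsets of a major scale, used to pick notes that sound together.
MAJOR_SCALE = [0, 2, 4, 5, 7, 9, 11]


def chord_events(root, size=3, velocity=100):
    """Note-ons for `size` notes pressed together, then their note-offs."""
    notes = [root + MAJOR_SCALE[i] for i in range(0, size * 2, 2)]
    ons = [bytes([0x90, n, velocity]) for n in notes]
    offs = [bytes([0x80, n, 0]) for n in notes]
    return ons + offs


def hammer(send, rounds, low=48, high=72):
    """Run chords up the keyboard `rounds` times, passing each message to `send`."""
    for _ in range(rounds):
        for root in range(low, high):
            for message in chord_events(root):
                send(message)


if __name__ == "__main__":
    sent = []
    hammer(sent.append, rounds=1)  # collect messages instead of sending them
    print(len(sent), "messages generated")
```

Sending the note-ons back to back is the important part, since single key presses never triggered the bug.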

Using a fancy cable that is only available to Apple hardware developers, I was able to interrogate the USB traffic going between our product and the iPad. After a loooot of hunting (the USB debugger could only sample a small portion, so I had to hit the trigger right when I heard the stuck note), I was able to show that the offending note-off event was never making it to the iPad. So Apple was not to blame; our firmware was randomly not passing MIDI messages along.

Next step was getting the source to compile. I don't remember a lot of the details, but it depended on "hex3bin" which I assume was some neckbeard's version of hex2bin that was "better" for some reason. I also ended up needing to find a Perl script that was buried deep in some university website. I assume that these tools were widely available when the firmware was written 7 years prior, but they took some digging. I still don't know anything about Perl, but I got it to run.

With firmware compiling, I was able to insert instructions to blink certain LEDs (the device had a few debug LEDs inside that weren't visible to the user) at certain points in the firmware. There was no live debugger available for the simple 8-bit processor on this thing, so that's all I had.

What it came down to was a timing issue. The processor needed to handle audio traffic as well as MIDI traffic. It would pause whatever it was doing while handling the audio packets. The MIDI traffic was buffered, so if a key-on or key-off came in while the audio was being handled, it would be addressed immediately after the audio was done.

But it was only single buffered. So if a second MIDI message came in while audio was being handled, the second note would overwrite the first, and that first note would be forever lost. There is a limit to how fast MIDI notes can come in over USB, and the minimum gap between two messages was just barely shorter than the time it took to process the audio. So if the first note came in just after the processor cut to handling audio, the next note could potentially come in just before the processor cut back.
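Here is a small simulation of that race, as a sketch rather than the real firmware logic. The 90 us and 20 us audio times are the figures quoted further down in the post. The 85 us gap between messages is an assumed number chosen only to be "just barely shorter" than 90 us, and the function name is made up.

```python
from collections import deque


def lost_messages(arrival_times_us, audio_duration_us, buffer_slots=1):
    """Count MIDI messages lost while one audio packet is handled.

    Audio handling is assumed to run from t=0 to t=audio_duration_us.
    Messages arriving in that window wait in a buffer with `buffer_slots`
    entries; when it is full, a new message pushes out the oldest one.
    """
    pending = [t for t in arrival_times_us if 0 <= t < audio_duration_us]
    buffer = deque(maxlen=buffer_slots)
    for t in pending:
        buffer.append(t)  # with maxlen=1 this overwrites the earlier message
    return len(pending) - len(buffer)


if __name__ == "__main__":
    arrivals = [1, 86]  # assumed: two messages 85 us apart
    print(lost_messages(arrivals, audio_duration_us=90))  # both land in the window
    print(lost_messages(arrivals, audio_duration_us=20))  # only one lands in the window
    print(lost_messages(arrivals, audio_duration_us=90, buffer_slots=2))
```

Either a shorter audio window or a deeper buffer avoids the loss in this model. The fix described below took the first route.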

Now for the solution. Knowing very little about USB audio processing, but having cut my teeth in college on 8-bit 8051 processors, I knew what kind of functions tended to be slow. I did a Ctrl+F for "%" and found a 16-bit modulo right in the audio processing code.

This 16-bit modulo was just a final check that the correct number of bytes or bits were being sent (expecting remainder zero), so the denominator was going to be the same every time. The way it was written, the compiler assumed that the denominator could be different every time, so in the background it included an entire function for handling 16-bit modulos on an 8-bit processor.

I googled "optimize modulo," and quickly learned that given a fixed denominator, any 16-bit modulo can be rewritten as three 8-bit modulos.

I tried implementing this single-line change, and the audio processor quickly dropped from 90us per packet to like 20us per packet. This 100% fixed the bug.

Unfortunately, there was no way to field-upgrade the firmware, so that was still a headache for customer service.

As to why this bug never showed up in the preceding 7 years that the USB version of the product was being sold, it was likely because most users only used the device as an audio recorder or MIDI recorder. With only MIDI enabled, no audio is processed, and the bug wouldn't happen. The iPad, however, enabled every feature all the time. So the bug was always there. It's just that nobody noticed it. Edit: also, many MIDI apps, unlike Apple's, don't require matching key-on/key-off events. So if a key gets stuck, pressing it again will unstick it.

So three months of listening to Satan banging his fists on a pipe organ led to a single line change to fix a seven-year-old bug.

TL;DR: 16-bit modulo on an 8-bit processor is slow and caused packets to get dropped.

The bonus is at 4:40 in this video https://youtu.be/DBfojDxpZLY?si=oCUlFY0YrruiUeQq

top 27 comments
[-] CosmicTurtle0@lemmy.dbzer0.com 95 points 4 months ago

OP: not only was this a great solve, you wrote this very well.

If you haven't already, I'd do an RCA for your company and send it to your manager and manager's manager.

And keep a copy for yourself.

There aren't many engineers that can code and write well.

[-] TrickDacy@lemmy.world 29 points 4 months ago

Curious -- what's an RCA in this context?

[-] thenextguy@lemmy.world 28 points 4 months ago

Root Cause Analysis

[-] Randomocity@sh.itjust.works 18 points 4 months ago

Root cause analysis. What happened, why, how it was fixed.

[-] TrickDacy@lemmy.world 4 points 4 months ago

Got it. I was looking at lists of things on wikipedia that it could mean and I did not spot that one. Thanks for the response!

[-] ch00f@lemmy.world 15 points 4 months ago

Thank you! But this was 12 years ago lol. Think they’ve moved on.

[-] CosmicTurtle0@lemmy.dbzer0.com 14 points 4 months ago

Revert the code and claim credit for it again. 😜

[-] Technus@lemmy.zip 38 points 4 months ago

This would be a lot more readable with some paragraph breaks.

[-] ch00f@lemmy.world 29 points 4 months ago

Whoops. Formatting got lost in the transfer. Fixed now.

[-] Perhyte@lemmy.world 15 points 4 months ago* (last edited 4 months ago)

[EDIT: removed now that the original is fixed]

[-] Potatos_are_not_friends@lemmy.world 36 points 4 months ago

Someone post this to the Best of Lemmy community. This is gold.

[-] litchralee@sh.itjust.works 27 points 4 months ago* (last edited 4 months ago)

There were a ton of harebrained theories floating around, but nobody had any definitive explanation.

Well I was new to the company and fresh out of college, so I was tasked with figuring this one out.

This checks out lol

Knowing very little about USB audio processing, but having cut my teeth in college on 8-bit 8051 processors, I knew what kind of functions tended to be slow.

I often wonder if this deep level understanding of embedded software/firmware design is still the norm in university instruction. My suspicion has been that focus moved to making use of ever-increasing SoC performance and capabilities, in the pursuit of making it Just Work(tm) but also proving Wirth's Law in the process via badly optimized code.

This was an excellent read, btw.

[-] TrickDacy@lemmy.world 22 points 4 months ago

That must have been incredibly satisfying when you figured out the fix! Well done!

[-] ch00f@lemmy.world 23 points 4 months ago

Thanks!

And it was. They told me to take the rest of the day off which at the age of 22 was unheard of.

[-] tinyVoltron@lemmy.world 5 points 4 months ago
[-] ch00f@lemmy.world 5 points 4 months ago

Lol, no, but in the summers we were allowed to wear t-shirts on Friday.

[-] xep@fedia.io 15 points 4 months ago

That is some fantastic sleuthing, well done.

[-] Machindo@lemmy.ml 15 points 4 months ago

Thanks for sharing! Pretty wild bug. Really commendable debugging.

[-] Deebster@programming.dev 6 points 4 months ago

A great read, thanks for sharing.

[-] Bezier@suppo.fi 6 points 4 months ago

Nice bonus :D

Looks like it's the same dock I was imagining the whole time. A couple years ago I randomly got a chance to play around with an old iPad (2?) and a MIDI keyboard. Didn't encounter this bug.

[-] null@slrpnk.net 5 points 4 months ago

That was a great read, thanks for sharing!

[-] ipkpjersi@lemmy.ml 3 points 4 months ago

Great write-up! The biggest bugs always end up being the "simplest"/one-line things lol

[-] Sekoia@lemmy.blahaj.zone 2 points 4 months ago

Would you happen to remember what the optimization was, mathematically?

https://stackoverflow.com/questions/20036698/subdivide-a-modulo-function-16-bit-but-can-only-do-8-bits-at-a-time#20036828 seems to say that it's "impossible afaik", and I can't seem to optimize it myself (though this kind of math isn't my forte)

[-] ch00f@lemmy.world 6 points 4 months ago* (last edited 2 months ago)

I believe the optimization came because the denominator was a power of two. In my memory, the function counted up all of the bytes being sent and checked to see that the sum was a multiple of 16 (I think 16 bytes made a single USB endpoint or something; I still don't fully understand USB).

For starters, you can split up a larger modulo into smaller ones:

X = (A + B); X % n = (A % n + B % n) % n

So our 16 bit number X can be split into an upper and lower byte:

X = (X & 0xFF) + (X >> 8)

so

X % 16 = ((X & 0xFF) % 16 + (X >>8) % 16) % 16

This is probably what the compiler was doing in the background anyway, but the real magic came from this neat trick:

x % 2^n = x & (2^n - 1).

so

x % 16 = x & 15

So a 16 bit modulo just became three bitwise ANDs.

Edit: and before anybody thinks I'm good at math, I'm pretty sure I found a forum post where someone was solving exactly my problem, and I just copy/pasted it in.

Edit2: I'm pretty sure I left it here, but I think you can further optimize by just ignoring the upper byte entirely. Again, only because 16 is a power of 2 and works nicely with bitwise arithmetic.

[-] Sekoia@lemmy.blahaj.zone 1 points 4 months ago

Ahh, that makes sense. Powers of two are real convenient. Your math is a little wrong though: X != (X & 0xFF) + (X >> 8), but X = (X & 0xFF) + ((X >> 8) << 8). The right half can be removed entirely if you're doing modulo 16, since its lowest 4 bits will always be 0. So it simply becomes X & 15! Much cleaner for sure.
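A quick brute-force check of the corrected identity over every 16-bit value, written in Python rather than the firmware's language, so treat it as a sanity check and not as the code that shipped. The function names are invented for illustration.

```python
def mod16_split(x):
    """Modulo 16 via the corrected byte split: X = lo + (hi << 8)."""
    lo = x & 0xFF
    hi = x >> 8
    # (hi << 8) has its lowest 8 bits clear, so its contribution is always 0.
    return ((lo & 15) + ((hi << 8) & 15)) & 15


def mod16_fast(x):
    """Modulo 16 as a single AND, valid because 16 is a power of two."""
    return x & 15


def mod16_unshifted(x):
    """The earlier form that adds the upper byte without shifting it back."""
    return ((x & 0xFF) % 16 + (x >> 8) % 16) % 16


def check_all():
    """Compare both correct forms against % for every unsigned 16-bit value."""
    return all(
        x % 16 == mod16_split(x) == mod16_fast(x) for x in range(0x10000)
    )


if __name__ == "__main__":
    print(check_all())
    # 0x0100 is a multiple of 16, but the unshifted form reports a remainder.
    print(0x0100 % 16, mod16_unshifted(0x0100))
```

The unshifted version disagrees with `%` for inputs like 0x0100, which is the mistake pointed out above.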

[-] ch00f@lemmy.world 1 points 4 months ago

Oh right, duh. Thanks.

[-] bitchkat@lemmy.world 0 points 4 months ago

I was totally expecting a shitty morph in that novel.

this post was submitted on 21 Jun 2024
441 points (99.1% liked)
