An actual difficult bug fixed
October 08, 2008 [FreeGuide, Java, Lean and Agile, Tech, Test Driven]Of course, I am bound to get a bug report immediately I have posted this telling me my fix breaks everything, but for the moment I am chuffed that I found, tested, and fixed a genuinely difficult bug.
I am particularly proud because I wrote an automated test to ensure it can never happen again, and I used that test to make the debugging process much easier than it otherwise would be. The code that reads, processes and stores listings in FreeGuide is a spider's web of interfaces and helper classes (because of the arguably over-engineered plugins framework used for all the moving parts), and tracking this down with plain old-fashioned debugging would have been a huge job.
Anyway, I bet you are dying to hear what the bug was, aren't you?
When you have a programme already stored in FreeGuide, and then a new one comes along that overlaps it, the old programme is deleted. For example if we start off with:
... 19:00 Top Gear ................... 20:00 Charlie and Lola .............
but then later download listings again and get these programmes:
... 19:00 Pocoyo .. 19:15 Round the World ......
Then Top Gear will disappear, and be replaced by the 2 new programmes. In fact, any old programme that overlaps any new incoming programme will be automatically deleted.
At least, that is what is supposed to happen. In fact, the real situation is a little more complex because the programmes are stored in separate files (.ser files) for different days and times. Actually, there are 4 files for each day, named things like "day-2008-09-15-A.ser", where the suffix A, B, C and D indicate which part of each day a file is for.
So imagine what happens when the first set of programmes comes in looking like this:
19:00 Programme 1A ......... 21:15 Programme 1B .. 21:30 Programme 1C ........... 22:00
and then the second comes in like this:
19:00 Programme 2A......................................... 21:45 Programme 2C .. 22:00
So obviously the old 3 programmes should be completey deleted, and the new 2 should be what you see.
But you don't. In fact what you see is programme 1B and programme 2C, before 1B and between the two. Weird huh?
"Why?" I hear you ask. Well, it's simple when you consider how the programmes are split into files.
Programme 1A goes into file day-2008-09-14-D.ser, and programmes 1B and 1C go into day-2008-09-15-A.ser.
[Side note: this is true in this case because the bug reporter is in the GMT -0400 timezone and the file boundaries are quarters of a day in GMT.]
Then, when the new programmes come along, 2A goes into 14-D - wiping out 1A, and 2C goes into 15-A - wiping out 1C but not 1B.
Then, when the files get read back in again later, 2A is read from 14-D, but then 1B is read from 15-A, wiping out 2A, and finally 2C is read in as well from 15-A, so we end up with 1B and 2C.
How to fix it? Well, what I did was leave everything as it is, and then do the final read in the reverse order. This means we read in 1B and 2C, but then we read 2A later, and it wipes out 1B, leaving 2A and 2C as we would expect.
Neat fix eh? It works because this kind of wrongness in the .ser files will only exist when a programme hanging off the end should have wiped out something in a later file. Because programmes are classified into files by their start time, they can only hang off the end of a file, not the beginning, so reading the files in backwards will always read the hanging-over file last, wiping out anything which should have been wiped out earlier.
There is a little bug/feature remaining, but it only applies when you get some really weird listings from your provider. If you had a programme like 1A (19:00 - 21:15), and downloaded new listings, which ONLY contained a programme overlapping it, but falling into a later file (so maybe it starts at 21:00), and didn't contain any programme starting at 19:00, then the backwards reading would mean you would never see your new programme because it would be wiped out by 1A.
This is a very unusual case though, since normally if you get a new programme at 21:00, you will also get new programmes leading up to it, if only to reflect the fact that 1A is now a different length. So this is really a theoretical bug, which explains why I've decided not to fix it...
Anyway, by the time I'd fiddled with my test for this to get the bug to trigger (which took a long time - working out which bits to fake out and which to test at all was tricky), the actual fix was easily implemented (1 line of code I chose to break out into 3), and then validated in a single click.
Just in case I hadn't mentioned it, I love tests.