Anil Dash’s library is dead

Posted by Pierre Igot in: iTunes, Technology
January 11th, 2007 • 10:38 pm

It really is hard to believe that someone as technology-aware and as cognizant of the pitfalls of modern computer technology as Anil Dash could have been careless enough to use iTunes daily for several years without keeping a daily backup of his music library metadata, thereby risking the loss of all kinds of valuable personal information (playlists, play counts, track ratings, etc.).

Yet that is exactly what appears to have happened. To his credit, Anil does not immediately embark on a rant against Apple in response to the grief that the loss of this valuable information is causing him.

But really… Someone like Anil Dash should know that all this information is highly valuable, that it is stored on a hard drive, that it is modified daily, even hourly, and that the risk of file corruption or hard disk damage is very real.

I mean, we are all guilty of having lived dangerously for years without proper backups. But I really believe that, today, in 2007, we no longer have any excuses—especially those of us who are comfortable enough with the technology. All that is needed is a second hard drive and a piece of software such as the excellent SuperDuper! (for Mac OS X, obviously; I am sure there are similarly affordable and easy-to-use solutions for Windows users). It is really a small investment in time and money, especially considering the peace of mind that it brings.

It’s not a perfect solution. The second hard drive can fail too, although it’s not likely to fail at the exact same time as the first one. And it’s still in the same physical location and shares the same vulnerabilities as the first one in that respect. But it is hugely better than no backup at all, and surely Anil Dash is able to purchase and implement such a solution. With a nightly backup of his iTunes library files (not the music files, just the metadata files that iTunes stores in its main folder), at the first sign of corruption Anil would have been able to go back to a library state that was less than 24 hours old.

(Another drawback of the regular nightly backup strategy is that it assumes that you notice any problems with your files within the first 24 hours. If you don’t, the problematic files will be backed up and will replace the previous versions. But I am assuming here that the corruption happened suddenly and that Anil saw it right away. He hasn’t posted an update on his blog that would contradict this theory and give him more of an excuse. For really important files, I typically keep incremental backups, created manually and stored separately from my nightly backup. But I must admit I am still not disciplined enough about that, and I should probably make manual backups of things like my iTunes library metadata on other volumes more often, just in case something bad happens and I don’t notice it right away.)
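For those who would rather script this than rely on a GUI tool, the routine described above (a nightly copy of the library metadata, with older copies kept around in case corruption goes unnoticed for a few days) can be sketched in a few lines of Python. This is only a sketch under stated assumptions: the file names in `METADATA_FILES` are an assumption about what iTunes keeps in its main folder, `snapshot_metadata` is a hypothetical helper rather than part of any existing tool, the backup folder is assumed to hold nothing but these snapshots, and something like cron or launchd would still be needed to run it every night.

```python
import shutil
from datetime import datetime
from pathlib import Path

# Assumption: the metadata files iTunes keeps in its main folder
# (typically ~/Music/iTunes). Adjust the names for your own setup.
METADATA_FILES = ["iTunes Library", "iTunes Music Library.xml"]


def snapshot_metadata(itunes_dir, backup_root, keep=14, now=None):
    """Copy the library metadata files into a new dated folder under
    backup_root, then delete all but the `keep` most recent snapshots.

    Hypothetical helper: backup_root is assumed to contain nothing
    but snapshot folders created by this function.
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")
    itunes_dir, backup_root = Path(itunes_dir), Path(backup_root)
    stamp = (now or datetime.now()).strftime("%Y%m%d-%H%M%S")
    target = backup_root / stamp
    # exist_ok=False: refuse to overwrite an existing snapshot
    target.mkdir(parents=True, exist_ok=False)

    for name in METADATA_FILES:
        src = itunes_dir / name
        if src.exists():
            shutil.copy2(src, target / name)  # copy2 also keeps modification times

    # This timestamp format sorts chronologically as plain text,
    # so the oldest snapshots come first.
    snapshots = sorted(p for p in backup_root.iterdir() if p.is_dir())
    for old in snapshots[:-keep]:
        shutil.rmtree(old)
    return target
```

A nightly call might look like `snapshot_metadata(Path.home() / "Music" / "iTunes", "/Volumes/Backup/iTunes-metadata")`, with both paths being placeholders. Because each run writes to a new dated folder instead of overwriting the previous copy, corruption that goes unnoticed for a few days can still be rolled back, as long as it is caught within `keep` runs.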

Of course, this is not to say that Apple is entirely blameless here. File corruption sucks, and there really should be procedures in our operating systems to guard against it. But based on the little bit of reading that I have done on the subject in the recent past, this would probably require substantial changes at the level of the file system itself, and such changes are obviously a rather big technological challenge. But eventually, something will have to be done about this, by all technology companies. I guess the pressure from end users is not strong enough yet. Maybe if such disasters hurt high-profile people like Anil Dash more often, something will be done about it sooner rather than later.
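Until file systems do this for us, one rough user-level stopgap for files that are not supposed to change (ripped music, archived scans) is to record a checksum for every file and compare against it later. The sketch below uses Python's standard `hashlib` and assumes the manifest is stored outside the folder it describes. All function names are hypothetical. It only detects corruption after the fact; it cannot prevent it, and it cannot tell a corrupted file from one that was deliberately edited.

```python
import hashlib
import json
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in 1 MB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root):
    """Map every file under root (as a path relative to root) to its digest."""
    root = Path(root)
    return {str(p.relative_to(root)): file_digest(p)
            for p in sorted(root.rglob("*")) if p.is_file()}


def save_manifest(manifest, path):
    # Keep the manifest outside `root`, or it would end up checksumming itself.
    Path(path).write_text(json.dumps(manifest, indent=2, sort_keys=True))


def load_manifest(path):
    return json.loads(Path(path).read_text())


def find_changed(root, manifest):
    """List recorded files that are now missing or whose contents differ.
    Files added since the manifest was built are not reported."""
    current = build_manifest(root)
    return sorted(name for name, digest in manifest.items()
                  if current.get(name) != digest)
```

The idea would be to build and save a manifest once when a folder is archived, then run `find_changed` against it every so often. Any name it returns is a file worth restoring from an older backup before that backup, too, is rotated away.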

Time Machine in Mac OS X 10.5 will obviously be a good step in the right direction. But we need much more than that. We need regular remote backups. We need protection against file corruption. And we need still bigger hard drives.


6 Responses to “Anil Dash’s library is dead”

  1. henryn says:

    I’m sorry to hear that anyone loses data. We’ve all become so accustomed to ultra-reliable hard disks that my guess is very few people make backups of any kind. Corporate workers depend on their IT departments, and the rest of us…

    It seems this fellow’s primary valuables, the songs themselves, were not lost, just his customizations, playlists and the like. I’m envious of people who have the time and energy to spend on such things, and not quite as sympathetic as I might be if he had lost his songs, or valuable data for his work.

    I’m concerned about the big picture. I’ve never purchased a song, and my playlists are trivial. But we just got a new digital camera, and the temptation to take full-res 10MB photos is great; that will create a tremendous data-storage load. (Backup or archive or what, I’m not certain.) Distinguishing what needs to be backed up at any given time is non-trivial. I’m certain I don’t care about Chinese fonts, and maybe I can skip those raw photos, but that little note file over yonder contains data I would hate to lose. We’ll have increasing potential for data corruption as software systems get more complex, and if that’s not enough, the crime rate in our town has been rising: Macs make inviting targets. Oh, yeah, over Christmas I heard from an old friend whose house was struck by lightning, and not proverbially: it burned down.

    Last time I looked while backing up my data (nothing to do with the system or applications), I saw file counts in the hundreds of thousands. How can we manage that many files wisely? Well, we simply back them all up, right? Or not.

    As hard disk drive capacity increases, the temptation to simply “do it” increases, and I’m concerned that the overhead cost will continually creep up, too.

    What I think we need are new models for managing our data, something to work _with_ files and folders to help us decide on priorities for backup, make damn sure the most important items are regularly (and retrievably) backed up, and offer a much broader set of choices for how and where the rest of the data is “saved”. We also need to understand much better the reliability and failure modes of the only practical backup medium we have in sight right now, DVDs. And we need ways of managing the data in light of the fact that it won’t take all that many 10MB photos to overflow a 5GB DVD.

  2. danridley says:

    I am sure there are similarly affordable and easy-to-use solutions for Windows users

    Ha!

    there really should be procedures in our operating systems to guard against them. But based on the little bit of reading that I have done on the subject in the recent past, this would probably require substantial changes at the level of the file system itself, and such changes are obviously a rather big technological challenge.

    One that Sun has taken on and done an excellent job with. The possibility of ZFS in Leopard has people excited for a reason — the ZFS developers looked long and hard at everything that’s wrong with data storage, and they implemented everything they possibly could to ensure that data loss became optional. With sufficient drive capacity, RAID-Z and snapshotting make it a real possibility to never lose data again, either to corruption or disk failure. (Some manner of offsite backup is still required for edge cases like fire or catastrophic disk controller failure.)

  3. danridley says:

    henryn: I’m somewhat flabbergasted at your suggestion that metadata (playlists, ratings, etc.) is less valuable than third-party data. As far as my iTunes data goes, it’s quite the reverse: 90% of my songs are easily replaceable, either by re-ripping the CD, re-downloading from the artists’ sites, or (for certain painful values of ‘easy’) re-purchasing from the iTunes Store.

    On the other hand, my metadata represents everything that is mine about that data. That playlist that I made for the drive to see Cirque du Soleil several years ago; the careful ratings and smart playlists that make it so I can listen to music that helps me focus rather than distracting me; these are irreplaceable, and therefore of immense value to me.

    Sure, if I somehow lost everything and had to do data recovery, I’d go after clients’ Web sites long before I went after my iTunes library file, but it seems to me that metadata is frequently as important as, and sometimes more important than, the data itself.

  4. ssp says:

    I really am hoping that Time Machine will rock because these days doing an actual backup that keeps a bit of history around (I wouldn’t count SuperDuper because the time span it covers is just too short) is tricky.

    And what about long time spans? I recently wanted to have a look at an ancient file that I last used 15 years ago or so, and I found it was corrupted (and, judging from some backups, had been corrupted for a number of years already). I couldn’t even figure out what went wrong and when. A hard disk problem, in hardware or in the file system? Some application ruining the file? Something going wrong when I copied it over from my old Atari? No idea.

    That’s disappointing, of course, and perhaps another argument for the checksumming that modern file systems do.

  5. henryn says:

    Each person places different value on different material. What’s important is that there is a way to routinely back up the data consistent with its true value to its owner.

    Each of us has the right to a plea, “My data is special”. In my case, I’m collecting scans of rare historical photos and documents, returning the originals to their owners, with little hope of seeing them again. Much of this data will eventually be made public on websites, but I feel I have a responsibility to do a good job archiving the original scans, too. I simply haven’t found a good way to do this, especially where photos are involved. I scan at the highest reasonable resolution, so I get big TIFFs. Then I do one or more levels of repair in Photoshop, then I produce JPGs for the web; each scan might produce 20–80 MB in all. Hundreds of scans. There is no easy way to back up ongoing progress _and_ produce assured archives in the 5GB chunks that DVD backups provide. I’m _really_ concerned about losing archival data due to DVD degradation … or physical damage.

  6. Pierre Igot says:

    It’s true that we managed quite well to enjoy our music listening experiences without computers, iPods, and iTunes for many years before all this technology became available. So in that respect maybe the metadata accumulated over time in the iTunes library are not that important. On the other hand, we managed modern life quite well without e-mail for many decades, and yet somehow e-mail has now become indispensable for most people. So in a way the metadata aspect of music file management has become essential to many people.

    Ultimately, the relative importance of data is a very personal and subjective issue. Basically the iTunes library stuff was obviously pretty important in Anil Dash’s life (as it is in mine), so he should have had regularly scheduled backups to fall back on.

    But obviously there are many situations where, even with the best intentions and lots of effort, we can’t easily achieve peace of mind, because of the size of many multimedia files. (In that respect, I am a bit concerned that Time Machine will consume a lot of disk space, because it won’t know that the MP3 files I had on my desktop for a few days were not that important, since I already have them on CD or only meant to listen to them once.)

    I am hoping that blue-laser recordable discs become available soon at a reasonable price (either HD DVD or Blu-ray, or probably both), because they will bring a substantial leap in disc capacity. Backing up our entire hard drives on 50 GB discs won’t be all that difficult.

    But we’re not there yet, and it will still require lots of manual intervention. So complete peace of mind and data safety are still a long way off. In the meantime, the best we can do is prioritize, based on our own individual preferences. Nobody other than myself (and certainly not a computer program) can tell which files are really important to me. That means that, like everyone else, I still have lots of manual work to do.

    That said, I still think every serious Mac OS X user should use SuperDuper! :).
