The long road to recover Frogger 2 source from tape drives
Comments
The big challenge, which I think is actually an important, almost philosophical challenge: it might sound like a dull issue, like how do you format a database so you can retrieve information, and that sounds pretty technical. The real key issue is that software formats are constantly changing.
People say, “well, gee, if we could backup our brains,” and I talk about how that will be feasible some decades from now. Then the digital version of you could be immortal, but software doesn’t live forever. In fact it doesn’t live very long at all if you don’t care about it, if you don’t continually update it to new formats.
Try going back 20 years to some old formats, some old programming language. Try resuscitating some information on some PDP-1 magnetic tapes. I mean even if you could get the hardware to work, the software formats are completely alien and [using] a different operating system and nobody is there to support these formats anymore. And that continues. There is this continual change in how that information is formatted.
I think this is actually fundamentally a philosophical issue. I don’t think there’s any technical solution to it. Information actually will die if you don’t continually update it. Which means, it will die if you don’t care about it. ...
We do use standard formats, and the standard formats are continually changed, and the formats are not always backwards compatible. It’s a nice goal, but it actually doesn’t work.
I do in fact have electronic information that goes back through many different computer systems. Some of it now I cannot access. In theory I could, with enough effort, find people to decipher it, but it’s not readily accessible. The further back you go, the more of a challenge it becomes.
And despite the goal of maintaining standards, or maintaining forward compatibility, or backwards compatibility, it doesn’t really work out that way. Maybe we will improve that. Hard documents are actually the easiest to access. Fairly crude technologies like microfilm or microfiche, which basically hold documents, are very easy to access.
So ironically, the most primitive formats are the ones that are easiest.
I'm secondhand pissed at the recovery company. I have a couple of ancient SD cards lying around, and this just reinforces my fear that if I send them away for recovery they'll be destroyed (the cards aren't recognized or readable by the readers built into MacBooks, at least).
It’s a little sad that it took such a monumental effort to bring the source code back from the brink of loss. Times like these should inspire lawmakers to void copyright when the copyright holders can’t produce the thing they’re claiming copyright over.
> This issue doesn't affect tapes written with the ADR-50 drive, but all the tapes I have tested written with the OnStream SC-50 do NOT restore from tape unless the PC which wrote the tape is the PC which restores the tape. This is because the PC which writes the tape stores a catalog of tape information such as tape file listing locally, which the ARCserve is supposed to be able to restore without the catalog because it's something which only the PC which wrote the backup has, defeating the purpose of a backup.
Holy crap. A tape backup solution that doesn't allow the tape to be read by any other PC? That's madness.
Companies do shitty things and programmers write bad code, but this one really takes the prize. I can only imagine someone inexperienced wrote the code, nobody ever did code review, and then the company only ever tested reading tapes from the same computer that wrote them, because it never occurred to them to do otherwise?
But yikes.
I have backed up my Blu-ray collection to a dozen or so LTO-6 tapes, and it's worked great, but I have no idea how long the drives are going to last, or how easy it will be to repair them.
Granted, the LTO format is probably one of the more popular formats, but articles like this still keep me up at night.
I've long been stunned by the propensity of proprietary backup software to use undocumented, proprietary formats. It seems to me like the first thing one should make sure to solve when designing a backup format is to ensure it can be read in the future even if all copies of the backup software are lost.
I may be wrong but I think some open source tape backup software (Amanda, I think?) does the right thing and actually starts its backup format with emergency restoration instructions in ASCII. I really like this kind of "Dear future civilization, if you are reading this..." approach.
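To make the "instructions first" idea concrete, here is a minimal sketch of a self-describing archive. It is not Amanda's actual format: the 512-byte header size, the wording of the note, and the names `write_archive` and `read_payload` are all made up for illustration. The point is only that someone with nothing but a hex dump could still read the note and pull the data out by hand.

```python
import io

HEADER_SIZE = 512  # hypothetical fixed header length in bytes (an assumption)

def write_archive(out, payload):
    """Write a plain-ASCII restore note, padded to HEADER_SIZE, then the raw payload."""
    note = (
        "BACKUP ARCHIVE (illustrative format)\n"
        f"The first {HEADER_SIZE} bytes are this ASCII note.\n"
        f"The next {len(payload)} bytes are the data, stored uncompressed.\n"
        f"To restore: skip {HEADER_SIZE} bytes, then copy {len(payload)} bytes.\n"
    ).encode("ascii")
    if len(note) > HEADER_SIZE:
        raise ValueError("restore note does not fit in the header")
    out.write(note.ljust(HEADER_SIZE, b"\n"))  # pad with newlines so the note stays readable
    out.write(payload)

def read_payload(inp, length):
    """Follow the note's instructions: skip the header, then read the data."""
    inp.seek(HEADER_SIZE)
    return inp.read(length)

payload = b"int main(void) { return 0; }\n"
buf = io.BytesIO()
write_archive(buf, payload)
archive = buf.getvalue()
```

A real format would also want a checksum and a version line in the note, but even this much means the archive explains itself without the software that wrote it.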
Frankly nobody should agree to use a backup system which generates output in a proprietary and undocumented format, but also I want a pony...
It's interesting to note that the suitability of file formats for archiving is also a specialised field of consideration. I recall some article by someone investigating this very issue who argued formats like .xz or similar weren't very suited to archiving. Relevant concerns include how screwed you are if the archive is partly corrupted, for example. The more sophisticated your compression algorithm (and thus the more state it carries over from earlier in the stream into a given block), the more a single bit flip can result in massive amounts of run-on data corruption, so better compression essentially makes things worse if you assume some amount of data might be damaged. You also have the option of adding parity data to allow for some recovery from damage, of course. Though as this article shows, it seems like all of this is nothing compared to the challenge of ensuring you'll even be able to read the media at all in the future.
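On the parity point, the simplest possible version is a single XOR parity block, sketched below. This is only an illustration under my own assumptions (8-byte blocks, one parity block, and you already know which block is missing). Real tools use stronger codes such as Reed-Solomon that can survive more than one lost block. The function names are hypothetical.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def split_blocks(data, block_size):
    """Split data into fixed-size blocks, zero-padding the last one."""
    padded = data + b"\x00" * (-len(data) % block_size)
    return [padded[i:i + block_size] for i in range(0, len(padded), block_size)]

def recover_block(blocks, parity, lost_index):
    """Rebuild one lost block from the surviving blocks plus the parity block."""
    survivors = [b for i, b in enumerate(blocks) if i != lost_index]
    return xor_blocks(survivors + [parity])

data = b"Dear future civilization, if you are reading this..."
blocks = split_blocks(data, 8)
parity = xor_blocks(blocks)  # stored alongside the data blocks

damaged = list(blocks)
damaged[3] = None  # pretend block 3 could not be read back
restored = recover_block(damaged, parity, 3)
```

Because XOR is its own inverse, XORing the parity block with every surviving block leaves exactly the missing one. The cost is one extra block of storage, and it only helps if the damage is confined to a single known block.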
At some point the design lifespan of the proprietary ASICs in these tape drives will presumably just expire(?). I don't know what will happen then. Maybe people will start using advanced FPGAs to reverse engineer the tape format and read the signals off, but the amount of effort to do that would be astronomical, far more even than the amazing effort the author here went to.
Another tech tip is not buying 2 backup devices from the same batch or even the same model. Chances are they will fail in the same way.
I'm assuming the use of "cave-at" means the author has inferred an etymology of "caveat" being made up of "cave" and "at", as in: this guarantee has a limit beyond which we cannot keep our promises, if we ever find ourselves AT that point then we're going to CAVE. (As in cave in, meaning give up.) I can't think of any other explanation of the odd punctuation. Really quite charming, I'm sure I've made similar inferences in the past and ended up spelling or pronouncing a word completely wrong until I found out where it really comes from. There's an introverted cosiness to this kind of usage, like someone who has gained a whole load of knowledge and vocabulary from quietly reading books without having someone else around to speak things out loud.
In cases like this I can imagine some company yelling "copyright infringement" even though they don't possess a copy themselves. It's a really odd situation.
Or in a few years, just have an AI write the code...
Melted pinch rollers are not uncommon, and there is plenty of other (mostly audio) equipment with similar problems and solutions. Dimensions are not absolutely critical, and suitable replacements/substitutes are available.
As an aside, I think that prominent "50 Gigabytes" capacity on the tape cartridge, with a small asterisk-note at the bottom saying "Assumes 2:1 compression", should be outlawed as a deceptive marketing practice. It's a good thing HDD and other storage media didn't go down that route.
I wonder whether a CD-R disc would retain data for these 22 years?