WHY THIS MATTERS IN BRIEF
With so much new information being generated every day, we’re already pushing the limits of today’s technologies, but now ultra-dense, ultra-efficient new forms of storage are appearing on the horizon.
It’s long been known that today we’re generating more information, faster, than ever before. The same was true yesterday and it’ll be true tomorrow. What would today’s storage salespeople do if they didn’t have that little fact to hand…? Anyway, while they’re busy extolling the virtues of the latest and greatest solid state drives, a team at Columbia University and the New York Genome Center (NYGC) have been hard at work trying to build the hard drives of the future. Using DNA.
Last week researchers showed that an algorithm designed for streaming video on a smartphone, known as a fountain code, could unlock almost all of DNA’s storage potential by squeezing more information into its four nucleotide bases than ever before. They also demonstrated that the technology is extremely reliable.
DNA is an ideal storage medium because it’s ultra-dense and ultra-energy-efficient. It can also last hundreds of thousands of years, something demonstrated by the recent recovery of DNA from the bones of a 430,000-year-old human ancestor found in a cave in Spain.
“DNA won’t degrade over time like cassette tapes and CDs, and it won’t become obsolete – if it does, we have bigger problems,” said Yaniv Erlich, a computer science professor at Columbia Engineering and a key member of the NYGC.
Erlich and his team selected six files to write, or encode, onto DNA – a full computer operating system, an 1895 French film, “Arrival of a train at La Ciotat,” a $50 Amazon gift card, a computer virus, a Pioneer plaque and a 1948 study by information theorist Claude Shannon.
What else would you store on the world’s most advanced strands of storage?
First they compressed all of the files into a single “master” file, split the data into short strings of binary 1s and 0s, randomly packaged the strings into so-called “droplets”, and mapped them onto DNA’s four nucleotide bases, A, G, C and T. It’s worth noting that at this point there’s no actual physical DNA involved: the team were simply putting together a text-based DNA sequence in a Microsoft Word document.
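The encoding steps above can be sketched in a few lines of Python. This is a toy illustration, not the team’s DNA Fountain software. The 2-bit mapping (00 as A, 01 as C, 10 as G, 11 as T), the 4-byte seed stored at the front of each strand, the segment size, the uniform choice of 1 to 3 segments per droplet and the helper names `split_segments`, `make_droplet` and `to_dna` are all assumptions made for the example. A production fountain code would draw the number of segments from a tuned distribution (such as the robust soliton distribution), and a real pipeline also has to screen out error-prone sequences.

```python
import random

# Assumed 2-bit-per-base mapping (hypothetical; the article does not give the exact table).
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}


def split_segments(data: bytes, seg_len: int) -> list[bytes]:
    """Split the compressed 'master' file into equal-length segments, zero-padding the last one."""
    padded = data + b"\x00" * (-len(data) % seg_len)
    return [padded[i:i + seg_len] for i in range(0, len(padded), seg_len)]


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bitwise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_droplet(segments: list[bytes], seed: int) -> tuple[int, bytes]:
    """Toy droplet: a PRNG seeded with `seed` picks 1-3 segments, which are XORed together.

    Real fountain codes draw the number of segments from a tuned distribution;
    the uniform choice here is a simplification.
    """
    rng = random.Random(seed)
    degree = rng.randint(1, min(3, len(segments)))
    chosen = rng.sample(range(len(segments)), degree)
    payload = segments[chosen[0]]
    for idx in chosen[1:]:
        payload = xor_bytes(payload, segments[idx])
    return seed, payload


def to_dna(seed: int, payload: bytes) -> str:
    """Prefix a 4-byte seed (assumed size) so a decoder can rebuild the subset, then map bits to bases."""
    raw = seed.to_bytes(4, "big") + payload
    bits = "".join(f"{byte:08b}" for byte in raw)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


if __name__ == "__main__":
    segments = split_segments(b"hello, DNA storage!", seg_len=4)
    for seed in range(3):
        print(to_dna(*make_droplet(segments, seed)))
```

Because each droplet records only its seed, the decoder can regenerate exactly which segments went into it. Nothing about the subset itself has to be written into the DNA.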
How advanced is that for you?
Once they’d finished putting together the sequences for their DNA strands, 72,000 strands in all, they emailed them to Twist Bioscience, a DNA synthesis company (and the same company Microsoft has been buying its tailored DNA storage strands from), which used them to synthesise the actual DNA. That step turned what had been just a string of text characters in an email into a biological DNA storage device. Two weeks later, the team received a vial holding a speck of DNA molecules.
To retrieve their files the team used modern sequencing technology to read the DNA strands, followed by software to translate the genetic code back into binary, and, phenomenally, they recovered their files with zero errors.
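Retrieval runs the encoding in reverse. Each sequenced strand is turned back into bits, and its seed regenerates the set of segments it combines. A "peeling" decoder then repeatedly solves any droplet that covers only one unknown segment. The sketch below is minimal and makes toy assumptions: a hypothetical 2-bit mapping, a 4-byte seed prefix, 1 to 3 segments per droplet chosen uniformly, and hypothetical helper names. Peeling is the usual way fountain (LT) codes are decoded, but it is not necessarily the exact decoder the team used. A compact encoder is included so the round trip runs on its own.

```python
import random
from typing import Optional

BASE_TO_BITS = {"A": "00", "C": "01", "G": "10", "T": "11"}  # assumed mapping


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def chosen_segments(seed: int, n_segments: int) -> set[int]:
    """Rebuild which segments a droplet combined; must mirror the encoder's PRNG calls exactly."""
    rng = random.Random(seed)
    degree = rng.randint(1, min(3, n_segments))
    return set(rng.sample(range(n_segments), degree))


def encode(segments: list[bytes], seed: int) -> str:
    """Compact toy encoder so the round trip is self-contained: XOR the chosen segments, prefix the seed."""
    payload = bytes(len(segments[0]))
    for idx in chosen_segments(seed, len(segments)):
        payload = xor_bytes(payload, segments[idx])
    raw = seed.to_bytes(4, "big") + payload
    bits = "".join(f"{byte:08b}" for byte in raw)
    return "".join("ACGT"[int(bits[i:i + 2], 2)] for i in range(0, len(bits), 2))


def from_dna(strand: str) -> tuple[int, bytes]:
    """Translate bases back into bits, then split off the 4-byte seed prefix."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    raw = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return int.from_bytes(raw[:4], "big"), raw[4:]


def peel_decode(strands: list[str], n_segments: int) -> Optional[bytes]:
    """Peeling decoder: solve droplets with one unknown segment, XOR known segments out of the rest, repeat."""
    droplets = []
    for strand in strands:
        seed, payload = from_dna(strand)
        droplets.append([chosen_segments(seed, n_segments), payload])
    recovered: dict[int, bytes] = {}
    progress = True
    while progress and len(recovered) < n_segments:
        progress = False
        for drop in droplets:
            unknown, payload = drop
            for idx in list(unknown):
                if idx in recovered:          # strip out segments we already know
                    payload = xor_bytes(payload, recovered[idx])
                    unknown.discard(idx)
            drop[1] = payload
            if len(unknown) == 1:             # exactly one unknown left: this droplet reveals it
                idx = next(iter(unknown))
                recovered[idx] = payload
                progress = True
    if len(recovered) < n_segments:
        return None  # too few droplets survived; more strands would need to be sequenced
    return b"".join(recovered[i] for i in range(n_segments))


if __name__ == "__main__":
    segments = [b"ABCD", b"EFGH", b"IJKL", b"MNOP"]
    strands = [encode(segments, seed) for seed in range(200)]
    survivors = strands[::2]  # pretend half the strands were lost
    print(peel_decode(survivors, len(segments)))
```

The decoder does not care which particular droplets survive, only that enough of them do. That is what makes the approach tolerant of strands that drop out during synthesis or sequencing.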
The team also showed that, with their coding technique, a virtually unlimited number of copies of the files could be created by multiplying their DNA sample using the polymerase chain reaction (PCR), and that those copies, copies of those copies, and so on could all be recovered error-free.
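The copying works because PCR roughly doubles the number of DNA molecules with each thermal cycle. The sketch below shows that idealised arithmetic. It assumes a hypothetical per-cycle `efficiency` parameter, where 1.0 means perfect doubling. Real reactions run below 100% efficiency and eventually plateau, so the result is an upper bound rather than a prediction.

```python
def pcr_copies(initial_copies: int, cycles: int, efficiency: float = 1.0) -> float:
    """Idealised PCR yield: each cycle multiplies the template count by (1 + efficiency).

    efficiency = 1.0 means perfect doubling; real reactions run below that and
    eventually plateau, so this is an upper bound rather than a prediction.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


if __name__ == "__main__":
    print(pcr_copies(1, 30))          # 1073741824.0, i.e. 2**30 copies of a single strand
    print(pcr_copies(1, 30, 0.9))     # fewer copies at an assumed 90% efficiency
```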
Finally, not content with changing everything we know about storage, the team delivered their pièce de résistance: they showed that their new coding strategy can pack 215 petabytes, or 215 million gigabytes, of data onto a single gram of DNA, blowing away the previous record, held by Harvard and the European Bioinformatics Institute, by a hundredfold.
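To put that density in perspective, here is a quick back-of-the-envelope calculation. It assumes decimal (SI) units, which match the article’s own “215 petabytes, or 215 million gigabytes”, and it treats the team’s archive as exactly 2 megabytes. The result is a straight proportion at the quoted density. It says nothing about how much DNA a real sample actually contains.

```python
PETABYTE = 10**15   # decimal (SI) units, consistent with "215 petabytes, or 215 million gigabytes"
GIGABYTE = 10**9
MEGABYTE = 10**6
DENSITY_BYTES_PER_GRAM = 215 * PETABYTE


def grams_of_dna(n_bytes: int) -> float:
    """Mass of DNA implied for n_bytes by a simple proportion at the quoted density."""
    return n_bytes / DENSITY_BYTES_PER_GRAM


print(DENSITY_BYTES_PER_GRAM // GIGABYTE)               # 215000000, i.e. 215 million gigabytes
print(grams_of_dna(2 * MEGABYTE) * 1e12, "picograms")   # roughly 9.3 picograms for a 2 MB archive
```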
“We believe this is the highest density data storage device ever created,” said Erlich.
The capacity of DNA data storage is theoretically limited to two binary digits for each nucleotide, because each of the four bases can stand for one of the four two-digit combinations 00, 01, 10 and 11. In practice, though, the biological constraints of DNA itself, and the need to include redundant information so the fragments can be reassembled and read later, reduce its capacity to 1.8 binary digits per nucleotide base.
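To see why biological constraints pull capacity below two bits per base, consider just one of them. Long runs of the same base (homopolymers) are error-prone to synthesise and sequence, so encoders typically avoid them. The sketch below assumes a hypothetical cap of three identical bases in a row. It counts every DNA string that obeys the cap and converts that count into bits per base. It models only this one constraint and is not how the 1.8 figure was derived. Adding further rules, such as limits on GC content, can only remove more strings, so capacity can only fall further.

```python
import math


def homopolymer_capacity(max_run: int = 3, length: int = 2000) -> float:
    """Bits per base achievable when no base may repeat more than `max_run` times in a row.

    counts[r] holds how many valid strings end in a run of exactly r + 1 identical bases.
    """
    counts = [4] + [0] * (max_run - 1)      # length-1 strings: 4 bases, each a run of length 1
    for _ in range(length - 1):
        total = sum(counts)
        # Switching to one of the 3 other bases restarts the run at length 1;
        # repeating the last base extends a run by one, and runs already at max_run are dropped.
        counts = [3 * total] + counts[:-1]
    # log2(number of valid strings) / length approaches the constrained capacity as length grows.
    return math.log2(sum(counts)) / length


if __name__ == "__main__":
    print(homopolymer_capacity(3))   # roughly 1.98
    print(homopolymer_capacity(1))   # roughly log2(3), about 1.585
```

Even this single, fairly mild rule costs a few hundredths of a bit per base. Redundancy for reassembly and error recovery comes on top of that.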
While the new team have smashed the previous record, cost remains an issue: the team had to spend $7,000 to synthesise the DNA they used to archive their 2 megabytes of data, and another $2,000 to read it. And although the cost of sequencing DNA has fallen dramatically over the years, there’s still a huge way to go before it could be a commercially viable alternative to today’s storage systems. That’s to say nothing of the technical challenges that remain in creating a DNA storage architecture that can write data to, and read data from, this new type of storage.
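The figures above are easier to compare with conventional storage when expressed per megabyte. The sketch below divides the quoted costs by the 2 MB archive. It then makes a deliberately naive linear extrapolation to a gigabyte (assumed here to be 1,000 MB), purely to show the scale of the gap. Real pricing would not scale this simply.

```python
WRITE_COST_USD = 7_000   # synthesis cost quoted in the article
READ_COST_USD = 2_000    # sequencing cost quoted in the article
ARCHIVE_MB = 2           # size of the archived data, in megabytes


def cost_per_mb(total_usd: float, megabytes: float) -> float:
    return total_usd / megabytes


write_per_mb = cost_per_mb(WRITE_COST_USD, ARCHIVE_MB)   # 3500.0 dollars per MB written
read_per_mb = cost_per_mb(READ_COST_USD, ARCHIVE_MB)     # 1000.0 dollars per MB read
# Naive linear extrapolation to one gigabyte (1,000 MB); real pricing would not scale this simply.
print(f"write: ${write_per_mb * 1_000:,.0f}/GB, read: ${read_per_mb * 1_000:,.0f}/GB")
```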
“Investors may not be willing to risk tons of money to bring costs down,” said Erlich, “but the price of DNA synthesis can be vastly reduced if lower quality molecules are used, and we use coding strategies like DNA Fountain to fix molecular errors. We can do more of the heavy lifting on the computer to take the burden off time-intensive molecular coding.”
And as for the future of storage? Well, if we can store 215 petabytes in a single gram of DNA, then imagine what happens when you have six DNA nucleotide bases to play around with, not just four, something that recently became a reality with the creation of a new six-base alien life form. When you start doing the sums, let’s just say you run out of zeros.
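As a rough starting point for those sums, assume an idealised case that ignores every biological constraint and all redundancy. In that case the most information a single base can carry is the base-2 logarithm of the number of distinct bases available. Moving from four letters to six raises that ideal ceiling per base by roughly 29%, before any real-world losses:

```latex
\text{bits per base} = \log_2(\text{number of distinct bases}), \qquad
\log_2 4 = 2, \qquad
\log_2 6 \approx 2.585, \qquad
\frac{\log_2 6}{\log_2 4} \approx 1.29
```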