Replicating Nature- A new take on data storage

Many of us have wondered: what is the future of information storage? Currently we rely on devices such as hard disk drives, USB flash drives and SSDs to store data. Empirically we know that these devices are prone to failure: hard drives have moving parts that wear out, primarily due to the incredibly useful (and painful) force called friction, while flash memory wears out after many write cycles. This is not good if we want the future to easily access our ideas and thoughts (and eventually laugh at them)! So why not turn to biology and chemistry, where some reactions can be sped up by a factor of up to 10^17 (OMP decarboxylase) and copying can reach a fidelity of up to 10^9 (DNA polymerase)? As previously mentioned in these posts, “if it ain’t broke, don’t fix it!” So once again, we attempt to replicate an idea from Nature.

The mention of DNA probably gave away the method, but yes: encoding data into DNA is becoming an active research topic in biochemistry. Binary encoding uses “0” and “1” to store information, and DNA can store information similarly by assigning “0” to either “A” or “C” and “1” to either “G” or “T”. But there can certainly be problems with encoding, since DNA’s structure limits which sequences can be reliably synthesized and read, and DNA is damaged over time. The key is to use a Reed-Solomon error-correcting code, which adds redundancy so that errors can be corrected when the data is read back, and to encode the information in such a way that no base is repeated more than 3 times in a row. For an 83 kB document, the error rate of incorporation was roughly 0.7 nucleotides. Pretty good, I’d say.

How to encode information into DNA. Source: Grass et al., ETH Zurich
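To make the bit-to-base mapping above concrete, here is a minimal Python sketch. It assumes the simple one-bit-per-base scheme described above (0 -> A or C, 1 -> G or T) and uses the freedom to pick between two bases to keep runs of the same base to at most 3. It is not the exact scheme Grass et al. used, and it leaves out the Reed-Solomon error-correction layer entirely; the function names are hypothetical.

```python
# Hypothetical sketch of a 1-bit-per-base DNA encoding:
# "0" -> A or C, "1" -> G or T. Not the exact Grass et al. scheme,
# and the Reed-Solomon error-correction layer is deliberately left out.

MAX_RUN = 3  # no base may appear more than 3 times in a row

CHOICES = {"0": ("A", "C"), "1": ("G", "T")}  # (preferred, fallback) base per bit
DECODE = {"A": "0", "C": "0", "G": "1", "T": "1"}


def bytes_to_bits(data: bytes) -> str:
    """Turn bytes into a string of '0'/'1' characters, 8 per byte."""
    return "".join(f"{byte:08b}" for byte in data)


def encode(data: bytes) -> str:
    """Encode bytes as DNA, switching bases where needed to cap run length."""
    seq = []
    for bit in bytes_to_bits(data):
        preferred, fallback = CHOICES[bit]
        recent = seq[-MAX_RUN:]
        # Use the fallback only if the preferred base would make a run of 4.
        if len(recent) == MAX_RUN and all(b == preferred for b in recent):
            seq.append(fallback)
        else:
            seq.append(preferred)
    return "".join(seq)


def decode(seq: str) -> bytes:
    """Map each base back to its bit, then pack every 8 bits into a byte."""
    bits = "".join(DECODE[base] for base in seq)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


def longest_run(seq: str) -> int:
    """Length of the longest stretch of one repeated base."""
    best = current = 0
    previous = None
    for base in seq:
        current = current + 1 if base == previous else 1
        previous = base
        best = max(best, current)
    return best


if __name__ == "__main__":
    message = b"Hi"
    dna = encode(message)
    print(dna)                                        # AGAAGAAACGGAGAAG
    print(decode(dna) == message, longest_run(dna))   # True 3
```

Having two bases per bit is what makes the run-length cap possible; the price is that each base carries only 1 bit, half of the 2 bits that four bases could hold in principle.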

But what’s the big fuss? Why could this be important? Well, for one, consider the information density of DNA: 1 g of DNA can store roughly 450 exabytes of data, enough for all of Google’s and Facebook’s data with plenty of space to spare! The key, of course, is stability and recovery: the ability to store this mini “time capsule” and recover the data encoded in the DNA sequence after a given period of time. This new study, which used an 83 kB file, showed that the data could still be recovered after four DNA half-lives, which corresponds to storage for about 2 million years at -18 °C, the temperature of the Global Seed Vault. Yes, 2 million years. Start preparing the biochemistry bandwagon, ’cause I’m jumping right on!
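The headline numbers can be sanity-checked with some back-of-the-envelope arithmetic. The sketch below assumes an average mass of about 330 g/mol per nucleotide of single-stranded DNA and the theoretical maximum of 2 bits per nucleotide, with no redundancy or addressing overhead (practical schemes, like the one-bit-per-base mapping described earlier, store less). It also shows what “four half-lives” means: only 1/2^4 = 1/16 of the original intact DNA remains. The constants and function names are illustrative assumptions.

```python
AVOGADRO = 6.022e23     # molecules per mole
NT_MOLAR_MASS = 330.0   # g/mol, assumed average for one nucleotide of ssDNA
BITS_PER_NT = 2         # theoretical maximum: 4 bases -> log2(4) = 2 bits


def exabytes_per_gram(bits_per_nt: float = BITS_PER_NT) -> float:
    """Idealized storage capacity of 1 g of single-stranded DNA, in exabytes."""
    nucleotides = AVOGADRO / NT_MOLAR_MASS   # nucleotides in 1 g
    total_bytes = nucleotides * bits_per_nt / 8
    return total_bytes / 1e18                # 1 EB = 10^18 bytes


def fraction_remaining(half_lives: float) -> float:
    """Fraction of intact DNA left after a number of half-lives."""
    return 0.5 ** half_lives


if __name__ == "__main__":
    print(f"{exabytes_per_gram():.0f} EB per gram")        # 456 EB per gram
    print(f"{fraction_remaining(4):.4f} of the DNA left")  # 0.0625 of the DNA left
```

Under these assumptions the result lands close to the roughly 450 EB quoted above, which suggests that figure is an idealized upper bound rather than what an encoding with error correction achieves in practice.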


DNA data storage (New Scientist)

DNA data storage (ETH Zurich)


3 thoughts on “Replicating Nature- A new take on data storage”

  1. This is pretty impressive in terms of storage, but I wonder how feasible this is as a model for memory. Modern computers rely on random access to memory and use caches to speed up lookups. I wonder how well DNA can be read at arbitrary points (instead of reading out all of the encoded information), because if that is perfected and the information doesn’t get corrupted or lost, then I think this would be a practical way to start storing data if we ever have so much of it that storing (rather than accessing) it becomes the problem.


    1. Yes, that’s a great point. Storage is one thing, but what about read/write access? Well, the key here is the ‘device’ that reads from and writes to the storage medium. Computer hardware can read from and write to memory devices pretty quickly; the analog for DNA is DNA polymerase, which ‘reads’ the DNA. As mentioned, DNA polymerase fidelity (i.e. the rate of incorporating the correct base pair divided by the rate of incorporating an incorrect one) is over 10^9, and the speed is pretty fast: human DNA replication can copy 3 billion bp in about 24 h. That is perhaps not as fast as hardware read/write speeds yet; just as mechanical equipment is friction-limited, biochemical processes are diffusion-limited. All in all, you are right that long-term storage is the better use for DNA, but do not discount the potential of DNA polymerase to read back the data.

      I would also like to point out two more things. 1) The cost of this idea: according to the New Scientist article, it cost roughly 1000 pounds to encode the 83 kB file. This is definitely not cheap, but if you consider that 2) DNA sequencing costs are falling faster than Moore’s Law for computing (!/image/falling-fast-nature.png_gen/derivatives/landscape_630/falling-fast-nature.png), this will also change very soon. And the data was recovered even after 4 half-lives of DNA… I think sensitive documents should be encoded using this idea for now, until our understanding of biocomputing improves.


  2. DNA polymerase is fast, but does the system actually use it? I’m really hopeful for biological circuitry in the future, but it might be a few years before we see bio-computers in action. Fingers crossed!
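The figures in the reply above can be turned into rough rates. The sketch below takes the reply’s numbers at face value (3 billion bp copied in about 24 h, and roughly 1000 pounds to encode the 83 kB file) and assumes, for illustration only, 1 bit per base pair as in the one-bit-per-base mapping from the post. The constant and function names are hypothetical, and the result is an implied rate, not a measured read speed.

```python
GENOME_BP = 3e9            # base pairs copied (figure from the reply)
COPY_SECONDS = 24 * 3600   # ~24 h (figure from the reply)
BITS_PER_BP = 1            # assumption: 1 bit per base pair, as in the simple mapping

COST_POUNDS = 1000         # encoding cost quoted from New Scientist in the reply
FILE_KB = 83               # file size in kilobytes


def copy_rate_bp_per_s() -> float:
    """Aggregate copying rate implied by replicating a genome in ~24 h."""
    return GENOME_BP / COPY_SECONDS


def implied_kb_per_s() -> float:
    """Data rate that copying rate would correspond to, in kilobytes per second."""
    return copy_rate_bp_per_s() * BITS_PER_BP / 8 / 1000


def cost_per_kb() -> float:
    """Encoding cost per kilobyte, in pounds."""
    return COST_POUNDS / FILE_KB


if __name__ == "__main__":
    print(f"{copy_rate_bp_per_s():,.0f} bp/s")   # 34,722 bp/s
    print(f"{implied_kb_per_s():.1f} kB/s")      # 4.3 kB/s
    print(f"£{cost_per_kb():.2f} per kB")        # £12.05 per kB
```

Note that the copying rate is an aggregate over the many replication forks working in parallel in a human cell, not the speed of a single polymerase.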

