It has been said that, as humans, music is in our DNA. According to promising results published in the journal Nature by researchers at the European Bioinformatics Institute, practical, high-capacity, low-maintenance information storage in synthetic DNA may be feasible. What is the implication for music?
[Image credit: Janusz Kapusta. Title quote: Jonathan Wells.]
High-resolution digital audio (24-bit samples at 96 kHz or higher) is big data. The transition of music from physical media to weightless high-resolution digital form is contributing to the forecast 50-fold increase in global data by 2020, while hard drive capacity may only grow 15-fold in the same period [David Epstein]. There is a growing storage gap.
Enter DNA. Just as digital information can be encoded as arbitrarily long sequences of 1's and 0's (bits), the same information can be represented as arbitrarily long sequences of other characters, such as 0's, 1's, and 2's (trits). EBI scientists Nick Goldman and Ewan Birney have devised a mapping from a string of trits to a DNA string over the nucleotides {A, C, G, T} in which no nucleotide appears twice in a row, as well as a scheme for organizing arbitrarily long DNA strings into overlapping fixed-length runs for sequencing with robust error correction. The result is a method for encoding digital information in synthetic DNA, a volumetric (rather than planar) storage medium that requires no power and can last indefinitely.
Here's how it works (slightly simplified):
- Digital music is encoded as a string S0 of 24-bit samples.
- S0 is converted from binary to a string S1 in base 3, transforming each 24-bit sample into a sequence of 18 trits.
- Add length information (20 trits) and zero-padding to produce a string S2 whose length N is a multiple of 25.
- Convert S2 to a DNA string S3 of nucleotide (nt) characters {A, C, G, T} according to a mapping based on the last character written and the next trit in the sequence, so that no nt is immediately repeated.
- Split S3 into overlapping segments of 100 nt, each offset from the previous by 25 nt. This step provides decoding redundancy for error correction. Each nt is contained in up to four segments. For a string S3 of length N, N/25 − 3 segments are produced.
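The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the published implementation: the function names are hypothetical, the fixed-width conversion of 6 trits per byte (which yields 18 trits per 24-bit sample) stands in for whatever byte-to-trit code the researchers actually use, and the rotation table `NEXT_NT` and the placement of the length field after the padding are assumptions chosen only to satisfy the rules described.

```python
# Assumed rotation table: for each previous nt, the nt to write for trit 0, 1, 2.
# No row contains its own key, so the same nt is never written twice in a row.
NEXT_NT = {"A": "CGT", "C": "GTA", "G": "TAC", "T": "ACG"}


def to_base3(value: int, width: int) -> str:
    """Return value as a base-3 string, left-padded with zeros to width digits."""
    digits = []
    for _ in range(width):
        value, remainder = divmod(value, 3)
        digits.append(str(remainder))
    return "".join(reversed(digits))


def samples_to_trits(pcm: bytes) -> str:
    """S0 -> S1: 6 trits per byte (assumption), so 18 trits per 24-bit sample."""
    return "".join(to_base3(byte, 6) for byte in pcm)


def add_length_and_padding(s1: str) -> str:
    """S1 -> S2: zero-pad and append a 20-trit length so len(S2) % 25 == 0."""
    pad = -(len(s1) + 20) % 25
    return s1 + "0" * pad + to_base3(len(s1), 20)


def trits_to_dna(s2: str, prev: str = "A") -> str:
    """S2 -> S3: pick each nt from the previous nt and the next trit."""
    out = []
    for trit in s2:
        prev = NEXT_NT[prev][int(trit)]
        out.append(prev)
    return "".join(out)


def split_segments(s3: str, size: int = 100, step: int = 25) -> list:
    """S3 -> overlapping 100-nt segments, each offset by 25 nt."""
    return [s3[i:i + size] for i in range(0, len(s3) - size + 1, step)]


# Ten 24-bit samples (30 bytes) of made-up data.
pcm = bytes(range(30))
s1 = samples_to_trits(pcm)
s2 = add_length_and_padding(s1)
s3 = trits_to_dna(s2)
segments = split_segments(s3)
print(len(s1), len(s2), len(segments))  # 180 200 5
```

With N = 200, the split yields 200/25 − 3 = 5 segments, matching the formula in the last step.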
Encoding high-resolution digital audio as a DNA string.
According to this method, 1 second of 24/96 digital audio comprising 576K bytes would generate a DNA string of more than 16 million nt, a 28x size expansion that seems inefficient. (Digital audio is typically compressed for storage, not expanded.) But DNA is incredibly dense. Even with the data size expansion, DNA encoding achieves a data density of 2.2 million GB/gram, which translates to more than 1200 hours of high-definition digital audio stored in a single gram of DNA barely visible to the human eye. (That's twice the size of my entire library.)
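The size expansion can be checked with some back-of-the-envelope arithmetic. The sketch below rests on two assumptions not stated above: that the 576K bytes represent two channels (stereo), and that each 100-nt segment carries an extra 17 nt of indexing information when synthesized. Under those assumptions the totals line up with the figures quoted; with bare 100-nt segments the expansion would be closer to 24x.

```python
SAMPLE_RATE = 96_000        # samples per second per channel
BYTES_PER_SAMPLE = 3        # 24-bit samples
CHANNELS = 2                # assumption: stereo
TRITS_PER_SAMPLE = 18       # from the simplified steps above
LENGTH_TRITS = 20           # length field
SEGMENT_NT = 100            # data nt per segment
OVERHEAD_NT = 17            # assumption: per-segment indexing nt
STEP = 25                   # offset between consecutive segments

audio_bytes = SAMPLE_RATE * BYTES_PER_SAMPLE * CHANNELS
samples = audio_bytes // BYTES_PER_SAMPLE

# Length of S2 (and S3): payload plus length field, padded to a multiple of 25.
n = samples * TRITS_PER_SAMPLE + LENGTH_TRITS
n += -n % STEP

segment_count = n // STEP - 3
total_nt = segment_count * (SEGMENT_NT + OVERHEAD_NT)
expansion = total_nt / audio_bytes

print(audio_bytes)          # 576000
print(total_nt)             # 16173846
print(round(expansion, 1))  # 28.1
```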
Goldman and Birney proved the viability of their method by encoding digital files of all the Shakespearean sonnets into synthetic DNA, then using standard DNA sequencing software to recover the string for decoding to the original texts.
A media player based on DNA playback is not in our immediate future. The current price of DNA storage is estimated at $7.5 million per GB, as opposed to $0.05 ("five cents") per GB for magnetic disk space. Furthermore, the speed of DNA sequencing and subsequent decoding to audio does not support real-time playback. The technology is on a 50-year horizon.
Genre: Rock
Year: 1978
You wouldn't have thought to put the late folk singer/songwriter Dan Fogelberg with jazz flutist Tim Weisberg, but their collaboration on Twin Sons of Different Mothers worked well, even producing the hit "The Power of Gold." A fitting title for our discussion of DNA. '70s hair aside, the two musicians don't really look like twins, a fact they conceded on their 1995 follow-up release, No Resemblance Whatsoever.
© 2013 Thomas G. Dennehy. All rights reserved.