A few years ago, even researchers were reluctant to store data in DNA because the idea seemed too sci-fi and had no practical value. Today, we can extend PostgreSQL with the right software and biochemical modules and run SQL on DNA. In this era of data explosion, the volume of global data not only exceeds our ability to grasp the numbers (do you know what a zettabyte is?) but is also outgrowing our storage capacity.

Everything is digital, and more and more of it runs on applications built on algorithms trained on data. Those applications in turn produce more data, which feeds further downstream applications and algorithms. See the pattern?

Simply put, at this rate we will soon run short of the materials needed for data storage and computing. This is why people have begun looking for alternative storage media. Storing data in DNA sounds strange, but it actually makes sense. Now researchers have made a breakthrough that lets them integrate DNA storage into PostgreSQL, the popular open source database.

Heinis, head of the SCALE Lab at Imperial College London, and Appuswamy, assistant professor of data science at EURECOM, presented a paper titled "OligoArchive: Using DNA in the DBMS Storage Hierarchy" at the Conference on Innovative Data Systems Research (CIDR). They are not the first to store and retrieve data in DNA, but they are the first to work with structured data, integrate DNA storage with an existing database, and go beyond storage to perform computation.

The first thing to know about DNA as a storage layer is that oligonucleotides must be synthesized every time a write is performed. How would this work in practice? Would lab technicians have to stand by to run the process and replenish the raw materials the chemistry consumes?

Not so. According to Appuswamy and Heinis, Microsoft's automated DNA storage and retrieval system has shown that the process can run without human involvement. Just as nobody supervises the day-to-day operation of a data center apart from maintenance staff, the same would hold for a DNA-based data center.

Will DNA storage technology find a market?

Still, we are far from replacing hard disks with arrays of synthetic DNA. For one thing, current techniques for storing data this way are very slow: early on, scientists needed a week to store a single megabyte. Appuswamy and Heinis agree that more work is needed here, but it lies outside their own research, so they must wait for the biochemical synthesis process to improve.

OligoArchive changes the database storage hierarchy by replacing the tape-based archival tier with a DNA-based one. Storing synthetic DNA requires additional measures, so it is doubtful whether DNA storage would make sense for ordinary devices. In any case, data and databases are moving to the cloud, and as long as your data is stored safely in a data center, the storage medium is a black box to end users.

Appuswamy and Heinis also point out that, although it is still slow, DNA storage offers great potential for parallel processing, because DNA is abundant and cheap, or rather, we hope it eventually will be. At current costs, storing one minute of high-quality stereo audio would cost $100,000.

Although large-scale storage on synthetic DNA is still too expensive, Appuswamy and Heinis expect costs to fall, as they typically do after every technological breakthrough, storage included.

If synthesizing oligonucleotides becomes economically feasible, it would be reasonable to expect them in large quantities, which means many DNA storage units could run in parallel. Not every part of every algorithm can be parallelized, but the parts that can could be accelerated dramatically. This brings us to a key point.

Until now, DNA has been used to store unstructured files: text, video, or anything else. What Appuswamy and Heinis did was integrate DNA storage into a relational database. They took the data and queries of TPC-H, the standard database benchmark, and ran TPC-H on a PostgreSQL instance, accessing the data randomly rather than sequentially. Storing structured data on a DNA back end of a database system and querying it through SQL is now a reality.

The researchers built archiving and restore tools for PostgreSQL (pg_oligo_dump and pg_oligo_restore) that perform schema-aware encoding and decoding of relational data on DNA. They used these tools to archive a 12KB TPC-H database to DNA, perform in vitro computation on it, and restore it. This is a big deal: DNA storage can now support SQL operations that selectively access and process part of the data. Note that the data is not pulled back into the database to perform these operations. Appuswamy and Heinis found a way to perform SQL joins on the oligonucleotides themselves, which goes beyond biochemical storage to biochemical computation.
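The article does not describe how pg_oligo_dump actually maps tuples to nucleotides, so the following Python sketch is purely illustrative and assumes a naive scheme. Each byte becomes four bases (two bits per base), and each oligo starts with a fixed-width address (a 1-byte table id and a 2-byte row id) so that a subset of rows can be picked out by prefix, loosely mimicking how selective access picks strands out of a DNA pool. The functions `bytes_to_dna`, `dna_to_bytes`, `make_oligo`, and `select_table`, the address layout, and the toy rows are all hypothetical and are not part of the researchers' tools.

```python
# Hypothetical illustration only: a naive 2-bits-per-base mapping with an
# assumed address prefix. This is NOT the OligoArchive encoding.
BASES = "ACGT"  # 00 -> A, 01 -> C, 10 -> G, 11 -> T


def bytes_to_dna(data: bytes) -> str:
    """Encode each byte as four nucleotides, most significant bits first."""
    out = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            out.append(BASES[(byte >> shift) & 0b11])
    return "".join(out)


def dna_to_bytes(strand: str) -> bytes:
    """Invert bytes_to_dna: every four nucleotides become one byte."""
    if len(strand) % 4:
        raise ValueError("strand length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(strand), 4):
        byte = 0
        for base in strand[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


def make_oligo(table_id: int, row_id: int, payload: bytes) -> str:
    """Assumed layout: 1-byte table id + 2-byte row id, then the row payload."""
    header = bytes([table_id]) + row_id.to_bytes(2, "big")
    return bytes_to_dna(header + payload)


def select_table(pool: list[str], table_id: int) -> list[bytes]:
    """Keep only strands whose address prefix matches, then drop the header."""
    prefix = bytes_to_dna(bytes([table_id]))  # one byte -> four bases
    return [dna_to_bytes(s)[3:] for s in pool if s.startswith(prefix)]


# Toy "pool" of oligos from two hypothetical tables.
pool = [
    make_oligo(1, 0, b"0|AFRICA"),
    make_oligo(1, 1, b"1|AMERICA"),
    make_oligo(2, 0, b"0|ALGERIA|0"),
]
print(select_table(pool, 1))
```

Because every byte maps to exactly four bases, matching the first four bases of a strand is the same as matching its table-id byte. Real DNA storage encodings typically also add error-correcting redundancy and avoid long runs of the same base, which this sketch deliberately leaves out.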

EURECOM, CNRS, ICL, UCA, and Helixworks, a DNA synthesis start-up, have received EU funding to continue DNA storage research. The planned system is meant to support a fully automated cycle: encoding data, synthesizing it into DNA, and reading it back through sequencing. It will store various data types and support near-data processing while storing data and retrieving it accurately.
