Challenges posed by long-term storage

When most people think about how long data needs to be stored, five years sounds like plenty. What I want to talk about today is the problem of long-term storage and the challenges it poses. There are two terms here: one is “big data”, which is a hot term, and the other is “long data”, which is a cold one.

At present, mainstream storage media do not last long. The average life of a hard disk is 5 years, that of a solid-state disk is also only about 5 years, and tape lasts a little longer, about 10 years. The average human lifespan, by contrast, is 75 years, and information related to a person should be kept for at least that long. Personal data such as bank deposits, insurance, housing records, and photos taken on mobile phones are best kept for a lifetime. The data of government, enterprises, institutions, and the military also needs to be preserved for a long time, and important archives need to be kept permanently.

Even for national-level institutions, long-term preservation is very difficult. Traditional film deforms after a few decades. I have visited the CCTV media archive, whose tape library holds more than 80,000 tapes, and long-term preservation is a serious problem for them too. The National Library is another example, and there the state has invested a lot of money. Preservation work at the provincial level is much worse. One major cultural province holds 500,000 ancient books, and half of them have now been damaged, which is a great loss to our cultural heritage.

Notes and impressions from a national seminar

I attended a national academic seminar on the “long-term preservation of national digital resources”. Institutions that genuinely depend on long-term preservation, such as the National Library, the archives of the Chinese Academy of Sciences, and the national science and technology literature center, came together for the discussion.

I came away from the meeting with several impressions.

First, the long-term preservation of digital resources is extremely important for our country. For example, the Chinese Academy of Sciences has a great many institutes, and all of their scientific research projects should be permanently archived.

Second, they have set up the National Working Group on the long-term preservation system for digital resources (NDPP), which has considered every aspect, from laws and regulations to the management system to the technical system, and has done very careful work.

Third, a national strategic reserve of documents is being planned and built, with huge investment, and a building complex has already been designed.

Fourth, the technical side still faces great challenges. With existing technologies, maintenance costs keep rising, and the institutions hope for better ones. In their speeches they mentioned a concept we proposed ten years ago: magnetic-optical-electric fusion storage. After ten years of effort, we have launched corresponding products.

Can our storage research institutions and enterprises meet the needs of national long-term storage or even permanent storage?

A great deal of industry information is very important and must not be lost; once lost, the damage is severe. Many countries have therefore introduced mandatory laws. The best known is the Sarbanes-Oxley Act, passed in the United States after the Enron scandal, which forces enterprises to retain data long term for litigation and other purposes and to be able to produce original data that cannot be tampered with. Various industries in the United States have their own long-term preservation laws. The European Union has also issued data retention rules stipulating how many years each industry's data should be preserved. Our country has successively introduced regulations as well. Last year it was stipulated that electronic medical records should be preserved for at least 30 years: life expectancy is 75 years, so 30 years is the minimum.

The burden of cold data storage on Internet companies will become more and more unbearable

In addition to very important information, we also have a lot of cold data to be stored for a long time.

Take WeChat Moments, which we all use. Some time ago I attended the Tencent developer conference, where this issue was discussed. It was said that one billion photos are uploaded to Moments every day. On the first day many people like them and the data is very hot. On the second day the data cools sharply, and by the third day no one looks at it. But the photos in Moments cannot be thrown away. All of Tencent's data, from the launch of WeChat to the present, is kept on hard disks (three copies) and will continue to be kept. At a billion photos a day, there must be millions of hard disks spinning constantly, which is a growing energy burden. There is technology to put disks to sleep, but controlling it has its problems. Besides the energy for running the disks, there is also cooling: so many hard disks packed together run very hot, and without air conditioning they are easily damaged. So the cost rises day by day.
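To get a feel for the scale described above, here is a rough back-of-the-envelope sketch. The one billion photos per day and the three copies come from the talk; the average photo size (1 MB) and the disk capacity (4 TB) are illustrative assumptions, not Tencent figures, and the function name is hypothetical.

```python
# Rough estimate of how many new hard disks a photo service fills per day.
# From the talk: 1 billion photos per day, 3 copies of everything.
# ASSUMED (not from the talk): 1 MB average photo, 4 TB disks.

MB = 10**6
TB = 10**12

def disks_filled_per_day(photos_per_day, avg_photo_bytes, copies, disk_bytes):
    """Return the number of whole disks needed to hold one day's uploads."""
    total_bytes = photos_per_day * avg_photo_bytes * copies
    # Ceiling division on integers: a partly filled disk still counts.
    return -(-total_bytes // disk_bytes)

per_day = disks_filled_per_day(
    photos_per_day=10**9,
    avg_photo_bytes=1 * MB,  # assumption
    copies=3,
    disk_bytes=4 * TB,       # assumption
)
per_year = per_day * 365
print(f"{per_day} disks per day, about {per_year} per year")
```

Under these assumptions the fleet grows by hundreds of disks every day, and none of them can ever be retired. Changing either assumed value scales the result proportionally.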

Concerns of the father of the Internet

There is also a more profound problem. At a major science and technology conference in 2015, Vint Cerf, a father of the Internet, voiced his worry that the picture files kept on computers and on the Internet today will be lost and that mankind will enter a digital dark age, in which future generations may know nothing of the historical records of today's people. His current work at Google is to study how to preserve information on the Internet for a long time. He has also carried out a very extensive survey of the demand for preserving information for a hundred years. The survey found that long-term, online preservation of information is a very common need, and it gave rise to an international research topic: how to preserve information for 100 years? Besides Google, CMU is also doing research in this area on the academic side.

Long-term storage of big data faces four challenges

In my opinion, there are four major challenges in the long-term storage of data.

The first is service life. There is an order-of-magnitude gap between the service life of storage media and the actual demand. So far we have only paid attention to lifetimes of a few years, but we want data to last decades, a hundred years, or even longer.

Second, cost. The amount of information keeps growing, and it grows exponentially. All of it has to be preserved, so the cost is huge.

In addition to equipment costs, there are also data migration costs.

Look at this chart, which comes from an international source. Data migration is the main means of long-term storage, and the chart compares the cost of various migration strategies over 75 years. If you use hard disks and replace them every five years, five petabytes of data means replacing about 1,000 hard disks each time. Sustained over 75 years, that is 15 rounds of replacement, so the total number of disks consumed is enormous and the equipment cost is very high. In addition, data migration takes a lot of manpower and material resources, and the energy cost is also very large. In short, cost is a considerable challenge.
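The migration arithmetic above can be sketched in a few lines. The 75-year horizon, the 5-year hard disk life, the 10-year tape life, and the 1,000 disks per 5 PB come from the talk; the function name is hypothetical, reusing the same unit count for tape is purely illustrative, and the sketch deliberately ignores data growth, extra copies, labor, and energy.

```python
import math

def migration_plan(retention_years, media_life_years, units_per_generation):
    """Count media generations, migrations, and total units for one archive.

    Assumes the data volume stays fixed and every generation needs the
    same number of units, which is a simplification.
    """
    generations = math.ceil(retention_years / media_life_years)
    migrations = generations - 1  # the first generation is a purchase, not a migration
    total_units = generations * units_per_generation
    return generations, migrations, total_units

# Hard disks: 5-year life, 1,000 disks for 5 PB, kept for 75 years.
hdd = migration_plan(75, 5, 1000)
# Tape: 10-year life; the unit count is reused only for comparison.
tape = migration_plan(75, 10, 1000)

print("hard disk:", hdd)
print("tape:     ", tape)
```

Even this simplified count shows why media life dominates the cost: doubling the life of the medium roughly halves both the purchases and the number of migrations.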

Third, technology turnover. Think of the floppy disks we used to use: technology and equipment keep being replaced, and storage products keep being upgraded, so old media can no longer be read.

Fourth, protocol consistency.

Let’s see what protocol consistency means.

Egyptian hieroglyphs were well preserved, every character clear, yet no one knew what they meant. The information had effectively been lost. How was the problem solved? In the 18th century a French expedition at Rosetta, a port city in Egypt, found a stone tablet (it later passed to the British Museum during the war between Britain and France). It carries three sections, in ancient Egyptian hieroglyphs, in ancient Greek, and in the common script of the time, all recording the same content. Working from this stone, scholars recovered the “protocol” and deciphered the meaning and structure of Egyptian hieroglyphs that had been lost for more than a thousand years, which is why we can understand them in museums today. As a gesture of gratitude, Egypt presented France with an obelisk, which stands in a square in Paris, the Place de la Concorde. There is also language software named Rosetta after this story. This is what long-term protocol consistency means: preserving the physical medium is not enough; the software protocol must stay consistent as well.

How do we deal with these challenges?

In my opinion, there are two countermeasures: one is to develop mass storage devices with longer service life and lower cost, and the other is to solve the problem of protocol consistency.

Lifetime analysis of mainstream non-volatile storage media

Among today's mainstream storage media, hard disks last five years and tape lasts ten. Solid-state disks, which store information as electric charge, are even less reliable: as density increases, each cell holds less and less charge, and the service life gets shorter and shorter. We use many techniques just to ensure that data can be kept for five years.

Optical discs last longer. I bought CDs when they first came out; that was almost 40 years ago, and they still play. Later came dye-based DVDs, which are short-lived and fail within about three years. Blu-ray discs can be stored for 50 years. Now there are the M-DISC (Millennial Disc) and fused silica glass discs, which can be stored almost permanently.

The comparison shows that optical storage is the storage medium with the greatest life potential.

Why does optical storage last so long? Let me explain. Our ancient artifacts have been preserved for a very long time. The Code of Hammurabi from Babylon, in the Middle East, dates back about 3,800 years, and there are the Egyptian Books of the Dead on papyrus. All of these are, in effect, stored optically. Traditional records such as stone carvings, bamboo slips, and writing on paper are essentially information recorded as differences in the reflection of light. As long as the light-reflecting medium lasts long enough, the information can be preserved for a long time.

Advantages of optical storage in big data storage

Optical storage has the advantages of long service life and energy saving. The media is separate from the drive, so discs can sit offline when they are not being accessed. It resists electromagnetic interference and is waterproof. When a hurricane hits, tapes and hard disks are both ruined, and only optical discs survive. In Japan, Blu-ray discs were left in the sea for two months and the data could still be read. Another advantage of optical storage is its low cost: a disc is just a plastic substrate coated with a layer of film, and it does not place high demands on the environment.

Optical storage also has disadvantages. One is small capacity. The first generation of Blu-ray discs held only 25 GB, later 50 GB, and now up to 300 GB, while hard disks and solid-state disks are at least an order of magnitude larger. The second is low speed. An optical drive runs at about 10 MB/s, slower than a hard disk and much slower than a solid-state disk, a gap of nearly two orders of magnitude.

Ten years ago, when the optical disc's roles in audio and video distribution and in software distribution were about to be replaced, the optical storage industry debated whether new products could be developed that would play to the strengths of optical storage and overcome its weaknesses. After nearly ten years of work, researchers at home and abroad delivered their answer: the super-large-capacity optical disc library.

There are three such optical disc libraries in the world: those from Amethyst, from Hitachi, and from Facebook and its alliance partners.

No storage medium is ideal in every respect

When it comes to applications, flash memory should be used for hot data and magnetic recording media for warm data, and the time has come for cold data and archived data to move to optical media. At present disks are still the most common choice and there are many tape libraries, but Facebook has already used optical storage to hold cold data.

In terms of independent innovation, we have worked with Amethyst to build the world’s largest optical disc library. In density, bandwidth, and response time, our key indicators are better than those of comparable international products. We have formed our own core technology and have begun to put it into practice.

Three revolutionary technologies for the future of optical storage

Blu-ray-style optical storage reaches its limit once disc capacity hits about 1 TB, with almost no possibility of a further breakthrough. If we keep going down the path of shorter wavelengths and more layers, we will not get very far.

What follows Blu-ray is the next generation of revolutionary optical storage technology.

The first is coaxial multi-dimensional holographic optical storage, which has just been included in the national key research and development plan. We are taking part in this project together with Fujian Normal University, the Institute of Optoelectronics of the Chinese Academy of Sciences, and Amethyst. The second is breaking the optical diffraction limit, which won the Nobel Prize in 2014. Australian scientists applied this technique to optical storage and, in theory, shrank the light spot from 300 nm to 9 nm, which is a huge increase in capacity: at least 15 TB per disc, and ideally the PB level. The first inventor was Dr. Gan Zongsong, who originally came from mainland China and has now returned to China to join the Wuhan Optoelectronics Research Center. The latest result in the laboratory is 380 nm. Roughly 100 points now fit where one did before, a hundredfold improvement.
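Why a smaller light spot means so much more capacity can be shown with a small sketch. It assumes, as a simplification, that each recorded point occupies a square the size of the spot, so the number of points per layer scales with the inverse square of the spot diameter. The 300 nm and 9 nm figures come from the talk; the function name is hypothetical, and real gains also depend on track pitch, coding, and the number of layers.

```python
def areal_density_gain(old_spot_nm, new_spot_nm):
    """Gain in points per unit area when the spot diameter shrinks.

    Simplifying assumption: each point occupies a spot-sized square,
    so density scales as 1 / diameter**2.
    """
    return (old_spot_nm / new_spot_nm) ** 2

theoretical = areal_density_gain(300, 9)
tenfold = areal_density_gain(300, 30)  # a tenfold smaller spot

print(f"300 nm -> 9 nm: about {theoretical:.0f} times more points per layer")
print(f"a tenfold smaller spot: {tenfold:.0f} times more points per layer")
```

Under this assumption, a tenfold reduction in spot size is what yields a hundredfold gain, with about 100 points fitting where one did before.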

The third is glass storage. There was recent news that Microsoft will engrave the Hollywood film “Superman” on glass this year for permanent preservation. The technology comes from the nanocrystalline glass 5D optical disc developed at the University of Southampton in the UK. Its storage life is said to exceed 30 billion years, and the information survives temperatures of 1,000 °C.

Microsoft takes this technology seriously. It has invested a lot of manpower and resources, set up a team of dozens of people, and made rapid progress. We recruited Dr. Zhang Jingyu, a leading researcher in 5D nanocrystalline storage, from the UK, and since his return he has started research and development on multi-dimensional permanent storage in Wuhan.

What is 5D? It is the three spatial dimensions plus light intensity and polarization. On top of intensity and polarization, our laboratory also uses orientation, and different orientations bring the total to seven dimensions, so a single point can carry a lot of information. In Britain he recorded the Bible; in China he has engraved the Core Socialist Values put forward by China’s top leaders. This is what we are working on now.

Prospects of optical storage and our goals

In recent years, breaking the optical diffraction limit and progress in multi-dimensional technology have given optical storage enormous room for capacity growth, beyond all current storage technologies. Together with the breakthrough in storage life, this gives optical storage a bright future.

The Wuhan Optoelectronics Research Center will combine the diffraction-limit breakthrough with multi-dimensional technology, plus the already successful optical disc library technology, to form a brand-new product with huge capacity and long service life. It is meant to meet the challenges of future big data storage and is expected to give rise to a new industry.

The National Research Center has built up a solid base of technology and talent in both existing and future optical storage technologies. It spent 8 years successfully developing a super-large-capacity optical disc library and is vigorously recruiting people who have mastered the world’s most advanced technologies.

The goal of the Wuhan Optoelectronics Research Center is to use this technology to reach 300 TB per disc, although it may not get quite that high. Even with 50 TB discs, though, we already have a 12,000-disc optical library. Combining the existing technology with the future technology, one standard cabinet could store 600 PB, and it would be permanent storage.
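The 600 PB figure follows directly from the numbers above, as this small sketch shows. The 12,000 discs, 50 TB per disc, and 300 TB target come from the talk; the function name is hypothetical, and decimal units (1 PB = 1,000 TB) are assumed.

```python
TB_PER_PB = 1000  # decimal units, an assumption about how the figures are quoted

def library_capacity_pb(disc_count, disc_capacity_tb):
    """Raw capacity of an optical disc library in petabytes."""
    return disc_count * disc_capacity_tb / TB_PER_PB

with_50_tb_discs = library_capacity_pb(12_000, 50)
with_300_tb_discs = library_capacity_pb(12_000, 300)

print(f"12,000 discs x 50 TB  = {with_50_tb_discs:.0f} PB")
print(f"12,000 discs x 300 TB = {with_300_tb_discs:.0f} PB")
```

This is raw capacity only; any redundancy or error-correction overhead inside the library would reduce the usable figure.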

As for protocol consistency, the problem is that once the protocol is lost, the data can no longer be interpreted or the program can no longer run. The Wuhan Optoelectronics Research Center now has a research topic on this, and it is also being studied abroad: data is stored in a defined format according to a specification, so that it can still be recovered after a long time. There is much more work I could describe, but time is limited.
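One way to picture “data stored in a defined format according to a specification” is a record that travels together with a plain-text description of how to read it. The sketch below is only an illustration of that idea, not the format being researched in Wuhan or anywhere else; the field names and function names are hypothetical.

```python
import hashlib
import json

def build_manifest(payload: bytes, media_type: str, format_spec: str) -> str:
    """Return a JSON manifest describing how to interpret `payload`.

    The field names are illustrative, not part of any standard.
    """
    manifest = {
        "media_type": media_type,    # what kind of data this is
        "format_spec": format_spec,  # which specification defines the encoding
        "length_bytes": len(payload),
        "sha256": hashlib.sha256(payload).hexdigest(),
    }
    return json.dumps(manifest, sort_keys=True, indent=2)

def verify(payload: bytes, manifest_json: str) -> bool:
    """Check that the payload still matches the manifest stored with it."""
    manifest = json.loads(manifest_json)
    return (
        len(payload) == manifest["length_bytes"]
        and hashlib.sha256(payload).hexdigest() == manifest["sha256"]
    )

record = "long-term preservation 长期保存".encode("utf-8")
manifest = build_manifest(record, "text/plain; charset=utf-8", "Unicode UTF-8 plain text")

print(manifest)
print("intact:", verify(record, manifest))
```

The manifest plays the role of the Greek text on the Rosetta Stone: a description in a widely understood form, kept next to the data, so that a future reader can work out how to decode it and confirm that it has not changed.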

Epilogue

The long-term preservation of big data is a very important technology, now and in the future, and as time goes on people will increasingly recognize its importance. Optical storage has unique advantages for the long-term preservation of digital resources. Suitable products already exist: the super-large-capacity optical disc library has been commercialized and put to practical use, and it can be adopted more and more widely in the market.

Three revolutionary technologies are making breakthroughs, so future optical storage should have an absolute advantage in long-term cold data storage. Physical longevity and protocol longevity, one a hardware problem and the other a software problem, both deserve attention.
