Today’s data storage method
Data breaches have become the norm
In 2017, Equifax was hacked. The credit information of 148 million people was stolen, including names, social security numbers, birthdays and addresses. Equifax agreed to pay up to $700 million to settle claims over the breach. In 2012, data from 167 million LinkedIn accounts was stolen, including users’ passwords. Yahoo had a data breach in 2016. Facebook had one in 2018. Almost every week we hear of another data breach. They have become the norm. According to Wikipedia, the average cost of a data breach was estimated to exceed $150 million by 2020, with the global annual cost predicted to reach $2.1 trillion.
Can’t we stop this from happening?
This is a data hosting challenge – how to access data and manage data more effectively, including how to transfer access from one entity to another.
Data hosting status
Let’s think about how customer data is managed in a typical start-up. It is usually stored in a database, such as MongoDB, in the cloud. Any software developer can access this data at any time during software development. Marketers use this data to track key performance indicators such as customer growth rate. If hackers break into the account of any developer or marketer, they can access all of the start-up’s customer data.
Now imagine that the start-up does very well. Say it starts with 40,000 customers, then grows exponentially for ten years until it has more than 1 billion users. This is Facebook. Imagine that all developers and marketers can still access this data. This is also Facebook: even though Facebook has more than 10,000 employees, almost all of them can access all customer data. If a hacker breaks into any employee’s account, the hacker can access the information of all Facebook users.
Things are not necessarily better at older enterprises running “enterprise-grade” data software. How do we know? These companies are also hacked frequently.
Here is a mental model for what is happening. The traditional security model for data hosting and management is the “M&M” model: a hard candy shell on the outside provides the critical security. If someone pierces any part of the shell, they can reach all the chocolate inside. The situation will only get worse: modern AI systems need more data, which expands the scope of data systems and enlarges the attack surface.
Future data hosting?
Token data hosting
Let’s see if we can improve data security. Consider a mental model opposite to “M&M”: a hard center provides the critical security, surrounded by more malleable infrastructure. That is the idea of a blockchain: the rock-solid center is the list of transactions, which is replicated across dozens, hundreds or even thousands of entities. Transactions can hold access control information.
Next, let’s consider tokenizing data access, so that data access can be transferred as a token. In fact, traditional data access already uses tokens (consider OAuth 2.0). But such a “token” is just a string, and “transferring” it basically means copying and pasting that string. That is hard to make secure.
Therefore, these existing data access tokens are not the “tokens” we think of in the blockchain field. But what if they were? Owning a blockchain token means holding the private key that controls it. Your private key is your token.
Specifically, consider an ERC-721 “non-fungible token” (NFT) that holds the information controlling data access. If you hold this data token, you can access the dataset. If you hold a data token, you have custody of the data.
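The idea of gating data access on token ownership can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `DataTokenRegistry` is a toy in-memory stand-in for an ERC-721 contract, and the addresses, token IDs and dataset names are made up.

```python
from dataclasses import dataclass, field


@dataclass
class DataTokenRegistry:
    """Toy in-memory stand-in for an ERC-721 contract (hypothetical, not a real contract)."""
    owners: dict = field(default_factory=dict)    # token_id -> owner address
    datasets: dict = field(default_factory=dict)  # token_id -> dataset name

    def mint(self, token_id: int, owner: str, dataset: str) -> None:
        """Create a new data token for a dataset and assign it to an owner."""
        if token_id in self.owners:
            raise ValueError("token already exists")
        self.owners[token_id] = owner
        self.datasets[token_id] = dataset

    def owner_of(self, token_id: int) -> str:
        """Return the current holder of the token."""
        return self.owners[token_id]


def can_access(registry: DataTokenRegistry, caller: str, token_id: int) -> bool:
    """A data service would serve the dataset only to the current token holder."""
    return registry.owners.get(token_id) == caller


registry = DataTokenRegistry()
registry.mint(1, "0xAlice", "dataset-X")  # made-up address and dataset name
print(can_access(registry, "0xAlice", 1))  # True: Alice holds the token
print(can_access(registry, "0xBob", 1))    # False: Bob does not
```

In a real deployment the ownership lookup would be a call to the token contract rather than a dictionary read; the access decision itself stays this simple.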
Data token variations
Here are some variations of data tokens (data access tokens):
·Access can be permanent (you can access the data as many times as you like) or one-time (after the access, the token is burned).
·Data access is always treated as a data service. This can be a service for accessing a static dataset (such as a single file) or a dynamic dataset (a stream).
·The data service may include a compute step before returning the resulting dataset. In this case, it still looks like a data access token.
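The difference between permanent and one-time access can be sketched as follows. This is a minimal illustration, assuming a hypothetical `DataToken` record with a `one_time` flag; in practice the burn would be a transaction on the token contract.

```python
from dataclasses import dataclass


@dataclass
class DataToken:
    """Hypothetical data access token; field names are illustrative only."""
    token_id: int
    dataset: str
    one_time: bool = False  # False means permanent access
    burned: bool = False


def access(token: DataToken) -> str:
    """Return the dataset name, burning the token first if it is single-use."""
    if token.burned:
        raise PermissionError("token has been burned")
    if token.one_time:
        token.burned = True
    return token.dataset


permanent = DataToken(1, "dataset-X")
single_use = DataToken(2, "dataset-Y", one_time=True)

access(permanent)
access(permanent)   # a permanent token can be used repeatedly
access(single_use)  # first use succeeds and burns the token

try:
    access(single_use)
    second_attempt_denied = False
except PermissionError:
    second_attempt_denied = True  # the burned token no longer grants access
```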
Beyond plain data access tokens, there are many further variations. These include:
·Read and write access. This article focuses on “read” access, but there are other permission schemes: Unix style (read, write, execute; for user, group, all), database style (CRUD: create, read, update, delete) or blockchain database style (CRAB: create, read, append, burn).
·Tokens used to access compute services (e.g. “bring compute to the data”).
·Tokens for the physical embodiment of the data itself (for example, one token per copy of the data) and for the physical embodiment of compute (for example, one token per CPU-minute). In many cases, the location and capabilities of the hardware resources matter a great deal.
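As one way to picture the permission schemes listed above, the CRAB style can be modeled with bit flags. This is a sketch only: the `Crab` enum and the `allows` helper are names invented for illustration, not part of any token standard.

```python
from enum import Flag, auto


class Crab(Flag):
    """Blockchain-database-style permissions: create, read, append, burn."""
    CREATE = auto()
    READ = auto()
    APPEND = auto()
    BURN = auto()


def allows(granted: Crab, requested: Crab) -> bool:
    """True only if every requested permission is included in the granted set."""
    return (granted & requested) == requested


reader = Crab.READ                              # a read-only data token
curator = Crab.CREATE | Crab.READ | Crab.APPEND  # a richer grant

print(allows(reader, Crab.READ))                 # True
print(allows(reader, Crab.BURN))                 # False
print(allows(curator, Crab.READ | Crab.APPEND))  # True
```

A Unix-style or CRUD-style scheme could be expressed the same way by swapping the enum members.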
Data tokens and licenses
Holding a token that gives physical access to data means you can access the data. We can formalize this as a right: a data token typically comes with a license to use the data. Specifically, the data is protected by copyright (a form of intellectual property, or IP) as its embodiment on a physical storage device. A license is a contract to use IP in a specific way. Alternatively, data kept behind a firewall can be treated as a trade secret.
Transferring data access
Custody of a token implies the right to transfer it (unless otherwise stated). With an NFT, you can transfer data access to Alice by sending the token to Alice. Specifically, if you hold an NFT-based data token in your crypto wallet, just press the “send” button, select the address to send to, and confirm. That’s all!
The NFT comes with a data license, which means that the recipient also has the legal right to access the data.
In short: data access transfer = token transfer.
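The claim that transferring data access is just a token transfer can be sketched with a toy ledger. Everything here is an assumption for illustration: `DataTokenLedger` is an in-memory stand-in for a token contract, and the addresses are made up.

```python
class DataTokenLedger:
    """Toy in-memory ledger of data token ownership (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self._owners = {}  # token_id -> owner address

    def mint(self, token_id: int, owner: str) -> None:
        """Create a token and assign it to its first holder."""
        self._owners[token_id] = owner

    def holder(self, token_id: int) -> str:
        """Return the current holder of the token."""
        return self._owners[token_id]

    def transfer(self, token_id: int, sender: str, recipient: str) -> None:
        """Move the token to the recipient; only the current holder may send it."""
        if self._owners.get(token_id) != sender:
            raise PermissionError("sender does not hold this token")
        self._owners[token_id] = recipient


ledger = DataTokenLedger()
ledger.mint(1, "0xAlice")
ledger.transfer(1, "0xAlice", "0xBob")  # data access moves with the token
print(ledger.holder(1))                 # 0xBob

try:
    ledger.transfer(1, "0xAlice", "0xCarol")  # Alice no longer holds the token
    alice_can_still_send = True
except PermissionError:
    alice_can_still_send = False
```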
Consider the following scenario.
1. Alice has a data token x, which grants permanent access to the static dataset X. She downloads the dataset.
2. Alice transfers the token to Bob. Bob then downloads the dataset.
3. But Alice still has her copy of the dataset.
This can be seen as a problem: data access has been “transferred”, yet Alice still retains the data even though she no longer holds the token. However, each of the following approaches addresses this problem in its own way.
1. Licensing. Alice may still hold a copy of the data, but she no longer holds the license to use it, certainly not for profit. By comparison, imagine you have a copy of Bohemian Rhapsody, that is, you have those bits. You can’t sell it at will because you don’t have permission. If you ignore this and sell it, or even upload it without permission, you are likely to receive a letter from a lawyer representing the artist’s rights.
2. One-time access (as opposed to permanent access). Alice can use the token to access the data only once. The license will reflect this.
3. Dynamic access (as opposed to static). The most valuable data is the latest data. After Alice transfers the token, she no longer has access to the most valuable data.
4. Bring compute to the data. The data never leaves the premises, so it can be treated as a trade secret. This is the most secure of the options listed here, but it requires more setup and overhead.
5. Transferable (yes or no). Just as airline tickets are usually non-transferable for security reasons, so are specific types of data.
Your key is your data
Andreas Antonopoulos has a popular saying: “Your keys, your bitcoin. Not your keys, not your bitcoin.” In other words, to truly own your bitcoin, you need to hold its keys. For tokens, holding the key means holding the tokens. The same applies to data:
That is, to truly own your data, you need to hold its keys. When you hold the data token, you hold the key to the data (the private key controlling the NFT, which grants access to the data) as well as the license to the data.
Infrastructure for data token hosting
Token wallet for data hosting
Once we have data tokens as ERC-20 or ERC-721 tokens, we can take advantage of the existing crypto token infrastructure. That infrastructure can be used directly for data hosting.
Mobile and PC wallets. There are dozens of software wallets for ERC-20 tokens, and at least a dozen for ERC-721 tokens. For example, Trust Wallet can store both ERC-20 and ERC-721 tokens. Versions are available for PC, iOS and Android. It supports more than 10 networks, including Ethereum and POA Network. New tokens can be added by sending a GitHub pull request to the Trust Wallet team.
Alternatively, users can get MetaMask to support any ERC-20 token by providing the URL of a custom (Ethereum-based) network and the token’s smart contract address on that network.
Imagine holding your data tokens in Trust Wallet, next to your Bitcoin, Ethereum and non-fungible items such as CryptoKitties and MetaCartel.
Hardware wallets. Trezor, Ledger and others offer hardware wallets. In these wallets, the private key lives on the device and never leaves it. The wallet uses the key to sign transactions inside the device, and only the signed transactions leave the wallet. This makes the private key more secure, and thus the tokens more secure.
The following figure shows an example of a token custody solution from Riddle & Code, aimed at enterprise token management. Those same enterprises can use the same wallets for data hosting. That is much safer than hoping that system administrators and dozens of other employees with access to the data will not leak valuable private data.
Multi party data token hosting
Multi-signature data wallets. Wallets like Gnosis Safe have a “multisig” feature, where M of N participants must sign a transaction for it to go through. This can be used to manage valuable data in a company. For example, some critical data may sit behind a 3-of-5 multisig among the company’s five executives. Other data may sit behind a 1-of-5 multisig, requiring the signature of any one of them.
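The M-of-N rule can be sketched as a simple approval counter. This is a minimal illustration, not how Gnosis Safe is implemented: `MultisigWallet`, the signer names and the transaction ID are all hypothetical, and real multisig wallets verify cryptographic signatures rather than trusting names.

```python
from dataclasses import dataclass, field


@dataclass
class MultisigWallet:
    """Toy M-of-N approval tracker (hypothetical; no real signatures involved)."""
    signers: frozenset
    threshold: int
    approvals: dict = field(default_factory=dict)  # tx_id -> set of approving signers

    def __post_init__(self) -> None:
        if not 1 <= self.threshold <= len(self.signers):
            raise ValueError("threshold must be between 1 and the number of signers")

    def approve(self, tx_id: str, signer: str) -> None:
        """Record an approval; repeated approvals by one signer count once."""
        if signer not in self.signers:
            raise PermissionError("not an authorized signer")
        self.approvals.setdefault(tx_id, set()).add(signer)

    def is_executable(self, tx_id: str) -> bool:
        """A transaction may proceed once at least `threshold` signers approved it."""
        return len(self.approvals.get(tx_id, set())) >= self.threshold


executives = frozenset({"ceo", "cfo", "cto", "coo", "cso"})
wallet = MultisigWallet(signers=executives, threshold=3)  # 3-of-5

wallet.approve("transfer-data-token-1", "ceo")
wallet.approve("transfer-data-token-1", "cfo")
print(wallet.is_executable("transfer-data-token-1"))  # False: only 2 of 3 required
wallet.approve("transfer-data-token-1", "cto")
print(wallet.is_executable("transfer-data-token-1"))  # True: threshold reached
```

Setting `threshold=1` gives the 1-of-5 arrangement, where any single executive can move the data token.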
Data DAOs. How can we extend “multi-party” data hosting to hundreds of parties, where the governance of the data may be more sophisticated than “M of N”? A DAO (decentralized autonomous organization) is a promising path. DAOs can coordinate many people around the world (like traditional online communities), and they can also manage resources (like traditional companies). Here, a DAO run by hundreds of people can “own” data tokens. Call it a “data DAO”. The DAO manages data tokens: which tokens to acquire, which to hold, and which to sell or license. Notable DAO creation tools include Aragon and DAOstack; there are also more lightweight DAOs, such as MolochDAO and its derivatives, such as MetaCartel.
Other methods of keeping data tokens
Web browsers. The Brave browser has a built-in crypto wallet. With data tokens, it becomes a tool for secure data hosting.
AI / data science tools. There are now integrated development environments (IDEs) for data science, such as Azure ML Studio. These could have built-in wallets for holding and transferring data tokens for training data, models as data, and so on. Even tools without graphical interfaces can be integrated with token wallets. For example, the TensorFlow Python library could be integrated with a Web3 wallet.
Third-party custody. Just as some people prefer to let traditional banks hold their money, or let a company like Coinbase hold their tokens, data tokens can be held by a third party that specializes in custody of tokens or data tokens. Call these “data banks”. The advantage is that recovery may be easier if a key is lost. The disadvantage, of course, is having to trust these third parties (not your keys, not your data).
Custom wallets, tuned for data. The examples above use existing wallet infrastructure directly. There is still plenty of room to build custom data wallets, whether from scratch or by adapting existing ones, to handle the specific needs of data tokens. For example:
1. Long tail of data tokens. Someone may hold tokens for thousands of datasets. How do they manage them all?
2. Visualization of data sets. The wallet may have a built-in data browsing function.
Token custody keeps improving: think bank-grade security for individuals and companies alike. With tokenized data, data hosting inherits every improvement as it arrives.
Data token usage
Data tokens are a natural fit for Ocean Protocol, because Ocean already has a mechanism for access control of data services using blockchain infrastructure.
Here is a high-level approach to implementing data tokens in Ocean Protocol. Each data service has its own decentralized identifier (DID), which resolves to that service’s metadata in its DID Descriptor Object (DDO). So the trick is to associate the NFT with the DID by putting the DID into the NFT’s metadata field.
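The association between an NFT and a DID can be pictured as a simple lookup. This sketch is not Ocean Protocol’s actual API: the DID string, the DDO fields, the `DDO_STORE` dictionary and the `resolve_ddo` helper are all made-up placeholders for illustration.

```python
# Hypothetical DDO store: maps a DID to its descriptor object (field names are made up).
DDO_STORE = {
    "did:op:0123abcd": {
        "name": "dataset-X",
        "service_endpoint": "https://example.com/data/x",
    },
}

# Hypothetical NFT metadata: the DID is placed in a metadata field of the token.
nft_metadata = {"token_id": 1, "did": "did:op:0123abcd"}


def resolve_ddo(metadata: dict, store: dict) -> dict:
    """Follow the DID stored in the NFT metadata to the data service's DDO."""
    did = metadata["did"]
    if did not in store:
        raise KeyError(f"no DDO registered for {did}")
    return store[did]


ddo = resolve_ddo(nft_metadata, DDO_STORE)
print(ddo["name"])  # dataset-X
```

The design point is that the token itself carries only a pointer; the description of the data service lives in the DDO and can be looked up by whoever holds the token.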
It is also worth noting that, even once the technology matures, blockchain-based data hosting and data tokens alone cannot solve the problem of data breaches. They can help, however, just as blockchain technology has helped with the custody of financial assets.
In this article, we described how data breaches have become commonplace because of the large attack surface. We then described how blockchain reduces the attack surface, and the specific role of data tokens. Finally, we described variations of token-based data hosting, such as hardware wallets, multi-signature wallets, and even data DAOs. Data tokens will be an exciting part of Ocean Protocol.
Responsible editor: CT