FISCO bcos transaction signature algorithm is designed based on the principle of ECDSA, which is also the transaction signature algorithm used by bitcoin and Ethereum.

This paper introduces the related knowledge of ECDSA and ECC, the recovery mechanism and implementation of ECDSA, and the underlying principle of FISCO bcos transaction signature and signature verification. The content is hard (Shu) core (Xue). Developers who are interested in cryptography principles and the underlying principles of blockchain are welcome to communicate with us.

The story begins

The story begins with a magic number in Ethereum.

In the Yellow Book of Ethereum, the explanation of transaction signature talks about two special numbers “27, 28”. In fact, it evolves from “0, 1” by adding a 27 to “27, 28”, so it is essentially a special number 27.

What does the special number 27 mean?

A detective tour begins

It’s like a bug

Search found that there have been many discussions about this issue before, among which a stack exchange post pointed out that this is a design bug. There is also a related issue on the source code GitHub of Ethereum, which is marked with type:bug The label of the.

There is a link in the stack exchange post giving the code to fix the bug. Please see the following screenshot (red box). As you can see from the comments and the code, the fromrpcsig function specially deals with the magic number 27. In the signature from RPC, if the value of V is less than 27 (maybe 0-3), then 27 will be added as the new value of V. in this way, fromrpcsig function is compatible with ECDSA original value of V (that is, recoveryid) and Ethereum value of V.

Is this really a bug in Ethereum design?

Going back to the source code file of fromrpcsig just now and looking at the interface implementation in detail, we find that there is such a line of code “V: chainid? Recovery + (chainid * 2 + 35): recovery + 27 “, the code of V assignment reveals three information, namely magic number 27, magic number 35 and chainid.

So, the question is more, what is magic number 35? What is chainid?

It’s not like a bug

With these questions, we can see that the design of chainid is described in Ethereum eip155. Based on the Ethereum source code, there are many chains running in the network. In order to prevent the transaction of one chain from being submitted to another chain and causing replay attack, the design of chainid is introduced and implemented in the position of 2675000 block height.

After understanding the role of chainid, another question arises: why do we need chainid when Ethereum has a networkid to distinguish different networks?

This should be explained from the scope of networkid and chainid. Networkid is mainly used for chain isolation at the network level. Nodes need to exchange networkid when establishing mutual connection. Only when they have consistent networkid can they complete handshake connection. Chainid is the transaction level, which can prevent the transactions of different networks from being crossed and repeatedly attacked.

Ethereum (ETH) and the classic Ethereum (etc) have a main network ID of 1, which requires chainid mechanism to prevent cross replay of transactions between Eth and etc. the chainid of eth main network is 1, and that of etc main network is 61.

At this point, I still don’t know why it is 27 and why it is 35? We can see the communication record between Jan and buterin in issue? 155 of EIP GitHub. It seems that 27 is the product of bitcoin.

Open the GitHub of electric. We can find the following code in electric / electric / ecc.py

It can be seen from the code that when electrum signs, it is originally a recovery ID between 0 and 3

Add 27 and a compression mark. If there is compression, add 4. The value range of recid is 27-34.

So far, we can see that 27 and 35 probably come from this. Ethereum inherits the design of bitcoin, and the implementation mode is also determined in the ckey:: signcompact function of bitcoin source code bitcoin / SRC / key.cpp, but why bitcoin is designed in this way is still unknown.

ECDSA is the “bug”

At this point in the story, we have a general understanding of the past and present life of magic number 27 in Ethereum code, but this is only the beginning of the story, which leads us to further think about a question: what is recoveryid?

In order to explain this problem clearly, we need to start with ECDSA algorithm and understand its underlying principle from a mathematical point of view. ECDSA is a transaction signature algorithm adopted by FISCO bcos. We will find that ECDSA has a recovery mechanism, which is the real “bug” level function.

ECDSA (elliptic curve digital signature algorithm) is a digital signature algorithm based on elliptic curve. Digital signature algorithm is a method to identify digital information by using public and private key system, which is similar to ordinary signature written on paper. Common digital signature algorithms include DSA, RSA and ECDSA.

Elliptic curve cryptography (ECC) is a public key encryption algorithm based on elliptic curve mathematics, which is based on the elliptic curve discrete logarithm problem. The common protocols include ecdh, ECDSA and ecies.

Elliptic curve parameters can be configured in many ways, so there are many different curves, such as secp256k1, secp256r1, curve25519, etc. there are some differences in the safety of different curves, which are described in safecurves.

ECDSA algorithm mainly includes the following four key functions:

Generate key genkey

·Select an elliptic curve E_ P (a, b), select the base point G, the order of G is n

·Select the random number d ∈ n as the private key, and calculate the public key q = D ⋅ G

Signature algorithm sign

·Using message digest algorithm for message M, we get z = hash (m)

·Generate random number k ∈ n, calculate point (x, y) = k ⋅ G

·Take r = x mod n, if r = 0, then select random number k again

·Calculate s = k ^ − 1 (Z + RD) mod n, if s = 0, then select random number k again

·The above (R, s) is the ECDSA signature

Verify algorithm

The public key Q and message M are used to verify the signature (R, s).

·Verify R, s ∈ n

·Calculate z = hash (m)

·Calculate U_ 1 = ZS ^ − 1 mod N and u_ 2 = rs^−1 mod n

·Calculate (x, y) = U1 ⋅ G + U2 ⋅ Q mod n

·Judge r = = x, if equal, the signature verification is successful

Recovery algorithm

Given the message M and signature (R, s), the public key q is recovered and calculated.

·Verify R, s ∈ n

·Calculate r = (x, y), where x = R, R + N, R + 2n… And substitute it into elliptic curve equation to obtain R

·Calculate z = hash (m)

·Calculate U_ 1 = − Zr ^ − 1 mod N and u_ 2 = sr^−1 mod n

·Computes the public key q = (x ‘, y’) = U_ 1⋅G+u_ 2⋅R

In order to answer the question of recoveryid, we focus on “recovery algorithm recovery”.

In the process of calculating R, we can see that there are multiple possibilities of X values, which leads to the possibility of multiple R values. Therefore, there are also multiple possible results of the calculated Q. we need to determine which q is correct by comparing with the known public key. If the correct q is not found by traversing all the possibilities of X, it means that the message does not correspond to the signature, or it is an unknown public key.

In order to determine the correct Q, we need to traverse all possible values of X and run multiple rounds of recover algorithm. This time cost is relatively large. In order to improve the time efficiency of recovery, the idea of space for time is adopted. A V value is added in the signature to quickly determine X and avoid traversal search trial. This V value is recoveryid.

In the blockchain system, the client signs each transaction, and the node verifies the transaction signature.

If “verify algorithm” is used, the node must first know the public key corresponding to the transaction, so it needs to carry the public key in each transaction, which consumes a lot of bandwidth and storage.

If the recovery algorithm recovery is adopted and the recoveryid is carried in the generated signature, the public key corresponding to the transaction can be quickly recovered, the user address can be calculated according to the public key, and then the corresponding operation can be performed in the user address space.

There is a design philosophy of blockchain hidden here. The resources (assets, contracts) on the blockchain belong to a user. If a signature that matches the user’s address can be constructed, it is equivalent to mastering the user’s private key. Therefore, the node does not need to determine the user’s public key in advance, only restores the public key from the signature, and then calculates the user’s address, The corresponding operation of the user address space can be performed.

Based on this principle, FISCO bcos designs and implements transaction signature and signature verification.

Calculation of recoveryid

In the article on the performance optimization of Java SDK (remember the process of improving the performance of Java SDK from 8000 to 30000), a key optimization point, the calculation of recoveryid, is mentioned. Here we discuss it in detail.

ECDSA signature (R, s), where R is the X mod n corresponding to a point kg (x, y) on the elliptic curve, which means that only the value related to the x-axis coordinate is left in the signature information and the value related to the y-axis is discarded. In “recovery algorithm recovery”, try to retrieve the value corresponding to y axis, construct R, and then recover the public key.

Because r = x mod n, R, R + N, R + 2n… May be legal original x values. Different elliptic curves have different numbers of legal x values. There are two possible R, R + n for secp256k1 curve adopted by FISCO bcos.

Each X-axis coordinate corresponds to two possible y-coordinates, so FISCO bcos has four possible R, (R, y) (R, – y) (R + N, y ‘) (R + N, – y’). However, for an R value, the probability of two X-axis coordinates is very low, which is almost negligible. Ethereum ignores these two small probability events.

How small is the probability of this small probability event? This starts from the parameters of secp256k1 curve. When describing the points (x, y) of an elliptic curve, the values of X and y are the results of mod P. P is the parameter of the curve, which is a large prime number. The previously mentioned n is also the parameter of the curve, which is equal to the number of points on the curve (the number of points on the curve is n * h, h is also the parameter of the curve, and the curve H = 1), The values of N and P are very close, as shown in the figure below.

Because r = x mod n, X is the result of mod p, R is the result of mod n, the range of x value is [0, P-1], and the range of R value is [0, n-1]. If R + n is also a point on the curve, then the value of R must be less than p-n, and the probability is (p-n) / P, about 3.73 * 10 ^ – 39, which is very small.

Based on the signature result (R, s) and the y value of the random point (x, y) generated in the signature process, the recoveryid is calculated as follows:

1. id = y & 1; //「 In the signature algorithm “sign”, the Y coordinate of the k g point sets the ID value according to the parity, because y is the result of mod p, and its parity is completely corresponding to the positive and negative of the coordinate axis

2. id |= (x != r ? 2 : 0); // Small probability events, as explained above

3. if (s > n / 2) id = id ^ 1; // If s calculated by signature is greater than N / 2, N-S will be taken as s value. Therefore, the corresponding transformation is made here, and the two transformations occur at the same time

The article of Java SDK performance optimization is based on this calculation formula, which changes the traversal search recoveryid to calculation, which greatly improves the performance.

Afterword

Start with a magic number, consult the relevant information, understand the design principle, and then break into the world of ECDSA, confused and wandering in a pile of mathematical formulas, one problem after another. At the beginning, I was confused and didn’t understand. Relying on Virgo’s cleanliness spirit, I finally solved my doubts one by one.

Excellent cryptographic protocols, esoteric mathematical theory, to be a blockchain code farmer, there are still many things to learn. Only through painstaking efforts and strenuous efforts can we treat every doubtful point well and never let go of every detail.

There will always be a day when you can see the sun through the clouds and the moon through the clouds.

Responsible editor; zl