1. Overview

As the information society develops, data exchange, online transactions and similar activities are becoming ever more frequent, and network security has become a major concern. As information technology has developed and spread, the scope of information security has also broadened: from confidentiality alone to integrity, availability, controllability and non-repudiation, and further to the basic theories and implementation techniques of attack, defense, detection, control, management and evaluation. Security is currently measured by three criteria: authentication, data integrity, and confidentiality.

The HMAC_SHA1 algorithm is well suited to identity verification and data integrity checking, and it is widely deployed in current network security systems. Most deployments, however, are implemented in software, whose security is difficult to guarantee, so hardware implementation of security algorithms has become a research hotspot. Based on an analysis of the algorithm and of the characteristics of field-programmable chips, this paper optimizes the design of, and implements, a hardware system for the HMAC_SHA1_96 algorithm.

2. SHA1 function

SHA (Secure Hash Algorithm) was designed by the National Institute of Standards and Technology (NIST) and the National Security Agency (NSA) for use with DSS and was published as the Secure Hash Standard (SHS), a Federal Information Processing Standard (FIPS). SHA1 is the revised version of SHA. For an input message shorter than 2^64 bits, it outputs a 160-bit digest. The algorithm steps are as follows:

Step 1: Append padding bits. The message is padded so that its length is congruent to 448 modulo 512, that is, 64 bits short of a multiple of 512. Padding is always performed, even if the message already has the desired length, so the padding is between 1 and 512 bits long. The first padding bit is 1 and the rest are 0.

Step 2: Append the message length. The length of the original message in bits is appended to the padded message as a 64-bit value (most significant byte first), making the total length a multiple of 512 bits.
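
Steps 1 and 2 can be sketched in a few lines of Python. This is a minimal illustration rather than part of the original design; it assumes the message is a whole number of bytes, and the function name sha1_pad is hypothetical.

```python
def sha1_pad(message: bytes) -> bytes:
    """Return the message followed by SHA1 padding and the 64-bit length field.

    Assumption for this sketch: the input is byte-aligned.
    """
    bit_length = len(message) * 8
    # Step 1: a single 1 bit (0x80 for byte-aligned input), then 0 bits until
    # the length is congruent to 448 modulo 512 bits (56 modulo 64 bytes).
    zero_bytes = (55 - len(message)) % 64
    padded = message + b"\x80" + b"\x00" * zero_bytes
    # Step 2: append the original length in bits, most significant byte first.
    return padded + bit_length.to_bytes(8, "big")
```

A 55-byte message fits in one 512-bit block after padding, while a 56-byte message needs a second block, because the 1 bit and the 64-bit length no longer fit in the first.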

Step 3: Initialize variables. A 160-bit buffer (i.e. five 32-bit registers) stores the initial values, intermediate digests and final digest of the hash function. It must first be initialized with the following values:

A=0x67452301, B=0xefcdab98, C=0x98badcfe, D=0x10325476, E=0xc3d2e1f0

Step 4: Process the message in 512-bit blocks. Processing one block takes four rounds of 20 steps each. Each round has its own logical function of B, C and D and its own constant (the second and fourth rounds share the same function):

For t=0~19,

ft(B,C,D)=(B∧C)∨((¬B)∧D),

Kt=0x5a827999

For t=20~39,

ft(B,C,D)=B⊕C⊕D,

Kt=0x6ed9eba1

For t=40~59,

ft(B,C,D)=(B∧C)∨(B∧D)∨(C∧D),

Kt=0x8f1bbcdc

For t=60~79,

ft(B,C,D)=B⊕C⊕D,

Kt=0xca62c1d6

Note: ∧ means bitwise AND; ∨ means bitwise OR; ⊕ means bitwise exclusive OR; ¬ means bitwise NOT (complement).
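
As an illustration, the selection of ft and Kt by step index can be written as a small Python function. The name f_and_k and the 32-bit mask are assumptions made for this sketch, not names from the design.

```python
from typing import Tuple

MASK32 = 0xFFFFFFFF  # keeps results within 32 bits


def f_and_k(t: int, b: int, c: int, d: int) -> Tuple[int, int]:
    """Return (ft(B, C, D), Kt) for step t in 0..79 (hypothetical helper)."""
    if 0 <= t <= 19:
        # "Choose": each bit of B selects the matching bit of C or D.
        return ((b & c) | (~b & d)) & MASK32, 0x5A827999
    if 20 <= t <= 39:
        # Parity of the three inputs.
        return (b ^ c ^ d) & MASK32, 0x6ED9EBA1
    if 40 <= t <= 59:
        # "Majority": a bit is set when at least two inputs have it set.
        return ((b & c) | (b & d) | (c & d)) & MASK32, 0x8F1BBCDC
    if 60 <= t <= 79:
        return (b ^ c ^ d) & MASK32, 0xCA62C1D6
    raise ValueError("t must be in the range 0..79")
```

Because only the function and the constant change between rounds, a hardware datapath can keep a single adder chain and multiplex these two inputs by step index.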

Each step performs the following operation (Figure 1).

It can be summarized in the following form, where <<< denotes a cyclic left shift and + denotes addition modulo 2^32:

(A', B', C', D', E') ← ((A<<<5) + ft(B,C,D) + E + Wt + Kt, A, B<<<30, C, D)

Each block supplies only 16 32-bit message words, while the SHA1 operation requires 80 32-bit words, so a conversion process (the generate-W operation) expands the 512-bit block into 2560 bits of data. The conversion works as follows:

For t=0~15, Wt=Mt;

For t=16~79,

Wt=(W(t-3)⊕W(t-8)⊕W(t-14)⊕W(t-16))<<<1.
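
The generate-W operation can be sketched in Python as follows. The sketch uses the standard SHA1 recurrence, in which each new word is computed from four earlier W words, and assumes the 16 message words are packed most significant byte first; the helper names rotl32 and expand_block are hypothetical.

```python
def rotl32(x: int, n: int) -> int:
    """Cyclic left shift of a 32-bit value by n bits (0 < n < 32)."""
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF


def expand_block(block: bytes) -> list:
    """Expand one 512-bit block into the 80 32-bit words W0..W79."""
    if len(block) != 64:
        raise ValueError("a block must be exactly 64 bytes (512 bits)")
    # W0..W15 are the 16 message words M0..M15.
    w = [int.from_bytes(block[i:i + 4], "big") for i in range(0, 64, 4)]
    # W16..W79 are each derived from four earlier words, rotated left by 1.
    for t in range(16, 80):
        w.append(rotl32(w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16], 1))
    return w
```

The result holds 80 × 32 = 2560 bits, which is the amount of W storage the text refers to.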

Step 5: Output the result. Once all 512-bit blocks of the message have been processed, the 160-bit message digest is output.

3. HMAC_SHA1_96 algorithm

The HMAC_SHA1_96 algorithm [2, 3, 6] is a key-based integrity verification mechanism built on the one-way hash function SHA1. It takes the 96 most significant bits of the generated 160-bit digest as its final output. The algorithm consists mainly of the SHA1 function and the HMAC construction. Its purpose is to generate a digest that is appended to the message, so that the receiver can verify whether the message was modified in transit and thus confirm its integrity. Following the definition of HMAC, the schematic of the HMAC_SHA1_96 algorithm in this design is shown in Figure 2.

Some notes on the algorithm in Figure 2: ① K_ipad denotes the XOR of the padded key with ipad, and K_opad denotes the XOR of the padded key with opad; ② each SHA1 operation includes the generate-W operation; ③ the output of each SHA1 operation is the result after the addition processing; ④ the dotted part represents the omitted message blocks and their corresponding SHA1 operations; ⑤ if the message occupies only one 512-bit block, the first round needs only two SHA1 operations before moving on to the second round. Figure 2 can be written as the following expression:

SHA1( K XOR opad, SHA1(K XOR ipad, M) )

Where K is the padded key, that is, the key with zeros appended to bring it to 512 bits; ipad is the byte 0x36 repeated 64 times (16 32-bit words of 0x36363636); opad is the byte 0x5c repeated 64 times (16 32-bit words of 0x5c5c5c5c); M is the message; XOR denotes the exclusive-OR operation; the comma denotes concatenation; and SHA1 is the secure hash function.
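
The expression above can be checked in software with a short Python sketch built on the standard hashlib module. The function name hmac_sha1_96 is hypothetical, and hashing keys longer than 512 bits before use is the rule from the HMAC definition (RFC 2104), which this paper does not discuss.

```python
import hashlib

BLOCK_BYTES = 64  # 512-bit SHA1 block


def hmac_sha1_96(key: bytes, message: bytes) -> bytes:
    """Return the 96 most significant bits (12 bytes) of HMAC-SHA1."""
    if len(key) > BLOCK_BYTES:
        # Assumption taken from RFC 2104, not from the paper: long keys
        # are first replaced by their SHA1 digest.
        key = hashlib.sha1(key).digest()
    k = key.ljust(BLOCK_BYTES, b"\x00")        # pad the key with zeros to 512 bits
    k_ipad = bytes(x ^ 0x36 for x in k)        # K XOR ipad
    k_opad = bytes(x ^ 0x5C for x in k)        # K XOR opad
    inner = hashlib.sha1(k_ipad + message).digest()   # first round
    outer = hashlib.sha1(k_opad + inner).digest()     # second round
    return outer[:12]                           # keep the leftmost 96 bits
```

With the 20-byte key 0x0b...0b and the message "Hi There" (a published HMAC-SHA1 test case), the sketch returns b617318655057264e28bc0b6.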

4. Hardware Design

Based on the above analysis of the algorithm and the characteristics of its implementation, and taking the hardware structure of the FPGA chip into account, the hardware system is designed and optimized as follows.

4.1 Using the RAM structure

The HMAC_SHA1_96 system stores a large amount of data. Implementing this storage with registers would take about 7000 registers and consume a large share of the FPGA's resources: each LE (logic element) on the FPGA contains only one register, so the other hardware resources in each LE used this way would be wasted. Meanwhile, the FPGA's many ESB (Embedded System Block) resources would remain underused. ESBs can implement various types of memory, such as RAM, ROM, FIFO and CAM, so here they are used to implement RAM in place of registers. This saves LE resources, and data held in RAM is also much easier to control than data held in registers.

4.2 Reusing the same module

As described in the algorithm above, SHA1 consists of 80 steps with identical structure. A pipelined implementation using 80 identical modules would occupy a great deal of hardware, which does not meet the goal of an optimized design. Instead, a single module is designed and optimized first and then reused 80 times, with the result of each step stored in registers and fed to the next step. This greatly reduces the FPGA hardware resources used.

4.3 Module Division

The hardware implementation of the HMAC_SHA1_96 system must exchange data with peripheral circuits by handshake. Because the peripheral circuit (an 8255 or a CPU) runs at a different clock frequency from the designed chip, dedicated input and output interface circuits are needed for the two to work together; the core processing module is designed separately so that it is not affected by the operating conditions of the external circuit. The design is therefore divided into three parts: an input module, an algorithm implementation module and an output module.

4.3.1 Input module

The input module connects to the peripheral circuit (such as an 8255) for signal and data transfer. Under control of the handshake signals ACK and OBF, each transfer writes 8 bits of input data into a 64×8-bit RAM, so filling the RAM takes 64 transfers. The internal signal sha_end controls when the peripheral circuit may input data. The module outputs 32 bits at a time, which is equivalent to reading 4×8 bits of data per access.
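
The byte-to-word conversion performed by the input module can be modelled behaviourally in Python. This sketch ignores the ACK/OBF handshake timing and assumes the first byte written becomes the most significant byte of the word, which the text does not state; the class and method names are hypothetical.

```python
class InputRam:
    """Behavioural model of a 64x8-bit RAM that is read out as 32-bit words."""

    DEPTH = 64  # 64 bytes = one 512-bit block

    def __init__(self):
        self.cells = [0] * self.DEPTH
        self.write_addr = 0

    def write_byte(self, value: int) -> bool:
        """Store one byte per transfer; return True once the block is full."""
        if self.write_addr >= self.DEPTH:
            raise IndexError("block already full")
        self.cells[self.write_addr] = value & 0xFF
        self.write_addr += 1
        return self.write_addr == self.DEPTH

    def read_word(self, index: int) -> int:
        """Read four consecutive bytes as one 32-bit word (index 0..15)."""
        if not 0 <= index < self.DEPTH // 4:
            raise IndexError("word index must be in the range 0..15")
        b0, b1, b2, b3 = self.cells[4 * index:4 * index + 4]
        # Assumed byte order: first byte written is the most significant.
        return (b0 << 24) | (b1 << 16) | (b2 << 8) | b3
```

Writing the bytes 0x00 to 0x3f in order and then calling read_word(0) yields 0x00010203 under this assumed byte order.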

4.3.2 Algorithm Implementation Module

This module performs the HMAC_SHA1 operation and outputs the 160-bit digest. Its data processing flow is shown in Figure 3 (M_RAM in the figure stores the message). It can be divided into the following main parts:

① Key input processing. The key is first XORed with ipad and with opad, and the results are written into two 32×16-bit RAMs, referred to here as I_RAM and O_RAM. The data in I_RAM is processed first, in the first round of the SHA1 algorithm; the data in O_RAM is not processed until the second round.

② W generation. The SHA1 function performs 80 steps, each using a different 32-bit W value, but only 16 32-bit words are input. The algorithm therefore XORs four earlier W values to generate each new W value. The four values are read from the 80×32-bit W_RAM, and each new W value is written to the next unused location in that RAM.

③ SHA1 operation. This is the core of the design. It performs the 80 steps, reading one 32-bit W value from the 80×32-bit W_RAM at each step, and finally produces a 160-bit digest.

④ Digest processing. After each SHA1 operation, the resulting digest is added to the initial values used for that operation. The sum serves as the initial value of the next SHA1 operation, as the final output digest, or as the message input to the next round of SHA1 operations.

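⑤ Digest padding [6]. The 160-bit digest generated in the first round is padded as follows: bits [160]~[190] = 0, bit [191] = 1, bits [192]~[479] = 0, and bits [480]~[511] = 1010100000 (binary for 672, the bit length of the 512-bit K_opad block plus the 160-bit digest). The padded value is written to a 16×32-bit FILL_RAM.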
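
The padded block described in ⑤ can be reproduced with the following Python sketch. It assumes the bit positions are numbered so that bit [191] is the most significant bit of the sixth 32-bit word, which matches standard SHA1 padding; the function name fill_ram_words is hypothetical.

```python
def fill_ram_words(digest: bytes) -> list:
    """Return the 16 32-bit words of the padded first-round digest."""
    if len(digest) != 20:
        raise ValueError("the first-round digest must be 160 bits (20 bytes)")
    # The second round hashes the 512-bit K_opad block followed by the
    # 160-bit digest, so the length field holds 512 + 160 = 672.
    total_bits = 512 + 160
    block = (
        digest                            # words 0..4: the digest itself
        + b"\x80"                         # a single 1 bit, then zeros
        + b"\x00" * 35                    # zero fill up to the length field
        + total_bits.to_bytes(8, "big")   # 64-bit length, 672 = 0b1010100000
    )
    return [int.from_bytes(block[i:i + 4], "big") for i in range(0, 64, 4)]
```

Because the digest length is fixed, every word after the first five is a constant (0x80000000, nine zero words, then 0x000002a0), so only the digest words change from one message to the next.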

4.3.3 Output module

Like the input module, the output module exchanges signals and data with the peripheral circuit. Under control of the handshake signals STB and IBF, it outputs 8 bits of data to the peripheral circuit per transfer. The module consists mainly of an 8×12-bit RAM, into which the 96-bit result can be written at once.

4.4 The overall structure of the hardware system

The overall structure is shown in Figure 4. A latch is added at the data output so that the output data remains valid until it has been sampled by the peripheral circuit, which keeps the design coordinated with the peripheral circuit.

5. FPGA implementation

This design is implemented on Altera's APEX20KE160EQC240_1X chip. A schematic of its functional modules and PC interface is shown in Figure 5.

In the figure, the FPGA programmer uses the Quartus II 2.0 software; the HMAC_SHA1_96 application environment settings are used mainly to configure the software that controls how HMAC_SHA1_96 operates; and the PCI controller handles communication between the FPGA chip and the PCI bus. First, Quartus II 2.0 places and routes the code and generates a sof file or a pof file. The sof file can be used to configure the FPGA directly, but the FPGA must then be reconfigured every time it is used. The pof file can instead be stored in an EEPROM, which configures the FPGA at each power-on, so that the design can be applied directly in an information security hardware system.

Responsible editor: gt
