With the rapid development of the Internet, network traffic keeps increasing and new types of network attack emerge endlessly. Intrusion detection must be fast and must keep both the missed detection rate and the false alarm rate low, so traditional software-based IDS is under growing pressure. Improving the pattern matching algorithm only raises detection speed incrementally; it is not a fundamental solution to the problem.
Based on an analysis of the detection speed bottleneck and of pattern matching algorithms in intrusion detection systems, this paper proposes a hardware-based intrusion detection strategy. In this system, hardware replaces the software of a traditional intrusion detection system for the main component functions, such as data acquisition and filtering, multi-pattern matching and packet scheduling. A hardware implementation is significantly faster than a software implementation.
2 System design
The system prototype is based on SOPC technology and adopts a multiprocessor parallel processing structure, an expandable instruction set and coprocessor-accelerated algorithms. Its core idea is to use multi-level parallel processing and special-purpose functional units in place of complex, time-consuming algorithms, so as to achieve high packet processing performance.
1. Overall structure design of the system
The main components of the prototype system are the data acquisition module, the message dispatcher, the soft core processing units, the pattern matching coprocessor, the custom instructions, the main control channel interface and the response output. The architecture of the system is shown in Figure 1.
To remove the detection speed bottleneck of intrusion detection in a high-speed network environment, every processing component must handle packets at line speed at each stage of the detection process. In this system, a hardware module performs data acquisition, parallel processing speeds up packet analysis and detection, and a coprocessor accelerates pattern matching. An FPGA-based coprocessor can accelerate many different algorithms, with performance dozens of times higher than that of a software implementation.
2. Data acquisition unit
Traditionally, network data capture is implemented in software: a function copies packets out of the MAC buffer. This approach runs slowly and has a high packet loss rate. In this system, the data acquisition and filtering module is implemented directly in hardware. When data arrives at the input port, the filtering unit selectively copies the required data into the buffer, which gives high acquisition speed and a low packet loss rate in a Gigabit environment.
Following a fair polling strategy, the packet dispatcher reads IP packets from the input channels in turn and dispatches each packet to the input buffer FIFO of a soft core processor according to the idle status of those FIFOs.
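The dispatching behaviour described above can be illustrated with a small software model. This is only a sketch under stated assumptions, not the hardware design: the class name `PacketDispatcher`, the `fifo_depth` parameter and the rule of choosing the emptiest FIFO as the "idle" one are all hypothetical choices made for illustration.

```python
from collections import deque


class PacketDispatcher:
    """Round-robin reader over input channels that feeds per-PE input FIFOs."""

    def __init__(self, num_channels, num_pes, fifo_depth):
        self.channels = [deque() for _ in range(num_channels)]
        self.fifos = [deque() for _ in range(num_pes)]
        self.fifo_depth = fifo_depth  # hypothetical capacity of each FIFO
        self.next_channel = 0  # polling pointer that keeps service fair

    def _pick_fifo(self):
        """Return the index of the emptiest FIFO with free space, or None."""
        best = min(range(len(self.fifos)), key=lambda i: len(self.fifos[i]))
        return best if len(self.fifos[best]) < self.fifo_depth else None

    def step(self):
        """Dispatch at most one packet; return (channel, pe) or None."""
        target = self._pick_fifo()
        if target is None:
            return None  # every FIFO is full, so the dispatcher stalls
        n = len(self.channels)
        for offset in range(n):
            ch = (self.next_channel + offset) % n
            if self.channels[ch]:
                packet = self.channels[ch].popleft()
                self.fifos[target].append(packet)
                # Resume polling after the channel that was just served.
                self.next_channel = (ch + 1) % n
                return ch, target
        return None  # no input channel has a packet waiting
```

Moving the polling pointer past the channel that was just served is what keeps a busy channel from starving the others.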
3. Soft core processor unit
SOPC Builder is used to define the Altera Nios Ⅱ soft core processors, which are the parallel processing units in the network processor prototype chip. The definition requires setting various hardware parameters, including working frequency, cache options, power-on boot mode, interrupt register settings, memory size, flash base address, access modes and address mapping. Each processor soft core is described in turn, producing its logic symbol and software programming interface, and an .elf file is generated. The other hardware logic in the chip prototype is designed in a hardware description language, and the resulting hardware configuration data is combined with the .elf file of the processor soft cores to form the .hex file used to configure the prototype FPGA.
In the system chip prototype, multiple PEs are arranged in a parallel structure; each PE has its own data and instruction memory, and all PEs share the pattern matching coprocessor module. Communication between PEs is realized through the parallel communication interface PIO.
In this structure, Nios Ⅱ is the microprocessor soft core, RAM and cache store instructions and data, load_header and send_header handle header input and output, the lookup controller is the interface logic to the multi-pattern matching coprocessor, the update controller is the logic interface for rule table management, and PIO is the parallel communication interface.
4. Coprocessor design
Despite improvements in software pattern matching algorithms, pattern matching remains the bottleneck of high-speed traffic analysis. We remove this bottleneck by offloading all pattern matching tasks to reconfigurable FPGA coprocessors. The FPGA can compare packet content against every pattern in the Snort rules, and the whole rule set fits on a low-end FPGA device.
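As a rough software picture of what the coprocessor computes, the sketch below checks a packet payload against a set of content patterns. The function name `match_patterns` and the two example patterns are hypothetical and are not taken from the Snort rule set; the plain per-offset comparison stands in for the parallel comparators of the FPGA and is not the matching circuit used in the prototype.

```python
def match_patterns(payload, patterns):
    """Return the sorted names of all patterns that occur in the payload.

    payload  -- packet content as bytes
    patterns -- mapping of rule name to a non-empty bytes pattern
    """
    hits = set()
    for offset in range(len(payload)):
        # In hardware every pattern would have its own comparator, so all
        # patterns are tested against the same offset at the same time.
        # This inner loop serializes that parallel comparison.
        for name, pattern in patterns.items():
            if payload.startswith(pattern, offset):
                hits.add(name)
    return sorted(hits)
```

For example, with the hypothetical rules `{"cmd_exe": b"cmd.exe", "etc_passwd": b"/etc/passwd"}`, a payload that contains `cmd.exe` reports only the `cmd_exe` rule. In software the cost grows with both payload length and rule count, which is the bottleneck the hardware comparison removes.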
In this system, the Nios Ⅱ soft core processors share the match lookup coprocessor through a bus arbiter. Each processor soft core contains a lookup interface controller for the coprocessor, implemented as custom instruction logic inside the soft core. The coprocessor works in parallel with the processor soft cores, and a soft core determines whether the coprocessor has finished by polling. The coprocessor uses a CAM (content-addressable memory) to store the rule table.
3 Key technology and implementation of the system
1. Multiprocessor technology
The PE (processor element) is the core component of a network processor and undertakes the main computing tasks in the system. Each PE is in effect a microprocessor core. Network processors usually include multiple PEs, which can be connected in two main ways: in parallel or in series.
2. Multi level parallel processing technology
Network packet processing is inherently parallel. To maximize packet processing speed and meet the needs of network applications, network processors implement parallelism at several levels: processor-level parallelism, thread-level parallelism through hardware multithreading, and instruction-level parallelism through instruction pipelining. Processor-level parallelism can be further divided into parallel operation among PEs and parallel operation between a PE and a coprocessor.
3. Hardware thread and “zero-overhead” switching technology
PEs use hardware multithreading: each PE contains several hardware threads, where a hardware thread is a thread with its own program counter, registers and storage space. The principle of “zero-overhead” switching is that, when a hardware thread switches, the PE neither saves the context of the thread being stopped nor restores the context of the ready thread about to run, so the switch consumes no PE resources. The NP uses hardware threads to hide the delay of thread switching, which improves PE efficiency. In general, with a well-designed scheduling strategy, PE efficiency can be brought close to its maximum.
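The idea can be pictured with a toy model in which every hardware thread keeps a private context and a switch only changes which context is selected. The class names, the register count and the `execute` method are hypothetical stand-ins for illustration, not a description of the PE hardware.

```python
class HardwareThread:
    """One hardware context: a private program counter and register file."""

    def __init__(self, num_registers=8):
        self.pc = 0
        self.registers = [0] * num_registers


class ProcessingElement:
    """A PE holding several hardware contexts, one of which is active."""

    def __init__(self, num_threads=4):
        self.threads = [HardwareThread() for _ in range(num_threads)]
        self.active = 0

    def execute(self, steps):
        """Advance the active thread; a stand-in for running instructions."""
        ctx = self.threads[self.active]
        ctx.pc += steps
        ctx.registers[0] += steps

    def switch_to(self, thread_id):
        """Select another context. No state is saved or restored."""
        self.active = thread_id
```

Because each context is kept in place, a thread that is switched away from and later selected again continues from exactly where it stopped, without any copying of its state.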
4. Distributed data storage technology
Memory is also a key part of a network processor, and its access speed and bandwidth are important factors in network processor performance. Under the store-and-forward model, every packet must be buffered after entering the network processor. To process packets at line speed, the packet output rate must match the input rate, which means the memory bandwidth must be more than twice the port speed. This is a major challenge for current memory.
To address this, the NP adopts a distributed data storage structure with multiple levels of memory: fast on-chip memory and slow off-chip memory. In addition, the network processor uses data prefetching, block transfer and high-speed data path techniques to support high-speed computation and high-speed data transmission.
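The factor of two follows from counting memory accesses per packet. As a short worked form, with symbols introduced here only for illustration ($R$ for the port rate, $B_{\text{mem}}$ for the memory bandwidth), each packet is written to the buffer once and read from it once:

```latex
B_{\text{mem}} \;\ge\; R_{\text{write}} + R_{\text{read}} \;=\; R + R \;=\; 2R
```

For a 1 Gbps port this gives at least 2 Gbps of buffer bandwidth for packet data alone. Any further accesses during processing add to this total, which is consistent with the requirement of more than twice the port speed.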
5. System implementation
This system is a prototype of a network processor for intrusion detection. The intrusion detection process has four steps: data acquisition, packet preprocessing, packet detection and response. In the prototype system, the main work of intrusion detection, such as data acquisition and filtering, multi-pattern matching and packet distribution across the processing units, is implemented in hardware, and packets are analyzed and checked by multiple processing units in parallel.
The prototype system consists of a main control module, a hardware data acquisition module, eight soft core parallel processing units and a multi-pattern matching coprocessor. The packet dispatcher distributes IP packets to the soft core processors, and each soft core microprocessor performs intrusion detection on the packets it receives. Each soft core microprocessor has an input buffer FIFO and an output buffer FIFO, which buffer the packets to be processed. Based on the result of packet inspection, the response and output control module decides whether to discard a packet as illegal or forward it to the corresponding output data path as a normal packet.
4 System test and performance analysis
To test the system, the verification platform shown in Figure 2 was developed. The platform uses multiple FPGAs to implement the various functional interfaces, while the core functions of the network processor are implemented in a single FPGA chip. The main control module, which supports the network processor control and system management functions, is also implemented in a separate FPGA.
The system performance test is carried out on this verification platform. The platform is connected to a microcomputer through a JTAG interface, and the IDE development environment is used for software debugging. The message generator is a module in the FPGA that can generate long continuous message streams. The input of the network processor chip receives 33 bits at a time (32 data bits and 1 flag bit), which the message dispatcher allocates to the PEs for processing. Each PE parses the message header and looks up the rule table to generate 11 bits of forwarding control information for output control.
The key performance metric for pattern matching on an FPGA is throughput. We ran experiments with input packets of different sizes; the results are shown in Table 1.
It takes 246 clock cycles from the time a PE reads the first message header until it reads the second, so the message processing delay is 4920 ns. In the experiment, the system processes 330k packets per second. The throughput of the system depends on packet length. The total throughput can theoretically reach 14 Gbps, but this value is only attainable when the PEs and the coprocessor run at full load. Table 1 also shows that, as resource utilization in the FPGA increases, internal delay increases and throughput decreases slightly.
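The two figures quoted above fix the clock period, and the arithmetic can be checked with a few lines. The cycle count and delay are taken from the text; the per-PE header rate is derived here rather than reported by the experiment, and the helper `throughput_bps` with its 64-byte packet size is a hypothetical illustration of how throughput scales with packet length, not an entry from Table 1.

```python
CYCLES_PER_HEADER = 246  # clock cycles between two header reads (from the text)
DELAY_NS = 4920          # message processing delay in ns (from the text)

# Clock period and frequency implied by the two figures above.
clock_period_ns = DELAY_NS / CYCLES_PER_HEADER  # 20 ns per cycle
clock_mhz = 1000 / clock_period_ns              # 50 MHz

# Upper bound on headers one PE can handle per second at this delay.
headers_per_second_per_pe = 1e9 / DELAY_NS


def throughput_bps(packets_per_second, packet_bytes):
    """Bits per second carried at a given packet rate and packet size."""
    return packets_per_second * packet_bytes * 8


# Hypothetical packet size, for illustration only.
example_bps = throughput_bps(330_000, 64)
```

The implied clock is 50 MHz, and a single PE working alone at this delay could process at most about 203,000 headers per second. How that per-PE bound combines into the measured system rate depends on details the text does not give.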
By analyzing the detection speed bottleneck in intrusion detection systems, we designed a prototype of a hardware-based intrusion detection system. The prototype replaces the software strategy of traditional intrusion detection with a hardware strategy based on a network processor. Experiments show that the performance of the system is significantly better than that of traditional methods and that the speed problem in intrusion detection is well addressed. Because the system is based on FPGA, hardware and custom instructions can be added as needed to improve its performance.
The author’s innovation: a prototype of an intrusion detection system based on an SOPC network processor is designed. The main work of intrusion detection is realized in hardware, and performance is significantly improved compared with the traditional software-based strategy.
Editor in charge: GT