From a hardware perspective, reliability testing falls into two categories:
Reliability tests based on industry or national standards, such as electromagnetic compatibility (EMC) tests, climatic environment tests, mechanical environment tests and safety tests.
Test items that an enterprise develops according to its product characteristics and its own understanding of quality, for example fault simulation tests, voltage margining tests and rapid power-cycling tests.
Both types of reliability test are introduced below.
1 Reliability test methods based on industry and national standards
A product must withstand many external stresses over its life cycle. Common stresses include service load, temperature, humidity, dust, air pressure and mechanical stress. Industry and national standards specify the stress levels that a given type of product will experience in a given application environment, and users of a standard select the corresponding test conditions, that is, the stress levels, according to the product's application environment and quality requirements. The selected stress levels are, in essence, the product's test specification.
In the product testing stage, the corresponding stress types and stress levels must be applied one by one, in a laboratory environment, to a sufficient number of test samples in order to examine the product's operating stability. For communication equipment, the common test items include at least EMC tests, safety tests, climatic environment tests and mechanical environment tests, and each of these four categories contains many sub-items; climatic environment testing, for example, includes high-temperature operation, low-temperature operation, damp heat and temperature cycling tests. These test items are numerous and are not described in detail here. In general, all of them are specification compliance tests (that is, pass/fail tests). Their purpose is to simulate the stress types and stress levels the product encounters over its life cycle and to examine its operating stability.
2 Reliability test methods designed by the enterprise
Network products differ greatly in function and may be deployed in many ways. The industry and national standards related to reliability testing generally specify only the test stress conditions for a class of products; they do not specify the working state or configuration of the equipment under test. As a result, some test combinations may be overlooked during test design. In a rack-mounted product, for example, the line card type, line card slot position, packet type and system power supply configuration can be combined freely, which produces many test combinations, some of which are necessarily extreme. To verify the system heat dissipation of the rack, the worst-case combination is a rack fully populated with the highest-power line cards under the given heat dissipation conditions. To verify the low-temperature performance of a line card, the more extreme combination is a rack with the best heat dissipation conditions, populated with as few boards as possible, each with the lowest power consumption, and with the board under test placed in the slot with the best heat dissipation.
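The configuration dimensions mentioned above multiply quickly, so it can help to enumerate them mechanically rather than by hand. The following is a minimal sketch only: the card names, slot numbers, packet sizes, power configurations and power figures are all hypothetical placeholders, not values from any real product.

```python
from itertools import product

# Hypothetical configuration dimensions; real values would come from
# the product specification.
line_card_types = ["LC-A", "LC-B"]
slots = [1, 2, 3]
packet_types = ["64B", "1518B"]
power_configs = ["single", "redundant"]

# Hypothetical power consumption per card type, in watts.
card_power_w = {"LC-A": 45, "LC-B": 120}

# Every combination of card type, slot, packet type and power configuration.
combos = list(product(line_card_types, slots, packet_types, power_configs))

# Extreme cases: the highest-power card for heat dissipation tests,
# the lowest-power card for low-temperature tests.
hottest_card = max(card_power_w, key=card_power_w.get)
coolest_card = min(card_power_w, key=card_power_w.get)

print(len(combos), "combinations")
print("heat dissipation test card:", hottest_card)
print("low-temperature test card:", coolest_card)
```

Listing the full set first makes it easier to confirm that the extreme combinations have actually been selected and not overlooked.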
In short, test design must go beyond the limits of traditional test specifications and standards and start from the perspective of how the product is actually used, so that every hardware feature and hardware function is fully exposed to the various test stresses under the product's typical application combinations, full-configuration combinations and extreme test combinations. Only when testing at this level has been done can the reliability of the product be assured.
The following two examples illustrate how to design reliability tests according to product characteristics.
2.1 Example 1: parallel bus test of the packet processor's external buffer
To handle burst traffic and traffic management, the packet processor in network equipment usually uses various kinds of random access memory (RAM) to buffer packets. The packet processor and the RAM are connected by a high-speed parallel bus whose clock frequency may be as high as 800 MHz, with a large number of signals and a complex topology. As device density on the product increases, serious signal quality problems such as crosstalk and simultaneous switching noise (SSN) become likely. The service traffic must therefore be designed carefully so that the corresponding hardware circuits are fully exposed to adverse physical conditions and their operating stability can be checked.
Crosstalk is, simply put, a form of interference. Because of the routing inside and outside the ASIC, a transition on one signal line induces unwanted voltage noise on other signal lines. To raise circuit speed and reduce power consumption, signal amplitudes are often very low, so even a small disturbance may cause a logic 0 or 1 to be misread, which greatly affects system reliability. The test design must apply a special service load to the equipment under test so that a large number of specific signal transitions occur on the bus under test, exposing the bus to as much crosstalk as possible. At the same time, an oscilloscope is used to check whether the signal quality on the bus is acceptable, and the service is monitored for errors. Taking a 16-bit parallel bus as an example, to maximize crosstalk the test packet should be designed so that 15 of the 16 signal lines (the aggressors) switch in the same direction, from 0 to 1, at the same time, while the remaining line (the victim) switches from 1 to 0. The test should cycle through all 16 lines so that each takes its turn as the victim.
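The 16-bit crosstalk pattern described above can be sketched as a pair of consecutive bus words per victim line. This is an illustration only: the function name is hypothetical, and it assumes that consecutive words of the test message (packet) appear unmodified on the bus, with no scrambling or re-ordering by the DUT.

```python
def crosstalk_patterns(width=16):
    """Yield (victim, before, after) bus words for each victim line.

    Between 'before' and 'after', every aggressor line switches 0 -> 1
    while the single victim line switches 1 -> 0.
    """
    mask = (1 << width) - 1
    for victim in range(width):
        before = 1 << victim       # only the victim line is high
        after = mask & ~before     # all aggressors high, victim low
        yield victim, before, after


patterns = list(crosstalk_patterns(16))
for victim, before, after in patterns:
    print(f"victim {victim:2d}: {before:016b} -> {after:016b}")
```

Each pair toggles all 16 lines at once, with the victim moving against the other 15, and the loop walks the victim position across the whole bus.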
Simultaneous switching noise is another unwanted physical phenomenon on high-speed parallel RAM interfaces. When many output drivers of an IC switch at the same time, they draw a large, rapidly changing current. As this current flows through the inductance of the return path, it produces an AC voltage drop, and the resulting noise (SSN) may affect the level decision at the receiving end. This is a very harsh operating condition for a parallel bus and severely tests its high-speed signal switching capability and drive capability, as well as the dynamic response and filter design of the power supply. To verify that the product works reliably under such conditions, a special test load, namely a special test packet, must be applied to the device under test (DUT).
For example:
If the bus under test is 16 bits wide, to make all 16 signal lines toggle simultaneously, the packet content should be:
If the bus under test is 32 bits wide, to make all 32 signal lines toggle simultaneously, the packet content should be:
If the bus under test is 64 bits wide, to make all 64 signal lines toggle simultaneously, the packet content should be:
If packets pass over a bus of any of the above widths in a service channel inside the DUT, the service test must load the corresponding packets to check that the DUT works normally with each of them, and the signals on the corresponding bus must be measured to confirm that they are normal.
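The specific message contents are not reproduced in the text above. As a hedged illustration, one payload that makes every line of a bus toggle on every transfer is an alternating sequence of all-zeros and all-ones words. The sketch below assumes such a payload and assumes that payload bytes map directly onto bus words inside the DUT (no scrambling, headers or alignment shifts); the function name and word count are hypothetical.

```python
def ssn_payload(width_bits, n_words=8):
    """Build a payload of alternating all-0 and all-1 bus words."""
    if width_bits % 8 != 0:
        raise ValueError("bus width must be a whole number of bytes")
    word_bytes = width_bits // 8
    zeros = bytes(word_bytes)        # e.g. 00 00 for a 16-bit bus
    ones = b"\xff" * word_bytes      # e.g. ff ff for a 16-bit bus
    return b"".join(zeros if i % 2 == 0 else ones for i in range(n_words))


for width in (16, 32, 64):
    print(width, ssn_payload(width, n_words=4).hex())
```

In practice the payload would need to be adapted to how the DUT actually packs packet data onto the bus, which has to be confirmed from the hardware design.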
2.2 Example 2: thermal test
The thermal test uses a multi-channel point thermometer to measure the temperature distribution at key points or key devices in the product. The results are inputs for calculating component life (for example, that of electrolytic capacitors) and for predicting the product's reliability metrics. It is an important reliability activity in product development.
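To show how a measured temperature feeds a component life calculation, the sketch below applies the common rule of thumb that the life of an aluminum electrolytic capacitor roughly doubles for every 10 °C reduction below its rated temperature. This rule is an assumption brought in for illustration, not a method stated in this text; it ignores ripple current and applied voltage, and manufacturers' own life formulas should take precedence. The function name and the example figures are hypothetical.

```python
def ecap_life_hours(rated_life_h, rated_temp_c, measured_temp_c):
    """Estimate capacitor life with the 10-degree doubling rule of thumb."""
    return rated_life_h * 2 ** ((rated_temp_c - measured_temp_c) / 10)


# Hypothetical part rated 2000 h at 105 degC, measured at 65 degC in the thermal test.
estimate = ecap_life_hours(2000, 105, 65)
print(f"estimated life: {estimate:.0f} h")
```

Because the estimate is exponential in temperature, a few degrees of error in the thermal measurement changes the predicted life noticeably, which is one reason the test conditions matter.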
Generally speaking, the thermal test mainly verifies whether the product's thermal design meets the specified operating temperature range. It is a laboratory benchmark test, which means that strict requirements must be placed on the test environment to keep the results consistent. For example, there must be no heat source or forced air cooling within a certain distance of the equipment under test, and its surface must not be covered by any foreign object. In practice, however, the working environment of many products differs from this test environment:
Some products may be placed on a table or hung on a wall, and such equipment relies mostly on natural heat dissipation. The installation method directly affects heat convection around the equipment and therefore the temperature distribution inside it, so different installation positions must be considered when testing such equipment. Passing the thermal test on a table under laboratory conditions does not mean that the equipment will pass when hung on a wall.
Some network devices are widely used in Internet cafes, where it is common to stack several devices on top of one another. Thermal testing of such products must consider whether the product still meets its requirements when stacked.
In some chassis-type equipment with many slots, the air duct design may leave dead zones. If the object under test is a service board that can be inserted into any of several service slots, it must be placed in the slot with the worst heat dissipation during the thermal test, and the highest-power service boards supported by the specification must be inserted in the adjacent slots. The board under test and the adjacent boards are then run at full load, and the thermal test is carried out with this service configuration.
The hardware reliability test items may differ from one product form to another, but the basic idea is the same: thoroughly analyze the possible application environments of the test object and the working states, including limit working states, that it may experience in those environments; then create the various stress conditions and vary the working state of the equipment in the laboratory, so that every hardware feature and hardware function of the product is exposed, one by one, to the various limit stresses. Omitting any test combination will inevitably affect the reliability of the product.