The main purpose of this paper is to develop a new kind of PCI slave device that serves as a motion controller for a biaxial motion control system. The controller is a new realization scheme for PCI devices: it embeds a deeply customized PCI interface block in place of a traditional PCI interface chip, which improves the overall performance of the device. In addition, we improved the widely used DDA arc interpolation algorithm in both accuracy and stability and integrated it into the device, allowing the moving parts to travel along nonlinear curved paths. The controller has been successfully applied to a surface mount machine that was also developed by our lab. It satisfies the accuracy and velocity requirements of the surface mount machine, and it has proved reliable and stable in operation.

1. Introduction

A biaxial motion control system is a kind of electromechanical system widely used in areas such as industrial manufacturing and commodity production [1]. Plane coordinate plotters and surface mount machines are two representative kinds of biaxial systems. In general, such systems share a requirement for high speed and accuracy.

For example, the plane coordinate plotter described in [2] is a typical biaxial system. It consists of several components, including a motion platform, turn-screws, stepping motors, a plotting cursor, a microcontroller, and a computer interface. During plotting, the cursor is moved to a given position under the control of the microcontroller according to instructions from the computer [2]. In order to draw the desired curve quickly and accurately, both high-speed data transfer and reliability must be guaranteed. A surface mount machine is also a sophisticated biaxial motion system [3], and its performance requirements are even higher.

In this paper, an implementation of a motion control board specially intended for biaxial motion systems is proposed. The control board is designed as a slave device complying with the PCI bus protocol, allowing fast data transactions between the upper control master and the slave device, with a theoretical peak rate of 132 MB/s (32-bit data path at a 33 MHz clock).
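The peak figure follows directly from the bus width and the clock rate. The short sketch below reproduces the arithmetic; the function name is our own, and it assumes one data transfer per clock cycle with no wait states, which is an upper bound rather than a sustained rate.

```python
def peak_bandwidth_bytes_per_s(bus_width_bits: int, clock_hz: int) -> int:
    """Theoretical peak rate, assuming one transfer per clock and no wait states."""
    return (bus_width_bits // 8) * clock_hz


# 32-bit PCI at 33 MHz: 4 bytes per clock
print(peak_bandwidth_bytes_per_s(32, 33_000_000) / 1e6)  # prints 132.0
```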

In terms of board-level design, the traditional scheme usually includes a microcontroller, an FPGA, and an extra PCI interface device, such as the PLX9054 [4, 5]. However, this kind of design is not compact enough. Instead of using an independent PCI interface chip, we embedded the PCI protocol decoding block, written in Verilog HDL, into an FPGA device on the board. This removes a large number of traces from the board, which improves the reliability of the board system and cuts the total cost. More importantly, the board can easily be updated to a new scheme version and can be deeply customized according to customer demand. Furthermore, the compact design is easier to protect from imitation and plagiarism.

Another important issue for a biaxial motion system is arc motion, that is, how to move the cursor along a curve rather than a line [6]. This requires a systematic algorithm to coordinate the velocities and positions of the motors on the two axes. To solve this problem, an algorithm called the digital differential analyser (DDA for short) has been proposed [7, 8]. DDA is often used as a motion interpolation algorithm. When moving the cursor of a biaxial motion system, an X-Y plotter, for example, DDA breaks the track into many micro segments. Each segment can be covered by one step of the cursor, which is realised by moving the cursor a micro distance along one axis. As a result, a curve can be covered by a sequence of micro steps. The basic principle of DDA arc interpolation is discussed in Section 4.

Although the basic problem of arc motion has been solved by means of DDA arc interpolation algorithm, some other problems such as moving stability and smoothness still affect the performance of the algorithm. For example, this kind of DDA algorithm, which we call traditional DDA, tends to cause unwanted sawteeth on the moving track and often produces huge errors when arc radius is large. In order to overcome its defects and improve its performance, we modified the algorithm, allowing the moving parts of a motion system to track along a given curve within tolerable errors.

The improved DDA algorithm is rewritten in Verilog HDL, and embedded in an FPGA device on board.

2. Device System Structure

The entire system structure of the motion control board is shown in Figure 1. This board communicates with the upper system through PCI system bus, which offers a rather large bandwidth of data transmission [9]. Meanwhile, the board works under the control of the microcontroller unit (MCU for short) and drives the motors with pulse signals produced by the DDA interpolation block or distributed by the upper system.

2.1. FPGA Device

As shown in Figure 1, most of the functional logic modules are implemented in the FPGA device and written in Verilog HDL. As a result, the off-chip routing is greatly reduced and the reliability of the entire board system is considerably improved.

2.1.1. PCI Protocol Decoding Block

PCI is a multiplexed bus, which makes decoding the PCI protocol relatively complicated. In general, the main task of the protocol decoding block is to separate address and data from the multiplexed AD pins [10], which is explained in detail in Section 3.

2.1.2. DDA Interpolation Block

The DDA interpolation block used in this board is written in Verilog HDL and implemented as a module in the FPGA device. Thus, the speed of operation is higher and fewer resources are occupied. This part is discussed in detail in Section 4.

2.1.3. Functioning Registers

By writing data to the functioning registers during I/O transactions, the system is able to control the board device in different modes, which makes the board system rather flexible to use.

2.1.4. DATA FIFO

Since transactions on the PCI bus are far faster than those on the back-end bus [11], a FIFO or RAM block is needed as a buffer to absorb the speed difference.
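To illustrate why the buffer is needed, the following sketch simulates a burst arriving from the bus at one word per cycle while the back end removes one word every few cycles, and reports the deepest occupancy reached. The function name and the rates are hypothetical and chosen only for illustration; they are not measurements of this board.

```python
def peak_fifo_occupancy(burst_words: int, drain_period: int) -> int:
    """Deepest FIFO fill level for one burst (illustrative model only).

    Bus side: writes one word per cycle until burst_words are written.
    Back end: removes one word every drain_period cycles.
    """
    if burst_words < 0 or drain_period < 1:
        raise ValueError("burst_words must be >= 0 and drain_period >= 1")
    occupancy = peak = written = cycle = 0
    while written < burst_words or occupancy > 0:
        if written < burst_words:  # fast side writes a word
            occupancy += 1
            written += 1
        peak = max(peak, occupancy)
        if cycle % drain_period == drain_period - 1 and occupancy > 0:
            occupancy -= 1  # slow side reads a word
        cycle += 1
    return peak


# A 16-word burst drained at a quarter of the write rate
print(peak_fifo_occupancy(16, 4))  # prints 13
```

In this model the required depth grows with both the burst length and the ratio between the two rates, which is the reason the buffer has to be sized for the longest expected burst.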

2.1.5. Instruction Register and Pulse Output Register

The instruction register stores the control instructions delivered by the system. According to these instructions, a series of frequency-controlled pulses is produced and sent to the X-axis and Y-axis motor drivers, driving the cursor to the designated position.

2.2. Level Conversion Chips

The signal level on the PCI bus is 5 V TTL, while it is 3.3 V CMOS on the pins of the FPGA device. Thus, bidirectional bus switches such as the 74CBT3384 are needed to serve as level converters between the two levels.

2.3. Microcontroller Unit

Microcontroller unit (MCU) serves as a center controller, making the system function according to the program written inside the chip.

2.4. Other Essential Components

Components on board also include CAN controller and other bus connectors.

3. PCI Bus Protocol Decoding Block

Although some PCI interface chips, such as PCI9054 [12], are available for this system, we use the PCI protocol decoding block embedded in FPGA device as the PCI bus interface.

3.1. Functions of Decoding Block
3.1.1. Device Configuration

The PCI protocol decoding block provides the necessary information to the PCI master system when the system boots, including the device ID, vendor ID, resource requirements, and function options [13]. After this, the base address of the memory resource assigned by the system is written back into the PCI device and stored by the decoding block.

3.1.2. Address Decoding

When an access occurs on the PCI bus, the PCI device should check whether it is being addressed by the system [14]. If the access address on the bus falls within the address space of the back-end device, the device should respond to the system immediately (usually within 3 cycles according to the PCI protocol [15]).
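The address check can be sketched as a masked comparison against the base address stored during configuration. The snippet below is a behavioural illustration in Python rather than the Verilog used on the board; the function name, the example base address, and the window size are hypothetical, and the window is assumed to be a power of two aligned to its own size, as is usual for PCI base address registers.

```python
def address_hit(address: int, base_address: int, window_size: int) -> bool:
    """Return True if address falls inside the window starting at base_address."""
    if window_size <= 0 or window_size & (window_size - 1):
        raise ValueError("window_size must be a positive power of two")
    if base_address % window_size:
        raise ValueError("base_address must be aligned to window_size")
    # Drop the offset bits and compare the remaining upper bits with the base
    return (address & ~(window_size - 1)) == base_address


# Hypothetical 4 KB window assigned by the system at 0xFEB00000
print(address_hit(0xFEB00010, 0xFEB00000, 0x1000))  # prints True
print(address_hit(0xFEB01000, 0xFEB00000, 0x1000))  # prints False
```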

3.1.3. Protocol Timing Decoding

Timing decoding of PCI protocol is the key function of the decoding block. PCI device behaves strictly according to PCI timing.

3.1.4. Bus I/O Control

Protocol decoding block will avoid the occurrence of bus collision by handling enable signals of PCI bus appropriately.

3.2. State Transition

The kernel of the PCI protocol decoding block is a state machine. The states involved in a data transaction are listed below, and the state transitions of the decoding block are shown in Figure 2.

3.2.1. Idle

The slave device is idle, waiting for an access initiated by the system.

3.2.2. Con_Hold

The system has initiated a configuration access to the PCI device, including configuration read and write operations, and is waiting for a response.

3.2.3. I/O_Hold

The system has initiated an I/O access to the PCI device, including I/O read and write operations, and is waiting for a response.

3.2.4. Mem_Hold

The system has initiated a memory access to PCI device, including memory read and write operations, and is waiting for response.

3.2.5. Read_Hold

This state is specially inserted between address cycle and the first data cycle in a read access on PCI bus to avoid bus collision.

3.2.6. Configuring

A configuration transmission is taking place.

3.2.7. Rwing

A nonconfiguration transmission is taking place.

3.2.8. Ending

A PCI access is ending. All control signals will be disabled and all S/T/S signals will be released in one cycle.
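The states listed above can be sketched as a small behavioural model. The state names follow this section and the command encodings are the standard PCI C/BE# codes, but the transition conditions are our assumptions based on the state descriptions, since Figure 2 is not reproduced here; the class and parameter names are hypothetical, and the real block is written in Verilog HDL.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    CON_HOLD = auto()
    IO_HOLD = auto()
    MEM_HOLD = auto()
    READ_HOLD = auto()
    CONFIGURING = auto()
    RWING = auto()
    ENDING = auto()


# Standard PCI bus command encodings (bit 0 is 0 for reads, 1 for writes)
CONFIG_COMMANDS = {0b1010, 0b1011}
IO_COMMANDS = {0b0010, 0b0011}
MEMORY_COMMANDS = {0b0110, 0b0111}
HOLD_STATES = (State.CON_HOLD, State.IO_HOLD, State.MEM_HOLD)


class DecoderFsm:
    """Assumed transition rules; one call to step() models one PCI clock."""

    def __init__(self) -> None:
        self.state = State.IDLE
        self.is_config = False
        self.is_read = False

    def step(self, frame_asserted: bool, command: int = 0,
             selected: bool = False, last_data_done: bool = False) -> State:
        if self.state is State.IDLE:
            if frame_asserted and selected:  # address phase aimed at this device
                self.is_read = (command & 1) == 0
                self.is_config = command in CONFIG_COMMANDS
                if command in CONFIG_COMMANDS:
                    self.state = State.CON_HOLD
                elif command in IO_COMMANDS:
                    self.state = State.IO_HOLD
                elif command in MEMORY_COMMANDS:
                    self.state = State.MEM_HOLD
        elif self.state in HOLD_STATES:
            # Reads get a turnaround cycle so AD is not driven by both sides
            self.state = State.READ_HOLD if self.is_read else self._transfer_state()
        elif self.state is State.READ_HOLD:
            self.state = self._transfer_state()
        elif self.state in (State.CONFIGURING, State.RWING):
            if last_data_done:
                self.state = State.ENDING
        elif self.state is State.ENDING:
            self.state = State.IDLE  # signals released, back to waiting
        return self.state

    def _transfer_state(self) -> State:
        return State.CONFIGURING if self.is_config else State.RWING
```

For a memory read, this model passes through Mem_Hold, Read_Hold, Rwing, and Ending before returning to Idle, whereas a write skips Read_Hold because no bus turnaround is needed.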

3.3. Read and Write Operations on PCI

An access on the PCI bus consists of three parts: one address cycle, at least one data cycle, and possibly several wait cycles. The address appears on the AD pins during the address cycle, while the data appear during the data cycles. Wait cycles are inserted to accommodate data latency [16].

There are three types of transactions on the PCI bus. A configuration transaction usually occurs as soon as the control board is inserted into the system motherboard, I/O transactions are used for parameter settings, and memory transactions take place during data transfer operations.

4. Embedded DDA Interpolation Algorithm

Digital differential analyser (DDA), usually serving as an interpolation algorithm, is widely applied in modern numerical control systems [17]. It is used when shifting the moving parts, or cursors of a system, to a designated position along given tracks, especially curve tracks. Generally speaking, the paths of the cursors controlled by numerical signals will not perfectly match the given continuous track. Therefore, the main issue lies in how to plan a path for a cursor to approach the given track as closely as possible. Based on integral theory, DDA arc interpolation algorithm breaks a continuous track into a series of discrete points that the cursor of a numerical system is able to reach [18].

4.1. Basic Principle of DDA Arc Interpolation

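The DDA interpolation algorithm for an arc in Cartesian coordinates is shown in Figure 3 [19]. For ease of analysis, we take the first quadrant as an example. According to Figure 3, we have [19]

$$\frac{V}{R} = \frac{V_x}{y_i} = \frac{V_y}{x_i} = k, \tag{1}$$

where $A(x_0, y_0)$ is the starting point and $B(x_e, y_e)$ is the ending point, while $P(x_i, y_i)$ stands for the current position of the cursor. $R$ is the radius of the arc, while $V$ is the tangential velocity. $V_x$ and $V_y$, respectively, stand for the magnitudes of the velocity along the $x$-axis and the $y$-axis. $k$ is a proportional constant if we assume that the tangential velocity of the moving part is constant. Therefore, we have [18]

$$V_x = k\,y_i, \qquad V_y = k\,x_i. \tag{2}$$

Considering that, for an anticlockwise arc in the first quadrant, $x$ decreases while $y$ increases, the displacements over a time interval $\Delta t$ are

$$\Delta x = -V_x\,\Delta t = -k\,y_i\,\Delta t, \qquad \Delta y = V_y\,\Delta t = k\,x_i\,\Delta t, \tag{3}$$

and summing over the steps we have [17]

$$x_e - x_0 = -k\sum_{i=0}^{n-1} y_i\,\Delta t, \qquad y_e - y_0 = k\sum_{i=0}^{n-1} x_i\,\Delta t, \tag{4}$$

where $n$ is the number of steps it takes for the cursor to reach the ending point $(x_e, y_e)$ starting from $(x_0, y_0)$.

According to formula (4), we get the DDA arc interpolation algorithm [18]. In order to describe the algorithm briefly, we define two arithmetic expressions as follows:

$$[E] = \begin{cases} 1, & E \text{ is true}, \\ 0, & E \text{ is false}, \end{cases} \tag{5}$$

where $E$ is a logic expression, and $[E]$ equals “1” when $E$ is true and “0” when $E$ is false. Also, we define

$$a \bmod b = a - b\left\lfloor \frac{a}{b} \right\rfloor, \tag{6}$$

where $a \bmod b$ is the remainder when $a$ is divided by $b$.

At the beginning, the $x$-axis integrand register $J_{Vx}$ is loaded with $y_0$, while the $y$-axis integrand register $J_{Vy}$ is loaded with $x_0$. At the same time, the accumulators $J_{Rx}$ and $J_{Ry}$ of the two axes are usually half-loaded [20]. In other words, the highest bit is set to “1,” while the other bits remain “0.” Therefore, for $N$-bit registers, the initial conditions can be written as

$$J_{Vx}(0) = y_0, \qquad J_{Vy}(0) = x_0, \qquad J_{Rx}(0) = J_{Ry}(0) = 2^{N-1}, \tag{7}$$

and then the integral clock starts to drive the accumulators to add the values stored in the corresponding integrand registers, producing overflow pulses $p_x(i)$ and $p_y(i)$, which drive the integrand registers to update their values with the new current coordinate $(x_i, y_i)$. Thus, we have the recursion formulae

$$\begin{aligned}
p_x(i) &= \left[J_{Rx}(i) + J_{Vx}(i) \ge 2^N\right], & p_y(i) &= \left[J_{Ry}(i) + J_{Vy}(i) \ge 2^N\right], \\
J_{Rx}(i+1) &= \left(J_{Rx}(i) + J_{Vx}(i)\right) \bmod 2^N, & J_{Ry}(i+1) &= \left(J_{Ry}(i) + J_{Vy}(i)\right) \bmod 2^N, \\
x_{i+1} &= x_i - p_x(i), & y_{i+1} &= y_i + p_y(i), \\
J_{Vx}(i+1) &= y_{i+1}, & J_{Vy}(i+1) &= x_{i+1}.
\end{aligned} \tag{8}$$

The algorithm keeps iterating until the error check registers indicate that the cursor has reached the ending point, or is within a tolerable error $\varepsilon$ of it, after which the iteration stops. Thus, the ending condition can be expressed as

$$\mathrm{End} = \left[\,|x_i - x_e| \le \varepsilon\,\right] \cdot \left[\,|y_i - y_e| \le \varepsilon\,\right]. \tag{9}$$

When End equals “1,” the recursion stops. The points $(x_i, y_i)$, which form the path of the cursor, are the desired interpolation result.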

It is worth mentioning that left-shifting normalizing is often used to maintain velocity stability [21], which will not be deeply discussed here.

Given starting point (8,0), ending point (0,8), and arc in the first quadrant, anticlockwise, the simulation result is shown in Figure 4.
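As an illustration of the accumulate-and-overflow principle described in this section, the following Python sketch steps through an anticlockwise first-quadrant arc centred at the origin. It is a behavioural model under our own assumptions, not the Verilog block on the board: the function and variable names are hypothetical, each axis stops after it has taken its required number of steps (a simple stand-in for the error check registers), and the register width is passed in as `n_bits`.

```python
def dda_arc_first_quadrant_ccw(x0, y0, xe, ye, n_bits, max_iterations=100_000):
    """Traditional DDA arc interpolation, first quadrant, anticlockwise.

    Returns the list of cursor positions, starting with (x0, y0).
    """
    modulus = 1 << n_bits              # accumulator capacity, 2**n_bits
    x, y = x0, y0
    acc_x = acc_y = modulus >> 1       # accumulators are half-loaded
    remaining_x = x0 - xe              # x decreases along the arc
    remaining_y = ye - y0              # y increases along the arc
    path = [(x, y)]
    for _ in range(max_iterations):
        if remaining_x == 0 and remaining_y == 0:
            return path
        integrand_x, integrand_y = y, x    # x-axis integrates y, y-axis integrates x
        step_x = step_y = 0
        if remaining_x > 0:
            acc_x += integrand_x
            if acc_x >= modulus:           # overflow pulse on the x axis
                acc_x -= modulus
                step_x = 1
        if remaining_y > 0:
            acc_y += integrand_y
            if acc_y >= modulus:           # overflow pulse on the y axis
                acc_y -= modulus
                step_y = 1
        if step_x or step_y:
            x -= step_x
            y += step_y
            remaining_x -= step_x
            remaining_y -= step_y
            path.append((x, y))
    raise RuntimeError("end point not reached within max_iterations")


path = dda_arc_first_quadrant_ccw(8, 0, 0, 8, n_bits=4)
print(path[0], path[-1], len(path) - 1)  # prints (8, 0) (0, 8) 16
```

With 4-bit registers, the arc from (8,0) to (0,8) is covered in 16 single-axis steps in this model; the exact staircase depends on the register width and the initial accumulator load, so it need not match Figure 4 point for point.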

We call the kind of DDA algorithm illustrated in Figure 4 the traditional DDA algorithm. The logic structure of the traditional DDA arc interpolation block embedded in the FPGA is shown in Figure 5.

4.2. Improved DDA Arc Interpolation

The traditional DDA arc interpolation algorithm illustrated in Figure 4 has some problems. The most serious one is that when the radius of the arc is far larger than the step length, the errors can become intolerable. For instance, when the step length is 1 and the radius is 100, the simulation result is shown in Figure 6.

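One way to solve this problem is to select an appropriate weighting factor to be multiplied by the integrand before it is added into the corresponding accumulator. Here, we define the weighting factor as $w$. Thus, the accumulator updates in the recursion formulae (8) can be changed into

$$J_{Rx}(i+1) = \left(J_{Rx}(i) + w\,J_{Vx}(i)\right) \bmod 2^N, \qquad J_{Ry}(i+1) = \left(J_{Ry}(i) + w\,J_{Vy}(i)\right) \bmod 2^N, \tag{10}$$

where $J_{Rx}$ and $J_{Ry}$ are the accumulators of the two axes, $J_{Vx}$ and $J_{Vy}$ are the corresponding integrand registers, and $2^N$ is the capacity of an $N$-bit accumulator; an overflow pulse is produced whenever the sum before the modulo operation reaches $2^N$.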

With the weighting factor used for Figure 7 and the other conditions the same as in Figure 6, the simulation result is shown in Figure 7, from which it can be concluded that a properly small weighting factor can improve the performance of DDA when the radius is large. We call this kind of DDA algorithm weighted DDA arc interpolation.
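The effect of the weighting factor on the recursion can be sketched as follows. This is a self-contained behavioural model under our own assumptions: the names are hypothetical, the weight is represented exactly with Python's `Fraction` (an FPGA implementation would more likely use fixed-point arithmetic), each axis stops after its required number of steps, and the weight is restricted to the range (0, 1]. It differs from a plain DDA loop only in the two lines that scale the integrand.

```python
from fractions import Fraction


def weighted_dda_arc(x0, y0, xe, ye, n_bits, weight=Fraction(1),
                     max_iterations=100_000):
    """Weighted DDA arc interpolation, first quadrant, anticlockwise."""
    if not 0 < weight <= 1:
        raise ValueError("weight is assumed to lie in (0, 1]")
    modulus = 1 << n_bits
    x, y = x0, y0
    acc_x = acc_y = Fraction(modulus, 2)   # half-loaded accumulators
    remaining_x, remaining_y = x0 - xe, ye - y0
    path = [(x, y)]
    for _ in range(max_iterations):
        if remaining_x == 0 and remaining_y == 0:
            return path
        integrand_x, integrand_y = y, x
        step_x = step_y = 0
        if remaining_x > 0:
            acc_x += weight * integrand_x  # weighted contribution
            if acc_x >= modulus:
                acc_x -= modulus
                step_x = 1
        if remaining_y > 0:
            acc_y += weight * integrand_y  # weighted contribution
            if acc_y >= modulus:
                acc_y -= modulus
                step_y = 1
        if step_x or step_y:
            x -= step_x
            y += step_y
            remaining_x -= step_x
            remaining_y -= step_y
            path.append((x, y))
    raise RuntimeError("end point not reached within max_iterations")


unweighted = weighted_dda_arc(8, 0, 0, 8, n_bits=4)
halved = weighted_dda_arc(8, 0, 0, 8, n_bits=4, weight=Fraction(1, 2))
print(unweighted[-1], halved[-1])  # prints (0, 8) (0, 8)
```

A weight of 1 reduces the model to the unweighted recursion. A smaller weight makes each accumulator overflow less often per clock, so more clock ticks are spent per output step; the value used for Figure 7 is not reproduced here.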

Another problem is that even though the errors are small enough, a great number of “sawteeth” can be seen on the path as shown in Figure 8, which may cause constant mechanical shocks on the system. The main reason leading to this problem is that the accumulators of the 2 axes function separately. Thus, the motion on each axis proceeds separately as well, unless both the accumulators overflow at the same time.

Generally, a sawtooth consists of a y-axis step closely followed by an x-axis step, as shown in Figure 9. A sawtooth occurs whenever the accumulator of one axis has overflowed while the other is going to overflow in the next step.
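Given a simulated path, sawteeth of this form can be counted mechanically. The helper below is a minimal sketch with a hypothetical name; it treats a sawtooth as a pure y-axis step immediately followed by a pure x-axis step, which is our reading of the description above, and it takes the path as a list of (x, y) points.

```python
def count_sawteeth(path):
    """Count pure y-axis steps that are immediately followed by a pure x-axis step."""
    steps = [(x1 - x0, y1 - y0) for (x0, y0), (x1, y1) in zip(path, path[1:])]
    count = 0
    for first, second in zip(steps, steps[1:]):
        first_is_y_only = first[0] == 0 and first[1] != 0
        second_is_x_only = second[0] != 0 and second[1] == 0
        if first_is_y_only and second_is_x_only:
            count += 1
    return count


staircase = [(2, 0), (2, 1), (1, 1), (1, 2), (0, 2)]
diagonal = [(2, 0), (1, 1), (0, 2)]
print(count_sawteeth(staircase), count_sawteeth(diagonal))  # prints 2 0
```

A path made of combined (diagonal) steps contains no sawteeth by this definition, which is the effect the modification described next aims for.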

Therefore, we can do the addition to the latter accumulator in advance in order to produce an advanced overflow. Thus, the recursion formulae can be rewritten as

As a result, the cursor takes one “combined” step instead of two separate steps, thus eliminating the sawtooth in advance. With the same conditions as in Figure 7, the simulation result of the improved DDA arc interpolation is shown in Figure 10. A detailed comparison of the simulation results of the typical DDA algorithm and the improved one is shown in Figure 11, from which it is evident that most of the sawteeth on the path are eliminated. We call this algorithm the improved DDA algorithm.

In order to implement this improved DDA algorithm on the FPGA device, the overflow conditions have to be changed, and the accumulator registers must be extended by 2 extra bits. One saves the carry bit caused by overflow, while the other is a sign bit, since the value of an accumulator can be negative. The logic structure of the improved algorithm is shown in Figure 12.

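In order to measure the performance of the two algorithms, we define the path variance as

$$\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}\left(\sqrt{x_i^2 + y_i^2} - R\right)^2,$$

where $(x_i, y_i)$ is the position of the cursor after the $i$th step, with the arc centred at the origin, $R$ is the radius of the curve path, and $n$ stands for the total number of interpolating steps from the start point to the end point.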
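A path variance of this kind can be computed from a simulated path as sketched below. The sketch assumes that the variance is the mean squared radial deviation of the points reached after each step from an arc centred at the origin; this is our reading of the definition, and the function name is hypothetical.

```python
import math


def path_variance(path, radius):
    """Mean squared radial deviation of the points reached after each step.

    path is a list of (x, y) points whose first entry is the start point;
    the arc is assumed to be centred at the origin.
    """
    reached = path[1:]  # one point per interpolation step
    if not reached:
        raise ValueError("path must contain at least one step")
    return sum((math.hypot(x, y) - radius) ** 2 for x, y in reached) / len(reached)


on_circle = [(5, 0), (4, 3), (3, 4), (0, 5)]
print(path_variance(on_circle, 5))  # prints 0.0
```

Points lying exactly on the circle give a variance of zero, and larger values indicate a path that strays further from the ideal arc.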

For the typical DDA algorithm, with conditions in Figure 6, we have

For the weighted DDA algorithm as shown in Figure 7, with the same conditions, we have

And for the sawteeth-eliminating DDA as shown in Figure 10, with the same conditions, we have

From the data, we can see that the improved DDA algorithm has the smallest path variance. Thus, it can be concluded that our sawteeth-eliminating DDA algorithm performs better than the traditional one.

5. Electromagnetic Compatibility and Signal Integrity

5.1. Power Design
5.1.1. Power System

The maximum power allowed for a 32-bit PCI device is 25 W. The power system of the PCI bus is more complex than that of other kinds of buses, with six different supplies (+3.3 V, +5 V, +V(I/O), +12 V, −12 V, and +3.3 Vaux). Among them, +5 V, +V(I/O), +12 V, and −12 V are provided by the system motherboard, while +3.3 V and +3.3 Vaux should be supplied on the device board, which means that a 5 V-to-3.3 V power conversion block is needed on board.

5.1.2. Power Layer

The device PCB has four layers: top layer, bottom layer, power plane, and ground plane. The power plane, in particular, should be divided into several power regions. Where possible, high-speed signal traces should not cross the boundary between two power regions. Where a crossing cannot be avoided, the direction of the split should be adjusted to minimize its impact.

5.1.3. Power Decoupling

Every Vcc pin of every digital chip is assigned a decoupling capacitor connected to ground. Every power pin is allocated a 0.047 µF electrolytic capacitor and a 0.01 µF nonpolar capacitor. In addition, the pads and vias of the decoupling capacitors should be no more than 0.25 inches from the corresponding Vcc pins or “golden fingers,” and the trace width should be at least 0.02 inches.

5.2. Signal Wire Routing
5.2.1. Clock Wire Routing

The clock signal of a PCI device is based on the reflected wave effect rather than the incident wave effect. Therefore, the trace length of the PCI clock signal is 2.5 inches ± 0.1 inches for 32-bit PCI devices.

5.2.2. Nonclock Wire Routing

The maximum length of signal routing of 32-bit PCI slave device will not exceed 1.5 inches.

5.2.3. Pull-Up Resistors

Every control signal pin should be assigned a pull-up resistor so that it does not float when not driven. A PCI slave device developer need not take care of this, since the pull-ups are already provided on the system motherboard.

5.3. Signal Integrity Test on PCI Pins

A simple series of signal integrity tests on the waveforms at the PCI pins (the golden fingers) has been conducted on the PCI device board [22, 23]. The first test is to output a 20 MHz square wave from the board to one golden finger and observe it with an oscilloscope (20 MHz is the maximum frequency that the board can generate, which is still reasonably close to the 33 MHz PCI clock). The test result is shown in Figure 13, from which it can be seen that the waveform is well preserved, with a steep rising edge and acceptable overshoot.

The second test is to output a 20 MHz square wave to one finger while measuring the signal on an adjacent finger. This test shows how much interference a high-frequency signal on one pin can cause on other PCI pins, especially the adjacent ones. The result is shown in Figure 14. The waveform on the measured finger (the lower trace) resembles that on the driven finger (the upper trace), but its amplitude is far smaller and does not reach the threshold level. Therefore, it can be concluded that although electromagnetic interference appears on pins adjacent to the output, it does not harm the functions of the system.

The third test is to output two waveforms to two adjacent fingers and observe the signal integrity of both in order to measure the coupling interference. The result is shown in Figure 15. Both waveforms are well preserved, without significant distortion or interference. From the three tests, we conclude that the signal integrity of the board is good and that the harmful impact of electromagnetic interference is negligible.

6. Conclusion

In this paper, a motion control board is presented. The design scheme of the control board is reasonable and able to satisfy the requirements of biaxial motion systems. The PCI protocol decoding block is self-designed and functions well. More importantly, we improved the typical DDA arc interpolation algorithm, broadened its range of application, and reduced its main negative effect, the sawteeth. The board is currently in use in our lab, and the results are satisfactory.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.