Abstract – Multipliers are core components of FIR filters, microprocessors, and digital signal processors. The multiplier and accumulator (MAC) unit is the most important building block in a DSP system, and a high-throughput MAC unit is key to high-performance digital signal processing applications; however, multipliers are the most time-, area-, and power-consuming circuits. In this paper, a Modified Russian Peasant Multiplier (MRPM) using adder compressors is proposed. The Russian peasant method applies a divide-and-conquer technique to multiplication. From a digital design perspective, the Russian Peasant Multiplier (RPM) needs only shifters and adders for partial product generation (PPG). We first present an approach to reducing the delay of the RPM by using 8:2 adder compressors (8:2 AC) in the partial product reduction stage. The proposed design is then compared, in terms of propagation delay, with RPMs that use a ripple carry adder (RCA) and a carry select adder (CSA). The proposed design improves the speed of the system by 70.81% compared to the RPM using RCA and by 92.11% compared to the RPM using CSA. The design is coded in Verilog HDL using ModelSim 6.3C and synthesized using the Xilinx ISE 14.7 design tool.
Keywords: MRPM, RCA, CSA, 8:2 AC, PPG and Verilog HDL.
Multiplication is an important and costly arithmetic operation, so multipliers are major components of arithmetic, signal, and image processors. Many functions in signal and image processing are multiplication based, such as multiply and accumulate, convolution, and filtering. The execution time of these functions depends strongly on the speed of the multiplier unit. In many DSP algorithms, multiplication takes more time than the other basic operations, so the delay of the multiplication unit determines the critical delay path of the complete operation and hence the performance of the algorithm. Addition and multiplication are widely used operations in computer arithmetic; for addition, full-adder cells have been extensively analysed for approximate computing [1-3].
All DSP algorithms need some form of the multiplication and accumulation operation. A MAC unit consists of a multiplier, an adder, and an accumulator. The adders usually implemented in DSPs are RCA or CSA. The multiplier multiplies the input values and passes the result to the adder, which adds it to the previously accumulated result. In this paper, an MRPM using 8:2 AC is designed. The RPM is chosen because it can reduce the number of partial products during multiplication. The final addition stage uses an adder built from 8:2 AC. This architecture is intended to reduce area, delay, and power.
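The multiply-accumulate data flow described above can be illustrated with a minimal behavioural sketch in Python. The function name `mac` and the sample values are hypothetical and only model the arithmetic, not the hardware proposed in this paper.

```python
def mac(samples, coeffs):
    """Multiply-accumulate: acc = acc + x * c for each input pair."""
    acc = 0  # accumulator register, cleared at the start
    for x, c in zip(samples, coeffs):
        product = x * c      # multiplier stage
        acc = acc + product  # adder stage adds to the accumulated result
    return acc

# One output sample of a 3-tap FIR filter is a single MAC pass over the taps.
print(mac([1, 2, 3], [4, 5, 6]))  # 1*4 + 2*5 + 3*6 = 32
```

Each loop iteration corresponds to one pass through the multiplier and the adder, which is why the multiplier delay dominates the MAC cycle time.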
This paper is organized as follows. Section 2 reviews existing schemes for the RPM. The new design of an approximate 8-2 AC is presented in Section 3. The 8-bit RPM algorithm is introduced in Section 4 and high-speed adders in Section 5. The proposed high-speed MRPM is described in Section 6. Simulation results for multipliers with the approximate compressors are provided in Section 7, and Section 8 concludes the paper.
2 LITERATURE SURVEY
Chang, T.Y. and Hsiao, M.J. (1998) described a carry-select adder that requires only a single carry-ripple adder with zero carry-in, an add-one circuit, and a multiplexer. The add-one circuit, which has a lower transistor count and 1.5 more units of two-input NAND gate delay, replaces the original carry-ripple adder with carry-in Cin = 1. The transistor count can be reduced by 29.2% with a speed penalty of 5.9% for n = 64.
Gunasekaran, K. and Manikandan, M. (2014) designed a reconfigurable FIR filter using the Russian Peasant Multiplier (RPM). The addition in the MAC unit is performed by a Carry Select Adder (CSLA) with a Sklansky adder, which offers a 30.9% reduction in area compared with the traditional CSLA. To improve the architecture further, some changes are made to the CG block of the CSLA architecture.
Elguibaly, F. (2000) presented a fast parallel multiplier-accumulator using the modified Booth algorithm. A dependence graph (DG) is used to visualize and describe merged multiply-accumulate (MAC) hardware based on the modified Booth algorithm. The carry-save technique is used in the Booth encoder and the accumulator sections to ensure the fastest implementation. The DG applies to any MAC data and allows the design of multiplier structures that are regular and have minimal delay, sign-bit extensions, and data path width. Using the DG, a fast pipelined implementation is proposed, in which an accurate delay model for deep submicron CMOS technology is used. The delay model accounts for multi-level gate delays, taking into account input ramp and output loading.
Saikumar, M., et al. (2014) described the design and performance analysis of the multiply-accumulate (MAC) unit, which is designed for various high-performance applications. The MAC unit is a fundamental building block in computing devices, especially Digital Signal Processors (DSPs). It performs multiplication and accumulation and consists of a multiplier, an adder, and an accumulator. In the traditional MAC unit model, the multiplier is a modified Booth multiplier. In that paper, MAC unit models are designed by incorporating various multipliers, such as the Array Multiplier, the Ripple Carry Array Multiplier with Row Bypassing Technique, the Wallace Tree Multiplier, and the DADDA Multiplier, in the multiplier module, and their performance is analysed in terms of area, delay, and power.
Compressors are widely considered the most efficient building blocks of a high-speed multiplier. They accumulate partial products with the least possible power dissipation. Rather than summing all partial products with a CSA or ripple adder tree, a structure of compressors completes the same task in much less time while also reducing power consumption and area. When the partial products are added conventionally with full adders and half adders, the critical-path delay is not reduced as much as when counters or compressors are used. Compressors are preferred over counters because of their advantages in power, transistor count, and critical-path delay (the critical path consists mainly of XORs) [4]. The compressor design implemented in this paper uses both MUXs and XORs.
The internal structure of the 3-2 adder compressor is presented in Fig. 1-a. The maximum delay is given by two XOR gates. The final sum S of the 3-2 adder compressor is given in expression (1). The 3-2 adder compressor can also be used as a full-adder (i.e. mux-based full-adder) when the input C is used as a carry input.
S = Sum + 2 * Carry (1)
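Expression (1) can be checked with a small bit-level model. The sketch below assumes a common XOR/MUX realisation of the 3-2 cell (two XORs on the sum path and one multiplexer for the carry); the function name and the exact gate arrangement are illustrative and may differ from Fig. 1-a.

```python
from itertools import product

def compressor_3_2(a, b, c):
    """3-2 adder compressor on single bits, XOR/MUX form (assumed structure)."""
    x = a ^ b                 # first XOR
    s = x ^ c                 # second XOR gives Sum
    carry = c if x else a     # MUX: pass c when a != b, otherwise a
    return s, carry

# Exhaustive check of expression (1): A + B + C = Sum + 2 * Carry.
for a, b, c in product((0, 1), repeat=3):
    s, carry = compressor_3_2(a, b, c)
    assert a + b + c == s + 2 * carry
```

With C driven by a carry input, the same function behaves as a mux-based full adder.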
The internal structure of the 4-2 adder compressor is presented in Fig.1-b. It has a reduced critical path compared to conventional adders since the maximum delay is given by three XOR gates. The 4-2 compressor has five inputs (A, B, C, D, Cin), where Cin is the input carry, and three outputs (Sum, Carry and Cout). In this adder compressor, the carry output Cout is independent of the input carry (Cin), making it possible to implement this structure with higher performance. The final sum S result of the 4-2 adder compressor is given in (2).
S = Sum + 2 * (Cout + Carry) (2)
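The two properties stated for the 4-2 cell, expression (2) and the independence of Cout from Cin, can be verified with a short model. This is a sketch assuming the widely used XOR/MUX arrangement with three XORs on the sum path; the internal signal names x1, x2 and x3 are hypothetical and not taken from Fig. 1-b.

```python
from itertools import product

def compressor_4_2(a, b, c, d, cin):
    """4-2 adder compressor on single bits (assumed XOR/MUX structure)."""
    x1 = a ^ b
    x2 = c ^ d
    x3 = x1 ^ x2
    s = x3 ^ cin               # Sum after three XOR levels
    cout = c if x1 else a      # depends on a, b, c only, never on cin
    carry = cin if x3 else d   # MUX selected by a ^ b ^ c ^ d
    return s, carry, cout

for a, b, c, d in product((0, 1), repeat=4):
    s0, carry0, cout0 = compressor_4_2(a, b, c, d, 0)
    s1, carry1, cout1 = compressor_4_2(a, b, c, d, 1)
    assert cout0 == cout1                                # Cout ignores Cin
    assert a + b + c + d + 0 == s0 + 2 * (cout0 + carry0)  # expression (2)
    assert a + b + c + d + 1 == s1 + 2 * (cout1 + carry1)
```

Because Cout never waits for Cin, neighbouring cells in a row can be chained without creating a ripple path through the carries.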
The internal structure of the 5-2 adder compressor is presented in Fig. 1-c. The maximum delay is given by six XOR gates. The final sum S of the 5-2 adder compressor is given in (3).
S = Sum + 2 * (Cout1 + Cout2 + Carry) (3)
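Expression (3) can be illustrated with a behavioural model. The sketch assumes the 5-2 cell takes five data bits plus two carry inputs (named cin1 and cin2 here, which is an assumption) and models it as three cascaded full adders; this is one common construction and is not necessarily the gate-level structure of Fig. 1-c.

```python
from itertools import product

def full_adder(a, b, c):
    """3-2 compressor on single bits: returns (sum, carry)."""
    return a ^ b ^ c, (a & b) | (b & c) | (a & c)

def compressor_5_2(a, b, c, d, e, cin1, cin2):
    s1, cout1 = full_adder(a, b, c)        # first stage
    s2, cout2 = full_adder(s1, d, cin1)    # second stage absorbs cin1
    s, carry = full_adder(s2, e, cin2)     # third stage absorbs cin2
    return s, carry, cout1, cout2

# Exhaustive check of expression (3) over all 2**7 input patterns.
for bits in product((0, 1), repeat=7):
    s, carry, cout1, cout2 = compressor_5_2(*bits)
    assert sum(bits) == s + 2 * (cout1 + cout2 + carry)
```

In this arrangement Cout1 depends only on the first three data bits, so it is available before either carry input arrives.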
The internal structure of the 7-2 adder compressor is presented in Fig. 1-d [5]. The maximum delay is given by ten XOR gates. The final sum S of the 7-2 adder compressor is given in (4).
S = Sum + 2 * (Cout1 + Cout2 + Carry) (4)
In this paper, the 8-2 adder compressor is designed using 3-2, 4-2, 5-2 and 7-2 adder compressors. The internal structure of the 8-2 adder compressor is presented in Fig. 2(a,b,c,d) [6]. The final sum S of the 8-2 adder compressor is given in (5).
S = Sum + 2 * (Cout0 + Cout1 + Cout2 + Cout3 + Cout4 + Carry) (5)
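Expression (5) has six weight-2 outputs, so the cell can absorb up to 13 weight-1 bits. The sketch below assumes these are the eight data inputs plus five carry inputs (the names x and cin are hypothetical) and, for brevity, models the cell as six cascaded full adders instead of the 3-2, 4-2, 5-2 and 7-2 cells of Fig. 2. It is a behavioural check of (5), not the proposed circuit.

```python
from itertools import product

def full_adder(a, b, c):
    """3-2 compressor on single bits: returns (sum, carry)."""
    return a ^ b ^ c, (a & b) | (b & c) | (a & c)

def compressor_8_2(x, cin):
    """x: 8 data bits, cin: 5 carry-in bits (assumed interface)."""
    s, c = full_adder(x[0], x[1], x[2])
    couts = [c]
    # Each further stage absorbs one data bit and one carry-in.
    for data_bit, carry_in in zip(x[3:], cin):
        s, c = full_adder(s, data_bit, carry_in)
        couts.append(c)
    carry = couts.pop()        # carry of the last stage is the Carry output
    return s, carry, couts     # couts holds Cout0..Cout4

# Exhaustive check of expression (5) over all 2**13 input patterns.
for bits in product((0, 1), repeat=13):
    s, carry, couts = compressor_8_2(bits[:8], bits[8:])
    assert sum(bits) == s + 2 * (sum(couts) + carry)
```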
4 EXISTING RPM
The existing RPM is designed to improve the hardware utilization of the circuit. The main aim of VLSI system design is to reduce hardware complexity and power consumption and to increase the speed and throughput of the system. Hence, the aim of the proposed work is to reduce the delay and power consumption of multiplication. In general, multiplication has three important steps:
• Partial Product Generation (PPG)
• Wallace Tree Reduction (WTR)
• Partial Product Addition (PPA)
The existing RPM is illustrated in Fig. 4. It produces n rows of partial products using only multiplexers [7].
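The partial product generation of the RPM can be sketched behaviourally as follows. Each of the n rows is either the shifted multiplicand or zero, selected by one multiplier bit, which mirrors a 2:1 multiplexer per row. The function names and the default n = 8 are assumptions for illustration; the reduction step here is a plain sum, not the compressor tree.

```python
def rpm_partial_products(a, b, n=8):
    """Return n partial-product rows for a * b using shifts and selects only."""
    rows = []
    for i in range(n):
        select = (b >> i) & 1                   # halving: inspect bit i of b
        rows.append((a << i) if select else 0)  # doubling: a shifted by i, or 0
    return rows

def rpm_multiply(a, b, n=8):
    """Sum the rows; in hardware this is the reduction and addition stage."""
    return sum(rpm_partial_products(a, b, n))

print(rpm_partial_products(13, 11, n=4))  # [13, 26, 0, 104]
print(rpm_multiply(13, 11))               # 143
```

For 13 x 11, the multiplier 11 is 1011 in binary, so the rows for bits 0, 1 and 3 carry 13, 26 and 104, and their sum is 143.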
5 HIGH SPEED ADDERS
Every multiplication algorithm contains three steps, of which the summation of partial products is the one that produces the final result. The performance of the multiplier depends on how fast the partial products are added. Many researchers have worked on fast adders. The fundamental adder architecture is the ripple carry adder, from which a number of adders have been developed, such as the CLA, carry select adder, carry save adder, and carry skip adder. The ripple carry adder is well known for its regular structure and for having the maximum delay, because each stage waits for the carry from the previous stage. CLAs have minimum delay but the largest area. The carry skip adder performs better than the ripple carry adder, but it requires extra hardware to skip the generated carry [8]. The carry save adder speeds up multi-operand addition by reducing three operands to two; its major drawback is that it consumes a larger area [9]. The carry select adder uses two ripple carry adders and does not wait for the previous stage to complete. At higher bit widths, the carry select adder exhibits an excellent area and speed trade-off compared with other adder architectures [10]. Many modifications can be made to the carry save adder that sacrifice speed for area [11].
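The difference between the ripple carry and carry select schemes can be made concrete with a bit-level sketch. The block size of 4 bits, the width of 8 bits and the function names are assumptions chosen for illustration; the model shows the data flow only and carries no timing information.

```python
def ripple_carry_add(a, b, cin, width):
    """Add two width-bit values; every bit waits for the previous carry."""
    result, carry = 0, cin
    for i in range(width):
        x, y = (a >> i) & 1, (b >> i) & 1
        result |= (x ^ y ^ carry) << i
        carry = (x & y) | (carry & (x ^ y))
    return result, carry

def carry_select_add(a, b, width=8, block=4):
    """Each block precomputes both outcomes; the real carry only selects one.

    Assumes width is a multiple of block.
    """
    mask = (1 << block) - 1
    result, carry = 0, 0
    for lo in range(0, width, block):
        xa, xb = (a >> lo) & mask, (b >> lo) & mask
        sum0, c0 = ripple_carry_add(xa, xb, 0, block)  # speculative carry-in 0
        sum1, c1 = ripple_carry_add(xa, xb, 1, block)  # speculative carry-in 1
        chosen, carry = (sum1, c1) if carry else (sum0, c0)  # multiplexer
        result |= chosen << lo
    return result, carry

print(carry_select_add(200, 100))  # (44, 1), i.e. 44 + 256 = 300
```

In hardware the two speculative additions of every block run in parallel, so the carry only has to pass through one multiplexer per block instead of rippling through every bit.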
Fig. 6 presents an addition of eight 8-bit values as an example. It is noted in Fig. 6 that adder circuits are required to recombine the partial sums of previous values (i.e. recombination line), since a Carry signal from the compressor n must be added with the Sum signal of the compressor n + 1 to generate the final sum (S) of bit n + 1.
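The recombination step can be illustrated with a word-level carry-save model. The sketch assumes the eight operands are reduced with 3-2 carry-save stages until only a sum vector and a carry vector remain; the carry vector is shifted left by one position, so that the carry of column n lines up with the sum of column n + 1, and one final adder recombines the two. The function names are hypothetical and the model does not reproduce the exact cell arrangement of Fig. 6.

```python
def carry_save_3_2(x, y, z):
    """Bitwise 3-2 compression of three words into (sum vector, carry vector)."""
    s = x ^ y ^ z
    c = ((x & y) | (y & z) | (x & z)) << 1  # carry of column n feeds column n+1
    return s, c

def add_eight(values):
    """Reduce the operands to two vectors, then recombine with one adder."""
    operands = list(values)
    while len(operands) > 2:
        x, y, z = operands[:3]
        operands = operands[3:] + list(carry_save_3_2(x, y, z))
    sum_vec, carry_vec = operands
    return sum_vec + carry_vec              # recombination adder

print(add_eight([255] * 8))  # 2040
```

Eight operands need six 3-2 stages to reach two vectors, which matches the six weight-2 outputs of expression (5).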