Design of Low Power and High-Speed Twin Parallel Multiplication

Kondaparthi Sravani¹, Konatham Sushmareddy²

¹²VLSI System Design, Vaagdevi College of Engineering, Warangal, India

Abstract: This paper presents a configurable multiplier optimized for low power and high speed that can be configured for a single 16-bit multiplication, a single 8-bit multiplication, or twin parallel 8-bit multiplications. The output product can be truncated to further decrease power consumption and increase speed at the cost of some output precision, and the proposed multiplier maintains acceptable output quality when truncation is performed. It thus provides flexible arithmetic capacity and a tradeoff between output precision and power consumption. The design also dynamically detects the effective range of the input operands and disables switching in the non-effective ranges, so ineffective circuitry is deactivated, which reduces power consumption and increases the speed of operation. As a result, the proposed multiplier outperforms the conventional multiplier in both power and speed.

Keywords: Booth multiplier (BM), configurable multiplication, low-power design, truncation, partially guarded computation.

1. INTRODUCTION

Portable multimedia and digital signal processing (DSP) systems, which typically require flexible processing ability, low power consumption, and a short design cycle, have become increasingly popular over the past few years. Many multimedia and DSP applications are highly multiplication intensive, so the performance and power consumption of these systems are dominated by multipliers. A multiplier takes two input operands and generates many partial products for subsequent addition operations, which in CMOS circuits causes a large amount of switching activity. Switching activity within the functional unit therefore accounts for the majority of the power consumption and also increases delay. Minimizing switching activity can thus effectively reduce power dissipation and increase the speed of operation without degrading the circuit’s functional behavior. An energy-efficient multiplier is therefore highly desirable for many multimedia applications.

Here an attempt is made to combine configuration, partially guarded computation, and the truncation technique to design a high-speed and power-efficient configurable Booth multiplier (CBM). The main concerns are speed, power efficiency and structural flexibility. The proposed multiplier not only performs single 16-b, single 8-b, or twin parallel 8-b multiplication operations but also offers a flexible tradeoff between output accuracy and power consumption to achieve further power savings.

Several techniques for improving speed and power efficiency have been analysed. Approaches such as guarded evaluation, clock gating, signal gating, and truncation reduce the power consumption and increase the speed of multipliers by eliminating spurious computations according to the dynamic range of the input operands. One earlier work separated the arithmetic units into most significant and least significant parts and turned off the most significant part when it did not affect the computation result, to save power. Other techniques dynamically adjust two voltage supplies based on the range of the incoming operands and disable ineffective ranges with zero-detection circuitry to decrease the power consumption of multipliers. A dynamic-range detector that detects the effective range of the two operands has also been developed: the operand with the smaller dynamic range is used for Booth encoding so that partial products have a greater chance of being zero, which reduces power consumption further.

Furthermore, in many multimedia and DSP systems the output product is frequently truncated because of the fixed register size and bus width inside the hardware. Given this characteristic, significant power savings can be achieved by directly omitting the adder cells that compute the least significant bits of the output product, but large truncation errors are introduced. Various error compensation approaches and circuits have been proposed that add estimated compensation carries to the carry inputs of the retained adder cells to reduce the truncation error. In the constant scheme, constant error compensation values are pre-computed and added. In contrast, data-dependent error compensation approaches were developed to achieve better accuracy than the constant scheme; their compensation values depend on the input data and are added to reduce the truncation error of array multipliers and Booth multipliers (BMs).

Here, we attempt to combine configuration, partially guarded computation, and the truncation technique to design a power-efficient configurable BM (CBM). Our main concerns are power efficiency and structural flexibility. Because most common multimedia and DSP applications are based on 8-b to 16-b operands, the proposed multiplier is designed to perform not only single 16-b but also single 8-b or twin parallel 8-b multiplication operations. The experimental results demonstrate that the proposed multiplier provides various configurable characteristics for multimedia and DSP systems and achieves more power savings with a slight area overhead.

The remainder of this paper is organized as follows. Section 2 deals with the methodologies and key components used in the design of the Configurable Booth Multiplier. Section 3 gives the result analysis of the simulated modules and demonstrates the efficiency of the designed multiplier in terms of speed and power. Finally, concluding remarks are given in Section 4.

2. CONFIGURABLE BOOTH MULTIPLIER DESIGN

In this section, partially guarded computation and the truncation technique are integrated into the configurable multiplication to construct a 16-b low-power CBM. Figure 1 shows the block diagram of the proposed 16-b CBM. The configuration signals CM[2:0] configure the operation of the proposed multiplier into six modes. When CM[2:1] = 11 or 10, a single 16-b or a single 8-b multiplication operation is performed, respectively. Two parallel 8-b multiplication operations, which satisfy the high-throughput requirement, are carried out if CM[2:1] = 00. Bit CM[0] decides whether truncation is applied: if it is 0, the output product is truncated, which yields more power savings and higher speed; otherwise the output product is not truncated. Whenever truncation is performed, error compensation values are added to maintain output precision. The key components are described in detail in the following subsections.
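
The mode selection described above can be sketched in software as follows. This is only an illustrative model under stated assumptions: the names `Mode` and `decode_cm` are hypothetical, and because the text does not describe CM[2:1] = 01, the sketch simply rejects that value.

```python
from typing import NamedTuple


class Mode(NamedTuple):
    operation: str  # "single16", "single8" or "twin8" (hypothetical labels)
    truncate: bool  # True when the output product is truncated


def decode_cm(cm: int) -> Mode:
    """Decode the 3-bit configuration word CM[2:0] as described in the text."""
    if not 0 <= cm <= 0b111:
        raise ValueError("CM must be a 3-bit value")
    select = (cm >> 1) & 0b11   # CM[2:1] selects the operation
    truncate = (cm & 1) == 0    # CM[0] = 0 requests truncation
    operations = {0b11: "single16", 0b10: "single8", 0b00: "twin8"}
    if select not in operations:
        # Assumption: CM[2:1] = 01 is not described in the text, so reject it.
        raise ValueError("CM[2:1] = 01 is not defined in the text")
    return Mode(operations[select], truncate)
```

For example, `decode_cm(0b110)` selects a truncated single 16-b multiplication, and `decode_cm(0b001)` selects untruncated twin parallel 8-b multiplications, which gives the six modes (three operations, each with or without truncation).
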

2.1. Dynamic Range Detector (DRD)

Given CM[2:0] and the input operands A[15:0] and B[15:0], the proposed dynamic-range detector (DRD) in Figure 1 generates the switching signals SWLH, SWHH, SWHL and SWLL, one for each 8-b Booth multiplication, to pick for Booth encoding the operand that drives more partial products to zero. In addition to the switching signals, the DRD produces the shutdown signals SDLH, SDHH, SDHL, and SDLL, which dynamically disable the redundant computation of the multiplier by forcing unnecessary partial-product bits and carry propagations to zero based on the multiplication mode and the effective range of the input operands.

2.1.1. Switching Logic

Figure 2 shows the switching logic for the four 8-bit Booth multiplications, whose input operands are A[15:8], B[15:8], A[7:0] and B[7:0]. If the output of a comparator is 1, the input 3-bit group consists of successive zeros or ones, so its Booth-encoded product will be zero. The comparator results of the two operands are then compared to generate the switching signal, which determines which operand serves as the multiplier. In our design, the input operands are exchanged if the switching signal is 1. Aside from increasing the probability that Booth-encoded products become zero, the switching logic helps detect the length of the sign-extension bits of the input operands so that unnecessary computation can be shut down.
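
A minimal behavioural sketch of this selection is given below. It assumes that B is the default multiplier and that the operands are exchanged only when A produces strictly more zero Booth groups; the text does not state the tie-breaking rule, and the function names are hypothetical.

```python
def zero_booth_groups(x: int, bits: int = 8) -> int:
    """Count the overlapping 3-bit groups of x that are 000 or 111."""
    x &= (1 << bits) - 1
    padded = x << 1  # append the implicit bit X(-1) = 0
    count = 0
    for i in range(0, bits, 2):
        group = (padded >> i) & 0b111  # bits X(i+1), X(i), X(i-1)
        if group in (0b000, 0b111):    # comparator output 1: Booth digit is 0
            count += 1
    return count


def switching_signal(a: int, b: int, bits: int = 8) -> int:
    """Return 1 when A yields more zero groups than B (assumed rule)."""
    return 1 if zero_booth_groups(a, bits) > zero_booth_groups(b, bits) else 0


def order_operands(a: int, b: int, bits: int = 8) -> tuple:
    """Return (multiplicand, multiplier) after applying the switching signal."""
    return (b, a) if switching_signal(a, b, bits) else (a, b)
```

With A = 0x01 and B = 0x55, A has three all-zero groups and B has none, so the switching signal is 1 and A becomes the Booth-encoded multiplier.
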

2.1.2. Shutdown Logic

Given the multiplication mode and the effective range of the input operands, the shutdown logic shown in Figure 3 produces the shutdown signals SDHH, SDHL, SDLH, and SDLL, which individually shut down the AHBH, AHBL, ALBH and ALBL multiplications. Setting a shutdown signal to zero disables the redundant computation of the corresponding multiplication by forcing its unnecessary partial-product bits and carry propagations to zero. For example, SDHL is used to shut down the AHBL multiplication.
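
The following sketch illustrates one way such shutdown signals could be derived. The text does not give the exact logic, so several points are assumptions: which blocks are active in each mode, the use of a sign-extension check as the effective-range test, and the helper names `is_sign_extended` and `shutdown_signals`. Signals are active-low here (0 disables a block), following the description above.

```python
def is_sign_extended(x: int) -> bool:
    """True when bits 15..7 of the 16-bit word x are all equal,
    i.e. the value fits in 8 signed bits (assumed effective-range test)."""
    top = (x >> 7) & 0x1FF  # bits 15..7, nine bits
    return top in (0x000, 0x1FF)


def shutdown_signals(operation: str, a: int, b: int) -> dict:
    """Return active-low shutdown signals for the four 8-b blocks."""
    if operation == "twin8":
        active = {"HH", "LL"}  # assumed: two independent 8-b products
    elif operation == "single8":
        active = {"LL"}        # assumed: only the low bytes are multiplied
    elif operation == "single16":
        active = {"HH", "HL", "LH", "LL"}
        if is_sign_extended(a):
            active -= {"HH", "HL"}  # AH holds only sign-extension bits
        if is_sign_extended(b):
            active -= {"HH", "LH"}  # BH holds only sign-extension bits
    else:
        raise ValueError("unknown operation: " + operation)
    # Key "HL" refers to the AHBL block, matching the naming of SDHL.
    return {"SD" + k: int(k in active) for k in ("HH", "HL", "LH", "LL")}
```

For instance, in 16-b mode with A = 0x0005 and B = 0x1234, the upper byte of A is pure sign extension, so this sketch drives SDHH and SDHL to 0 and leaves SDLH and SDLL at 1.
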

Fig 2. Switching logic of the dynamic-range detector.

2.2. Sign Bit Generator

If one of the input operands is zero, the entire operation of the configurable multiplier can be shut down to obtain more power savings by preventing the input registers from loading new data and directly resetting the output registers to zero, which also increases the speed of operation.

Fig 4. Sign bit generator.

Therefore, we develop a sign bit generator (SBG), shown in Figure 4, that generates the signals SB, LZ and HZ and shuts down the entire multiplier when one of the input operands is zero (a clock-gating technique). In partially guarded computation, the sign-extension bits of the product are replaced by SB to avoid unnecessary sign-extension computations.

2.3. Radix 4 Booth Encoding

The Radix 2 Booth algorithm does not work well when the multiplier has isolated ones. In that case the recoded multiplier has more ones than the original multiplier. We therefore group three bits at a time to find the recoded multiplier, which overcomes this disadvantage. To multiply A by X, the Radix 4 Booth algorithm groups the bits of X into overlapping groups of three and encodes each group into one of {-2, -1, 0, 1, 2}.

Table 1. Truth Table of Booth Encoding Scheme (Radix 4)

<table>
<thead>
<tr>
<th>X(i+1)</th>
<th>X(i)</th>
<th>X(i-1)</th>
<th>VALUE</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>0</td>
<td>0</td>
<td>1</td>
<td>1</td>
</tr>
<tr>
<td>0</td>
<td>1</td>
<td>0</td>
<td>1</td>
</tr>
<tr>
<td>0</td>
<td>1</td>
<td>1</td>
<td>2</td>
</tr>
<tr>
<td>1</td>
<td>0</td>
<td>0</td>
<td>-2</td>
</tr>
<tr>
<td>1</td>
<td>0</td>
<td>1</td>
<td>-1</td>
</tr>
<tr>
<td>1</td>
<td>1</td>
<td>0</td>
<td>-1</td>
</tr>
<tr>
<td>1</td>
<td>1</td>
<td>1</td>
<td>0</td>
</tr>
</tbody>
</table>

Table 1 shows the rules used by the Radix 4 Booth encoding (BE) scheme to generate the encoded signals. Multiplication is then carried out with these recoded multiplier digits by shifting and adding the multiplicand. For negative digits, the 2’s complement of the multiplicand is used.
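
The recoding in Table 1 and the subsequent shift-and-add step can be illustrated with a short software model. This is a behavioural sketch rather than the hardware datapath; the function names are hypothetical and an even operand width is assumed.

```python
# Table 1: (X(i+1), X(i), X(i-1)) -> Booth digit
BOOTH_TABLE = {0b000: 0, 0b001: 1, 0b010: 1, 0b011: 2,
               0b100: -2, 0b101: -1, 0b110: -1, 0b111: 0}


def booth_radix4_digits(x: int, bits: int = 8) -> list:
    """Recode a two's-complement multiplier into Radix 4 Booth digits,
    least significant digit first."""
    x &= (1 << bits) - 1
    padded = x << 1  # append the implicit bit X(-1) = 0
    return [BOOTH_TABLE[(padded >> i) & 0b111] for i in range(0, bits, 2)]


def booth_multiply(a: int, x: int, bits: int = 8) -> int:
    """Multiply signed a by signed x by shifting and adding partial products."""
    product = 0
    for k, digit in enumerate(booth_radix4_digits(x, bits)):
        # Each partial product is digit * a, weighted by 4**k.
        product += (digit * a) << (2 * k)
    return product
```

As a small worked case, X = 7 (00000111) is recoded into the digits [-1, 2, 0, 0], that is 7 = -1 + 2·4, so only two of the four partial products are non-zero.
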

2.4. Truncation and Error Compensation Circuit

For a fixed-width multiplication operation, the least significant bits of the output product can be disabled to further reduce power consumption and to reduce the number of adders, thereby increasing the speed of operation. To incorporate truncation into the proposed multiplier, the partial products of each 8-b Booth multiplication are divided into a higher part (HP), a middle part (MP), and a lower part (LP), as shown in Figure 5(a). When truncation is performed, the partial products in LP are forced to zero. The partial products in MP are used as inputs to the error compensation circuit (ECC), shown in Figure 5(b), which generates approximate carries that are added to the carry inputs of the adder cells in HP to reduce the truncation error.

Fig 5. (a) HP, MP and LP for an 8-bit BM. (b) ECC for n=8.
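
To make the HP/MP/LP idea concrete, the sketch below models a truncated 8-b Booth multiplication in software. It is not the ECC of Figure 5(b): the column boundaries (HP = columns 15..8, MP = column 7 only, LP = columns 6..0) and the carry estimate (half the number of ones in the MP column, rounded up) are assumptions chosen for illustration, and all function names are hypothetical.

```python
BOOTH = {0b000: 0, 0b001: 1, 0b010: 1, 0b011: 2,
         0b100: -2, 0b101: -1, 0b110: -1, 0b111: 0}


def partial_products(a: int, b: int) -> list:
    """16-bit two's-complement Booth partial-product words for signed
    8-bit operands a (multiplicand) and b (multiplier)."""
    padded = (b & 0xFF) << 1
    words = []
    for k in range(4):
        digit = BOOTH[(padded >> (2 * k)) & 0b111]
        words.append(((digit * a) << (2 * k)) & 0xFFFF)
    return words


def truncated_high_byte(a: int, b: int, compensate: bool = True) -> int:
    """Upper 8 product bits from the HP columns plus an estimated carry."""
    words = partial_products(a, b)
    hp_sum = sum(w >> 8 for w in words)             # HP: columns 15..8
    carry = 0
    if compensate:
        mp_ones = sum((w >> 7) & 1 for w in words)  # MP: column 7 (assumed)
        carry = (mp_ones + 1) // 2                  # assumed carry estimate
    return (hp_sum + carry) & 0xFF                  # LP columns 6..0 ignored


def exact_high_byte(a: int, b: int) -> int:
    """Upper 8 bits of the exact 16-bit two's-complement product."""
    return ((a * b) >> 8) & 0xFF
```

In this model, dropping the low columns without compensation can underestimate the upper byte by up to three units in the last place, while the assumed estimate keeps the result within one unit of the exact upper byte for all signed 8-b inputs.
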

2.5. 16-Bit Multiplication Matrix

The multiplication is divided into four sub-expressions, AHBH, ALBL, AHBL and ALBH, as shown in Figure 6, where AH denotes A[15:8] and AL denotes A[7:0], and similarly for operand B. Four independent partial-product arrays are produced using the Radix 4 Booth encoding approach. The partial products generated for the individual blocks are grouped as shown in Figure 6 to obtain the final product using adders and compressors.

Fig 6. Multiplication matrices for 16-b multiplication.
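
The decomposition into four 8-b blocks can be checked numerically with the sketch below. It assumes signed 16-b operands whose high bytes are treated as signed and whose low bytes are treated as unsigned; how the hardware handles the sign of the low bytes is not specified in the text, and the function names are hypothetical.

```python
def split16(x: int) -> tuple:
    """Split a signed 16-bit value into (signed high byte, unsigned low byte)."""
    x &= 0xFFFF
    high = x >> 8
    if high >= 0x80:
        high -= 0x100  # the high byte carries the sign
    return high, x & 0xFF


def multiply16_by_blocks(a: int, b: int) -> int:
    """Rebuild A*B from the four 8-b sub-products AHBH, AHBL, ALBH, ALBL."""
    ah, al = split16(a)
    bh, bl = split16(b)
    ahbh, ahbl, albh, albl = ah * bh, ah * bl, al * bh, al * bl
    # A*B = AHBH * 2**16 + (AHBL + ALBH) * 2**8 + ALBL
    return (ahbh << 16) + ((ahbl + albh) << 8) + albl
```

The weights 2^16, 2^8 and 1 correspond to the positions at which the four partial-product arrays are aligned before the final reduction.
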

2.6. Compressor and Adder

These partial products can be reduced effectively using the Dadda tree compression technique. In the compression algorithm, partial-product bits are taken in groups of three and each group is compressed to two bits (a sum and a carry) using a full adder, which acts as a 3:2 compressor. This process continues until all the partial products, together with their carries, have been compressed to two rows, which are then added. The Dadda tree compression technique thus reduces the number of stages and the delay within those stages. The Dadda multiplier is usually faster and smaller than the Wallace tree multiplier and is therefore preferred.
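
The role of the 3:2 compressor can be illustrated with a word-level carry-save sketch. Note the simplification: this model compresses whole rows greedily, three at a time, and does not reproduce the column-height schedule that distinguishes a Dadda tree from other reduction trees. The function names are hypothetical and non-negative input words are assumed.

```python
def compress_3_2(a: int, b: int, c: int) -> tuple:
    """Bitwise full adder (3:2 compressor): three rows in, (sum, carry) out."""
    partial_sum = a ^ b ^ c
    carry = ((a & b) | (b & c) | (a & c)) << 1  # carries move one column left
    return partial_sum, carry


def reduce_partial_products(words: list) -> int:
    """Compress rows three at a time until two remain, then add them."""
    rows = list(words)
    while len(rows) > 2:
        next_rows = []
        for i in range(0, len(rows) - 2, 3):
            next_rows.extend(compress_3_2(rows[i], rows[i + 1], rows[i + 2]))
        # Rows that do not fill a group of three pass through unchanged.
        next_rows.extend(rows[len(rows) - len(rows) % 3:])
        rows = next_rows
    return sum(rows)  # final carry-propagate addition
```

Each compression step replaces three rows by two without propagating carries along the row, which is why the delay of a stage is that of a single full adder regardless of word length.
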

3. RESULTS

Figure: Booth multiplier, truncated output.

Figure: Booth multiplier, untruncated output.

4. CONCLUSION

A configurable Booth multiplier has been designed that provides flexible arithmetic capacity and a tradeoff between output precision and power consumption. Moreover, the ineffective circuitry can be efficiently deactivated, thereby reducing power consumption and increasing the speed of operation. The experimental results show that the proposed multiplier outperforms the conventional Radix 2 and Radix 4 Booth multipliers in terms of power and speed of operation, with sufficient accuracy, at the expense of extra area.


AUTHOR’S BIOGRAPHY


Konatham Sushmareddy, Assistant Professor in Vaagdevi College of Engineering, BTECH ECE - Jayamukhi Institute of Technological and Science, MTECH - Jayamukhi Institute of Technological and Science.