# Ultra-low Power Approximate Adder Design Techniques for Mobile Applications

Karshankant Sharma<sup>1</sup>, Vijay Kumar Magraiya<sup>2</sup> and Meenakshi Mishra<sup>3</sup>

<sup>1</sup>Research Scholar, <sup>2</sup>Assistant Professor, <sup>3</sup>Assistant Professor, Electronics and Comm. Dept., SRCEM Banmore, Morena, India

Abstract— The rapidly growing functionality of portable gadgets calls for ultra-low-power designs. Users cannot tolerate rapid battery discharge in a mobile device, but they can accept a small degradation in output quality. Ultra-low-power designs can be achieved by designing efficient arithmetic circuits, since these perform most of the processing within the cores. We propose approximate adder design techniques that improve power and speed simultaneously at the cost of a minor loss in accuracy. The proposed adders can be used efficiently in image and video processing applications. To evaluate their effectiveness, all the designs are implemented in MATLAB to obtain error metrics and in Tanner to obtain design metrics. Simulation results show that the proposed adders significantly reduce power, area and delay at a small loss in accuracy.

*Keywords*— Portable devices, Adder, Architecture, VLSI, Ultra-low power design.

# I. INTRODUCTION

The proliferation of electronic devices in users' hands has created a power scarcity challenge. The power consumed by these gadgets is increasing exponentially due to the exponential increase in their use. High-power circuits not only shorten battery lifetime but also degrade the reliability of the device. The first approach used to achieve a power-efficient design is transistor scaling, as it improves all the design parameters simultaneously. With technology scaling, transistor sizes have reached the nanoscale, where process and other variations are becoming severe. Techniques that compensate for process and temperature variation are becoming more costly than the advantage gained from scaling. Hence, scaling the device dimensions alone no longer yields an energy-efficient design. The design of high-speed, low-power VLSI architectures therefore needs efficient arithmetic processing units that are optimized for speed and power consumption.

Addition is the key arithmetic operation in general-purpose microprocessors and digital signal processors. An adder can also be used to perform many other functions such as subtraction, multiplication and division. As a result, the performance of the adder largely determines the speed of these processors.

In addition to the above design techniques, there are several applications in which minor errors can be tolerated; these are called error-tolerant applications. For these applications, approximate circuits can be designed that provide an approximate result with improved design metrics. Hence, approximate design can improve all the design parameters simultaneously at the cost of a minor, acceptable loss in accuracy. In these applications, designing a fully accurate circuit wastes power, area and performance.

Based on the characteristics of digital VLSI design, some novel concepts and design techniques have been proposed. The concept of error tolerance (ET) and the probabilistic CMOS (PCMOS) technology are two of them. According to the definition, a circuit is error tolerant if: 1) it contains defects that cause internal errors and may cause external errors, and 2) the system that incorporates this circuit produces acceptable results. The "imperfect" attribute may not seem appealing.

To keep pace with present technology, very high-speed and ultra-low-power adders are required; otherwise the increased complexity of the device will shorten battery life. The traditional ripple-carry adder (RCA) is no longer suitable for large adders because of its low speed. Many types of fast adders, such as the carry-skip adder (CSK), carry-select adder (CSL), and carry-look-ahead adder (CLA), have been developed. Many low-power adder design techniques have also been proposed. However, there are always trade-offs between speed and power. Error-tolerant design can be a potential solution to this problem. By sacrificing some accuracy, error-tolerant applications can attain great improvement in both power consumption and speed. Thus, approximate adders can be used effectively in these applications.

The rest of the paper is organized as follows. Section 2 details the conventional full adder, whereas Section 3 presents the proposed approximate full adders. The extensive bit-width adder built from the proposed full adders is given in Section 4. Section 5 presents the simulation results, while Section 6 concludes the paper.

## **II. ACCURATE FULL ADDER CIRCUIT DESIGN**

Different kinds of full adder architecture have been reported in the literature. In our analysis of approximate adders, we take the mirror adder as the basic full adder and apply our approximation techniques to it. The circuit diagram of the mirror adder, shown in Figure 1, consists of only 24 transistors.



Figure 1: Circuit diagram of Mirror adder

From the circuit diagram we can see that the mirror adder provides the complemented values of sum and carry out for the given inputs. To obtain the true sum and carry-out values, additional inverters can be appended.

## **III. DESIGN OF APPROXIMATE FULL ADDER**

In this section we first describe how the approximation techniques are applied to the adder to obtain approximate adders, and then analyse the severity of the applied approximations using various quality parameters.

Each node in a CMOS circuit has a capacitance that takes a finite time to charge or discharge, and this contributes to the delay of the circuit. The larger the node capacitance, the larger the delay. To reduce the delay, i.e. to improve the speed of the design, we must reduce the node capacitance. Furthermore, decreasing the number of transistors used to implement a design reduces the overall area. Hence, in this work we reduce the number of transistors in the full adder circuit, which reduces the node capacitance and the area at the same time. Removing transistors from the circuit introduces errors, so the transistors have to be eliminated judiciously such that the introduced error is small. Care must also be taken that removing transistors does not create any open-circuit or short-circuit path.

Applying this approximation technique, the first approximate full adder obtained after removing a few transistors is shown in Figure 2. It can be seen that the approximate full adder requires 6 fewer transistors than the original full adder.



Figure 2: Approximate Mirror adder 1 (AMA1)

Moreover, close observation of the truth table of the full adder shows that the sum is equal to the inverse of the carry out for six of the eight input conditions. We can therefore simply set the sum to the inverse of the carry out. Doing so increases the load at the carry-out node, so a buffer is placed before the sum output. The resulting simplified mirror adder is shown in Figure 3. From the figure we can see that approximate mirror adder 2 (AMA2) requires only 11 transistors.



Figure 3: Approximate Mirror Adder 2 (AMA2)
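The observation behind AMA2 can be checked by enumerating the exact full adder truth table. The following is a minimal Python sketch; the helper name `full_adder` is introduced here for illustration only and models the logic function, not the transistor-level circuit.

```python
from itertools import product

def full_adder(a, b, cin):
    """Exact 1-bit full adder: returns (sum, carry_out)."""
    total = a + b + cin
    return total & 1, total >> 1

# Input combinations (A, B, Cin) for which Sum equals the complement of Cout.
matches = []
mismatches = []
for a, b, cin in product((0, 1), repeat=3):
    s, cout = full_adder(a, b, cin)
    if s == 1 - cout:
        matches.append((a, b, cin))
    else:
        mismatches.append((a, b, cin))

print(len(matches))   # 6
print(mismatches)     # [(0, 0, 0), (1, 1, 1)]
```

The two disagreeing cases are the all-zero and all-one inputs, so taking the sum as the inverted carry can only be wrong in those two rows.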

We also noticed that Cout is equal to A (the first input) for six of the eight input conditions. If we set Cout to A and then calculate the sum from Cout accurately, we obtain another approximate adder. The resulting circuit diagram of approximate mirror adder 3 (AMA3) is shown in figure 4. This adder requires only 11 transistors and thus significantly reduces the area. Moreover, the reduction in node capacitance due to the reduced number of transistors increases the speed of the adder. Table 1 gives the truth tables of the conventional and the approximate mirror adders. The wrong values in each approximate adder are highlighted in red.

| A | B | Cin | Accurate<br>S | Accurate<br>Co | Approx.<br>S<sub>1</sub> | Approx.<br>C<sub>o1</sub> | Approx.<br>S<sub>2</sub> | Approx.<br>C<sub>o2</sub> | Approx.<br>S<sub>3</sub> | Approx.<br>C<sub>o3</sub> |
|---|---|-----|---|---|---|---|---|---|---|---|
| 0 | 0 | 0   | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| 0 | 0 | 1   | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 |
| 0 | 1 | 0   | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 |
| 0 | 1 | 1   | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 0 |
| 1 | 0 | 0   | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 1 |
| 1 | 0 | 1   | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 |
| 1 | 1 | 0   | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 1 |
| 1 | 1 | 1   | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 |

Table 1: Truth table of accurate and approximate adders
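The claim that Cout matches the first input A in six of the eight cases can be verified in the same way. This is a small illustrative sketch; `exact_carry` is a hypothetical helper name and simply computes the majority of the three inputs.

```python
from itertools import product

def exact_carry(a, b, cin):
    """Carry-out of an exact 1-bit full adder (majority of the inputs)."""
    return (a + b + cin) >> 1

# Input combinations (A, B, Cin) where forcing Cout = A gives the wrong carry.
carry_errors = [
    (a, b, cin)
    for a, b, cin in product((0, 1), repeat=3)
    if exact_carry(a, b, cin) != a
]

print(8 - len(carry_errors))  # 6
print(carry_errors)           # [(0, 1, 1), (1, 0, 0)]
```

The carry is wrong only when B and Cin agree with each other but disagree with A.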

## IV. PROPOSED EXTENSIVE BIT-WIDTH ADDER ARCHITECTURE

All applications demand high-speed, low-power multi-bit adders. These multi-bit adders can be implemented by connecting full adders in different adder architectures. The simplest multi-bit architecture is the ripple carry adder (RCA). The RCA has a small area and a very simple structure, but the large delay due to its long carry propagation chain limits its use. To increase the speed of the adder, the carries can be calculated in advance and supplied to each full adder; the resulting design is called the carry-look-ahead (CLA) adder. The CLA provides the highest speed, but its large area overhead limits its adoption. Alternatively, short RCA chains can be combined to build an extensive bit-width adder in which the carry out of one RCA block selects between the results of two RCAs whose carry inputs are tied to logic '1' and '0'. The resulting adder is known as the carry-select adder (CSL). Although the CSL improves the speed of addition, its large area overhead hinders its use in applications.

We propose different adder architectures that use different numbers of approximate full adders at the least significant bit (LSB) positions. Accurate full adders are used at the most significant bit (MSB) positions to limit the introduced error, while approximate mirror adders are used at the LSBs to improve the design metrics. The resulting approximate adder 1 (AA1) is shown in Figure 4.



Figure 4: Proposed Approximate Adder-1 (AA1)
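The idea of mixing approximate cells at the LSBs with accurate cells at the MSBs can be sketched behaviourally as a ripple chain. This is only an illustration under stated assumptions: the approximate cell below uses the "sum equals inverted carry" simplification described in Section III, and the split point `approx_bits=8` is a hypothetical parameter, since the number of approximate positions is not stated in the text.

```python
def exact_fa(a, b, cin):
    """Exact 1-bit full adder: returns (sum, carry_out)."""
    total = a + b + cin
    return total & 1, total >> 1

def approx_fa(a, b, cin):
    """Illustrative approximate cell: exact carry, sum taken as NOT carry."""
    cout = (a + b + cin) >> 1
    return 1 - cout, cout

def approx_ripple_add(x, y, width=16, approx_bits=8):
    """Ripple adder with approximate cells in the lowest `approx_bits` positions."""
    carry, result = 0, 0
    for i in range(width):
        cell = approx_fa if i < approx_bits else exact_fa
        s, carry = cell((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result | (carry << width)  # final carry becomes bit `width`

print(approx_ripple_add(1234, 4321, approx_bits=0))  # 5555 (fully accurate)
print(approx_ripple_add(0, 0))                       # 255 (only low 8 bits are wrong)
```

Because this particular cell keeps the carry exact, the error stays confined to the approximate bit positions; a cell that also approximates the carry would let errors propagate into the accurate part.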

Similarly, other approximate adders are designed using the other approximate full adders, as shown in Figure 5.



Figure 5: Proposed approximate adders AA2 and AA3

We can see from the architectural diagrams that the proposed adders significantly reduce area because of the reduced number of transistors in the approximate full adders. The next section discusses the experimental setup and the methodology used to evaluate the proposed approximate adders.

## V. SIMULATION RESULT DISCUSSION

Quality Parameters: To evaluate the effectiveness of the approximate designs, they are modelled in MATLAB and simulated with 1 million random input patterns. To design the approximate adders, the approximate full adders are first modelled as per the truth table given in Table 1. Using these approximate full adders together with accurate full adders, 16-bit adders are built. These approximate adders are then simulated with 1 million random inputs and the corresponding error metrics are evaluated. The error metrics in Table 2 show that the proposed adders have a small mean error. AA1 has a lower mean error and MSE than AA2 and AA3 and a higher PSNR, which is desirable. However, these better error metrics come at the cost of slightly worse design metrics, as can be seen from Table 4. Thus, AA1 is better suited to less error-tolerant applications, while AA2 and AA3 are suited to applications that can tolerate more error.

| Adder<br>Type | Mean<br>Error (µ)     | MSE   | Std. dev.            | PSNR   |
|---------------|-----------------------|-------|----------------------|--------|
| AA1           | $7.28 \times 10^{-4}$ | 0.443 | $2.1 \times 10^{-3}$ | 125.9  |
| AA2           | $1.3 \times 10^{-3}$  | 1.123 | $3.4 \times 10^{-3}$ | 116.7  |
| AA3           | $1.6 \times 10^{-3}$  | 0.970 | $3.1 \times 10^{-3}$ | 118.13 |

**Table 2: Comparison of error metrics** 
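The random-input evaluation described above can be sketched as follows. This is a hedged illustration, not the MATLAB code used for Table 2: the metric definitions (mean absolute error, PSNR computed against the largest exact sum) and the normalization are assumptions, so the numbers it prints are not expected to match the table. The function `set_lsb_adder` is a placeholder approximate adder that does not appear in the paper.

```python
import math
import random

def error_metrics(approx_add, width=16, samples=100_000, seed=0):
    """Estimate error metrics of `approx_add` against exact addition."""
    rng = random.Random(seed)
    peak = 2 * (2 ** width - 1)  # largest exact sum of two width-bit operands
    errors = []
    for _ in range(samples):
        x, y = rng.getrandbits(width), rng.getrandbits(width)
        errors.append(approx_add(x, y) - (x + y))
    mean = sum(errors) / samples
    mae = sum(abs(e) for e in errors) / samples
    mse = sum(e * e for e in errors) / samples
    std = math.sqrt(max(mse - mean * mean, 0.0))
    # PSNR is infinite when the adder never deviates from the exact result.
    psnr = math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)
    return {"mean_abs_error": mae, "mse": mse, "std_dev": std, "psnr_db": psnr}

def set_lsb_adder(x, y):
    """Placeholder approximate adder (not from the paper): forces bit 0 to 1."""
    return (x + y) | 1

print(error_metrics(set_lsb_adder, samples=10_000))
```

Any behavioural model of AA1, AA2 or AA3 with the signature `f(x, y)` could be passed in place of the placeholder.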

Design Parameters: To evaluate the design parameters of the proposed adder architectures and other well-known adders, all the designs are implemented in Tanner 14.1 and simulated with a 45 nm technology file. For a fair comparison, identical transistor sizing and the same power supply are used for the proposed and reference designs. Three primary design parameters (area, power and delay) are determined to evaluate the effectiveness of the proposed adders. The schematic of AMA1 is shown in Figure 6.



Figure 6: Schematic diagram of AMA1

Using these AMAs, AA1 is implemented in Tanner as shown in Figure 7.



Figure 7: Schematic diagram of AA1 on Tanner

Similarly, the other designs are also implemented in Tanner to evaluate their design metrics. Table 3 shows the design parameters for all the approximate full adders and the conventional accurate full adder. From the table we can see that the proposed AMA1, AMA2 and AMA3 reduce the power-delay product (PDP) by 40.58%, 59.94% and 49.96% respectively over the conventional full adder (CMA). The designs also show significant reductions in area, power and delay simultaneously.

Table 3: Comparison of design metrics

| FA<br>Type | Area<br>(# Tran.) | Power<br>(µW) | Delay<br>(ps) | PDP<br>(fJ) |
|------------|-------------------|---------------|---------------|-------------|
| CMA        | 24                | 0.117         | 64.4          | 0.754       |
| AMA1       | 16                | 0.09          | 53.1          | 0.448       |
| AMA2       | 14                | 0.114         | 26.7          | 0.304       |
| AMA3       | 11                | 0.093         | 44.1          | 0.415       |









Figure 8: Comparison of area, power and delay

Similarly, design metrics for the proposed approximate adders and other accurate adders are evaluated and shown in Table 4. From the table we can see that the proposed AA1, AA2 and AA3 reduce the PDP by 20.85%, 43.25% and 84.68% respectively over the conventional RCA. The designs also show significant reductions in area and power, and AA2 and AA3 additionally reduce the delay relative to the RCA.

 Table 4: Design metrics for the proposed adders

| Adder<br>Type | Area<br>(# Tran.) | Power<br>(µW) | Delay<br>(ns) | PDP<br>(fJ) |
|-------|-----------|-------|-------|---------------|
| RCA   | 448       | 2.6   | 0.629 | 1.63          |
| CLA   | 964       | 4.8   | 0.302 | 1.45          |
| AA1   | 304       | 1.9   | 0.68  | 1.29          |
| AA2   | 280       | 1.785 | 0.518 | 0.925         |
| AA3   | 244       | 1.246 | 0.21  | 0.26          |
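
The PDP column of Table 4 is the product of the power and delay columns, which the short sketch below recomputes from the listed values. Taking the RCA as the baseline for the relative reduction is an assumption made here for illustration; the names `table4`, `pdp_fj` and `reduction_percent` are likewise hypothetical.

```python
# (power in µW, delay in ns) as listed in Table 4.
table4 = {
    "RCA": (2.6, 0.629),
    "CLA": (4.8, 0.302),
    "AA1": (1.9, 0.68),
    "AA2": (1.785, 0.518),
    "AA3": (1.246, 0.21),
}

# µW x ns = 1e-6 W x 1e-9 s = 1e-15 J, so the product is already in fJ.
pdp_fj = {name: power * delay for name, (power, delay) in table4.items()}

def reduction_percent(name, reference="RCA"):
    """Relative PDP reduction of `name` with respect to `reference`, in percent."""
    return 100.0 * (1.0 - pdp_fj[name] / pdp_fj[reference])

for name in table4:
    print(f"{name}: PDP = {pdp_fj[name]:.3f} fJ, "
          f"reduction vs RCA = {reduction_percent(name):.2f}%")
```

Small differences from the quoted percentages can arise depending on whether the rounded PDP entries or the unrounded products are used.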





Figure 9: Design metrics comparison of different 16-bit adders

Thus, the proposed AAs can be used effectively in portable battery-operated devices where power/energy is the prime requirement over accuracy.

## VI. CONCLUSION

In this paper, we have proposed novel approximate adders that improve power, area and delay simultaneously at the cost of a minor, acceptable error. The proposed adders are well suited to image and video processing applications that can tolerate a small amount of error. The efficacy of the proposed adders is evaluated by modelling them in MATLAB and implementing them in Tanner. The simulation results show that the proposed adders achieve an acceptable PSNR, while the design metrics show significant reductions in area, power and delay over the accurate adder architectures. Present-day mobile and other battery-operated devices demand highly energy-efficient arithmetic operations, and the proposed adders are well suited to these devices.
