# A Novel Efficient Technique for Contention Reduction In Wide Fan-In Dynamic OR Gate

Jai Krishna Goswami and Anshul Jain

VLSI Design Lab, Electronics & Comm. Department,

Shri Ram College of Engineering and Management, Morena, Gwalior-474010, Madhya Pradesh, India

jay.goswami.14@gmail.com, anshuljaineng@yahoo.co.in

Abstract-Register file structures in modern microprocessors usually employ wide fan-in dynamic CMOS OR gates. Weak keepers have traditionally been used to address the low noise margin of dynamic CMOS designs, but aggressive CMOS scaling has reduced the effectiveness of a weak PMOS keeper. A large PMOS keeper, on the other hand, causes contention between the pull down network (PDN) and the keeper in a wide fan-in dynamic OR gate, which unnecessarily increases power dissipation and degrades performance. This paper proposes a new keeper design that reduces the contention between the keeper and the PDN and thereby reduces power dissipation and delay. Simulation results at 50nm show that power dissipation and delay are reduced by 40% and 35% respectively compared to a wide fan-in dynamic OR gate with a conventional keeper.

Keywords- Dynamic CMOS logic; Noise immunity; Keeper Transistor

## I. INTRODUCTION

The wide fan-in dynamic OR gate is an important structure in the critical path of modern high speed microprocessors [1]. However, aggressive scaling trends in CMOS design [2], [3] lead to variation in the leakage current of gates. To maintain an adequate noise margin under these conditions, a wide fan-in OR gate needs a large PMOS keeper, but a large keeper causes strong contention between the pull down network (PDN) and the keeper. This contention unnecessarily increases power dissipation and delay. This work aims to reduce the contention and thereby lower both power dissipation and delay.

In this section, the importance of the wide fan-in dynamic OR gate is explained first, and then the design issues with it are discussed. Later in this section a prior technique that can also reduce contention current to some extent is reviewed. In the next section a better contention current reduction scheme is proposed.

### A. Importance of wide fan-in OR gate in Register file architecture

Fig. 1 shows the architecture of the ARM Cortex-A9 microprocessor. The high performance ARM Cortex<sup>TM</sup>-A series processors are used as core processors in almost all smart devices in use today, such as the iPhone, iPad, and other mobile phones [4]. In this processor two register files are deployed in the data path; they are boxed for emphasis in the figure. The register files are used in almost every clock cycle, since executing an instruction requires data to be either read from or written to the register file. Register files therefore form an important module in high speed microprocessors.



Fig. 1. Register files deployed in ARM Cortex<sup>™</sup>-A microprocessor [1].



Fig. 2. (a) Block diagram of a simplified register file and (b) read port implemented using 4 x 1 multiplexer (MUX) [1]

Fig. 2(a) shows the block diagram of such a register file, consisting of static RAM registers, a read port and a write port [1]. These ports are implemented using a multiplexer and a demultiplexer; the multiplexer implementation is shown in Fig. 2(b), which illustrates a simple 4x1 multiplexer with 4 input lines. Note that an actual ARM Cortex microprocessor would have a 16 or 32 bit register file and hence would need a 16 or 32 input OR gate. The wide fan-in OR gate is therefore an important structure in such a high speed microprocessor. However, designing a highly robust wide fan-in dynamic OR gate is difficult in the sub-100nm regime [8]. High contention current is the factor that makes the wide OR gate challenging to design, as discussed in the next subsection.

### B. Contention current issues in wide fan-in OR gate

The contention problem in a wide fan-in dynamic OR gate can be explained using the conventional wide fan-in OR gate with a large keeper shown in Fig. 3. Consider the evaluation phase, when the clock is at logic '1', the dynamic node has been charged to logic '1' and the output is at logic '0'. The logic '0' output keeps the keeper ON during the evaluation phase, compensating for any leakage through the pull down network. If one of the inputs now switches from logic '0' to '1', as shown in Fig. 3, the corresponding NMOS turns ON and slowly pulls the dynamic node down towards logic '0'. The discharge is slow because the large number of NMOS transistors connected in the pull down network of a wide fan-in OR gate places a large parasitic capacitance on the dynamic node. Because of this slow discharge, the keeper remains ON until the dynamic node has fallen far enough for the static inverter to switch, at which point the output becomes logic '1' and turns the keeper OFF. During this period the keeper tries to pull the dynamic node up while the ON NMOS tries to pull it down. The current flowing through the keeper during this time is known as contention current [5]. This contention unnecessarily increases delay and static power dissipation.
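To first order, the effect of contention on delay can be written with a constant-current approximation. The symbols $C_{dyn}$ (dynamic node capacitance), $\Delta V$ (voltage drop needed to trip the output inverter), $I_{PDN}$ and $I_{keeper}$ are introduced here purely for illustration; the expression is a rough sketch under that assumption, not a fitted model of the simulated gate.

$$t_{discharge} \approx \frac{C_{dyn}\,\Delta V}{I_{PDN} - I_{keeper}}, \qquad I_{PDN} > I_{keeper}$$

A larger keeper raises $I_{keeper}$, which shrinks the denominator and lengthens the discharge, while $I_{keeper}$ is drawn from the supply for the whole interval.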



Fig. 3. Conventional wide fan-in OR gate with one of the input switching from logic '0' to '1' during evaluation phase

To illustrate this contention current, a conventional 16-input dynamic OR gate has been simulated in the worst case delay condition, i.e. with one of the inputs at logic '1'. The result is shown in Fig. 4 for 50nm technology at a clock frequency of 1.6 GHz; the rest of the simulation setup is described in section III. A high clock level corresponds to the evaluation phase and a low clock level to the precharge phase. The first waveform shows the contention current flowing through the keeper transistor. The simulation shows that the contention current is of the order of 16 $\mu$A during the initial period of the evaluation phase. This contention current unnecessarily increases the static power dissipation of the design.

From the above discussion it is clear that the contention current is an important issue for wide fan-in dynamic OR gate. The proposed design deals with reduction in contention current, as given in section II.



Fig. 4. Simulation result of the conventional 16-input dynamic OR gate showing the contention current through the keeper and the clock signal.

### C. A high performance low contention (HPLC) technique for wide fan-in dynamic OR gate

In this section a previously proposed technique [6] is reviewed and analyzed. In section III this technique is used for comparison with the proposed technique.

The high performance low contention (HPLC) wide fan-in dynamic OR gate is illustrated in Fig. 5. In this technique the output node is connected to the footer node of the OR gate. With this topology the output node is held at a voltage above ground, which reduces the gate overdrive of the keeper and hence the contention current.
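The overdrive argument can be made explicit with the usual definition of PMOS gate overdrive. The symbols $V_{ov}$, $V_{out}$ and $V_{tp}$ are introduced here for illustration, and the sketch assumes the keeper source is tied to $V_{DD}$ and its gate to the output node, as in the conventional keeper:

$$V_{ov} = V_{SG} - |V_{tp}| = V_{DD} - V_{out} - |V_{tp}|$$

With $V_{out} = 0$ the overdrive takes its maximum value $V_{DD} - |V_{tp}|$; any positive $V_{out}$ lowers it, and the keeper current falls with it.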

In this technique the strength of the keeper is reduced during the initial period of the evaluation phase to reduce the contention current. However, the keeper is still ON during this period and hence some contention current still flows. Moreover, the output node never reaches ground, which may cause an erroneous output to be propagated to the next stage.

Fig. 5. High performance low contention wide fan-in dynamic OR gate [6]

## II. PROPOSED TECHNIQUE

The proposed design, illustrated in Fig. 6, aims to reduce the unwanted contention current in a wide fan-in dynamic OR gate. This section describes the design, analysis and operation of the proposed technique.

### A. Design and Analysis

The proposed wide fan-in dynamic OR gate with reduced contention is shown in Fig. 6. In the conventional design the keeper is controlled by sensing the output node, whereas in the proposed design the keeper is controlled by sensing the dynamic node and the clock.



Fig. 6. Proposed keeper design for wide fan-in OR gate.

The basic principle used to reduce the contention current is to keep the keeper OFF during the interval in which contention between the pull down network and the keeper can occur. This is achieved by delaying the clock by a sufficient duration and operating the keeper with the delayed clock, so that no contention current can flow through the keeper during that interval. The delay is produced by the buffer shown in the proposed design, and the delayed clock from the buffer is applied to the gates of PMOS $M_{18}$ and NMOS $M_{19}$.

The approximate duration of the contention current can be read from Fig. 4. To avoid contention, the buffer transistors are sized such that the buffer delay is less than or equal to this duration. An important design constraint is that the buffer delay must not exceed this duration: if it does, the keeper remains OFF during a part of the evaluation phase in which it is needed, and the noise margin degrades.

To obtain a suitable delay from the buffer, a super buffer design has been used. This super buffer is shown in Fig. 7 and has been used and explained in [].



Fig. 7. Super Buffer Design

The transistor sizes needed for the required delay can be obtained from equation 1, where

- $\tau_{total}$ = Total delay of the buffer
- $N$ = Number of stages
- $\tau_0$ = Delay of first inverter
- $C_g$ = Gate capacitance of first inverter
- $C_d$ = Drain capacitance of first inverter
- $\alpha$ = Scaling factor
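A commonly quoted textbook expression for the delay of a scaled (super buffer) inverter chain uses exactly these symbols. It is given here as an assumption about the form of equation 1, not as a confirmed reproduction of it:

$$\tau_{total} = (N+1)\,\tau_0\,\frac{C_d + \alpha C_g}{C_d + C_g}$$

In this form $\tau_0$ is the delay of an inverter loaded by an identical inverter, each stage drives a gate load $\alpha$ times its own, and the chain contains $N+1$ such stages.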

First the size of the first inverter is fixed, and then, for the required total delay, the value of the scaling factor $\alpha$ is obtained from equation 1. With this scaling factor the sizes of the second inverter's transistors are obtained as:

$$\left(\frac{W}{L}\right)_{NMOS(inverter2)} = \alpha \left(\frac{W}{L}\right)_{NMOS(inverter1)}$$

and

$$\left(\frac{W}{L}\right)_{PMOS(inverter2)} = \alpha \left(\frac{W}{L}\right)_{PMOS(inverter1)}$$
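The sizing procedure can be sketched numerically. The snippet below is a minimal sketch, not the authors' sizing flow: it assumes equation 1 has the common textbook form $\tau_{total} = (N+1)\,\tau_0\,(C_d + \alpha C_g)/(C_d + C_g)$, the function names are hypothetical, and every numeric value in the usage lines is made up for illustration.

```python
def buffer_delay(alpha, tau_0, n_stages, c_g, c_d):
    """Total buffer delay under the ASSUMED form of equation 1."""
    return (n_stages + 1) * tau_0 * (c_d + alpha * c_g) / (c_d + c_g)


def scaling_factor(tau_total, tau_0, n_stages, c_g, c_d):
    """Invert the assumed equation 1 to get alpha for a target total delay."""
    per_stage_ratio = tau_total / ((n_stages + 1) * tau_0)
    alpha = (per_stage_ratio * (c_d + c_g) - c_d) / c_g
    if alpha <= 0:
        raise ValueError("target delay is too small for this tau_0 and N")
    return alpha


def scale_inverter(wl_nmos, wl_pmos, alpha):
    """Second-inverter W/L ratios: alpha times those of the first inverter."""
    return alpha * wl_nmos, alpha * wl_pmos


# Hypothetical values: delays in ps, capacitances in fF, W/L ratios unitless.
alpha = scaling_factor(tau_total=60.0, tau_0=10.0, n_stages=1, c_g=1.0, c_d=1.0)
wl_n2, wl_p2 = scale_inverter(wl_nmos=2.0, wl_pmos=4.0, alpha=alpha)
print(alpha, wl_n2, wl_p2)
```

In practice the target `tau_total` would be the contention duration read from the simulated keeper current, and the resulting sizes would still need to be checked in circuit simulation.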

### B. Operation of the Proposed Design

As in a conventional dynamic design, the proposed design has two phases of operation: the pre-charge phase, when the clock signal is low, and the evaluation phase, when the clock signal is high.

In the pre-charge phase transistor $M_5$ is ON and $M_6$ is OFF, ensuring that the keeper $M_1$ is OFF during the pre-charge phase, when it is not needed. At the start of the evaluation phase $M_5$ remains ON and turns OFF only after the delay introduced by the buffer, so the keeper remains OFF during the initial part of the evaluation phase. If any one of the inputs goes high, the dynamic node slowly discharges while the keeper $M_1$ is still OFF, so no contention current flows.

Once the dynamic node has fully discharged, $M_7$ turns OFF, keeping the keeper OFF for the rest of the evaluation phase. In the other scenario, when all the inputs are low in the evaluation phase, the dynamic node stays high and $M_7$ remains ON, while $M_6$ turns ON after the buffer delay. This switches the keeper $M_1$ ON for the rest of the evaluation phase, when it is needed.

This operation reduces contention, as illustrated by the simulation results in section III.
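The operating phases described in this subsection can be summarized as a small behavioral model. This is a sketch under stated assumptions rather than a description of the actual circuit: the roles of $M_5$, $M_6$ and $M_7$ are inferred from the text (the schematic is not reproduced here), the keeper gate is assumed to hold its pre-charged high value when neither path conducts, and the function name is hypothetical.

```python
def keeper_is_on(delayed_clk: bool, dynamic_node_high: bool) -> bool:
    """Behavioral sketch of the keeper control (roles inferred from the text)."""
    m5_on = not delayed_clk      # assumed PMOS M5: holds the keeper gate high
    m6_on = delayed_clk          # assumed NMOS M6: enabled by the delayed clock
    m7_on = dynamic_node_high    # assumed NMOS M7: senses the dynamic node
    # Keeper M1 (PMOS) conducts only when its gate is pulled low via M6 and M7.
    return (m6_on and m7_on) and not m5_on


# (description, delayed clock, dynamic node high)
PHASES = [
    ("pre-charge", False, True),
    ("evaluation, before buffer delay", False, True),
    ("evaluation, after delay, an input went high", True, False),
    ("evaluation, after delay, all inputs low", True, True),
]

for name, delayed_clk, dyn_high in PHASES:
    state = "ON" if keeper_is_on(delayed_clk, dyn_high) else "OFF"
    print(f"{name}: keeper {state}")
```

In this model the keeper turns ON only in the last case, where the dynamic node must be held high against leakage.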

## III. SIMULATION RESULTS

To study the performance of the proposed design relative to prior keeper designs, a 16-input wide fan-in OR gate is implemented using the proposed technique. For comparison, two further 16-input wide fan-in OR gates are designed with the conventional keeper technique and the high performance low contention (HPLC) technique [6], with the same transistor sizes and specifications. The simulations use the Tanner Spice V14.1 simulator with PTM 50nm technology [], a supply voltage of 1V and room temperature. The 16-input dynamic OR gate is intended for an ARM Cortex-A9 microprocessor, as explained in section I, so its operating frequency should match that application. ARM Cortex-A series processors have a maximum operating frequency of 1.6 GHz [4], and hence the implemented 16-input dynamic OR gate is operated at 1.6 GHz. The transistors are sized such that a Unity Gain DC Noise (UGDN) [9] of 0.2 V is maintained. The proposed design is compared with the conventional keeper and the HPLC technique [6] explained in section I-C on the basis of contention current, power and delay in the worst case delay condition, when one of the inputs of the PDN is at logic '1'.
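The 1.6 GHz operating frequency sets the time available for evaluation. The snippet below is a rough timing-budget sketch: it assumes a 50% clock duty cycle, which the text does not state, and takes the worst case delays at 1 V from Table I. The function names are hypothetical.

```python
def clock_period_ps(freq_ghz: float) -> float:
    """Clock period in picoseconds for a frequency given in GHz."""
    return 1000.0 / freq_ghz


def evaluation_window_ps(freq_ghz: float, duty_cycle: float = 0.5) -> float:
    """Length of the high (evaluation) phase; the 50% duty cycle is an assumption."""
    return duty_cycle * clock_period_ps(freq_ghz)


# Worst case switching delays at Vdd = 1 V, copied from Table I (ps).
WORST_CASE_DELAY_PS = {"conventional": 209.52, "hplc": 139.94, "proposed": 136.8}

window = evaluation_window_ps(1.6)
slack = {name: window - delay for name, delay in WORST_CASE_DELAY_PS.items()}
for name, value in slack.items():
    print(f"{name}: slack {value:.2f} ps of a {window:.1f} ps evaluation window")
```

Under this assumption all three designs evaluate within the window, and a shorter worst case delay translates directly into more slack.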

To illustrate the reduction in contention current, the circuit has been simulated in the worst case delay condition. The current through the keeper under this condition is shown in Fig. 8. Comparing Fig. 4 with Fig. 8, the contention current is reduced from 16 $\mu$A to 5 $\mu$A.



Fig. 8. Simulation result of the proposed technique showing the reduced contention current through keeper.

This comparison shows a drastic reduction in contention current, of almost 66%, compared to the conventional keeper design. The average contention current of the three designs as the supply voltage is varied is illustrated in Fig. 9.

Because a lower contention current shortens the time needed to discharge the dynamic node, the delay in the worst case condition is also reduced. Fig. 10 confirms that the proposed design has a lower delay than the conventional keeper design and the HPLC design across the supply voltage variation.

For the power comparison, the dissipation from all power sources has been taken into account. As seen in Fig. 8, the contention current is reduced, which directly reduces the power dissipation of the proposed design. Fig. 11 shows that the proposed design has the lowest power dissipation across the supply voltage variation compared to the conventional keeper design and the HPLC design [6]. This reduction in power dissipation is due to the reduction in contention current. The average contention current, delay and power dissipation of the three techniques are summarized in Table I.

| Technique | Average contention current (µA) | | | | | Switching delay (ps) | | | | | Power dissipation (µW) | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Vdd supply (V) | 1 | 0.9 | 0.8 | 0.7 | 0.6 | 1 | 0.9 | 0.8 | 0.7 | 0.6 | 1 | 0.9 | 0.8 | 0.7 | 0.6 |
| Conventional keeper design | 4.8 | 4.38 | 3.86 | 3.64 | 3.5 | 209.52 | 204.76 | 201.06 | 198.34 | 185.66 | 4.8 | 3.951 | 3.088 | 2.54 | 2.1 |
| High performance low contention design [6] | 1.8 | 1.45 | 1.125 | 0.841 | 0.79 | 139.94 | 131.455 | 127.135 | 122.5 | 115.56 | 1.8 | 1.305 | 0.9 | 0.588 | 0.474 |
| Proposed design | 1.11 | 0.9 | 0.73 | 0.66 | 0.51 | 136.8 | 127.396 | 121.52 | 120.96 | 111.5 | 1.11 | 0.81 | 0.584 | 0.46 | 0.306 |

TABLE I. AVERAGE CONTENTION CURRENT, DELAY AND POWER DISSIPATION COMPARISON OF THE PROPOSED TECHNIQUE WITH THE HPLC [6] AND CONVENTIONAL TECHNIQUE
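The relative improvements implied by Table I can be computed directly from the tabulated values. The snippet below is a minimal sketch with hypothetical helper names; the numbers are copied from Table I. The last helper checks an observation rather than a statement from the text: the tabulated power values lie close to the product of supply voltage and average contention current.

```python
VDD = [1.0, 0.9, 0.8, 0.7, 0.6]  # supply voltages in volts, as in Table I

TABLE_I = {
    "conventional": {
        "current_uA": [4.8, 4.38, 3.86, 3.64, 3.5],
        "delay_ps": [209.52, 204.76, 201.06, 198.34, 185.66],
        "power_uW": [4.8, 3.951, 3.088, 2.54, 2.1],
    },
    "hplc": {
        "current_uA": [1.8, 1.45, 1.125, 0.841, 0.79],
        "delay_ps": [139.94, 131.455, 127.135, 122.5, 115.56],
        "power_uW": [1.8, 1.305, 0.9, 0.588, 0.474],
    },
    "proposed": {
        "current_uA": [1.11, 0.9, 0.73, 0.66, 0.51],
        "delay_ps": [136.8, 127.396, 121.52, 120.96, 111.5],
        "power_uW": [1.11, 0.81, 0.584, 0.46, 0.306],
    },
}


def reduction_pct(baseline: float, improved: float) -> float:
    """Percentage reduction of `improved` relative to `baseline`."""
    return 100.0 * (baseline - improved) / baseline


def reductions(metric: str, baseline: str, improved: str = "proposed"):
    """Per-voltage percentage reduction of one design relative to another."""
    pairs = zip(TABLE_I[baseline][metric], TABLE_I[improved][metric])
    return [reduction_pct(b, i) for b, i in pairs]


def max_power_mismatch_uW() -> float:
    """Largest gap between tabulated power and Vdd * average current."""
    return max(
        abs(p - v * i)
        for row in TABLE_I.values()
        for v, i, p in zip(VDD, row["current_uA"], row["power_uW"])
    )


for metric in ("current_uA", "delay_ps", "power_uW"):
    for baseline in ("conventional", "hplc"):
        values = ", ".join(f"{r:.1f}" for r in reductions(metric, baseline))
        print(f"{metric} vs {baseline}: {values} %")
```

Each printed line lists the reduction achieved by the proposed design at 1, 0.9, 0.8, 0.7 and 0.6 V, in that order.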



Fig. 9. Average contention current comparison of the conventional keeper, HPLC design [6] and proposed design at 50nm.



Fig. 10. Delay comparison of the conventional keeper, HPLC design [6] and proposed design at 50nm.



Fig. 11. Power dissipation comparison of the conventional keeper, HPLC design [6] and proposed design at 50nm.

## IV. CONCLUSION

Wide fan-in dynamic OR gates used in high speed microprocessors such as ARM Cortex processors suffer from a major issue: contention current. The proposed design reduces the contention current, as illustrated by the simulation results. It achieves almost 25% and 35% reduction in delay compared to the HPLC technique [6] and the conventional keeper design respectively over the entire process variation range. Similarly, it achieves 33% and 40% reduction in power dissipation compared to the HPLC design and the conventional keeper design respectively. The proposed technique can therefore be an effective way to reduce contention in wide fan-in dynamic OR gates.

## REFERENCES

- [1] H. F. Dadgour and K. Banerjee, "A Novel Variation-Tolerant Keeper Architecture for High-Performance Low-Power Wide Fan-In Dynamic OR Gates," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 18, no. 11, pp. 1567–1577, Nov. 2010.
- [2] P. Srivastava, A. Pua, and L. Welch, "Issues in the design of domino logic circuits," in *Proc. 8th Great Lakes Symposium on VLSI*, pp. 108–112, 19-21 Feb. 1998.
- [3] H. Mahmoodi, S. Mukhopadhyay, and K. Roy, "High performance and low power domino logic using independent gate control in double-gate SOI MOSFETs," in *Proc. 2004 IEEE International SOI Conference*, pp. 67–68, 4-7 Oct. 2004.
- [4] J. Koppanalil, G. Yeung, D. O'Driscoll, S. Householder, and C. Hawkins, "A 1.6 GHz dual-core ARM Cortex A9 implementation on a low power high-K metal gate 32nm process," in *Proc. 2011 International Symposium on VLSI Design, Automation and Test (VLSI-DAT)*, pp. 1–4, 25-28 Apr. 2011.
- [5] R. G. D. Jeyasingh, N. Bhat, and B. Amrutur, "Adaptive Keeper Design for Dynamic Logic Circuits Using Rate Sensing Technique," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 19, no. 2, pp. 295–304, Feb. 2011.
- [6] P. Meher and K. K. Mahapatra, "A High-Performance Circuit Technique For CMOS Dynamic Logic," IEEE conf. on Very Large Scale Integr. (VLSI) Syst. 2011, pp. 1080–1085.
- [7] L. Wang, R. K. Krishnamurthy, K. Soumyanath, and N. R. Shanbhag, "An energy-efficient leakage-tolerant dynamic circuit technique," in *Proc. 13th Annual IEEE International ASIC/SOC Conference*, pp. 221–225, 2000.
- [8] S. Borkar, T. Karnik, S. Narendra, J. Tschanz, A. Keshavarzi, and V. De, "Parameter variations and impact on circuits and microarchitecture," in *Proc. DAC*, pp. 338–342, Dec. 2003.
- [9] A. Alvandpour, R. Krishnamurthy, K. Soumyanath, and S. Borkar, "A conditional keeper technique for sub-0.13µ wide dynamic gates," in *Proc. VLSI Circuits*, pp. 29–30, Mar. 2001.