# Design of High Performance FFT Algorithm in OFDM Communication System

K Ashok Varma<sup>1</sup> B Leela Kumari<sup>2</sup>

<sup>1,2</sup>Department Of Electronics and Communication Engineering, UCEK,JNTU College Of Engineering, Kakinada, A.P., India.  $\frac{1}{2}$ varma3628@gmail.com <sup>2</sup>  $l$ leela8821@yahoo.co.in

*Abstract***— Communication system is limited by three major factors, System capacity, higher data rates and interference. To meet these requirements, 802.11-based wireless LANs has fascinated. Among them, the 802.11a standard based on OFDM (Orthogonal Frequency Division Multiplexing) modulation scheme is defined. To implement OFDM System, FFT (Fast Fourier Transform) and IFFT (Inverse Fast Fourier Transform) are the major hardware requirements. This paper presents the design of a 256 point modified** Radix2<sup>2</sup> **pipelined FFT/IFFT processor to achieve the higher data rates. The design has been coded in Verilog and the simulations is carried out using Modal Sim 6.2c and synthesis is carried out by using Xilinx 9.2i.techniques.**

*Keywords*—**Radix2<sup>2</sup> Algorithm; Quadrature Phase Shift Keying; Orthogonal Frequency Division Multiplexing.**

#### I. INTRODUCTION

The OFDM modulation is realized by the Inverse Fast Fourier Transform (IFFT) that enables the use of a large number of subcarriers accommodated within each OFDMA symbol. Before transmission, each OFDMA symbol is extended by its cyclic prefix at the transmitter. At the receiver end, cyclic prefix is discarded and OFDM demodulation is applied through the Fast Fourier Transform (FFT).

The Fast Fourier Transform (FFT) is most efficient algorithm to compute the Discrete Fourier Transform (DFT) and performs most important operations in modern digital signal processing and OFDM communication systems [1]. Classical implementation of the FFT/IFFT architecture, with digital signal processors (DSPs), requires a sequential algorithm. This increases the execution time. On the other side, the present programmable circuits, like an FPGA, uses a tens of thousands of lists and triggers during process, resulting of parallel processing system, puts the FPGA computing speed at a significant advantage over DSPs.

The Radix $2<sup>2</sup>$  algorithm [2] is used to clearly reflect the structural relation with radix-2 algorithm [3] and the identical computational requirement with radix-4 algorithm [6].

This paper gives the implementation of a 256 point modified Radix2<sup>2</sup> pipelined FFT/IFFT processor [4-5]. And also focuses on the QPSK for data constellation in OFDM.

## II. MODIFIED DIF FFT ALGORITHM

The Discrete Fourier Transform (DFT) is purely discrete: discrete-time data sets are converted into a discrete-frequency representation. Then the DFT of a sequence  $β(n)$  is defined  $as:\alpha(K), k=0, 1, ..., N-1, n=0, 1, ..., N-1.$ 

$$
\alpha(K) = \sum_{n=0}^{N-1} \beta(n) W_N^{nk}, 0 \le K < N \tag{1}
$$

Where T = nK,  $W_N = e^{-j2\pi/N}$  denotes the primitive *Nth* root of unity. The input sequence  $β(n)$  and its DFT isα(K).A 3dimensional linear index map is applied by

$$
n\ =\ < D + n_3\ >
$$

$$
K = \langle\,K_1 + 2C\,\rangle
$$

This yield

$$
X(K_1+2C) = \sum_{n_3=0}^{N-1} \sum_{n_2=0}^{1} \sum_{n_1=0}^{1} x(D+n_3)W_N(D+n_3)(K_1+2C)
$$

$$
X(K_1+2C) = \sum_{n_2=0}^{\frac{N}{4}-1} [M(K_1, K_2, n_3) W_N^P W_N^N n_2 k_2 \qquad (2)
$$

Where,  $P = n_3(k_1 + 2k_2)$  and  $M(k_1, k_2, n_3)$  is given

$$
M(k_1, k_2, A) = [x(A-(N/2)) + (-1)^{K_1}X(A)]
$$
  
+ 
$$
(-j)^{(A-\frac{N}{2})}[X(A) + (-1)^{K_1}X(A+(N/4))]
$$
 (3)

After this simplification, a set of four DFTs of length *N/4*  is obtained. The terms in equation (3) represent a Radix-2 butterfly (BFI) [2], and the entire equation (3) also represents Radix-2 butterfly(BFII) [2].

Where, A=n<sub>3</sub>+(N/2),D=(N/2)n<sub>1</sub>+(N/4)n<sub>2</sub>,C=2K<sub>2</sub>+4K<sub>3</sub>. The *N*-point FFT processor has  $log_2(N)$  stages with *i* is the stage number. Each stage consists of BFI, BFII, delay-feedback and TFM. If *N* is power of 2, the last stage is formed by BFI only. But if *N* is power of 4, the last stage formed by BFI and BFII. The formation of the last stage is different according to the size of FFT.

#### A. *BF-I Structure*

The detailed structure of BFI is shown in Fig.1. The first N/2 elements are applied to the first multiplexer and next N/2 elements are applied to the second multiplexer through pipeline registers. When both the multiplexer are switches to position ‗0' both the real parts are add/subtracted and stored in a shift register. When both the multiplexers are switches to position ‗1' both the imaginary parts add/subtracted and performs addition operation with data stored in the shift register.



Fig: 1. BFI for  $R2<sup>2</sup>$  pipeline FFT.

#### B. *BF-II Structure*

The multiplication by  $-i$  involves inversion of sign and real-imaginary swapping. The detailed structure of BFII is shown in Fig.2. The multiplexors are used to swapping the real-imaginary terms and the sign inversion is handled by switching the adding-subtracting operations by mean of multiplexing. All multiplexors switches to position "1", the real-imaginary data are swapped and the adding-subtracting operations are switched when there is a need for multiplication by  $-j$ . Where m = n+(N/2)



#### C. *TFM structure*

A six clock cycle fully-pipelined complex-multiplier has been implemented to multiply the twiddle factor by the output of BFII. According to equation (4), the algorithm of multiplying the twiddle factor  $(a + jb)$  by BFII output  $(Xr +$ *jXi*) uses four multipliers, one adder, and one subtractor.

$$
(X_r + jX_i).(a + jb) = (X_r.a + X_i.b) + (X_i.a + X_r.b)
$$
 (4)

Twiddle factor generator is a key component in IFFT/FFT computation. There are many popular generation methods for twiddle factor generation. The i<sup>th</sup> stage twiddle factors, with  $i = 0.1$  (log N) 2, is given by

$$
i = 0, 1, \dots, (\log_4 N)^2 - 2
$$
 is given by  
M<sub>i</sub> = {K<sub>x</sub>}; x = 0, 1, \dots, N/2<sup>2i</sup> with K<sub>x</sub> = e<sup>-jT</sup>

## D. *Delay-Feedback Structure*

In Radix- $2^2$  signal flow graph at stage 0 the first N/2 elements has to be delayed until the next half of the elements presented, after that the calculations can proceed. If this delay is maintained at each stage it increases the processing time as the word length increases. In order to reduce these delays here pipeline registers are used at every stage preceding to the calculations.

# III. MODIFIED RADIX2<sup>2</sup> FFT ARCHITECTURE

An implementation of the modified Radix- $2^2$  Pipeline architecture is shown in Fig.3. It uses two types of butterflies, one is same as that in R2SDF, the other contains the logic to implement the trivial twiddle factor multiplication, as shown in Fig.1 & 2 respectively. A  $log_2$ <sup>(N)</sup> bit binary counter serves two purposes: address counter for twiddle factor reading in each stages and synchronization controller. The operation of the second butterfly is same as that of the first one and the trivial twiddle factor multiplication has been implemented by real-imaginary swapping with a commutator and controlled add/subtract operations, as in Fig.1&2.



Fig.3. Block diagram for Radix-2<sup>2</sup> Pipeline FFT processor for N=256

#### IV.IMPLEMENTATION OF THE PROPOSED FFT IN OFDM COMMUNICATION SYSTEM

The fundamental principle of the OFDM system [7] is to decompose the high rate data stream (bandwidth=*W*) into *N*  lower rate data streams and then to transmit them simultaneously over a large number of subcarriers. The IFFT is used for modulation and the FFT are used for demodulation. The data constellations on the orthogonal subcarriers in OFDM system is done by QPSK modulation for easy of designing. The transmitter and receiver blocks contain the FFT and IFFT modules. The FFT processor [4] must finish the transform within the time to serve the purpose in the OFDM system. This FFT architecture effectively fits into the system.

#### V. QUADRATURE PHASE SHIFT KEYED (QPSK) MODULATION

An OFDM carrier signal is the sum of a number of orthogonal sub-carriers, with message data on each subcarrier being independently modulated commonly using some type of quadrature phase shift keying (QPSK).This composite message signal is typically used to modulate a main RF carrier. *s*[*n*] is a serial stream of input binary digits. At transmitter

these are first demultiplexed into *N* parallel streams by inverse multiplexing, and each one mapped to a (possibly complex) symbol stream using some modulation constellation (QPSK). Note that the constellations may be different, so some streams may carry a higher bit-rate than others. Each symbol represents two binary bits and these symbols acts as its equivalent complex weight. The value of that symbol determines the amplitude and phase of that particular sinusoid

## VI. IMPLEMENTATION IN VERILOG

 VERILOG Hardware Description Language (VERILOG) was introduced by Gateway Design Automation in 1984 as a proprietary hardware description and simulation language. Logic-circuit structures created by VERILOG synthesis tools directly from VERILOG behavioral descriptions, and target them into a preferred technology for realization. By using VERILOG, It is easy to design, simulate, and synthesize anything form a simple combinational circuit to a complete microprocessor based system on a chip. It started out as documentation and modeling language, allows the behavior of digital-system designs to be precisely specified, simulated and language specification allows multiple modules to be stored in a single text file. All these features of VERILOG will help better in simulation and synthesis of proposed architecture. The OFDM Transmitter and Receiver presented above has been fully coded in VERILOG Hardware Description Language (VERILOG). Once the design is coded in VERILOG, the simulations and synthesis report is carried out by using Modelsim XEIII 6.2c compiler and the Xilinx Foundation ISA Environment 9.2i.

# VII. RESULTS

The radix-4 Single-Path Delay-Commutator (R4SDC) and radix-pipeline architectures provide the highest Computational efficiency and were selected for implementation. The RTL schematic diagram for QPSK modulator is shown in fig.5.



**Fig.5 RTL schematic diagram for QPSK modulator**

It shows that input is binary data. Constellation mapping is done at QPSK in frequency domain. Here, constellation mapper takes group of bits and maps them into specific constellation points these points have both real and imaginary parts that shows the magnitude and phase of the each sub carrier. The R4SDC architecture is interesting due to the computational efficiency of its addition; however the controller design is complex. The Radix- $2^2$  pipeline architecture reduces the delays in all stages and increases the data rates. The device utilization summary for QPSK modulator is listed in table.1.

Computational efficiency is measured by resource utilization [8] percentage how often the resources are in an active state versus an idle state. As shown in the table.1. It gives the resource utilization percentage how often the resources are in an active state versus an idle state. When compare to Radix- $2^2$  FFT, the modified Radix- $2^2$  FFT utilizes the resources effectively. The proposed FFT works with more efficiency required to satisfy real-time processing. The RTL schematic diagram for FFT is shown in fig.6.

# **Table:1 Device utilization summary for QPSK modulator**



The device utilization for FFT of Radix- $2<sup>2</sup>$  algorithm and modified Radix-2 2 algorithm are shown in table: 2 and table: 3

# **Table: 2 Device utilization for FFT using Radix2<sup>2</sup> algorithm [10]**



| Logic utilization                           | <b>Used</b> | <b>Availabe</b> | Utilization |
|---------------------------------------------|-------------|-----------------|-------------|
| <b>Number of Slices</b>                     | 2978        | 3275            | 90%         |
| <b>Number of Slice Flip</b><br><b>Flops</b> | 1678        | 7532            | 22%         |
| Number of 4 input<br><b>LUTs</b>            | 5236        | 6874            | 76%         |
| Number of bonded<br><b>IOBs</b>             | 132         | 172             | 76%         |
| Number of<br>MULT18X18SIOs                  | 9           | 12              | 75%         |
| <b>Number of GCLKs</b>                      |             | 24              | 4%          |

**Table: 3 Device utilization for FFT using modified Radix2<sup>2</sup> algorithm**



**Fig.6 RTL schematic diagram for FFT Table:3 Timing summary comparision table**





**Fig.7. Simulated waveform showing that data transmission in OFDM System**

The table:4 gives the timing summary for two methods. The minimum time required to process the sample with Modified Radix- $2^2$  FFT is 4.78 ns and the maximum operating frequency is 209.03 MHz. When compared to Radix- $2^2$  FFT the processing time is very less in Modified Radix-2<sup>2</sup> FFT with pipeline architecture.

The figure.7 shows the simulated wave form of the OFDM communication system. When mess ready is 'Low' there is no data transmission. When it is 'High', then the input data constellation is done and applied to the IFFT at the transmitter. When Rx\_data is 'High' then the data is received by FFT at the receiver. Pcnt (packet count) indicates that the no. of packets is sent to the receiver.

# VIII. CONCLUSION

This paper describes the design of modified Radix- $2<sup>2</sup>$ pipelined FFT processor with N=256 points to provide the higher data rates in OFDM. The proposed architecture has the pipeline of Radix-2 2 butterfly to speed up clock frequency.In summary; the speed performance of this design easily satisfies most application requirements of mobile WiMAX 802.16e, 802.11-based wireless LANs those uses OFDM modulated wireless communication system. This design gives an easy way to increase the number of points of FFT as well as IFFT by simple modification. Future work can includes the development of complete OFDM system and upgrade it to a multiple input multiple outputs (MIMO) system.

## **REFERENCES**

- [1] Chi-Hong Su and Jen-Ming Wu, "Reconfigurable FFT Design for Low Power OFDM Communication Systems", Proc. of the 2006 IEEE 10th Intern. Symposium on Consumer Electronics (ISCE'06), 28 June - 1 July 2006, Russia, pp.1-4, 2006..
- [2] K. Harikrishna, T. Rama Rao and Vladimir A. Labay, "A RADIX-2 2 Pipeline FFT for Ultra Wide Band Technology", International Conference on Computer & Network Technology (ICCNT), conference proceedings published by World Scientific Press, Singapore. Chennai, India, Jul 24-28, 2009.
- [3] J.-Y. Oh and M.-S. Lim, "New radix-2 to the 4th power pipeline FFT processor," IEICE Transactions on Electronics, vol. E88-C, no. 8, pp. 1740–1746, 2005.
- [4] M.S Minallah and G. Raja, "Real Time FFT Processor Implementation", 2nd International Conference on Emergin Technologies, IEEE—ICET 2006. Peshawar, Pakistan 13-14 November, Pages: 192-195, 2006.
- [5] Ahmed Saeed, M. Elbably, G. Abdelfadeel, and M. I. Eladawy **"**Efficient FPGA implementation of FFT/IFFT Processor‖. Issue 3, Volume 3, 2009.
- [6] Zhijian Sun, Xuemei Liu, and Zhongxing Ji, "The Design of Radix-4 FFT by FPGA," International Symposium on Intelligent Information Technology Application Workshops, 2008, pp.765-768.
- [7] R.Prasad, "OFDM for Wireless CommunicationsSystems", London: Artech House Inc., 2004.
- [8] Y. Wang, Y. Tang, Y. Jiang, J. G. Chung, S. S. Song and M. S. Lim, **"**Novel memory reference reduction methods for FFT implementation on DSP processors," IEEE Trans Signal Processing, vol. 55, no. 5, pp.2338-2349, May 2007.
- [9] http://en.wikipedia.org/wiki/Fast\_Fourier\_transform.
- [10] K.Harikrishna, T. Rama Rao, "FPGA based FFT Algorithm Implementation in WiMAX Communications System", IEEE Trans 2011.