# High Performance Approximate Multiplier using Reversible Logic Gates

M.Narendra kumar<sup>1</sup>, K.Lakshmi Narayana<sup>2</sup>, Dr.Ajaykumar Dharmireddy\*<sup>3</sup>

<sup>1</sup>Assistant Professor, <sup>2</sup>Sr. Assistant Professor, <sup>3</sup>Associate Professor

<sup>1,2,3</sup> Department of Electronics and Communication Engineering,

<sup>1,2,3</sup>Sir C.R.Reddy College of Engineering,

Eluru-534007 Andhra Pradesh, INDIA

<sup>1</sup>sowmithri@gmail.com, <sup>2</sup>narndra@gmail.com, \*<sup>3</sup>ajaybabuji@gmail.com

Abstract: Conventional irreversible logic has been shown to dissipate a significant amount of energy because of the information loss inherent in standard design methods. This work describes an approximate multiplier built with reversible logic gates. In this design, reversible logic gates replace the half adders and full adders in the multiplier. It uses two RGs (Reversible Gates) in place of a single reversible gate, which reduces the garbage outputs produced and helps to decrease the overall delay and power consumption. The proposed approximate multiplier replaces the least significant half of the product with a constant compensation term, while the remaining half is calculated precisely. It can be an effective alternative to exact multipliers in practical error-resilient applications such as digital image processing.

Keywords: Reversible logic; approximate multiplier; low power; error-resilient

#### I. INTRODUCTION

Circuit designers have been forced to provide novel approaches at different levels of design abstraction because of the digital circuit industry's explosive growth and the subsequent rise in the density and complexity of digital integrated circuits at deep nanoscale dimensions [1]. One practical method for developing energy-efficient nanoscale digital circuitry is approximate computing. In many real-world applications, precision is not the primary concern. Precise calculations are not always necessary in error-resilient applications, including multimedia, voice and image recognition, and artificial neural networks (ANNs) [2]. These systems can tolerate errors while still generating useful results that remain acceptable to human perception. Therefore, reducing accuracy by a measurable, acceptable amount can improve circuit metrics [3].

Multiplication is one of the basic arithmetic operations in microprocessors and digital signal processing units [4]. Furthermore, multipliers are widely used in neural networks. The convolutional layers of convolutional neural networks (CNNs) [5] frequently employ multiply-accumulate (MAC) operations. Within a MAC operation, multiplication is significantly more complex than addition and consumes the most resources. Therefore, reducing the energy consumption and hardware cost of multipliers is essential.

The best strategy for trading accuracy for hardware economy in error-tolerant applications is to design effective approximate multipliers [6]. Generally speaking, there are two ways to construct approximate multipliers. In the first, approximate adders and compressors are introduced into the conventional multiplier structure; in the second, the structure of the multiplier itself is changed to create an approximate design [7]. One effective method for reducing errors in these multipliers is to use error compensation modules (ECMs).

By combining suitable truncation with efficient ECMs, effective approximate multipliers can be produced [8]. A new approximate multiplier with a highly effective error compensation module is proposed in this paper. The proposed design has a very low-complexity structure and greatly lowers energy consumption compared to competing designs [9]. For error-tolerant applications such as image processing and neural networks, it offers sufficiently high accuracy [10]. The suggested multiplier consists of three main parts: an accurate section, a novel and effective error compensation module, and a constant-truncated section that trades precision for hardware savings.
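The idea of an accurate section combined with a constant-truncated section can be illustrated with a minimal behavioural sketch. It is not the circuit proposed here: the function name `approx_multiply`, the 8-bit width, the choice of which columns are truncated, and the default compensation value are all assumptions made for illustration, and the error compensation module is omitted.

```python
from typing import Optional


def approx_multiply(a: int, b: int, n: int = 8,
                    compensation: Optional[int] = None) -> int:
    """Behavioural model of a truncated n-bit unsigned multiplier.

    Partial products in columns 0..n-1 (the least significant half) are
    dropped and replaced by one constant; the other columns are summed exactly.
    """
    if compensation is None:
        # Assumption: use the expected value of the dropped bits when the
        # input bits are uniform, so each partial product bit is 1 with
        # probability 1/4. Column k holds k+1 partial products of weight 2^k.
        compensation = sum((k + 1) << k for k in range(n)) // 4

    upper = 0
    for i in range(n):
        for j in range(n):
            if i + j >= n:  # accurate section
                bit = ((a >> i) & 1) & ((b >> j) & 1)
                upper += bit << (i + j)
    return upper + compensation


if __name__ == "__main__":
    for a, b in [(200, 100), (255, 255), (17, 3)]:
        print(a, b, a * b, approx_multiply(a, b))
```

With these assumptions the dropped partial products of an 8-bit multiplier sum to at most 1793, so the absolute error of this sketch never exceeds 1345 on a product range of 0 to 65025.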

As a result, we propose an error compensation module that accounts for errors in the circumstances where they are most likely to occur. The architecture of the four ECM uses a total of 20 transistors and two four-input OR gates [11]. Additionally, because the absolute error distance (ED) for each input equals the number of 1s in that input, an input with a larger error distance has a lower probability of occurring, assuming that the input bits to the multiplier are uniformly distributed.

The applicability of the suggested approximate multiplier to neural network and image processing applications is also investigated. The findings show that the suggested architecture provides a very good balance between accuracy and hardware efficiency for error-tolerant applications [12]. Numerous signed and unsigned approximate multipliers have previously been introduced in the literature. We have decided to focus on unsigned multipliers in this paper since they are more common in the majority of approximate computing applications [13], such as image processing and machine vision [14]. Approximate multipliers usually fall into two types. The first type covers designs without an error compensation module (ECM). The second covers designs with ECMs, which can significantly reduce the error.

The idea of approximate computing has become a viable way to deal with the trade-off between accuracy, energy, and performance [15]. Approximate multipliers, which purposefully incorporate controlled errors into the multiplication process, have drawn attention because of their potential to drastically lower power consumption while preserving acceptable levels of accuracy [16]. These approximate multipliers take advantage of the inherent redundancy in many applications, where small errors can be compensated for by later processing steps or error-tolerant algorithms.

A new hardware design called the "High Performance Approximate Multiplier using Reversible Logic Gates" uses error compensation techniques to improve the accuracy of approximate multiplication. The fundamental component of this method is a modified multiplier design that purposefully adds controlled approximations to the multiplication procedure. These approximations are tuned to significantly lower the multiplier's energy usage while minimizing their effect on the final output.

#### II. LITERATURE SURVEY

Momeni, Han et al. [16] note that approximate (or inexact) computing provides a compelling paradigm for digital processing at nanometric scales, and that it is particularly interesting for computer arithmetic designs. Their paper analyses and designs two novel approximate 4-2 compressors for use in a multiplier. These designs rely on different compression features so that the imprecision of the computation (measured by the error rate and the so-called normalized error distance) can be balanced against circuit-based figures of merit such as transistor count, delay, and power consumption. Four distinct schemes for employing the proposed approximate compressors in a Dadda multiplier are proposed and analysed. An application of the approximate multipliers to image processing is shown, along with extensive simulation results.

A. G. M. Strollo et al. [17] propose a low-energy approximate multiplier based on a new approximate 4-2 compressor. The suggested compressor has a low probability of error, and its error conditions are detectable. As has already been demonstrated in the literature, this allows error recovery to be implemented when the compressor is used in the partial product reduction phase of a multiplier. According to simulation results, the proposed approximate multipliers show a reasonable decrease in mean error distance and maximum error distance compared to prior art. Peak signal-to-noise ratio improves by roughly 8 dB when they are applied to an image processing task. According to implementation results in 28 nm CMOS, the electrical performance of multipliers built with the new circuit is comparable to that obtained with previously proposed approximate compressors.

M. S. Ansari et al. [18] report that, according to the mean relative error distance (MRED), their most accurate 16×16 unsigned designs have a power-delay product (PDP) that is 44% lower than that of other designs with comparable accuracy. Compared with other approximate Booth multipliers of equivalent precision, the radix-4 signed Booth multiplier built using the suggested compressor yields a 52% decrease in the PDP-MRED product. In image sharpening and Joint Photographic Experts Group (JPEG) compression applications, the proposed multipliers perform better than previous approximate designs, producing outputs of higher quality while using less energy. The authors also demonstrate for the first time that approximate multipliers are applicable and feasible in multiple-input multiple-output antenna communication systems with error control coding.

D. Esposito et al. [19] describe approximate computing as an emerging trend in digital design that trades the need for exact computation for increased speed and efficiency. To create effective approximate multipliers, their study proposes new approximate compressors and an algorithm for using them. Approximate multipliers for several operand lengths were designed using the suggested method and a 40-nm library. According to the comparison, the proposed circuits offer better power or speed for a target precision than previously described approximate multipliers. The paper also includes applications to adaptive least mean squares filtering and image filtering.
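The error metrics mentioned in the works above (error rate, error distance, normalized error distance, and MRED) can be computed exhaustively for small operand widths. The sketch below uses commonly used definitions, which are assumptions here rather than formulas taken from the cited papers: the normalized mean error distance divides the mean error distance by the largest exact product, and MRED averages the relative error over the inputs whose exact product is non-zero. The function name `error_metrics` and the example multiplier are hypothetical.

```python
def error_metrics(approx, n_bits: int = 4) -> dict:
    """Exhaustively compare approx(a, b) with the exact unsigned product."""
    max_product = (2 ** n_bits - 1) ** 2
    eds, reds = [], []
    for a in range(2 ** n_bits):
        for b in range(2 ** n_bits):
            exact = a * b
            ed = abs(approx(a, b) - exact)   # error distance for this input
            eds.append(ed)
            if exact:                        # relative error needs exact != 0
                reds.append(ed / exact)
    med = sum(eds) / len(eds)
    return {
        "error_rate": sum(1 for ed in eds if ed) / len(eds),
        "MED": med,                          # mean error distance
        "NMED": med / max_product,           # normalized mean error distance
        "MRED": sum(reds) / len(reds),       # mean relative error distance
        "max_ED": max(eds),
    }


if __name__ == "__main__":
    # Hypothetical approximate multiplier: the product with its LSB cleared.
    print(error_metrics(lambda a, b: (a * b) & ~1))
```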

#### III. PROPOSED METHOD

The multiplication process has three main steps: partial product generation, partial product reduction, and final addition.
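The three steps can be sketched in software as follows. This is a behavioural illustration only: the function name and the use of word-level 3:2 carry-save reduction are assumptions, and the sketch does not model the compressors or reversible gates of the proposed design.

```python
def multiply_by_partial_products(a: int, b: int, n: int = 4) -> int:
    """Unsigned n-bit multiplication in three explicit steps."""
    # Step 1: partial product generation (one shifted row per bit of b).
    rows = [(a if (b >> j) & 1 else 0) << j for j in range(n)]

    # Step 2: partial product reduction. Each pass replaces three rows with a
    # sum row and a carry row (carry-save addition) until two rows remain.
    while len(rows) > 2:
        x, y, z = rows[:3]
        sum_row = x ^ y ^ z
        carry_row = ((x & y) | (x & z) | (y & z)) << 1
        rows = [sum_row, carry_row] + rows[3:]

    # Step 3: final addition of the remaining rows.
    return sum(rows)


if __name__ == "__main__":
    print(multiply_by_partial_products(13, 11))
```

The reduction step preserves the total because, bit by bit, three equal-weight bits sum to one bit of the same weight plus one carry bit of the next weight.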

In the partial product reduction step, we use reversible logic gates to enhance the multiplier's performance. To reduce the critical path delay of the multiplier, we use a low-power design technique, namely pipelining. The pipelining transformation shortens the critical path, which can be exploited either to increase the clock or sample speed or to reduce power consumption at the same speed. Pipelining reduces the effective critical path by introducing pipelining latches along the data path.
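The effect of pipelining latches on the critical path can be shown with a simple timing model. The block delays below are hypothetical values chosen for illustration, not measurements of the proposed design, and the function name `critical_path` and the optional latch overhead are likewise assumptions.

```python
def critical_path(delays, cuts=(), latch_overhead=0.0):
    """Longest register-to-register delay of a chain of combinational blocks.

    delays: delay of each block along the data path, in ns.
    cuts:   indices where a pipelining latch is placed before delays[index].
    """
    bounds = [0, *sorted(cuts), len(delays)]
    segments = [sum(delays[lo:hi]) for lo, hi in zip(bounds, bounds[1:])]
    overhead = latch_overhead if cuts else 0.0
    return max(segments) + overhead


if __name__ == "__main__":
    blocks = [2.0, 3.0, 2.5]   # hypothetical: generation, reduction, addition
    print(critical_path(blocks))                 # no latches
    print(critical_path(blocks, cuts=[1, 2]))    # a latch after each block
```

In this model the unpipelined path is the sum of all block delays, while the pipelined path is set by the slowest segment between latches, at the cost of extra latency in clock cycles.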



Fig. 1. Exact Compressor

Consider the inaccurate compressors in the left nibble of the constant-truncated region. In that situation, having the approximate compressors produce Carry and Sum equal to all inputs is comparable. A 4:2 compressor, often referred to as a "4-to-2 compressor", is a digital logic circuit used in the partial product reduction stage of multipliers in digital signal processing applications. It reduces the number of equal-weight bits that must be added: four input bits, together with a carry-in, are compressed into one sum bit of the same weight and two carry bits of the next higher weight. Applied column by column, this reduces four rows of partial products to two, which shortens the reduction stage. An exact 4:2 compressor preserves the arithmetic sum of its inputs for every input combination.
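As an illustration, an exact 4:2 compressor is commonly built from two cascaded full adders. The sketch below assumes that structure, which may differ from the circuit of Fig. 1; the function names are hypothetical.

```python
from itertools import product


def full_adder(a: int, b: int, c: int):
    """Return (sum, carry) of three one-bit inputs."""
    return a ^ b ^ c, (a & b) | (b & c) | (a & c)


def compressor_4_2(x1: int, x2: int, x3: int, x4: int, cin: int):
    """Exact 4:2 compressor: returns (sum, carry, cout)."""
    s1, cout = full_adder(x1, x2, x3)      # first adder; cout leaves the cell
    s, carry = full_adder(s1, x4, cin)     # second adder uses the carry-in
    return s, carry, cout


if __name__ == "__main__":
    # The five input bits always equal sum + 2 * (carry + cout).
    for bits in product((0, 1), repeat=5):
        s, carry, cout = compressor_4_2(*bits)
        assert sum(bits) == s + 2 * (carry + cout)
    print("exact for all 32 input combinations")
```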

#### A. Reversible concept:

Reversible computing originated when the thermodynamics of information processing showed that conventional irreversible circuits unavoidably generate heat because information is lost during computation. Different physical phenomena can be exploited to construct reversible circuits that avoid these energy losses. One of the most attractive architectural goals is to build energy-lossless, small, and fast quantum computers. Most of the gates used in digital design, for example NAND, OR, and EXOR gates, are not reversible. A reversible circuit or gate generates a unique output vector for each input vector and vice versa, i.e., there is a one-to-one correspondence between the input and output vectors. Thus, the number of outputs of a reversible gate or circuit is the same as the number of inputs, and the NOT gate is the only commonly used traditional gate that is reversible. Each reversible gate has an associated cost called its quantum cost: the number of 2\*2 reversible gates or quantum logic gates required to realize it. One of the most important features of a reversible gate is its garbage outputs, i.e., every output of the gate that is not used as an input to another gate or as a primary output. In digital design, energy loss is an important performance parameter. Part of the energy dissipation is related to the non-ideality of switches and materials. Higher levels of integration and new fabrication processes have dramatically reduced heat loss over the last decades. The power dissipation in a circuit can be reduced further by the use of reversible logic.



Fig. 2. Toffoli Gate

A set of reversible gates is needed to design a reversible circuit. Several such gates have been proposed over the past decades.



Fig. 3. Feynman Gate

Arithmetic circuits such as adders, subtractors, multipliers, and dividers are the essential blocks of a computing system. Dedicated adder/subtractor circuits are required in a number of digital signal processing applications. Several designs for binary adders and subtractors based on reversible logic have been investigated. Research in reversible logic focuses on minimizing the number of reversible gates, the quantum cost, and the garbage inputs/outputs.
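One well-known way to build reversible adders uses the Peres gate, which maps (A, B, C) to (A, A xor B, AB xor C). The sketch below is an illustration under that assumption; the text does not state which gates realize the RHA and RFA, so this should not be read as the authors' circuit. The function names are hypothetical.

```python
from itertools import product


def peres(a: int, b: int, c: int):
    """Peres gate: (A, B, C) -> (A, A xor B, (A and B) xor C)."""
    return a, a ^ b, (a & b) ^ c


def reversible_half_adder(a: int, b: int):
    """One Peres gate with constant input 0; the first output is garbage."""
    _garbage, s, carry = peres(a, b, 0)
    return s, carry


def reversible_full_adder(a: int, b: int, cin: int):
    """Two cascaded Peres gates; two garbage outputs, one constant input."""
    _g1, p, g = peres(a, b, 0)         # p = a xor b, g = a and b
    _g2, s, cout = peres(p, cin, g)    # s = p xor cin, cout = (p and cin) xor g
    return s, cout


if __name__ == "__main__":
    for a, b, cin in product((0, 1), repeat=3):
        s, cout = reversible_full_adder(a, b, cin)
        assert a + b + cin == s + 2 * cout
    print("full adder correct for all 8 input combinations")
```

In this construction the full adder needs one constant input and produces two garbage outputs, which are the quantities that reversible designs try to minimize.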

#### B. Reversible logic gates

A reversible logic gate is an n-input, n-output logic device with a one-to-one mapping. This means that the outputs can be determined from the inputs and the inputs can be uniquely recovered from the outputs. In the synthesis of reversible circuits, direct fan-out is not allowed because a one-to-many mapping is not reversible; fan-out in reversible circuits is instead achieved using additional gates. A reversible circuit should be designed using the minimum number of reversible logic gates. From the point of view of reversible circuit design, there are many parameters for determining the complexity and performance of circuits. A gate with k inputs and k outputs is called a k\*k gate. A gate or circuit that does not lose information is called reversible.
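The one-to-one property can be checked by enumerating a gate's truth table. The sketch below does this for the Feynman and Toffoli gates of Figs. 2 and 3, using their standard definitions, and shows how a Feynman gate with a constant 0 input provides fan-out. The helper name `is_reversible` is hypothetical.

```python
from itertools import product


def feynman(a: int, b: int):
    """Feynman (controlled-NOT) gate: (A, B) -> (A, A xor B)."""
    return a, a ^ b


def toffoli(a: int, b: int, c: int):
    """Toffoli gate: (A, B, C) -> (A, B, C xor (A and B))."""
    return a, b, c ^ (a & b)


def and_gate(a: int, b: int):
    """Conventional AND gate: two inputs, one output."""
    return (a & b,)


def is_reversible(gate, n_inputs: int) -> bool:
    """True if the gate has n outputs and maps distinct inputs to distinct outputs."""
    outputs = [tuple(gate(*bits)) for bits in product((0, 1), repeat=n_inputs)]
    same_width = all(len(out) == n_inputs for out in outputs)
    return same_width and len(set(outputs)) == len(outputs)


if __name__ == "__main__":
    print("Feynman reversible:", is_reversible(feynman, 2))
    print("Toffoli reversible:", is_reversible(toffoli, 3))
    print("AND reversible:", is_reversible(and_gate, 2))
    # Fan-out: with B fixed to 0, both Feynman outputs carry a copy of A.
    print("copy of 1:", feynman(1, 0))
```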



Fig. 4. Reversible Gate

Reversible computation is related to other emerging technologies, such as quantum computation, optical computing, and nanotechnologies, that use a similar or slightly extended set of gates. First implementations and fabrications of reversible logic in CMOS technology have also been accomplished. These exploit the fact that reversible logic is particularly suitable for reusing signal energy (in contrast to static CMOS logic, which sinks the signal energy with each gate) and for adiabatic switching, which switches transistors in a more energy-efficient way.



Fig. 5. RHA (Reversible Half Adder)

In fact, such implementations have shown that reversible circuits have the potential to reduce energy consumption. A drawback of these implementations comes from another law related to transistors, namely that the energy consumption is directly related to the execution frequency. If one performs many computations every second, the energy consumption per computation rises. Performing fewer computations lowers the energy consumption per computation. Of course, this implies that not all applications are necessarily suited for implementation using reversible circuits. However, many embedded devices do not need to perform billions of computations every second.



Fig. 6. RFA (Reversible Full Adder)

The organization of an instruction pipeline is more efficient if the instruction cycle is divided into segments of equal duration. One of the most common examples of this type of organization is the four-segment instruction pipeline.

A four-segment instruction pipeline combines two or more different operations into a single segment. For instance, the decoding of the instruction can be combined with the calculation of the effective address into one segment. In a typical four-segment instruction pipeline, the instruction cycle is completed in the following four segments.

Segment 1: The instruction fetch segment can be implemented using a first in, first out (FIFO) buffer.

Segment 2: The instruction fetched from memory is decoded in the second segment, and the effective address is then calculated in a separate arithmetic circuit.

Segment 3: An operand is fetched from memory in the third segment.

Segment 4: The instruction is finally executed in the last segment of the pipeline organization.
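The timing of such a pipeline can be sketched with a small space-time table. The sketch assumes an ideal pipeline with one clock cycle per segment and no stalls or branches, in which case n instructions finish in k + n - 1 cycles for k segments. The function names are hypothetical.

```python
def pipeline_cycles(n_instructions: int, k_segments: int = 4) -> int:
    """Clock cycles needed by an ideal k-segment pipeline without stalls."""
    if n_instructions <= 0:
        return 0
    return k_segments + n_instructions - 1


def space_time(n_instructions: int, k_segments: int = 4) -> dict:
    """Map each clock cycle to {segment: instruction}, all numbered from 1."""
    table = {}
    for instr in range(1, n_instructions + 1):
        for seg in range(1, k_segments + 1):
            table.setdefault(instr + seg - 1, {})[seg] = instr
    return table


if __name__ == "__main__":
    print("cycles for 6 instructions:", pipeline_cycles(6))
    for cycle, busy in sorted(space_time(6).items()):
        print(cycle, busy)
```

Without pipelining the same six instructions would need 4 × 6 = 24 cycles under the same assumptions, compared with 9 cycles here.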



Fig. 7. Reversible Full Adder used in the Proposed Method

#### IV. RESULTS AND DISCUSSION

In our project, we design an approximate multiplier with the help of reversible logic gates and the pipelining technique. The simulation results and RTL view of the proposed design are shown in Fig. 8 and Fig. 9. Approximate multipliers are used in error-resilient applications such as image processing, and they are recommended in domains where an exact output is not necessary.

Fig. 8. Simulation results of approximate multiplier

Fig. 9. RTL Schematic

#### A. Power Consumption:

Power refers to the power, in watts, consumed by the design as reported for the implementation. The proposed method shows a decrease of 5.573 W in power consumption compared to the existing method.



Fig. 10. Power consumption in proposed method

#### B. Area Utilization:

Area refers to the physical resources on the FPGA device that are consumed by the synthesized and implemented design. The utilization metrics provided by Vivado help designers assess how efficiently their design uses the resources available on the target FPGA device, enabling optimization for factors like performance, power, and cost. The proposed method shows a decrease of 40 LUTs in area compared to the existing method.



Fig. 11. Area utilization in proposed method

#### C. Delay:

Delay refers to the time it takes for signals to propagate through the simulated circuit. It includes various factors such as gate delays, interconnect delays, and propagation delays caused by the behavior of the digital components in the design. There is a decrease of 2.816 ns in delay compared to the existing method.



Fig. 12. Delay in proposed method

#### V. COMPARATIVE ANALYSIS OF EXISTING AND PROPOSED DESIGN

Table I compares the area, delay, and power obtained using the proposed method and the existing method.

TABLE I. COMPARISON TABLE BETWEEN PROPOSED AND EXISTING METHOD

| Method   | Area (No. of LUTs) | Delay (ns) | Power (W) |
|----------|--------------------|------------|-----------|
| Proposed | 5                  | 5.090      | 1.871     |
| Existing | 45                 | 7.906      | 7.444     |

We can observe significant changes in the results:

- There is a decrease of 40 LUTs in area.
- There is a decrease of 2.816 ns in delay.
- There is a decrease of 5.573 W in power.
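The reductions listed above follow directly from Table I and can also be expressed as percentages of the existing design. The short script below recomputes them from the table values; the dictionary keys and the function name are hypothetical, chosen only for this illustration.

```python
# Values copied from Table I.
TABLE_I = {
    "Proposed": {"area_luts": 5, "delay_ns": 5.090, "power_w": 1.871},
    "Existing": {"area_luts": 45, "delay_ns": 7.906, "power_w": 7.444},
}


def reductions(table: dict) -> dict:
    """Return {metric: (absolute reduction, percentage reduction)}."""
    result = {}
    for metric, existing in table["Existing"].items():
        saved = existing - table["Proposed"][metric]
        result[metric] = (round(saved, 3), round(100 * saved / existing, 1))
    return result


if __name__ == "__main__":
    for metric, (saved, percent) in reductions(TABLE_I).items():
        print(f"{metric}: reduced by {saved} ({percent}%)")
```

Relative to the existing design, this corresponds to reductions of about 88.9% in LUTs, 35.6% in delay, and 74.9% in power.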

#### VI. CONCLUSION

In this project, an approximate multiplier is designed using reversible logic gates, with exact compressors and reversible full adders and half adders. The modified approximate multiplier yields improved parameters when compared with the existing method: the power is reduced by 5.573 W, the delay by 2.816 ns, and the area by 40 slice LUTs. This approach offers improved efficiency and reliability in digital circuits. In conclusion, the approximate multiplier has a high-performance architecture.

#### REFERENCES

- [1] M. Ha and S. Lee, "Multipliers with approximate 4–2 compressors and error recovery modules," IEEE Embedded Syst. Lett., vol. 10, no. 1, pp. 6–9, Mar. 2018.
- [2] Dharmireddy, A., Chakradhar, A., Akram, S.V., Deepak, R.S., Akash, S., Rajasekhar, T. and Anatha, T., 2024, June. Dermatological disease detection and preventative measures using deep convolution neural networks. In AIP Conference Proceedings (Vol. 2971, No. 1). AIP Publishing.

- [3] Ajaykumar Dharmireddy, Sreenivasarao Ijjada , "Performance Analysis of Variable Threshold Voltage (ΔVth) Model of Junction less FinTFET". IJEER,Vol. 11,issue 2, pp.323-327. 2023. DOI: 10.37391/IJEER.110211
- [4] K.Kiran kumar, Ajaykumar Dharmireddy "Design of ALU using Reversible Logic Gates" International Journal For Technological Research In Engineering, Vol. 2, issue.3, 2014.
- [5] Ajaykumar Dharmireddy, Sreenivasa Rao Ijjada, I. Hemalatha, "Performance Analysis of Various Fin Patterns of Hybrid Tunnel FET," International Journal of Electrical and Electronics Research (IJEER), Vol. 10(4), pp. 806–810, 2022.
- [6] J.Mohana Prithvi, Ajaykumar Dharmireddy "Multitrack Simulator Implementation in FPGA for ESM System" International Journal of Electronics Signals and Systems, 29th Sep 2013, pp. 81-84.
- [7] A. Dharmireddy, M. Greeshma, S. Chalasani, S. T. Sriya, S. B. Ratnam and S. Sana, "Azolla Crop Growing Through IOT by Using ARM CORTEX-M0," 2023 3rd International conference on Artificial Intelligence and Signal Processing (AISP), Vijayawada, India, 2023, pp. 1-5, doi: 10.1109/AISP57993.2023.10135032.
- [8] Dharmireddy, A. and Gottipalli, M.D., 2023. Social Networking Sites Fake Profiles Detection Using Machine Learning Techniques. Asian Journal For Convergence In Technology (AJCT) ISSN-2350-1146, 9(3), pp.09-15.
- [9] Dharmireddy, A., Gadi, N., Hundi, S. and Adupa, C., 2023. Street Light Controller Including Automatic Traffic Light Controller System Implementation on FPGA. International Journal on Recent and Innovation Trends in Computing and Communication, 11(9), pp.10-17762...
- [10] Ajay kumar Dharmireddy, P Srinivasulu, M Greeshma, K Shashidhar "Soft Sensor-Based Remote Monitoring System for Industrial Environments" Blockchain Technology for IoT and Wireless Communications, CRC Press, pp.103-112, 2024
- [11] K Shashidhar, Ajay kumar Dharmireddy, Ch Madhava Rao "Anti-Theft Fingerprint Security System for Motor Vehicles" Blockchain Technology for IoT and Wireless Communications, CRC Press, pp.89-102, 2024.
- [12] V. Lakshma Reddy , H. Sudhakar , Ajaykumar Dharmireddy "Realization of Redundant Binary Multiplier with Modified Partial Product Generator Using Verilog" International Journal of Scientific Research in Computer Science, Engineering and Information Technology, Vol.2, Issue 6, Dec 2017, pp. 924-927
- [13] Ajaykumar Dharmireddy, Sreenivasa Rao Ijjada "High Switching Speed and Low Power Applications of Hetro Junction Double Gate (HJDG) TFET" IJEER, Vol.11 issue no.2,pp. 596–600, 2023.
- [14] D.Govardhan Reddy, Ajaykumar Dharmireddy "Design of High Throughput AXI Compliant DDR3 Controller" International Journal of Advance Electrical and Electronics Engineering (IJAEEE), Volume-4 Issue-2, 2015, pp. 31-36
- [15] Dharmireddy, A.K., Ravikumar, M. and Kumar, B.V., 2024. Identifying Chronic Kidney Failure through Machine Learning. i-Manager's Journal on IoT and Smart Automation, 2(1), p.1.
- [16] A. Momeni, J. Han, P. Montuschi, and F. Lombardi, "Design and analysis of approximate compressors for multiplication," IEEE Trans. Comput., vol. 64, no. 4, pp. 984–994, Apr. 2015.
- [17] A. G. M. Strollo, D. De Caro, E. Napoli, N. Petra, and G. Di Meo, "Low-power approximate multiplier with error recovery using a new approximate 4-2 compressor," in Proc. IEEE Int. Symp. Circuits Syst., 2020, pp. 1–4.
- [18] M. S. Ansari, H. Jiang, B. F. Cockburn, and J. Han, "Low-power approximate multipliers using encoded partial products and approximate compressors," IEEE J. Emerg. Sel. Topics Circuits Syst., vol. 8, no. 3, pp. 404–416, Sep. 2018.
- [19] D. Esposito, A. G. M. Strollo, E. Napoli, D. De Caro, and N. Petra, "Approximate multipliers based on new approximate compressors," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 65, no. 12, pp. 4169–4182, Dec. 2018.