IMPLEMENTATION OF FPGA-BASED RISC FOR LNS ARITHMETIC BY SOFTWARE & HARDWARE
Abstract
Field Programmable Gate Arrays (FPGAs) have some difficulty with the implementation of floating-point operations. In particular, the large number of slices needed by floating-point multipliers prohibits incorporating floating point into smaller, less expensive FPGAs. An alternative is the Logarithmic Number System (LNS), in which multiplication and division are easy and fast. LNS also has the advantage of lower power consumption than fixed point. The problem with LNS has been the implementation of addition. There are many price/performance tradeoffs in the LNS design space between pure software and specialised high-speed hardware. This paper focuses on a compromise between these extremes: a small RISC core design (loosely inspired by the popular ARM processor) in which an additional investment of only 4 percent in FPGA resources, beyond that required for the integer RISC core, more than doubles the speed of LNS addition compared to a pure software approach. This approach shares resources in the data path with the non-LNS parts of the RISC, so that the only significant cost is the decoding and control for the LNS instruction. The preliminary experiments suggest that modest LNS-FPGA implementations, like the algorithms under consideration, are more cost effective than pure software and can be as cost effective as more expensive LNS-FPGA implementations that attempt to maximise speed.
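The contrast between cheap LNS multiplication and costly LNS addition can be illustrated with a minimal Python sketch. It is not the paper's RISC implementation: the function names are hypothetical, values are restricted to positive reals (sign handling is omitted), and floating-point `math.log2` stands in for the fixed-point logarithm that a hardware or software LNS would store. Multiplication and division reduce to adding or subtracting logarithms, while addition needs the nonlinear function log2(1 + 2^z), which practical designs typically approximate with tables or interpolation.

```python
import math

def to_lns(x):
    """Encode a positive real as its base-2 logarithm (assumption: no sign bit)."""
    return math.log2(x)

def from_lns(lx):
    """Decode an LNS value back to an ordinary real."""
    return 2.0 ** lx

def lns_mul(lx, ly):
    """Multiplication in LNS is a single addition of the logarithms."""
    return lx + ly

def lns_div(lx, ly):
    """Division in LNS is a single subtraction of the logarithms."""
    return lx - ly

def lns_add(lx, ly):
    """Addition in LNS: log2(X + Y) = hi + log2(1 + 2**(lo - hi)).

    The log2(1 + 2**z) term (with z <= 0) is the expensive part; it is
    evaluated directly here instead of with a table or interpolator.
    """
    hi, lo = max(lx, ly), min(lx, ly)
    z = lo - hi
    return hi + math.log2(1.0 + 2.0 ** z)
```

For example, `from_lns(lns_mul(to_lns(3.0), to_lns(4.0)))` recovers approximately 12, and `from_lns(lns_add(to_lns(3.0), to_lns(4.0)))` recovers approximately 7, up to floating-point rounding.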