Final Report
Abstract
The project I worked on was the Financial Monte Carlo Simulation on
Architecturally Diverse Systems Project. The simulation will be used
to estimate the risk of a financial portfolio, that is, the
probability that it loses money on the market. The project is
divided into five stages, all of which have been simulated on a
multicore computer and/or a graphics processor. Stages 1 and 2 were
implemented in FPGA code. My assignment was to program the
Field-Programmable Gate Array (FPGA) using a Hardware Description
Language (HDL); the HDL I am familiar with is Verilog. I focused on
moving Stage 3 of the project onto the FPGA. Stage 3 uses a given
equation to multiply a square matrix by a vector.
Introduction
Monte Carlo simulation is used in option pricing and risk management
assessments. This simulation needs to run in real time. To accomplish
this objective, a parallel processing implementation was necessary.
Parallel processing increases the number of computations per unit of
time as opposed to a single-processor implementation.
Results
To accomplish this implementation, it was necessary to use an FPGA
for certain portions of the project. This integrated circuit can be
customized to contain multiple components, which allows parallel
processing. In this project, the Stage 3 component had not yet been
implemented on the FPGA. My task was to come up with a design and
implement it using a Hardware Description Language (HDL).
The language that I used was Verilog. The Stage 3 component design
consisted of one memory for the vector and a second memory, residing
outside the FPGA chip, for the square matrix. The square matrix had
to reside outside the FPGA because it needed more memory than the
FPGA had available. Two 32-bit numbers were streamed into a six-stage
pipelined multiplier. Six clock cycles later, the first product
appeared. The six-stage pipeline was chosen because it gave the
optimum performance according to the Xilinx Core Gen software. After
the multiplication, the resulting 64 bits were fed into an
accumulator, which accumulated the incoming numbers until it was
reset.
Another design that was considered replaced the single-port RAM with
a dual-FIFO system with feedback on each FIFO. The objective was to
keep a continuous stream of data coming in while the computations
were being done. The first FIFO was filled with the first vector;
once it was full, the second FIFO began filling with the second
vector, and while the second vector was filling up, the first vector
started the multiplication process with the matrix. The feedback was
used to reuse the data needed for the matrix and vector
multiplication.
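To make the data flow described above easier to follow, here is a minimal behavioural sketch in Python. It is not the Verilog from the project. The class and function names are hypothetical, and several details are assumptions: the operands are treated as plain unsigned integers, the accumulator is unbounded, the accumulator is reset at the start of each matrix row, and the "feedback" is read as recirculating the stored vector so it can be reused for every row. For simplicity the sketch also drains the multiplier pipeline after each row, which a real design would probably avoid.

```python
from collections import deque

PIPELINE_STAGES = 6  # multiplier latency in clocks, as stated in the report


class PipelinedMultiplier:
    """Multiplier whose product emerges PIPELINE_STAGES clocks after its operands."""

    def __init__(self, stages=PIPELINE_STAGES):
        # One slot per pipeline register; None marks an empty slot (a bubble).
        self.regs = deque([None] * stages)

    def clock(self, operands):
        """Advance one clock. Takes an (a, b) pair or None; returns a product or None."""
        out = self.regs.popleft()
        self.regs.append(None if operands is None else operands[0] * operands[1])
        return out


class FeedbackFifo:
    """FIFO whose output is written back to its input so the contents can be reused."""

    def __init__(self):
        self.data = deque()

    def load(self, values):
        self.data = deque(values)

    def read(self):
        value = self.data.popleft()
        self.data.append(value)  # feedback path: the value re-enters the FIFO
        return value


def multiply_matrix(matrix_rows, fifo):
    """Multiply a square matrix by the vector held in fifo, one row at a time."""
    mult = PipelinedMultiplier()
    result = []
    for row in matrix_rows:
        acc = 0  # assumption: the accumulator is reset at the start of each row
        elements = iter(row)
        remaining = len(row)  # products still to be accumulated for this row
        while remaining:
            a = next(elements, None)  # None once the row is exhausted: drain the pipeline
            product = mult.clock(None if a is None else (a, fifo.read()))
            if product is not None:
                acc += product
                remaining -= 1
        result.append(acc)
    return result


def run(matrix_rows, vectors):
    """Alternate between two FIFOs so the next vector can load during the current multiply."""
    fifos = [FeedbackFifo(), FeedbackFifo()]
    if vectors:
        fifos[0].load(vectors[0])
    outputs = []
    for k in range(len(vectors)):
        active, idle = fifos[k % 2], fifos[(k + 1) % 2]
        if k + 1 < len(vectors):
            idle.load(vectors[k + 1])  # in hardware this would overlap with the multiply below
        outputs.append(multiply_matrix(matrix_rows, active))
    return outputs


if __name__ == "__main__":
    print(run([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # prints [[17, 39], [23, 53]]
```

Because each row reads the FIFO exactly as many times as the vector has elements, the feedback path leaves the vector in its original order for the next row. That holds only while the matrix is square and the same size as the vector.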
Conclusion
At the conclusion of my eight weeks on the project, the multiplier
and accumulator had been debugged and simulated, but the dual-FIFO
system still needed to be simulated. Both of these parts still need
to be implemented on an actual FPGA to test whether the hardware's
behavior exactly matches the behavior seen in simulation.
You can
reach me at fernandez.andres@sbcglobal.net