Introduction to BDS-pga 2.0
BDS-pga is an efficient BDD-based logic
synthesis and optimization program for LUT-based FPGAs. It is based
on the BDS program from the University of Massachusetts Amherst. The
BDS-pga uses the BDD manipulation functions from the CUDD package
from the University of Colorado Boulder. It takes in the circuit
design netlist file in BLIF format and optimizes the area and delay. The final synthesized
circuit netlist blif files from the BDS-pga can be mapped to LUT-based FPGAs
by using FlowMap tech mapping tool from UCLA VLSI CAD Lab. The BDS-pga program is for academic purposes
use only and was developed in the Department of Electrical and
Computer Engineering at the University of Massachusetts Amherst. IN NO EVENT SHALL THE UNIVERSITY OF MASSACHUSETTS BE LIABLE FOR ANY DAMAGE
CAUSED BY THE USE OF THIS
SOFTWARE.
Developed by: Navin Vemuri, Priyank Kalla, Kevin Oo TinMaung, and Russell Tessier
Contact: Kevin Oo TinMaung (oo@engin.umass.edu)
This research has been funded by Altera Corporation.
Description of BDS-pga 2.0
BDS-pga v2.0 is an updated version of the BDD-based synthesis tool for
LUT based FPGAs first described in:
[1] N. Vemuri, P. Kalla, and R. Tessier,
"BDD-based
Logic Synthesis for LUT-based FPGAs,"
ACM Transactions on Design Automation of Electronic Systems,
vol 7, no. 4, October 2002.
To better understand the operation of BDS-pga v2.0, it is recommended
that the user first read the above paper and the following paper on the BDS synthesis system upon which the BDS-pga system is based:
[2] C. Yang and M. Ciesielski, "BDS:
BDD-based Logic Optimization System," IEEE Transactions on Computer-Aided Design of Integrated Circuits and
Systems, vol. 21, no. 7, July 2002.
Basic operation
A detailed discussion of the algorithms used by BDS-pga is included in
the comments at the beginning of files main.c, eliminate.c, lopt.c, and
loptutil.c. We provide a brief summary of the basic flow of BDS-pga below to assist the user.
As described on [1], the goal of BDS-pga is to target BDD-based synthesis to FPGA devices based on K-input lookup tables. Synthesis
with BDS-pga is performed on each input netlist without first performing traditional
AND-OR based logic optimization
(such as SIS optimization found in scripts script.rugged and script.delay). The output of BDS-pga is a netlist in blif format which
can be input to FlowMap to complete the technology mapping process.
FlowMap must be run without the
"tech_decomp" and "dmig" options since they restructure input circuits and would modify
the optimizations of BDS-pga. Note that BDS-pga (like BDS) only operates on the combinational portions of circuits.
Area based decomposition
BDS-pga has been created by modifying the basic BDD-based synthesis algorithms presented in BDS. The following steps describes
the processing used for area-based BDS-pga optimization.
- A design is inputted in blif format and parsed by BDS-pga.
- A maximum fanout free cone (MFFC) based eliminate (collapse) approach is
used on the input gate-level netlist to identify logic that can be targeted to LUTs. This approach creates either global or
local nodes for design logic. A global node has a single output, all associated inputs, and all logic needed for the output. Local
nodes contain a subset of required logic for an output. The eliminate procedure is controlled by cost metrics which evaluate
node input count and BDD size.
- Decomposition approaches, which are based on BDS decomposition [2], are applied. BDD nodes,
created by eliminate, are decomposed in an iterative fashion until 2-input, 1-output nodes remain. Each BDD node is iteratively placed in a
queue to await decomposition. The basic BDS steps applied to each BDD node [2] in order include:
- Simple dominator approaches for algebraic decompositions including AND (1-dominator), OR (0-dominator), and XNOR (x-dominator) are applied.
- Mux-based decomposition is applied.
- BDD balancing is performed.
- BDD decomposition with respect to the top variable is performed.
- Generalized Boolean decomposition is performed.
- Generalized Boolean x-dominator decomposition (if selected with
-xhardcore option on command line).
- If none of the above options are successful, the BDD is cofactored with respect to its top
variable.
- The above set of decomposition approaches is termed
"BDS-based" decomposition. In [1], a heuristic swap-based decomposition approach is presented which allows for
variable swaps across the BDD cut line in an attempt to minimize cut cost. In BDS-pga v2.0 this heuristic technique is also
available (instead of the above BDS-based steps) by using the -heuristic command line option. Note that if heuristic
decomposition is unsuccessful for a BDD node, the BDS-based steps listed above are automatically invoked. As described in [1],
heuristic decomposition is based at K-input boundaries.
- Following decomposition, the resulting netlist of 2-input, 1-output nodes are optimized to remove redundancies and
constants and re-collapsed to nodes containing a maximum of K inputs. This factoring tree (ftree in [2]) processing creates a
K-feasible netlist for input to FlowMap. Note that BDS does not limit ftree collapsing to K inputs.
Delay re-synthesis
As described in [1], to allow for delay reduction, a delay
re-synthesis path through BDS-pga and SIS is provided. This flow
is described in detail in [1] and has not been changed for BDS-pga v2.0. An overview of the delay
re-synthesis flow is provided below:
- Following area-based decomposition, BDS-pga is rerun with the
-delay option. This invocation of BDS-pga reads in the blif file
created by area-based decomposition, determines the circuit critical path in terms of LUT delay, and collapses the path into
a "supernode". As a result of critical path collapse, some logic
replication may be necessary for non-critical paths. By varying the -eps command line option, multiple critical paths can be collapsed.
- Following critical path collapse, the resulting design is output
in blif format. Additional file with a .delay suffix is provided. This file makes it easy to isolate the
"supernode" logic for further processing.
- The supernode isolated in the .delay file is read into the SIS
tool set. The supernode is then minimized to 2-level format by using Espresso
in the SIS tool set.
- The results of Espresso optimization can be re-read into BDS-pga
using the -perform command line option. The supernode is then re-decomposed and collapsed using ftree processing to once again
create a K-bounded network.
Differences between BDS-pga v1.0 (described in [1]) and BDS-pga v2.0
Although most of the techniques used in the two synthesis tools are the same there are a few specific differences.
- The "greedy" technique described in [1] has been removed and replaced by the BDS-based approach described above.
- In BDS-pga v2.0, if the heuristic approach fails for a specific
node, the BDS-based approach is applied.
- The output of BDS-pga v2.0 has not yet been applied to Altera
devices, only FlowMap has been used with these netlists to this point.
Downloading and Installing CUDD-2.2.1
- Download the CUDD-2.2.1
package (Linux version).
- Un-tar the file.
- Compile the utility source files first by going
to cudd-2.2.1/util/ and type
make.
- Compile the rest of the CUDD source files by
going to cudd-2.2.1/ and
type make.
- Now, the CUDD package is ready to use.
Downloading, Installing, and Using BDS-pga 2.0
- Download the BDS-pga
2.0 (Linux version).
- Un-tar the file. Before compiling BDS-pga, the
CUDD package needs to be compiled first.
- Open the Makefile
and change the location that the CUDD-2.2.1 was installed
earlier by modifying the line: WHERE =
/../cudd-2.2.1
- Compile the source files by typing make.
- Now, the BDS-pga is ready to use.
|
To run BDS-pga 2.0,
type bds [options] circuit.blif
(where circuit is the name of the
design circuit file in BLIF format). The final synthesized
circuit netlist file in BLIF format (circuit.final.blif) will be in a
new directory with the circuit name. |
|
|
|
The available options
are: |
|
-k k |
to specify the K-input LUT value (the default value is
5) |
|
-useglb |
to use global BDDs only (if -useglb or -useloc
is not specified, both global and local BDDs are built by
default) |
|
-useloc |
to use local BDDs only |
|
-ethred 10 |
to specify the threshold value in eliminate
routine (10 is a good value to use) |
|
-sharing |
to perform sharing extraction before doing
decompositions |
|
-largefanout |
to allow the collapse of multiple fan-out nodes
in the first iteration in eliminate routine |
|
-heuristic |
to allow the heuristic variable swapping
decompositions |
|
-xhardcore |
to allow Boolean decompositions based on
x-dominator (XNORs) |
|
-delay |
to do critical path collapsing for delay
re-synthesis |
|
-perform |
to do performance re-decomposition for delay
re-synthesis |
Downloading and Using FlowMap
- Download the RASP
package (Sun Solaris version) from UCLA VLSI CAD LAB.
- Un-tar the file.
- Modify the RASP_SYN script so that -tech_decomp
and -dmig options are not
used when running FlowMap for BDS-pga synthesized circuit files.
- Modify the RASP_SYN script so that BLIF format is used by
setting FMT to be blif: set FMT=blif
- To run FlowMap, type rasp_syn circuit
-k k -algo flowmap (where circuit
is the name of the BDS-pga synthesized circuit file, for
example, alu2.final for
alu2.final.blif file.).
Results from using BDS-pga 2.0
Circuit design BLIF files
|