BDS-pga 2.0

BDS-pga version 2.0

BDD-based Logic Synthesis System for LUT-based FPGAs

Introduction to BDS-pga 2.0

BDS-pga is an efficient BDD-based logic synthesis and optimization program for LUT-based FPGAs. It is based on the BDS program from the University of Massachusetts Amherst. The BDS-pga uses the BDD manipulation functions from the CUDD package from the University of Colorado Boulder. It takes in the circuit design netlist file in BLIF format and optimizes the area and delay. The final synthesized circuit netlist blif files from the BDS-pga can be mapped to LUT-based FPGAs by using FlowMap tech mapping tool from UCLA VLSI CAD Lab. The BDS-pga program is for academic purposes use only and was developed in the Department of Electrical and Computer Engineering at the University of Massachusetts Amherst. IN NO EVENT SHALL THE UNIVERSITY OF MASSACHUSETTS BE LIABLE FOR ANY DAMAGE CAUSED BY THE USE OF THIS SOFTWARE.

Developed by: Navin Vemuri, Priyank Kalla, Kevin Oo TinMaung, and Russell Tessier

Contact: Kevin Oo TinMaung (oo@engin.umass.edu)

This research has been funded by Altera Corporation.

Description of BDS-pga 2.0

BDS-pga v2.0 is an updated version of the BDD-based synthesis tool for LUT based FPGAs first described in:

[1] N. Vemuri, P. Kalla, and R. Tessier, "BDD-based Logic Synthesis for LUT-based FPGAs," ACM Transactions on Design Automation of Electronic Systems, vol 7, no. 4, October 2002.

To better understand the operation of BDS-pga v2.0, it is recommended that the user first read the above paper and the following paper on the BDS synthesis system upon which the BDS-pga system is based:

[2] C. Yang and M. Ciesielski, "BDS: BDD-based Logic Optimization System," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 21, no. 7, July 2002.

Basic operation

A detailed discussion of the algorithms used by BDS-pga is included in the comments at the beginning of files main.c, eliminate.c, lopt.c, and loptutil.c. We provide a brief summary of the basic flow of BDS-pga below to assist the user.

As described on [1], the goal of BDS-pga is to target BDD-based synthesis to FPGA devices based on K-input lookup tables. Synthesis with BDS-pga is performed on each input netlist without first performing traditional AND-OR based logic optimization (such as SIS optimization found in scripts script.rugged and script.delay). The output of BDS-pga is a netlist in blif format which can be input to FlowMap to complete the technology mapping process. FlowMap must be run without the "tech_decomp" and "dmig" options since they restructure input circuits and would modify the optimizations of BDS-pga. Note that BDS-pga (like BDS) only operates on the combinational portions of circuits.

Area based decomposition

BDS-pga has been created by modifying the basic BDD-based synthesis algorithms presented in BDS. The following steps describes the processing used for area-based BDS-pga optimization.

A design is inputted in blif format and parsed by BDS-pga.
A maximum fanout free cone (MFFC) based eliminate (collapse) approach is used on the input gate-level netlist to identify logic that can be targeted to LUTs. This approach creates either global or local nodes for design logic. A global node has a single output, all associated inputs, and all logic needed for the output. Local nodes contain a subset of required logic for an output. The eliminate procedure is controlled by cost metrics which evaluate node input count and BDD size.
Decomposition approaches, which are based on BDS decomposition [2], are applied. BDD nodes, created by eliminate, are decomposed in an iterative fashion until 2-input, 1-output nodes remain. Each BDD node is iteratively placed in a queue to await decomposition. The basic BDS steps applied to each BDD node [2] in order include:

Simple dominator approaches for algebraic decompositions including AND (1-dominator), OR (0-dominator), and XNOR (x-dominator) are applied.
Mux-based decomposition is applied.
BDD balancing is performed.
BDD decomposition with respect to the top variable is performed.
Generalized Boolean decomposition is performed.
Generalized Boolean x-dominator decomposition (if selected with -xhardcore option on command line).
If none of the above options are successful, the BDD is cofactored with respect to its top variable.

The above set of decomposition approaches is termed "BDS-based" decomposition. In [1], a heuristic swap-based decomposition approach is presented which allows for variable swaps across the BDD cut line in an attempt to minimize cut cost. In BDS-pga v2.0 this heuristic technique is also available (instead of the above BDS-based steps) by using the -heuristic command line option. Note that if heuristic decomposition is unsuccessful for a BDD node, the BDS-based steps listed above are automatically invoked. As described in [1], heuristic decomposition is based at K-input boundaries.
Following decomposition, the resulting netlist of 2-input, 1-output nodes are optimized to remove redundancies and constants and re-collapsed to nodes containing a maximum of K inputs. This factoring tree (ftree in [2]) processing creates a K-feasible netlist for input to FlowMap. Note that BDS does not limit ftree collapsing to K inputs.

Delay re-synthesis

As described in [1], to allow for delay reduction, a delay re-synthesis path through BDS-pga and SIS is provided. This flow is described in detail in [1] and has not been changed for BDS-pga v2.0. An overview of the delay re-synthesis flow is provided below:

Following area-based decomposition, BDS-pga is rerun with the -delay option. This invocation of BDS-pga reads in the blif file created by area-based decomposition, determines the circuit critical path in terms of LUT delay, and collapses the path into a "supernode". As a result of critical path collapse, some logic replication may be necessary for non-critical paths. By varying the -eps command line option, multiple critical paths can be collapsed.
Following critical path collapse, the resulting design is output in blif format. Additional file with a .delay suffix is provided. This file makes it easy to isolate the "supernode" logic for further processing.
The supernode isolated in the .delay file is read into the SIS tool set. The supernode is then minimized to 2-level format by using Espresso in the SIS tool set.
The results of Espresso optimization can be re-read into BDS-pga using the -perform command line option. The supernode is then re-decomposed and collapsed using ftree processing to once again create a K-bounded network.

Differences between BDS-pga v1.0 (described in [1]) and BDS-pga v2.0

Although most of the techniques used in the two synthesis tools are the same there are a few specific differences.

The "greedy" technique described in [1] has been removed and replaced by the BDS-based approach described above.
In BDS-pga v2.0, if the heuristic approach fails for a specific node, the BDS-based approach is applied.
The output of BDS-pga v2.0 has not yet been applied to Altera devices, only FlowMap has been used with these netlists to this point.

Downloading and Installing CUDD-2.2.1

Download the CUDD-2.2.1 package (Linux version).
Un-tar the file.
Compile the utility source files first by going to cudd-2.2.1/util/ and type make.
Compile the rest of the CUDD source files by going to cudd-2.2.1/ and type make.
Now, the CUDD package is ready to use.

Downloading, Installing, and Using BDS-pga 2.0

Download the BDS-pga 2.0 (Linux version).
Un-tar the file. Before compiling BDS-pga, the CUDD package needs to be compiled first.
Open the Makefile and change the location that the CUDD-2.2.1 was installed earlier by modifying the line: WHERE = /../cudd-2.2.1
Compile the source files by typing make.
Now, the BDS-pga is ready to use.

	To run BDS-pga 2.0, type bds [options] circuit.blif (where circuit is the name of the design circuit file in BLIF format). The final synthesized circuit netlist file in BLIF format (circuit.final.blif) will be in a new directory with the circuit name.

	The available options are:
	-k k	to specify the K-input LUT value (the default value is 5)
	-useglb	to use global BDDs only (if -useglb or -useloc is not specified, both global and local BDDs are built by default)
	-useloc	to use local BDDs only
	-ethred 10	to specify the threshold value in eliminate routine (10 is a good value to use)
	-sharing	to perform sharing extraction before doing decompositions
	-largefanout	to allow the collapse of multiple fan-out nodes in the first iteration in eliminate routine
	-heuristic	to allow the heuristic variable swapping decompositions
	-xhardcore	to allow Boolean decompositions based on x-dominator (XNORs)
	-delay	to do critical path collapsing for delay re-synthesis
	-perform	to do performance re-decomposition for delay re-synthesis

Downloading and Using FlowMap

Download the RASP package (Sun Solaris version) from UCLA VLSI CAD LAB.
Un-tar the file.
Modify the RASP_SYN script so that -tech_decomp and -dmig options are not used when running FlowMap for BDS-pga synthesized circuit files.
Modify the RASP_SYN script so that BLIF format is used by setting FMT to be blif: set FMT=blif
To run FlowMap, type rasp_syn circuit -k k -algo flowmap (where circuit is the name of the BDS-pga synthesized circuit file, for example, alu2.final for alu2.final.blif file.).

Results from using BDS-pga 2.0

Circuit design BLIF files

This page was created by Kevin Oo TinMaung. August 2004.