0x01: FPGA Optimization Ideas

Source: Internet  Editor: 程序博客网  Time: 2024/06/12 22:41

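Advanced FPGA Design: Architecture, Implementation, and Optimization is a good book.
It covers essentially every situation worth covering and reads like a handbook, so it suits learning while you read.
Below are excerpts of the basic architecture optimization ideas from its first few chapters.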

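cipher's personal take: there is a strong sense here of competing factors holding each other in check. Chasing speed costs area,
blindly shrinking area is bound to cost speed, and power is tied to area, so a perfect design is basically out of reach.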

Key points for speed optimization:

A high-throughput architecture is one that maximizes the number of bits per
second that can be processed by a design.

Unrolling an iterative loop increases throughput.

The penalty for unrolling an iterative loop is a proportional increase in area.
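
As a rough illustration of this trade-off, the Python sketch below counts clock cycles and multipliers for a hypothetical operation that needs three multiply steps per input. The function names, the three-step figure, and the use of "number of multipliers" as a stand-in for area are assumptions made for this note, not numbers taken from the book.

```python
def iterative_cost(num_inputs, iterations=3):
    """One shared multiplier, reused for `iterations` clock cycles per input."""
    multipliers = 1
    # A new input is accepted only after the previous loop has finished.
    cycles = num_inputs * iterations
    return multipliers, cycles


def unrolled_cost(num_inputs, iterations=3):
    """Loop unrolled into `iterations` pipeline stages, one multiplier each."""
    multipliers = iterations
    # Once the pipeline has filled, one result leaves every clock cycle.
    cycles = iterations + (num_inputs - 1)
    return multipliers, cycles


if __name__ == "__main__":
    for name, fn in (("iterative", iterative_cost), ("unrolled", unrolled_cost)):
        area, cycles = fn(100)
        print(f"{name}: {area} multiplier(s), {cycles} cycles for 100 inputs")
```

In this model the unrolled version finishes 100 inputs in roughly a third of the cycles, and pays for it with three times the multipliers.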

A low-latency architecture is one that minimizes the delay from input of a
module to the output.

Latency can be reduced by removing pipeline registers.

The penalty for removing pipeline registers is an increase in combinatorial
delay between registers.

Timing refers to the clock speed of a design. A design meets timing when the
maximum delay between any two sequential elements is smaller than the minimum
clock period.

Adding register layers improves timing by dividing the critical path into two
paths of smaller delay.
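
The interaction between pipeline registers, latency, and timing can be sketched with a toy model. It assumes the clock period is set only by the slowest register-to-register stage and ignores setup time, clock-to-output delay, and routing. The function names and the delay values in nanoseconds are made up for illustration.

```python
def clock_period_ns(stage_delays_ns):
    """Minimum clock period: the slowest register-to-register stage."""
    return max(stage_delays_ns)


def fmax_mhz(stage_delays_ns):
    """Maximum clock frequency in MHz for the given stage delays."""
    return 1000.0 / clock_period_ns(stage_delays_ns)


def latency_ns(stage_delays_ns):
    """Input-to-output delay: one clock period per register stage."""
    return len(stage_delays_ns) * clock_period_ns(stage_delays_ns)


if __name__ == "__main__":
    # One 10 ns path, the same path split evenly, and an uneven split.
    for stages in ([10], [5, 5], [6, 4]):
        print(stages, fmax_mhz(stages), "MHz,", latency_ns(stages), "ns latency")
```

An even split doubles the achievable clock frequency in this model. An uneven split helps less and also raises total latency, which is the other side of the statement that removing registers reduces latency.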

Separating a logic function into a number of smaller functions that can be
evaluated in parallel reduces the path delay to the longest of the substructures.
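
One way to picture this is an 8-bit multiply broken into four 4-bit partial products. The sketch below only checks the arithmetic identity. The function name `mul8_split` is hypothetical, and Python runs the four products one after another, whereas hardware could evaluate them side by side.

```python
def mul8_split(a, b):
    """Multiply two unsigned 8-bit values using four independent 4x4-bit products."""
    a_hi, a_lo = a >> 4, a & 0xF
    b_hi, b_lo = b >> 4, b & 0xF
    # None of these products depends on another, so in hardware each one
    # could be its own small multiplier working in parallel.
    p_hh = a_hi * b_hi
    p_hl = a_hi * b_lo
    p_lh = a_lo * b_hi
    p_ll = a_lo * b_lo
    # Recombine the partial products with the appropriate shifts.
    return (p_hh << 8) + ((p_hl + p_lh) << 4) + p_ll


if __name__ == "__main__":
    print(mul8_split(200, 123) == 200 * 123)
```

The path through any single 4x4 multiplier is shorter than the path through one monolithic 8x8 multiplier, which is where the timing gain would come from.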

By removing priority encodings where they are not needed, the logic structure is
flattened, and the path delay is reduced.
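
A behavioral sketch of the difference, assuming a small register bank written under control of select bits. The function names and the list-based register model are assumptions for illustration. The priority version mirrors an if/else-if chain, the flat version mirrors independent if statements.

```python
def write_priority(regs, ctrl, data):
    """if/elif style: ctrl[0] outranks ctrl[1], which outranks ctrl[2], and so on."""
    out = list(regs)
    for i, sel in enumerate(ctrl):
        if sel:
            out[i] = data
            break  # lower-priority select bits are ignored
    return out


def write_flat(regs, ctrl, data):
    """Independent ifs: each register looks only at its own select bit."""
    return [data if sel else old for old, sel in zip(regs, ctrl)]


if __name__ == "__main__":
    regs = [0, 0, 0, 0]
    # With a one-hot select, both styles behave the same.
    print(write_priority(regs, [0, 0, 1, 0], 7), write_flat(regs, [0, 0, 1, 0], 7))
    # With two bits set, only the priority version suppresses the second write.
    print(write_priority(regs, [1, 1, 0, 0], 7), write_flat(regs, [1, 1, 0, 0], 7))
```

When the select bits are known to be one-hot, the priority chain adds logic depth without changing the result, so the flat form is the cheaper choice.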

Register balancing improves timing by moving combinatorial logic from the
critical path to an adjacent path.
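
A small sketch of the idea, assuming each pipeline stage is a list of operation delays that add up in series. The helper names and the 2 ns adder delay are hypothetical.

```python
def min_period_ns(stages):
    """Clock period is limited by the stage with the largest total delay."""
    return max(sum(ops) for ops in stages)


def move_last_op(stages, src, dst):
    """Return a copy with the last operation of stage `src` moved to the front of stage `dst`."""
    new_stages = [list(ops) for ops in stages]
    new_stages[dst].insert(0, new_stages[src].pop())
    return new_stages


if __name__ == "__main__":
    # Stage 0 chains two 2 ns adders; stage 1 only passes the result through.
    unbalanced = [[2, 2], []]
    balanced = move_last_op(unbalanced, 0, 1)
    print(min_period_ns(unbalanced), "->", min_period_ns(balanced))
```

The number of register layers stays the same. Only the placement of the logic between them changes, which is what distinguishes balancing from adding a pipeline stage.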

Timing can be improved by reordering paths that are combined with the critical
path in such a way that some of the critical path logic is placed closer to the
destination register.

Key points for area optimization:

Rolling up the pipeline can optimize the area of pipelined designs with
duplicated logic in the pipeline stages.

Controls can be used to direct the reuse of logic when the shared logic is
larger than the control logic.
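
A shift-and-add multiplier is a common example of rolling up logic: a single adder is reused on every clock cycle, steered by a small amount of control. The sketch below is a behavioral model with an assumed function name and an assumed 8-bit unsigned width, not a description of any particular implementation.

```python
def shift_add_multiply(a, b, width=8):
    """Multiply two unsigned `width`-bit values, reusing one adder per cycle."""
    acc = 0
    cycles = 0
    for bit in range(width):      # one loop iteration models one clock cycle
        if (b >> bit) & 1:        # control logic: add this cycle or skip
            acc += a << bit       # the single shared adder
        cycles += 1
    return acc, cycles


if __name__ == "__main__":
    product, cycles = shift_add_multiply(13, 11)
    print(product, "in", cycles, "cycles")
```

The reuse pays off here because the adder is much larger than the bit counter and select logic that control it. The cost is that a result takes `width` cycles instead of one.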

For compact designs where area is the primary requirement, search for resources
that have similar counterparts in other modules that can be brought to a global
point in the hierarchy and shared between multiple functional areas.

An improper reset strategy can create an unnecessarily large design and inhibit
certain area optimizations.

An optimized FPGA resource will not be used if an incompatible reset is assigned
to it. The function will be implemented with generic elements and will occupy
more area.

DSPs and other multifunction resources are typically not flexible to varying
reset strategies.

Improperly resetting a RAM can have a catastrophic impact on the area.

Using set and reset can prevent certain combinatorial logic optimizations.

Avoid using set or reset whenever possible when area is the key consideration.

Key points for low power:

Clock control resources such as the clock enable flip-flop input or a global
clock mux should be used in place of direct clock gating when they are available.
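
To make the clock enable idea concrete, here is a minimal behavioral model of a flip-flop with an enable input. The class name and interface are assumptions for illustration. The clock itself is never gated; the register simply ignores its data input while the enable is low.

```python
class EnableRegister:
    """Flip-flop with a clock enable, driven by a free-running clock."""

    def __init__(self, value=0):
        self.q = value

    def clock(self, d, enable):
        """Model one rising clock edge: capture `d` only when `enable` is set."""
        if enable:
            self.q = d
        return self.q


if __name__ == "__main__":
    reg = EnableRegister()
    print(reg.clock(5, enable=True))   # captures 5
    print(reg.clock(9, enable=False))  # holds 5
```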

Clock gating is a direct means for reducing dynamic power dissipation but
creates difficulties in implementation and timing analysis.

Mishandling clock skew can cause catastrophic failures in the FPGA.

Clock gating can cause hold violations that may or may not be corrected by the
implementation tools.

To minimize the power dissipation of input devices, minimize the rise and fall
times of the signals that drive the input.

Always terminate unused input buffers. Never let an FPGA input buffer float.

Dynamic power dissipation drops off with the square of the core voltage, but
reducing voltage will have a negative impact on performance.
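
The square-law dependence follows from the usual first-order model of dynamic power. The symbols below (switched capacitance $C$, core voltage $V$, switching frequency $f$) and the 1.2 V to 1.0 V example are assumptions chosen for illustration, not values from the book.

```latex
P_{\mathrm{dyn}} \approx C \, V^{2} f
\qquad\Longrightarrow\qquad
\frac{P_{\mathrm{dyn}}(1.0\,\mathrm{V})}{P_{\mathrm{dyn}}(1.2\,\mathrm{V})}
  = \left(\frac{1.0}{1.2}\right)^{2} \approx 0.69
```

With $C$ and $f$ held constant, dropping the core voltage from 1.2 V to 1.0 V would cut dynamic power by roughly 31 percent in this model.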

Dual-edge triggered flip-flops should only be used if they are provided as
primitive elements.

There is no steady-state current dissipation with a series termination.