BDTI unveils FPGA C-synthesis certification: Can C beat RTL?


With higher speeds and more DSP macrocells appearing in low-cost FPGAs, more and more design teams are seeing the configurable chips not as glue, but as a way to accelerate the inner loops of numerical algorithms, either in conjunction with or in place of the traditional DSP chip. It's well understood that encoding critical kernels in an FPGA can increase performance by more than an order of magnitude compared to even the fastest DSP chip, often reducing energy consumption as well.

But there's a problem. You code for a DSP chip in C, and you implement using a conventional software tool chain with familiar software debug tools. You configure an FPGA starting in Verilog or VHDL (superficially similar to C but in practice profoundly different) and you implement using a hardware design flow. The two approaches require very distinct skill sets.

That's where so-called Electronic System Level (ESL) tools come in. An ESL synthesis tool lets you write your code in C, synthesize RTL from the C automatically, and then feed the RTL into your FPGA flow. So every competent DSP programmer becomes an FPGA developer. In a manner of speaking. In reality, such tools meet with extreme skepticism from both hardware and software engineers, who suspect them of poor quality of results, unreliability, and other vices.
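To make the idea concrete, here is a minimal sketch of the kind of plain C kernel a DSP programmer might hand to a C-synthesis tool. It is not taken from the BDTI benchmark; the function name `fir_step`, the constant `NUM_TAPS`, and the 16-bit sample format are assumptions chosen for illustration. The points that matter are the fixed loop bounds and fixed-size arrays, which a synthesis tool can in principle map to registers and multipliers rather than to a fetch-and-execute loop.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_TAPS 8  /* hypothetical filter length, chosen for illustration */

/* Compute one output sample of a FIR filter.
 * delay_line holds the NUM_TAPS most recent inputs; the caller owns it
 * and passes the same array on every call. */
int32_t fir_step(int16_t sample,
                 int16_t delay_line[NUM_TAPS],
                 const int16_t coeffs[NUM_TAPS])
{
    int32_t acc = 0;

    assert(delay_line && coeffs);

    /* Shift the delay line by one position and insert the new sample. */
    for (int i = NUM_TAPS - 1; i > 0; i--)
        delay_line[i] = delay_line[i - 1];
    delay_line[0] = sample;

    /* Multiply-accumulate across all taps. The trip count is a
     * compile-time constant, so the loop can be fully unrolled. */
    for (int i = 0; i < NUM_TAPS; i++)
        acc += (int32_t)coeffs[i] * delay_line[i];

    return acc;
}
```

On a DSP chip this loop runs one tap at a time; in an FPGA the same source could, in principle, become eight parallel multipliers feeding an adder tree.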

But is that fair? Berkeley Design Technology Inc. (BDTI), long a dominant force in DSP benchmarking, invested two years of planning and one year of implementation to find out. Today the company released the first results of its certification program for high-level synthesis tools. The first evaluation covers two such tools: AutoESL's AutoPilot and Synfora's PICO. Further evaluations are planned.

The bottom line finding is simple: both tools produced results in reasonable time that were far higher in performance than software on a DSP chip, and comparable in density and performance to hand-coded RTL. But beneath that level, there is a wealth of information in the fine print.

First, the methodology—always a compromise between realism and practicality. BDTI's initial benchmark is a fully functional Optical Flow design. The design comprises a three-ring binder and a DVD, which in turn contain a text description of the algorithm, the algorithm in about 600 lines of ANSI C, and a Xilinx reference design—the Video Starter Kit—which includes a board, FPGA, and, critically, IP for such items as video and DRAM interfaces, buffers, and a sophisticated programmable buffer controller.

BDTI turns the kit over to the ESL vendor, which tunes the C code for its tool and produces a design. BDTI engineers then independently repeat the process. The goal on the Optical Flow core is to achieve maximum throughput using all the resources available in the Spartan IIIA FPGA.

Unsurprisingly, both ESL vendors produced designs with about 40 times the throughput of the best that BDTI engineers could achieve on a TI DM6437 DSP chip. More interestingly, the amount of work required to do the FPGA design, from C to programming file, was similar to the work required to program the DSP, according to BDTI president Jeff Bier. But there were significant differences in the two tasks. Optimizing the C code for one of the ESL tools caused the code to balloon from the original 559 lines supplied by BDTI to 1604 lines of C. Even so, Bier says, the work involved in that optimization was somewhat less than was required to optimize the code for the DSP chip. "It turned out that the DSP had a serious memory bottleneck that we had to code around," he explains. The synthesis tool then generated over 38,000 lines of Verilog from the optimized C.
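The article does not say which rewrites made the C grow, so the following is only a sketch of one restructuring commonly used when preparing image code for hardware: replacing random access into a frame with a streaming loop that keeps two line buffers and a 3x3 window. The 3x3 box sum, the names `box3x3_frame` and `box3x3_stream`, and the image size are all hypothetical and unrelated to BDTI's Optical Flow code. Both functions compute the same result, but the streaming one is noticeably longer, which hints at how tuned code can expand.

```c
#include <assert.h>
#include <stdint.h>

#define WIDTH  8   /* hypothetical image size, for illustration only */
#define HEIGHT 6

/* Software-style version: reads nine input pixels per output pixel. */
void box3x3_frame(const uint8_t *in, uint16_t *out)
{
    assert(in && out);
    for (int y = 0; y < HEIGHT; y++) {
        for (int x = 0; x < WIDTH; x++) {
            uint16_t sum = 0;
            if (y >= 1 && y < HEIGHT - 1 && x >= 1 && x < WIDTH - 1) {
                for (int dy = -1; dy <= 1; dy++)
                    for (int dx = -1; dx <= 1; dx++)
                        sum += in[(y + dy) * WIDTH + (x + dx)];
            }
            out[y * WIDTH + x] = sum;  /* border pixels are set to 0 */
        }
    }
}

/* Streaming version: reads each input pixel exactly once, in raster
 * order, and rebuilds the 3x3 neighborhood from small local buffers. */
void box3x3_stream(const uint8_t *in, uint16_t *out)
{
    uint8_t line0[WIDTH] = {0};   /* row y-2 */
    uint8_t line1[WIDTH] = {0};   /* row y-1 */
    uint8_t win[3][3] = {{0}};    /* sliding window, columns x-2..x */

    assert(in && out);
    for (int y = 0; y < HEIGHT; y++) {
        for (int x = 0; x < WIDTH; x++) {
            uint8_t px = in[y * WIDTH + x];

            /* Slide the window one column to the left. */
            for (int r = 0; r < 3; r++) {
                win[r][0] = win[r][1];
                win[r][1] = win[r][2];
            }
            /* Load the new right-hand column from the line buffers. */
            win[0][2] = line0[x];
            win[1][2] = line1[x];
            win[2][2] = px;

            /* Age the line buffers for the next row. */
            line0[x] = line1[x];
            line1[x] = px;

            out[y * WIDTH + x] = 0;  /* default; interior is overwritten later */

            /* The window is centered on (y-1, x-1) once it is full. */
            if (y >= 2 && x >= 2) {
                uint16_t sum = 0;
                for (int r = 0; r < 3; r++)
                    for (int c = 0; c < 3; c++)
                        sum += win[r][c];
                out[(y - 1) * WIDTH + (x - 1)] = sum;
            }
        }
    }
}
```

The streaming form needs only two rows of on-chip storage instead of a whole frame, which is the sort of trade a memory-constrained FPGA design tends to favor.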

Here's where the major difference hit. BDTI engineers, experienced DSP programmers, could handle the entire flow for the TI chip. But they were pretty much stumped by a huge pile of Verilog and a stack of Xilinx tools. They ended up calling in an RTL expert to shepherd the RTL through the Xilinx tool chain, debug it, and produce the configured FPGA.

As a second test, BDTI wanted to compare the C-level synthesis against a hand-crafted RTL design. But 38K lines of code were too daunting—producing a decent design would have required multiple engineer-years from an experienced RTL team. So BDTI opted to create a second reference design, a DQPSK receiver core. This design was similar in size to the Optical Flow at the C level, with 514 lines of C. But optimization only expanded the code to 635 lines, and synthesis produced a manageable 11,000 lines of Verilog. (The results from both AutoPilot and PICO were similar in size and performance, though quite different in structure, Bier says.)

As a comparison, a Xilinx engineer hand-coded this design in RTL, working from the text description and the C, employing good coding practices but not pulling out the stops for optimization. "The point was to compare good efforts on both tools, not to compare against what an FPGA vendor team could do by hand," Bier explains. Here, the results are more surprising: meeting the 18.75 Msample/sec input rate requirement, the two ESL tools and the RTL hand-design all produced about the same size core: about 6 percent of the FPGA capacity.

But did the ESL users require hardware design skills? No, and yes. Xilinx senior marketing manager Tom Hill says the skills necessary to optimize the C code for ESL synthesis are closer to software optimization skills for a DSP target than they are to hardware design skills. So no: at the beginning of the flow a DSP software team will do just fine. But the ESL tools produce RTL, not a Xilinx programming file. Bier says that familiarity with RTL, the Xilinx tools, and FPGA hardware is still necessary to complete the design. Debug is especially problematic, he observes, because the coupling between the ESL and implementation tools is less than ideal.

There's room in the BDTI report for just about everyone to say "I told you so." But clearly it is no longer prudent for design teams working with computationally-oriented cores to ignore ESL synthesis tools. And the analogy to the days when RTL synthesis was just beginning to displace schematic capture and Karnaugh maps, or for that matter the time when embedded software started to be done in C instead of assembler, is irresistible. Stand by for change.

 

Link: http://www.edn.com/blog/1690000169/post/790052079.html
