
Daily Arxiv

This page curates AI-related papers published worldwide.
All content here is summarized using Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Hdl2v: A Code Translation Dataset for Enhanced LLM Verilog Generation

Created by
  • Haebom

Authors

Charles Hong, Brendan Roberts, Huijae An, Alex Um, Advay Ratan, Yakun Sophia Shao

Outline

This paper presents hdl2v, a new dataset for improving the Verilog code generation performance of large language models (LLMs). hdl2v augments the amount of publicly available Verilog data by translating three other hardware description languages, VHDL, Chisel, and PyMTL3, into Verilog. We demonstrate that the hdl2v dataset improves the VerilogEvalV2 performance of a 32-billion-parameter open-weight model by up to 23% (pass@10), and that it also improves the performance of a data-augmentation-based fine-tuning approach by up to 63%. We also analyze and present characteristics of the HDL-to-Verilog dataset to guide future work.
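
For context, pass@10 above refers to the standard unbiased pass@k estimator used by code-generation benchmarks such as VerilogEval: with n sampled completions per problem, of which c pass the testbench, pass@k = 1 - C(n-c, k) / C(n, k). Below is a minimal Python sketch of this metric; it is illustrative only and not code from the hdl2v paper.

from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    # Unbiased estimate of the probability that at least one of k samples
    # (drawn from n total completions, c of them correct) passes the testbench.
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 20 Verilog completions sampled per problem, 5 pass simulation.
print(pass_at_k(n=20, c=5, k=10))  # ≈ 0.984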

Takeaways, Limitations

Takeaways:
The hdl2v dataset, generated by translating VHDL, Chisel, and PyMTL3 into Verilog, can significantly improve the performance of LLM-based Verilog code generation.
These performance gains are achieved without data augmentation or knowledge distillation from larger models.
The hdl2v dataset also boosts the performance of a data-augmentation-based fine-tuning approach.
The paper provides an analysis of the HDL-to-Verilog dataset's characteristics to inform future research.
Limitations:
Further research may be needed on the size and diversity of the hdl2v dataset.
The dataset may be biased toward the specific source hardware description languages, which could limit the generalization performance of the resulting LLM.
Experimental results for actual hardware implementation and verification are not presented.