Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized by Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

hdl2v: A Code Translation Dataset for Enhanced LLM Verilog Generation

Created by
  • Haebom

Authors

Charles Hong, Brendan Roberts, Huijae An, Alex Um, Advay Ratan, Yakun Sophia Shao

Outline

This paper presents hdl2v, a dataset that translates hardware description languages (HDLs) such as VHDL, Chisel, and PyMTL3 into Verilog, with the goal of improving the Verilog code generation performance of large language models (LLMs). Fine-tuning on hdl2v improves the VerilogEvalV2 performance of a 32-billion-parameter open-weight model by up to 23% (pass@10) and improves a data-augmentation-based fine-tuning approach by 63%. The paper also provides a feature analysis of the HDL-to-Verilog dataset to guide future performance improvements.
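For reference, pass@10 is the probability that at least one of 10 sampled generations passes a problem's tests. Below is a minimal sketch of the standard unbiased pass@k estimator (Chen et al., 2021), which benchmarks like VerilogEvalV2 commonly use; the paper does not spell out its estimator, so this is an assumption for illustration. Here n is the number of generations sampled per problem, c the number that pass, and k the budget.

import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al., 2021).

    n: total generations sampled per problem
    c: number of those generations that pass the tests
    k: evaluation budget (k=10 for pass@10)
    Returns the probability that at least one of k draws
    (without replacement) from the n generations is correct.
    """
    if n - c < k:
        return 1.0  # fewer than k failures exist, so every k-subset contains a pass
    # 1 - C(n-c, k) / C(n, k), computed as a stable running product
    return 1.0 - float(np.prod(1.0 - k / np.arange(n - c + 1, n + 1)))

# Example: 20 samples per problem, 3 correct -> pass@10 ≈ 0.89
print(pass_at_k(n=20, c=3, k=10))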

Takeaways, Limitations

Takeaways:
  • Introduces hdl2v, a new dataset that helps address the shortage of existing Verilog training data.
  • Demonstrates substantial improvements in LLM-based Verilog code generation performance.
  • Confirms a synergy effect with data augmentation techniques.
  • Provides a dataset feature analysis that suggests future research directions.
Limitations:
  • Further research is needed on the size and diversity of the hdl2v dataset.
  • Generalizability to other LLMs and other code generation tasks requires further validation.
  • A deeper analysis of errors and biases introduced during dataset creation is needed.