Daily Arxiv

This page organizes papers related to artificial intelligence published around the world.
This page is summarized using Google Gemini and is operated on a non-profit basis.
The copyright of the paper belongs to the author and the relevant institution. When sharing, simply cite the source.

HICode: Hierarchical Inductive Coding with LLMs

Created by
  • Haebom

Authors

Mian Zhong, Pristina Wang, Anjalie Field

Outline

This paper proposes HICode, an analysis pipeline that uses large language models (LLMs) to analyze large text corpora, addressing the limitations of manual labeling and of statistical tools such as topic modeling. Inspired by qualitative research methods, HICode works in two steps: it inductively generates labels directly from the data, then hierarchically clusters those labels to surface emergent topics. The authors measure agreement with human-generated topics across three diverse datasets and validate the pipeline's robustness through automated and human evaluation. A case study of litigation documents related to the US opioid crisis reveals a pharmaceutical company's aggressive marketing strategy and demonstrates HICode's potential for in-depth analysis of large-scale data.
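The two-step process described above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `labeler` callable is a hypothetical stand-in for an LLM call, and the token-overlap (Jaccard) merging with fixed thresholds is an assumed simplification of the hierarchical clustering step, chosen only so the example runs without external services.

```python
from typing import Callable, List


def generate_labels(segments: List[str], labeler: Callable[[str], str]) -> List[str]:
    """Step 1: produce one inductive label per text segment.

    `labeler` is a hypothetical stand-in for an LLM prompt/response call.
    """
    return [labeler(segment) for segment in segments]


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity between two labels (assumed similarity measure)."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    union = tokens_a | tokens_b
    return len(tokens_a & tokens_b) / len(union) if union else 0.0


def merge(clusters: List[List[str]], threshold: float) -> List[List[str]]:
    """Greedily merge clusters that contain at least one similar pair of labels."""
    merged: List[List[str]] = []
    for cluster in clusters:
        for target in merged:
            if any(jaccard(a, b) >= threshold for a in cluster for b in target):
                target.extend(cluster)
                break
        else:
            merged.append(list(cluster))
    return merged


def build_hierarchy(labels: List[str], thresholds=(0.5, 0.2)) -> List[List[List[str]]]:
    """Step 2: cluster labels level by level, from fine-grained to coarse."""
    level = [[label] for label in sorted(set(labels))]
    levels = []
    for threshold in thresholds:
        level = merge(level, threshold)
        levels.append(level)
    return levels


if __name__ == "__main__":
    # Toy labels standing in for LLM-generated ones.
    labels = [
        "marketing budget",
        "marketing to doctors",
        "opioid marketing to doctors",
        "regulatory compliance",
        "sales incentives",
        "sales rep incentives",
    ]
    for depth, level in enumerate(build_hierarchy(labels), start=1):
        print(f"level {depth}: {level}")
```

Each pass with a lower threshold merges the previous level's clusters into broader topics, which mirrors the idea of moving from specific labels to higher-level themes. In a real pipeline, both the labeling and the merging decisions would presumably be delegated to an LLM rather than to token overlap.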

Takeaways, Limitations

Takeaways:
Presents a novel method for automating sophisticated analysis of large text corpora using LLMs.
Suggests that LLMs can overcome the limitations of manual labeling and statistical tools and improve analysis efficiency.
Combines qualitative research methods with LLMs to offer a new analytical paradigm.
Demonstrates applicability to large-scale data analysis in various fields, illustrated by a case study of opioid crisis litigation documents.
Limitations:
Further research is needed to determine how well HICode generalizes.
The bias and reliability issues of LLMs need to be considered.
A more detailed description of the human evaluation and its criteria is needed.
Additional evaluation of applicability and performance on various types of data is needed.