This paper addresses the lack of effective evaluation tools for document chunking, a crucial component of Retrieval-Augmented Generation (RAG) systems, which enhance the responses of language models by integrating external knowledge sources. Our analysis shows that existing RAG benchmarks are inadequate for assessing chunking quality because their evidence is sparse. We therefore propose HiCBench, a benchmark comprising manually annotated multi-level document chunking points, synthesized evidence-dense question-answer (QA) pairs, and the corresponding evidence sources. We further introduce HiChunk, a multi-level document structuring framework that combines a fine-tuned LLM with an Auto-Merge retrieval algorithm to improve retrieval quality. Experiments demonstrate that HiCBench effectively evaluates the impact of different chunking methods across the entire RAG pipeline, and that HiChunk achieves better chunking quality at reasonable time cost, thereby improving the overall performance of the RAG system.
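To make the Auto-Merge idea concrete, the sketch below shows one plausible reading of such a retrieval step, assuming a tree of chunks in which merging fires when enough of a parent's children are retrieved; all names (`Chunk`, `auto_merge`, `merge_ratio`) are illustrative, and the paper's actual criterion may differ.

```python
# Hypothetical sketch of an Auto-Merge style retrieval step over a chunk
# hierarchy. Not the paper's exact algorithm: the merging criterion and
# token budget here are assumptions for illustration.
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Chunk:
    text: str
    parent: Optional["Chunk"] = None
    children: List["Chunk"] = field(default_factory=list)


def auto_merge(retrieved: List[Chunk], merge_ratio: float = 0.5,
               max_tokens: int = 4096) -> List[Chunk]:
    """Replace retrieved sibling chunks with their parent chunk when at
    least `merge_ratio` of the parent's children were retrieved."""
    selected = set(map(id, retrieved))
    result, merged_parents = [], set()
    for chunk in retrieved:
        parent = chunk.parent
        if parent is None:
            result.append(chunk)  # root-level chunk: nothing to merge into
            continue
        if id(parent) in merged_parents:
            continue  # parent already emitted for an earlier sibling
        hits = sum(1 for c in parent.children if id(c) in selected)
        if hits / len(parent.children) >= merge_ratio:
            merged_parents.add(id(parent))
            result.append(parent)  # merge siblings into the parent chunk
        else:
            result.append(chunk)
    # Enforce a token budget (whitespace token count as a cheap proxy).
    budget, kept = max_tokens, []
    for c in result:
        cost = len(c.text.split())
        if cost <= budget:
            kept.append(c)
            budget -= cost
    return kept
```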