ACL 2026 · Main Oral
HiKEY: Hierarchical Multimodal Retrieval for Open-Domain Document Question Answering
Shin, J., Shim, G., Park, J., Seo, J., & Lim, H.
Flat text chunking loses the structural context that makes complex documents interpretable. HiKEY constructs an offline heterogeneous graph from DHP-parsed document hierarchies, then performs (1) hierarchical coarse-to-fine retrieval that narrows quickly from global routing to local section-level candidates, and (2) ancestry-aware subgraph assembly that captures cross-section dependencies. Across multi-page ODQA benchmarks, HiKEY outperforms text-based RAG by up to 4.5% and full-page RAG by up to 6.8%, with strong end-to-end EM/ANLS gains.
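The two retrieval stages can be illustrated with a toy sketch. This is not the authors' implementation: the `Node` class, the `overlap` token-overlap scorer, the function names, and the sample documents are all hypothetical stand-ins, and the cross-section edges of the heterogeneous graph are left out. It only shows the general shape of narrowing level by level through a document hierarchy and then pulling in each hit's ancestors.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(eq=False)  # identity-based equality avoids recursing through parent links
class Node:
    """Hypothetical hierarchy node: a document, section, or paragraph."""
    node_id: str
    text: str
    parent: Node | None = field(default=None, repr=False)
    children: list[Node] = field(default_factory=list, repr=False)

    def add(self, node_id: str, text: str) -> Node:
        child = Node(node_id, text, parent=self)
        self.children.append(child)
        return child


def overlap(query: str, text: str) -> int:
    """Toy relevance score (an assumption): number of shared lowercase tokens."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def subtree_text(node: Node) -> str:
    """Concatenate a node's text with the text of all its descendants."""
    return " ".join([node.text] + [subtree_text(c) for c in node.children])


def coarse_to_fine(roots: list[Node], query: str, k: int = 1) -> list[Node]:
    """Keep the top-k nodes at each level, then descend into their children."""
    frontier, hits = list(roots), []
    while frontier:
        ranked = sorted(
            frontier,
            key=lambda n: overlap(query, subtree_text(n)),
            reverse=True,
        )[:k]
        frontier = []
        for node in ranked:
            if node.children:
                frontier.extend(node.children)
            else:
                hits.append(node)  # reached a leaf-level candidate
    return hits


def assemble_with_ancestors(hits: list[Node]) -> list[Node]:
    """Return each hit together with its ancestor chain, root first, deduplicated."""
    seen: dict[str, Node] = {}
    for hit in hits:
        chain, node = [], hit
        while node is not None:
            chain.append(node)
            node = node.parent
        for n in reversed(chain):
            seen.setdefault(n.node_id, n)
    return list(seen.values())


# Hypothetical two-document corpus.
doc_a = Node("A", "annual report 2023")
fin = doc_a.add("A.1", "financial results")
fin.add("A.1.1", "revenue grew in the third quarter")
fin.add("A.1.2", "operating costs fell")
doc_a.add("A.2", "governance and board members")
doc_b = Node("B", "product manual")
doc_b.add("B.1", "installation steps for the device")

hits = coarse_to_fine([doc_a, doc_b], "third quarter revenue", k=1)
context_ids = [n.node_id for n in assemble_with_ancestors(hits)]
print(context_ids)
```

Passing the ancestor chain along with the hit gives the reader model the document and section headings that situate the retrieved passage, which is the kind of structural context flat chunking discards.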
HiKEY builds an offline heterogeneous hierarchical graph from DHP parsing, then performs hierarchical coarse-to-fine retrieval that narrows from global routing to local section-level candidates, followed by ancestry-aware subgraph assembly. It improves on text-based RAG by up to 4.5% and on full-page RAG by up to 6.8%.