Gyuho Shim
M.S. Candidate · Korea University NLP&AI Lab

Hi, I'm Gyuho Shim 👋

Multimodal/LLM AI Engineer specializing in document intelligence — building AI systems that are robust, practical, and reliable.

Master's student at Korea University NLP&AI Lab, advised by Prof. Heuiseok Lim.
Previously earned a B.S. (triple major: CS, Mathematics, Statistics) at the University of Wisconsin–Madison.
Focused on Document AI, Multimodal RAG, LLM Training, and LLM Evaluation.
Seeking a Technical Research Personnel (전문연구요원) position under new enrollment (신규편입) as an AI Engineer / Research Scientist.

Education

M.S. in Computer Science
Korea University
Advisor: Prof. Heuiseok Lim · GPA: 4.33 / 4.5
Document AI · Multimodal RAG · LLM Agents
Expected Aug 2026 · Seoul, South Korea
B.S. — Triple Major: CS, Mathematics, Statistics
University of Wisconsin–Madison
GPA: 3.55 / 4.0
May 2024 · Madison, WI

Technical Skills

Document AI
Document Parsing · Hierarchical Chunking · Layout-aware Processing · Document QA
Modeling
LLM/LVLM Fine-tuning · Hierarchical Retrieval · Knowledge Graphs · Neural-Symbolic Reasoning
Programming
Python · SQL · C/C++ · Java
Frameworks
PyTorch · TensorFlow · NumPy · Pandas · Scikit-Learn · LangChain · Linux
Languages
English (Native) · Korean (Native)

Publications

ACL 2025 · Oral
REVISE: A Framework for Revising OCRed text in Practical Information Systems with Data Contamination Strategy
Shim, G., Hong, S., & Lim, H.
OCR errors, from font degradation to complex multi-column layouts, fundamentally compromise downstream Document AI. REVISE addresses this by introducing a comprehensive hierarchical taxonomy of common OCR errors and a synthetic data contamination strategy that injects realistic OCR-like noise at the character, word, and structural levels. Trained on these synthetic datasets, the model learns to robustly reconstruct original document structure, significantly improving QA and retrieval accuracy without requiring costly real-error annotations.
EMNLP 2025 · Oral
Benchmark Profiling: Mechanistic Diagnosis of LLM Benchmarks
Kim, D., Shim, G., Chun, Y. C., Kim, M., Park, C., & Lim, H.
Standard benchmark scores mask which abilities a model actually uses. Benchmark Profiling decomposes performance into 10 cognitively grounded abilities, ranging from contextual recall to multi-step reasoning, using gradient-based importance scoring and targeted parameter ablation. The resulting Ability Impact Score (AIS) reveals that most benchmarks require a mixture of abilities, that similarly labeled datasets often rely on distinct ability profiles, and that narrow domain fine-tuning yields only modest gains on code benchmarks. The analysis covers three instruction-tuned models and ten benchmarks.
ACL 2026 · Main Oral
HiKEY: Hierarchical Multimodal Retrieval for Open-Domain Document Question Answering
Shin, J., Shim, G., Park, J., Seo, J., & Lim, H.
Flat text chunking loses the structural context that makes complex documents interpretable. HiKEY constructs an offline heterogeneous graph from DHP-parsed document hierarchies, then performs (1) hierarchical coarse-to-fine retrieval that rapidly narrows from global routing to local section-level candidates, and (2) an ancestry-aware subgraph assembly that captures cross-section dependencies. Across multi-page ODQA benchmarks, HiKEY outperforms text-based RAG by up to 4.5% and full-page RAG by up to 6.8%, with strong end-to-end EM/ANLS gains.
HCLT 2025 · Oral
KULLM-R: Efficient Korean Reasoning Model
Lee, S., Kim, D., Kim, M., Shim, G., Park, C., So, A., & Lim, H.
KULLM-R is a Korean reasoning model built on GRPO reinforcement learning in VERL. The work introduces verifiable reward functions for correctness and Korean answer consistency, along with an adaptive length penalty that reduces verbosity on easy problems while preserving reasoning depth on hard ones.
HCLT 2024 · Oral
Mixture of Models: Towards Effective Domain Expert Ensemble of Large Language Models
Shim, G., Eo, S., Kim, J., Lee, J., & Lim, H.
Proposes a query-aware LLM routing framework that dynamically selects the best specialist model from a pool of 17 expert LLMs per query. Using confidence-based labeling, the routing classifier removes dependency on manual domain annotation and beats the domain-routing baseline on 12 out of 14 MMLU-Pro sub-domains, achieving +10pp over any single specialist on out-of-distribution benchmarks.
Technical Report · Jan 2026
VAETKI Technical Report
NC-AI Consortium (NC AI · ETRI · Korea University)
Technical report introducing the VAETKI series of large language models at 100B, 20B, and 7B scales, developed for national data sovereignty and industrial AI transformation. The models use a decoder-only Mixture-of-Experts (MoE) architecture with Multi-Latent Attention (MLA) and Local-Global Interleaving, achieving an 83% reduction in KV cache size and a 40% gain in training efficiency over conventional MLA. They are pre-trained on a curated 5-trillion-token corpus and refined through a seven-stage pipeline that includes Reasoning-centric SFT, High-Quality SFT, and DPO on 8M instruction-tuning triplets. Evaluation on Korean benchmarks (CLIcK, KoBALT) and IFEval demonstrates strong multilingual and instruction-following capabilities.

Projects

WBL Independent AI Foundation Model (National Representative AI, VAETKI)
NC-AI-consortium-VAETKI/VAETKI · Collaboration with NC AI & ETRI
  • Designed the Query Filtering Criterion adopted in the VAETKI Technical Report: a 6-dimensional clarity rubric (1–10, deterministic tie-breaking) filtering the 2M high-quality SFT subset, and trained the backing Qwen3-4B clarity tagger with per-domain thresholds.
  • Contributed to Correctness-Guaranteed Reasoning Synthesis for Math/Code, verifying synthesized reasoning chains against known answers.
  • Built a 200K-pair DPO set pairing verified syntheses against failure modes (wrong answers, repetition, verbosity) and long-context data.
Ko-VDR: Korean Vision-Language Document Retriever
johnandru/ko-vdr-preview · NLP&AI Lab, Korea University
  • Fine-tuned Qwen3-VL-Embedding-2B into a Korean retriever via LoRA (1.5% params, frozen vision encoder) on 325K pairs with FAISS hard-negative mining.
  • Designed a 3-layer nested loss (Matryoshka × SelfGuide × Cached InfoNCE) supporting 7 dims (128–2048) for retraining-free accuracy/size trade-offs.
Document-level LVLM-Parser
NLP&AI Lab, Korea University
  • Built a multimodal document parser on Qwen3-VL-4B with a 50M-param cross-attention decoder regressing bounding boxes from pooled LLM hidden states at layout-query tokens, decoupling content generation from layout grounding.
  • Designed a 3-stage curriculum (LM-only → joint → frozen-LLM grounding); a gradient-masking hook trains only the 4 added layout-query embedding rows when the LLM is frozen.
  • Extending to document-level hierarchy modeling; work in progress toward conference submission.
KULLM Reasoning Model Training
nobrand/KULLM-R · NLP&AI Lab, Korea University
  • Implemented GRPO reinforcement learning in VERL for Korean reasoning tasks.
  • Designed verifiable reward functions for correctness, Korean answer consistency, and an adaptive length penalty that reduces verbosity on easy problems while preserving depth on hard ones.
DocGraph Copilot: Document AI Service
NLP&AI Lab, Korea University
  • Built end-to-end Document AI system (FastAPI + React) parsing PDF/DOCX/XLSX/PPTX/TXT via a unified pipeline orchestrating 6 layout backends (DETR, VGT, DHP, MinerU2.5, DocLayout-YOLO, heuristic) with per-model tuning and auto-generated registry.
  • Delivered evidence-first features: structure-aware search, evidence-packaged QA, hierarchical summarization, and multi-doc comparison.
  • Engineered a hierarchical document parser that infers logical document structure and organization, powering downstream chunking and analysis.
Synapse: AI Knowledge Graph Search Platform
NLP&AI Lab, Korea University
  • Architected enterprise GraphRAG platform unifying Slack/Jira/Drive into one knowledge graph with hybrid BM25 + Vector (HNSW) + Graph traversal.
  • Designed 5-stage confidence-gated extraction (≥0.7 threshold) prioritizing structured metadata over LLM calls to cut hallucinations and embedding cost.
KT–Korea University Collaborative Research
NLP&AI Lab, Korea University
  • Co-led 18-month KT Corporation collaboration on multi-LLM systems, authoring 2 papers (HCLT 2024 + ACL under review).
  • Led LLM-routing framework that selects the best of 17 expert models per query, beating any single specialist by +10pp on out-of-distribution benchmarks (MMLU-Pro, AGIEval).
  • Designed confidence-based labeling for the routing classifier, removing domain-annotation dependency; beat domain-routing baseline on 12/14 MMLU-Pro sub-domains.

Experience

NLP & AI Researcher
NLP&AI Lab, Korea University
Seoul, South Korea
May 2024 – Aug 2026
  • Research: Co-authored 10+ papers at top-tier venues (ACL, EMNLP), spanning document understanding, multimodal retrieval, LLM orchestration, reasoning, and mechanistic interpretability.
  • LLM/LVLM Training: Led training across LLMs, LVLMs, and vision-language retrievers; contributed to academic models and consortium-scale foundation models.
  • Document AI: Built multimodal document parsers and vision-language retrievers covering layout grounding, cross-page hierarchy, and visual document retrieval.
  • Multimodal RAG: Architected and implemented multimodal RAG frameworks end-to-end, covering multi-document reasoning and hierarchical knowledge integration.
  • Evaluation: Designed a multi-document QA benchmark and established reusable evaluation recipes and diagnostic workflows for model iterations.
  • Data: Built multi-million-scale data curation pipelines spanning clarity-based query filtering, correctness-verified reasoning synthesis, document image-text curation, and FAISS hard-negative mining.
Co-Founder & Co-Lead
KUDoc, Document AI Research Group, Korea University
Seoul, South Korea
May 2024 – Aug 2026
  • Co-founded and led KUDoc to 12 top-tier publications (ACL, CVPR, EMNLP), 4 industry–academia projects, and 3 technology transfers.
NLP & AI Research Assistant
Karumbaiah Lab, University of Wisconsin–Madison
Madison, WI
Sep 2023 – May 2024
  • Researched LLM-based automated essay feedback in Shamya Karumbaiah's Human-AI Lab, identifying model limitations and improvement areas.
  • Performed error analysis with Errudite and behavioral tests on linguistic attributes (negation, essay length, adjective count, location entities) to identify LLM failure patterns.
Data Scientist
Aivelabs
Seoul, South Korea
Aug 2021 – Jan 2022
  • Automated client reporting in Python/SQL/Excel, cutting generation time by 30%.
  • Collaborated with Amore Pacific marketing on SQL and Google Analytics data requests in weekly syncs.
  • Analyzed ad performance and app customer data (churn, ad placement value, conversions); applied NLTK sentiment analysis on reviews to inform promotional messaging.
Academic Coordinator
KSEA, Korean American Scientists and Engineers Association
Madison, WI
Jul 2023 – Jan 2024
  • Structured the semester's academic programming as a key executive, aligning projects with club objectives.
  • Led a 6-person data science team on a 'Zero Hunger' research project covering route optimization, supply chain, and food security analysis.
Software Coordinator
KCU, Korean Undergraduate Computer Science Union
Madison, WI
Sep 2022 – Jul 2023
  • Directed organizational planning through meeting agendas, goal-setting, and 1:1s with group leads.

Contact

Feel free to reach out for research collaborations, opportunities, or just to say hi.