A full walkthrough of Ko-VDR — the dual-encoder architecture, 325k-pair data pipeline, GradCache, Matryoshka loss, self-guide filtering, and 4-GPU DDP training.
Multimodal/LLM AI Engineer specializing in document intelligence — building AI systems that are robust, practical, and reliable.
Master's student at the Korea University NLP&AI Lab, advised by Prof. Heuiseok Lim.
Previously earned a BS (triple major: CS, Math, Statistics) from the University of Wisconsin–Madison.
Focused on Document AI, Multimodal RAG, LLM Training, and LLM Evaluation.
Seeking a new-enrollment (신규편입) Technical Research Personnel (전문연구요원) position as an AI Engineer / Research Scientist.
Why flat chunking breaks complex documents — and how hierarchy-aware parsing unlocks better retrieval and reasoning.
A look at how OCR noise propagates through Document AI pipelines and what systematic error correction can do about it.
How M3DocRAG and MoLoRAG extend retrieval-augmented generation to multi-page, multi-document settings — visual retrieval with ColPali, page graph traversal, and VLM-based logical scoring.
A tour of two paradigms for layout-aware document understanding — DocLLM's Disentangled Spatial Attention and LayTextLLM's Spatial Layout Projector with Partial LoRA and Shuffled-OCR SFT.
How ColPali's late interaction over visual patches achieves +15 NDCG@5 over OCR baselines with 18× faster indexing, and why VLM-based retrieval still loses to LLM OCR on degraded scans.
A deep dive into DocLLM (ACL 2024): how its Disentangled Spatial Attention fuses text and layout, and why it achieves the best performance on 12 of 16 document AI benchmarks.
Feel free to reach out for research collaborations, opportunities, or just to say hi.