Sunghyun Lee

Incoming KAIST AI Researcher. AI Creativity Explorer.


Kim Jaechul Graduate School of AI

KAIST (Korea Advanced Institute of Science and Technology)

Daejeon, Republic of Korea

Email: romanticbox@kaist.ac.kr

👋 Hi, I’m Sunghyun Lee

I’m a deep learning researcher with a strong interest in multimodal Theory of Mind (ToM), AI for media, and computational creativity.
Currently, my research centers on conversation.

I keep updating my website!

Beyond that, I explore the creativity of AI: how machines can generate novel ideas, expressions, or artifacts that go beyond human expectations, and how we might evaluate such output meaningfully.

I double-majored in Computer Science and French Language and Literature at Yonsei University, graduating summa cum laude (top 3%).


🧠 Research Interests

  • Multimodal understanding and generation
  • Humanโ€“AI interaction and creativity
  • Pragmatics-based human interaction understanding and generation
  • Storytelling and media generation by AI
  • Computational artistry and creativity

✨ Motto

“The future belongs to those who can hear it coming.” – David Bowie


Feel free to check out my publications and recent blog posts, or get in touch through the icons below!

news

Nov 08, 2025 I’m so happy to share the news that our paper has been accepted to AAAI-26 as an oral presentation! Do MLLMs understand sound the way humans do? For some models the answer is yes; for others, no. GPT and Gemini do not seem to infer sound as humans do, whereas Qwen2.5 resembled the results of our human experiments. Our research dug into this question. This is the first paper I participated in as a co-author, working with Jinhong Jeong under the supervision of Professor Youngjae Yu. I would like to thank all the authors: Jinhong Jeong, Jaeyoung Lee, Seonah Han, and Youngjae Yu. Jinhong Jeong led the whole project and provided fabulous, deep linguistic insights. Jaeyoung Lee contributed marvelous ideas, helping us connect the research to mechanistic interpretability. Seonah Han devoted great effort to ideation during our meetings and to constructing and preprocessing the dataset. And Youngjae Yu supervised the research closely, always helping us think outside the box and focus on the key questions researchers would ask. I, Sunghyun Lee, worked on constructing the dataset, designed the experiments to be precise and persuasive (introducing a semantic dimension to the experiment), and analyzed the attention layers. Here are the links: [github] [arxiv] See you in Singapore!
Sep 18, 2025 I am pleased to announce my admission to KAIST (Korea Advanced Institute of Science and Technology) for a Master’s degree program. I will enroll in March 2026 at the Kim Jaechul Graduate School of AI under the supervision of Professor Yong Man Ro. I am delighted to join IVY Lab & IVL Lab as a new member! Furthermore, I would like to express my sincere and profound gratitude to Professor Youngjae Yu (Seoul National University) for supervising and supporting all my research activities during my undergraduate research internship. None of my participation and accomplishments during my undergraduate studies would have been possible without the guidance of the professor and the members of the MIR Lab. See you in Daejeon!
Jun 26, 2025 Welcome to my personal website! I’m excited to share my research and experiences in AI and machine learning.

latest posts

selected publications

  1. ACL 2025
    Speaking Beyond Language: A Large-Scale Multimodal Dataset for Learning Nonverbal Cues from Video-Grounded Dialogues
    ACL 2025, Vienna, Main paper, Jun 2025
  2. EMNLP 2025
    MAVL: A Multilingual Audio-Video Lyrics Dataset for Animated Song Translation
    Sunghyun Lee, Woohyun Cho, and Youngjae Yu
    EMNLP 2025, Suzhou, Main paper, Aug 2025
  3. AAAI 2026
    Do Language Models Associate Sound with Meaning? A Multimodal Study of Sound Symbolism
    AAAI 2026, Singapore, Oral presentation, Jan 2026