INVITED SPEAKERS

Prof. Takashi KUREMOTO, Nippon Institute of Technology, Japan

Takashi Kuremoto received the B.E. degree in System Engineering from the University of Shanghai for Science and Technology, China, in 1986, and the M.E. and Ph.D. degrees from Yamaguchi University, Japan, in 1996 and 2014, respectively. From 1986 to 1992, he worked as a system engineer at the Research Institute of Automatic Machine, Beijing. In 2008, he was an Academic Visitor at the School of Computer Science, The University of Manchester, U.K. He was affiliated with Yamaguchi University from 1993 to 2021, and since 2021 he has been a Professor at the Nippon Institute of Technology, Japan. His research interests include artificial neural networks, bioinformatics, machine learning, complex systems, time series forecasting, and swarm intelligence. He has authored more than 300 publications and is a member of IEICE, IEEE, IIF, and SICE.

Speech Title: Research on Restoring Guqin Music through Deep Learning

Abstract: Musical notation plays a central role in cultural transmission, yet many works remain unperformed due to the complexity of their notation systems. A notable example is the Jianzipu notation used for the Guqin, China’s oldest plucked string instrument, inscribed by UNESCO as Intangible Cultural Heritage in 2008. Jianzipu condenses finger movements and string assignments into single characters but omits rhythm and tempo, requiring performers to reconstruct the music through DaPu, a demanding interpretive process. Consequently, only about 100 of the 600 Guqin pieces composed over three millennia are actively performed today. To address this challenge, our study applies deep learning–based image recognition to automate DaPu and enhance cultural heritage preservation. Prior research has pursued two approaches: whole-character classification and component-level recognition. Building on these, we expanded the dataset from 55 to 203 classes by including multiple pieces, notably Xian-Weng-Cao and Chun-Xiao-Yin. Using YOLOv11n, we extracted individual characters, then combined ResNet50 feature extraction with K-means clustering to classify 1,560 character images efficiently. Data augmentation yielded a balanced dataset of 40,600 samples. Fine-tuning ResNet50 achieved a recognition accuracy of 99.09%.
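The feature-extraction-plus-clustering stage described above can be illustrated with a short sketch. The snippet below is a minimal, hypothetical reconstruction, not the speaker's actual code: it pulls 2048-dimensional ResNet50 features from detected character crops and groups them with K-means. The crops/ directory name and image size are assumptions; the cluster count of 203 is taken from the class count in the abstract.

```python
# Minimal sketch: ResNet50 features + K-means over Jianzipu character crops.
# Directory layout, image size, and model weights are illustrative assumptions.
from pathlib import Path

import numpy as np
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image
from sklearn.cluster import KMeans

# Pretrained ResNet50 with the classification head removed -> 2048-d features
resnet = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
resnet.fc = torch.nn.Identity()
resnet.eval()

preprocess = T.Compose([
    T.Resize((224, 224)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def extract_features(image_paths):
    """Return an (N, 2048) array of ResNet50 features for the given images."""
    feats = []
    for p in image_paths:
        img = preprocess(Image.open(p).convert("RGB")).unsqueeze(0)
        feats.append(resnet(img).squeeze(0).numpy())
    return np.stack(feats)

# "crops/" is a hypothetical directory of character images cut out by the detector
paths = sorted(Path("crops").glob("*.png"))
features = extract_features(paths)

# Group visually similar characters; 203 matches the class count in the abstract
labels = KMeans(n_clusters=203, n_init=10, random_state=0).fit_predict(features)
```

In a pipeline of this kind, the cluster assignments would typically be reviewed and corrected by hand, which is far faster than labeling every crop from scratch; the resulting labeled set then serves as training data for the fine-tuned classifier.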

Assoc. Prof. Kazuya UEKI, Meisei University, Japan

Kazuya Ueki received his B.S. degree in Information Engineering in 1997 and his M.S. degree in Computer and Mathematical Sciences in 1999, both from Tohoku University, Sendai, Japan. In 1999, he joined NEC Soft, Ltd., Tokyo, Japan, where he was mainly engaged in research on face recognition. He received his Ph.D. degree from the Graduate School of Science and Engineering, Waseda University, Tokyo, Japan, in 2007. From 2013 to 2017, he served as an Assistant Professor at Waseda University. He is currently an Associate Professor in the School of Information Science, Meisei University. His research interests include information retrieval, video anomaly detection, pattern recognition, and machine learning. He participates in the TREC Video Retrieval Evaluation (TRECVID) benchmark sponsored by the National Institute of Standards and Technology (NIST), contributing to the development of video retrieval technology. His submitted systems achieved the highest performance in the TRECVID Ad-hoc Video Search (AVS) task in 2016, 2017, 2022, and 2025.

Speech Title: A Survey of Recent Advances in Video Anomaly Detection Using Vision-Language Models

Abstract: Video anomaly detection (VAD) has advanced rapidly in recent years, yet most existing methods still rely primarily on visual cues and predefined anomaly classes. Vision-Language Models (VLMs) have recently emerged as a powerful paradigm for video understanding by aligning visual content with natural language. By leveraging the rich semantic knowledge encoded in language, VLM-based methods enable flexible and interpretable anomaly detection, often in a zero-shot manner that requires no task-specific training data. This presentation provides an overview of recent advances in VLM-based VAD, highlighting key trends in language-driven and training-free anomaly analysis. It examines how VLMs bridge the gap between low-level visual features and high-level semantic understanding, enabling more nuanced reasoning about abnormal events. Furthermore, emerging directions toward anomaly anticipation are discussed, with a focus on the potential of language-based reasoning for the early recognition of risky or anomalous situations.
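As a concrete illustration of the zero-shot, language-driven paradigm the talk surveys, the sketch below scores a single video frame with CLIP by comparing it against one normal and one anomalous text prompt. This is a generic example of the technique, not the speaker's specific method; the model checkpoint, prompts, and threshold are all illustrative assumptions.

```python
# Minimal sketch: training-free anomaly scoring of a frame with a VLM (CLIP).
# Prompts and threshold are illustrative assumptions.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
model.eval()

prompts = [
    "a normal scene of people walking",   # description of normal activity
    "a person fighting or falling down",  # description of an anomalous event
]

@torch.no_grad()
def anomaly_score(frame: Image.Image) -> float:
    """Probability mass CLIP assigns to the anomalous prompt for one frame."""
    inputs = processor(text=prompts, images=frame, return_tensors="pt", padding=True)
    logits = model(**inputs).logits_per_image  # shape (1, len(prompts))
    probs = logits.softmax(dim=-1)
    return probs[0, 1].item()

# Example usage: flag a frame whose anomaly probability exceeds a chosen threshold
frame = Image.open("frame_0001.jpg")  # hypothetical frame extracted from a video
if anomaly_score(frame) > 0.5:
    print("possible anomaly detected")
```

Because the anomaly definition lives entirely in the text prompts, new anomaly types can be targeted simply by rewording the prompts, which is the flexibility and interpretability the abstract highlights.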