My research focuses on multimodal interaction and modeling, with internship projects involving LLMs and image generation.

I am currently seeking positions related to multimodal algorithms and AIGC.

I will be completing my master’s degree at the University of Science and Technology of China under the guidance of Associate Professor Jun Yu. My corporate mentors are Peng Chang, who heads the multimodal group at the Silicon Valley Research Institute of Ping An Technology in the United States, and Iek-Heng Chu. I completed my undergraduate studies at Guangzhou University, where I was supervised by Professor Jin Li, the executive dean of the Institute of Artificial Intelligence, and Associate Professor Xianmin Wang. To date, I have contributed to more than 10 published papers.

During my undergraduate and postgraduate years, I participated in more than 20 AI algorithm competitions and gained a wealth of competition experience and strategies. I was a member of the Alibaba Security Student Expert Group, and I am ranked 7th in the Alibaba Security Challenger Program.

My research interests include:

Multimodal Interaction and Modeling (CV/NLP)
AIGC
Fine-grained Image Recognition
Robust Machine Learning

My business directions include:

Large Language Models
Exploratory Data Analysis (EDA)
Data Mining
Style Transfer (Autoencoder, GAN, Diffusion)
Object Detection

📝 Published Papers

IJCAI 2024 (CCF-A)

Dialogue Cross-Enhanced Central Engagement Attention Model for Real-Time Engagement Estimation
Jun Yu, Keda Lu, Ji Zhao et al. (First student author)

Propose a center-based sliding window to eliminate repetitive inference across sliding windows, improving inference efficiency by 100%.
Propose the central engagement attention model based on SA, surpassing the previous SOTA BiLSTM model, with inference efficiency improved by 300%.
Propose a cross-enhanced module based on CA that integrates seamlessly with the central engagement attention model, establishing a new SOTA result.
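
As an illustration only, the sketch below shows one plausible reading of a center-based sliding window: each window is passed to the model in full, but only the predictions for its central frames are kept, so every frame is predicted exactly once. The function name `center_windows` and its parameters are hypothetical and are not taken from the paper.

```python
def center_windows(n_frames, window, center):
    """Return (win_start, win_end, keep_start, keep_end) index tuples.

    Hypothetical helper: the model sees frames [win_start, win_end),
    but only predictions for [keep_start, keep_end) are kept.
    Windows advance by `center` frames, so kept regions never overlap.
    """
    if not 0 < center <= window:
        raise ValueError("center must satisfy 0 < center <= window")
    margin = (window - center) // 2  # context kept on each side of the center
    spans = []
    for keep_start in range(0, n_frames, center):
        keep_end = min(keep_start + center, n_frames)
        win_start = max(0, keep_start - margin)
        win_end = min(n_frames, win_start + window)
        # Near the end of the sequence, shift the window left so it
        # keeps its full length whenever the sequence is long enough.
        win_start = max(0, win_end - window)
        spans.append((win_start, win_end, keep_start, keep_end))
    return spans


if __name__ == "__main__":
    for span in center_windows(n_frames=10, window=6, center=2):
        print(span)
```

With a plain overlapping sliding window, the same frame is predicted several times and the results must be merged; in this sketch the kept regions tile the sequence, so no frame is inferred twice.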

CVPR 2024 (CCF-A) workshop

MvAV-pix2pixHD: Multi-view Aerial View Image Translation
Jun Yu, Keda Lu, Shenshen Du et al. (First student author)

Design time-priority sampling and random sampling strategies.
Propose MvAV-pix2pixHD for multi-view aerial view image translation and use three powerful losses.
This method won 1st and 2nd place in the MAVIC-T competition for two multi-view image translation tasks.

ACM-MM 2023 (CCF-A)

ACM-MM 2023 (CCF-A) Sliding Window Seq2seq Modeling for Engagement Estimation
Jun Yu, Keda Lu, Mohan Jing et al. (First student author)
TOMM 2024, under review (CCF-B) Exploring Seq2seq Models for Engagement Estimation in Dyadic Conversations
Jun Yu, Keda Lu, Lei Wang et al. (First student author)

Design multiple Seq2seq models based on Transformer and BiLSTM.
Propose a sliding window to address the significant context loss issue.
Propose Ai-BiLSTM to align and interact multimodal features of dialogue participants, further enhancing performance.
This method won the championship🏆 at ACM-MM 2023.

Trans (under review)

A Comprehensive and Unified Out-of-Distribution Classification Solution Framework
Jun Yu, Keda Lu, Yifan Wang et al. (First student author)

Propose semantic masking for enhancing model robustness.
Propose OOD-DAS, a comprehensive data augmentation collection.
Propose OOD-Attention, which seamlessly integrates with SOTA classification models to improve model robustness.
Propose an iterative pseudo-labeling method for ensemble integration of multiple architecture models, further enhancing OOD recognition accuracy.
This method won the championship🏆 at ICCV 2023.

ACM-MM 2023: Answer-Based Entity Extraction and Alignment for Visual Text Question Answering. Jun Yu, Mohan Jing, Weihao Liu, Tongxu Luo, Bingyuan Zhang, Keda Lu et al.
CLEF 2022: Bag of Tricks and a Strong Baseline for FGVC. Jun Yu, Hao Chang, Keda Lu et al.
CLEF 2022: Efficient Model Integration for Snake Classification. Jun Yu, Hao Chang, Zhongpeng Cai, Guochen Xie, Liwen Zhang, Keda Lu et al.
CVPR 2022 workshop: Pseudo-label generation and various data augmentation for semi-supervised hyperspectral object detection. Jun Yu, Liwen Zhang, Shenshen Du, Hao Chang, Keda Lu et al.
AAAI 2022 workshop: Mining limited data for more robust and generalized ML models. Jun Yu, Hao Chang, Keda Lu et al.
International Journal of Machine Learning and Cybernetics: Generating transferable adversarial examples based on perceptually-aligned perturbation. Hongqiao Chen, Keda Lu, Xianmin Wang et al.

💻 Projects

2024.03 - now Multimodal Large Language Models

EDA Show

2023.10 - 2024.02 Loan Customer Repayment Intention Recognition

Conduct EDA on a dataset with millions of records and tens of millions of call texts.
EDA -> Data Cleaning -> Feature Engineering. Utilize BERT for text modeling to identify customers’ repayment intentions.
Explore LLMs for data augmentation on call texts to enhance model robustness.

2023.05 - 2023.09 Vertical Domain Chat Assistant (training corpus construction; fine-tuning based on ChatGLM, Bloomz, Qwen, etc.)

OCR Large Model Showcase Platform

2023.03 - 2023.06 OCR Large Model Showcase Platform

Use Gradio to construct the entire OCR large model showcase interface, incorporating DocQA, MLLM, and pure OCR modules.
Independently maintained for internal analysis and debugging, as well as external business showcasing.
This project was awarded the 2023 H1 XXX·Enterprise Excellence Award - Technical Advancement.
Responsible for the DocQA module.

Chinese font generation

2023.01 - 2023.03 Arbitrary-style Chinese font generation (GAN, Diffusion model)

Explore Chinese font generation algorithms, including DG-Font and Diff-Font.
Collect a dataset of 400 different styles of fonts.
Design an end-to-end font generation model based on the Diffusion model (DDPM). It slightly outperformed Diff-Font and DG-Font in metrics such as SSIM and LPIPS.

Future Improvements: End-to-end, Contrastive learning, Diffusion model.

Document generation and style transfer

2022.11 - 2023.01 Document generation and style transfer (Independent research)

Explore Diffusion model and GAN for end-to-end document generation.
Survey five years of style transfer papers from top conferences, tracing the progression CNN -> Attention -> Transformer, including AdaIN (ICCV 2017), MetaNet (CVPR 2018), SANet (CVPR 2019), MAST (ACM-MM 2020), StyleFormer (ICCV 2021), AdaAttN (ICCV 2021) and StyTr2 (CVPR 2022).
Reproduce StyTr2(CVPR2022) and AdaAttN(ICCV2021) and transfer them to the document generation task for data augmentation.

Future improvements: Contrastive learning, GAN, Diffusion model

Face Recognition and Text Detection

2022.06 - 2022.12 Reproducing mainstream algorithms based on the Mindspore algorithmic framework

Participate in reproducing the RetinaFace face detection algorithm.
Independently reproduce the FCENet text detection algorithm.

Course Management System

2020.12 - 2021.01 Genetic Algorithm-based Intelligent Timetabling - Course Management System (Individually implemented)

Use an sqlite3 database and Bootstrap-Flask for visualization. Implement distinct client interfaces for students, teachers, and the educational director.
Propose an intelligent timetabling algorithm with a novel optimization objective function (utilizing course variance), optimized with a genetic algorithm.
This project comprises over 2000 lines of Python code and 1000 lines of HTML code. It has been openly shared on my personal blog and Github.
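
For illustration, here is a minimal sketch of a genetic algorithm driven by a variance-based objective. It is not the project's actual code: the encoding (one day index per lesson), the objective (variance of lessons per day), and every name and parameter below are assumptions made for the example.

```python
import random
from statistics import pvariance

DAYS = 5  # assumed five-day teaching week


def load_variance(timetable):
    """Variance of the number of lessons per day (lower means more even)."""
    counts = [0] * DAYS
    for day in timetable:
        counts[day] += 1
    return pvariance(counts)


def evolve(n_lessons, pop_size=30, generations=100, mutation_rate=0.1, seed=0):
    """Search for an even timetable; each gene is the day of one lesson."""
    rng = random.Random(seed)
    population = [
        [rng.randrange(DAYS) for _ in range(n_lessons)] for _ in range(pop_size)
    ]
    for _ in range(generations):
        # Elitist selection: the better half survives unchanged.
        population.sort(key=load_variance)
        survivors = population[: pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, n_lessons)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Mutation: occasionally move a lesson to a random day.
            child = [
                rng.randrange(DAYS) if rng.random() < mutation_rate else gene
                for gene in child
            ]
            children.append(child)
        population = survivors + children
    return min(population, key=load_variance)


if __name__ == "__main__":
    best = evolve(n_lessons=20)
    print(best, load_variance(best))
```

A real timetabling system would also need hard constraints such as room and teacher conflicts, which this sketch leaves out.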

Student performance management system

2019.04 - 2019.06 Student performance management system based on MFC (C++) (Individually implemented).

Includes all basic functions (Create, Read, Update, Delete), as well as operations like import, save, and sorting.
The design was primarily inspired by the large login button interface of QQ, aiming to create a clear and clean user experience.
This project comprises over 10,000 lines of C++ code and has been open-sourced on my personal blog and Github.

🏅 Competitions

Master phase (Main force)

2024.03 CVPR 2024: Multi-modal Aerial View Image Challenge - Translation (Top3 prize 2500$, Solo, Runner up🥈) [LeaderBoard] [Paper]
2023.10 ICCV 2023: Out Of Distribution Generalization: Object Classification track (Solo, Champion🏆) [LeaderBoard] [Paper, under review]
2023.10 ICCV 2023: Out Of Distribution Generalization: Pose Estimation track (Solo, Champion🏆) [LeaderBoard] [Report]
2023.07 ACM-MM 2023: Grand challenge, Engagement Estimation (Solo, Champion🏆) [LeaderBoard] [Paper] [New]
2022.10 ECCV 2022: Out Of Distribution Generalization Track-1: Object Classification (Top3 prize 3300$, Runner up🥈) [LeaderBoard] [Code]
2022.10 ECCV 2022: Out Of Distribution Generalization Track-2: Object Detection (Top3 prize 3300$, Runner up🥈) [LeaderBoard] [Code]
2022.05 CVPR 2022: FGVC9 workshop FungiCLEF2022 challenge (Runner up🥈) [LeaderBoard] [Code] [Paper]
2022.03 CVPR 2022: Multi-modal Aerial View Object Classification - SAR+EO (Top3 prize 6000$, Champion🏆) [LeaderBoard] [Report] [New]
2022.03 CVPR 2022: Multi-modal Aerial View Object Classification - SAR (Top3 prize 6000$, Champion🏆) [LeaderBoard] [Report] [New]

Master phase (Assistance)

2023.12 ICCV 2023: WECIA - Caption Generation Challenge (Champion🏆) [LeaderBoard]
2023.07 ACM-MM 2023: Visual Text Question Answering (3rd🥉) [LeaderBoard] [Paper]
2023.03 CVPR 2023: Multi-modal Aerial View Imagery Challenges - Translation (Top3 prize 2250$, Champion🏆) [LeaderBoard] [Paper]
2022.06 CVPR 2022: Robustness in Sequential Data challenge (Champion🏆) [LeaderBoard] [Report] [New]
2022.03 CVPR 2022: Semi-Supervised Hyperspectral Object Detection Challenge (Champion🏆) [LeaderBoard] [Paper]

Bachelor phase

2022.08 Computer Competition of China (Top10 prize 560,000¥, Solo, National Second Prize, Top30/3000+) [LeaderBoard] [Code]
2022.01 AAAI 2022: Data-Centric Robust Learning on ML Models (Top10 prize 1,000,000¥, Solo, Rank 10/3692) [LeaderBoard] [Code] [Paper]
2021.11 OPPO Security AI Challenge - Face Recognition Attacks (Top10 prize 600,000¥, Solo, Rank 12/2000+) [LeaderBoard] [Code]
2021.03 CVPR 2021: White-box Adversarial Attacks on ML Defense Models (Top10 prize 100,000¥, Rank 20/1681) [LeaderBoard] [Code] [Blog]
2020.10 Adversarial Attacks on forged images (Top10 prize 2 million ¥, Rank 6/1666) [LeaderBoard]
2020.08 Tencent Advertising Algorithm Competition (Top10 prize 100,000$, Rank 11/10000+) [Code] [Blog]
2020.04 Used Car Trading Price Forecast (Solo, Winner, Rank 13/2815) [LeaderBoard] [Code] [Blog]
2020.03 Text Adversarial Attack Competition (Top10 prize 68,000¥, Rank 4/1666) [LeaderBoard] [Code] [Blog]
2019.12 ImageNet Adversarial Attack Competition (Top10 prize 68,000¥, Rank /1522) [LeaderB] [Blog]
2019.10 GeekPwn2019 CAAD CTF Finals (Finals prize 100,000¥, Rank 5th place in Finals) [LeaderBoard] [New]

🎖 Honors and Awards

2023.11 Huawei Scholarship (Top 30 in the university)
2023.10 National Scholarship (Top 1% of graduate students)
2022.10 National Scholarship (Top 1% of graduate students)
2021.10 National Scholarship (Top 1% of undergraduate students)
2020.10 National Scholarship (Top 1% of undergraduate students)

🎓 Education

2022.09 - 2025.07, University of Science and Technology of China, Computer Technology, Recommended Postgraduate, Master’s Degree
2018.09 - 2022.06, Guangzhou University, Computer Science and Technology (1/591), Bachelor’s Degree

🏛️ Academic conferences

2024.03, Mindspore AI Framework Industry Conference (organized by Huawei), invited by Huawei, Beijing.
2023.11, 31st ACM International Conference on Multimedia, Ottawa, Canada.
2020.12, The 1st AI and Security Symposium (organized by Tsinghua University and Alibaba Security), invited by Alibaba, Beijing.
2019.10, The 5th GeekPwn International Security Geek Competition, Shanghai.

💻 Internships

2023.10 - 2024.10, Palo Alto Lab, PAII, Inc.
2023.04 - 2023.06, Fuxi Lab, Netease.
2022.11 - 2023.09, YouTu lab, Tencent.
2022.06 - 2022.12, 2012 Lab, Huawei.

Thank you very much to every visitor; I look forward to hearing from you!