3D Multimedia

Analytics, Search and Generation

In Conjunction with ICME 2024

15-19 July, Niagara Falls, Canada

News!

  • March 4, 2024:   The website is coming. Call for papers.


   Today, ubiquitous multimedia sensors and large-scale computing infrastructures are producing 3D multi-modality data at a rapid pace, such as 3D point clouds acquired with LiDAR sensors, RGB-D videos recorded by Kinect cameras, meshes of varying topology, and volumetric data. 3D multimedia combines different content forms such as text, audio, images, and video with 3D information, which supports better perception of the world because the real world is three-dimensional rather than two-dimensional. For example, a robot can manipulate objects successfully by recognizing an object from RGB frames and estimating its size from a point cloud. Researchers have strived to push the limits of 3D multimedia search and generation in various applications, such as autonomous driving, robotic visual navigation, smart industrial manufacturing, logistics distribution, and logistics picking. In logistics picking systems, for instance, 3D multimedia (e.g., videos and point clouds) can help agents grasp, move, and place packages automatically. 3D multimedia analytics is therefore one of the fundamental problems in multimedia understanding. Unlike 3D vision, 3D multimedia analytics mainly concentrates on fusing 3D content with other media. It is a very challenging problem that involves multiple tasks such as human 3D mesh recovery and analysis, 3D shape and scene generation from real-world data, 3D virtual talking heads, 3D multimedia classification and retrieval, 3D semantic segmentation, 3D object detection and tracking, 3D multimedia scene understanding, and so on.
Therefore, the purpose of this workshop is to: 1) bring together state-of-the-art research on 3D multimedia analysis; 2) call for a coordinated effort to understand the opportunities and challenges emerging in 3D multimedia analysis; 3) identify key tasks and evaluate state-of-the-art methods; 4) showcase innovative methodologies and ideas; 5) introduce interesting real-world 3D multimedia analysis systems or applications; and 6) propose new real-world or simulated datasets and discuss future directions. We solicit original contributions in all fields of 3D multimedia analysis that explore multi-modality data to generate strong 3D data representations. We believe this workshop will offer a timely collection of research updates to benefit researchers and practitioners in the broad multimedia communities.

Call for papers

   We invite submissions to the ICME 2024 Workshop on 3D Multimedia Analytics, Search and Generation (3DMM2024), which brings researchers together to discuss robust, interpretable, and responsible technologies for 3D multimedia analysis. We solicit original research and survey papers, which must be no longer than 6 pages (including all text, figures, and references). Each submitted paper will be peer-reviewed by at least three reviewers. All accepted papers will be presented as either oral or poster presentations, and a best paper award will be given. Papers that violate anonymity or do not use the ICME submission template will be rejected without review. By submitting a manuscript to this workshop, the authors acknowledge that no paper substantially similar in content has been submitted to another workshop or conference during the review period. Authors should prepare their manuscript according to the Guide for Authors of ICME. For detailed instructions, see here. Submission address is here.
  The scope of this workshop includes, but is not limited to, the following topics:

  • Generative Models for 3D Multimedia and 3D Multimedia Synthesis
  • Generating 3D Multimedia from Real-world Data
  • 3D Multimodal Analysis and Description
  • Multimedia Virtual/Augmented Reality
  • 3D Multimedia Systems
  • 3D Multimedia Search and Recommendation
  • Mobile 3D Multimedia
  • 3D Shape Estimation and Reconstruction
  • 3D Scene and Object Understanding
  • High-level Representation of 3D Multimedia Data
  • 3D Multimedia Application in Industry

  Fast Review for Rejected Regular Submissions of ICME 2024
  We have set up a Fast Review mechanism for regular submissions rejected by the ICME main conference, and we strongly encourage authors of rejected papers to submit them to this workshop. To submit through Fast Review, authors must write a one-page cover letter describing the revisions made to the paper and attach all previous reviews. All papers submitted through Fast Review will be reviewed directly by meta-reviewers, who will make the decisions.

Invited speakers

 Siwei Ma
Peking University, China.
Title: 3D Visual Media Representation and Coding.
Abstract: The application of 3D immersive media is developing rapidly and has broad prospects. Compared with traditional 2D visual media such as 2D video, its data volume has doubled, requiring more efficient representation and encoding techniques. This talk mainly introduces recent progress in efficient representation and encoding technology for 3D visual media, including traditional multi-view video, depth video, point cloud, and the recent mesh coding technologies and standards. In addition, it explores emerging topics such as digital human encoding and neural radiance field compression.
Biography: Siwei Ma (Fellow, IEEE) received the B.S. degree from Shandong Normal University, Jinan, China, in 1999, and the Ph.D. degree in computer science from the Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China, in 2005. He was a Post-Doctoral Researcher with the University of Southern California, Los Angeles, CA, USA, from 2005 to 2007. He joined the School of Electronics Engineering and Computer Science, Institute of Digital Media, Peking University, Beijing, where he is currently a Professor. He has authored over 300 technical papers in refereed journals and proceedings in image and video coding, video processing, video streaming, and transmission. He served/serves as an Associate Editor for the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, IEEE TRANSACTIONS ON IMAGE PROCESSING and the Journal of Visual Communication and Image Representation.

 Bin Fan
University of Science and Technology Beijing, China
Title: Learning Good Features for Visual Localization in Known 3D Environments
Abstract: Perceiving the three-dimensional world is a key issue in fields such as computer vision, human-computer interaction, and robotics. Visual localization is one of the crucial technologies for 3D perception, aiming to calculate the position and orientation of a camera in a 3D scene based on its captured image. Due to its convenience, flexible deployment, and low cost, it is favored by various applications such as autonomous driving, augmented reality, and intelligent robots. Extracting robust image features to establish reliable correspondences across different application environments has become one of the key technologies driving the practical application of visual localization. In this talk, I will introduce some of our work in this area, including Attention Weighted Local Descriptors, Task-Aligned Local Features, GAN-based local features, and Semantic-Aware Local Features.
Biography: Bin Fan received his Ph.D. degree from the National Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences (CASIA) in 2011. After receiving his doctoral degree, he worked at the NLPR, first as an Assistant Professor and then as an Associate Professor, until 2020. He is now a full professor at the University of Science and Technology Beijing, China. During 2015-2016, he visited the CVLab at EPFL. His research mainly focuses on problems related to image-based 3D perception and analysis, image understanding, etc. He has published over 60 papers in highly ranked conferences and journals, including IEEE TPAMI/TIP/TNNLS/TMM, Pattern Recognition, CVPR, ICCV, and ECCV. He serves as an Associate Editor of Pattern Recognition and the Journal of Visual Communication and Image Representation, and as an AC of CVPR/NeurIPS/ECCV/ICME. He has been a Senior Member of the IEEE since 2016.

 Soodeh Nikan
Western University, Canada.
Title: Revolutionizing Spatial Intelligence: The Impact of Vision-Language Models on 3D Perception.
Abstract: The integration of Vision-Language Models (VLMs) with 3D perception technologies is revolutionizing spatial intelligence, enabling AI systems to understand and interact with their environments in unprecedented ways. This talk will explore the evolution of spatial intelligence and present recent advancements in research. Specifically, I will discuss how VLMs enhance scene understanding and object detection in autonomous driving, improving safety and efficiency. Additionally, the role of VLMs in robotics will be highlighted, where they enhance navigation and human-robot interaction, enabling robots to perform complex tasks with greater precision. In healthcare, I will demonstrate how VLMs assist in accurate diagnosis and analysis of medical images, providing detailed descriptions and identifying abnormalities. The talk will also address current challenges, such as scalability and ethical considerations, including data privacy and fairness. I will propose future directions for responsible AI development, emphasizing the importance of creating robust, interpretable, and ethical AI systems. Attendees will gain insights into cutting-edge VLM research and its potential to revolutionize various industries by providing a deeper, contextual understanding of visual data.
Biography: Soodeh Nikan received her Ph.D. in Electrical and Computer Engineering from University of Windsor in 2014. She is currently an Assistant Professor in software engineering at the Department of ECE, Western University, Canada. Her research interests lie in the intersection of artificial intelligence (AI) and various engineering disciplines including computer vision, data analytics, biomedical engineering and signal processing. Dr. Nikan has made significant contributions to optimized deep/machine learning technologies for highly demanding and safety-critical areas. She has an extensive academic and industry portfolio in AI and automotive research through her research on autonomous driving and intelligent transportation at Ford Motor Company and Western University. Her research excellence has been recognized by prestigious awards such as the NSERC and Mitacs awards. Demonstrating her deep commitment to the academic community, Dr. Nikan serves as a Counselor for the IEEE London Ontario Section Branch and actively participates as a reviewer for technical committees and IEEE-sponsored venues and journals.


Peng Dai
Noah’s Ark Lab, Canada
Shan An
JD Health, China
An-An Liu
Tianjin University, China
Kun Liu
Explore Academy of JD.com, China
Wu Liu
University of Science and Technology of China, China
Antonios Gasteratos
Democritus University of Thrace, Greece

Program Chairs

Xuri Ge
University of Glasgow, UK
Junjie Ye
University of Southern California, USA
Chao Zhang
JD Health, China
Guoxin Wang
Zhejiang University, China

Accepted Papers

Oral Order   Paper ID   Paper Title
1            8          Visibility-aware Human Mesh Recovery via Balancing Dense Correspondence and Probability Model
2            12         Dual Attribute-Spatial Relation Alignment For 3D Visual Grounding
3            31         Automatic Malleefowl Mound Detection Using Lidar-Based Ground and Habitat Features With Planar Terrain Modelling
4            39         I3FNet: Instance-aware Feature Fusion for Few-shot Point Cloud Generation from Single Image
5            45         3DMIT: 3D Multi-Modal Instruction Tuning for Scene Understanding
6            88         Blender-NeRF: A Monocular Dynamic Human Body Explicit Reconstruction and Rendering Method

Previous Workshops on 3DMM: 3DMM-ICME2022, 3DMM-ICME2023

If you have any questions, feel free to contact peng [DOT] dai [DOT] ca [AT] ieee.org.