Autoregressive GAN for Semantic Unconditional Head Motion Generation
ACM Transactions on Multimedia Computing, Communications and Applications (2024)
• Multimodal perception for interaction
• Smart spaces/Ambient Assisted living
• Affective computing
• Sociable interaction with robots and autonomous cars
Machine Learning (including Deep Learning) with several sensors (microphones, cameras, RGB-D, LIDAR, ...) for perception of Humans.
These researches addresses the use of multimodal sensors (microphones, cameras, RGB-D, LIDAR, ...) to perceive humans, their behaviors and mental states. These researches start from low level signal processing up to high level machine learning. Deep Learning is now included as machine learning technic for its performance on some of our perception tasks.
• Machine Learning
• Deep Learning
• Computer vision
• Multimodal processing
• Sociable robot
Use perception of humans and sociable feedback from the system in the interaction loop.
In the first part of the interaction loop, we use perception of humans as input for intelligent systems (robot companions, social robots, autonomous cars...). This information permits to anticipate human needs or to predict human behaviors. The second part of the interaction loop, feedback from the system is studied. For instance, in the case of a social companion, its animation must reflect its internal state and must be directly readable/understandable by its human partner(s). For mobile devices, mobile robots or autonomous cars, their navigation must be sociably acceptable and predictable.
Perception within smart spaces from two points of views.
On this research topic, two points of view are addressed. The first one is how to distribute perception systems in smart spaces or in ubiquitous environments. Our reflection leads to Omiscid, a middleware for distributed (perception) systems in such environment. The second point of view inquiries usage of perception researches to help people in their daily life at home (notably elderly) or at work. We also study usage of IoT objects to complete human perception/system feedback in smart spaces or smart homes.
List of research projects, software and datasets contributions
Since 1998, I was involved in many research projects. I also initiate some personal projects on some specific topics like in the MobileRGBD project.
Contributions.You can take a look at my github page, to
ACM Transactions on Multimedia Computing, Communications and Applications (2024)
In this work, we address the task of unconditional head motion generation to animate still human faces in a low-dimensional semantic space from a single reference pose. Different from traditional audio-conditioned talking head generation that seldom puts emphasis on realistic head motions, we devise a GAN-based architecture that learns to synthesize rich head motion sequences over long duration while maintaining low error accumulation levels. In particular, the autoregressive generation of incremental outputs ensures smooth trajectories, while a multi-scale discriminator on input pairs drives generation toward better handling of high- and low-frequency signals and less mode collapse. We experimentally demonstrate the relevance of the proposed method and show its superiority compared to models that attained state-of-the-art performances on similar tasks.
CVPR2022 - IEEE/CVF Computer Vision and Pattern Recognition Conference
Transformers trained with self-supervision using self-distillation loss (DINO) have been shown to produce attention maps that highlight salient foreground objects. In this paper, we show a graph-based method that uses the self-supervised transformer features to discover an object from an image. Visual tokens are viewed as nodes in a weighted graph with edges representing a connectivity score based on the similarity of tokens. Foreground objects can then be segmented using a normalized graph-cut to group self-similar regions. We solve the graph-cut problem using spectral clustering with generalized eigen-decomposition and show that the second smallest eigenvector provides a cutting solution since its absolute value indicates the likelihood that a token belongs to a foreground object. Despite its simplicity, this approach significantly boosts the performance of unsupervised object discovery: we improve over the recent state-of-the-art LOST by a margin of 6.9%, 8.1%, and 8.1% respectively on the VOC07, VOC12, and COCO20K. The performance can be further improved by adding a second stage class-agnostic detector (CAD). Our proposed method can be easily extended to unsupervised saliency detection and weakly supervised object detection. For unsupervised saliency detection, we improve IoU for 4.9%, 5.2%, 12.9% on ECSSD, DUTS, DUTOMRON respectively compared to state-of-the-art. For weakly supervised object detection, we achieve competitive performance on CUB and ImageNet. Our code is available at: https://www.m-psi.fr/Papers/TokenCut2022/
ICARCV 2018 - 15th International Conference on Control, Automation, Robotics and Vision
Autonomous Vehicles navigating in urban areas have a need to understand and predict future pedestrian behavior for safer navigation. This high level of situational awareness requires observing pedestrian behavior and extrapolating their positions to know future positions. While some work has been done in this field using Hidden Markov Models (HMMs), one of the few observed drawbacks of the method is the need for informed priors for learning behavior. In this work, an extension to the Growing Hidden Markov Model (GHMM) method is proposed to solve some of these drawbacks. This is achieved by building on existing work using potential cost maps and the principle of Natural Vision. As a consequence, the proposed model is able to predict pedestrian positions more precisely over a longer horizon compared to the state of the art. The method is tested over "legal" and "illegal" behavior of pedestrians, having trained the model with sparse observations and partial trajectories. The method, with no training data, is compared against a trained state of the art model. It is observed that the proposed method is robust even in new, previously unseen areas.
Modeling Cognitive Processes from Multimodal Data (MCPMD) Workshop at 20th ACM International Conference on Multimodal Interaction (ICMI2018), Oct 2018, Boulder, Colorado, United States.
In this paper we present results from recent experiments that suggest that chess players associate emotions to game situations and reactively use these associations to guide search for planning and problem solving. We describe the design of an instrument for capturing and interpreting multimodal signals of humans engaged in solving challenging problems. We review results from a pilot experiment with human experts engaged in solving challenging problems in Chess that revealed an unexpected observation of rapid changes in emotion as players attempt to solve challenging problems. We propose a cognitive model that describes the process by which subjects select chess chunks for use in interpretation of the game situation and describe initial results from a second experiment designed to test this model.
I am Professor in Computer Science at Grenoble Alpes University and in the LIG laboratory .
I am currently leading the Multimodal Perception and Sociable Internaction (M-PSI) team at the LIG laboratory .
From 2005 to 2023, I was Maître de Conférences (Associate Professor, HDR in 2018) in Computer Science at Grenoble Alpes University and in the LIG laboratory .
in the 2000s, I was involved in the European projects FAME and CHIL for integration of the context (linguistic, thematic, situation awareness) in the acoustic perception (speech recognition, speaker's localization) within an intelligent environment. I also work on an intelligent virtual cameraman. Later, I started to work on multimodal perception and social interaction, notably in the context of Ambiant Assisted Living (AAL) and social robotics. On these research topics, I participated in the CASPER, PRAMAD, PAL, Valet and CEEGE projects. The PRIMA and Pervasive Interaction teams were part of Inria and GRAVIR/LIG laboratories.
My Ph.D. thesis was about "statistical language modeling using Internet documents for continuous speech recognition". My contibutions on French speech recognition were used within the CStar and the Nespole! international projects.