Vaufreydaz Dominique

Professor in Computer Science at University Grenoble Alpes / LIG laboratory. Head of the Multimodal Perception and Sociable Interaction (M-PSI) team.

I am a Professor in Computer Science at University Grenoble Alpes and a researcher in the LIG laboratory.

My current research interests are multimodal perception and behavior analysis, mainly of humans, in the context of smart spaces/ubiquitous computing, healthcare and assistive technologies, and affective computing. This research can be applied to sociable robot companions, autonomous cars, smart homes, or any human/agent interaction.


Research topics:

• Multimodal perception for interaction

• Smart spaces/Ambient Assisted Living

• Affective computing

• Sociable interaction with robots and autonomous cars

Multimodal perception of People, Behaviors and Emotions

Machine Learning (including Deep Learning) with several sensors (microphones, cameras, RGB-D, LIDAR, ...) for the perception of humans.

This research addresses the use of multimodal sensors (microphones, cameras, RGB-D, LIDAR, ...) to perceive humans, their behaviors and their mental states. It ranges from low-level signal processing up to high-level machine learning. Deep Learning is now included as a machine learning technique because of its performance on some of our perception tasks.


• Machine Learning

• Deep Learning

• Computer vision

• Multimodal processing

• Sociable robot

Sociable Interaction with Humans

Use perception of humans and sociable feedback from the system in the interaction loop.

In the first part of the interaction loop, we use perception of humans as input for intelligent systems (robot companions, social robots, autonomous cars...). This information makes it possible to anticipate human needs or to predict human behaviors. In the second part of the interaction loop, we study feedback from the system. For instance, in the case of a social companion, its animation must reflect its internal state and must be directly readable/understandable by its human partner(s). For mobile devices, mobile robots or autonomous cars, navigation must be socially acceptable and predictable.

Smart spaces/Ambient Assisted Living

Perception within smart spaces from two points of view.

On this research topic, two points of view are addressed. The first is how to distribute perception systems in smart spaces or ubiquitous environments. Our work on this question led to Omiscid, a middleware for distributed (perception) systems in such environments. The second investigates how perception research can help people in their daily life at home (notably the elderly) or at work. We also study the use of IoT objects to complement human perception and system feedback in smart spaces or smart homes.

Projects, software and datasets

List of research projects, software and dataset contributions

Since 1998, I have been involved in many research projects. I have also initiated personal projects on specific topics, such as the MobileRGBD project.

CEEGE, VALET, MobileRGBD, Expressive Figurines, Equipex Amiqual



You can take a look at my github page, to

MobileRGBD, a publication on BRAF-100 French corpus.

Featured publications

Autoregressive GAN for Semantic Unconditional Head Motion Generation

ACM Transactions on Multimedia Computing, Communications and Applications (2024)

In this work, we address the task of unconditional head motion generation to animate still human faces in a low-dimensional semantic space from a single reference pose. Different from traditional audio-conditioned talking head generation that seldom puts emphasis on realistic head motions, we devise a GAN-based architecture that learns to synthesize rich head motion sequences over long duration while maintaining low error accumulation levels. In particular, the autoregressive generation of incremental outputs ensures smooth trajectories, while a multi-scale discriminator on input pairs drives generation toward better handling of high- and low-frequency signals and less mode collapse. We experimentally demonstrate the relevance of the proposed method and show its superiority compared to models that attained state-of-the-art performances on similar tasks.

Read the article 

Self-Supervised Transformers for Unsupervised Object Discovery using Normalized Cut

CVPR2022 - IEEE/CVF Computer Vision and Pattern Recognition Conference

Transformers trained with self-supervision using self-distillation loss (DINO) have been shown to produce attention maps that highlight salient foreground objects. In this paper, we show a graph-based method that uses the self-supervised transformer features to discover an object from an image. Visual tokens are viewed as nodes in a weighted graph with edges representing a connectivity score based on the similarity of tokens. Foreground objects can then be segmented using a normalized graph-cut to group self-similar regions. We solve the graph-cut problem using spectral clustering with generalized eigen-decomposition and show that the second smallest eigenvector provides a cutting solution since its absolute value indicates the likelihood that a token belongs to a foreground object. Despite its simplicity, this approach significantly boosts the performance of unsupervised object discovery: we improve over the recent state-of-the-art LOST by a margin of 6.9%, 8.1%, and 8.1% respectively on the VOC07, VOC12, and COCO20K. The performance can be further improved by adding a second stage class-agnostic detector (CAD). Our proposed method can be easily extended to unsupervised saliency detection and weakly supervised object detection. For unsupervised saliency detection, we improve IoU for 4.9%, 5.2%, 12.9% on ECSSD, DUTS, DUTOMRON respectively compared to state-of-the-art. For weakly supervised object detection, we achieve competitive performance on CUB and ImageNet. Our code is available at:

Read the article 

Building Prior Knowledge: A Markov Based Pedestrian Prediction Model Using Urban Environmental Data

ICARCV 2018 - 15th International Conference on Control, Automation, Robotics and Vision

Autonomous Vehicles navigating in urban areas have a need to understand and predict future pedestrian behavior for safer navigation. This high level of situational awareness requires observing pedestrian behavior and extrapolating their positions to know future positions. While some work has been done in this field using Hidden Markov Models (HMMs), one of the few observed drawbacks of the method is the need for informed priors for learning behavior. In this work, an extension to the Growing Hidden Markov Model (GHMM) method is proposed to solve some of these drawbacks. This is achieved by building on existing work using potential cost maps and the principle of Natural Vision. As a consequence, the proposed model is able to predict pedestrian positions more precisely over a longer horizon compared to the state of the art. The method is tested over "legal" and "illegal" behavior of pedestrians, having trained the model with sparse observations and partial trajectories. The method, with no training data, is compared against a trained state of the art model. It is observed that the proposed method is robust even in new, previously unseen areas.

Read the article 

The Role of Emotion in Problem Solving: First Results from Observing Chess

Modeling Cognitive Processes from Multimodal Data (MCPMD) Workshop at 20th ACM International Conference on Multimodal Interaction (ICMI2018), Oct 2018, Boulder, Colorado, United States.

In this paper we present results from recent experiments that suggest that chess players associate emotions to game situations and reactively use these associations to guide search for planning and problem solving. We describe the design of an instrument for capturing and interpreting multimodal signals of humans engaged in solving challenging problems. We review results from a pilot experiment with human experts engaged in solving challenging problems in Chess that revealed an unexpected observation of rapid changes in emotion as players attempt to solve challenging problems. We propose a cognitive model that describes the process by which subjects select chess chunks for use in interpretation of the game situation and describe initial results from a second experiment designed to test this model.

Read the article 

Curriculum vitae

  • 2023-

    Professor in Computer Science.

    I am a Professor in Computer Science at Grenoble Alpes University and in the LIG laboratory.

  • 2021-

    Head of the M-PSI research team.

    I am currently leading the Multimodal Perception and Sociable Interaction (M-PSI) team at the LIG laboratory.

  • 2005-2023

    Maître de Conférences - HDR in Computer Science.

    From 2005 to 2023, I was Maître de Conférences (Associate Professor, HDR in 2018) in Computer Science at Grenoble Alpes University and in the LIG laboratory.

  • 2002-2020

    PRIMA and Pervasive Interaction teams.

    In the 2000s, I was involved in the European projects FAME and CHIL on integrating context (linguistic, thematic, situation awareness) into acoustic perception (speech recognition, speaker localization) within an intelligent environment. I also worked on an intelligent virtual cameraman. Later, I started to work on multimodal perception and social interaction, notably in the context of Ambient Assisted Living (AAL) and social robotics. On these research topics, I participated in the CASPER, PRAMAD, PAL, Valet and CEEGE projects. The PRIMA and Pervasive Interaction teams were part of the Inria and GRAVIR/LIG laboratories.

  • 2002

    Ph.D. in Computer Science within the GEOD team of CLIPS laboratory

    My Ph.D. thesis was about "statistical language modeling using Internet documents for continuous speech recognition". My contributions on French speech recognition were used within the CStar and the Nespole! international projects.


  Image courtesy of JP Guilbaud.  



Laboratoire LIG

CS 40700

38058 Grenoble cedex 9 - France


Université Grenoble Alpes


BP 47

38040 Grenoble cedex 9 - France