Vaufreydaz Dominique (aka Doms), Professor in Computer Science at University Grenoble Alpes/Inria/LIG

Research topics:

• Multimodal perception for interaction

• Smart spaces/Ambient Assisted Living

• Affective computing

• Sociable interaction with robots and autonomous cars

Keywords:

• Machine Learning

• Deep Learning

• Computer vision

• Multimodal processing

• Sociable robot

Featured publications

Google scholar Research Gate All publications CV on HAL

Towards LLM-Powered Ambient Sensor Based Multi-Person Human Activity Recognition

ICPADS 2024 - The 30th International Conference on Parallel and Distributed Systems

Human Activity Recognition (HAR) is one of the central problems in fields such as healthcare, elderly care, and security at home. However, traditional HAR approaches face challenges including data scarcity, difficulties in model generalization, and the complexity of recognizing activities in multi-person scenarios. This paper proposes a system framework called LAHAR, based on large language models. Utilizing prompt engineering techniques, LAHAR addresses HAR in multi-person scenarios by enabling subject separation and action-level descriptions of events occurring in the environment. We validated our approach on the ARAS dataset, and the results demonstrate that LAHAR achieves comparable accuracy to the state-of-the-art method at higher resolutions and maintains robustness in multi-person scenarios.

Read the article

Autoregressive GAN for Semantic Unconditional Head Motion Generation

ACM Transactions on Multimedia Computing, Communications and Applications (2024)

In this work, we address the task of unconditional head motion generation to animate still human faces in a low-dimensional semantic space from a single reference pose. Different from traditional audio-conditioned talking head generation that seldom puts emphasis on realistic head motions, we devise a GAN-based architecture that learns to synthesize rich head motion sequences over long duration while maintaining low error accumulation levels. In particular, the autoregressive generation of incremental outputs ensures smooth trajectories, while a multi-scale discriminator on input pairs drives generation toward better handling of high- and low-frequency signals and less mode collapse. We experimentally demonstrate the relevance of the proposed method and show its superiority compared to models that attained state-of-the-art performances on similar tasks.

Read the article

Self-Supervised Transformers for Unsupervised Object Discovery using Normalized Cut

CVPR2022 - IEEE/CVF Computer Vision and Pattern Recognition Conference

Transformers trained with self-supervision using self-distillation loss (DINO) have been shown to produce attention maps that highlight salient foreground objects. In this paper, we show a graph-based method that uses the self-supervised transformer features to discover an object from an image. Visual tokens are viewed as nodes in a weighted graph with edges representing a connectivity score based on the similarity of tokens. Foreground objects can then be segmented using a normalized graph-cut to group self-similar regions. We solve the graph-cut problem using spectral clustering with generalized eigen-decomposition and show that the second smallest eigenvector provides a cutting solution since its absolute value indicates the likelihood that a token belongs to a foreground object. Despite its simplicity, this approach significantly boosts the performance of unsupervised object discovery: we improve over the recent state-of-the-art LOST by a margin of 6.9%, 8.1%, and 8.1% respectively on VOC07, VOC12, and COCO20K. The performance can be further improved by adding a second stage class-agnostic detector (CAD). Our proposed method can be easily extended to unsupervised saliency detection and weakly supervised object detection. For unsupervised saliency detection, we improve IoU by 4.9%, 5.2%, and 12.9% on ECSSD, DUTS, and DUT-OMRON respectively compared to the state-of-the-art. For weakly supervised object detection, we achieve competitive performance on CUB and ImageNet. Our code is available at: https://www.m-psi.fr/Papers/TokenCut2022/

Read the article
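
The graph-cut procedure summarized in the abstract above can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' released code: the function name `tokencut_bipartition`, the similarity threshold `tau`, the small edge weight `eps`, the mean-based split, and the toy token features are all assumptions made for the example. The generalized eigenproblem (D - W) x = λ D x is solved here through the equivalent symmetric normalized Laplacian.

```python
import numpy as np


def tokencut_bipartition(features, tau=0.2, eps=1e-5):
    """Split tokens into foreground/background with a normalized cut.

    features: array of shape (n_tokens, dim), one row per visual token.
    tau, eps: hypothetical threshold and floor weight for the graph edges.
    Returns (mask, second) where mask[i] is True for foreground tokens and
    second is the second smallest generalized eigenvector.
    """
    # Cosine similarity between every pair of tokens.
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = f @ f.T

    # Weighted graph: strong edge if tokens are similar, tiny edge otherwise
    # (the tiny weight keeps the graph connected).
    W = np.where(sim > tau, 1.0, eps)
    d = W.sum(axis=1)

    # (D - W) x = lambda * D x is equivalent to the symmetric problem
    # L_sym y = lambda * y with L_sym = I - D^{-1/2} W D^{-1/2}, x = D^{-1/2} y.
    d_inv_sqrt = 1.0 / np.sqrt(d)
    L_sym = np.eye(len(d)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(L_sym)  # eigenvalues in ascending order
    second = d_inv_sqrt * eigvecs[:, 1]

    # Bipartition at the mean value of the eigenvector.
    mask = second > second.mean()

    # Take as foreground the side holding the token with the largest
    # absolute eigenvector value (the sign of an eigenvector is arbitrary).
    seed = np.argmax(np.abs(second))
    if not mask[seed]:
        mask = ~mask
    return mask, second


# Toy example: 6 "background-like" tokens and 3 "object-like" tokens.
background = np.array([[1.0, 0.01 * i] for i in range(6)])
foreground = np.array([[0.01 * i, 1.0] for i in range(3)])
tokens = np.vstack([background, foreground])

mask, second = tokencut_bipartition(tokens)
print(mask)
```

In a real pipeline the rows of `features` would come from a self-supervised transformer, and the resulting mask would be reshaped to the token grid before extracting a bounding box.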

The Role of Emotion in Problem Solving: First Results from Observing Chess

Modeling Cognitive Processes from Multimodal Data (MCPMD) Workshop at 20th ACM International Conference on Multimodal Interaction (ICMI2018), Oct 2018, Boulder, Colorado, United States.

In this paper we present results from recent experiments that suggest that chess players associate emotions with game situations and reactively use these associations to guide search for planning and problem solving. We describe the design of an instrument for capturing and interpreting multimodal signals of humans engaged in solving challenging problems. We review results from a pilot experiment with human experts engaged in solving challenging problems in chess that revealed an unexpected observation of rapid changes in emotion as players attempt to solve challenging problems. We propose a cognitive model that describes the process by which subjects select chess chunks for use in interpretation of the game situation and describe initial results from a second experiment designed to test this model.

Read the article


2023-
Professor in Computer Science.

I am a Professor in Computer Science at Grenoble Alpes University and a member of the LIG laboratory.
2021-
Head of the M-PSI research team.

I am currently leading the Multimodal Perception and Sociable Interaction (M-PSI) team at the LIG laboratory.
2005-2023
Maître de Conférences - HDR in Computer Science.

From 2005 to 2023, I was Maître de Conférences (Associate Professor, HDR in 2018) in Computer Science at Grenoble Alpes University and in the LIG laboratory.
2002-2020
PRIMA and Pervasive Interaction teams.

In the 2000s, I was involved in the European projects FAME and CHIL, working on integrating context (linguistic, thematic, situation awareness) into acoustic perception (speech recognition, speaker localization) within an intelligent environment. I also worked on an intelligent virtual cameraman. Later, I started to work on multimodal perception and social interaction, notably in the context of Ambient Assisted Living (AAL) and social robotics. On these research topics, I participated in the CASPER, PRAMAD, PAL, Valet and CEEGE projects. The PRIMA and Pervasive Interaction teams were part of Inria and the GRAVIR/LIG laboratories.
2002
Ph.D. in Computer Science within the GEOD team of CLIPS laboratory

My Ph.D. thesis was about "statistical language modeling using Internet documents for continuous speech recognition". My contributions on French speech recognition were used within the CStar and Nespole! international projects.

Activities

Head of the Multimodal Perception and Sociable Interaction (M-PSI) team.
Co-supervisor of the Graphics, Vision and Robotics (GVR) speciality of the MOSIG Master 2 program.

The MOSIG program is an international Master program taught in English. I was involved in defining the new syllabuses and the new organization of the MOSIG program. I am currently co-responsible for the GVR speciality (student selection, student defenses...).
In charge of the transversal numerical competence courses at the Grenoble Faculty of Economics

My role is to coordinate all aspects of the office automation, survey management and database courses on two sites: the Grenoble and Valence campuses. My responsibilities include defining syllabuses in collaboration with disciplinary teachers, preparing both lectures and practical courses, hiring and training non-permanent teachers (both external teachers and PhD students), managing, supervising, motivating and coordinating the pedagogical team (2 permanent teachers, 8 to 10 new teachers each year), drawing up timetables, following up on students and their grades (more than 1,000 students), and gathering information for payroll.
Teacher of several other courses at IUT2 (University Institute of Technology) and Master DCISS

See Teaching/courses.
Task officer "Scientific animation" for the keynote speeches of the LIG Laboratory
Member of the CERGA (ethics committee of the University)
Co-organiser of Working Group 5 (Human Robot Interaction) of the French Robotic Research Network.

I teach more than 300 hours a year at the Licence and Master levels. My main courses can be divided into the following two categories.
Numerical competence courses

Numerical culture: These courses aim to improve the knowledge of non-specialist students (i.e. non computer-science students) about backup (local, NAS, cloud), security (cryptography basics, secure protocols and their usage, viruses/trojans/ransomware...), data privacy (social networks, network) and cyber-bullying, information reliability and evaluation (hoaxes, fake news...). The courses are illustrated with examples from the news and contextualized with French laws.

Office automation: Models underlying word processing and spreadsheet software are explained to improve students' skills in office automation. This knowledge is put into practice using Microsoft Office, LibreOffice and Google Docs.

Survey management: This course presents survey methodology such as questionnaire construction, population sampling and data collection. Traditional (face-to-face for instance) and more modern data collection techniques (Google Forms, LimeSurvey...) are used to illustrate the theoretical background.
Computer courses

C++: I am responsible for a C++ course for first year students in computer science.

Databases: With different groups of students, I present relational DBMS (MySQL/MariaDB, Access). In conjunction with the Web data and Mobile programming courses, NoSQL approaches are illustrated through the use of MongoDB. The learning process relies on both theoretical and practical content.

Web data: Text, sound, image and video data and their usage in Web technologies are detailed in this course. Technical knowledge about HTML5/CSS3 and the new JavaScript APIs (Workers, WebSockets, etc.), as well as JSON/XML data formats and transformations, is presented.

Mobile programming: This course presents a mobile application programming pattern using HTML5/CSS/JavaScript and the Cordova backend. Using a Node.js server and a MongoDB database, students put this pattern into practice to develop several (distributed) mobile applications.

System administration in heterogeneous environments: This lecture is given to sandwich-course students. It goes through administration models and their practical implementation for system administration in heterogeneous environments (Linux/Windows/Mac OS X and mobile devices). Topics include Identity Management Systems, access control (cross-platform identity, Kerberos, Single Sign-On, ...), administration automation, network storage (NAS, SAN) and the "bring your own device" problem.

In progress

Distributed attention and teacher-student socio-emotional interactions in a context-aware classroom (03/2019-...)

Romain Laurent (Teaching Analytics Ph.D.). Co-supervisor 30% with Philippe Dessus. ( Related publications ).

Multimodal perception and statistical modeling of pedagogical classroom events using a privacy-safe non-individual approach. (11/2021-...)

Anderson Augusma. Co-supervisor 50% with Frédérique Letué. ( Related publications ).

Dynamic for automatic speech recognition. Application to low-resource languages (11/2021-...)

Sotheara Leang. Co-supervisor 34% with Eric Castelli (33%) and Sam Sethserey (33%) from the CADT.
( Related publications ).

Recognition of multi-person human activities in connected environments. (01/2023-...)

Xi Chen. Co-supervisor 33% with Julien Cumin and Fano Ramparany (Orange Labs).
( Related publications ).

Generic Modeling of Human-Machine Interactions. (11/2024-...)

Lykong Un. Co-supervisor 50% with Rottana Ly (CADT, Cambodia).
Former Ph.D. students

Object Discovery in Images, Videos, and 3D Scenes.

Yangtao Wang (defense 11/2024). Co-supervisor 25% with James L. Crowley. ( Related publications ).

Autonomous Driving in Urban Environments in the presence of Pedestrians using Deep Reinforcement Learning

Niranjan Deshpande (defense 10/2024). Co-supervisor 50% with Anne Spalanzani ( Related publications ).

Adversarial learning methods for the generation of human interaction data

Louis Airale (defense 12/2023). Co-supervisor 50% with Xavier Alameda Pineda ( Related publications ).

Toward rotation invariance for neural networks: application to human (pose) detection

Rottana Ly (defense 11/2023). Co-supervisor 34% with Eric Castelli (33%) and Sam Sethserey (33%) from the CADT.
( Related publications ).

Estimating Expertise from Eye-Gaze and Emotions

Thomas Guntz (defense 09/2020). Co-supervisor 50% with James L. Crowley ( Related publications ).

Integrating Animation Artists into the Animation Design of Social Robots

Etienne Balit (defense 12/2019). Co-supervisor 50% with Patrick Reignier ( Related publications ).

Building and Leveraging Prior Knowledge for Predicting Pedestrian Behaviour Around Autonomous Vehicles

Pavan Vasishta (defense 09/2019). Co-supervisor 50% with Anne Spalanzani ( Related publications ).

Smartphone-based indoor positioning using Wifi, inertial sensors and Bluetooth

Viet-Cuong Ta (defense 12/2017). This thesis was co-supervised by Eric Castelli (34%) and Trung-Kien Dao (33%) at the MICA institute and myself (33%) at Inria. ( Related publications ).

Semantic Description of Services and Service Factories for Ambient Intelligence

Rémi Emonet (defense 09/2009). Co-supervisor 50% with James L. Crowley ( Related publications ).

Indoor localization using multimodal information

Yan Hue (2008-2011). Co-supervisor 50% with Eric Castelli.