Tutorials
Presenters:
Gauri Deshpande, Tata Consultancy Services
Björn Schuller, Technical University of Munich (TUM), Germany

Duration: Half-day

Brief description:
This tutorial aims to explain the following aspects of behaviour sensing from audio-visual cues:
  1. Introduction to “Behaviour Sensing”.
    1. Definition and scope of behaviour sensing.
    2. Importance of behaviour sensing in modern applications.
  2. Objectives of sensing behaviour.
    1. Driver monitoring for safety and alertness.
    2. Assessing candidates during interviews for non-verbal cues.
    3. Analyzing human behaviour in traffic management.
    4. Discussion of goals such as improving safety, enhancing decision-making, and optimizing user experience.
  3. Multi-modal methods of sensing behaviour.
    1. Techniques for integrating audio and visual data for comprehensive analysis.
    2. Challenges with fusion of audio-visual interpretations.
  4. Data collection strategies and pre-processing techniques.
    1. Methods for ensuring data diversity and representativeness.
    2. Pre-processing techniques such as noise reduction, feature extraction, and normalization (a minimal sketch follows this list).
  5. Prevalent behaviour parameters being studied: Emotions, Confidence, Stress, Anxiety.
  6. Inferencing and real-time validation of behaviour sensing models.
    1. Overview of inferencing techniques for behaviour sensing.
    2. Real-time model validation and feedback mechanisms.
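
As a rough illustration of item 4 (and not material from the tutorial itself), the Python sketch below shows one common audio pre-processing chain: pre-emphasis as a simple stand-in for noise reduction, MFCC feature extraction via librosa, and per-coefficient normalization. The file name "clip.wav" and all parameter values are placeholders.

    import numpy as np
    import librosa

    # Load mono audio at 16 kHz; "clip.wav" is a placeholder path.
    y, sr = librosa.load("clip.wav", sr=16000)

    # Simple noise reduction: pre-emphasis attenuates low-frequency rumble.
    y = np.append(y[0], y[1:] - 0.97 * y[:-1])

    # Feature extraction: 13 MFCCs per frame, a common audio descriptor.
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)  # shape: (13, n_frames)

    # Normalization: zero mean, unit variance per coefficient.
    mfcc = (mfcc - mfcc.mean(axis=1, keepdims=True)) / (mfcc.std(axis=1, keepdims=True) + 1e-8)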

Presenter:
Shayok Chakraborty, Florida State University

Duration: Half-day

Brief Description:
While sophisticated machine learning algorithms (such as deep neural networks) have achieved commendable performance in several applications, training a robust machine learning model necessitates a large amount of hand-labeled training data, which is time-consuming and labor-intensive to acquire. This has motivated research in the field of weakly supervised learning, where the objective is to induce a robust machine learning model under the constraint that human annotation effort is expensive. Active Learning (AL) is a popular learning paradigm that attempts to address this challenge. AL algorithms automatically select the salient and exemplar instances from large amounts of unlabeled data that need to be labeled manually; this not only tremendously reduces the human annotation effort in training an effective model, but also exposes the model to the informative samples in the underlying data population. AL has been used with remarkable success in a variety of applications, including computer vision, text mining, bioinformatics, and medical diagnosis.
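
As a rough illustration of the paradigm described above (not code from the tutorial itself), the following Python sketch implements pool-based active learning with uncertainty sampling, one of the classic AL query strategies; the dataset, classifier, seed-set size, and annotation budget are all illustrative assumptions.

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression

    X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
    labeled = list(range(20))                      # small seed set of labeled indices
    pool = [i for i in range(len(X)) if i not in labeled]

    model = LogisticRegression(max_iter=1000)
    for _ in range(10):                            # annotation budget: 10 queries
        model.fit(X[labeled], y[labeled])
        proba = model.predict_proba(X[pool])
        # Query the pool instance the current model is least confident about.
        uncertainty = 1.0 - proba.max(axis=1)
        query = pool[int(np.argmax(uncertainty))]
        labeled.append(query)                      # the oracle labels the query
        pool.remove(query)

Each round spends one unit of labeling budget on the sample the current model finds most ambiguous, which is how AL concentrates annotation effort on informative instances.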

This tutorial will present a comprehensive overview of active learning, including historical perspectives, theoretical analysis, and novel variants. Its novelty lies in its focus on the recent and emerging trends, algorithms, and applications of this learning paradigm. It aims to introduce concepts and open perspectives that motivate further work in this domain, ranging from fundamentals to applications and systems.

Presenters:
Arun Chauhan, Graphic Era University Dehradun
Deepak K. Gupta, IIT ISM Dhanbad
Arnav Chavhan, Nyun AI

Duration: Half-day

Brief Description:
With the success of CNNs and, more recently, transformers, the scale of deep learning models has grown exponentially, driven by the rapid development of computational resources. Going forward, it is critical to focus on the practical training and inference efficiency of these models, and to ensure that even the largest models have viable real-world applications. Moreover, the compute efficiency of very large models is becoming increasingly crucial for downstream tasks such as segmentation, object tracking, and action recognition, among others. Several research directions aim to make deep learning efficient for real-time applications by reducing the required computational memory or the associated training and inference time.

This tutorial aims to bring together researchers and industry practitioners who work on building efficient real-time applications with deep learning, and will provide a conducive environment for practitioners to connect, learn, and collaborate. A unique aspect of this tutorial is that it will also serve as a platform to discuss research efforts toward budget-aware model training and inference. Most existing research focuses on making deep learning methods efficient; however, the resources associated with real-world AI devices can vary drastically, and a method deemed efficient for one choice of resource budget might be completely inefficient for another. Recent tutorials on efficient deep learning have given this aspect minimal attention. Here, we will also focus on methods for budget-aware model training and inference that maximally utilize the available resources.

To push research efforts in this direction, the tutorial includes a hands-on session on resource-efficient model training and inference, in which participants will optimize the model training process under a computational memory constraint and the inference process under a latency constraint (a minimal sketch of one such technique follows).
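
As a rough illustration of the inference side of this hands-on theme (an assumption about its flavor, not the tutorial's actual exercises), the sketch below applies post-training dynamic quantization in PyTorch and measures mean latency for a fixed batch; the model architecture and sizes are placeholders.

    import time
    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10)).eval()

    # Quantize Linear weights to int8 to cut memory use and CPU latency.
    quantized = torch.ao.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

    x = torch.randn(64, 512)
    with torch.no_grad():
        start = time.perf_counter()
        for _ in range(100):
            quantized(x)
        print(f"mean latency: {(time.perf_counter() - start) / 100 * 1e3:.2f} ms")

Whether such a method fits a given deployment depends on the device's own budget, which is precisely the budget-aware perspective described above.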

Presenters:
Soma Bandyopadhyay, Tata Consultancy Services, Kolkata, India
Anish Datta, Tata Consultancy Services, Kolkata, India
Subhasri Chatterjee, Tata Consultancy Services, Kolkata, India

Duration: Half-day

Brief description:
In this tutorial, we aim to explore:
  1. Fundamentals of DL-based generative models.
  2. The basics of generating sensor time-series data using DL-based generative models.
  3. Importance of physics guidance: the need to encode, in the latent representation, patterns shaped by diverse domain characteristics while synthesizing sensor data (a minimal sketch follows this description).
  4. Validation of synthesized data using ML-based techniques.
  5. Role of numerical simulators.
  6. Bidirectional interactions between numerical simulators and generative models.
This tutorial combines theory, practical methods, and a variety of applications specifically relevant to sensor data synthesis, helping attendees gain both the theory and the practice of generative AI techniques.
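
A minimal PyTorch sketch of the physics-guidance idea in item 3, under the assumption that guidance takes the form of an extra loss term: here a simple smoothness penalty stands in for a real physical constraint, and the toy decoder and sine-wave "sensor" signal are illustrative only.

    import torch
    import torch.nn as nn

    T = 128                                             # time-series length
    decoder = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, T))

    real = torch.sin(torch.linspace(0, 6.28, T)).repeat(32, 1)  # toy sensor data
    z = torch.randn(32, 16)                                     # latent codes

    opt = torch.optim.Adam(decoder.parameters(), lr=1e-3)
    for step in range(200):
        fake = decoder(z)
        recon = ((fake - real) ** 2).mean()             # data-fit term
        # Physics-guided term: penalize large second differences, encoding
        # the prior that the sensed process evolves smoothly over time.
        physics = (fake[:, 2:] - 2 * fake[:, 1:-1] + fake[:, :-2]).pow(2).mean()
        loss = recon + 0.1 * physics
        opt.zero_grad()
        loss.backward()
        opt.step()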

Presenter:
Lawrence O’Gorman, Nokia Bell Labs, Murray Hill, USA

Duration: Half-day

Brief description:
In the last two decades, the art world has embraced two bodies of technology associated with pattern recognition. One is “interactive” or “immersive” art, where the art viewer is pulled into the artwork rather than just viewing a passive, static object; the viewer thus becomes a collaborator with the artist in creating something that is modified by the viewer's own actions. The other is AI, especially generative machine learning, where the artist uses generative AI software to create an artwork, often raising the question of who should be credited with its creation: the artist, the AI, or the creator of the AI software. In this tutorial, we start at the beginning of the use of technology in art, move to interactive art, and then to AI-assisted and AI-created art.

The tutorial will begin with the history of early interactive art experiments, notably the Experiments in Art and Technology (E.A.T.) collaborations of the 1960s between Bell Labs engineers and artists such as Robert Rauschenberg and Merce Cunningham. We will subsequently cover art up to the present day, spanning visual art, music, dance, and theatre, and technologies including video, audio, biometrics, VR and AR, and machine learning.

We will illustrate the artists’ use of technologies with many video and audio examples, including some of the projects created by artists-in-residence and scientists at Bell Labs. We will conclude with an on-site, group-interactive game, Hammer and Eggs, in which the two sides of the audience compete to break an egg with a giant hammer through their collective movement.

Presenters:
Xin Zhao, University of Science and Technology Beijing
Shiyu Hu, Chinese Academy of Sciences

Duration: Half-day

Brief description:

This tutorial is centered on the Visual Turing Test (VTT), a cutting-edge evaluation method that hinges on human-machine comparison. To elucidate this concept more effectively, we have chosen a quintessential computer-vision task, Visual Object Tracking (VOT), as a case study to present fresh insights from an evaluative standpoint (a minimal sketch of a standard VOT evaluation measure follows the outline below). The structure of the tutorial is outlined as follows:

Part I. Human Dynamic Vision Ability and VOT Task Introduction
Part II. Experimental Environment
Part III. Algorithms and Traditional Machine-Machine Comparisons
Part IV. Visual Turing Test
Part V. Conclusions and Future Works
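
For readers unfamiliar with how trackers are scored in the machine-machine comparisons of Part III, the Python sketch below implements the bounding-box IoU and success-rate measures commonly used in VOT benchmarks; the (x, y, w, h) box convention and the 0.5 threshold are common defaults assumed here, not specifics of this tutorial.

    def iou(a, b):
        """Intersection-over-union of two (x, y, w, h) boxes."""
        x1, y1 = max(a[0], b[0]), max(a[1], b[1])
        x2 = min(a[0] + a[2], b[0] + b[2])
        y2 = min(a[1] + a[3], b[1] + b[3])
        inter = max(0, x2 - x1) * max(0, y2 - y1)
        union = a[2] * a[3] + b[2] * b[3] - inter
        return inter / union if union > 0 else 0.0

    def success_rate(predictions, ground_truth, threshold=0.5):
        """Fraction of frames where tracker overlap exceeds the threshold."""
        scores = [iou(p, g) for p, g in zip(predictions, ground_truth)]
        return sum(s > threshold for s in scores) / len(scores)

The same per-frame scores can be computed for human annotators, which is what makes a direct human-machine comparison of the VTT kind possible.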

Presenters:
Thanh-Nghia Truong, Tokyo University of Agriculture and Technology, Tokyo, Japan
Cuong Tuan Nguyen, Vietnamese-German University, Binh Duong, Vietnam
Nam Tuan Ly, Tokyo University of Agriculture and Technology, Tokyo, Japan
Harold Mouchère, LS2N - UMR CNRS 6004, University of Nantes, Nantes, France
Masaki Nakagawa, Tokyo University of Agriculture and Technology, Tokyo, Japan

Duration: Half-day

Brief description:
Part 1: Handwritten Mathematical Expression (HME) Recognition – Overview and Structural methods before DNNs
Part 2: The Rise of Encoder-Decoder and GNN Models – DNN methods
Part 3: Research topics related to HME recognition:
  • Recognition of multiple line HMEs
  • Automatic scoring of handwritten answers
  • HME clustering methods
  • Context analysis
  • Discussion
This tutorial draws on the following recently published survey and presents the methods and techniques used for HME recognition (a minimal encoder-decoder sketch follows the reference):

T. N. Truong, C. T. Nguyen, R. Zanibbi, H. Mouchère, M. Nakagawa, “A survey on handwritten mathematical expression recognition: The rise of encoder-decoder and GNN models,” Pattern Recognition, Vol. 153, April 2024.
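
As a rough, self-contained illustration of the encoder-decoder pattern discussed in Part 2 (not code from the survey), the sketch below encodes an expression image with a small CNN and decodes token logits with a GRU; real HME recognizers add attention mechanisms and far larger backbones, and all dimensions here are placeholders.

    import torch
    import torch.nn as nn

    class TinyHMERecognizer(nn.Module):
        def __init__(self, vocab_size=64, hidden=128):
            super().__init__()
            self.encoder = nn.Sequential(                 # image -> context vector
                nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, hidden))
            self.embed = nn.Embedding(vocab_size, hidden)
            self.decoder = nn.GRU(hidden, hidden, batch_first=True)
            self.out = nn.Linear(hidden, vocab_size)

        def forward(self, image, tokens):
            h0 = self.encoder(image).unsqueeze(0)         # context as initial state
            y, _ = self.decoder(self.embed(tokens), h0)   # teacher-forced decoding
            return self.out(y)                            # per-step token logits

    model = TinyHMERecognizer()
    logits = model(torch.randn(2, 1, 64, 256), torch.zeros(2, 10, dtype=torch.long))
    print(logits.shape)                                   # (2, 10, 64)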