Project description


Technologies that can robustly and accurately analyse human facial, vocal and verbal behaviour and interactions in the wild, as observed by omnipresent webcams in digital devices, would have a profound impact on both basic sciences and the industrial sector. They could make it possible to measure behavioural indicators that have so far resisted measurement because they are too subtle or fleeting for the human eye and ear. They would lead to the next generation of efficient, seamless and user-centric human-computer interaction (affective multimodal interfaces, interactive multi-party games, and online services). They would change business practice (automatic market research analysis would become possible, and recruitment would become greener as travel would be reduced drastically). They would also enable next-generation healthcare technologies (remote monitoring of conditions such as pain, anxiety and depression), to mention but a few examples.

How does SEWA work?

The overall aim of the SEWA project is to enable such technology, i.e., to capitalise on existing state-of-the-art methodologies, models and algorithms for machine analysis of facial, vocal and verbal behaviour, and then adjust and combine them to realise naturalistic human-centric human-computer interaction (HCI) and computer-mediated face-to-face interaction (FF-HCI) for data recorded by a device as cheap as a webcam and in almost arbitrary recording conditions, including semi-dark, dark and noisy rooms with dynamically changing room impulse response and distance to sensors. This extends, and contrasts considerably with, the current state of the art in technological solutions for machine analysis of facial, vocal and verbal behaviour that are used in available (commercial and other) human-centric HCI and FF-HCI applications.

The shortcomings of existing technologies for automatic analysis of human behaviour are numerous:

  • Studies to date have largely been conducted in laboratory conditions, with controlled noise levels, reverberation and illumination, calibrated cameras, often limited verbal content, and subjects who are instructed not to eat or talk on the phone while being recorded. Such conditions are very difficult to reproduce in real-world applications, and tools trained on such data usually do not generalise well to behavioural recordings made in the wild (in the unconstrained settings typical of HCI and FF-HCI scenarios).
  • Interpretation of facial and vocal behaviour depends crucially on the dynamics of the behaviour (timing, velocity, frequency, temporal inter-dependencies between gestures), which are currently not taken into account.
  • Observed behaviours may be influenced by those of an interlocutor and thus require analysis of both interactants, especially when measuring such critically important patterns as mimicry, rapport and sentiment. This is not currently taken into account: existing approaches typically analyse a single individual, and FF-HCI is not addressed as a problem of simultaneous analysis of both interacting parties.

The main aim of the SEWA project is to address these shortcomings of current HCI and FF-HCI technology and to develop novel, robust technology for machine analysis of facial, vocal and verbal behaviour in the wild, as shown by a single person or by two (or more) interactants.

Where is SEWA applied?

As a proof of concept, and with the focus on novel HCI and FF-HCI applications, SEWA technology will be applied to:

  • machine inference of sentiment/liking ratings in response to multimedia content (movie trailers, product adverts, etc.) watched by people in the wild, based on which a multimedia recommender system will be built, offering the user multimedia content similar to what she liked previously;
  • automatic estimation of sentiment, rapport and empathy shown by two people involved in unscripted computer-mediated dyadic interaction (using an online video chat service like OpenTok), based on which a Chat Roulette Social Game will be developed that tries to find people with whom one would like to chat (the underlying phenomena are the formation and dynamics of clusters of people who like to chat with each other, e.g., because they hold the same opinions on debated issues; on this basis, the formation and dynamics of social media/networks and opinion polls can be studied, and serious games providing depth of learning and participation can be built).