• The SEWA Database

    The SEWA Database is released at the following link: for research purposes. The SEWA database includes annotations of the recordings in terms of facial landmarks, facial action unit (FAU) intensities, various vocalisations, verbal cues, mirroring, rapport, continuously valued valence, arousal, and liking, and prototypic examples (templates) of (dis)liking and sentiment. The data has been annotated in an iterative fashion, starting with a sufficient number of examples annotated in a semi-automated manner and used to train various feature extraction algorithms developed in SEWA, and ending with a large database of annotated facial behaviour recorded in the wild.


    AFEW-VA is a new dataset collected in the wild, composed of 600 challenging video clips extracted from feature films, along with highly accurate per-frame annotations of valence and arousal. Added to these are per-frame annotations of 68 facial landmarks. The dataset is publicly available and is released along with baseline and state-of-the-art experiments, as well as a thorough comparison of features demonstrating the need for such in-the-wild data.


    The Conflict Escalation Resolution (CONFER) Database is a collection of excerpts from audio-visual recordings of televised political debates where conflicts naturally arise, and as such, it is suitable for the investigation of conflict behaviour as well as other social attitudes and behaviours. The database contains 142 min of naturalistic, 'in-the-wild' conversations and is the first of its kind to have been annotated in terms of continuous (real-valued) conflict intensity on a frame-by-frame basis. The release of the CONFER Database is accompanied by the first systematic study on continuous estimation of conflict intensity, in which various audio and visual features and classifiers are examined.


  • openXBOW - the Passau Open-Source Multimodal Bag-of-Words Toolkit

    A toolkit called ‘openXBOW - the Passau Open-Source Multimodal Bag-of-Words Toolkit’, implemented in Java, has now been made publicly available at the following link: It generates a bag-of-words representation from a sequence of numeric and/or textual features, e.g., acoustic LLDs, visual features, and transcriptions of natural speech. The tool provides a multitude of options, e.g., different modes of vector quantisation, codebook generation, term frequency weighting, and methods known from natural language processing.
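
    The bag-of-words idea for numeric features can be illustrated with a minimal Python sketch. It is not openXBOW's code or interface: the function names, the random-sampling codebook, and the `log(1 + count)` weighting are assumptions chosen for illustration only.

```python
import math
import random


def build_codebook(frames, size, seed=0):
    """Pick `size` frames at random to serve as codewords (hypothetical helper)."""
    rng = random.Random(seed)
    return rng.sample(frames, size)


def nearest_codeword(frame, codebook):
    """Return the index of the codeword closest to `frame` (squared Euclidean distance)."""
    best_index, best_dist = 0, float("inf")
    for index, word in enumerate(codebook):
        dist = sum((a - b) ** 2 for a, b in zip(frame, word))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def bag_of_words(frames, codebook, log_weighting=True):
    """Quantise every frame and count how often each codeword is assigned."""
    counts = [0] * len(codebook)
    for frame in frames:
        counts[nearest_codeword(frame, codebook)] += 1
    if log_weighting:
        # Assumed weighting: log(1 + count) compresses large counts.
        return [math.log(1 + c) for c in counts]
    return counts


# Toy input: three 2-dimensional feature frames and a fixed two-word codebook.
frames = [[0.1, 0.2], [9.0, 9.5], [0.3, -0.1]]
codebook = [[0.0, 0.0], [10.0, 10.0]]
histogram = bag_of_words(frames, codebook, log_weighting=False)
```

    Each frame is replaced by the index of its nearest codeword, so a sequence of any length becomes a fixed-length histogram that a standard classifier or regressor can consume.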

  • The SEWA facial point tracker tool

    The SEWA facial point tracker detects the user’s face in video recordings on a frame-by-frame basis and accurately tracks a set of 49 facial landmarks, whose locations are later used as input features in various high-level emotion recognition tasks (such as the recognition of facial action units, valence, arousal, liking / disliking, and so on). Our method achieves high tracking accuracy ‘in-the-wild’ by automatically constructing person-specific models through incremental updating of the generic model. Experimental evaluation on the LFPW and Helen datasets shows that our method outperforms state-of-the-art generic face alignment strategies. A tracking experiment on the SEWA dataset also shows promising results. Our current implementation is highly efficient, being able to track 8 video streams in parallel at 50 fps on our test machine (CPU: Intel Core i7-5960X, memory: 32 GB).

    Download File
  • The SEWA AU Detector tool

    The code for ‘The SEWA AU Detector tool’ has been released on the official SEWA website. This application requires as input the 49 fiducial facial points extracted using the SEWA facial point tracker. These are passed through several data pre-processing blocks, including normalization, alignment, and dimensionality reduction. The output is a classification of each target frame, indicating whether each target AU is active or non-active.

    Download File
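
    The pre-processing blocks of the AU detector are not detailed here. As a rough illustration of what a normalization step on tracked landmarks can look like, the Python sketch below removes translation and scale from a set of 2-D points. The function name and the choice of centroid and RMS scaling are assumptions, not the tool's actual implementation.

```python
import math


def normalise_landmarks(points):
    """Centre 2-D landmarks on their centroid and scale them to unit RMS distance."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    centred = [(x - cx, y - cy) for x, y in points]
    rms = math.sqrt(sum(x * x + y * y for x, y in centred) / n)
    if rms == 0:
        return centred  # degenerate case: all points coincide
    return [(x / rms, y / rms) for x, y in centred]


# Toy input: four points standing in for tracked landmarks (the tracker outputs 49).
shape = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
normalised = normalise_landmarks(shape)
```

    After this step, the same face shape seen at a different image position or size maps to the same coordinates, which is the usual motivation for normalizing before alignment and dimensionality reduction.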
  • Discriminant Incoherent Component Analysis - DICA tool

    Face images convey rich information which can be perceived as a superposition of low-complexity components associated with attributes such as facial identity, expressions, and activation of facial action units (AUs). For instance, low-rank components characterizing neutral facial images are associated with identity, while sparse components capturing non-rigid deformations occurring in certain face regions reveal expressions and AU activations. Discriminant Incoherent Component Analysis (DICA) is a novel robust component analysis method that extracts, from training data, low-complexity components corresponding to facial attributes. These components are mutually incoherent among different classes (e.g., identity, expression, AU activation), and the method works even in the presence of gross sparse errors.

    Download File
  • Interviewskillz - the video-chat interview skills training game

    #interviewskillz is a video-chat interview skills training game. One player takes on the role of an interviewer and the other the role of a candidate. The game provides young job seekers with a platform to develop the skills necessary to be more successful at job interviews.

    This is version 1 of the game, in which the social and emotional feedback is provided by the other player. In the subsequent version of the game, the SEWA feature detection algorithms will automatically provide insight and training to the players.

    Launch the game

  • Continuous-time Prediction of Dimensional Behavior/Affect

    This predictive framework, developed for the SEWA project, addresses continuous-time prediction of dimensional affect or behavior (e.g., valence/arousal, interest, conflict). The framework explicitly models the temporal dynamics of spontaneous affective or social behavior displays based on Linear Time-Invariant (LTI) system learning, and can be used to predict future values of affect or behavior from past observations of the same sequence.

    Matlab code can be found here.
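
    To illustrate the general idea of predicting future values from past observations with an LTI model, the Python sketch below fits the simplest possible case, a scalar first-order recurrence x[t+1] = a * x[t] + b, by least squares. This is an assumed toy model with hypothetical function names; it is not the SEWA framework or a port of its Matlab code.

```python
def fit_first_order(series):
    """Least-squares fit of x[t+1] = a * x[t] + b; needs at least two observations."""
    xs, ys = series[:-1], series[1:]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        return 0.0, mean_y  # constant input: fall back to predicting the mean
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = cov_xy / var_x
    b = mean_y - a * mean_x
    return a, b


def predict(last_value, a, b, steps):
    """Roll the fitted recurrence forward `steps` times from the last observation."""
    values = []
    current = last_value
    for _ in range(steps):
        current = a * current + b
        values.append(current)
    return values


# Toy valence trace that follows x[t+1] = 0.5 * x[t] + 1 exactly.
observed = [0.0, 1.0, 1.5, 1.75, 1.875]
a, b = fit_first_order(observed)
future = predict(observed[-1], a, b, steps=2)
```

    Because the coefficients are fixed over time, the same recurrence that explains the observed part of the sequence is reused to extrapolate beyond it.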