  • openXBOW - the Passau Open-Source Multimodal Bag-of-Words Toolkit

    openXBOW generates a bag-of-words representation from a sequence of numeric and/or textual features, e.g., acoustic LLDs, visual features and transcriptions of natural speech. The tool provides a multitude of options, e.g., different modes of vector quantisation, codebook generation, term frequency weighting and methods known from natural language processing. openXBOW is implemented in Java.

    Please call "java -jar openXBOW.jar" for information on the input format and a list of all available options.
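
    The bag-of-words idea for numeric features can be illustrated with a minimal sketch. This is not openXBOW's implementation: the function names, the random-sampling codebook and the log(1 + count) weighting are assumptions chosen for illustration, and the tool's actual options are those listed by its help output.

```python
import math
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_codebook(frames, size, seed=0):
    """Pick `size` frames at random to serve as codewords (one simple strategy)."""
    rng = random.Random(seed)
    return rng.sample(frames, size)


def bag_of_words(frames, codebook, log_weighting=True):
    """Vector-quantise each frame to its nearest codeword and count assignments."""
    counts = [0] * len(codebook)
    for frame in frames:
        nearest = min(
            range(len(codebook)),
            key=lambda i: squared_distance(frame, codebook[i]),
        )
        counts[nearest] += 1
    if log_weighting:
        # Logarithmic term-frequency weighting: log(1 + count)
        return [math.log(1 + c) for c in counts]
    return counts


# Toy sequence of 2-D feature frames and a hand-picked two-word codebook
frames = [[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 4.9], [0.05, 0.05]]
codebook = [[0.0, 0.0], [5.0, 5.0]]
print(bag_of_words(frames, codebook, log_weighting=False))  # [3, 2]
```

    The resulting histogram has one entry per codeword, so sequences of different lengths map to vectors of the same size.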

    View on GitHub

  • The SEWA facial point tracker tool

    The SEWA facial point tracker detects the user’s face in video recordings on a frame-by-frame basis and accurately tracks a set of 49 facial landmarks, the locations of which are later used as input features in various high-level emotion recognition tasks (such as the recognition of facial action units, valence, arousal, liking/disliking, and so on). Our method achieves high tracking accuracy ‘in-the-wild’ by automatically constructing person-specific models through incremental updating of the generic model. Experimental evaluation on the LFPW and Helen datasets shows that our method outperforms state-of-the-art generic face alignment strategies. A tracking experiment using the SEWA dataset also shows promising results. Our current implementation is highly efficient, being able to track 8 video streams in parallel at 50 fps on the test machine (CPU: Intel Core i7-5960X, memory: 32 GB).

    Download File
  • The SEWA AU Detector tool

    This application requires the 49 fiducial facial points extracted using the SEWA facial point tracker* as input. These are passed through several data pre-processing blocks, including normalization, alignment and dimensionality reduction. The output is the classification of each target frame in terms of the target AUs being active or non-active. This is performed using a CRF classifier [1] trained independently for the detection of AU1, AU2, AU4, AU6 and AU12.
    *Akshay Asthana, Stefanos Zafeiriou, Shiyang Cheng and Maja Pantic. Incremental Face Alignment in the Wild. In CVPR 2014.

    The input to this detector is the path to the folder that contains the .txt files generated by the tracker (follow the link: to download the tracker). Each file corresponds to the tracking results for one frame of the target sequence and contains three lines. The first line gives the face's pitch, yaw and roll (in this order), the second line gives the coordinates of the iris points, and the third line gives the coordinates (x1 y1 x2 y2 x3 y3...) of the 49 facial landmarks.
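
    The three-line layout described above could be read with a short parser such as the following sketch. It assumes that values are separated by whitespace, which the description does not state; the function name and the returned dictionary keys are hypothetical. The iris values are kept as a flat list because their number is not specified.

```python
def parse_tracker_frame(text):
    """Parse the contents of one per-frame tracker file (three lines)."""
    # Assumption: values on each line are whitespace-separated.
    lines = [line.split() for line in text.splitlines() if line.strip()]
    if len(lines) != 3:
        raise ValueError(f"expected 3 non-empty lines, found {len(lines)}")

    pose, iris, landmarks = ([float(v) for v in line] for line in lines)
    if len(pose) != 3:
        raise ValueError("first line must hold pitch, yaw and roll")
    if len(landmarks) != 2 * 49:
        raise ValueError("third line must hold 49 (x, y) pairs")

    pitch, yaw, roll = pose
    # Pair up x1 y1 x2 y2 ... into (x, y) tuples
    points = list(zip(landmarks[0::2], landmarks[1::2]))
    return {"pitch": pitch, "yaw": yaw, "roll": roll,
            "iris": iris, "landmarks": points}


# Synthetic example content, not real tracker output
sample = "1.0 2.0 3.0\n10 20 30 40\n" + " ".join(str(i) for i in range(98))
frame = parse_tracker_frame(sample)
print(len(frame["landmarks"]))  # 49
```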
    The results are stored in a .csv file. In this file, each line corresponds to one frame and gives the AU detections (0/1 for non-active/active) and prediction scores (see [1] for details) in the following order: Detection_AU1, Prediction_AU1, Detection_AU2, Prediction_AU2…
    Follow the instructions in the file ‘’ to use the detector.
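
    A results file of the kind described above could be loaded as in the sketch below. Several points are assumptions rather than documented behaviour: that values are comma-separated, that there is no header row, and that the column order continues past AU2 with AU4, AU6 and AU12. The function name and dictionary keys are hypothetical.

```python
import csv
import io

# Assumed column order: detection and prediction score for each AU in turn
AUS = ["AU1", "AU2", "AU4", "AU6", "AU12"]


def read_au_results(handle):
    """Read per-frame AU detections and scores from an open text handle."""
    frames = []
    for row in csv.reader(handle):
        if not row:
            continue  # skip blank lines
        values = [float(v) for v in row]
        if len(values) != 2 * len(AUS):
            raise ValueError(f"expected {2 * len(AUS)} columns, found {len(values)}")
        frames.append({
            au: {"active": values[2 * i] == 1.0, "score": values[2 * i + 1]}
            for i, au in enumerate(AUS)
        })
    return frames


# Synthetic two-frame example, not real detector output
example = io.StringIO("1,0.9,0,0.1,0,0.2,1,0.8,0,0.3\n0,0.2,0,0.1,1,0.7,0,0.4,1,0.6\n")
results = read_au_results(example)
print(results[0]["AU1"])  # {'active': True, 'score': 0.9}
```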

    Code released as is *for research purposes only*

    Feel free to modify/distribute but please cite the papers:

    [1] "Variable-state Latent Conditional Random Field Models for Facial Expression Analysis", R. Walecki, O. Rudovic, V. Pavlovic, M. Pantic. IMAVIS. October 2016.

    [2] "Multi-output Laplacian Dynamic Ordinal Regression for Facial Expression Recognition and Intensity Estimation", O. Rudovic, V. Pavlovic, M. Pantic. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2012). Providence, USA, pp. 2634 - 2641, June 2012.

    Download File
  • Discriminant Incoherent Component Analysis - DICA tool

    Face images convey rich information which can be perceived as a superposition of low-complexity components associated with attributes, such as facial identity, expressions and activation of facial action units (AUs). For instance, low-rank components characterizing neutral facial images are associated with identity, while sparse components capturing non-rigid deformations occurring in certain face regions reveal expressions and AU activations. The Discriminant Incoherent Component Analysis (DICA) is a novel robust component analysis method that extracts low-complexity components corresponding to facial attributes, which are mutually incoherent among different classes (e.g., identity, expression, AU activation) from training data, even in the presence of gross sparse errors.

    Download File
  • Interviewskillz - the video-chat interview skills training game

    #interviewskillz is a video-chat interview skills training game. One player takes on the role of an interviewer and the other the role of a candidate. The game provides young job seekers with a platform to develop the skills necessary to be more successful at job interviews.

    This is version 1 of the game, in which the social and emotional feedback is provided by the other player. In the subsequent version of the game, the SEWA feature detection algorithms will automatically provide insight and training to the players.

    Launch the game

  • Continuous-time Prediction of Dimensional Behavior/Affect

    This predictive framework, developed for the SEWA project, addresses continuous-time prediction of dimensional affect or behavior (e.g., valence/arousal, interest, conflict). The framework explicitly models the temporal dynamics of spontaneous affective or social behavior displays based on Linear Time-Invariant (LTI) system learning and can be used to predict future values of affect or behavior based on past observations from the same sequence.

    Matlab code can be found here.
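
    As a rough illustration of predicting future values from past observations with an LTI model, the Python sketch below fits a first-order system y[t+1] = a*y[t] + b to a sequence by least squares and rolls it forward. This is a deliberately simplified stand-in and not the SEWA framework's method; the model order, the function names and the toy data are assumptions.

```python
def fit_first_order(series):
    """Least-squares fit of y[t+1] = a * y[t] + b to a 1-D sequence."""
    xs, ys = series[:-1], series[1:]
    n = len(xs)
    if n < 2:
        raise ValueError("need at least three observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        raise ValueError("sequence is constant; slope is undefined")
    a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / var_x
    b = mean_y - a * mean_x
    return a, b


def predict(last_value, a, b, steps):
    """Roll the fitted model forward `steps` time steps from `last_value`."""
    values = []
    current = last_value
    for _ in range(steps):
        current = a * current + b
        values.append(current)
    return values


# Toy trace generated by y[t+1] = 0.5 * y[t] + 1 (e.g. an arousal rating)
observed = [0.0, 1.0, 1.5, 1.75, 1.875]
a, b = fit_first_order(observed)
future = predict(observed[-1], a, b, steps=2)
print(round(a, 6), round(b, 6), [round(v, 6) for v in future])
```

    Because the coefficients a and b do not change over time, the fitted model is time-invariant; richer LTI models would use a higher order or a multi-dimensional state.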