Posted on 2020-08-01, 00:00, authored by Guglielmo Menchetti
Facial expression is the natural means by which humans convey their intentions and emotions. The main challenge in detecting facial expressions lies in the ambiguities between them.
The Facial Action Coding System (FACS) helps to address the problem of ambiguities between expressions. FACS decomposes each expression into a collection of facial muscle movements, known as Facial Action Units (AUs).
In this document, we focus on single-frame Action Unit recognition, in which we aim to detect the AUs present in an image containing a face.
We will examine the current state of the art (SOTA) in facial AU detection and introduce a novel Deep Learning architecture that achieves competitive results on a popular benchmark.
In particular, we will study the impact of multi-scale feature construction using a Feature Pyramid Network, and of extracting features tied to specific regions of the face through Region of Interest (RoI) Pooling.
We will compare the obtained results with other state-of-the-art methods and investigate the effect of the model's individual components through various ablation studies.
In conclusion, we will suggest possible future improvements to the current architecture.