Master thesis »Hierarchy-aware Classification Loss for Less Severe Errors« in Ilmenau at Fraunhofer-Gesellschaft

Fraunhofer-Gesellschaft

The Fraunhofer-Gesellschaft (www.fraunhofer.com) currently operates 76 institutes and research institutions throughout Germany and is the world’s leading applied research organization. Around 30 000 employees work with an annual research budget of 2.9 billion euros.

The Fraunhofer Institute for Digital Media Technology IDMT is part of the Fraunhofer-Gesellschaft. Headquartered in Ilmenau, Germany, the institute is internationally recognized for its expertise in applied electroacoustics and audio engineering, AI-based signal analysis and machine learning, and data privacy and security.

At the headquarters, on the campus of “Technische Universität Ilmenau”, researchers work on technologies for robust, trustworthy AI-based analysis and classification of audio and video data. These are used, among other things, to monitor industrial production processes, but also in traffic monitoring or in the media context, for example for automatic metadata extraction and audio manipulation detection. Another focus is the development of algorithms for the areas of virtual product development, intelligent actuator-sensor systems and audio for the automotive sector.

There are currently around 70 employees working at Fraunhofer IDMT in Ilmenau.

What you will do

In recent years, deep learning has become more and more powerful for classification tasks in many domains, replacing traditional statistical methods as more data becomes available. In music information retrieval (MIR), these tasks include, for instance, instrument classification and genre detection, where deep learning methods such as Convolutional Neural Networks achieve state-of-the-art accuracy by a wide margin.

However, evaluation metrics such as accuracy do not necessarily tell the whole story. It has been shown that while these systems achieve high accuracy scores, their errors can be particularly nonsensical [1]. In fields such as Computer Vision, such errors can be catastrophic (imagine a self-driving car mistaking a pedestrian for a road marking), and thus some research has aimed to minimize the severity of errors by means of an altered loss function or a hierarchical treatment of class labels [2].

We therefore propose to build upon an existing model for either instrument detection or genre classification, and investigate ways to measure and reduce its error severity for its classification task.

Specifically, in this Master's Thesis, the following objectives should be accomplished:
(1) A literature review of existing methods for error severity reduction, and existing methods for error severity measurement, drawn especially from Computer Vision.
(2) A literature review of hierarchical taxonomies for musical instruments or genres, either from existing work (e.g., [3]), from music theoretic principles, or using an unsupervised approach (e.g., clustering).
(3) The re-implementation of a simple baseline model for the chosen task (genre classification or instrument detection).
(4) The implementation of a suitable metric for measuring error severity (from (1)).
(5) The implementation and evaluation of at least two strategies for minimizing error severity using the baseline model from (3) and the evaluation metric from (4), compared against the model's baseline performance. At least one method should involve an implemented hierarchical classification strategy.

Finally, the student should write a final thesis document.
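
To make objectives (4) and (5) more concrete, here is a minimal sketch of one possible severity metric and one possible hierarchy-aware training target. It is only an illustration under stated assumptions: the instrument taxonomy is a hypothetical toy example, severity is taken to be the height of the lowest common ancestor of the true and predicted class, and the soft labels are loosely in the spirit of [2] rather than a reproduction of it. The actual metric and strategies are to be chosen from the literature review.

```python
import math

# Hypothetical toy taxonomy (child -> parent); not taken from any dataset.
PARENT = {
    "violin": "strings",
    "cello": "strings",
    "trumpet": "brass",
    "trombone": "brass",
    "strings": "instrument",
    "brass": "instrument",
}
LEAVES = ["violin", "cello", "trumpet", "trombone"]


def ancestors(node):
    """Return the path from node up to the root, node included."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def lca_height(a, b):
    """Height of the lowest common ancestor of leaves a and b.

    Assumes all leaves sit at the same depth, so the height equals the
    number of edges from a up to the first ancestor shared with b.
    """
    on_path_b = set(ancestors(b))
    for steps, node in enumerate(ancestors(a)):
        if node in on_path_b:
            return steps
    raise ValueError("nodes do not share a root")


def mean_mistake_severity(y_true, y_pred):
    """Average LCA height over the misclassified samples only."""
    heights = [lca_height(t, p) for t, p in zip(y_true, y_pred) if t != p]
    return sum(heights) / len(heights) if heights else 0.0


def soft_labels(target, beta=2.0):
    """Target distribution over LEAVES that decays with hierarchical distance.

    beta is an assumed hyperparameter: large values approach one-hot
    labels, small values spread mass over distant classes as well.
    """
    weights = [math.exp(-beta * lca_height(target, c)) for c in LEAVES]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    y_true = ["violin", "violin", "cello"]
    y_pred = ["cello", "trumpet", "cello"]
    print(mean_mistake_severity(y_true, y_pred))
    print(soft_labels("violin"))
```

With such a metric, two models with identical accuracy can be told apart: confusing a violin with a cello counts as less severe than confusing it with a trumpet. The soft labels could replace one-hot targets in a standard cross-entropy loss as one candidate strategy for objective (5).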

References:
[1] Jeanneret, G., Pérez, J. C., & Arbeláez, P. (2021). A Hierarchical Assessment of Adversarial Severity. IEEE/CVF International Conference on Computer Vision Workshops, 61–70.
[2] Bertinetto, L., Mueller, R., Tertikas, K., Samangooei, S., & Lord, N. A. (2019). Making Better Mistakes: Leveraging Class Hierarchies with Deep Networks. IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 12503–12512. https://doi.org/10.48550/arxiv.1912.09393
[3] Garcia, H. F., Aguilar, A., Manilow, E., & Pardo, B. (2021). Leveraging Hierarchical Structures for Few-Shot Musical Instrument Recognition. https://doi.org/10.48550/arxiv.2107.07029
[4] Bogdanov, D., Won, M., Tovstogan, P., Porter, A., & Serra, X. (2019). The MTG-Jamendo Dataset for Automatic Music Tagging. Machine Learning for Music Discovery Workshop, International Conference on Machine Learning (ICML 2019).

What you bring to the table

Prerequisites for this topic are knowledge of machine learning and deep learning, as well as a passion for (and some knowledge of) music.

What you can expect

exciting market-related topics with complex issues to be solved – you can be actively involved in shaping the future
challenges at a high level – on top we offer you excellent opportun