Credibility Rating
Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.
Rating inherited from publication venue: arXiv
This arXiv preprint demonstrates curriculum learning applied to multimodal deep learning for post-disaster analytics; it is relevant to AI safety through its focus on improving model robustness and reliability in high-stakes emergency response scenarios.
Paper Details
Metadata
Abstract
This paper explores post-disaster analytics using multimodal deep learning models trained with a curriculum learning method. Studying post-disaster analytics is important because it plays a crucial role in mitigating the impact of disasters by providing timely and accurate insights into the extent of damage and the allocation of resources. We propose a curriculum learning strategy to enhance the performance of multimodal deep learning models. Curriculum learning emulates the progressive learning sequence in human education by training deep learning models on increasingly complex data. Our primary objective is to develop a curriculum-trained multimodal deep learning model, with a particular focus on visual question answering (VQA), capable of jointly processing image and text data, in conjunction with semantic segmentation for disaster analytics using the FloodNet\footnote{https://github.com/BinaLab/FloodNet-Challenge-EARTHVISION2021} dataset. To achieve this, a U-Net model is used for semantic segmentation and image encoding, and a custom-built text classifier is used for visual question answering. Existing curriculum learning methods rely on manually defined difficulty functions. We introduce a novel curriculum learning approach termed Dynamic Task and Weight Prioritization (DATWEP), which leverages a gradient-based method to decide task difficulty automatically during curriculum learning training, thereby eliminating the need for explicit difficulty computation. Integrating DATWEP into our multimodal model improves VQA performance. Source code is available at https://github.com/fualsan/DATWEP.
Summary
This paper proposes a curriculum learning approach for post-disaster analytics using multimodal deep learning models that jointly process images and text. The authors introduce Dynamic Task and Weight Prioritization (DATWEP), a novel gradient-based curriculum learning method that automatically determines task difficulty during training without manual specification. The approach combines a U-Net for semantic segmentation and image encoding with a custom text classifier for visual question answering, and is evaluated on the FloodNet dataset for flood damage assessment.
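The paper's exact DATWEP update is not included in this preview, but the core idea of gradient-based task weighting can be illustrated with a minimal sketch: hold learnable logits per task, combine the task losses through a softmax over those logits, and take a gradient step on the logits themselves so the weighting shifts automatically as task losses evolve. The function name and the learning rate below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def dynamic_task_weights(task_losses, logits, lr=0.1):
    """One gradient step on the task-weight logits (illustrative sketch).

    Combined loss: L = sum_i softmax(logits)_i * loss_i.
    Softmax Jacobian gives dL/dlogits_k = w_k * (loss_k - L).
    Gradient descent on the logits therefore down-weights tasks whose
    loss is above the weighted average, i.e. the currently harder tasks,
    without any hand-crafted difficulty function.
    """
    w = softmax(logits)
    L = float(w @ np.asarray(task_losses))
    grad = w * (np.asarray(task_losses) - L)
    return logits - lr * grad, w
```

With equal starting logits and a harder first task (loss 2.0 vs 1.0), one step lowers the first logit relative to the second, so the harder task's weight shrinks on the next iteration.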
Cited by 1 page
| Page | Type | Quality |
|---|---|---|
| AI Capability Threshold Model | Analysis | 72.0 |
Cached Content Preview
Dynamic Task and Weight Prioritization Curriculum Learning for Multimodal Imagery
Hüseyin Fuat Alsan a (huseyinfuat.alsan@stu.khas.edu.tr), Taner Arsan a (arsan@khas.edu.tr)
a Computer Engineering Department, Kadir Has University, Istanbul, Turkey
Corresponding Author:
Hüseyin Fuat Alsan
Computer Engineering Department, Kadir Has University, Istanbul, Turkey
Tel: (+90) 0536 825 29 11
Email: huseyinfuat.alsan@stu.khas.edu.tr
###### Abstract
This paper explores post-disaster analytics using multimodal deep learning models trained with a curriculum learning method. Studying post-disaster analytics is important because it plays a crucial role in mitigating the impact of disasters by providing timely and accurate insights into the extent of damage and the allocation of resources. We propose a curriculum learning strategy to enhance the performance of multimodal deep learning models. Curriculum learning emulates the progressive learning sequence in human education by training deep learning models on increasingly complex data. Our primary objective is to develop a curriculum-trained multimodal deep learning model, with a particular focus on visual question answering (VQA), capable of jointly processing image and text data, in conjunction with semantic segmentation for disaster analytics using the FloodNet dataset. To achieve this, a U-Net model is used for semantic segmentation and image encoding, and a custom-built text classifier is used for visual question answering. Existing curriculum learning methods rely on manually defined difficulty functions. We introduce a novel curriculum learning approach termed Dynamic Task and Weight Prioritization (DATWEP), which leverages a gradient-based method to decide task difficulty automatically during curriculum learning training, thereby eliminating the need for explicit difficulty computation. Integrating DATWEP into our multimodal model improves VQA performance. Source code is available at https://github.com/fualsan/DATWEP.
###### keywords:
Deep Learning, Multimodal Deep Learning, Curriculum Learning, Semantic Segmentation, Visual Question Answering
## 1 Introduction
Multimodal deep learning uses (and relates between) different data types, including images, text, audio, time series, and even tabular data. It can be seen as a generalization method for deep learning at the data level. Relations between different types (modalities) can be constructed automatically via the pattern recognition commonly employed by deep learning models. However, training with multimodal data can be challenging and often requires efficient training algorithms. Curriculum learning is considered in this work to increase multimodal performance. Instead of randomly shuffling the data samples, curriculum learning schedules training in a meaningful order, from the data samples that are easy for deep learning models to learn to those that are hard to learn. A difficulty measurement function is required
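The easy-to-hard scheduling described above can be sketched in a few lines: rank samples by a difficulty function, then expose a growing prefix of the ranked list as training progresses (a simple linear pacing schedule). The function name, the `pacing` parameter, and the prefix-growing policy are illustrative assumptions, not the paper's specific scheduler.

```python
import random

def curriculum_order(samples, difficulty, pacing=0.5):
    """Build per-stage training pools, easiest samples first (sketch).

    samples:    the training set
    difficulty: a scoring function; lower score = easier sample
    pacing:     fraction of the dataset added to the pool per stage

    Returns a list of stages; each stage is the currently visible pool,
    shuffled internally so ordering within a stage stays stochastic.
    """
    ranked = sorted(samples, key=difficulty)  # easy -> hard
    n = len(ranked)
    step = max(1, int(n * pacing))
    schedule = []
    visible = step
    while True:
        pool = ranked[:visible]
        random.shuffle(pool)  # shuffle only within the visible pool
        schedule.append(pool)
        if visible >= n:
            break
        visible = min(n, visible + step)
    return schedule
```

Plain shuffled training is the special case `pacing=1.0`, where every stage sees the full dataset; smaller pacing values delay the hard samples, which is the behavior curriculum learning exploits.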
... (truncated, 98 KB total)