Longterm Wiki

Authentication systems

paper

Authors

Hüseyin Fuat Alsan · Taner Arsan

Credibility Rating

3/5
Good (3)

Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.

Rating inherited from publication venue: arXiv

This arXiv preprint demonstrates curriculum learning applied to multimodal deep learning for post-disaster analytics. It is relevant to AI safety through its focus on improving model robustness and reliability in high-stakes emergency response scenarios.

Paper Details

Citations
0
1 influential
Year
2023

Metadata

arXiv preprint · primary source

Abstract

This paper explores post-disaster analytics using multimodal deep learning models trained with a curriculum learning method. Studying post-disaster analytics is important as it plays a crucial role in mitigating the impact of disasters by providing timely and accurate insights into the extent of damage and the allocation of resources. We propose a curriculum learning strategy to enhance the performance of multimodal deep learning models. Curriculum learning emulates the progressive learning sequence in human education by training deep learning models on increasingly complex data. Our primary objective is to develop a curriculum-trained multimodal deep learning model, with a particular focus on visual question answering (VQA) capable of jointly processing image and text data, in conjunction with semantic segmentation for disaster analytics using the FloodNet dataset (https://github.com/BinaLab/FloodNet-Challenge-EARTHVISION2021). To achieve this, a U-Net model is used for semantic segmentation and image encoding. A custom-built text classifier is used for visual question answering. Existing curriculum learning methods rely on manually defined difficulty functions. We introduce a novel curriculum learning approach termed Dynamic Task and Weight Prioritization (DATWEP), which leverages a gradient-based method to automatically decide task difficulty during curriculum learning training, thereby eliminating the need for explicit difficulty computation. The integration of DATWEP into our multimodal model shows improvement on VQA performance. Source code is available at https://github.com/fualsan/DATWEP.

Summary

This paper proposes a curriculum learning approach for post-disaster analytics using multimodal deep learning models that jointly process images and text. The authors introduce Dynamic Task and Weight Prioritization (DATWEP), a novel gradient-based curriculum learning method that automatically determines task difficulty during training without manual specification. The approach combines a U-Net for semantic segmentation and image encoding with a custom text classifier for visual question answering, and is evaluated on the FloodNet dataset for flood damage assessment.
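The core mechanism, dynamically reweighting task losses via gradients rather than a hand-crafted difficulty function, can be illustrated with a minimal sketch. This is a hypothetical toy setup, not the authors' exact DATWEP formulation: two task losses (segmentation and VQA) are combined with softmax weights over learnable logits, and gradient ascent on those logits shifts weight toward the currently harder task.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Learnable logits for two tasks: [segmentation, VQA].
# Hypothetical setup for illustration, not the paper's exact method.
logits = np.zeros(2)
lr = 0.1

def weighted_loss_and_grad(task_losses, logits):
    w = softmax(logits)
    total = float(w @ task_losses)
    # Analytic gradient of the weighted total w.r.t. each logit:
    # d(total)/d(logit_j) = w_j * (L_j - total)
    grad = w * (task_losses - total)
    return total, grad, w

# Example: the VQA loss (index 1) is currently higher than segmentation.
losses = np.array([0.4, 1.2])
for _ in range(20):
    total, grad, w = weighted_loss_and_grad(losses, logits)
    # Gradient *ascent* on the logits increases the weight of the
    # above-average-loss (harder) task, emulating dynamic prioritization.
    logits += lr * grad

w = softmax(logits)
print(w)  # weight on the harder (VQA) task ends up above 0.5
```

In a real training loop the task losses would change each step as the model learns, so the weights would shift back and forth as tasks become easier or harder.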

Cited by 1 page

Page | Type | Quality
AI Capability Threshold Model | Analysis | 72.0

Cached Content Preview

HTTP 200 · Fetched Mar 20, 2026 · 98 KB
Dynamic Task and Weight Prioritization Curriculum Learning for Multimodal Imagery

Hüseyin Fuat Alsan a (huseyinfuat.alsan@stu.khas.edu.tr), Taner Arsan a (arsan@khas.edu.tr)

a Computer Engineering Department, Kadir Has University, Istanbul, Turkey

Corresponding Author:

Hüseyin Fuat Alsan

Computer Engineering Department, Kadir Has University, Istanbul, Turkey

Tel: (+90) 0536 825 29 11

Email: huseyinfuat.alsan@stu.khas.edu.tr

###### Abstract

This paper explores post-disaster analytics using multimodal deep learning models trained with a curriculum learning method. Studying post-disaster analytics is important as it plays a crucial role in mitigating the impact of disasters by providing timely and accurate insights into the extent of damage and the allocation of resources. We propose a curriculum learning strategy to enhance the performance of multimodal deep learning models. Curriculum learning emulates the progressive learning sequence in human education by training deep learning models on increasingly complex data. Our primary objective is to develop a curriculum-trained multimodal deep learning model, with a particular focus on visual question answering (VQA) capable of jointly processing image and text data, in conjunction with semantic segmentation for disaster analytics using the FloodNet dataset. To achieve this, a U-Net model is used for semantic segmentation and image encoding. A custom-built text classifier is used for visual question answering. Existing curriculum learning methods rely on manually defined difficulty functions. We introduce a novel curriculum learning approach termed Dynamic Task and Weight Prioritization (DATWEP), which leverages a gradient-based method to automatically decide task difficulty during curriculum learning training, thereby eliminating the need for explicit difficulty computation. The integration of DATWEP into our multimodal model shows improvement on VQA performance. Source code is available at https://github.com/fualsan/DATWEP.

###### keywords:

Deep Learning, Multimodal Deep Learning, Curriculum Learning, Semantic Segmentation, Visual Question Answering

## 1 Introduction

Multimodal deep learning uses, and relates between, different data types, which can include images, text, audio, time series, and even tabular data. It can be seen as a generalization method for deep learning at the data level. Relations between different types (modalities) can be automatically constructed via the pattern recognition commonly employed by deep learning models. However, training with multimodal data can be challenging and often requires efficient training algorithms. Curriculum learning is considered in this work to increase multimodal performance. Instead of randomly shuffling the data samples, curriculum learning schedules training in a meaningful order, from the data samples that are easy for deep learning models to learn to the data samples that are hard to learn. A difficulty measurement function is required
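The scheduling idea described above can be sketched in a few lines. This is a toy illustration with hand-assigned difficulty scores (in practice the score would come from a difficulty measurement function, which DATWEP aims to make unnecessary); the sample names are hypothetical.

```python
import random

# Toy samples with per-sample difficulty scores (assigned by hand here;
# a real curriculum would compute these with a difficulty function).
samples = [
    {"id": "flooded_road", "difficulty": 0.9},
    {"id": "clear_field",  "difficulty": 0.1},
    {"id": "damaged_roof", "difficulty": 0.6},
]

def curriculum_order(samples):
    """Schedule training from easiest to hardest sample."""
    return sorted(samples, key=lambda s: s["difficulty"])

def random_order(samples, seed=0):
    """Baseline: the usual random shuffling of training data."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    return shuffled

order = [s["id"] for s in curriculum_order(samples)]
print(order)  # ['clear_field', 'damaged_roof', 'flooded_road']
```

A full curriculum would typically also grow the training pool over epochs (easy subset first, then progressively adding harder samples) rather than just sorting a single pass.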

... (truncated, 98 KB total)