Comparative Analysis of Persistence Storage Levels in Spark with Case Study
  • Author(s): Thet Hsu Aung; Aye Myat Myat Paing
  • Paper ID: 1705959
  • Page: 304-313
  • Published Date: 20-06-2024
  • Published In: Iconic Research And Engineering Journals
  • Publisher: IRE Journals
  • e-ISSN: 2456-8880
  • Volume/Issue: Volume 7 Issue 12 June-2024
Abstract

This study compares training times for Long Short-Term Memory (LSTM) networks on Apache Spark under three persistence storage levels: Disk-Only, Memory-Disk, and Memory-Only. The comparison is run with and without a proposed sampling algorithm designed to address imbalanced datasets, using the Credit Card Fraud Detection dataset at varying dataset sizes. The results indicate that the Memory-Only storage level achieves the shortest training times. In this case study, dataset size influences the choice of storage level: Memory-Only performs best for this dataset, and Memory-Disk is the second-best option when memory becomes insufficient as the dataset grows. The choice of storage level therefore affects performance, especially memory usage and computation speed. Furthermore, applying the sampling algorithm significantly improves model performance metrics, including precision, recall, and F1-score, particularly on imbalanced data. These findings provide insights for improving LSTM training on large-scale imbalanced datasets and highlight the importance of selecting appropriate storage configurations and preprocessing techniques in big data environments.
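
The selection rule suggested by the abstract (prefer Memory-Only, fall back to Memory-Disk when the dataset no longer fits in memory) can be sketched roughly as follows. This is only an illustration, not the authors' method: the function name `choose_storage_level`, its two parameters, and the simple size comparison are assumptions introduced here. The returned strings are the names of the corresponding Spark storage levels (`MEMORY_ONLY`, `MEMORY_AND_DISK`).

```python
def choose_storage_level(dataset_bytes: int, available_memory_bytes: int) -> str:
    """Return the name of a Spark storage level for a dataset of the given size.

    Hypothetical helper: the threshold (dataset size versus available memory)
    is an assumption for illustration, not a rule taken from the paper.
    """
    if dataset_bytes < 0 or available_memory_bytes < 0:
        raise ValueError("sizes must be non-negative")
    # Dataset fits in memory: the abstract reports Memory-Only as fastest.
    if dataset_bytes <= available_memory_bytes:
        return "MEMORY_ONLY"
    # Dataset exceeds memory: Memory-Disk is reported as the second-best
    # option, since partitions that do not fit can spill to disk.
    return "MEMORY_AND_DISK"


if __name__ == "__main__":
    # Example sizes are made up for demonstration.
    print(choose_storage_level(2 * 1024**3, 8 * 1024**3))
    print(choose_storage_level(16 * 1024**3, 8 * 1024**3))
```

With PySpark installed, the returned name could be resolved with `getattr(StorageLevel, name)` (after `from pyspark import StorageLevel`) and passed to `DataFrame.persist`. In practice the comparison would also need to account for memory used by execution and by the model itself, which this sketch ignores.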

Keywords

Long Short-Term Memory, Disk-Only, Memory-Only, Memory-Disk

Citations

IRE Journals:
Thet Hsu Aung, Aye Myat Myat Paing "Comparative Analysis of Persistence Storage Levels in Spark with Case Study" Iconic Research And Engineering Journals Volume 7 Issue 12 2024 Page 304-313

IEEE:
Thet Hsu Aung, Aye Myat Myat Paing "Comparative Analysis of Persistence Storage Levels in Spark with Case Study" Iconic Research And Engineering Journals, 7(12)