Machine Learning Models for Scalable Metadata Management in Data Lakes
  • Author(s): Shishir Tewari
  • Paper ID: 1707468
  • Page: 425-441
  • Published Date: 31-03-2023
  • Published In: Iconic Research And Engineering Journals
  • Publisher: IRE Journals
  • e-ISSN: 2456-8880
  • Volume/Issue: Volume 6 Issue 9 March-2023
Abstract

The rapid growth of big data has made data lakes the scalable storage choice for massive volumes of processed and raw data. Effective metadata management in data lakes nevertheless remains a major challenge because of metadata inconsistencies, retrieval difficulties, and scalability problems. Manual tagging and rule-based approaches struggle to keep pace with rising data volumes, which creates governance problems and hinders data discovery. Machine learning offers an effective solution to these challenges by automating metadata extraction, classification, and retrieval. Several machine learning models can improve scalable metadata management in data lake environments. Supervised learning can improve the effectiveness of metadata tagging, whereas unsupervised learning is valuable for detecting patterns in metadata. Deep learning models that apply natural language processing (NLP) techniques improve semantic metadata processing for data classification and retrieval. Reinforcement learning approaches refine search efficiency by continuously observing user interactions. Machine learning for metadata management is evaluated through a case study that compares conventional systems with learning-based systems. The evaluation shows improved metadata accuracy, faster retrieval, and better scalability. This research shows how organizations can use artificial intelligence to build smarter metadata systems that improve data lake governance, accessibility, and decision-making.

Keywords

Scalable Metadata Management, Machine Learning for Data Lakes, Automated Metadata Tagging, Big Data Governance, Metadata Optimization in Large-Scale Systems

Citations

IRE Journals:
Shishir Tewari, "Machine Learning Models for Scalable Metadata Management in Data Lakes," Iconic Research And Engineering Journals, Volume 6, Issue 9, 2023, Page 425-441

IEEE:
Shishir Tewari, "Machine Learning Models for Scalable Metadata Management in Data Lakes," Iconic Research And Engineering Journals, 6(9)