Data Lakehouse: Next Generation Information System
DOI:
https://doi.org/10.56294/mw202467
Keywords:
Big Data Management, Business Intelligence, Data Warehouse, Data Lake, Data Lakehouse
Abstract
This paper introduces the Data Lakehouse Architecture, a transformative model in data architecture that seamlessly integrates the analytical strengths of traditional data warehouses with the schema flexibility inherent in data lakes. Departing from current frameworks, this comprehensive approach establishes a unified platform, overcoming limitations of conventional data management. Addressing the critical need for an integrated solution, our primary objective is to set a new standard for sophisticated data management. The distinctiveness of our proposal lies in the seamless fusion of data warehouse analytics and data lake schema flexibility, underscoring its originality. The full article delves into the research methodology, providing a comprehensive understanding of the study's framework proposal. The foundational outcomes showcase the successful implementation of our Data Lakehouse Architecture, revealing enhanced processing capabilities for structured data analysis, complex querying, and high-performance reporting. The conclusion emphasizes the paradigm shift and transformative impact on data management practices, reinforcing the significance of our innovative solution. This research not only contributes a novel technological framework but also highlights the importance of adaptability and performance in the face of evolving data landscapes.
License
Copyright (c) 2024 Mohamed Cherradi, Anass El Haddadi (Author)
This work is licensed under a Creative Commons Attribution 4.0 International License. Unless otherwise stated, associated published material is distributed under the same license.