Machine Learning for Database Management and Query Optimization
DOI: https://doi.org/10.61166/elm.v2i1.66
Keywords: Query Optimization, Machine learning, Artificial Intelligence, Database management, Database management methods
Abstract
As the volume of data generated by modern systems grows, traditional database management methods are becoming increasingly inadequate for effective data processing. To address this problem, machine learning approaches have shown promise in enhancing database administration functions such as query optimization, workload management, indexing, and data quality assurance. In this comprehensive literature review, we investigate the machine learning algorithms used for query optimization and database management. Our review shows that approaches such as deep learning (DL), reinforcement learning (RL), supervised learning, unsupervised learning, and natural language processing (NLP), among others, may be employed for query analysis, execution, and assessment. Introducing machine learning techniques into database management systems makes it feasible to improve query performance and adapt to changing conditions.
License
Copyright (c) 2024 M.M.F. Fahima, A.H. Sahna Sreen, S.L. Fathima Ruksana, D.T.E. Weihena, M.H.M. Majid
This work is licensed under a Creative Commons Attribution 4.0 International License.