[1]
A. Oussous, F. Benjelloun, A. Lahcen, and S. Belfkih, "Big Data technologies: A survey," Journal of King Saud University - Computer and Information Sciences, vol. 30, no. 4, pp.431-448, 2018.
DOI: 10.1016/j.jksuci.2017.06.001
[2]
M. Chen, S. Mao, and Y. Liu, "Big Data: A Survey," Mobile Networks and Applications, vol. 19, no. 2, pp.171-209, 2014.
[3]
International Data Corporation (IDC), "Data Age 2025: The Digitization of the World," IDC White Paper, 2021.
[4]
R. Cattell, "Scalable SQL and NoSQL data stores," ACM SIGMOD Record, vol. 39, no. 4, pp.12-27, 2010.
DOI: 10.1145/1978915.1978919
[5]
S. Madden, "From databases to Big Data," IEEE Internet Computing, vol. 16, no. 3, pp.4-6, 2012.
[6]
T. White, "Hadoop: The Definitive Guide," 4th ed., O'Reilly Media, 2015.
[7]
K. Shvachko, H. Kuang, S. Radia, and R. Chansler, "The Hadoop Distributed File System," in Proc. IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST), 2010, pp.1-10.
DOI: 10.1109/msst.2010.5496972
[8]
V. Mayer-Schönberger and K. Cukier, "Big Data: A Revolution That Will Transform How We Live, Work, and Think," Houghton Mifflin Harcourt, 2013.
DOI: 10.3359/oz1314047
[9]
J. Dean and S. Ghemawat, "MapReduce: Simplified Data Processing on Large Clusters," Communications of the ACM, vol. 51, no. 1, pp.107-113, 2008.
DOI: 10.1145/1327452.1327492
[10]
S. Sagiroglu and D. Sinanc, "Big Data: A review," in Proc. International Conference on Collaboration Technologies and Systems (CTS), 2013, pp.42-47.
DOI: 10.1109/cts.2013.6567202
[11]
M. Chen, S. Mao, and Y. Liu, "Big Data: A Survey," Mobile Networks and Applications, vol. 19, no. 2, pp.171-209, 2014.
[12]
S. Ghemawat, H. Gobioff, and S.-T. Leung, "The Google File System," ACM SIGOPS Operating Systems Review, vol. 37, no. 5, pp.29-43, 2003.
DOI: 10.1145/1165389.945450
[13]
J. Dean and S. Ghemawat, "MapReduce: Simplified Data Processing on Large Clusters," in Proc. OSDI, 2004.
[14]
T. White, "Hadoop: The Definitive Guide," 4th ed., O'Reilly Media, 2015.
[15]
Apache Software Foundation, "Apache Hadoop Releases," Apache Hadoop Documentation, 2021.
[16]
L. Lamport, "Time, Clocks, and the Ordering of Events in a Distributed System," Communications of the ACM, vol. 21, no. 7, pp.558-565, 1978.
DOI: 10.1145/359545.359563
[17]
M. Zaharia et al., "Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing," in Proc. NSDI, 2012.
[18]
R. Buyya et al., "Cloud computing and emerging IT platforms: Vision, hype, and reality for delivering computing as the 5th utility," Future Generation Computer Systems, vol. 25, no. 6, pp.599-616, 2009.
DOI: 10.1016/j.future.2008.12.001
[19]
H. Karau and R. Warren, "High Performance Spark," O'Reilly Media, 2017.
[20]
A. Singh and K. Reddy, "Hadoop Ecosystem: Architectural Solutions for Big Data Processing Challenges," in Big Data Processing Frameworks, vol. 1, 2024, pp.45-62.
[21]
P. Raj and G. C. Deka, "A Deep Dive into NoSQL Databases: The Use Cases and Applications," Academic Press, 2018.
[22]
R. Cattell, "Scalable SQL and NoSQL data stores," ACM SIGMOD Record, vol. 39, no. 4, pp.12-27, 2010.
DOI: 10.1145/1978915.1978919
[23]
D. Singh and C. K. Reddy, "A survey on platforms for Big Data analytics," Journal of Big Data, vol. 2, no. 1, pp.1-20, 2015.
[24]
K. Shvachko, H. Kuang, S. Radia, and R. Chansler, "The Hadoop Distributed File System," in Proc. IEEE 26th Symposium on Mass Storage Systems and Technologies, 2010, pp.1-10.
DOI: 10.1109/msst.2010.5496972
[25]
D. Borthakur, "HDFS Architecture Guide," Apache Hadoop Documentation, 2020.
[26]
V. K. Vavilapalli et al., "Apache Hadoop YARN: Yet Another Resource Negotiator," in Proc. 4th Annual Symposium on Cloud Computing, 2013.
DOI: 10.1145/2523616.2523633
[27]
A. Thusoo et al., "Hive: A Warehousing Solution Over a Map-Reduce Framework," Proceedings of the VLDB Endowment, vol. 2, no. 2, pp.1626-1629, 2009.
DOI: 10.14778/1687553.1687609
[28]
J. Dean and S. Ghemawat, "MapReduce: Simplified Data Processing on Large Clusters," Communications of the ACM, vol. 51, no. 1, pp.107-113, 2008.
DOI: 10.1145/1327452.1327492
[29]
T. White, "Hadoop: The Definitive Guide," 4th ed., O'Reilly Media, 2015.
[30]
A. Thusoo et al., "Hive - A Petabyte Scale Data Warehouse Using Hadoop," in Proc. IEEE 26th International Conference on Data Engineering (ICDE), 2010.
DOI: 10.1109/icde.2010.5447738
[31]
Y. Huai et al., "Major Technical Advancements in Apache Hive," in Proc. SIGMOD, 2014.
[32]
J. Zhang, L. Wang, and H. Chen, "Performance Optimization of Hive Queries in Large-Scale Data Warehousing," IEEE Transactions on Big Data, vol. 8, no. 3, pp.345-362, Sep. 2023.
[33]
M. Chen and K. Liu, "Comparative Analysis of Query Processing Techniques in Distributed Data Systems," in Proceedings of the IEEE International Conference on Big Data, San Francisco, CA, USA, Dec. 2022, pp.1205-1215.
[34]
A. Gates et al., "Building a High-Level Dataflow System on top of Map-Reduce: The Pig Experience," Proceedings of the VLDB Endowment, vol. 2, no. 2, pp.1414-1425, 2009.
DOI: 10.14778/1687553.1687568
[35]
L. George, "HBase: The Definitive Guide," O'Reilly Media, 2011.
[36]
M. Zaharia et al., "Apache Spark: A Unified Engine for Big Data Processing," Communications of the ACM, vol. 59, no. 11, pp.56-65, 2016.
[37]
H. Karau and R. Warren, "High Performance Spark," O'Reilly Media, 2017.
[38]
P. Hunt et al., "ZooKeeper: Wait-free Coordination for Internet-scale Systems," in Proc. USENIX Annual Technical Conference, 2010.
[39]
M. Islam et al., "Oozie: Towards a Scalable Workflow Management System for Hadoop," in Proc. SIGMOD Workshop on Scalable Workflow Execution Engines and Technologies, 2012.
DOI: 10.1145/2443416.2443420
[40]
K. Shvachko et al., "The Hadoop Distributed File System," in Proc. MSST, 2010, pp.1-10.
[41]
T. White, "Hadoop: The Definitive Guide," 4th ed., O'Reilly Media, 2015.
[42]
D. Borthakur, "HDFS Architecture Guide," Technical Report, Apache Software Foundation, 2021.
[43]
A. Pavlo et al., "A Comparison of Approaches to Large-Scale Data Analysis," in Proc. SIGMOD, 2009, pp.165-178.
[44]
S. Huang et al., "The HiBench Benchmark Suite: Characterization of the MapReduce-Based Data Analysis," in Proc. ICDEW, 2010, pp.41-51.
DOI: 10.1109/icdew.2010.5452747
[45]
J. Dean and S. Ghemawat, "MapReduce: Simplified Data Processing on Large Clusters," Communications of the ACM, vol. 51, no. 1, pp.107-113, 2008.
DOI: 10.1145/1327452.1327492
[46]
V. K. Vavilapalli et al., "Apache Hadoop YARN: Yet Another Resource Negotiator," in Proc. SoCC, 2013.
[47]
M. Zaharia et al., "The Datacenter Needs an Operating System," in Proc. HotCloud, 2011.
[48]
B. Hindman et al., "Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center," in Proc. NSDI, 2011.
[49]
R. Cattell, "Scalable SQL and NoSQL Data Stores," ACM SIGMOD Record, vol. 39, no. 4, pp.12-27, 2010.
DOI: 10.1145/1978915.1978919
[50]
A. Thusoo et al., "Data Warehousing and Analytics Infrastructure at Facebook," in Proc. SIGMOD, 2010.
[51]
O. O'Malley et al., "Hadoop Security Design," Technical Report, Yahoo, 2009.