Big data is now used almost everywhere, both online and offline, and is not confined to computing. It is reshaping the decision-making process: analysing such data with clustering algorithms makes it possible to predict results from the knowledge extracted. Response time and processing speed are a major challenge when clustering data at this scale, and the K-means and big K-means algorithms address this problem. In this paper, the best value of K is first found using the elbow method. The K-means algorithm and the big K-means algorithm on shared memory are then each run in two modes, sequential processing and parallel processing, and compared to find which performs best at different data sizes. The analysis was performed in the RStudio environment.
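As an illustration of the elbow step described above, the following is a minimal sketch in Python rather than the R code used in the study. It runs a plain Lloyd's K-means for several values of K, records the within-cluster sum of squares (WSS), and picks the K where the curve bends most sharply. The toy data, the deterministic farthest-point initialisation, the second-difference rule for locating the elbow, and all function names are assumptions made for this example; the paper itself may simply have read the elbow off a plot.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def init_centroids(points, k):
    """Assumed initialisation: start at the first point, then repeatedly
    add the point farthest from its nearest already-chosen centroid."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    return centroids


def kmeans_wss(points, k, max_iter=100):
    """Run Lloyd's K-means and return the within-cluster sum of squares."""
    centroids = init_centroids(points, k)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        updated = [
            tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)


def elbow_k(wss):
    """Pick K at the largest second difference of the WSS curve.
    wss[i] holds the WSS for K = i + 1."""
    i = max(range(1, len(wss) - 1),
            key=lambda j: wss[j - 1] - 2 * wss[j] + wss[j + 1])
    return i + 1


# Toy data (assumed): three well-separated 3x3 grids of points.
offsets = (-0.5, 0.0, 0.5)
points = [(cx + dx, cy + dy)
          for cx, cy in [(0, 0), (10, 10), (20, 0)]
          for dx in offsets for dy in offsets]

wss_curve = [kmeans_wss(points, k) for k in range(1, 7)]
best_k = elbow_k(wss_curve)
print("WSS by K:", [round(w, 2) for w in wss_curve])
print("Elbow at K =", best_k)
```

The second-difference rule is only one way to automate what is usually a visual judgement, and it can mislead on data without a clear bend in the WSS curve. On large data sets the WSS loop over K is also the natural place to apply the sequential versus parallel comparison the study describes.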