In recent years, Big Data has been a major trend in information technology. As organizations grow, so does the data they own, in some cases reaching the scale of terabytes to petabytes. When an organization wants to analyze Big Data, the analysis takes a long time due to limited CPU and memory; the distributed-computing paradigm was developed to overcome this. Many tools exist for processing Big Data, such as Apache Hadoop and Apache Spark, but the tool analyzed in this work is Apache Hadoop. Big Data adoption in Indonesia is projected to reach 20% within the next two to three years, so more and more companies are planning to adopt Big Data to analyze their data. Given the increasing use of Big Data and Apache Hadoop, we conducted an exploratory analysis of data correlation on Apache Hadoop. In addition, we tested Apache Hadoop with varying numbers of nodes, numbers of mappers and reducers, and different block sizes. The exploratory analysis of data correlation on Apache Hadoop was carried out by building four data-analysis applications: two that compute correlation values on Yahoo! Messenger data and two that build classification trees. The test results show that the R application is more suitable for smaller data sizes, while Hadoop is more suitable for large data sizes; for large data, the R application uses a higher percentage of CPU and memory than Hadoop. The combination of mappers and reducers that gives the most optimal execution time is a number of mappers around 2/3 of the total CPU cores and a number of reducers around 1/3 of the total CPU cores.
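The 2/3 : 1/3 tuning heuristic reported above can be sketched as a small helper. This is an illustrative sketch, not code from the paper; the function name `suggest_slots` and the rounding choices are assumptions.

```python
def suggest_slots(total_cores: int) -> tuple[int, int]:
    """Split available CPU cores into mapper and reducer counts
    using the 2/3 : 1/3 ratio reported in the paper.

    Hypothetical helper: the paper states the ratio, not this code.
    """
    # Mappers take roughly two thirds of the cores (at least one).
    mappers = max(1, round(total_cores * 2 / 3))
    # Reducers get the remaining cores, i.e. roughly one third.
    reducers = max(1, total_cores - mappers)
    return mappers, reducers

if __name__ == "__main__":
    # e.g. a 12-core cluster node -> 8 mappers, 4 reducers
    print(suggest_slots(12))
```

In Hadoop, the number of reduce tasks is set explicitly (e.g. via `mapreduce.job.reduces`), while the number of map tasks is driven mainly by the input split/block size, so in practice the block-size experiments in this study also influence the mapper count.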