Final project

The project uses the Hadoop and R installations on an Oracle Linux 7.9 VM to run Hadoop’s wordcount routine and plot its output as a word cloud.

Requirements

Hadoop: using the movie name column from the movie ratings file

| % | Criteria |
|---|---|
| 5 | Use the movie name column to generate a `JSON` file |
| 5 | Save the names in a `TXT` file as well |
| 20 | Save the data to MySQL using Sqoop, then import it back into Hadoop |
| 20 | Collect 1,500 documents (tweets) using Flume |
| 25 | Run Hadoop’s wordcount routine on the 4 files (`TXT`, `JSON`, imported database and tweets) |
| -20 | Penalty for collecting the tweets using Windows |
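
The first two criteria can be sketched as follows. This is a minimal illustration, assuming the movie ratings file is a CSV with a header row. The function name `export_movie_names` and the file and column names in the usage note are hypothetical placeholders, and writing the `JSON` file as a single array of objects is also an assumption, since the assignment does not prescribe a layout.

```python
import csv
import json


def export_movie_names(csv_path, column, json_path, txt_path):
    """Write the values of one CSV column to a JSON file and a TXT file.

    Hypothetical helper: the CSV is assumed to have a header row
    that contains `column`.
    """
    with open(csv_path, newline="", encoding="utf-8") as src:
        # DictReader uses the first row as the header.
        names = [row[column] for row in csv.DictReader(src)]

    # JSON: a single array with one object per movie name.
    with open(json_path, "w", encoding="utf-8") as out:
        json.dump([{column: name} for name in names], out, ensure_ascii=False)

    # TXT: one movie name per line.
    with open(txt_path, "w", encoding="utf-8") as out:
        for name in names:
            out.write(name + "\n")

    return names
```

A call might look like `export_movie_names("movies.csv", "title", "movie_names.json", "movie_names.txt")`, with the names replaced by those in your own copy of the ratings file. Both output files would then still need to be copied into HDFS before running wordcount.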

R: using Hadoop’s wordcount output

| % | Criteria |
|---|---|
| 15 | Read Hadoop’s output into R in Linux |
| 10 | Clean the data |
| -30 | Penalty for transferring Hadoop’s output to Windows to use R |
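
The reading and cleaning steps could look roughly like the sketch below. The graded work must be done in R on the Linux VM; Python is used here only to illustrate the logic. It assumes the wordcount output has one `word<TAB>count` pair per line, which is the usual layout of the Hadoop wordcount example. The function name `clean_wordcount` and the short stop-word list are hypothetical, and the cleaning rules (lowercasing, stripping punctuation, dropping stop words) are one possible choice, not a requirement of the assignment.

```python
import re
from collections import Counter

# Small illustrative stop-word list (an assumption, not part of the assignment).
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to"}


def clean_wordcount(lines, stop_words=STOP_WORDS):
    """Normalize the tokens of a wordcount output and merge their counts."""
    totals = Counter()
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 2 or not parts[1].isdecimal():
            continue  # skip rows that are not "word<TAB>count"
        # Lowercase, then strip punctuation; letters, digits,
        # underscores and apostrophes are kept.
        word = re.sub(r"[^\w']+", "", parts[0].lower())
        if word and word not in stop_words:
            totals[word] += int(parts[1])
    return totals
```

Merging after normalization matters because wordcount treats `Story,` and `story` as different keys, so their counts are only combined once punctuation and case have been removed.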

<aside> 👉🏼 Big Data students!!!

If you run out of space in your VM during the project, do the following: </aside>