This project uses Hadoop and R, both installed on an Oracle Linux 7.9 virtual machine, to run Hadoop's wordcount routine and plot its output as a word cloud.
Hadoop: using the movie name column from the movie ratings file
Weight (%) | Criteria |
---|---|
5 | Generate a JSON file from the movie name column |
5 | Also save the movie names to a TXT file |
20 | Export the data to MySQL using Sqoop, then import it back into Hadoop |
20 | Collect 1,500 documents (tweets) using Flume |
25 | Run Hadoop’s wordcount routine on the four inputs (TXT, JSON, imported database, and tweets) |
-20 | Penalty for collecting the tweets on Windows |
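The JSON and TXT criteria can be prepared locally before anything is copied into HDFS. The Python sketch below assumes the ratings file is a CSV with a header row and a column named `movie_name`. Both that column name and the JSON Lines layout (one object per line) are hypothetical choices, not part of the assignment. `local_wordcount` is only a rough local cross-check of Hadoop's results. Like the WordCount example, it splits on whitespace and does no cleaning, so JSON punctuation stays attached to tokens.

```python
import csv
import json
from collections import Counter
from pathlib import Path
from typing import List

# Hypothetical header name; replace it with the real header of the ratings file.
MOVIE_COLUMN = "movie_name"


def extract_movie_names(ratings_csv: Path, column: str = MOVIE_COLUMN) -> List[str]:
    """Return the non-empty values of one CSV column, in file order."""
    names = []
    with ratings_csv.open(newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            # Missing cells come back as None, so fall back to "" before stripping.
            value = (row.get(column) or "").strip()
            if value:
                names.append(value)
    return names


def write_json_lines(names: List[str], json_path: Path, key: str = MOVIE_COLUMN) -> None:
    """Write one JSON object per line, e.g. {"movie_name": "..."}."""
    with json_path.open("w", encoding="utf-8") as f:
        for name in names:
            f.write(json.dumps({key: name}, ensure_ascii=False) + "\n")


def write_txt(names: List[str], txt_path: Path) -> None:
    """Write one movie name per line."""
    txt_path.write_text("".join(name + "\n" for name in names), encoding="utf-8")


def local_wordcount(text: str) -> Counter:
    """Count whitespace-separated tokens, roughly like Hadoop's WordCount example."""
    return Counter(text.split())
```

The resulting files could then be copied into HDFS with `hdfs dfs -put` before the wordcount job runs.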
Data visualization with WordCloud:
Weight (%) | Criteria |
---|---|
15 | Read Hadoop’s output into R on Linux |
10 | Clean the data |
-30 | Penalty for transferring Hadoop’s output to Windows to run R |
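The assignment does the reading and cleaning in R on the Linux VM. The Python sketch below only illustrates one possible cleaning logic under stated assumptions. It reads the tab-separated `word<TAB>count` lines that Hadoop's WordCount example writes and lowercases each token. It then splits tokens on anything that is not an ASCII letter, which means non-ASCII letters are discarded. Finally, it drops short tokens and a small, hypothetical stopword list, and merges the counts of tokens that end up identical.

```python
import re
from collections import Counter
from typing import Iterable

# Hypothetical, deliberately small stopword list; a real run would use a fuller one.
STOPWORDS = {"the", "and", "for", "with", "you", "this", "that", "http", "https"}

# Anything that is not a lowercase ASCII letter acts as a separator.
NON_LETTERS = re.compile(r"[^a-z]+")


def read_wordcount(lines: Iterable[str]) -> Counter:
    """Parse 'word<TAB>count' lines; lines without a tab or numeric count are skipped."""
    counts = Counter()
    for line in lines:
        word, sep, count = line.rstrip("\n").rpartition("\t")
        if sep and count.isdigit():
            counts[word] += int(count)
    return counts


def clean_counts(raw: Counter, min_len: int = 3) -> Counter:
    """Lowercase, strip non-letters, drop short tokens and stopwords, merge duplicates."""
    cleaned = Counter()
    for word, n in raw.items():
        for token in NON_LETTERS.split(word.lower()):
            if len(token) >= min_len and token not in STOPWORDS:
                cleaned[token] += n
    return cleaned
```

The job's output files (for example `part-r-00000`) can be merged onto the VM's local file system with `hdfs dfs -getmerge`. If they are then read in R with `read.delim(..., header = FALSE)`, passing `quote = ""` keeps tokens that contain double quotes, which are common in the JSON input, from being parsed as quoted fields.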
<aside> 👉🏼 Big Data students!!!
wordcount
in Hadoop