Yahoo! Finance is a rich repository of historical stock data that can be used to analyze patterns in stock price fluctuations.
For example, a stock's high and low values over a given time interval can be determined for each month.
The file exported from Yahoo! Finance contains the following fields for a given time interval (e.g., for AAPL):
The Hadoop job to process the above file consists of:
1) Mapper module
2) Sort and Shuffle module
3) Reducer module
Below is a description of each stage along with its associated code:
Mapper:
In this job, the role of the mapper is to build records of key-value pairs.
1) First, the input is read from the CSV file, taking each line's offset in the text as the key and the line's contents as the value
2) The mapper parses each line and creates a new key-value pair
3) The output is a key-value pair of the form ('month', 'date;high;low'), where 'date;high;low' holds the date and the stock's high and low values for that date
4) These pairs are collected as the mapper's output.
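The steps above can be sketched as a Hadoop Streaming-style mapper in Python (the post's actual implementation is Java MapReduce; the CSV column order and the sample rows below are assumptions for illustration):

```python
import sys

# Assumed CSV layout: Date,Open,High,Low,Close,Volume,Adj Close with Date as
# YYYY-MM-DD -- this column order is an assumption, not taken from the post.
def map_line(line):
    """Turn one CSV record into a ('month', 'date;high;low') pair; skip the header."""
    fields = line.strip().split(",")
    if not fields[0] or fields[0] == "Date":  # header or blank line
        return None
    date, high, low = fields[0], fields[2], fields[3]
    month = date[:7]                          # 'YYYY-MM' groups records by month
    return (month, f"{date};{high};{low}")

# In a real Streaming job these lines would arrive on sys.stdin; a small
# hypothetical sample is used here so the sketch runs on its own.
sample = [
    "Date,Open,High,Low,Close,Volume,Adj Close",
    "2023-01-05,130.0,133.4,128.1,131.0,100,131.0",
]
for record in sample:
    kv = map_line(record)
    if kv:
        sys.stdout.write(f"{kv[0]}\t{kv[1]}\n")  # tab separates key and value
```

In Hadoop Streaming, the tab-separated key-value lines printed to standard output are what the framework feeds into the sort and shuffle phase.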
Sort and Shuffle:
In MapReduce's Sort and Shuffle phase, the intermediate data is sorted by key and then shuffled to the appropriate reducers based on the partitioner configuration.
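The framework handles this phase automatically, but its effect can be simulated in a few lines of Python (the mapper output pairs below are hypothetical sample data): sort by key, then group so each reducer call sees one key with all of its values.

```python
from itertools import groupby

# Toy simulation of sort-and-shuffle: sort mapper output by key, then group
# values under each key, as the framework does before invoking the reducer.
pairs = [
    ("2023-02", "2023-02-01;152.0;148.5"),
    ("2023-01", "2023-01-05;133.4;128.1"),
    ("2023-01", "2023-01-20;135.2;130.9"),
]
pairs.sort(key=lambda kv: kv[0])  # sort by month key
grouped = {
    month: [value for _, value in group]
    for month, group in groupby(pairs, key=lambda kv: kv[0])
}
```

After this step, all 'date;high;low' values for a given month arrive together at a single reducer.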
Reducer:
The role of the reducer in this job is to find, for each month, the highest high and the lowest low along with their dates. Since the list arrives sorted, all values for a key are grouped together; each value is then split into its parts, i.e., Date, High_Value, and Low_Value, and a loop finds the highest High_Value and the lowest Low_Value with their corresponding dates.
The reducer's output is then collected and presented to the user: the highest and lowest stock values for each month, along with their dates.
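The reducer logic can be sketched in Python as follows (again, the post's implementation is Java; the function name and output format here are illustrative assumptions):

```python
# Reducer sketch: each value is a 'date;high;low' string for one month.
def reduce_month(month, values):
    """Find the highest high and the lowest low for one month, with their dates."""
    best_high = max(values, key=lambda v: float(v.split(";")[1]))
    best_low = min(values, key=lambda v: float(v.split(";")[2]))
    high_date, high_val, _ = best_high.split(";")
    low_date, _, low_val = best_low.split(";")
    return f"{month}\thighest {high_val} on {high_date}, lowest {low_val} on {low_date}"
```

Calling `reduce_month` once per grouped key reproduces the job's final per-month summary.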
The code for the above MapReduce implementation can be found on my GitHub.