Map and reduceApril 10, 2017
Map reduce is a nice algorithm for parallelizing work. Say I want to add the numbers from 1 to 100. (yes I know there is a formula for this). If each addition takes 1 unit of time, I could do this task on my own in a 100 units. But If I call 2 friends over and say
- Friend A: add the numbers from 1 to 50 and tell me the result.
- Friend B: add the numbers from 51 to 100 and tell me the result.
Since the 2 operations are mutually exclusive my friends could do this in parallel and tell me the results. Then I would just need to add the 2 results to get my answer. That would take
50 + 2 units of time, almost twice as faster as doing it on my own. If I have 4 friends it would take me
25 + 4 = 29 time units. But if I called over 100 friends each one would give me back to number I gave them and I would have to add 100 numbers after the map operation to reduce the result. So there is an optimal point for the number of mappers to take. This point is the square root of the number of operations. Although I would guess the optimal solution formula is intuitive to most people, it can be shown with simple calculus. If you write the number of steps required as the function of the number of parallel operations, and take the 1st derivative and find the solution for zero, this will be the optimal point.
The function can be written like this: for a number of operations X I need N people doing parallel work. After that I will need N steps to get to the final result. This is can be written as
f(n) = X/n + n
lets take the 1st derivative
f(x) = X*n^-1 + n
f'(x) = -X*n^2 + C
if we say that this is equal to 0, then we can see that the equation is a quadratic equation with the optimal value being the square root of some value.
This was just a random thought that popped up in my head today.