simple way: send data to master and reduce there; will not scale as well: O(n) complex way: send data to `node[my_index/2]`, reduce, repeat: O(log n).