A Case against Tiny Tasks in Iterative Analytics

Big data systems such as Spark are built around splitting an iterative parallel program into tiny tasks, with the rest of the system design following from this basic principle. Unfortunately, despite immense engineering effort, tiny tasks carry unavoidably large overheads. Using logistic regression -- a common machine learning primitive -- as an example, we compare the performance of Spark to a series of designs that converge to a hand-coded parallel MPI-based implementation...
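
To make the workload concrete, the following is a minimal sketch (not the paper's benchmark code) of the logistic-regression iteration pattern under discussion, written in Python with NumPy. Each iteration is a full pass over the data: the unit of work that a tiny-task system would break into many short tasks, and that an MPI-style code would instead run as one long-lived parallel loop.

```python
import numpy as np

def logistic_regression(X, y, lr=0.1, iters=100):
    """Batch gradient descent for logistic regression.

    Every iteration scans the whole dataset once; in an iterative
    analytics system this scan is what gets split into tiny tasks.
    """
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ w))   # predicted probabilities
        grad = X.T @ (p - y) / len(y)      # gradient of the log-loss
        w -= lr * grad
    return w

# Toy linearly separable data: the label is 1 when the feature is positive.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1))
y = (X[:, 0] > 0).astype(float)

w = logistic_regression(X, y)
acc = np.mean(((X @ w) > 0) == (y > 0.5))  # training accuracy
```

The per-iteration structure (a map over the data followed by a reduction into the gradient) is what makes the scheduling overhead of tiny tasks visible: the useful work per task shrinks while the fixed cost per task does not.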