Difference between revisions of "Cacheoblivious hierarchical recursive divideandconquer usage model"
(Created page with "The cacheoblivious recursive divideandconquer model is a standard way of dividing a large task/problem recursively into smaller subproblems (tasks) until each of those sub...") 
(No difference)

Latest revision as of 22:23, 23 June 2017
The cacheoblivious recursive divideandconquer model is a standard way of dividing a large task/problem recursively into smaller subproblems (tasks) until each of those subproblems (subtasks) becomes small enough to solve them efficiently using a single thread (i.e., sequentially). The generated subtasks form a directed acyclic graph (DAC) among them and can be represented using forkjoin (spawnsync) primitives. Many practical problems such as sorting, matrix multiplication, many dynamic programming problems, stencil computations, linear algebra kernels can be solved using a cacheoblivious recursive divideandconquer technique. The word cacheoblivious here means an algorithm that does not use cache parameters in the algorithm (unlike a tiled algorithm which chooses a tile size to fill the cache wisely). It has been shown that for a class of problems cacheoblivious recursive divideandconquer technique leads to highly (parallel, faster, cache, bandwidth, and energy) efficient solutions compared to stateofthe art methods (IPDPS 2015, PPoPP 2015, 2016, 2017). Furthermore, the resulting algorithms are also selfadaptive and robust to dynamic fluctuations in shared system resources.
Randomized workstealing schedulers (e.g., available in Cilk and TBB) are perfect for scheduling these algorithms on modern multicores. However, problem comes when one wants to use hierarchical hybrid architectures (e.g., CPUs + coprocessors), since workstealing in such environment is costly. Having a general purpose hierarchical heterogeneous tasking model that can do schedule such algorithms on hybrid distributedshared memory systems will be highly beneficial.