Brutal Facts of High-Performance Computing
Supercomputers are expected to keep increasing in performance at roughly the rate of the past two decades: floating-point performance grows by a factor of 1,000 every 11 years, and the first computers with exaflop/s performance are anticipated in 2018/19. However, every application that is to make effective use of these platforms will have to cope with the following brutal facts of high-performance computing:
- Massive concurrency - The largest supercomputers today have a peak performance of one or two petaflop/s and on the order of 100,000 or more processor cores. This core count is two orders of magnitude higher than a decade ago, and it will likely grow even faster in the coming decade, since processor clock frequencies will stagnate or even decrease in order to limit power consumption. Exaflop/s computers a decade from now are anticipated to run billions of threads (see the rough estimate after this list).
- Less and “slower” memory per thread - Advances in memory sub-systems are much slower than the continued rapid development of processor technologies. This has already led to memory latencies that are high relative to processor instruction speeds, and to deep, complex memory hierarchies on today’s commodity processors. Future systems may add even more levels of memory, and they will certainly offer significantly less memory per core (and/or thread) and less memory bandwidth per instruction per second and per thread.
- Only slow improvements in inter-processor and inter-thread communication (IPC and ITC) - While interconnect bandwidth will continue to increase, it will grow more slowly than concurrency, resulting in lower bandwidth per core or thread. Latencies will likely stagnate, and synchronization costs will not improve.
- Stagnant I/O sub-systems - Technology for long-term data storage is currently developing at an even slower pace than memory and processor technology, and both latency and bandwidth for input and output (I/O) will nearly stagnate relative to the continued rapid increase in compute performance.
- Resilience and fault tolerance - With the exorbitant number of components in a modern supercomputer, the mean time until some component fails can be short compared to the time to solution of a simulation (see the rough calculation after this list). The simulation system, which includes the system software and the application codes, has to be resilient to failures of individual components. As processing power continues to increase, error detection and correction will become an issue as well, and simulation methods will have to become fault tolerant.
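As a rough illustration of the concurrency claim above, the following back-of-the-envelope estimate uses assumed numbers only (a clock rate near 1 GHz and about 4 floating-point operations per cycle per thread; actual designs will differ):

```latex
% Hypothetical numbers, chosen only for illustration:
%   target performance: 10^18 flop/s (1 exaflop/s)
%   clock rate:         ~10^9 cycles/s per thread
%   throughput:         ~4 flop per cycle per thread
\[
  N_{\text{threads}} \;\approx\;
  \frac{10^{18}\,\text{flop/s}}
       {10^{9}\,\text{cycle/s} \;\times\; 4\,\text{flop/cycle}}
  \;\approx\; 2.5 \times 10^{8}
\]
```

With less floating-point throughput per thread, or lower clock rates, the required concurrency rises further, toward billions of threads.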
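The resilience point follows from a similarly simple estimate. Assuming, purely for illustration, a system of 100,000 components whose failures are independent and whose individual mean time to failure is 50 years:

```latex
% Hypothetical numbers, chosen only for illustration:
%   N = 10^5 components, each with an MTTF of 50 years (~438,300 hours),
%   failures assumed independent.
\[
  \text{MTTF}_{\text{system}} \;\approx\;
  \frac{\text{MTTF}_{\text{component}}}{N}
  \;=\; \frac{438{,}300\ \text{h}}{10^{5}}
  \;\approx\; 4.4\ \text{h}
\]
```

Under these assumptions, any simulation whose time to solution exceeds a few hours must expect, and survive, component failures during the run.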
These facts are by no means new, but the broader community of computational scientists could ignore them, since the continued increase in processor speed, which closely tracked Moore’s Law until a few years ago, guaranteed that simulation codes would run faster over time without significant investment in designing or exploring new algorithms and software.
In the coming decade, the speed of individual processor cores will no longer track Moore’s Law, and application code developers who want to take advantage of continued increases in computer performance will have to confront these “brutal facts of HPC” and find the best ways to map computational methods onto emerging computing platforms.