Fast signed distance generation for embedded boundaries in CFD
Alo Roosing, Nikos Nikiforakis

Dual pair lists with dynamic pruning: an algorithm for efficient parallel pair interaction calculation
Szilárd Páll, Berk Hess

A bottleneck in strong-scaling molecular dynamics simulations lies in the cost of domain decomposition and pair search for calculating pair interactions. Reducing these costs traditionally requires larger interaction buffers and sacrifices work efficiency in the interaction kernel. To avoid this trade-off we developed an algorithm based on a dual pair-list setup: an “outer” list built infrequently with a longer interaction buffer, and an “inner” list with a short buffer, built by frequent re-pruning of the former. The key to efficiency is that pruning reuses the optimized data layout of the outer list, resulting in fast SIMD/GPU kernels. Therefore, domain decomposition and pair search can be done at intervals of 100-200 steps while the interaction buffer remains short, ensuring low overhead in the pair-interaction kernels. These algorithms are key to extreme-scale molecular dynamics and are implemented in the GROMACS 2018 release.
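As a rough illustration of the dual-list idea, the Python/NumPy sketch below builds an outer atom-pair list with cut-off rc + rb_outer every nst_outer steps and prunes it to an inner list with cut-off rc + rb_inner every nst_prune steps. It is a minimal sketch under stated assumptions, not the GROMACS implementation: it uses a brute-force O(N²) search, plain atom pairs rather than the SIMD/GPU-friendly cluster lists GROMACS works with, random-walk "dynamics", and hypothetical function names and parameter values (build_outer_list, prune, run, buffer widths, intervals).

```python
import numpy as np

def build_outer_list(pos, box, r_outer):
    """Hypothetical brute-force pair search: all pairs closer than r_outer (periodic box)."""
    n = len(pos)
    pairs = []
    for i in range(n):
        d = pos[i + 1:] - pos[i]
        d -= box * np.round(d / box)              # minimum-image convention
        r2 = np.einsum("ij,ij->i", d, d)
        for j in np.nonzero(r2 < r_outer**2)[0]:
            pairs.append((i, i + 1 + j))
    return np.array(pairs, dtype=int).reshape(-1, 2)

def prune(pos, box, outer_pairs, r_inner):
    """Re-prune: scan only pairs already in the outer list, keep those within r_inner."""
    d = pos[outer_pairs[:, 1]] - pos[outer_pairs[:, 0]]
    d -= box * np.round(d / box)
    r2 = np.einsum("ij,ij->i", d, d)
    return outer_pairs[r2 < r_inner**2]

def run(n_atoms=200, box=5.0, rc=1.0, rb_outer=0.3, rb_inner=0.05,
        n_steps=200, nst_outer=100, nst_prune=10, step_size=0.002, seed=0):
    """Assumed toy driver: infrequent outer search, frequent cheap pruning."""
    rng = np.random.default_rng(seed)
    pos = rng.uniform(0.0, box, size=(n_atoms, 3))
    counts = []
    for step in range(n_steps):
        if step % nst_outer == 0:                 # expensive search, long buffer
            outer = build_outer_list(pos, box, rc + rb_outer)
        if step % nst_prune == 0:                 # cheap pruning, short buffer
            inner = prune(pos, box, outer, rc + rb_inner)
            counts.append((len(outer), len(inner)))
        # A pair-interaction kernel would loop over `inner` here, applying cut-off rc.
        pos = (pos + rng.normal(scale=step_size, size=pos.shape)) % box
    return outer, inner, pos, counts
```

Because pruning only scans pairs already in the outer list, its cost scales with the outer list length and needs no new spatial search, which is what makes frequent re-pruning cheap.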

Anton Shterenlikht

CGPACK is a generic HPC cellular automata (CA) library. In this work it was applied to 3D Ising magnetisation calculations. The scaling of two halo exchange (HX) methods (Fortran coarrays, MPI) and of three CA loop routines (triple nested loop, do concurrent, OpenMP) was measured on ARCHER up to full machine capacity, 4544 nodes (109,056 cores). The Ising energy was calculated with the MPI_ALLREDUCE and Fortran 2018 CO_SUM collectives. Using fully populated nodes and no threading gave the highest performance, probably because the Ising model is perfectly load balanced. MPI HX scaled better than coarray HX, which is surprising because both algorithms use a pair-wise “handshake” (IRECV/ISEND/WAITALL vs SYNC IMAGES).
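The following Python/NumPy sketch, which is not CGPACK code, illustrates the pattern measured above: the lattice is split into slabs along one axis (one slab per image or rank), each slab receives a one-layer halo from its lower neighbour, local bond energies are computed independently, and a final sum stands in for the MPI_ALLREDUCE or CO_SUM collective. It assumes a nearest-neighbour Ising Hamiltonian E = -J Σ s_i s_j with periodic boundaries and no external field; all function names are hypothetical.

```python
import numpy as np

J = 1.0  # coupling constant (assumed ferromagnetic, no external field)

def energy_serial(s):
    """Reference: E = -J * sum over nearest-neighbour bonds on a periodic 3D lattice."""
    bonds = sum(np.sum(s * np.roll(s, 1, axis=a)) for a in range(3))
    return -J * bonds

def halo_exchange(slabs):
    """Emulated HX: each image receives the top layer of its lower neighbour (periodic)."""
    p = len(slabs)
    return [slabs[(r - 1) % p][-1:] for r in range(p)]

def energy_local(slab, halo):
    """Energy of the bonds owned by one image: each site's bond to its lower neighbours."""
    ext = np.concatenate([halo, slab], axis=0)   # halo layer + owned layers
    bonds = np.sum(ext[1:] * ext[:-1])           # bonds along the decomposed axis
    bonds += sum(np.sum(slab * np.roll(slab, 1, axis=a)) for a in (1, 2))
    return -J * bonds

def energy_parallel(s, n_images):
    slabs = np.array_split(s, n_images, axis=0)  # slab decomposition along axis 0
    halos = halo_exchange(slabs)
    local = [energy_local(sl, h) for sl, h in zip(slabs, halos)]
    return sum(local)                            # stands in for MPI_ALLREDUCE / CO_SUM
```

Each site counts only its bond to the lower neighbour along every axis, so every bond is counted exactly once and, for the energy alone, a single halo layer suffices; a CA update step would normally need halos on both sides.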

Wee ARCHIE: A suitcase sized supercomputer
Lorna Smith, Alistair Grant, Gordon Gibb, Nick Brown

It is important that the general public understands the value of supercomputing, namely how it benefits their day-to-day lives. It is also helpful to explain the challenges facing supercomputing as we move towards exascale computers.

Wee Archie is a suitcase-sized supercomputer designed to let school children learn about the benefits and challenges of supercomputing. The system is built from Raspberry Pis and is designed to be representative of the system design of massively parallel architectures. Each Pi has an LED display that helps demonstrate visually how multiple processors work in parallel to solve complex tasks.

The Centre of Excellence in Simulation of Weather and Climate in Europe
Philipp Neumann, Grenville Lister, Bryan Lawrence, Joachim Biercamp

The Centre of Excellence in Simulation of Weather and Climate in Europe (ESiWACE) fosters the integration of the weather and climate communities by leveraging two established European networks: the European Network for Earth System Modelling, representing the European climate modelling community, and the European Centre for Medium-Range Weather Forecasts. A main goal of ESiWACE is to prepare weather and climate models for the exascale era; to this end, ESiWACE establishes demonstrator simulations that run at the highest affordable resolutions (target: 1 km) on current petaflop supercomputers. These will yield insights into the computability of configurations sufficient to address key scientific challenges in weather and climate prediction at exascale, such as reducing uncertainties in climate model parametrization or predicting extreme climate events.
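To give a sense of scale for the 1 km target, here is a back-of-the-envelope estimate of the number of horizontal grid columns on a quasi-uniform global grid. It assumes a spherical Earth of mean radius 6371 km and a cell area of about Δx², so a surface area of roughly 5.1 × 10⁸ km² gives on the order of 5 × 10⁸ columns at 1 km. This is an illustrative assumption, not an ESiWACE model configuration; real models use specific grid geometries and many vertical levels per column.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed spherical Earth, mean radius

def horizontal_columns(dx_km):
    """Approximate column count for a quasi-uniform global grid with spacing dx_km."""
    area_km2 = 4.0 * math.pi * EARTH_RADIUS_KM**2  # about 5.1e8 km^2
    return area_km2 / dx_km**2

for dx in (25.0, 10.0, 1.0):
    print(f"{dx:5.1f} km -> {horizontal_columns(dx):.2e} columns")
```

Each tenfold refinement multiplies the column count by 100; if an explicit, CFL-limited time step must also shrink in proportion to Δx, the cost per simulated day grows roughly 1000-fold.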

The poster introduces the ESiWACE objectives and gives an overview of ESiWACE developments, including scalability and performance results achieved for the high-resolution demonstrators.

CP2K-UK: Highly scalable atomistic simulation for all!
Iain Bethune, Gordon Gibb, Lev Kantorovich, Matt Watkins, Sergey Chulkov and Ben Slater

CP2K is a scalable, flexible and increasingly popular program for atomistic simulation that can take advantage of massively parallel HPC systems, scaling up to 1000s of nodes, and of processor architectures including GPUs, multicore CPUs and Intel Xeon Phi KNL. We highlight some recent developments to the code that support large-scale electronic device modelling, the setup and analysis of complex material structures, and time-dependent Density Functional Theory.

A Communication Algorithm for the Patch-based Overlapping Grids Applications
Hong Guo, Aiqing Zhang

Climate model computational performance: towards a new set of metrics
Harry Shepherd, Jean-Christophe Rioual

Ad hoc high performance technique for radiation resistance problems
Boris Chetverushkin, Vladimir Gasilov, Mikhail Markov, Mikhail Zhukovskiy, Olga Olkhovskaya, and Roman Uskov

Enhancing Deep Learning towards Exascale with the DEEP-EST Modular Supercomputer Architecture
Ernir Erlingsson, Gabriele Cavallaro, Morris Riedel, Helmut Neukirchen

Optimization of the boundary element method for massively parallel architecture
Michal Merta, Jan Zapletal, and Michal Kravcenko

Supervised modelling for chemogenomics
Marek Pecha, Jakub Kruzík, Vladimir Chupakhin, Václav Hapla, David Hork, Martin Cermák

Optimization of MHD Models and Algorithms for the Exascale Computing Systems
Boris Chetverushkin, Andrey Saveliev, and Valeri Saveliev

An SIMD-Parallel 3D Fast Fourier Transform Library
Xingjiang Yu and Stefano Markidis

Extending HPC I/O Interfaces to Support Object Storage
Steven Wei Der Chien, Stefano Markidis, Jun Zhang, Erwin Laure and Sai Narasimhamurthy