User stories

From HiHat Wiki

The purpose of this page is to gather requirements, in the form of user stories, for HHAT.

A link to get back up to the parent page is here.


A user story captures the essence of the requirement without over-constraining the implementation. It's tied to a role and a benefit. User stories take the following form:

As a <role>, I want <function>, so that <benefit>, subject to the following acceptance criteria <list>.

Please take this approach:

  • Start each user story with a new subsection, using two equals signs and a space on either side of a title
  • Follow the template shown below

User story template (please leave as an example)

As a <role>, I want <function>, so that <benefit>, subject to the following acceptance criteria

  • item 1
  • item 2
  • item 3
  • item 4

<Your Name> <Project(s) that this would benefit> <Link to other related collateral>

Provisioning constraints

List features of the deployed solution that will make or break its usability for you. Examples:

  • Library or compiler
  • C ABI that supports layering of C++, Fortran, Python on top
  • Does not have to own main
  • Composable and interoperable, e.g. with OpenMP, Kokkos, MPI, HDF5
  • Can target heterogeneous systems, e.g. Xeon, Xeon Phi, GPUs, FPGAs, TPUs, ARM

Please fill out user stories for these and others below.

Library-based solution

As an ISV, I want a C-ABI library-based solution, so that I don't have to rebuild all of my code because of a new compiler every time I want a new feature, subject to the following acceptance criteria

  • static or dynamic linking of library, invoked through an API
  • ok to update my compiler only every 2-3 years
  • C ABI, so that I can build other language interfaces on top, like C++, Fortran or Python

Entered by CJ on behalf of many ISVs, including Siemens, MSC, Simulia, Convergent Science

Language Bindings Details


C offers the most portable ABI: it can easily serve other compiled languages as well as interpreters of scripting languages. An easy way to break the C ABI is to use more exotic types, such as extended-precision numbers, so we should avoid them.
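As a rough illustration of the point above, the following sketch shows what a small piece of a portable C-ABI surface might look like. All `hhat_*` names, the status codes, and the descriptor layout are hypothetical and are not part of any agreed HiHAT interface. The sketch assumes that restricting the interface to fixed-width integers and pointers, and guarding declarations with `extern "C"`, is enough for C++, Fortran, and Python layers to bind to it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#ifdef __cplusplus
extern "C" {   /* keep C linkage when the header is included from C++ */
#endif

/* Hypothetical status type: a fixed-width integer rather than an enum,
 * because enum sizes can differ between compilers. */
typedef int32_t hhat_status;
#define HHAT_OK          0
#define HHAT_ERR_INVALID 1

/* Hypothetical buffer descriptor. Only pointers and fixed-width integers
 * appear here: no long double, no bit-fields. */
typedef struct hhat_buffer_desc {
    void    *base;        /* start of the user's buffer */
    uint64_t size_bytes;  /* total size in bytes */
    uint32_t elem_size;   /* size of one element in bytes */
    uint32_t flags;       /* reserved, set to 0 */
} hhat_buffer_desc;

/* Fill in a descriptor for 'count' elements of 'elem_size' bytes each. */
hhat_status hhat_buffer_describe(void *base, uint64_t count,
                                 uint32_t elem_size, hhat_buffer_desc *out)
{
    if (out == NULL || elem_size == 0) {
        return HHAT_ERR_INVALID;
    }
    out->base = base;
    out->size_bytes = count * (uint64_t)elem_size;
    out->elem_size = elem_size;
    out->flags = 0;
    return HHAT_OK;
}

#ifdef __cplusplus
}
#endif
```

A Fortran 2003 or Python layer could then mirror `hhat_buffer_desc` field by field, which would be much harder if the struct carried compiler-specific types.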


C++ bindings may get complicated: an object structure designed for one kind of hardware may not make sense on another platform.

MPI deprecated its C++ bindings in MPI-2.2 and removed them in MPI-3.0. What's left are the bindings provided by Boost, which might be overkill for some applications.


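Fortran 2003 seems to be OK with calling C directly, through its standard C interoperability features (ISO_C_BINDING). But an actual Fortran module with the binding specifications still has to be provided. Some C data types might not be usable from Fortran.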

Lessons learned from the MPI standard: decide whether we use pointers or integer handles. Also, asynchronous interfaces break Fortran's assumption that local variables are not changed after a function returns (unless the ASYNCHRONOUS attribute is used), as in:

// 'var' may still be changed here, after the nonblocking call has returned
// MPI does not change 'var' any longer, once the operation has completed
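To make the choice between pointers and integer handles more concrete, here is a minimal sketch in C. The names (`hhat_task`, `hhat_task_id`, `hhat_task_create`, and so on) and the fixed-size table are assumptions made for illustration only. The sketch exposes a fixed-width integer handle at the ABI boundary, which maps directly onto a Fortran integer, and converts it to an opaque pointer internally.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Style 1: opaque pointer handle. Callers only ever see 'hhat_task *'. */
typedef struct hhat_task hhat_task;

/* Style 2: integer handle, an index into a table owned by the library. */
typedef int32_t hhat_task_id;
#define HHAT_TASK_NULL (-1)

#define HHAT_MAX_TASKS 16   /* hypothetical fixed capacity */

struct hhat_task {
    int in_use;
    int priority;
};

/* Static storage is zero-initialized, so every slot starts out unused. */
static struct hhat_task task_table[HHAT_MAX_TASKS];

/* Allocate the first free slot and return its index as the handle. */
hhat_task_id hhat_task_create(int priority)
{
    for (int32_t i = 0; i < HHAT_MAX_TASKS; ++i) {
        if (!task_table[i].in_use) {
            task_table[i].in_use = 1;
            task_table[i].priority = priority;
            return i;
        }
    }
    return HHAT_TASK_NULL;   /* table is full */
}

/* Convert an integer handle to a pointer; NULL for invalid or stale ids. */
hhat_task *hhat_task_from_id(hhat_task_id id)
{
    if (id < 0 || id >= HHAT_MAX_TASKS || !task_table[id].in_use) {
        return NULL;
    }
    return &task_table[id];
}

/* Release a slot; invalid ids are ignored. */
void hhat_task_destroy(hhat_task_id id)
{
    hhat_task *t = hhat_task_from_id(id);
    if (t != NULL) {
        t->in_use = 0;
    }
}
```

Integer handles cost one table lookup per use but can be validated and passed through any language binding unchanged. Pointer handles avoid the lookup but cannot be checked as cheaply.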

Functional requirements

Please provide detailed user stories for items like the following:

  • Selection among implementations of a given task can be made based on kinds of execution resources, e.g. Xeon, Xeon Phi, GPUs, FPGAs, TPUs, ARM
  • Selection among implementations of a given task can be made based on layouts of its data operands
  • Whether global variables are permitted
  • Whether task functions are pure, i.e. all inputs are available at the start and they run to completion, or whether they can block when they reach an internal point of execution until additional conditions (e.g. data or control dependences) are met
  • Whether tasks need to be able to spawn child tasks
  • Whether task dependences only need to be enforceable among siblings (currently a restriction for OpenMP tasks), or also at different levels of a hierarchy

Application-level control of task priorities

As an application developer and tuner, I want to control task priorities at the application level, because I have an overall view of which tasks are on the critical path and how to order them appropriately. Priorities are for optimization only, not correctness.

Jim Phillips, NAMD/Charm++

Iteration-linked task priorities

As an application developer and tuner, for applications that operate iteratively and avoid barriers between iterations, I want tasks for step i to always have higher priority than tasks for step i+1, at least between synchronization points or for large ranges of steps. One approach is a circular priority model where, given a priority range of, e.g., 0 to N-1, i is higher priority than j iff (j - i) mod N < N/2.

Jim Phillips, NAMD/Charm++
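The circular priority comparison described in this story could be sketched as follows. The function names `circ_prio_higher` and `step_priority` are hypothetical. The sketch also makes one assumption that the formula as written does not state: two equal priorities are treated as unordered, so a priority never outranks itself.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if priority i outranks priority j in a circular space of n
 * levels (0 .. n-1), and 0 otherwise. Implements (j - i) mod n < n/2,
 * with the added assumption that i == j means "not higher". */
int circ_prio_higher(uint32_t i, uint32_t j, uint32_t n)
{
    assert(n > 0 && i < n && j < n);
    /* Adding n before subtracting avoids a negative intermediate value;
     * the 64-bit type avoids overflow for large n. */
    uint64_t d = ((uint64_t)j + n - i) % n;
    /* 2*d < n is d < n/2 without integer-division rounding for odd n. */
    return d != 0 && 2 * d < n;
}

/* Map an unbounded step counter onto the circular priority range. */
uint32_t step_priority(uint64_t step, uint32_t n)
{
    assert(n > 0);
    return (uint32_t)(step % n);
}
```

With N = 8, step 7 maps to priority 7 and step 8 wraps to priority 0, and the comparison still ranks step 7 ahead of step 8. The ordering is only meaningful while fewer than N/2 steps are in flight at once, which matches the "between synchronization points or for large ranges of steps" qualification in the story.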

Retargetable for different kinds of execution resources and data layouts

As a tuner, I want to be able to provide a different tuned implementation for each of several kinds of execution resources, so that my code can be retargetable and portable, subject to the following acceptance criteria

  • Invocation of a function can be made with a name which is implementation agnostic
  • Someone in a tuning role or a runtime can choose the best implementation, based on feasibility, earliest completion time, etc.
  • Implementations for new kinds of execution resources and new data layouts for operands can be added over time
  • Operands can be specified in a way that describes the set of elements, but that does not necessarily overconstrain the layout

CJ Newburn ECP and CAAR apps, perhaps QMCPACK
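One way to picture the first three acceptance criteria is a small registry that maps an implementation-agnostic task name to one implementation per kind of execution resource. Everything in this sketch is hypothetical: the `res_kind` enum, the `task_register` and `task_select` functions, the fixed table size, and the policy of falling back to the CPU version. The "GPU" function is a host-side stand-in so that the example stays self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical kinds of execution resources. */
typedef enum { RES_CPU, RES_GPU, RES_FPGA, RES_KIND_COUNT } res_kind;

/* Common signature shared by every implementation of a task. */
typedef void (*task_fn)(const double *in, double *out, size_t n);

typedef struct {
    const char *name;               /* implementation-agnostic task name */
    task_fn impl[RES_KIND_COUNT];   /* one slot per resource kind */
} task_entry;

#define MAX_TASKS 8
static task_entry registry[MAX_TASKS];   /* zero-initialized */
static size_t registry_len = 0;

static task_entry *find_entry(const char *name)
{
    for (size_t i = 0; i < registry_len; ++i) {
        if (strcmp(registry[i].name, name) == 0) {
            return &registry[i];
        }
    }
    return NULL;
}

/* Add or replace the implementation of 'name' for one resource kind.
 * The caller must keep the 'name' string alive. Returns 0 on success. */
int task_register(const char *name, res_kind kind, task_fn fn)
{
    assert(kind < RES_KIND_COUNT);
    task_entry *e = find_entry(name);
    if (e == NULL) {
        if (registry_len == MAX_TASKS) {
            return -1;   /* registry is full */
        }
        e = &registry[registry_len++];
        e->name = name;
    }
    e->impl[kind] = fn;
    return 0;
}

/* Pick the implementation for the preferred kind, else the CPU one. */
task_fn task_select(const char *name, res_kind preferred)
{
    assert(preferred < RES_KIND_COUNT);
    task_entry *e = find_entry(name);
    if (e == NULL) {
        return NULL;
    }
    return e->impl[preferred] ? e->impl[preferred] : e->impl[RES_CPU];
}

/* Example implementations of a "scale" task. */
void scale_cpu(const double *in, double *out, size_t n)
{
    for (size_t i = 0; i < n; ++i) out[i] = 2.0 * in[i];
}

/* Stand-in for a device version; it runs on the host in this sketch. */
void scale_gpu_stub(const double *in, double *out, size_t n)
{
    for (size_t i = 0; i < n; ++i) out[i] = in[i] + in[i];
}
```

A caller would invoke `task_select("scale", RES_GPU)` and call whatever comes back, so new resource kinds can be added later without touching call sites. A real selection policy would also weigh feasibility and estimated completion time, and would need some description of operand layout, neither of which this sketch attempts.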

Performance constraints

Please provide detailed user stories for items like the following:

  • Scalability
    • Avoid serialization of startup on multiple nodes
    • Avoid or minimize centralized data structures
  • Overheads
    • Minimize indirection, e.g. with lookups
    • Enable costs to be shifted to startup rather than per-usage
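The last two overhead items can be illustrated with a small sketch in which a by-name lookup is paid once, before the hot loop, and every later use goes through a cached function pointer. The kernel table, the `kernel_lookup` function, and the `lookup_count` counter are all hypothetical and exist only to make the cost visible.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef double (*kernel_fn)(double);

static double square(double x) { return x * x; }
static double negate(double x) { return -x; }

/* Hypothetical table of kernels that can be looked up by name. */
static const struct {
    const char *name;
    kernel_fn fn;
} kernels[] = {
    { "square", square },
    { "negate", negate },
};

/* Counts how often the string-based lookup is executed. */
static unsigned lookup_count = 0;

/* Linear search by name: acceptable at startup, costly per element. */
kernel_fn kernel_lookup(const char *name)
{
    ++lookup_count;
    for (size_t i = 0; i < sizeof kernels / sizeof kernels[0]; ++i) {
        if (strcmp(kernels[i].name, name) == 0) {
            return kernels[i].fn;
        }
    }
    return NULL;
}

/* Resolve the kernel once, then reuse the pointer inside the loop. */
double sum_squares(const double *x, size_t n)
{
    kernel_fn sq = kernel_lookup("square");   /* one-time cost */
    assert(sq != NULL);
    double s = 0.0;
    for (size_t i = 0; i < n; ++i) {
        s += sq(x[i]);   /* per-use cost: one indirect call, no lookup */
    }
    return s;
}
```

The same idea applies to anything that is expensive to resolve, such as implementation selection or handle validation: do it when the application starts or when a task is registered, and keep only a direct reference on the critical path.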