Objectives

Research agenda:

  1. Architecture and programming – dataflow scheduling combined with conventional programming – yielding architectures that are conservative in their use of power, tolerate high-latency operations well and can be programmed in sequential, data-parallel or functional languages (properties of determinism, deadlock freedom and locality);
  2. Hierarchy and scalability – distributed and dynamic resource allocation – yielding distributed lightweight operating systems and a solution to the dataflow curse (properties of controlled non-determinism, generality and self-adaptation);
  3. Disruption for stabilisation – from sequential to parallel – rebuilding from the foundations is necessary but requires a new infrastructure of tools (properties of scalability, binary compatibility and target-neutral programming).

Challenges

The Apple-CORE project intends to make multi-core computing mainstream and to usher in an era in which many-core chips are the PCs of the future – chips with, by many estimates, thousands to millions of cores! The applicability of the project’s SVP programming model is much broader than this, as is its implementation in the DRISC core. However, it is the goal of achieving general-purpose concurrent computing systems that poses the greatest challenges.

These challenges include:

  • The design of a chip architecture in which even on-chip memory has the characteristics of a distributed system, i.e. asynchronous access with latencies of thousands of processor cycles.
  • The selection and development of a suitable tool chain to allow the concurrent system to be deterministically and correctly programmed.
  • The management of power and the use of processing resources, where the data-driven model and dynamic scheduling of instructions support dynamic and adaptive distribution of computation as it unfolds.
  • Finally, there are issues of binary-code compatibility that have constrained this segment of the market. In Apple-CORE, legacy binary code must be supported but, more importantly, once compiled with the new tool chain, the new binaries must execute on an arbitrary number of cores, up to a limit defined by the code’s scalability (see the sketch after this list).
These are non-trivial challenges, as the architecture and programming model are disruptive. They require new compilers, new operating-system foundations and, of course, new processor architectures. The Apple-CORE project is developing all of the above.
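
As a concrete illustration of the compatibility requirement above, here is a minimal sketch in C – not Apple-CORE code – that emulates with POSIX threads the property that a single binary adapts at run time to however many cores the system exposes. On the Apple-CORE platform this mapping is performed by the concurrency-control instructions in hardware rather than by a library; all names below are illustrative.

    /* Hypothetical illustration (not Apple-CORE code): a binary whose
     * parallelism is expressed once and mapped to however many cores
     * the running system exposes, emulated here with POSIX threads. */
    #include <pthread.h>
    #include <stdio.h>
    #include <unistd.h>

    #define N 1024                     /* scalability limit of this code */

    static double data[N];

    struct slice { int begin, end; };

    /* Worker: process one contiguous slice of the index space. */
    static void *work(void *arg)
    {
        struct slice *s = arg;
        for (int i = s->begin; i < s->end; i++)
            data[i] = data[i] * 2.0 + 1.0;
        return NULL;
    }

    int main(void)
    {
        /* Discover the core count at run time: the binary itself is
         * independent of it, up to the scalability limit N. */
        long cores = sysconf(_SC_NPROCESSORS_ONLN);
        if (cores < 1) cores = 1;
        if (cores > N) cores = N;

        pthread_t tid[cores];
        struct slice sl[cores];
        int chunk = N / cores, extra = N % cores, pos = 0;

        for (long c = 0; c < cores; c++) {
            sl[c].begin = pos;
            pos += chunk + (c < extra);  /* spread the remainder */
            sl[c].end = pos;
            pthread_create(&tid[c], NULL, work, &sl[c]);
        }
        for (long c = 0; c < cores; c++)
            pthread_join(tid[c], NULL);

        printf("processed %d elements on %ld cores\n", N, cores);
        return 0;
    }

The same executable runs unchanged on one core or on sixty-four; in the Apple-CORE setting this adaptation happens below the binary interface, so even the library scaffolding above disappears.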

Strategy

Apple-CORE will develop compilers, operating systems and execution platforms to support and evaluate a novel architectural paradigm that can exploit many-core computer systems to the end of silicon scaling. It differs from current approaches by adopting a systematic model of concurrency implemented as instructions in the processors’ ISA (developed in the EU FP6 AETHER project). This approach has enormous potential but is disruptive: because the model executes OS-kernel functionality as processor instructions, the paradigm shift effectively requires a new infrastructure of tools. The benefits are large, however, as compilers need only capture concurrency in a virtual form rather than also mapping and scheduling it. This separates the concerns of programming and concurrency engineering and opens the door for successful parallelising compilers. Mapping and scheduling are performed dynamically by the implementations of the concurrency-control instructions. Particular benefits can be expected for data-parallel and functional programming languages, as they expose their concurrency in a way that is easily captured by a compiler.
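
To make the notion of capturing concurrency "in a virtual form" concrete, the following is a hedged sketch in C, not project code: create_family is a hypothetical stand-in for SVP's concurrency-control instructions, and the caller describes only what is concurrent, leaving mapping and scheduling entirely to the implementation. Here the "implementation" happens to run the family sequentially; a Microgrid would distribute the same description across cores dynamically.

    /* Illustrative only: a C rendering of the create/sync pattern
     * that an SVP compiler targets. The name create_family is a
     * hypothetical stand-in for ISA-level concurrency control. */
    #include <stdio.h>

    typedef void (*thread_fn)(long index, void *env);

    /* Describe a family of index-parameterised threads. The caller
     * states only WHAT is concurrent; WHERE and WHEN each thread runs
     * is the implementation's concern. */
    static void create_family(long start, long limit, long step,
                              thread_fn fn, void *env)
    {
        for (long i = start; i < limit; i += step)
            fn(i, env);   /* a parallel implementation may reorder this */
    }

    static void saxpy_body(long i, void *env)
    {
        double **v = env;            /* v[0] = x, v[1] = y */
        v[1][i] += 2.0 * v[0][i];
    }

    int main(void)
    {
        double x[8] = {1, 2, 3, 4, 5, 6, 7, 8}, y[8] = {0};
        double *env[2] = {x, y};

        create_family(0, 8, 1, saxpy_body, env);   /* "create" */
        /* a concurrent implementation would "sync" here before using y */
        printf("y[7] = %.1f\n", y[7]);
        return 0;
    }

Nothing in the caller's code names a core count, a schedule or a placement; that is precisely the separation of concerns the compiler exploits.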

Another advantage of this approach is that the new processor, despite its extended ISA, remains binary compatible with existing code. Moreover, once code is compiled with the new tools, the resulting binaries are executable on an arbitrary number of processors, which provides forward binary-code compatibility and enables resources to be mapped dynamically to binary programs from a pool of processors. The concurrency controls also allow partial failure to be managed, which, together with the binary-code compatibility, provides the necessary support for reliable systems. Finally, this approach exposes information about the work to be executed on each processor and how much of it can be executed at any given time. This information enables powerful mechanisms for power management, such as load balancing across processors combined with clock-frequency scaling. The objective of developing this infrastructure is to evaluate the model and to open opportunities to exploit the results of this research in a variety of markets, including embedded processors, commodity processors and high-performance applications. In particular, the binary compatibility provides a unique opportunity to make an impact on commodity processors in Europe.
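
As one hedged illustration of how that exposed workload information could drive power management, the sketch below, which is not an Apple-CORE interface, maps the number of runnable threads on a core to a clock frequency; the constants, the proportional policy and all names are assumptions made for the example.

    /* Hypothetical sketch (not an Apple-CORE interface): choosing a
     * per-core clock frequency from the amount of exposed, runnable
     * work. The model makes the size of each core's thread queue
     * visible, so a simple proportional policy becomes possible. */
    #include <stdio.h>

    #define F_MIN_MHZ  200
    #define F_MAX_MHZ 1600
    #define Q_FULL      64   /* queue depth considered "saturated" */

    /* Map the number of runnable threads on a core to a frequency:
     * idle cores drop to F_MIN, saturated cores run at F_MAX. */
    static int pick_frequency(int runnable_threads)
    {
        if (runnable_threads <= 0)
            return F_MIN_MHZ;
        if (runnable_threads >= Q_FULL)
            return F_MAX_MHZ;
        return F_MIN_MHZ +
               (F_MAX_MHZ - F_MIN_MHZ) * runnable_threads / Q_FULL;
    }

    int main(void)
    {
        int samples[] = {0, 4, 32, 64, 80};
        for (int i = 0; i < 5; i++)
            printf("%2d runnable -> %4d MHz\n",
                   samples[i], pick_frequency(samples[i]));
        return 0;
    }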