11 OpenCL Support

TCE has experimental support for running both OpenCL C kernels and the host runtime in the same statically linked program. This makes it possible to benefit from parallelizable kernels written in OpenCL C without requiring a host processor with an OpenCL C compiler.

The OpenCL support uses the Portable OpenCL (pocl) project. To add OpenCL support to tcecc, download and install the latest version of pocl before configuring TCE. At the end of the configure run, the script reports whether OpenCL support was enabled. After building TCE, you can check whether OpenCL C support is enabled with the ``tcecc --supported-languages'' switch.

It must be emphasized that the OpenCL support is a work in progress and does not fully implement the standard. Missing APIs are implemented ``as needed''.

The ``statically compiled OpenCL C'' support relies on the reqd_work_group_size kernel attribute. It uses the work-group dimensions given in the attribute to statically parallelize the multiple work items (WIs) of a single work group (WG), producing highly parallel code for the compiler's instruction scheduler.

For example, the dot_product kernel must be written as follows to produce four parallel WIs per WG:

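/* dot.cl */
/* The required WG size (4, 1, 1) lets the compiler statically
   parallelize four WIs per WG. */
__attribute__((reqd_work_group_size(4, 1, 1)))
kernel void
dot_product (global const float4 *a,
             global const float4 *b,
             global float *c) {
    size_t gid = get_global_id(0);

    /* Each WI computes one four-element dot product. */
    c[gid] = dot(a[gid], b[gid]);
}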
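To illustrate what static parallelization means for the kernel above, the following plain-C sketch shows the work done by one WG of dot_product: all four WIs run in a loop whose trip count is a compile-time constant taken from the attribute, so a compiler can fully unroll it and hand four independent dot products to the instruction scheduler. This is only an illustration of the idea under stated assumptions, not the code that pocl or tcecc actually generates; float4_t, dot4, WG_SIZE_X and dot_product_wg are hypothetical names.

```c
#include <stddef.h>

/* Hypothetical stand-in for the OpenCL C float4 vector type. */
typedef struct { float x, y, z, w; } float4_t;

/* Hypothetical constant taken from reqd_work_group_size(4, 1, 1). */
#define WG_SIZE_X 4

/* Four-element dot product, like OpenCL C dot() on float4. */
static float dot4(float4_t p, float4_t q) {
    return p.x * q.x + p.y * q.y + p.z * q.z + p.w * q.w;
}

/* Hypothetical work-group function: executes all WG_SIZE_X work items
   of work group `group_id`. The loop bound is a compile-time constant,
   so the loop can be fully unrolled; `restrict` tells the C compiler
   the buffers do not overlap, so the unrolled iterations can be
   scheduled in parallel. */
void dot_product_wg(const float4_t *restrict a,
                    const float4_t *restrict b,
                    float *restrict c,
                    size_t group_id) {
    for (size_t local_id = 0; local_id < WG_SIZE_X; ++local_id) {
        /* Equivalent of get_global_id(0) for a 1D NDRange. */
        size_t gid = group_id * WG_SIZE_X + local_id;
        c[gid] = dot4(a[gid], b[gid]);
    }
}
```

Because the WG size is fixed at compile time, there is no runtime loop over a variable number of WIs, which is what makes the offline, static parallelization possible.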

Currently, the host program must know the WG dimensions and use the same ones when launching the kernel; otherwise, the behavior is undefined. Apart from that, the OpenCL host runtime API can be used as in the ``regular mode'', where the kernels are (or can be) compiled and linked at runtime. This makes it easier to port OpenCL programs to the TCE standalone OpenCL mode. In the standalone mode, the compiler invocation APIs of the OpenCL host runtime are implemented as dummy functions to provide source-level compatibility with most OpenCL programs.
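Since a mismatch between the launch dimensions and the kernel attribute leads to undefined behavior in the standalone mode, a host program may want to check the NDRange before launching. The following is a minimal sketch of such a check, assuming (as in OpenCL 1.x) that the global size must be a whole multiple of the WG size; REQD_WG_SIZE and ndrange_matches are hypothetical names, not part of TCE or the OpenCL API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical constant: must equal the kernel's
   reqd_work_group_size(4, 1, 1) attribute in dot.cl. */
static const size_t REQD_WG_SIZE[3] = {4, 1, 1};

/* Hypothetical check: returns 1 if `local` equals the required WG size
   in each of the `dims` dimensions and `global` is a whole multiple of
   it there, else 0. The local size is compared first, so a zero local
   size is rejected before any modulo is computed. */
int ndrange_matches(const size_t *global, const size_t *local,
                    unsigned dims) {
    assert(dims >= 1 && dims <= 3);
    for (unsigned d = 0; d < dims; ++d) {
        if (local[d] != REQD_WG_SIZE[d] || global[d] % local[d] != 0)
            return 0;
    }
    return 1;
}
```

A host program would then pass the same local size array as the local_work_size argument of clEnqueueNDRangeKernel, so the launch and the attribute cannot drift apart.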

For example, the host program that invokes the dot_product kernel can be written exactly as in the original OpenCL example. It can call the compiler and other runtime APIs so that the program also runs in regular OpenCL environments, while in the TCE standalone mode the kernel is compiled offline and linked to the main program, and the compiler calls are no-operations.

The following command line compiles both the host code and the kernel into a single program:

 tcecc -a mytta.adf -O3 host.c dot.cl -o dot_product -loclhost-sa

The resulting program can be simulated and executed like any other TTA program produced by tcecc.

The oclhost-sa library provides minimal, largely dummy implementations of the host runtime APIs, sufficient for launching the kernel, moving buffers, and similar tasks.
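The idea of a dummy compiler invocation can be sketched as follows. This is not the actual oclhost-sa source: status_t, STATUS_SUCCESS, sa_build_program and host_prepare_kernel are hypothetical names with a simplified signature, whereas a real implementation would follow the clBuildProgram declaration in CL/cl.h.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the OpenCL status type and success code;
   a real host program gets cl_int and CL_SUCCESS from CL/cl.h. */
typedef int status_t;
#define STATUS_SUCCESS 0

/* Hypothetical standalone-mode "build": the kernel was already compiled
   offline by tcecc and linked into the program, so there is nothing to
   compile at runtime. The arguments are ignored and success is reported. */
status_t sa_build_program(void *program, const char *options) {
    (void)program;
    (void)options;
    return STATUS_SUCCESS;
}

/* Host code written for a regular OpenCL runtime usually checks the
   build status; with the dummy build above, the check always passes. */
int host_prepare_kernel(void *program) {
    if (sa_build_program(program, "") != STATUS_SUCCESS)
        return -1;  /* would report a build failure */
    return 0;
}
```

Reporting success instead of omitting the call keeps the host program's error handling unchanged, which is what allows the same source to build for both the regular and the standalone mode.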

Pekka Jääskeläinen 2018-03-12