S. Chauveau
CAPS Entreprise
clBLAS Project
------------------------------
April 30, 2014
This document describes the steps needed to introduce the Functor framework
for a clBLAS function currently implemented using the previous Solver mechanism.

The procedure is composed of the following steps:

  (1) Declare new base functor classes for the
      considered clBLAS function.
  (2) Create a new fallback class derived from the class created
      in (1) and using the existing Solver implementation.
  (3) Add the appropriate members to the clblasFunctorSelector
      class.
  (4) Modify the clBLAS function to use the functor.

In the following, we will consider the case of the XSCAL functions.
Initial State
=============
The XSCAL functions are originally implemented in the file src/library/blas/xscal.c.
Most of the Solver-based implementation lives in the static function
doScal(), which is shared by all SCAL functions. clblasSscal(), clblasDscal(),
and the others are each little more than a single call to doScal():

    clblasStatus doScal(...)
    {
        ...   // Do all the magic
    }

    clblasStatus
    clblasSscal( size_t N,
                 float alpha,
                 cl_mem X,
                 size_t offx,
                 int incx,
                 cl_uint numCommandQueues,
                 cl_command_queue *commandQueues,
                 cl_uint numEventsInWaitList,
                 const cl_event *eventWaitList,
                 cl_event *events
                 )
    {
        CLBlasKargs kargs;

    #ifdef DEBUG_SCAL
        printf("\nSSCAL Called\n");
    #endif

        memset(&kargs, 0, sizeof(kargs));
        kargs.dtype = TYPE_FLOAT;
        kargs.alpha.argFloat = alpha;

        return doScal(&kargs, N, X, offx, incx, numCommandQueues, commandQueues, numEventsInWaitList, eventWaitList, events);
    }

    clblasStatus clblasDscal(...)
    ...
    clblasStatus clblasCscal(...)
    ...
    clblasStatus clblasZscal(...)
    ...
    ...

STEP 1: Declaration of new base functor classes
===============================================

All the SCAL variants have identical arguments, so it is reasonable to
use a template to avoid rewriting similar classes again and again.
Using macros would also work; that is just a matter of personal taste.

For convenience, the base template class provides an internal
structure type called Args that is used to store the arguments.
Using an Args type is not strictly needed, but it greatly simplifies the
creation of the functor classes and of their future derived classes.

So create a new file src/library/blas/functor/include/functor_xscal.h
containing the base functor class. In this specific case we also have
to consider clblasZdscal() and clblasCsscal(), which explains
why the template requires two types, TX and Talpha. TX is the type of
the vector elements while Talpha is the type of the alpha argument.

    template<typename TX, typename Talpha>
    class clblasXscalFunctor : public clblasFunctor
    {
    public:

        // Structure used to store all XSCAL arguments
        struct Args
        {
            size_t            N;
            Talpha            alpha;
            cl_mem            X;
            size_t            offx;
            int               incx;
            cl_command_queue  queue;
            cl_uint           numEventsInWaitList;
            const cl_event *  eventWaitList;
            cl_event *        events;

            Args(size_t N,
                 Talpha alpha,
                 cl_mem X,
                 size_t offx,
                 int incx,
                 cl_command_queue queue,
                 cl_uint numEventsInWaitList,
                 const cl_event * eventWaitList,
                 cl_event * events)
                : N(N),
                  alpha(alpha),
                  X(X),
                  offx(offx),
                  incx(incx),
                  queue(queue),
                  numEventsInWaitList(numEventsInWaitList),
                  eventWaitList(eventWaitList),
                  events(events)
            {
            }
        };

        virtual clblasStatus execute(Args & args) = 0;
    };

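To illustrate why two type parameters are needed, here is a small host-side
sketch that does not use OpenCL at all. The function scal_reference() and the
two example_*() helpers are hypothetical names introduced for this illustration
only; they are not part of clBLAS. The sketch assumes a strictly positive incx.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Host-side reference of SCAL: X[offx + i*incx] *= alpha for i in [0, N).
// TX is the element type, Talpha the type of the scaling factor.
// Assumption of this sketch: incx is strictly positive.
template <typename TX, typename Talpha>
void scal_reference(std::size_t N, Talpha alpha, std::vector<TX> & X,
                    std::size_t offx, int incx)
{
    assert(incx > 0);
    const std::size_t step = static_cast<std::size_t>(incx);
    for (std::size_t i = 0; i < N; ++i) {
        X[offx + i * step] *= alpha;
    }
}

// SSCAL-like call: TX and Talpha are both float.
inline void example_sscal(std::vector<float> & x)
{
    scal_reference<float, float>(x.size(), 2.0f, x, 0, 1);
}

// CSSCAL-like call: complex elements scaled by a real factor,
// hence TX = std::complex<float> but Talpha = float.
inline void example_csscal(std::vector<std::complex<float> > & x)
{
    scal_reference<std::complex<float>, float>(x.size(), 2.0f, x, 0, 1);
}
```

With a single type parameter, the CSSCAL and ZDSCAL variants could not be
expressed without converting alpha to a complex value first.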
Using this template class it is now possible to define the base functor
class corresponding to each SCAL function:

    //
    // Base class for all functors providing a SSCAL implementation
    //
    class clblasSscalFunctor: public clblasXscalFunctor<cl_float, cl_float>
    {
    };

    //
    // Base class for all functors providing a DSCAL implementation
    //
    class clblasDscalFunctor: public clblasXscalFunctor<cl_double, cl_double>
    {
    };

    //
    // Base class for all functors providing a CSCAL implementation
    //
    class clblasCscalFunctor: public clblasXscalFunctor<cl_float2, cl_float2>
    {
    };

    //
    // Base class for all functors providing a ZSCAL implementation
    //
    class clblasZscalFunctor: public clblasXscalFunctor<cl_double2, cl_double2>
    {
    };

    //
    // Base class for all functors providing a CSSCAL implementation
    //
    class clblasCsscalFunctor: public clblasXscalFunctor<cl_float2, cl_float>
    {
    };

    //
    // Base class for all functors providing a ZDSCAL implementation
    //
    class clblasZdscalFunctor: public clblasXscalFunctor<cl_double2, cl_double>
    {
    };

A shorter alternative would be to use 'typedef' instead, but using a class
offers the opportunity to extend the functor with specific features (i.e.
it is possible to add new members to a class but not to a typedef).
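
The difference between the two options can be sketched with simplified
stand-in types. Everything in this sketch is hypothetical and independent of
clBLAS: XscalFunctorSketch stands in for clblasXscalFunctor (with a reduced
Args), and isNoOp() is an invented helper that only shows where an extra
member could go.

```cpp
#include <cassert>
#include <cstddef>

// Simplified stand-in for the clblasXscalFunctor template.
template <typename TX, typename Talpha>
class XscalFunctorSketch
{
public:
    // Reduced argument structure: only what this sketch needs.
    struct Args
    {
        std::size_t N;
        Talpha      alpha;
    };

    virtual ~XscalFunctorSketch() {}
    virtual int execute(Args & args) = 0;
};

// Option 1: a typedef is only an alias; nothing can be added to it.
typedef XscalFunctorSketch<float, float> SscalFunctorAlias;

// Option 2: a derived class can later receive SSCAL-specific members.
class SscalFunctorSketch : public XscalFunctorSketch<float, float>
{
public:
    // Hypothetical extension: detect calls that leave X unchanged.
    static bool isNoOp(const Args & args)
    {
        return args.N == 0 || args.alpha == 1.0f;
    }
};

// Usage sketch: an empty vector is always a no-op.
inline void example_is_no_op()
{
    SscalFunctorSketch::Args empty = { 0, 2.0f };
    assert(SscalFunctorSketch::isNoOp(empty));
}
```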
STEP 2: Create the new fallback classes
=======================================
In the following, we only consider the case of clblasSscal.
For each of the functor classes declared during STEP 1, we should now
declare the fallback functor class that will provide the Solver-based
implementation of the function.
We add the following to src/library/blas/functor/include/functor_xscal.h:

    //
    // Fallback functor for SSCAL : implement the sscal using the old solver mechanism
    //
    class clblasSscalFunctorFallback : public clblasSscalFunctor
    {
    public: // Inherited members from clblasFunctor
        virtual void retain();
        virtual void release();

    public: // Inherited members from clblasSscalFunctor
        virtual clblasStatus execute(Args & a);

    public:
        static clblasSscalFunctorFallback * provide ();
    };

The file src/library/blas/xscal.c is then renamed to src/library/blas/functor/functor_xscal.cc
and modified as follows.

First, the clblasSscal() function is transformed into clblasSscalFunctorFallback::execute():

    clblasStatus clblasSscalFunctorFallback::execute(Args & args)
    {
        CLBlasKargs kargs;

        memset(&kargs, 0, sizeof(kargs));
        kargs.dtype = TYPE_FLOAT;             // SSCAL operates on floats
        kargs.alpha.argFloat = args.alpha;

        // The functor handles a single queue, hence the count of 1
        return doScal(&kargs,
                      args.N,
                      args.X,
                      args.offx,
                      args.incx,
                      1,
                      &args.queue,
                      args.numEventsInWaitList,
                      args.eventWaitList,
                      args.events);
    }

Second, a single instance of clblasSscalFunctorFallback is created as a static variable
that is returned by the clblasSscalFunctorFallback::provide() member.

    clblasSscalFunctorFallback * clblasSscalFunctorFallback::provide ()
    {
        // Single shared instance, constructed on first use
        static clblasSscalFunctorFallback sscal_fallback;
        return & sscal_fallback;
    }

Third, the retain() and release() members must be reimplemented to prevent the
destruction of the unique clblasSscalFunctorFallback instance.

    void clblasSscalFunctorFallback::retain()
    {
        // clblasSscalFunctorFallback has a single global instance
        // and shall never be freed
    }

    void clblasSscalFunctorFallback::release()
    {
        // clblasSscalFunctorFallback has a single global instance
        // and shall never be freed
    }

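The reason for the empty overrides can be sketched with a stand-in base class.
This document does not show the clblasFunctor base class, so the FunctorSketch
class below is an assumption: it supposes a plain reference counter that
deletes the object when the count reaches zero. With such a base, an inherited
release() would eventually attempt to delete the static instance.

```cpp
#include <cassert>

// Hypothetical stand-in for clblasFunctor (assumed behaviour: simple
// reference counting, object deleted when the count drops to zero).
class FunctorSketch
{
public:
    FunctorSketch() : refcount_(1) {}
    virtual ~FunctorSketch() {}

    virtual void retain()  { ++refcount_; }
    virtual void release() { if (--refcount_ == 0) delete this; }

    int refcount() const { return refcount_; }

private:
    int refcount_;
};

// Fallback functor with one static instance: retain() and release()
// do nothing, so the instance is never deleted.
class FallbackSketch : public FunctorSketch
{
public:
    virtual void retain()  {}
    virtual void release() {}

    static FallbackSketch * provide()
    {
        static FallbackSketch instance;   // constructed on first use
        return &instance;
    }
};

// Usage sketch: unbalanced release() calls are harmless.
inline void example_release_is_harmless()
{
    FallbackSketch * f = FallbackSketch::provide();
    f->release();
    f->release();
    assert(f == FallbackSketch::provide());
}
```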
STEP 3: Add the appropriate members to the clblasFunctorSelector class
======================================================================

The clblasFunctorSelector class shall typically be extended with two new virtual
methods per function: one to select a specific functor and one to select a generic functor.

Edit the file src/library/blas/functor/include/functor_selector.h and add
the following member declarations to the class clblasFunctorSelector:

    // Provide a XSCAL Functor usable in all cases

    virtual clblasSscalFunctor * select_sscal_generic();
    virtual clblasDscalFunctor * select_dscal_generic();
    virtual clblasCscalFunctor * select_cscal_generic();
    virtual clblasZscalFunctor * select_zscal_generic();
    virtual clblasCsscalFunctor * select_csscal_generic();
    virtual clblasZdscalFunctor * select_zdscal_generic();

    // Provide XSCAL functors optimized for specific arguments

    virtual clblasSscalFunctor * select_sscal_specific(clblasSscalFunctor::Args & args);
    virtual clblasDscalFunctor * select_dscal_specific(clblasDscalFunctor::Args & args);
    virtual clblasCscalFunctor * select_cscal_specific(clblasCscalFunctor::Args & args);
    virtual clblasZscalFunctor * select_zscal_specific(clblasZscalFunctor::Args & args);
    virtual clblasCsscalFunctor * select_csscal_specific(clblasCsscalFunctor::Args & args);
    virtual clblasZdscalFunctor * select_zdscal_specific(clblasZdscalFunctor::Args & args);

The naming scheme used here is not mandatory but is recommended to keep the
whole infrastructure consistent.
Then, add their default implementations in src/library/blas/functor/functor_selector.cc:

    clblasSscalFunctor *
    clblasFunctorSelector::select_sscal_generic()
    {
        return clblasSscalFunctorFallback::provide();
    }

    clblasSscalFunctor *
    clblasFunctorSelector::select_sscal_specific(clblasSscalFunctor::Args &)
    {
        return this->select_sscal_generic() ;
    }

    ...

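Because these members are virtual, a selector for a given device can later
override the specific variant and keep the generic one as a safety net. The
sketch below uses stand-in types only; SelectorSketch, DeviceSelectorSketch,
UnitStrideSketch and the incx == 1 criterion are all hypothetical and chosen
purely for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Reduced stand-ins for the Args structure and the SSCAL functor classes.
struct SscalArgsSketch { std::size_t N; int incx; };

class SscalSketch
{
public:
    virtual ~SscalSketch() {}
};

class SscalFallbackSketch : public SscalSketch
{
public:
    static SscalFallbackSketch * provide()
    {
        static SscalFallbackSketch instance;
        return &instance;
    }
};

// Hypothetical specialized functor for contiguous vectors.
class UnitStrideSketch : public SscalSketch
{
public:
    static UnitStrideSketch * provide()
    {
        static UnitStrideSketch instance;
        return &instance;
    }
};

// Default selector: the specific variant falls back to the generic one.
class SelectorSketch
{
public:
    virtual ~SelectorSketch() {}

    virtual SscalSketch * select_sscal_generic()
    {
        return SscalFallbackSketch::provide();
    }

    virtual SscalSketch * select_sscal_specific(SscalArgsSketch &)
    {
        return this->select_sscal_generic();
    }
};

// Device selector: picks the specialized functor when the arguments allow.
class DeviceSelectorSketch : public SelectorSketch
{
public:
    virtual SscalSketch * select_sscal_specific(SscalArgsSketch & args)
    {
        if (args.incx == 1) {
            return UnitStrideSketch::provide();
        }
        return this->select_sscal_generic();
    }
};

// Usage sketch: a strided call still ends up on the fallback.
inline void example_strided_uses_fallback()
{
    DeviceSelectorSketch selector;
    SscalArgsSketch strided = { 16, 2 };
    assert(selector.select_sscal_specific(strided) == SscalFallbackSketch::provide());
}
```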
STEP 4: Modify the clBLAS function to use the functor
=====================================================
Create a file src/library/blas/xscal.cc to reimplement the clBLAS API functions.
First, copy the original function skeletons from the now obsolete file src/library/blas/xscal.c.
Then fill in each skeleton to perform the following actions:

  (A) Perform some consistency checks on the arguments
  (B) Create and initialize a local Args object
  (C) Obtain the clblasFunctorSelector corresponding
      to the current device (via the queue)
  (D) Ask that selector for a specific functor
  (E) Execute the functor
  (F) Release the functor

The code shall typically look like this:

    extern "C"
    clblasStatus
    clblasSscal(
        size_t N,
        float alpha,
        cl_mem X,
        size_t offx,
        int incx,
        cl_uint numCommandQueues,
        cl_command_queue *commandQueues,
        cl_uint numEventsInWaitList,
        const cl_event *eventWaitList,
        cl_event *events)
    {
        // (A) Consistency checks
        CHECK_VECTOR_X( X , N, offx, incx ) ;
        CHECK_QUEUES( numCommandQueues, commandQueues ) ;
        CHECK_WAITLIST( numEventsInWaitList, eventWaitList ) ;

        if ( numCommandQueues>1 ) {
            numCommandQueues = 1 ; // No support for multi-device (yet)
        }

        cl_command_queue queue = commandQueues[0];

        // (B) Local Args object
        clblasSscalFunctor::Args args(N,
                                      alpha,
                                      X,
                                      offx,
                                      incx,
                                      queue,
                                      numEventsInWaitList,
                                      eventWaitList,
                                      events);

        // (C) Selector for the device behind the queue
        clblasFunctorSelector * fselector = clblasFunctorSelector::find(queue);

        // (D) Specific functor for these arguments
        clblasSscalFunctor * functor = fselector->select_sscal_specific(args);

        // (E) Execute, then (F) release
        clblasStatus res = functor->execute(args);

        functor->release();

        return res;
    }

Reminder: this is a C++ file, so the API functions shall be declared extern "C".

Remark: what is missing in this example is a proper verification of the arguments
        (e.g. numCommandQueues shall be strictly positive, commandQueues[0] shall
        be non-NULL, ...).
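
A minimal sketch of such a verification is given below. It is written against
stand-in types so that it compiles without the OpenCL headers: QueueSketch
plays the role of cl_command_queue and StatusSketch the role of clblasStatus.
Which status value the real library should return in each case is an
assumption of this sketch, not something this document specifies.

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for cl_command_queue (an opaque pointer type in OpenCL).
struct FakeQueue {};
typedef FakeQueue * QueueSketch;

// Stand-in for clblasStatus; the choice of values is hypothetical.
enum StatusSketch
{
    StatusSuccess,
    StatusInvalidValue,
    StatusInvalidCommandQueue
};

// Checks the conditions listed in the remark above:
//  - numCommandQueues shall be strictly positive
//  - commandQueues and commandQueues[0] shall be non-NULL
StatusSketch checkQueueArguments(unsigned numCommandQueues,
                                 const QueueSketch * commandQueues)
{
    if (numCommandQueues == 0 || commandQueues == nullptr) {
        return StatusInvalidValue;
    }
    if (commandQueues[0] == nullptr) {
        return StatusInvalidCommandQueue;
    }
    return StatusSuccess;
}

// Usage sketch: reject the call before commandQueues[0] is ever read.
inline void example_reject_empty_queue_list()
{
    assert(checkQueueArguments(0, nullptr) == StatusInvalidValue);
}
```

In the API function, such a check would have to run before the line that
reads commandQueues[0].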
Conclusion
==========
After following all the steps above, the clBLAS APIs now use the Solver-based
implementation via their respective fallback functors.
Other specialized functors can then be implemented and integrated into the
appropriate methods of the functor selector.