
Changes to handle caching implementation#25

Open
taknevski wants to merge 15 commits into benoitsteiner:master from taknevski:master

Conversation

@taknevski
Contributor

  1. Regulate access to the libxsmm_handle map via a mutex
  2. Remove intermediate deletion of libxsmm handles; delete them at the end

caisq and others added 15 commits March 30, 2017 10:24
* Fixed libxsmm_config_arguments: fixed the incorrect value that was supposed to trigger auto-prefetch, and fixed the 0-threshold, which is now accounted for in LIBXSMM (by populating the default threshold). The problem arose from the assumption "threshold: fall back to BLAS if n*m*k is above this", which is wrong (the threshold is an upper bound up to which JIT code is generated). The previous configuration may have caused all sorts of issues due to other values derived from the 0-threshold. Note that explicitly JIT'ting code is/was never subject to a threshold.

* Upgraded to libxsmm 1.6.5

* Enable the use of libxsmm for matrix multiplications

* Enable the use of libxsmm to speedup 1x1 convolutions (which are
computed using matrix multiplications)

* Fixed libxsmm_config_arguments in libxsmm.BUILD (benoitsteiner#7)


* Make use of TensorFlow's allocation infrastructure even when using LIBXSMM allocation functions. In particular, the (cached) libxsmm_spmdm_init now relies on TF's cpu_allocator().

For C++ code, one can use a libxsmm_scoped_allocator<kind> in order to (temporarily) set up a different allocation mechanism. For instance, using libxsmm_tf_allocator<libxsmm_scratch_allocator> changes LIBXSMM's scratch allocator to rely on TensorFlow. The libxsmm_tf_allocator provides two kinds of c'tors: (1) the no-argument variant adopts TF's cpu_allocator(), whereas (2) the one-argument form adopts the allocator from the given OpKernelContext. Changing the allocator in LIBXSMM while buffers (from different allocators) are still pending is valid, and all other services in LIBXSMM's "malloc domain" work regardless of the allocation mechanism (e.g., libxsmm_malloc_size).

* Simply renamed API items in order to follow changes in LIBXSMM 1.7. This is incomplete as more changes/adjustments are needed.

* Account for removed non-check API.

* Include libxsmm_malloc.h now that libxsmm_tf_allocator is used.

* Renamed libxsmm_dnn_create_conv_handle to libxsmm_dnn_create_conv_layer.

* Renamed LIBXSMM_DNN_CONV_FORMAT_* to LIBXSMM_DNN_TENSOR_FORMAT_*.

* Renamed libxsmm_dnn_destroy_conv_handle to libxsmm_dnn_destroy_conv_layer.

* Include missing header file (libxsmm_malloc.h).

* Renamed LIBXSMM_DNN_CONV_KIND_* to LIBXSMM_DNN_COMPUTE_KIND_*.

* Account for the fact that datatype_in/out is now only datatype (libxsmm_dnn_conv_desc structure).

* Updated to new libxsmm_dnn_link_* functions.

* Updated to use new libxsmm_dnn_bind_* functions.

* Fixed calling libxsmm_dnn_transpose_filter.

* Updates in preparation of LIBXSMM 1.7 (benoitsteiner#8)


* integrated LIBXSMM 1.7

* support for LIBXSMM 1.7 (benoitsteiner#9)


* updated LIBXSMM to 1.7.1

* updated to LIBXSMM 1.7.1 (benoitsteiner#10)


* merge alheinecke master (benoitsteiner#11)


* Take new translation units into account (LIBXSMM 1.8).

* Account for adjusted header dependency in LIBXSMM (TODO: API to avoid incl. header from LIBXSMM's src).

* Trigger rebuild if template changed (LIBXSMM).
…d of 'Y' (a.k.a. "Yes"). This fixes the issue unveiled by some of the regression tests (Thank you for reporting!). (benoitsteiner#15)
* improved LIBXSMM integration

* added some timers to xsmm_conv2d

* added detailed timing report for LIBXSMM convolutions

* update


5 participants