Building an AMD Deep Learning Machine (part 3)

An update on the state of AMD-based deep learning

Since I wrote the previous two posts (part 1 and part 2), there have been some changes in the landscape of AMD deep learning and of deep learning in general.

Those blog posts were originally written in mid-2018. Since then, dynamic computation graphs with an imperative language design have become ubiquitous in neural network frameworks (and, in my opinion, a must-have). If you're familiar with toolkits like tensorflow 1.* or caffe and want to learn more about how this paradigm changes the code you write, I encourage you to check out the guidelines for tensorflow 2.0, which compare the static, declarative APIs with the dynamic, imperative ones.

Deprecation of HCC

AMD has deprecated HCC. HCC was not discussed in my previous posts, but it is essentially the equivalent of nvcc for AMD GPUs: it acts as a compiler that enables the creation of deep learning toolkits. At first glance, this seems really bad. How can deep learning on AMD GPUs continue if the compiler that frameworks depend upon is no longer being developed?

In actuality, this may not be that bad. HIP is another compiler stack that AMD developed, which allows compilation for both Nvidia GPUs (translating HIP calls to CUDA and then invoking nvcc) and AMD GPUs (via HCC). AMD's official statement indicates that they are pushing HIP development instead of HCC:

AMD is deprecating HCC to put more focus on HIP development and on other languages supporting heterogeneous compute. We will no longer develop any new feature in HCC and we will stop maintaining HCC after its final release, which is planned for June 2019. If your application was developed with the hc C++ API, we would encourage you to transition it to other languages supported by AMD, such as HIP or OpenCL. HIP and hc language share the same compiler technology, so many hc kernel language features (including inline assembly) are also available through the HIP compilation path.

By pushing HIP over HCC, AMD is encouraging cross-platform GPU computing: code written with HIP runs on both AMD GPUs and Nvidia GPUs, while HCC was specific to AMD GPUs.

It's also unclear how much this actually matters in practice: the deep learning frameworks that were adapted to use AMD GPUs were already using HIP rather than HCC.

Deep learning toolkit updates

I'm now going to talk about some updates to the state of various deep learning toolkits using AMD GPUs.

Tensorflow

Tensorflow is still the best-supported solution for ROCm on AMD GPUs. In fact, ROCm support has been upstreamed to mainline Tensorflow, which is great for addressing concerns about the long-term maintenance of AMD deep learning solutions. The community currently supports builds of stable and nightly versions of tensorflow in the CI build system.

Tensorflow 2.0 is supported on AMD GPUs, which means that dynamic computation graphs and the imperative style are supported. For example, instead of running:

outputs = session.run(f(placeholder), feed_dict={placeholder: input})

you would run:

outputs = f(input)

This allows more fine-grained control over the network design and greatly improves readability.
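To make the contrast concrete, here is a toy sketch of the two paradigms in plain Python. This is not tensorflow: the `Placeholder`, `Node`, and `run` names are stand-ins invented for illustration, mimicking the tf1-style "build a graph, then feed it" workflow versus the tf2-style "just call the function" workflow.

```python
# Toy illustration of static-graph vs. eager execution.
# Plain Python, not tensorflow; Placeholder/Node/run are made-up stand-ins.

class Placeholder:
    """A symbolic input that has no value until run() supplies one."""
    pass

class Node:
    """A deferred operation recorded in a graph, evaluated later."""
    def __init__(self, fn, inputs):
        self.fn, self.inputs = fn, inputs

def run(node, feed_dict):
    """Evaluate a graph, substituting placeholder values (the static style)."""
    def eval_(n):
        if isinstance(n, Placeholder):
            return feed_dict[n]
        if isinstance(n, Node):
            return n.fn(*(eval_(i) for i in n.inputs))
        return n  # a plain constant
    return eval_(node)

# Static, declarative style: describe the computation first, run it later.
x = Placeholder()
graph = Node(lambda a: a * 2 + 1, [x])
static_out = run(graph, feed_dict={x: 10})   # -> 21

# Dynamic, imperative style: just call the function on real values.
def f(a):
    return a * 2 + 1

eager_out = f(10)                            # -> 21
```

In the imperative style you can drop breakpoints, print intermediate values, and use ordinary Python control flow inside `f`, which is exactly the fine-grained control and readability win described above.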

Personally, I have had great success running tensorflow 2.0 on an RX580. Around August 2019, running tensorflow on ROCm meant either running a lagging version or compiling from source (I do not like using bazel, so compiling from source was not enjoyable). Now that there are community builds, however, you can simply install the prebuilt wheel files, and installation is incredibly smooth.

Pytorch

Pytorch support is not nearly as good as tensorflow support. Though it is possible to use pytorch, a docker container seems to be the only supported way to do so. This github issue discusses how to build pytorch for ROCm outside of a container with the latest versions of ROCm (3.0) and pytorch (1.3); however, there do seem to be some bugs specific to these particular versions. One encouraged approach is to build the wheel files inside a container and then install them on the host system. I have not tried this personally, but I intend to investigate it soon on my RX580 system.
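Once you do have a wheel installed (whether built inside a container or not), a quick sanity check helps confirm what you actually got. The sketch below relies on the `torch.version.hip` attribute that ROCm builds of pytorch set (CUDA and CPU-only builds leave it absent or None); the `check_rocm_pytorch` helper name is my own, and the function is written defensively so it also reports when pytorch isn't installed at all.

```python
def check_rocm_pytorch():
    """Report whether the installed pytorch build targets ROCm/HIP.

    Uses torch.version.hip, which ROCm builds of pytorch populate;
    returns a short human-readable status string.
    """
    try:
        import torch
    except ImportError:
        return "pytorch not installed"
    hip = getattr(torch.version, "hip", None)
    if hip is not None:
        # On ROCm builds, the usual torch.cuda API is backed by HIP,
        # so torch.cuda.is_available() reports AMD GPU availability.
        return "ROCm/HIP build {}, GPU available: {}".format(
            hip, torch.cuda.is_available())
    return "CUDA or CPU-only build"

print(check_rocm_pytorch())
```

If this prints a ROCm/HIP version string and `GPU available: True`, the wheel you installed is talking to your AMD card.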

A note on pytorch documentation

I was initially very confused, as the documentation on AMD's own ROCm website appears to be badly out of date (it requests ROCm version 2.1 when 3.0 is the current release).

However, the wiki for the github repo provides a ton of useful information, including build instructions, unit test status and more.

I will hopefully have more information about usability and performance once I get a chance to sink my teeth into pytorch on ROCm. I will keep you posted.

Concluding comments

While AMD support for pytorch is still a bit behind, support in tensorflow is exceptionally good, and the process has been streamlined significantly. In a follow-up blog post, I will talk about alternative methods of running deep learning on AMD GPUs, including non-python implementations like PlaidML's Tile language and julia frameworks. I'll also touch on SYCL, a cross-GPU standard for computation developed by the Khronos Group (the creators of Vulkan, OpenCL, and OpenGL).