(This is an extension of this post)
I had to build the Meep FDTD simulator from MIT for my thesis project, even though it is available in Ubuntu from the “stock” repositories. One of the most common uses of the FDTD method is computing, in a single run, the broadband spectrum of some response of the simulated structure by harmonic inversion of the transient simulation (basically a Fourier transform, although it can get more complex than that). This operation requires assembling and solving a (large) linear system, so it pays to use the linear algebra libraries best suited to your platform. Mine is based on an AMD FX-8120 CPU, which runs at its best with the ACML libraries. To transition smoothly from the stock BLAS and LAPACK libraries of your Linux system, here’s what to do:
- Visit the AMD ACML download page and download the latest version built for the gfortran compiler. If in doubt because of the “int” suffix, check this out. In my case, it is acml-5-1-0-gfortran-64bit (no int64).
- Unpack and install the package. This should be a no-brainer. Note the installation directory. For me, it is /opt/acml5.1.0
- Determine your CPU’s capabilities by running: /opt/acml5.1.0/util/cpuid.exe. In particular, check for FMA3 or FMA4 capabilities. My CPU has FMA4 support.
- Determine the best variant of the ACML libraries to use:
  - If you have any FMA support, focus on the gfortran*_fma* directories. Else, focus on the gfortran* directories. In my case, gfortran64_fma4*
  - If yours is a multicore processor and you would like to take advantage of this (highly recommended), look for the ones above which have the _mp suffix. My choice is gfortran64_fma4_mp.
- Create the alternatives entries to replace the stock ones. To suit your case, modify the installation directory and the library variant in the paths of the commands below as appropriate:
sudo update-alternatives --install /usr/lib/libblas.so libblas.so /opt/acml5.1.0/gfortran64_fma4_mp/lib/libacml_mp.so 60 --slave /usr/lib/libblas.a libblas.a /opt/acml5.1.0/gfortran64_fma4_mp/lib/libacml_mp.a
sudo update-alternatives --install /usr/lib/liblapack.so liblapack.so /opt/acml5.1.0/gfortran64_fma4_mp/lib/libacml_mp.so 60
sudo update-alternatives --install /usr/lib/liblapack.so.3gf liblapack.so.3gf /opt/acml5.1.0/gfortran64_fma4_mp/lib/libacml_mp.so 60
sudo update-alternatives --install /usr/lib/libblas.so.3gf libblas.so.3gf /opt/acml5.1.0/gfortran64_fma4_mp/lib/libacml_mp.so 60
Note: If you don’t use the multi-processor libraries (_mp suffix), the target libraries go with no suffix as well (so their names are libacml.a, libacml.so).
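If you would rather not edit the paths by hand, the variant selection and the command lines can be generated. What follows is only a rough sketch under a few assumptions of mine: it reads the CPU flags from /proc/cpuinfo instead of running cpuid.exe, it assumes the Linux flag names fma (FMA3) and fma4, and it assumes an FMA3 build would live in a directory containing _fma3 (I have only seen the _fma4 ones on my machine). The function names are made up for this example. It prints the commands and does not run them.

```python
import os


def cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags found in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            return set(line.split(":", 1)[1].split())
    return set()


def pick_variants(dir_names, flags, multicore):
    """Filter ACML directory names by FMA support and the _mp suffix."""
    # Map cpuinfo flag names to directory-name tokens.
    # "_fma3" is an assumption; the post only names an "_fma4" directory.
    tokens = [tok for flag, tok in (("fma4", "_fma4"), ("fma", "_fma3"))
              if flag in flags]
    picked = []
    for name in dir_names:
        if not name.startswith("gfortran"):
            continue  # skip util/ and anything else that is not a variant
        if name.endswith("_mp") != multicore:
            continue
        if tokens:
            if not any(tok in name for tok in tokens):
                continue
        elif "_fma" in name:
            continue  # no FMA support: avoid the FMA builds
        picked.append(name)
    return sorted(picked)


def alternatives_commands(acml_root, variant, priority=60):
    """Build the four update-alternatives command lines for one variant."""
    # Multi-processor builds ship libacml_mp.*, the others libacml.*
    lib = "libacml_mp" if variant.endswith("_mp") else "libacml"
    libdir = f"{acml_root}/{variant}/lib"
    cmds = [
        "sudo update-alternatives --install /usr/lib/libblas.so libblas.so "
        f"{libdir}/{lib}.so {priority} "
        f"--slave /usr/lib/libblas.a libblas.a {libdir}/{lib}.a"
    ]
    for name in ("liblapack.so", "liblapack.so.3gf", "libblas.so.3gf"):
        cmds.append(
            f"sudo update-alternatives --install /usr/lib/{name} {name} "
            f"{libdir}/{lib}.so {priority}"
        )
    return cmds


if __name__ == "__main__":
    acml_root = "/opt/acml5.1.0"  # the installation directory noted earlier
    if os.path.isdir(acml_root) and os.path.isfile("/proc/cpuinfo"):
        with open("/proc/cpuinfo") as f:
            flags = cpu_flags(f.read())
        multicore = (os.cpu_count() or 1) > 1
        for variant in pick_variants(os.listdir(acml_root), flags, multicore):
            print("\n".join(alternatives_commands(acml_root, variant)))
```

Review the printed lines before pasting them into a terminal, especially if more than one variant directory matches.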
- Test your configuration: in either Octave or NumPy, create a large (e.g. 5000 x 5000) matrix a and compute its SVD:
a = rand(5000, 5000);
svd(a)
Hopefully you’ll see, in your favorite system resources viewer, all cores in your machine being utilized.
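For the NumPy route, a rough equivalent of the Octave test could look like the sketch below. The function name time_svd is mine, and the whole thing assumes your NumPy is actually linked against the system BLAS/LAPACK (the distribution package usually is; a pip wheel typically bundles its own BLAS, in which case the alternatives set up above are not used). np.show_config() reports what NumPy was built against.

```python
import time

import numpy as np


def time_svd(n=5000, seed=0):
    """Compute the singular values of a random n x n matrix and time it."""
    rng = np.random.default_rng(seed)
    a = rng.random((n, n))
    start = time.perf_counter()
    # Singular values only, like svd(a) with a single output in Octave
    s = np.linalg.svd(a, compute_uv=False)
    elapsed = time.perf_counter() - start
    return s, elapsed
```

Calling time_svd(5000) while watching your system monitor should show whether all cores are busy; the elapsed time also gives you a number to compare before and after switching libraries.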
I have been able to successfully compile MIT’s harminv against these optimized libraries, and I hope you will be able to do the same with your codes.