llama.cpp

llama.cpp runs at native speed when compiled for CUDA architecture 86 with cuBLAS enabled:

cmake -B build -DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES="86" -DGGML_CUDA_FORCE_CUBLAS=true

Compiling for multiple CUDA architectures should be fine as long as at least one of them is 80, 86, or 89.
Compiling with cuBLAS disabled may degrade performance.
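
As a minimal sketch of the multi-architecture rule above, the following script checks an architecture list before printing the configure and build commands. The helper name `has_supported_arch` and the example list `75;86` are hypothetical and chosen only for illustration. The check assumes plain numeric entries separated by semicolons, so values such as `86-real` or `native` are not handled.

```shell
#!/bin/sh
# Hypothetical helper (not part of llama.cpp): succeeds when the
# semicolon-separated list contains at least one of 80, 86 or 89.
has_supported_arch() {
  case ";$1;" in
    *";80;"* | *";86;"* | *";89;"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example list, assumed for illustration: 75 alongside 86.
ARCHS="75;86"

if has_supported_arch "$ARCHS"; then
  # Dry run: the commands are only printed. Remove "echo" to execute them.
  echo cmake -B build -DGGML_CUDA=ON "-DCMAKE_CUDA_ARCHITECTURES=$ARCHS" -DGGML_CUDA_FORCE_CUBLAS=true
  echo cmake --build build --config Release
else
  echo "Architecture list '$ARCHS' contains none of 80, 86 or 89" >&2
fi
```

The commands are printed rather than run so the script can be tried outside a llama.cpp checkout. When running the configure command by hand, keep the quotes around the architecture list so the shell does not treat the semicolons as command separators.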

Windows

You need to install the HIP SDK to have access to rocBLAS.