🧵 Fixing Common Issues When Installing llama-cpp-python on Windows

If you're working with LLMs and trying out llama-cpp-python, you might run into some frustrating issues on Windows — especially when installing or importing the package.

I recently ran into both build errors during installation and runtime errors related to missing DLLs. In this post, I'll walk through the exact problems I faced and how I fixed them, hopefully saving you a few hours of debugging.


🔧 Problem 1: Build Failure During pip install

If you're installing the llama-cpp-python package from source (or using a wheel that requires CMake), and you're on Windows, you might see errors like:

× Building wheel for llama-cpp-python (pyproject.toml) did not run successfully.
│ exit code: 1
  ╰─> [20 lines of output]
      *** scikit-build-core 0.11.1 using CMake 4.0.1 (wheel)
      *** Configuring CMake...
      scikit_build_core - WARNING - Can't find a Python library, got libdir=None, ldlibrary=None, multiarch=None, masd=None
      -- Building for: Visual Studio 17 2022
      -- The C compiler identification is unknown
      -- The CXX compiler identification is unknown
      CMake Error at CMakeLists.txt:3 (project):
        No CMAKE_C_COMPILER could be found.

      CMake Error at CMakeLists.txt:3 (project):
        No CMAKE_CXX_COMPILER could be found.

      -- Configuring incomplete, errors occurred!

      *** CMake configuration failed
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for llama-cpp-python
Failed to build llama-cpp-python
ERROR: Failed to build installable wheels for some pyproject.toml based projects (llama-cpp-python)

Here are two StackOverflow threads that match the error I encountered: error-while-installing-python-package-llama-cpp-python and the-cxx-compiler-identification-is-unknown.

Solution: Install Required Build Tools

You can use the Visual Studio Installer to make this easy. Make sure to install the following components:

  • Windows 10 SDK (version 10.0.x)
  • C++ CMake tools for Windows
  • MSVC v14.x C++ build tools
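
Once those components are installed, you can sanity-check the toolchain from a Developer Command Prompt (e.g. "x64 Native Tools Command Prompt for VS 2022"; the exact name depends on your VS version). Running cl with no arguments should print the MSVC version banner:

cl
cmake --version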

Additionally, some setups may require MinGW (Minimalist GNU for Windows) to provide the necessary compilers (gcc, g++):

🔗 Download MinGW-w64

Make sure to:

  • Add the MinGW bin/ folder to your system PATH.
  • Verify installation by running:
gcc --version
g++ --version

Once those are installed, try running the install again:

Note: Use --no-cache-dir if you want to force a fresh build instead of reusing a cached wheel:

pip install llama-cpp-python --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cpu --no-cache-dir
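
If a previously cached or half-built install keeps getting in the way, add --force-reinstall as well:

pip install llama-cpp-python --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cpu --no-cache-dir --force-reinstall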

🧩 Problem 2: Runtime Error When Importing llama_cpp

After installation, I ran into this error on import:

Failed to load shared library '.../llama_cpp/lib/llama.dll':
Could not find module 'llama.dll' (or one of its dependencies).
Try using the full path with constructor syntax.

This occurred even after using the prebuilt CPU-only wheel.

Root Cause

This turned out to be a known problem with how the package loads llama.dll on Windows, tracked in this upstream issue: https://github.com/abetlen/llama-cpp-python/issues/1993
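
Before patching anything, it's worth checking whether llama.dll is actually present and whether ctypes can load it directly. The snippet below is a quick diagnostic sketch; it locates the package with importlib.util.find_spec, which doesn't execute the failing import:

import ctypes
import importlib.util
import pathlib

# Locate the installed llama_cpp package without importing it
pkg_dir = pathlib.Path(importlib.util.find_spec("llama_cpp").origin).parent
dll_path = pkg_dir / "lib" / "llama.dll"
print("DLL exists:", dll_path.exists())

# Loading the DLL directly often surfaces the real problem: if the file
# exists but this call fails, a dependency (e.g. the MSVC runtime) is
# missing rather than llama.dll itself
ctypes.CDLL(str(dll_path))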

Optional Fix: Comment RTLD_GLOBAL on Windows

If you’re still getting import errors after the above fixes, one workaround is to modify the source code of the installed package.

In the file Lib/site-packages/llama_cpp/_ctypes_extensions.py, comment out the following line:

# cdll_args["winmode"] = ctypes.RTLD_GLOBAL

This resolved an edge-case import issue for me that was specific to Windows and certain Python builds.
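
If you're not sure where your site-packages directory lives, pip show prints the install location (the file sits under llama_cpp/ inside it):

pip show llama-cpp-python

Keep in mind that this edit lives inside site-packages, so reinstalling or upgrading the package will overwrite it.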


The code I use to run the model:

from llama_cpp import Llama
import time

llm = Llama(
    model_path="qwen2-0_5b-instruct-q4_k_m.gguf",  # path to the GGUF model file
    n_ctx=1024,   # context window; depends on your model and hardware
    n_threads=2,  # adjust to your CPU's thread count
)

questions = [
    "What is the tallest mountain in the world?",
    "What is the fastest land animal?",
]

qa = []

for question in questions:
    start_time = time.time()
    res = llm.create_chat_completion(
        messages=[
            {"role": "system", "content": "You are an assistant who perfectly describes in a professional way."},
            {"role": "user", "content": question}
        ],
        max_tokens=100
    )

    end_time = time.time()
    duration = end_time - start_time

    print("Q:", question)
    print("A:", res["choices"][0]["message"]["content"])
    print(f"⏱ Time taken: {duration:.2f} seconds")
    print("-" * 80)
    qa.append([
        question,
        res["choices"][0]["message"]["content"],
        f"{duration:.2f} seconds"
    ])

print()
for i, t in enumerate(qa):
    print(f"Q {i + 1} takes : {t[2]}")
