Pairtools is a powerful Python package for processing paired-end sequencing data, widely used in bioinformatics for tasks like Hi-C analysis. However, installing it can be tricky due to dependency linking issues, particularly with pysam. In this article, I'll walk you through the step-by-step process I used to install pairtools successfully, avoiding common pitfalls like the dreaded ImportError caused by broken library links. By the end, you'll have a working setup, verified by running pairtools' test suite.
This guide is based on real-world troubleshooting in a GitHub Codespace environment as of March 28, 2025, using pairtools 1.1.3 and pysam 0.23.0.
## Setup
Pairtools works best on Unix-based systems because it depends on pysam, which in turn relies on htslib.
This guide therefore targets Linux, although the same steps should be reproducible on macOS.
On Windows, a workaround installation is possible using tools like gcc.
## Why Installation Can Be Challenging
Pairtools relies on pysam, a Python wrapper for the htslib C library, which involves compiling Cython extensions (e.g., parse_pysam.so). During installation, these extensions must link to pysam's shared libraries (e.g., libchtslib.so). A common issue arises when pip's build isolation creates temporary environments, causing pairtools to link against paths like /tmp/pip-build-env-…/pysam/libchtslib.so. These paths vanish post-installation, leading to runtime errors like:
```
ImportError: /tmp/pip-build-env-k1zoq23s/overlay/lib/python3.12/site-packages/pysam/libchtslib.cpython-312-x86_64-linux-gnu.so: cannot open shared object file: No such file or directory
```

Our goal is to ensure pairtools links to a persistent pysam installation, avoiding temporary paths. GitHub Codespaces, with its Ubuntu-based environment, provides a clean slate to achieve this, if set up correctly.
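To tell this failure apart from an ordinary missing module, a small check can inspect the text of the ImportError. The sketch below is illustrative only: `classify_import_error` and `check_import` are hypothetical helper names, the pattern simply mirrors the `/tmp/pip-build-env-…` path shown in the error above, and the module name `pairtools.lib.parse_pysam` is an assumption inferred from the extension file mentioned earlier.

```python
import importlib
import re

# Paths inside pip's temporary build environments, as seen in the error above.
TEMP_BUILD_PATH = re.compile(r"/tmp/pip-build-env-[^/\s]+/\S*")


def classify_import_error(message: str) -> str:
    """Label an ImportError message as 'stale-build-link' or 'other'."""
    if TEMP_BUILD_PATH.search(message) and "cannot open shared object file" in message:
        return "stale-build-link"
    return "other"


def check_import(module_name: str) -> str:
    """Try importing a module; return 'ok' or a label describing the failure."""
    try:
        importlib.import_module(module_name)
    except ImportError as exc:
        return classify_import_error(str(exc))
    return "ok"


if __name__ == "__main__":
    # Assumed module path for the compiled extension discussed in this article.
    print(check_import("pairtools.lib.parse_pysam"))
```

A result of `stale-build-link` suggests the extension was linked against a temporary build environment and needs to be rebuilt against the installed pysam.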
## Prerequisites

- GitHub Codespace: Access to a Codespace for your `pairtools` repository (e.g., forked from open2c/pairtools).
- Basic Terminal Skills: Familiarity with running commands in a Linux terminal.
- Repository Cloned: Your Codespace should have the `pairtools` repo checked out (e.g., at `/workspaces/pairtools`).

## Step-by-Step Installation Guide

### Step 1: Start with a Fresh Codespace

A clean environment is critical to avoid cached build artifacts that can cause linking errors.

- Action: Delete your existing Codespace and create a new one.
  - Go to the GitHub Codespaces dashboard (https://github.com/codespaces).
  - Find your current Codespace for the `pairtools` repo, delete it, and click "New Codespace" to start fresh.
- Why: This ensures no stale dependencies or mislinked libraries interfere.
### Step 2: Install System Dependencies

`pysam` and `pairtools` require system libraries and development tools for compilation.

- Command:

  ```bash
  sudo apt update && sudo apt install -y build-essential libhts-dev python3-dev
  ```

- Details:
  - `build-essential`: Provides `gcc`, `make`, and other compilation tools.
  - `libhts-dev`: Headers for `htslib`, used by `pysam`.
  - `python3-dev`: Python development headers for building extensions.
- Why: These ensure successful compilation of `pysam` and `pairtools` from source.

### Step 3: Uninstall Existing Packages

Remove any prior installations to prevent conflicts.

- Command:

  ```bash
  pip uninstall pysam pairtools -y
  ```

- Why: A clean slate avoids version mismatches or leftover files.
### Step 4: Install `pysam` from Source

Build `pysam` explicitly to place its shared libraries in a persistent location.

- Command:

  ```bash
  pip install "pysam==0.23.0" --no-binary pysam --no-cache-dir --verbose
  ```

- Details:
  - `--no-binary pysam`: Forces a source build, avoiding precompiled wheels that might mislink.
  - `--no-cache-dir`: Prevents reusing cached files that could cause issues.
  - `--verbose`: Shows build output for debugging if needed.
- Verification:
  - Check installed `.so` files:

    ```bash
    ls /usr/local/python/3.12.1/lib/python3.12/site-packages/pysam/*.so
    ```

    Expect files like `libchtslib.cpython-312-x86_64-linux-gnu.so`.
  - Verify linking:

    ```bash
    ldd /usr/local/python/3.12.1/lib/python3.12/site-packages/pysam/libchtslib.cpython-312-x86_64-linux-gnu.so
    ```

    Ensure no `/tmp/` paths appear; expect system libraries like `/lib/x86_64-linux-gnu/libc.so.6`.
- Why: A source-built `pysam` ensures its libraries are correctly placed and linked.

Note: The exact path (`/usr/local/python/3.12.1/`) may vary slightly depending on your Codespace's Python setup. Replace it with `/home/codespace/.python/current/` if that's your environment.

### Step 5: Install `pairtools` Without Build Isolation

Install `pairtools` in editable mode, linking it to the installed `pysam`.

- Commands:

  ```bash
  cd /workspaces/pairtools
  rm -rf build dist *.egg-info pairtools/lib/*.so  # Clean old build artifacts
  pip install -e ".[all]" --no-build-isolation --no-cache-dir --verbose
  ```

- Details:
  - `-e`: Editable mode, useful for development.
  - `[all]`: Installs optional dependencies (e.g., `bioframe`).
  - `--no-build-isolation`: Uses the current environment's `pysam`, avoiding temporary build paths. With isolation off, pip does not install build requirements itself, so any build-time packages the project declares must already be present in the environment.
- Verification:
  - Check linking:

    ```bash
    ldd /workspaces/pairtools/pairtools/lib/parse_pysam.cpython-312-x86_64-linux-gnu.so
    ```

    Look for:
    `/usr/local/python/3.12.1/lib/python3.12/site-packages/pysam/libchtslib.cpython-312-x86_64-linux-gnu.so (0x…)`

    No `/tmp/` paths should appear.
- Why: `--no-build-isolation` ensures `pairtools`' extensions link to the installed `pysam`.

### Step 6: Run Tests to Confirm

Test the installation to ensure everything works.

- Command:

  ```bash
  /usr/local/python/3.12.1/bin/python -m pytest -v
  ```

- Expected Output: All 17 tests should collect and run, showing pass/fail results, with no `ImportError` for `libchtslib.so`.
- Why: This verifies that `pairtools` and its dependencies are fully functional.

### Step 7: Use `pairtools`

Try it out!

- Command:

  ```bash
  /usr/local/python/3.12.1/bin/pairtools --help
  ```

- Expected Output: Displays the command-line help, confirming the CLI is operational.
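Several commands in the steps above hard-code the interpreter path, which may differ between Codespaces. As a rough sketch, the snippet below asks the running Python where a package lives and lists the shared libraries under it, without importing the package (so it still works when an extension is mislinked). The helper name `package_shared_libs` is hypothetical, and the approach assumes the package is installed as a regular directory on the import path.

```python
from importlib.util import find_spec
from pathlib import Path


def package_shared_libs(package: str) -> list[Path]:
    """Return .so files under an importable package's directory (empty if absent)."""
    spec = find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        return []  # not installed, or a plain module rather than a package
    libs = []
    for location in spec.submodule_search_locations:
        # rglob also covers subpackages such as pairtools/lib
        libs.extend(sorted(Path(location).rglob("*.so")))
    return libs


if __name__ == "__main__":
    for package in ("pysam", "pairtools"):
        libs = package_shared_libs(package)
        print(f"{package}: {len(libs)} shared libraries")
        for lib in libs:
            print("   ", lib)
```

Run it with the same interpreter you installed into; the printed paths can then be passed to `ldd` in place of the hard-coded ones.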
## Troubleshooting Common Issues
### Linking Still Points to `/tmp/`

- Symptom: `ldd` shows `/tmp/pip-build-env-…/pysam/libchtslib.so => not found`.
- Fix:
  - Clean and rebuild manually:

    ```bash
    cd /workspaces/pairtools
    rm -rf build dist *.egg-info pairtools/lib/*.so
    python setup.py build_ext --inplace --verbose
    pip install -e ".[all]" --no-build-isolation --no-cache-dir --verbose
    ```

  - Recheck with `ldd`.
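The recheck can be scripted so that every compiled extension is inspected at once. This is a minimal sketch, assuming `ldd` is on the PATH and that the extensions sit in `/workspaces/pairtools/pairtools/lib` as in this guide; `flag_lines` and `scan_extensions` are hypothetical names, and the filter is a plain substring match rather than a full parser of `ldd` output.

```python
import shutil
import subprocess
from pathlib import Path


def flag_lines(ldd_output: str) -> list[str]:
    """Keep ldd lines that mention /tmp/ or an unresolved ('not found') library."""
    return [
        line.strip()
        for line in ldd_output.splitlines()
        if "/tmp/" in line or "not found" in line
    ]


def scan_extensions(lib_dir: str) -> dict[str, list[str]]:
    """Run ldd on every .so directly inside lib_dir; return only files with flagged lines."""
    so_files = sorted(Path(lib_dir).glob("*.so"))
    if so_files and shutil.which("ldd") is None:
        raise RuntimeError("ldd is not available on this system")
    report = {}
    for so_file in so_files:
        result = subprocess.run(
            ["ldd", str(so_file)], capture_output=True, text=True, check=False
        )
        flagged = flag_lines(result.stdout)
        if flagged:
            report[so_file.name] = flagged
    return report


if __name__ == "__main__":
    lib_dir = "/workspaces/pairtools/pairtools/lib"  # path used in this guide
    if not Path(lib_dir).is_dir():
        print(f"{lib_dir} not found; adjust the path to your checkout")
    else:
        report = scan_extensions(lib_dir)
        for name, lines in report.items():
            print(name, *lines, sep="\n    ")
        print("all clean" if not report else f"{len(report)} file(s) still mislinked")
```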
### NumPy Version Warnings

- Symptom: Warnings like `numpy 1.26.4 is incompatible with bioframe 0.7.2`.
- Fix:
  - Downgrade NumPy:

    ```bash
    pip install "numpy<2"
    ```

  - Rerun tests.
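Before rerunning the tests, it may help to confirm which NumPy version actually ended up installed. The sketch below is only an illustration: `major_version` and `numpy_satisfies_pin` are hypothetical helpers, and the `< 2` bound is taken from the `numpy<2` pin above rather than from any package's declared requirements.

```python
from importlib import metadata


def major_version(version: str) -> int:
    """Extract the leading integer of a version string such as '1.26.4'."""
    return int(version.split(".", 1)[0])


def numpy_satisfies_pin(version) -> bool:
    """True if the given NumPy version string is below 2 (None means not installed)."""
    return version is not None and major_version(version) < 2


if __name__ == "__main__":
    try:
        installed = metadata.version("numpy")
    except metadata.PackageNotFoundError:
        installed = None
    print(f"numpy: {installed or 'not installed'}")
    print("pin satisfied" if numpy_satisfies_pin(installed) else "pin not satisfied")
```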
### Missing System Dependencies

- Symptom: Build fails with errors about missing `htslib` or `gcc`.
- Fix: Reinstall system dependencies (Step 2).

## Why This Works

- Fresh Codespace: Eliminates cached artifacts that cause linking errors.
- Source-Built `pysam`: Ensures shared libraries are in a stable location.
- No Build Isolation: Forces `pairtools` to use the installed `pysam`, avoiding temporary paths.
- Ubuntu Environment: Native Linux in Codespaces provides reliable library linking compared to WSL quirks.
## Conclusion
Installing `pairtools` in GitHub Codespaces doesn't have to be a headache. By starting fresh, building `pysam` from source, and installing `pairtools` without build isolation, you can sidestep linking issues and get to work on your paired-end sequencing projects. This method resolved the `ImportError` for `libchtslib.so` in my setup as of March 2025. Now that `pairtools` is installed, explore its capabilities: parse Hi-C data, run stats, or dive into its test suite for inspiration. Happy coding!