
Contributing#

πŸ‘‹ Hi! Thanks for your interest in contributing to Icechunk!

Icechunk is an open source (Apache 2.0) project and welcomes contributions of all kinds.

Development#

This guide describes the local development workflow for both the Python and Rust codebases.

Python Development Workflow#

The Python code is developed in the icechunk-python subdirectory. To make changes, first enter that directory:

cd icechunk-python

Prerequisites#

Setting up your development environment#

The easiest way to get started is with uv, which handles virtual environments and dependencies:

# Install all development dependencies (includes test dependencies, mypy, ruff, maturin)
uv sync

# Configure maturin-import-hook for fast incremental Rust compilation
uv run -m maturin_import_hook site install

# Build the Rust extension
uv run maturin develop --uv

Why these steps? Icechunk is a mixed Python/Rust project. The maturin-import-hook enables incremental Rust compilation (7-20 seconds) instead of full rebuilds (5+ minutes) every time you run tests or import the module. This makes development significantly faster.

Now you can run tests and other commands:

# Run tests (Rust changes will automatically trigger incremental rebuild)
uv run pytest

# Run type checking
uv run mypy python tests

# Run linting
uv run ruff check python

If you prefer pip, create and activate a virtual environment instead:

python3 -m venv .venv
source .venv/bin/activate

# Install maturin and dependencies
pip install maturin
pip install --group dev

# Build the Rust extension
maturin develop

Or use mamba (or conda):

mamba create -n icechunk python=3.12 rust zarr
mamba activate icechunk

# Install maturin and dependencies
pip install maturin
pip install --group dev

# Build the Rust extension
maturin develop

Testing#

With uv:

uv run pytest

Or, inside an activated environment:

pytest

Note

By default pytest runs tests in parallel on all available cores of your machine. To set the number of workers manually, pass -n <number-of-workers> (use 0 to run the tests serially).

Important

The full Python test suite depends on S3 and Azure compatible object stores. See the Docker setup section below for detailed instructions.

Testing with Upstream Dependencies#

To test Icechunk against development versions of upstream packages (zarr, xarray, dask, distributed), use the nightly wheels from the scientific-python-nightly-wheels repository:

# Install with nightly wheels
export UV_INDEX="https://pypi.anaconda.org/scientific-python-nightly-wheels/simple/"
export UV_PRERELEASE=allow
uv sync --group test \
  --resolution highest \
  --index-strategy unsafe-best-match

# Run tests
uv run pytest

Running Xarray Backend Tests#

Icechunk includes integration tests that verify compatibility with Xarray's zarr backend API. These tests require the Xarray repository to be cloned locally.

Set the environment variables (adjust XARRAY_DIR to point to your local Xarray clone):

export ICECHUNK_XARRAY_BACKENDS_TESTS=1
export XARRAY_DIR=~/Documents/dev/xarray  # or your xarray location

Run the Xarray backend tests:

python -m pytest -xvs tests/run_xarray_backends_tests.py \
  -c $XARRAY_DIR/pyproject.toml \
  -W ignore \
  --override-ini="addopts="

To run a specific Xarray test, first specify a class defined in icechunk-python/tests/run_xarray_backends_tests.py, then the xarray test itself. For example:

python -m pytest -xvs tests/run_xarray_backends_tests.py::TestIcechunkStoreFilesystem::test_pickle \
  -c $XARRAY_DIR/pyproject.toml \
  -W ignore \
  --override-ini="addopts="

Checking Xarray Documentation Consistency#

Icechunk's to_icechunk function shares several parameters with Xarray's to_zarr function. To ensure documentation stays in sync, use the documentation checker script.

From the icechunk-python directory:

# Set XARRAY_DIR to point to your local Xarray clone
export XARRAY_DIR=~/Documents/dev/xarray

# Run the documentation consistency check
uv run scripts/check_xarray_docs_sync.py

The script will display a side-by-side comparison of any documentation differences, with missing text highlighted in red.
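The comparison itself is ordinary text diffing. As a rough illustration of the idea (not the script's actual implementation), Python's standard difflib can show where two docstring fragments diverge; the fragments below are hypothetical:

```python
import difflib

# Two hypothetical docstring fragments for the same parameter
ours = [
    "mode : str, optional",
    "    Persistence mode for the Icechunk store.",
]
theirs = [
    "mode : str, optional",
    "    Persistence mode for the Zarr store.",
]

# A unified diff prints only the lines that differ, prefixed with - / +
for line in difflib.unified_diff(ours, theirs, lineterm=""):
    print(line)
```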

Known Differences: Some differences are acceptable (e.g., Sphinx formatting like :py:func: doesn't work in mkdocs). These are tracked in scripts/known-xarray-doc-diffs.json. Known differences are displayed but don't cause the check to fail.

Updating Known Differences: After making intentional documentation changes, update the known diffs file:

# Mark current diffs as known (creates/updates scripts/known-xarray-doc-diffs.json)
uv run scripts/check_xarray_docs_sync.py --update-known-diffs

# Edit scripts/known-xarray-doc-diffs.json to add reasons for each difference

CI Integration: The script returns exit code 0 if only known differences exist, allowing CI to pass while still displaying diffs for review.
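The gating logic behind that exit code amounts to a set comparison: the check succeeds only when every current difference is already recorded in the known-diffs file. A simplified sketch (the identifiers and JSON shape here are made up, not the script's real format):

```python
import json

def diffs_are_all_known(current: set[str], known_json: str) -> bool:
    # Load the recorded diffs and pass only if nothing new has appeared
    known = set(json.loads(known_json))
    return current <= known

# Hypothetical diff identifiers
known_file = '["sphinx-role-in-mkdocs"]'
print(diffs_are_all_known({"sphinx-role-in-mkdocs"}, known_file))  # pass: exit 0
print(diffs_are_all_known({"sphinx-role-in-mkdocs", "new-wording"}, known_file))  # fail CI
```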

Troubleshooting#

Too many open files: If your limit on open file descriptors is set low (usually only a problem on macOS), some tests may fail. Raising the limit, e.g. with ulimit -n 1024, should fix the issue.
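On Unix systems you can check the current descriptor limits from Python before hunting for failures; this quick diagnostic uses only the standard library:

```python
import resource

# The soft limit applies to the current process; the hard limit is the
# ceiling that `ulimit -n` may raise it to without root privileges.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"open-file limit: soft={soft}, hard={hard}")
if soft < 1024:
    print("Low limit detected; consider running `ulimit -n 1024`.")
```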

Rust Development Workflow#

Prerequisites#

You need to have already created and activated a virtual environment (see above), because the full Rust build also compiles the Python bindings.

Install the just command runner (used for build tasks and pre-commit hooks):

cargo install just

Or using other package managers:

  • macOS: brew install just
  • Ubuntu: snap install --edge --classic just

Ensure you have navigated to the root directory of the cloned repo (i.e. not the icechunk-python subdirectory).

Building#

Build the Rust workspace:

# Build all packages
just build

# Build release version
just build-release

# Compile tests without running them
just compile-tests

Testing#

# Run all tests
just test

# Run tests with logs enabled
just test-logs debug

# Run only specific tests
cargo test test_name

Important

The full test suite depends on S3 and Azure compatible object stores. See the Docker setup section below for detailed instructions.

Code Quality#

To run all code quality checks you will also need cargo-deny:

cargo install cargo-deny

We use a tiered pre-commit system for fast development:

# Fast checks (~3 seconds) - format and lint only
just pre-commit-fast

# Medium checks (~2-3 minutes) - includes compilation and deps
just pre-commit

# Full CI checks (~5+ minutes) - includes all tests and examples
just pre-commit-ci

Individual checks:

# Format code
just format

# Check formatting without changing files
just format --check

# Lint with clippy
just lint

# Check dependencies for security issues
just check-deps

Pre-commit Hooks#

We use pre-commit to automatically run checks. Install it:

pip install pre-commit
pre-commit install

The pre-commit configuration automatically runs:

  • Every commit: Fast Python and Rust checks (~2 seconds total)
  • Before push: Medium Rust checks (compilation + dependencies)
  • Manual: Full CI-level checks when needed

To run manually:

# Run on changed files only
pre-commit run

# Run on all files
pre-commit run --all-files

# Run full CI checks manually
pre-commit run rust-pre-commit-ci --hook-stage manual

Building Documentation#

Python Documentation#

The documentation is built with MkDocs using Material for MkDocs.

System dependencies: Install the Cairo graphics library for image processing.

On macOS:

brew install cairo

If mkdocs fails to find Cairo, set the library path:

export DYLD_FALLBACK_LIBRARY_PATH=/opt/homebrew/lib

You can add this to your ~/.zshrc to make it permanent.

On Ubuntu/Debian:

sudo apt-get install libcairo2-dev

On Fedora:

sudo dnf install cairo-devel

From the icechunk-python directory:

# Install icechunk with docs dependencies
uv sync --group docs

# Start the MkDocs development server
cd docs
uv run mkdocs serve

Use --livereload for file watching

Due to a Click 8.3.x bug, file watching may not work without the --livereload flag. Always use mkdocs serve --livereload to ensure automatic rebuilds when you edit files.

The development server will start at http://127.0.0.1:8000 with live reload enabled.

Build static site:

cd docs
uv run mkdocs build

This builds the site into the docs/.site directory.

Tips:

  • Use mkdocs serve --dirty to only rebuild changed files (faster for iterative development)
  • You may need to restart if you make changes to mkdocs.yml
  • For debugging the doc build logs, check out docs-output-filter (you can run uv run docs-output-filter -- mkdocs serve --livereload once installed). This also works to debug remote builds like RTD with the --url flag 😍

Docker setup for local storage testing#

To run local S3 and Azure compatible object stores with Docker, first install Docker. We provide a Docker Compose file, compose.yaml; run docker compose up -d from the root of the repo to start the containers in detached mode. docker ps should then show the azurite and icechunk_minio containers running. You can also open the MinIO GUI at localhost:9001 and log in with the username and password from compose.yaml to browse the buckets.
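For orientation, a minimal Compose file for these two services might look like the sketch below. This is an illustration only: the image tags, ports, and credentials are assumptions, and the repo's actual compose.yaml is authoritative.

```yaml
services:
  icechunk_minio:
    image: minio/minio
    command: server /data --console-address ":9001"
    environment:
      MINIO_ROOT_USER: minio123      # placeholder credentials
      MINIO_ROOT_PASSWORD: minio123
    ports:
      - "9000:9000"   # S3 API
      - "9001:9001"   # web console
  azurite:
    image: mcr.microsoft.com/azure-storage/azurite
    ports:
      - "10000:10000" # blob service
```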

After testing you can clean up with docker compose down. To verify that all containers are down use docker ps again.

Roadmap#

Features#

  • Support more object stores and more of their custom features
  • Better Python API and helper functions
  • Bindings to other languages: C, Wasm
  • Better, faster, more secure distributed sessions
  • Savepoints and persistent sessions
  • Chunk and repo level statistics and metrics
  • More powerful conflict detection and resolution
  • Efficient move operation
  • Telemetry
  • Zarr-less usage from Python and other languages
  • Better documentation and examples

Performance#

  • Lower changeset memory footprint
  • Optimize virtual dataset prefixes
  • Bring back manifest joining for small arrays
  • Improve performance of ancestry, garbage_collect, get_size and other metrics
  • More flexible caching hierarchy
  • Better I/O pipeline
  • Better GIL management
  • Request batching and splitting
  • Bringing parts of the codec pipeline to the Rust side
  • Chunk compaction

We’re very excited about a number of extensions to Zarr that would work great with Icechunk.