Quickstart

Icechunk is designed to stay mostly in the background. As a Python user, you'll interact primarily with Zarr. If you're not familiar with Zarr, you may want to start with the Zarr Tutorial.

Installation

Icechunk can be installed using pip or conda:

python -m pip install icechunk
conda install -c conda-forge icechunk

Note

Icechunk is currently designed to support the Zarr V3 Specification. Using it today requires installing Zarr Python 3.

Create a new Icechunk repository

To get started, let's create a new Icechunk repository. We recommend creating your repo on a cloud storage platform to get the most out of Icechunk's cloud-native design. However, you can also create a repo on your local filesystem.

AWS S3:

import icechunk
storage = icechunk.s3_storage(bucket="my-bucket", prefix="my-prefix", from_env=True)
repo = icechunk.Repository.create(storage)

Google Cloud Storage:

import icechunk
storage = icechunk.gcs_storage(bucket="my-bucket", prefix="my-prefix", from_env=True)
repo = icechunk.Repository.create(storage)

Azure Blob Storage:

import icechunk
storage = icechunk.azure_storage(container="my-container", prefix="my-prefix", from_env=True)
repo = icechunk.Repository.create(storage)

Local filesystem:

import icechunk
import tempfile
storage = icechunk.local_filesystem_storage(tempfile.TemporaryDirectory().name)
repo = icechunk.Repository.create(storage)

Accessing the Icechunk store

Once the repository is created, we can use Sessions to read and write data. Since there is no data in the repository yet, let's create a writable session on the default main branch.

session = repo.writable_session("main")

Now that we have a session, we can access the IcechunkStore from it to interact with the underlying data using zarr:

store = session.store  # A zarr store

Write some data and commit

We can now use our Icechunk store with Zarr. Let's first create a group and an array within it.

import zarr
group = zarr.group(store)
array = group.create("my_array", shape=10, dtype='int32', chunks=(5,))

Now let's write some data:

array[:] = 1

Now let's commit our update using the session:

snapshot_id_1 = session.commit("first commit")
print(snapshot_id_1)
# Output: GPNCW00398HWZW9AH620

🎉 Congratulations! You just made your first Icechunk snapshot.

Note

Once a writable Session has been successfully committed, it becomes read-only to ensure that all writing is done explicitly. If you need to write more data, you have to start a new session.

Make a second commit

At this point, we have already committed using our session, so we need a new session and store to make more changes. Here we use an alternative syntax: the transaction context manager. In this update, we put some new data into our array, overwriting the first five elements.

with repo.transaction("main", message="overwrite some values") as store:
    group = zarr.open_group(store)
    array = group["my_array"]
    array[:5] = 2

The transaction is automatically committed when the context exits.

Explore version history

We can see the full version history of our repo:

hist = repo.ancestry(branch="main")
for ancestor in hist:
    print(ancestor.id, ancestor.message, ancestor.written_at)
# Output:
# 2MV2X400C8G62T0EEYG0 overwrite some values 2026-02-17 21:13:19.588649+00:00
# GPNCW00398HWZW9AH620 first commit 2026-02-17 21:13:19.572504+00:00
# 1CECHNKREP0F1RSTCMT0 Repository initialized 2026-02-17 21:13:19.553264+00:00

...and we can go back in time to the earlier version.

# latest version
assert array[0] == 2
# check out earlier snapshot
earlier_session = repo.readonly_session(snapshot_id=snapshot_id_1)
store = earlier_session.store

# get the array
group = zarr.open_group(store, mode="r")
array = group["my_array"]

# verify data matches first version
assert array[0] == 1

That's it! You now know how to use Icechunk!

For an overview of all of the important operations, check out the How-to guide.