Quickstart
Icechunk is designed to stay mostly in the background: as a Python user, you'll interact primarily with Zarr. If you're not familiar with Zarr, you may want to start with the Zarr Tutorial.
Installation
Icechunk can be installed using pip (`pip install icechunk`) or conda (`conda install -c conda-forge icechunk`).
Note
Icechunk is currently designed to support the Zarr V3 Specification. Using it today requires installing Zarr Python 3.
Create a new Icechunk repository
To get started, let's create a new Icechunk repository. We recommend creating your repo on a cloud storage platform to get the most out of Icechunk's cloud-native design. However, you can also create a repo on your local filesystem.
Accessing the Icechunk store
Once the repository is created, we can use Sessions to read and write data. Since there is no data in the repository yet, let's create a writable session on the default main branch.
Now that we have a session, we can access its IcechunkStore and use Zarr to interact with the underlying data.
Write some data and commit
We can now use our Icechunk store with Zarr. Let's first create a group and an array within it.
```python
import zarr

group = zarr.group(store)
array = group.create("my_array", shape=10, dtype='int32', chunks=(5,))
```
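Now let's write some data by filling the array with ones (`array[:] = 1`).
Then commit the update using the session. `session.commit("first commit")` returns the ID of the new snapshot; we keep it as `snapshot_id_1` so we can return to this version later.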
🎉 Congratulations! You just made your first Icechunk snapshot.
Note
Once a writable session has been successfully committed, it becomes read-only, which ensures that all writes are done explicitly. To write more data, start a new session.
Make a second commit
We have already committed with our first session, so we need a new session and store to make more changes. This time we use an alternative syntax: the `transaction` context manager, which yields a writable store for the given branch. In this update, we overwrite the first five elements of our array with the value 2.
```python
with repo.transaction("main", message="overwrite some values") as store:
    group = zarr.open_group(store)
    array = group["my_array"]
    array[:5] = 2  # overwrite the first five elements
```
The transaction is automatically committed when the context exits.
Explore version history
We can see the full version history of our repo:
```python
hist = repo.ancestry(branch="main")
for ancestor in hist:
    print(ancestor.id, ancestor.message, ancestor.written_at)
```
...and we can go back in time to the earlier version.
```python
# latest version
assert array[0] == 2

# check out the earlier snapshot
earlier_session = repo.readonly_session(snapshot_id=snapshot_id_1)
store = earlier_session.store

# get the array
group = zarr.open_group(store, mode="r")
array = group["my_array"]

# verify the data matches the first version
assert array[0] == 1
```
That's it! You now know how to use Icechunk!
For an overview of all of the important operations, check out the How-to guide.