
Python API Reference

icechunk #

Modules:

Name Description
credentials
dask
distributed
repository
session
storage
store
xarray

Classes:

Name Description
AzureCredentials

Credentials for an Azure storage backend

AzureStaticCredentials

Credentials for an Azure storage backend

BasicConflictSolver

A basic conflict solver that allows for simple configuration of resolution behavior

CachingConfig

Configuration for how Icechunk caches its metadata files

ChunkType

Enum for Zarr chunk types

CompressionAlgorithm

Enum for selecting the compression algorithm used by Icechunk to write its metadata files

CompressionConfig

Configuration for how Icechunk compresses its metadata files

Conflict

A conflict detected between snapshots

ConflictDetector

A conflict solver that can be used to detect conflicts between two stores, but does not resolve them

ConflictError

An error that occurs when a conflict is detected

ConflictSolver

An abstract conflict solver that can be used to detect or resolve conflicts between two stores

ConflictType

Type of conflict detected

Diff

The result of comparing two snapshots

ForkSession
GCSummary

Summarizes the results of a garbage collection operation on an Icechunk repo

GcsBearerCredential

Credentials for a Google Cloud Storage backend

GcsCredentials

Credentials for a Google Cloud Storage backend

GcsStaticCredentials

Credentials for a Google Cloud Storage backend

IcechunkError

Base class for all Icechunk errors

IcechunkStore
ManifestConfig

Configuration for how Icechunk manages its manifests

ManifestFileInfo

Manifest file metadata

ManifestPreloadCondition

Configuration for conditions under which manifests will preload on session creation

ManifestPreloadConfig

Configuration for how Icechunk preloads manifests on session creation

ManifestSplitCondition

Configuration for conditions under which manifests will be split

ManifestSplitDimCondition

Conditions for specifying dimensions along which to shard manifests.

ManifestSplittingConfig

Configuration for manifest splitting.

RebaseFailedError

An error that occurs when a rebase operation fails

Repository

An Icechunk repository.

RepositoryConfig

Configuration for an Icechunk repository

S3Credentials

Credentials for an S3 storage backend

S3Options

Options for accessing an S3-compatible storage backend

S3StaticCredentials

Credentials for an S3 storage backend

Session

A session object that allows for reading and writing data from an Icechunk repository.

SessionMode

Enum for session access modes

SnapshotInfo

Metadata for a snapshot

Storage

Storage configuration for an IcechunkStore

StorageConcurrencySettings

Configuration for how Icechunk uses its Storage instance

StorageRetriesSettings

Configuration for how Icechunk retries requests.

StorageSettings

Configuration for how Icechunk uses its Storage instance

VersionSelection

Enum for selecting which version of a conflict to use

VirtualChunkContainer

A virtual chunk container is a configuration that allows Icechunk to read virtual references from a storage backend.

VirtualChunkSpec

The specification for a virtual chunk reference.

Functions:

Name Description
_upgrade_icechunk_repository

Migrate a repository to the latest version of Icechunk.

azure_credentials

Create credentials for an Azure Blob Storage object store.

azure_from_env_credentials

Instruct the Azure Blob Storage object store to fetch credentials from the operating system environment.

azure_static_credentials

Create static credentials for an Azure Blob Storage object store.

azure_storage

Create a Storage instance that saves data in an Azure Blob Storage object store.

containers_credentials

Build a map of credentials for virtual chunk containers.

gcs_credentials

Create credentials for a Google Cloud Storage object store.

gcs_from_env_credentials

Instruct the Google Cloud Storage object store to fetch credentials from the operating system environment.

gcs_refreshable_credentials

Create refreshable credentials for a Google Cloud Storage object store.

gcs_static_credentials

Create static credentials for a Google Cloud Storage object store.

gcs_storage

Create a Storage instance that saves data in a Google Cloud Storage object store.

gcs_store

Build an ObjectStoreConfig instance for Google Cloud Storage object stores.

http_storage

Create a read-only Storage instance that reads data from an HTTP(S) server

http_store

Build an ObjectStoreConfig instance for HTTP object stores.

in_memory_storage

Create a Storage instance that saves data in memory.

initialize_logs

Initialize the logging system for the library.

local_filesystem_storage

Create a Storage instance that saves data in the local file system.

local_filesystem_store

Build an ObjectStoreConfig instance for local file stores.

r2_storage

Create a Storage instance that saves data in a Cloudflare R2 object store.

s3_anonymous_credentials

Create no-signature credentials for S3 and S3-compatible object stores.

s3_credentials

Create credentials for S3 and S3-compatible object stores.

s3_from_env_credentials

Instruct S3 and S3-compatible object stores to gather credentials from the operating system environment.

s3_refreshable_credentials

Create refreshable credentials for S3 and S3-compatible object stores.

s3_static_credentials

Create static credentials for S3 and S3-compatible object stores.

s3_storage

Create a Storage instance that saves data in S3 or S3-compatible object stores.

s3_store

Build an ObjectStoreConfig instance for S3 or S3-compatible object stores.

set_logs_filter

Set filters and log levels for the different modules.

spec_version

The version of the Icechunk specification that the library is compatible with.

tigris_storage

Create a Storage instance that saves data in a Tigris object store.
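
As a quick orientation before the individual entries below, here is a minimal end-to-end sketch that combines several of these pieces: it creates a repository on the local filesystem, writes an array through a session's store, and commits. The path and array names are illustrative only.

import icechunk
import zarr

# Create (or open) a repository backed by the local filesystem.
storage = icechunk.local_filesystem_storage("/tmp/icechunk-example")
repo = icechunk.Repository.open_or_create(storage)

# Start a writable session on the main branch and write through zarr.
session = repo.writable_session("main")
root = zarr.group(store=session.store, overwrite=True)
arr = root.create_array("data", shape=(100,), dtype="int32")
arr[:] = 42

# Commit the changes as a new snapshot.
snapshot_id = session.commit("write example data")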

AzureCredentials #

Credentials for an Azure storage backend

This can be used to authenticate with an Azure storage backend.

Classes:

Name Description
FromEnv

Uses credentials from environment variables

Static

Uses Azure credentials without expiration

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class AzureCredentials:
    """Credentials for an azure storage backend

    This can be used to authenticate with an azure storage backend.
    """
    class FromEnv:
        """Uses credentials from environment variables"""
        def __init__(self) -> None: ...

    class Static:
        """Uses azure credentials without expiration"""
        def __init__(self, credentials: AnyAzureStaticCredential) -> None: ...

FromEnv #

Uses credentials from environment variables

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class FromEnv:
    """Uses credentials from environment variables"""
    def __init__(self) -> None: ...

Static #

Uses Azure credentials without expiration

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class Static:
    """Uses azure credentials without expiration"""
    def __init__(self, credentials: AnyAzureStaticCredential) -> None: ...
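
A short sketch of how these credential variants are typically consumed via the azure_storage helper listed above; the keyword names below (account, container, prefix, from_env) are assumptions about that helper's signature.

import icechunk

# Fetch credentials from environment variables; from_env=True selects the
# AzureCredentials.FromEnv variant under the hood (assumed behavior).
storage = icechunk.azure_storage(
    account="myaccount",
    container="mycontainer",
    prefix="path/to/repo",
    from_env=True,
)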

AzureStaticCredentials #

Credentials for an Azure storage backend

Classes:

Name Description
AccessKey

Credentials for an Azure storage backend using an access key

BearerToken

Credentials for an Azure storage backend using a bearer token

SasToken

Credentials for an Azure storage backend using a shared access signature token

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class AzureStaticCredentials:
    """Credentials for an azure storage backend"""
    class AccessKey:
        """Credentials for an azure storage backend using an access key

        Parameters
        ----------
        key: str
            The access key to use for authentication.
        """
        def __init__(self, key: str) -> None: ...

    class SasToken:
        """Credentials for an azure storage backend using a shared access signature token

        Parameters
        ----------
        token: str
            The shared access signature token to use for authentication.
        """
        def __init__(self, token: str) -> None: ...

    class BearerToken:
        """Credentials for an azure storage backend using a bearer token

        Parameters
        ----------
        token: str
            The bearer token to use for authentication.
        """
        def __init__(self, token: str) -> None: ...

AccessKey #

Credentials for an Azure storage backend using an access key

Parameters:

Name Type Description Default
key str

The access key to use for authentication.

required
Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class AccessKey:
    """Credentials for an azure storage backend using an access key

    Parameters
    ----------
    key: str
        The access key to use for authentication.
    """
    def __init__(self, key: str) -> None: ...

BearerToken #

Credentials for an Azure storage backend using a bearer token

Parameters:

Name Type Description Default
token str

The bearer token to use for authentication.

required
Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class BearerToken:
    """Credentials for an azure storage backend using a bearer token

    Parameters
    ----------
    token: str
        The bearer token to use for authentication.
    """
    def __init__(self, token: str) -> None: ...

SasToken #

Credentials for an Azure storage backend using a shared access signature token

Parameters:

Name Type Description Default
token str

The shared access signature token to use for authentication.

required
Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class SasToken:
    """Credentials for an azure storage backend using a shared access signature token

    Parameters
    ----------
    token: str
        The shared access signature token to use for authentication.
    """
    def __init__(self, token: str) -> None: ...
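
The constructors shown in the stubs above compose directly. For instance, a static SAS-token credential (the token value is a placeholder):

import icechunk

# Wrap a shared access signature token as a non-expiring static credential.
credentials = icechunk.AzureCredentials.Static(
    icechunk.AzureStaticCredentials.SasToken("<sas-token>")
)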

BasicConflictSolver #

Bases: ConflictSolver

A basic conflict solver that allows for simple configuration of resolution behavior

This conflict solver allows for simple configuration of resolution behavior for conflicts that may occur during a rebase operation. It will attempt to resolve a limited set of conflicts based on the configuration options provided.

  • When a chunk conflict is encountered, the behavior is determined by the on_chunk_conflict option
  • When an array is deleted that has been updated, fail_on_delete_of_updated_array will determine whether to fail the rebase operation
  • When a group is deleted that has been updated, fail_on_delete_of_updated_group will determine whether to fail the rebase operation

Methods:

Name Description
__init__

Create a BasicConflictSolver object with the given configuration options

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class BasicConflictSolver(ConflictSolver):
    """A basic conflict solver that allows for simple configuration of resolution behavior

    This conflict solver allows for simple configuration of resolution behavior for conflicts that may occur during a rebase operation.
    It will attempt to resolve a limited set of conflicts based on the configuration options provided.

    - When a chunk conflict is encountered, the behavior is determined by the `on_chunk_conflict` option
    - When an array is deleted that has been updated, `fail_on_delete_of_updated_array` will determine whether to fail the rebase operation
    - When a group is deleted that has been updated, `fail_on_delete_of_updated_group` will determine whether to fail the rebase operation
    """

    def __init__(
        self,
        *,
        on_chunk_conflict: VersionSelection = VersionSelection.UseOurs,
        fail_on_delete_of_updated_array: bool = False,
        fail_on_delete_of_updated_group: bool = False,
    ) -> None:
        """Create a BasicConflictSolver object with the given configuration options

        Parameters
        ----------
        on_chunk_conflict: VersionSelection
            The behavior to use when a chunk conflict is encountered, by default VersionSelection.UseOurs
        fail_on_delete_of_updated_array: bool
            Whether to fail when an array is deleted that has been updated, by default False
        fail_on_delete_of_updated_group: bool
            Whether to fail when a group is deleted that has been updated, by default False
        """
        ...
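
For example, a solver that keeps this session's chunks on conflict can be passed to commit through its rebase_with parameter (shown in the Session API below); `session` here is assumed to be a writable Session with pending changes.

import icechunk

solver = icechunk.BasicConflictSolver(
    on_chunk_conflict=icechunk.VersionSelection.UseOurs,
    fail_on_delete_of_updated_array=True,
)
# Attempt the commit, rebasing with the solver if the branch has moved.
session.commit("my changes", rebase_with=solver)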

__init__ #

__init__(*, on_chunk_conflict=VersionSelection.UseOurs, fail_on_delete_of_updated_array=False, fail_on_delete_of_updated_group=False)

Create a BasicConflictSolver object with the given configuration options

Parameters:

Name Type Description Default
on_chunk_conflict VersionSelection

The behavior to use when a chunk conflict is encountered, by default VersionSelection.UseOurs

UseOurs
fail_on_delete_of_updated_array bool

Whether to fail when an array is deleted that has been updated, by default False

False
fail_on_delete_of_updated_group bool

Whether to fail when a group is deleted that has been updated, by default False

False
Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
def __init__(
    self,
    *,
    on_chunk_conflict: VersionSelection = VersionSelection.UseOurs,
    fail_on_delete_of_updated_array: bool = False,
    fail_on_delete_of_updated_group: bool = False,
) -> None:
    """Create a BasicConflictSolver object with the given configuration options

    Parameters
    ----------
    on_chunk_conflict: VersionSelection
        The behavior to use when a chunk conflict is encountered, by default VersionSelection.UseOurs
    fail_on_delete_of_updated_array: bool
        Whether to fail when an array is deleted that has been updated, by default False
    fail_on_delete_of_updated_group: bool
        Whether to fail when a group is deleted that has been updated, by default False
    """
    ...

CachingConfig #

Configuration for how Icechunk caches its metadata files

Methods:

Name Description
__init__

Create a new CachingConfig object

Attributes:

Name Type Description
num_bytes_attributes int | None

The number of bytes of attributes to cache.

num_bytes_chunks int | None

The number of bytes of chunks to cache.

num_chunk_refs int | None

The number of chunk references to cache.

num_snapshot_nodes int | None

The number of snapshot nodes to cache.

num_transaction_changes int | None

The number of transaction changes to cache.

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class CachingConfig:
    """Configuration for how Icechunk caches its metadata files"""

    def __init__(
        self,
        num_snapshot_nodes: int | None = None,
        num_chunk_refs: int | None = None,
        num_transaction_changes: int | None = None,
        num_bytes_attributes: int | None = None,
        num_bytes_chunks: int | None = None,
    ) -> None:
        """
        Create a new `CachingConfig` object

        Parameters
        ----------
        num_snapshot_nodes: int | None
            The number of snapshot nodes to cache.
        num_chunk_refs: int | None
            The number of chunk references to cache.
        num_transaction_changes: int | None
            The number of transaction changes to cache.
        num_bytes_attributes: int | None
            The number of bytes of attributes to cache.
        num_bytes_chunks: int | None
            The number of bytes of chunks to cache.
        """
    @property
    def num_snapshot_nodes(self) -> int | None:
        """
        The number of snapshot nodes to cache.

        Returns
        -------
        int | None
            The number of snapshot nodes to cache.
        """
        ...
    @num_snapshot_nodes.setter
    def num_snapshot_nodes(self, value: int | None) -> None:
        """
        Set the number of snapshot nodes to cache.

        Parameters
        ----------
        value: int | None
            The number of snapshot nodes to cache.
        """
        ...
    @property
    def num_chunk_refs(self) -> int | None:
        """
        The number of chunk references to cache.

        Returns
        -------
        int | None
            The number of chunk references to cache.
        """
        ...
    @num_chunk_refs.setter
    def num_chunk_refs(self, value: int | None) -> None:
        """
        Set the number of chunk references to cache.

        Parameters
        ----------
        value: int | None
            The number of chunk references to cache.
        """
        ...
    @property
    def num_transaction_changes(self) -> int | None:
        """
        The number of transaction changes to cache.

        Returns
        -------
        int | None
            The number of transaction changes to cache.
        """
        ...
    @num_transaction_changes.setter
    def num_transaction_changes(self, value: int | None) -> None:
        """
        Set the number of transaction changes to cache.

        Parameters
        ----------
        value: int | None
            The number of transaction changes to cache.
        """
        ...
    @property
    def num_bytes_attributes(self) -> int | None:
        """
        The number of bytes of attributes to cache.

        Returns
        -------
        int | None
            The number of bytes of attributes to cache.
        """
        ...
    @num_bytes_attributes.setter
    def num_bytes_attributes(self, value: int | None) -> None:
        """
        Set the number of bytes of attributes to cache.

        Parameters
        ----------
        value: int | None
            The number of bytes of attributes to cache.
        """
        ...
    @property
    def num_bytes_chunks(self) -> int | None:
        """
        The number of bytes of chunks to cache.

        Returns
        -------
        int | None
            The number of bytes of chunks to cache.
        """
        ...
    @num_bytes_chunks.setter
    def num_bytes_chunks(self, value: int | None) -> None:
        """
        Set the number of bytes of chunks to cache.

        Parameters
        ----------
        value: int | None
            The number of bytes of chunks to cache.
        """
        ...
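
A sketch of tuning the cache through the repository configuration; attaching a CachingConfig via RepositoryConfig.caching is an assumption based on the configuration API, and the numbers are illustrative.

import icechunk

config = icechunk.RepositoryConfig.default()
config.caching = icechunk.CachingConfig(num_chunk_refs=500_000)
# `storage` is an existing Storage instance defined elsewhere.
repo = icechunk.Repository.open(storage, config=config)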

num_bytes_attributes property writable #

num_bytes_attributes

The number of bytes of attributes to cache.

Returns:

Type Description
int | None

The number of bytes of attributes to cache.

num_bytes_chunks property writable #

num_bytes_chunks

The number of bytes of chunks to cache.

Returns:

Type Description
int | None

The number of bytes of chunks to cache.

num_chunk_refs property writable #

num_chunk_refs

The number of chunk references to cache.

Returns:

Type Description
int | None

The number of chunk references to cache.

num_snapshot_nodes property writable #

num_snapshot_nodes

The number of snapshot nodes to cache.

Returns:

Type Description
int | None

The number of snapshot nodes to cache.

num_transaction_changes property writable #

num_transaction_changes

The number of transaction changes to cache.

Returns:

Type Description
int | None

The number of transaction changes to cache.

__init__ #

__init__(num_snapshot_nodes=None, num_chunk_refs=None, num_transaction_changes=None, num_bytes_attributes=None, num_bytes_chunks=None)

Create a new CachingConfig object

Parameters:

Name Type Description Default
num_snapshot_nodes int | None

The number of snapshot nodes to cache.

None
num_chunk_refs int | None

The number of chunk references to cache.

None
num_transaction_changes int | None

The number of transaction changes to cache.

None
num_bytes_attributes int | None

The number of bytes of attributes to cache.

None
num_bytes_chunks int | None

The number of bytes of chunks to cache.

None
Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
def __init__(
    self,
    num_snapshot_nodes: int | None = None,
    num_chunk_refs: int | None = None,
    num_transaction_changes: int | None = None,
    num_bytes_attributes: int | None = None,
    num_bytes_chunks: int | None = None,
) -> None:
    """
    Create a new `CachingConfig` object

    Parameters
    ----------
    num_snapshot_nodes: int | None
        The number of snapshot nodes to cache.
    num_chunk_refs: int | None
        The number of chunk references to cache.
    num_transaction_changes: int | None
        The number of transaction changes to cache.
    num_bytes_attributes: int | None
        The number of bytes of attributes to cache.
    num_bytes_chunks: int | None
        The number of bytes of chunks to cache.
    """

ChunkType #

Bases: Enum

Enum for Zarr chunk types

Attributes:

Name Type Description
Uninitialized int

Chunk doesn't have a materialized type yet

Native int

Regular Zarr chunks

Virtual int

Chunk conforming to the VirtualiZarr spec

Inline int

Chunk is stored inline in the manifest

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class ChunkType(Enum):
    """Enum for Zarr chunk types

    Attributes
    ----------
    Uninitialized: int
        Chunk doesn't have a materialized type yet
    Native: int
        Regular Zarr chunks
    Virtual: int
        Chunk conforming to the VirtualiZarr spec
    Inline: int
        Chunk is stored inline in the manifest
    """

    UNINITIALIZED = 0
    NATIVE = 1
    VIRTUAL = 2
    INLINE = 3

CompressionAlgorithm #

Bases: Enum

Enum for selecting the compression algorithm used by Icechunk to write its metadata files

Attributes:

Name Type Description
Zstd int

The Zstd compression algorithm.

Methods:

Name Description
default

The default compression algorithm used by Icechunk to write its metadata files.

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class CompressionAlgorithm(Enum):
    """Enum for selecting the compression algorithm used by Icechunk to write its metadata files

    Attributes
    ----------
    Zstd: int
        The Zstd compression algorithm.
    """

    Zstd = 0

    def __init__(self) -> None: ...
    @staticmethod
    def default() -> CompressionAlgorithm:
        """
        The default compression algorithm used by Icechunk to write its metadata files.

        Returns
        -------
        CompressionAlgorithm
            The default compression algorithm.
        """
        ...

default staticmethod #

default()

The default compression algorithm used by Icechunk to write its metadata files.

Returns:

Type Description
CompressionAlgorithm

The default compression algorithm.

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
@staticmethod
def default() -> CompressionAlgorithm:
    """
    The default compression algorithm used by Icechunk to write its metadata files.

    Returns
    -------
    CompressionAlgorithm
        The default compression algorithm.
    """
    ...

CompressionConfig #

Configuration for how Icechunk compresses its metadata files

Methods:

Name Description
__init__

Create a new CompressionConfig object

default

The default compression configuration used by Icechunk to write its metadata files.

Attributes:

Name Type Description
algorithm CompressionAlgorithm | None

The compression algorithm used by Icechunk to write its metadata files.

level int | None

The compression level used by Icechunk to write its metadata files.

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class CompressionConfig:
    """Configuration for how Icechunk compresses its metadata files"""

    def __init__(
        self, algorithm: CompressionAlgorithm | None = None, level: int | None = None
    ) -> None:
        """
        Create a new `CompressionConfig` object

        Parameters
        ----------
        algorithm: CompressionAlgorithm | None
            The compression algorithm to use.
        level: int | None
            The compression level to use.
        """
        ...
    @property
    def algorithm(self) -> CompressionAlgorithm | None:
        """
        The compression algorithm used by Icechunk to write its metadata files.

        Returns
        -------
        CompressionAlgorithm | None
            The compression algorithm used by Icechunk to write its metadata files.
        """
        ...
    @algorithm.setter
    def algorithm(self, value: CompressionAlgorithm | None) -> None:
        """
        Set the compression algorithm used by Icechunk to write its metadata files.

        Parameters
        ----------
        value: CompressionAlgorithm | None
            The compression algorithm to use.
        """
        ...
    @property
    def level(self) -> int | None:
        """
        The compression level used by Icechunk to write its metadata files.

        Returns
        -------
        int | None
            The compression level used by Icechunk to write its metadata files.
        """
        ...
    @level.setter
    def level(self, value: int | None) -> None:
        """
        Set the compression level used by Icechunk to write its metadata files.

        Parameters
        ----------
        value: int | None
            The compression level to use.
        """
        ...
    @staticmethod
    def default() -> CompressionConfig:
        """
        The default compression configuration used by Icechunk to write its metadata files.

        Returns
        -------
        CompressionConfig
        """

algorithm property writable #

algorithm

The compression algorithm used by Icechunk to write its metadata files.

Returns:

Type Description
CompressionAlgorithm | None

The compression algorithm used by Icechunk to write its metadata files.

level property writable #

level

The compression level used by Icechunk to write its metadata files.

Returns:

Type Description
int | None

The compression level used by Icechunk to write its metadata files.

__init__ #

__init__(algorithm=None, level=None)

Create a new CompressionConfig object

Parameters:

Name Type Description Default
algorithm CompressionAlgorithm | None

The compression algorithm to use.

None
level int | None

The compression level to use.

None
Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
def __init__(
    self, algorithm: CompressionAlgorithm | None = None, level: int | None = None
) -> None:
    """
    Create a new `CompressionConfig` object

    Parameters
    ----------
    algorithm: CompressionAlgorithm | None
        The compression algorithm to use.
    level: int | None
        The compression level to use.
    """
    ...

default staticmethod #

default()

The default compression configuration used by Icechunk to write its metadata files.

Returns:

Type Description
CompressionConfig
Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
@staticmethod
def default() -> CompressionConfig:
    """
    The default compression configuration used by Icechunk to write its metadata files.

    Returns
    -------
    CompressionConfig
    """

Conflict #

A conflict detected between snapshots

Methods:

Name Description
__init__

Create a new Conflict.

Attributes:

Name Type Description
conflict_type ConflictType

The type of conflict detected

conflicted_chunks list[list[int]] | None

If the conflict is a chunk conflict, this will return the list of chunk indices that are in conflict

path str

The path of the node that caused the conflict

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class Conflict:
    """A conflict detected between snapshots"""

    def __init__(
        self,
        conflict_type: ConflictType,
        path: str,
        conflicted_chunks: list[list[int]] | None = None,
    ) -> None:
        """
        Create a new Conflict.

        Parameters
        ----------
        conflict_type: ConflictType
            The type of conflict.
        path: str
            The path of the node that caused the conflict.
        conflicted_chunks: list[list[int]] | None
            If the conflict is a chunk conflict, the list of chunk indices in conflict.
        """
        ...

    @property
    def conflict_type(self) -> ConflictType:
        """The type of conflict detected

        Returns:
            ConflictType: The type of conflict detected
        """
        ...

    @property
    def path(self) -> str:
        """The path of the node that caused the conflict

        Returns:
            str: The path of the node that caused the conflict
        """
        ...

    @property
    def conflicted_chunks(self) -> list[list[int]] | None:
        """If the conflict is a chunk conflict, this will return the list of chunk indices that are in conflict

        Returns:
            list[list[int]] | None: The list of chunk indices that are in conflict
        """
        ...

conflict_type property #

conflict_type

The type of conflict detected

Returns: ConflictType: The type of conflict detected

conflicted_chunks property #

conflicted_chunks

If the conflict is a chunk conflict, this will return the list of chunk indices that are in conflict

Returns: list[list[int]] | None: The list of chunk indices that are in conflict

path property #

path

The path of the node that caused the conflict

Returns: str: The path of the node that caused the conflict

__init__ #

__init__(conflict_type, path, conflicted_chunks=None)

Create a new Conflict.

Parameters:

Name Type Description Default
conflict_type ConflictType

The type of conflict.

required
path str

The path of the node that caused the conflict.

required
conflicted_chunks list[list[int]] | None

If the conflict is a chunk conflict, the list of chunk indices in conflict.

None
Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
def __init__(
    self,
    conflict_type: ConflictType,
    path: str,
    conflicted_chunks: list[list[int]] | None = None,
) -> None:
    """
    Create a new Conflict.

    Parameters
    ----------
    conflict_type: ConflictType
        The type of conflict.
    path: str
        The path of the node that caused the conflict.
    conflicted_chunks: list[list[int]] | None
        If the conflict is a chunk conflict, the list of chunk indices in conflict.
    """
    ...

ConflictDetector #

Bases: ConflictSolver

A conflict solver that can be used to detect conflicts between two stores, but does not resolve them

Where the BasicConflictSolver will attempt to resolve conflicts, the ConflictDetector will only detect them. This means that during a rebase operation the ConflictDetector will raise a RebaseFailedError if any conflicts are detected, allowing the rebase operation to be retried with a different conflict resolution strategy. Otherwise, if no conflicts are detected, the rebase operation will succeed.

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class ConflictDetector(ConflictSolver):
    """A conflict solver that can be used to detect conflicts between two stores, but does not resolve them

    Where the `BasicConflictSolver` will attempt to resolve conflicts, the `ConflictDetector` will only detect them. This means
    that during a rebase operation the `ConflictDetector` will raise a `RebaseFailed` error if any conflicts are detected, and
    allow the rebase operation to be retried with a different conflict resolution strategy. Otherwise, if no conflicts are detected
    the rebase operation will succeed.
    """

    def __init__(self) -> None: ...
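
A sketch of detect-only rebasing during a commit; that RebaseFailedError exposes the detected Conflict objects through a conflicts attribute is an assumption, and `session` is assumed to be a writable Session.

import icechunk

try:
    session.commit("my changes", rebase_with=icechunk.ConflictDetector())
except icechunk.RebaseFailedError as e:
    # Inspect the conflicts, then retry with a real resolution strategy.
    for conflict in e.conflicts:
        print(conflict.conflict_type, conflict.path, conflict.conflicted_chunks)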

ConflictError #

Bases: Exception

An error that occurs when a conflict is detected

Methods:

Name Description
__init__

Create a new ConflictError.

Attributes:

Name Type Description
actual_parent str

The actual parent snapshot ID of the branch that the session attempted to commit to.

expected_parent str

The expected parent snapshot ID.

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class ConflictError(Exception):
    """An error that occurs when a conflict is detected"""

    def __init__(
        self,
        expected_parent: str | None = None,
        actual_parent: str | None = None,
    ) -> None:
        """
        Create a new ConflictError.

        Parameters
        ----------
        expected_parent: str | None
            The expected parent snapshot ID.
        actual_parent: str | None
            The actual parent snapshot ID of the branch.
        """
        ...

    @property
    def expected_parent(self) -> str:
        """The expected parent snapshot ID.

        This is the snapshot ID that the session was based on when the
        commit operation was called.
        """
        ...
    @property
    def actual_parent(self) -> str:
        """
        The actual parent snapshot ID of the branch that the session attempted to commit to.

        When the session is based on a branch, this is the snapshot ID of the branch tip. If this
        error is raised, it means the branch was modified and committed by another session after
        the session was created.
        """
        ...
    ...
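
The properties below make this error actionable when a commit races with another writer; a minimal sketch, with `session` assumed to be a writable Session:

import icechunk

try:
    session.commit("my changes")
except icechunk.ConflictError as e:
    print(f"branch moved: expected {e.expected_parent}, found {e.actual_parent}")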

actual_parent property #

actual_parent

The actual parent snapshot ID of the branch that the session attempted to commit to.

When the session is based on a branch, this is the snapshot ID of the branch tip. If this error is raised, it means the branch was modified and committed by another session after the session was created.

expected_parent property #

expected_parent

The expected parent snapshot ID.

This is the snapshot ID that the session was based on when the commit operation was called.

__init__ #

__init__(expected_parent=None, actual_parent=None)

Create a new ConflictError.

Parameters:

Name Type Description Default
expected_parent str | None

The expected parent snapshot ID.

None
actual_parent str | None

The actual parent snapshot ID of the branch.

None
Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
def __init__(
    self,
    expected_parent: str | None = None,
    actual_parent: str | None = None,
) -> None:
    """
    Create a new ConflictError.

    Parameters
    ----------
    expected_parent: str | None
        The expected parent snapshot ID.
    actual_parent: str | None
        The actual parent snapshot ID of the branch.
    """
    ...

ConflictSolver #

An abstract conflict solver that can be used to detect or resolve conflicts between two stores

This should never be used directly, but should be subclassed to provide specific conflict resolution behavior

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class ConflictSolver:
    """An abstract conflict solver that can be used to detect or resolve conflicts between two stores

    This should never be used directly, but should be subclassed to provide specific conflict resolution behavior
    """

    ...

ConflictType #

Bases: Enum

Type of conflict detected

Attributes:

Name Type Description
ChunkDoubleUpdate

A chunk update conflicts with an existing chunk update

ChunksUpdatedInDeletedArray

Chunks are updated in a deleted array

ChunksUpdatedInUpdatedArray

Chunks are updated in an updated array

DeleteOfUpdatedArray

A delete is attempted on an updated array

DeleteOfUpdatedGroup

A delete is attempted on an updated group

MoveOperationCannotBeRebased

A move operation cannot be rebased

NewNodeConflictsWithExistingNode

A new node conflicts with an existing node

NewNodeInInvalidGroup

A new node is in an invalid group

ZarrMetadataDoubleUpdate

A zarr metadata update conflicts with an existing zarr metadata update

ZarrMetadataUpdateOfDeletedArray

A zarr metadata update is attempted on a deleted array

ZarrMetadataUpdateOfDeletedGroup

A zarr metadata update is attempted on a deleted group

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class ConflictType(Enum):
    """Type of conflict detected"""

    NewNodeConflictsWithExistingNode = (1,)
    """A new node conflicts with an existing node"""

    NewNodeInInvalidGroup = (2,)
    """A new node is in an invalid group"""

    ZarrMetadataDoubleUpdate = (3,)
    """A zarr metadata update conflicts with an existing zarr metadata update"""

    ZarrMetadataUpdateOfDeletedArray = (4,)
    """A zarr metadata update is attempted on a deleted array"""

    ZarrMetadataUpdateOfDeletedGroup = (5,)
    """A zarr metadata update is attempted on a deleted group"""

    ChunkDoubleUpdate = (6,)
    """A chunk update conflicts with an existing chunk update"""

    ChunksUpdatedInDeletedArray = (7,)
    """Chunks are updated in a deleted array"""

    ChunksUpdatedInUpdatedArray = (8,)
    """Chunks are updated in an updated array"""

    DeleteOfUpdatedArray = (9,)
    """A delete is attempted on an updated array"""

    DeleteOfUpdatedGroup = (10,)
    """A delete is attempted on an updated group"""

    MoveOperationCannotBeRebased = (11,)
    """Move operation cannot be rebased"""

ChunkDoubleUpdate class-attribute instance-attribute #

ChunkDoubleUpdate = (6,)

A chunk update conflicts with an existing chunk update

ChunksUpdatedInDeletedArray class-attribute instance-attribute #

ChunksUpdatedInDeletedArray = (7,)

Chunks are updated in a deleted array

ChunksUpdatedInUpdatedArray class-attribute instance-attribute #

ChunksUpdatedInUpdatedArray = (8,)

Chunks are updated in an updated array

DeleteOfUpdatedArray class-attribute instance-attribute #

DeleteOfUpdatedArray = (9,)

A delete is attempted on an updated array

DeleteOfUpdatedGroup class-attribute instance-attribute #

DeleteOfUpdatedGroup = (10,)

A delete is attempted on an updated group

NewNodeConflictsWithExistingNode class-attribute instance-attribute #

NewNodeConflictsWithExistingNode = (1,)

A new node conflicts with an existing node

NewNodeInInvalidGroup class-attribute instance-attribute #

NewNodeInInvalidGroup = (2,)

A new node is in an invalid group

ZarrMetadataDoubleUpdate class-attribute instance-attribute #

ZarrMetadataDoubleUpdate = (3,)

A zarr metadata update conflicts with an existing zarr metadata update

ZarrMetadataUpdateOfDeletedArray class-attribute instance-attribute #

ZarrMetadataUpdateOfDeletedArray = (4,)

A zarr metadata update is attempted on a deleted array

ZarrMetadataUpdateOfDeletedGroup class-attribute instance-attribute #

ZarrMetadataUpdateOfDeletedGroup = (5,)

A zarr metadata update is attempted on a deleted group

Diff #

The result of comparing two snapshots

Methods:

Name Description
is_empty

Returns True if the diff contains no changes.

Attributes:

Name Type Description
deleted_arrays set[str]

The arrays that were deleted in the target ref.

deleted_groups set[str]

The groups that were deleted in the target ref.

moved_nodes list[tuple[str, str]]

The list of node moves, in order of application, as tuples (from_path, to_path).

new_arrays set[str]

The arrays that were added to the target ref.

new_groups set[str]

The groups that were added to the target ref.

updated_arrays set[str]

The arrays that were updated via zarr metadata in the target ref.

updated_chunks dict[str, list[list[int]]]

The chunk indices that had data updated in the target ref, keyed by the path to the array.

updated_groups set[str]

The groups that were updated via zarr metadata in the target ref.

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class Diff:
    """The result of comparing two snapshots"""
    def is_empty(self) -> bool:
        """
        Returns True if the diff contains no changes.
        """
        ...
    @property
    def new_groups(self) -> set[str]:
        """
        The groups that were added to the target ref.
        """
        ...
    @property
    def new_arrays(self) -> set[str]:
        """
        The arrays that were added to the target ref.
        """
        ...
    @property
    def deleted_groups(self) -> set[str]:
        """
        The groups that were deleted in the target ref.
        """
        ...
    @property
    def deleted_arrays(self) -> set[str]:
        """
        The arrays that were deleted in the target ref.
        """
        ...
    @property
    def updated_groups(self) -> set[str]:
        """
        The groups that were updated via zarr metadata in the target ref.
        """
        ...
    @property
    def updated_arrays(self) -> set[str]:
        """
        The arrays that were updated via zarr metadata in the target ref.
        """
        ...
    @property
    def updated_chunks(self) -> dict[str, list[list[int]]]:
        """
        The chunk indices that had data updated in the target ref, keyed by the path to the array.
        """
        ...
    @property
    def moved_nodes(self) -> list[tuple[str, str]]:
        """
        The list of node moves, in order of application, as tuples (from_path, to_path).
        """
        ...

deleted_arrays property #

deleted_arrays

The arrays that were deleted in the target ref.

deleted_groups property #

deleted_groups

The groups that were deleted in the target ref.

moved_nodes property #

moved_nodes

The list of node moves, in order of application, as tuples (from_path, to_path).

new_arrays property #

new_arrays

The arrays that were added to the target ref.

new_groups property #

new_groups

The groups that were added to the target ref.

updated_arrays property #

updated_arrays

The arrays that were updated via zarr metadata in the target ref.

updated_chunks property #

updated_chunks

The chunk indices that had data updated in the target ref, keyed by the path to the array.

updated_groups property #

updated_groups

The groups that were updated via zarr metadata in the target ref.

is_empty #

is_empty()

Returns True if the diff contains no changes.

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
def is_empty(self) -> bool:
    """
    Returns True if the diff contains no changes.
    """
    ...

ForkSession #

Bases: Session

Methods:

Name Description
merge_async

Merge the changes for this fork session with the changes from other fork sessions (async version).

Attributes:

Name Type Description
store IcechunkStore

Get a zarr Store object for reading and writing data from the repository using zarr python.

Source code in icechunk-python/python/icechunk/session.py
class ForkSession(Session):
    def __getstate__(self) -> object:
        state = {"_session": self._session.as_bytes()}
        return state

    def __setstate__(self, state: object) -> None:
        if not isinstance(state, dict):
            raise ValueError("Invalid state")
        self._session = PySession.from_bytes(state["_session"])

    def merge(self, *others: Self) -> None:
        for other in others:
            if not isinstance(other, ForkSession):
                raise TypeError(
                    f"A ForkSession can only be merged with another ForkSession. Received {type(other)} instead."
                )
            self._session.merge(other._session)

    async def merge_async(self, *others: Self) -> None:
        """
        Merge the changes for this fork session with the changes from other fork sessions (async version).

        Parameters
        ----------
        others : ForkSession
            The other fork sessions to merge changes from.
        """
        for other in others:
            if not isinstance(other, ForkSession):
                raise TypeError(
                    f"A ForkSession can only be merged with another ForkSession. Received {type(other)} instead."
                )
            await self._session.merge_async(other._session)

    def commit(
        self,
        message: str,
        metadata: dict[str, Any] | None = None,
        rebase_with: ConflictSolver | None = None,
        rebase_tries: int = 1_000,
        allow_empty: bool = False,
    ) -> NoReturn:
        raise TypeError(
            "Cannot commit a fork of a Session. If you are using uncooperative writes, "
            "please send the Repository object to your workers, not a Session. "
            "See https://icechunk.io/en/stable/icechunk-python/parallel/#distributed-writes for more."
        )

    async def commit_async(
        self,
        message: str,
        metadata: dict[str, Any] | None = None,
        rebase_with: ConflictSolver | None = None,
        rebase_tries: int = 1_000,
        allow_empty: bool = False,
    ) -> NoReturn:
        raise TypeError(
            "Cannot commit a fork of a Session. If you are using uncooperative writes, "
            "please send the Repository object to your workers, not a Session. "
            "See https://icechunk.io/en/stable/icechunk-python/parallel/#distributed-writes for more."
        )

    def flush(
        self,
        message: str,
        metadata: dict[str, Any] | None = None,
    ) -> NoReturn:
        raise TypeError(
            "Cannot flush a fork of a Session. If you are using uncooperative writes, "
            "please send the Repository object to your workers, not a Session. "
            "See https://icechunk.io/en/stable/icechunk-python/parallel/#distributed-writes for more."
        )

    async def flush_async(
        self,
        message: str,
        metadata: dict[str, Any] | None = None,
    ) -> NoReturn:
        raise TypeError(
            "Cannot flush a fork of a Session. If you are using uncooperative writes, "
            "please send the Repository object to your workers, not a Session. "
            "See https://icechunk.io/en/stable/icechunk-python/parallel/#distributed-writes for more."
        )

    @property
    def store(self) -> IcechunkStore:
        """
        Get a zarr Store object for reading and writing data from the repository using zarr python.

        Returns
        -------
        IcechunkStore
            A zarr Store object for reading and writing data from the repository.
        """
        return IcechunkStore(self._session.store, for_fork=True)

store property #

store

Get a zarr Store object for reading and writing data from the repository using zarr python.

Returns:

Type Description
IcechunkStore

A zarr Store object for reading and writing data from the repository.

merge_async async #

merge_async(*others)

Merge the changes for this fork session with the changes from other fork sessions (async version).

Parameters:

Name Type Description Default
others ForkSession

The other fork sessions to merge changes from.

()
Source code in icechunk-python/python/icechunk/session.py
async def merge_async(self, *others: Self) -> None:
    """
    Merge the changes for this fork session with the changes from other fork sessions (async version).

    Parameters
    ----------
    others : ForkSession
        The other fork sessions to merge changes from.
    """
    for other in others:
        if not isinstance(other, ForkSession):
            raise TypeError(
                f"A ForkSession can only be merged with another ForkSession. Received {type(other)} instead."
            )
        await self._session.merge_async(other._session)

GCSummary #

Summarizes the results of a garbage collection operation on an Icechunk repo

Attributes:

Name Type Description
attributes_deleted int

How many attributes were deleted.

bytes_deleted int

How many bytes were deleted.

chunks_deleted int

How many chunks were deleted.

manifests_deleted int

How many manifests were deleted.

snapshots_deleted int

How many snapshots were deleted.

transaction_logs_deleted int

How many transaction logs were deleted.

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class GCSummary:
    """Summarizes the results of a garbage collection operation on an icechunk repo"""
    @property
    def bytes_deleted(self) -> int:
        """
        How many bytes were deleted.
        """
        ...
    @property
    def chunks_deleted(self) -> int:
        """
        How many chunks were deleted.
        """
        ...
    @property
    def manifests_deleted(self) -> int:
        """
        How many manifests were deleted.
        """
        ...
    @property
    def snapshots_deleted(self) -> int:
        """
        How many snapshots were deleted.
        """
        ...
    @property
    def attributes_deleted(self) -> int:
        """
        How many attributes were deleted.
        """
        ...
    @property
    def transaction_logs_deleted(self) -> int:
        """
        How many transaction logs were deleted.
        """
        ...
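
A sketch of reading the summary after a collection run; the exact Repository.garbage_collect signature is an assumption, and `repo` is an existing Repository.

from datetime import datetime, timedelta, timezone

cutoff = datetime.now(timezone.utc) - timedelta(days=7)
summary = repo.garbage_collect(cutoff)
print(summary.snapshots_deleted, summary.chunks_deleted, summary.bytes_deleted)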

attributes_deleted property #

attributes_deleted

How many attributes were deleted.

bytes_deleted property #

bytes_deleted

How many bytes were deleted.

chunks_deleted property #

chunks_deleted

How many chunks were deleted.

manifests_deleted property #

manifests_deleted

How many manifests were deleted.

snapshots_deleted property #

snapshots_deleted

How many snapshots were deleted.

transaction_logs_deleted property #

transaction_logs_deleted

How many transaction logs were deleted.

GcsBearerCredential #

Credentials for a Google Cloud Storage backend

This is a bearer token that has an expiration time.

Methods:

Name Description
__init__

Create a GcsBearerCredential object

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class GcsBearerCredential:
    """Credentials for a google cloud storage backend

    This is a bearer token that has an expiration time.
    """

    def __init__(
        self, bearer: str, *, expires_after: datetime.datetime | None = None
    ) -> None:
        """Create a GcsBearerCredential object

        Parameters
        ----------
        bearer: str
            The bearer token to use for authentication.
        expires_after: datetime.datetime | None
            The expiration time of the bearer token.
        """

    @property
    def bearer(self) -> str: ...
    @property
    def expires_after(self) -> datetime.datetime | None: ...
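
A sketch of a refresh callback built on this class; fetch_token() is a hypothetical helper standing in for whatever OAuth machinery produces the token, and passing the callable to gcs_refreshable_credentials follows that helper's description above.

import icechunk

def get_gcs_credential() -> icechunk.GcsBearerCredential:
    token, expiry = fetch_token()  # hypothetical: obtain a fresh bearer token
    return icechunk.GcsBearerCredential(token, expires_after=expiry)

credentials = icechunk.gcs_refreshable_credentials(get_gcs_credential)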

__init__ #

__init__(bearer, *, expires_after=None)

Create a GcsBearerCredential object

Parameters:

Name Type Description Default
bearer str

The bearer token to use for authentication.

required
expires_after datetime | None

The expiration time of the bearer token.

None
Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
def __init__(
    self, bearer: str, *, expires_after: datetime.datetime | None = None
) -> None:
    """Create a GcsBearerCredential object

    Parameters
    ----------
    bearer: str
        The bearer token to use for authentication.
    expires_after: datetime.datetime | None
        The expiration time of the bearer token.
    """

GcsCredentials #

Credentials for a Google Cloud Storage backend

This can be used to authenticate with a Google Cloud Storage backend.

Classes:

Name Description
Anonymous

Uses anonymous credentials

FromEnv

Uses credentials from environment variables

Refreshable

Allows for an outside authority to pass in a function that can be used to provide credentials.

Static

Uses GCS credentials without expiration

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class GcsCredentials:
    """Credentials for a google cloud storage backend

    This can be used to authenticate with a google cloud storage backend.
    """
    class Anonymous:
        """Uses anonymous credentials"""
        def __init__(self) -> None: ...

    class FromEnv:
        """Uses credentials from environment variables"""
        def __init__(self) -> None: ...

    class Static:
        """Uses gcs credentials without expiration"""
        def __init__(self, credentials: AnyGcsStaticCredential) -> None: ...

    class Refreshable:
        """Allows for an outside authority to pass in a function that can be used to provide credentials.

        This is useful for credentials that have an expiration time, or are otherwise not known ahead of time.
        """
        def __init__(
            self, pickled_function: bytes, current: GcsBearerCredential | None = None
        ) -> None: ...

Anonymous #

Uses anonymous credentials

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class Anonymous:
    """Uses anonymous credentials"""
    def __init__(self) -> None: ...

FromEnv #

Uses credentials from environment variables

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class FromEnv:
    """Uses credentials from environment variables"""
    def __init__(self) -> None: ...

Refreshable #

Allows for an outside authority to pass in a function that can be used to provide credentials.

This is useful for credentials that have an expiration time, or are otherwise not known ahead of time.

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class Refreshable:
    """Allows for an outside authority to pass in a function that can be used to provide credentials.

    This is useful for credentials that have an expiration time, or are otherwise not known ahead of time.
    """
    def __init__(
        self, pickled_function: bytes, current: GcsBearerCredential | None = None
    ) -> None: ...

Static #

Uses GCS credentials without expiration

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class Static:
    """Uses gcs credentials without expiration"""
    def __init__(self, credentials: AnyGcsStaticCredential) -> None: ...

GcsStaticCredentials #

Credentials for a Google Cloud Storage backend

Classes:

Name Description
ApplicationCredentials

Credentials for a Google Cloud Storage backend using application default credentials

BearerToken

Credentials for a Google Cloud Storage backend using a bearer token

ServiceAccount

Credentials for a Google Cloud Storage backend using a service account JSON file

ServiceAccountKey

Credentials for a Google Cloud Storage backend using a serialized service account key

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class GcsStaticCredentials:
    """Credentials for a google cloud storage backend"""
    class ServiceAccount:
        """Credentials for a google cloud storage backend using a service account json file

        Parameters
        ----------
        path: str
            The path to the service account json file.
        """
        def __init__(self, path: str) -> None: ...

    class ServiceAccountKey:
        """Credentials for a google cloud storage backend using a a serialized service account key

        Parameters
        ----------
        key: str
            The serialized service account key.
        """
        def __init__(self, key: str) -> None: ...

    class ApplicationCredentials:
        """Credentials for a google cloud storage backend using application default credentials

        Parameters
        ----------
        path: str
            The path to the application default credentials (ADC) file.
        """
        def __init__(self, path: str) -> None: ...

    class BearerToken:
        """Credentials for a google cloud storage backend using a bearer token

        Parameters
        ----------
        token: str
            The bearer token to use for authentication.
        """
        def __init__(self, token: str) -> None: ...
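
A sketch of pairing a static credential with the gcs_storage helper listed above; the bucket, prefix, and service_account_file keyword names are assumptions about that helper's signature.

import icechunk

storage = icechunk.gcs_storage(
    bucket="my-bucket",
    prefix="path/to/repo",
    service_account_file="/path/to/service-account.json",
)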

ApplicationCredentials #

Credentials for a Google Cloud Storage backend using application default credentials

Parameters:

Name Type Description Default
path str

The path to the application default credentials (ADC) file.

required
Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class ApplicationCredentials:
    """Credentials for a google cloud storage backend using application default credentials

    Parameters
    ----------
    path: str
        The path to the application default credentials (ADC) file.
    """
    def __init__(self, path: str) -> None: ...

BearerToken #

Credentials for a Google Cloud Storage backend using a bearer token

Parameters:

Name Type Description Default
token str

The bearer token to use for authentication.

required
Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class BearerToken:
    """Credentials for a google cloud storage backend using a bearer token

    Parameters
    ----------
    token: str
        The bearer token to use for authentication.
    """
    def __init__(self, token: str) -> None: ...

ServiceAccount #

Credentials for a Google Cloud Storage backend using a service account JSON file

Parameters:

Name Type Description Default
path str

The path to the service account JSON file.

required
Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class ServiceAccount:
    """Credentials for a google cloud storage backend using a service account json file

    Parameters
    ----------
    path: str
        The path to the service account json file.
    """
    def __init__(self, path: str) -> None: ...

ServiceAccountKey #

Credentials for a Google Cloud Storage backend using a serialized service account key

Parameters:

Name Type Description Default
key str

The serialized service account key.

required
Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class ServiceAccountKey:
    """Credentials for a google cloud storage backend using a a serialized service account key

    Parameters
    ----------
    key: str
        The serialized service account key.
    """
    def __init__(self, key: str) -> None: ...

IcechunkError #

Bases: Exception

Base class for all Icechunk errors

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class IcechunkError(Exception):
    """Base class for all Icechunk errors"""

    @property
    def message(self) -> str: ...

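Example: because every Icechunk exception derives from IcechunkError, one except clause can handle any library failure. A minimal sketch (opening a repository that does not exist, so Repository.open raises):

import icechunk

try:
    repo = icechunk.Repository.open(icechunk.in_memory_storage())
except icechunk.IcechunkError as e:
    # message exposes the error description carried by the Rust core
    print(f"Icechunk operation failed: {e.message}")
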
IcechunkStore #

Bases: Store, SyncMixin

Methods:

Name Description
__init__

Create a new IcechunkStore.

clear

Clear the store.

delete

Remove a key from the store

delete_dir

Delete a prefix

exists

Check if a key exists in the store.

get

Retrieve the value associated with a given key.

get_partial_values

Retrieve possibly partial values from given key_ranges.

is_empty

Check if the directory is empty.

list

Retrieve all keys in the store.

list_dir

Retrieve all keys and prefixes with a given prefix and which do not contain the character “/” after the given prefix.

list_prefix

Retrieve all keys in the store that begin with a given prefix. Keys are returned relative to the root of the store.

set

Store a (key, value) pair.

set_if_not_exists

Store a key to value if the key is not already present.

set_partial_values

Store values at a given key, starting at byte range_start.

set_virtual_ref

Store a virtual reference to a chunk.

set_virtual_ref_async

Store a virtual reference to a chunk asynchronously.

set_virtual_refs

Store multiple virtual references for the same array.

set_virtual_refs_async

Store multiple virtual references for the same array asynchronously.

sync_clear

Clear the store.

Attributes:

Name Type Description
supports_listing bool

Does the store support listing?

supports_partial_writes Literal[False]

Does the store support partial writes?

supports_writes bool

Does the store support writes?

Source code in icechunk-python/python/icechunk/store.py
class IcechunkStore(Store, SyncMixin):
    _store: PyStore
    _for_fork: bool

    def __init__(
        self,
        store: PyStore,
        for_fork: bool,
        read_only: bool | None = None,
        *args: Any,
        **kwargs: Any,
    ):
        """Create a new IcechunkStore.

        This should not be called directly; instead use the `create`, `open_existing` or `open_or_create` class methods.
        """
        read_only = read_only if read_only is not None else store.read_only
        super().__init__(read_only=read_only)
        if store is None:
            raise ValueError(
                "An IcechunkStore should not be created with the default constructor, instead use either the create or open_existing class methods."
            )
        self._store = store
        self._is_open = True
        self._for_fork = for_fork

    def __eq__(self, value: object) -> bool:
        if not isinstance(value, IcechunkStore):
            return False
        return self._store == value._store

    def __getstate__(self) -> object:
        # for read_only sessions we allow pickling; this allows distributed reads without forking
        writable = not self.session.read_only
        if writable and not self._for_fork:
            raise ValueError(
                "You must opt-in to pickle writable sessions in a distributed context "
                "using Session.fork(). "
                # link to docs
                "If you are using xarray's `Dataset.to_zarr` method to write dask arrays, "
                "please use `icechunk.xarray.to_icechunk` instead. "
            )
        d = self.__dict__.copy()
        # we serialize the Rust store as bytes
        d["_store"] = self._store.as_bytes()
        d["_for_fork"] = self._for_fork
        return d

    def __setstate__(self, state: Any) -> None:
        # we have to deserialize the bytes of the Rust store
        store_repr = state["_store"]
        state["_store"] = PyStore.from_bytes(store_repr)
        self.__dict__ = state

    def with_read_only(self, read_only: bool = False) -> Store:
        new_store = IcechunkStore(store=self._store, for_fork=False, read_only=read_only)
        new_store._is_open = False
        return new_store

    @property
    def session(self) -> "Session":
        from icechunk.session import ForkSession, Session

        if self._for_fork:
            return ForkSession(self._store.session)
        else:
            return Session(self._store.session)

    async def clear(self) -> None:
        """Clear the store.

        This will remove all contents from the current session,
        including all groups and all arrays. But it will not modify the repository history.
        """
        return await self._store.clear()

    def sync_clear(self) -> None:
        """Clear the store.

        This will remove all contents from the current session,
        including all groups and all arrays. But it will not modify the repository history.
        """
        return self._store.sync_clear()

    async def is_empty(self, prefix: str) -> bool:
        """
        Check if the directory is empty.

        Parameters
        ----------
        prefix : str
            Prefix of keys to check.

        Returns
        -------
        bool
            True if the store is empty, False otherwise.
        """
        return await self._store.is_empty(prefix)

    async def get(
        self,
        key: str,
        prototype: BufferPrototype,
        byte_range: ByteRequest | None = None,
    ) -> Buffer | None:
        """Retrieve the value associated with a given key.

        Parameters
        ----------
        key : str
        byte_range : ByteRequest, optional

            ByteRequest may be one of the following. If not provided, all data associated with the key is retrieved.

            - RangeByteRequest(int, int): Request a specific range of bytes in the form (start, end). The end is exclusive. If the given range is zero-length or starts after the end of the object, an error will be returned. Additionally, if the range ends after the end of the object, the entire remainder of the object will be returned. Otherwise, the exact requested range will be returned.
            - OffsetByteRequest(int): Request all bytes starting from a given byte offset. This is equivalent to bytes={int}- as an HTTP header.
            - SuffixByteRequest(int): Request the last int bytes. Note that here, int is the size of the request, not the byte offset. This is equivalent to bytes=-{int} as an HTTP header.

        Returns
        -------
        Buffer
        """

        try:
            result = await self._store.get(key, _byte_request_to_tuple(byte_range))
        except KeyError as _e:
            # Zarr python expects None to be returned if the key does not exist
            # but an IcechunkStore returns an error if the key does not exist
            return None

        return prototype.buffer.from_bytes(result)

    async def get_partial_values(
        self,
        prototype: BufferPrototype,
        key_ranges: Iterable[tuple[str, ByteRequest | None]],
    ) -> list[Buffer | None]:
        """Retrieve possibly partial values from given key_ranges.

        Parameters
        ----------
        key_ranges : Iterable[tuple[str, ByteRequest | None]]
            Ordered set of (key, range) pairs; a key may occur multiple times with different ranges

        Returns
        -------
        List of values in the order of the key_ranges; may contain None for missing keys
        """
        # NOTE: pyo3 has no implicit conversion from an Iterable to a Rust iterable, so we convert it
        # to a list here first. Possible opportunity for optimization.
        ranges = [(k[0], _byte_request_to_tuple(k[1])) for k in key_ranges]
        result = await self._store.get_partial_values(list(ranges))
        return [prototype.buffer.from_bytes(r) for r in result]

    async def exists(self, key: str) -> bool:
        """Check if a key exists in the store.

        Parameters
        ----------
        key : str

        Returns
        -------
        bool
        """
        return await self._store.exists(key)

    @property
    def supports_writes(self) -> bool:
        """Does the store support writes?"""
        return self._store.supports_writes

    async def set(self, key: str, value: Buffer) -> None:
        """Store a (key, value) pair.

        Parameters
        ----------
        key : str
        value : Buffer
        """
        if not isinstance(value, Buffer):
            raise TypeError(
                f"IcechunkStore.set(): `value` must be a Buffer instance. Got an instance of {type(value)} instead."
            )
        return await self._store.set(key, value.to_bytes())

    async def set_if_not_exists(self, key: str, value: Buffer) -> None:
        """
        Store a key to ``value`` if the key is not already present.

        Parameters
        ----------
        key : str
        value : Buffer
        """
        return await self._store.set_if_not_exists(key, value.to_bytes())

    def set_virtual_ref(
        self,
        key: str,
        location: str,
        *,
        offset: int,
        length: int,
        checksum: str | datetime | None = None,
        validate_container: bool = True,
    ) -> None:
        """Store a virtual reference to a chunk.

        Parameters
        ----------
        key : str
            The chunk to store the reference under. This is the fully qualified zarr key, e.g. 'array/c/0/0/0'
        location : str
            The location of the chunk in storage. This is the absolute path to the chunk in storage, e.g. 's3://bucket/path/to/file.nc'
        offset : int
            The offset in bytes from the start of the file at which the chunk starts
        length : int
            The length of the chunk in bytes, measured from the given offset
        checksum : str | datetime | None
            The etag or last_modified_at field of the object
        validate_container : bool
            If set to true, fail for locations that don't match any existing virtual chunk container
        """
        return self._store.set_virtual_ref(
            key, location, offset, length, checksum, validate_container
        )

    async def set_virtual_ref_async(
        self,
        key: str,
        location: str,
        *,
        offset: int,
        length: int,
        checksum: str | datetime | None = None,
        validate_container: bool = True,
    ) -> None:
        """Store a virtual reference to a chunk asynchronously.

        Parameters
        ----------
        key : str
            The chunk to store the reference under. This is the fully qualified zarr key, e.g. 'array/c/0/0/0'
        location : str
            The location of the chunk in storage. This is the absolute path to the chunk in storage, e.g. 's3://bucket/path/to/file.nc'
        offset : int
            The offset in bytes from the start of the file at which the chunk starts
        length : int
            The length of the chunk in bytes, measured from the given offset
        checksum : str | datetime | None
            The etag or last_modified_at field of the object
        validate_container : bool
            If set to true, fail for locations that don't match any existing virtual chunk container
        """
        return await self._store.set_virtual_ref_async(
            key, location, offset, length, checksum, validate_container
        )

    def set_virtual_refs(
        self,
        array_path: str,
        chunks: list[VirtualChunkSpec],
        *,
        validate_containers: bool = True,
    ) -> list[tuple[int, ...]] | None:
        """Store multiple virtual references for the same array.

        Parameters
        ----------
        array_path : str
            The path to the array inside the Zarr store. Example: "/groupA/groupB/outputs/my-array"
        chunks : list[VirtualChunkSpec]
            The list of virtual chunks to add
        validate_containers: bool
            If set to true, ignore virtual references for locations that don't match any existing virtual chunk container


        Returns
        -------
        list[tuple[int, ...]] | None

            If all virtual references were successfully updated, it returns None.
            If there were validation errors, it returns the chunk indices of all failed references.
        """
        return self._store.set_virtual_refs(array_path, chunks, validate_containers)

    async def set_virtual_refs_async(
        self,
        array_path: str,
        chunks: list[VirtualChunkSpec],
        *,
        validate_containers: bool = True,
    ) -> list[tuple[int, ...]] | None:
        """Store multiple virtual references for the same array asynchronously.

        Parameters
        ----------
        array_path : str
            The path to the array inside the Zarr store. Example: "/groupA/groupB/outputs/my-array"
        chunks : list[VirtualChunkSpec]
            The list of virtual chunks to add
        validate_containers: bool
            If set to true, ignore virtual references for locations that don't match any existing virtual chunk container


        Returns
        -------
        list[tuple[int, ...]] | None

            If all virtual references were successfully updated, it returns None.
            If there were validation errors, it returns the chunk indices of all failed references.
        """
        return await self._store.set_virtual_refs_async(
            array_path, chunks, validate_containers
        )

    async def delete(self, key: str) -> None:
        """Remove a key from the store

        Parameters
        ----------
        key : str
        """
        return await self._store.delete(key)

    async def delete_dir(self, prefix: str) -> None:
        """Delete a prefix

        Parameters
        ----------
        prefix : str
        """
        return await self._store.delete_dir(prefix)

    @property
    def supports_partial_writes(self) -> Literal[False]:
        """Does the store support partial writes?

        Partial writes are no longer used by Zarr, so this is always false.
        """
        return self._store.supports_partial_writes  # type: ignore[return-value]

    async def set_partial_values(
        self, key_start_values: Iterable[tuple[str, int, BytesLike]]
    ) -> None:
        """Store values at a given key, starting at byte range_start.

        Parameters
        ----------
        key_start_values : list[tuple[str, int, BytesLike]]
            Set of (key, range_start, value) triples. A key may occur multiple times with
            different range_starts; range_starts (considering the length of the respective
            values) must not specify overlapping ranges for the same key.
        """
        # NOTE: pyo3 does not implicitly convert an Iterable to a Rust iterable, so we convert it
        # to a list here first. Possible opportunity for optimization.
        # NOTE: currently we only implement the case where the values are bytes
        return await self._store.set_partial_values(list(key_start_values))  # type: ignore[arg-type]

    @property
    def supports_listing(self) -> bool:
        """Does the store support listing?"""
        return self._store.supports_listing

    @property
    def supports_consolidated_metadata(self) -> bool:
        return self._store.supports_consolidated_metadata

    @property
    def supports_deletes(self) -> bool:
        return self._store.supports_deletes

    def list(self) -> AsyncIterator[str]:
        """Retrieve all keys in the store.

        Returns
        -------
        AsyncIterator[str]
        """
        # This method should be async, like overridden methods in child classes.
        # However, that's not straightforward:
        # https://stackoverflow.com/questions/68905848

        # The zarr spec specifies that this and other
        # listing methods should not be async, so we need to
        # wrap the async method in a sync method.
        return self._store.list()

    def list_prefix(self, prefix: str) -> AsyncIterator[str]:
        """Retrieve all keys in the store that begin with a given prefix. Keys are returned relative
        to the root of the store.

        Parameters
        ----------
        prefix : str

        Returns
        -------
        AsyncIterator[str]
        """
        # The zarr spec specifies that this and other
        # listing methods should not be async, so we need to
        # wrap the async method in a sync method.
        return self._store.list_prefix(prefix)

    def list_dir(self, prefix: str) -> AsyncIterator[str]:
        """
        Retrieve all keys and prefixes with a given prefix and which do not contain the character
        “/” after the given prefix.

        Parameters
        ----------
        prefix : str

        Returns
        -------
        AsyncIterator[str]
        """
        # The zarr spec specifies that this and other
        # listing methods should not be async, so we need to
        # wrap the async method in a sync method.
        return self._store.list_dir(prefix)

    async def getsize(self, key: str) -> int:
        return await self._store.getsize(key)

    async def getsize_prefix(self, prefix: str) -> int:
        return await self._store.getsize_prefix(prefix)

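Example: an IcechunkStore is normally obtained from a Session rather than constructed directly. A minimal sketch that writes an array through zarr and commits (the array name, shape, and dtype are illustrative):

import zarr
import icechunk

# An in-memory repository keeps the sketch self-contained.
storage = icechunk.in_memory_storage()
repo = icechunk.Repository.create(storage)
session = repo.writable_session("main")
store = session.store  # an IcechunkStore

group = zarr.group(store=store)
arr = group.create_array("my-array", shape=(10,), dtype="int32")
arr[:] = 42
session.commit("write my-array")
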
supports_listing property #

supports_listing

Does the store support listing?

supports_partial_writes property #

supports_partial_writes

Does the store support partial writes?

Partial writes are no longer used by Zarr, so this is always false.

supports_writes property #

supports_writes

Does the store support writes?

__init__ #

__init__(store, for_fork, read_only=None, *args, **kwargs)

Create a new IcechunkStore.

This should not be called directly; instead use the create, open_existing or open_or_create class methods.

Source code in icechunk-python/python/icechunk/store.py
def __init__(
    self,
    store: PyStore,
    for_fork: bool,
    read_only: bool | None = None,
    *args: Any,
    **kwargs: Any,
):
    """Create a new IcechunkStore.

    This should not be called directly; instead use the `create`, `open_existing` or `open_or_create` class methods.
    """
    read_only = read_only if read_only is not None else store.read_only
    super().__init__(read_only=read_only)
    if store is None:
        raise ValueError(
            "An IcechunkStore should not be created with the default constructor, instead use either the create or open_existing class methods."
        )
    self._store = store
    self._is_open = True
    self._for_fork = for_fork

clear async #

clear()

Clear the store.

This will remove all contents from the current session, including all groups and all arrays. But it will not modify the repository history.

Source code in icechunk-python/python/icechunk/store.py
async def clear(self) -> None:
    """Clear the store.

    This will remove all contents from the current session,
    including all groups and all arrays. But it will not modify the repository history.
    """
    return await self._store.clear()

delete async #

delete(key)

Remove a key from the store

Parameters:

Name Type Description Default
key str
required
Source code in icechunk-python/python/icechunk/store.py
async def delete(self, key: str) -> None:
    """Remove a key from the store

    Parameters
    ----------
    key : str
    """
    return await self._store.delete(key)

delete_dir async #

delete_dir(prefix)

Delete a prefix

Parameters:

Name Type Description Default
prefix str
required
Source code in icechunk-python/python/icechunk/store.py
async def delete_dir(self, prefix: str) -> None:
    """Delete a prefix

    Parameters
    ----------
    prefix : str
    """
    return await self._store.delete_dir(prefix)

exists async #

exists(key)

Check if a key exists in the store.

Parameters:

Name Type Description Default
key str
required

Returns:

Type Description
bool
Source code in icechunk-python/python/icechunk/store.py
async def exists(self, key: str) -> bool:
    """Check if a key exists in the store.

    Parameters
    ----------
    key : str

    Returns
    -------
    bool
    """
    return await self._store.exists(key)

get async #

get(key, prototype, byte_range=None)

Retrieve the value associated with a given key.

Parameters:

Name Type Description Default
key str
required
byte_range ByteRequest

ByteRequest may be one of the following. If not provided, all data associated with the key is retrieved.

  • RangeByteRequest(int, int): Request a specific range of bytes in the form (start, end). The end is exclusive. If the given range is zero-length or starts after the end of the object, an error will be returned. Additionally, if the range ends after the end of the object, the entire remainder of the object will be returned. Otherwise, the exact requested range will be returned.
  • OffsetByteRequest(int): Request all bytes starting from a given byte offset. This is equivalent to bytes={int}- as an HTTP header.
  • SuffixByteRequest(int): Request the last int bytes. Note that here, int is the size of the request, not the byte offset. This is equivalent to bytes=-{int} as an HTTP header.
None

Returns:

Type Description
Buffer
Source code in icechunk-python/python/icechunk/store.py
async def get(
    self,
    key: str,
    prototype: BufferPrototype,
    byte_range: ByteRequest | None = None,
) -> Buffer | None:
    """Retrieve the value associated with a given key.

    Parameters
    ----------
    key : str
    byte_range : ByteRequest, optional

        ByteRequest may be one of the following. If not provided, all data associated with the key is retrieved.

        - RangeByteRequest(int, int): Request a specific range of bytes in the form (start, end). The end is exclusive. If the given range is zero-length or starts after the end of the object, an error will be returned. Additionally, if the range ends after the end of the object, the entire remainder of the object will be returned. Otherwise, the exact requested range will be returned.
        - OffsetByteRequest(int): Request all bytes starting from a given byte offset. This is equivalent to bytes={int}- as an HTTP header.
        - SuffixByteRequest(int): Request the last int bytes. Note that here, int is the size of the request, not the byte offset. This is equivalent to bytes=-{int} as an HTTP header.

    Returns
    -------
    Buffer
    """

    try:
        result = await self._store.get(key, _byte_request_to_tuple(byte_range))
    except KeyError as _e:
        # Zarr python expects None to be returned if the key does not exist
        # but an IcechunkStore returns an error if the key does not exist
        return None

    return prototype.buffer.from_bytes(result)

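Example: a minimal sketch calling get directly, reusing the store from the sketch above; this is the low-level entry point zarr itself uses, with the buffer prototype taken from zarr's buffer module:

import asyncio
from zarr.core.buffer import default_buffer_prototype

async def read_key(store, key):
    # Returns None when the key does not exist, mirroring zarr's expectations.
    buf = await store.get(key, default_buffer_prototype())
    return None if buf is None else buf.to_bytes()

raw = asyncio.run(read_key(store, "zarr.json"))
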
get_partial_values async #

get_partial_values(prototype, key_ranges)

Retrieve possibly partial values from given key_ranges.

Parameters:

Name Type Description Default
key_ranges Iterable[tuple[str, ByteRequest | None]]

Ordered set of (key, range) pairs; a key may occur multiple times with different ranges

required

Returns:

Type Description
List of values in the order of the key_ranges; may contain None for missing keys
Source code in icechunk-python/python/icechunk/store.py
async def get_partial_values(
    self,
    prototype: BufferPrototype,
    key_ranges: Iterable[tuple[str, ByteRequest | None]],
) -> list[Buffer | None]:
    """Retrieve possibly partial values from given key_ranges.

    Parameters
    ----------
    key_ranges : Iterable[tuple[str, ByteRequest | None]]
        Ordered set of (key, range) pairs; a key may occur multiple times with different ranges

    Returns
    -------
    List of values in the order of the key_ranges; may contain None for missing keys
    """
    # NOTE: pyo3 has no implicit conversion from an Iterable to a Rust iterable, so we convert it
    # to a list here first. Possible opportunity for optimization.
    ranges = [(k[0], _byte_request_to_tuple(k[1])) for k in key_ranges]
    result = await self._store.get_partial_values(list(ranges))
    return [prototype.buffer.from_bytes(r) for r in result]

is_empty async #

is_empty(prefix)

Check if the directory is empty.

Parameters:

Name Type Description Default
prefix str

Prefix of keys to check.

required

Returns:

Type Description
bool

True if the store is empty, False otherwise.

Source code in icechunk-python/python/icechunk/store.py
async def is_empty(self, prefix: str) -> bool:
    """
    Check if the directory is empty.

    Parameters
    ----------
    prefix : str
        Prefix of keys to check.

    Returns
    -------
    bool
        True if the store is empty, False otherwise.
    """
    return await self._store.is_empty(prefix)

list #

list()

Retrieve all keys in the store.

Returns:

Type Description
AsyncIterator[str]
Source code in icechunk-python/python/icechunk/store.py
def list(self) -> AsyncIterator[str]:
    """Retrieve all keys in the store.

    Returns
    -------
    AsyncIterator[str]
    """
    # This method should be async, like overridden methods in child classes.
    # However, that's not straightforward:
    # https://stackoverflow.com/questions/68905848

    # The zarr spec specifies that this and other
    # listing methods should not be async, so we need to
    # wrap the async method in a sync method.
    return self._store.list()

list_dir #

list_dir(prefix)

Retrieve all keys and prefixes with a given prefix and which do not contain the character “/” after the given prefix.

Parameters:

Name Type Description Default
prefix str
required

Returns:

Type Description
AsyncIterator[str]
Source code in icechunk-python/python/icechunk/store.py
def list_dir(self, prefix: str) -> AsyncIterator[str]:
    """
    Retrieve all keys and prefixes with a given prefix and which do not contain the character
    “/” after the given prefix.

    Parameters
    ----------
    prefix : str

    Returns
    -------
    AsyncIterator[str]
    """
    # The zarr spec specifies that this and other
    # listing methods should not be async, so we need to
    # wrap the async method in a sync method.
    return self._store.list_dir(prefix)

list_prefix #

list_prefix(prefix)

Retrieve all keys in the store that begin with a given prefix. Keys are returned relative to the root of the store.

Parameters:

Name Type Description Default
prefix str
required

Returns:

Type Description
AsyncIterator[str]
Source code in icechunk-python/python/icechunk/store.py
def list_prefix(self, prefix: str) -> AsyncIterator[str]:
    """Retrieve all keys in the store that begin with a given prefix. Keys are returned relative
    to the root of the store.

    Parameters
    ----------
    prefix : str

    Returns
    -------
    AsyncIterator[str]
    """
    # The zarr spec specifies that this and other
    # listing methods should not be async, so we need to
    # wrap the async method in a sync method.
    return self._store.list_prefix(prefix)

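Example: the listing methods are synchronous but return async iterators. A minimal sketch collecting all keys under a prefix, reusing the store from the earlier sketch (the prefix is illustrative):

import asyncio

async def keys_under(store, prefix):
    return [key async for key in store.list_prefix(prefix)]

keys = asyncio.run(keys_under(store, "my-array/"))
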
set async #

set(key, value)

Store a (key, value) pair.

Parameters:

Name Type Description Default
key str
required
value Buffer
required
Source code in icechunk-python/python/icechunk/store.py
async def set(self, key: str, value: Buffer) -> None:
    """Store a (key, value) pair.

    Parameters
    ----------
    key : str
    value : Buffer
    """
    if not isinstance(value, Buffer):
        raise TypeError(
            f"IcechunkStore.set(): `value` must be a Buffer instance. Got an instance of {type(value)} instead."
        )
    return await self._store.set(key, value.to_bytes())

set_if_not_exists async #

set_if_not_exists(key, value)

Store a key to value if the key is not already present.

Parameters:

Name Type Description Default
key str
required
value Buffer
required
Source code in icechunk-python/python/icechunk/store.py
async def set_if_not_exists(self, key: str, value: Buffer) -> None:
    """
    Store a key to ``value`` if the key is not already present.

    Parameters
    ----------
    key : str
    value : Buffer
    """
    return await self._store.set_if_not_exists(key, value.to_bytes())

set_partial_values async #

set_partial_values(key_start_values)

Store values at a given key, starting at byte range_start.

Parameters:

Name Type Description Default
key_start_values list[tuple[str, int, BytesLike]]

Set of (key, range_start, value) triples. A key may occur multiple times with different range_starts; range_starts (considering the length of the respective values) must not specify overlapping ranges for the same key.

required
Source code in icechunk-python/python/icechunk/store.py
async def set_partial_values(
    self, key_start_values: Iterable[tuple[str, int, BytesLike]]
) -> None:
    """Store values at a given key, starting at byte range_start.

    Parameters
    ----------
    key_start_values : list[tuple[str, int, BytesLike]]
        Set of (key, range_start, value) triples. A key may occur multiple times with
        different range_starts; range_starts (considering the length of the respective
        values) must not specify overlapping ranges for the same key.
    """
    # NOTE: pyo3 does not implicitly convert an Iterable to a Rust iterable, so we convert it
    # to a list here first. Possible opportunity for optimization.
    # NOTE: currently we only implement the case where the values are bytes
    return await self._store.set_partial_values(list(key_start_values))  # type: ignore[arg-type]

set_virtual_ref #

set_virtual_ref(key, location, *, offset, length, checksum=None, validate_container=True)

Store a virtual reference to a chunk.

Parameters:

Name Type Description Default
key str

The chunk to store the reference under. This is the fully qualified zarr key, e.g. 'array/c/0/0/0'

required
location str

The location of the chunk in storage. This is the absolute path to the chunk in storage, e.g. 's3://bucket/path/to/file.nc'

required
offset int

The offset in bytes from the start of the file at which the chunk starts

required
length int

The length of the chunk in bytes, measured from the given offset

required
checksum str | datetime | None

The etag or last_modified_at field of the object

None
validate_container bool

If set to true, fail for locations that don't match any existing virtual chunk container

True
Source code in icechunk-python/python/icechunk/store.py
def set_virtual_ref(
    self,
    key: str,
    location: str,
    *,
    offset: int,
    length: int,
    checksum: str | datetime | None = None,
    validate_container: bool = True,
) -> None:
    """Store a virtual reference to a chunk.

    Parameters
    ----------
    key : str
        The chunk to store the reference under. This is the fully qualified zarr key, e.g. 'array/c/0/0/0'
    location : str
        The location of the chunk in storage. This is the absolute path to the chunk in storage, e.g. 's3://bucket/path/to/file.nc'
    offset : int
        The offset in bytes from the start of the file at which the chunk starts
    length : int
        The length of the chunk in bytes, measured from the given offset
    checksum : str | datetime | None
        The etag or last_modified_at field of the object
    validate_container : bool
        If set to true, fail for locations that don't match any existing virtual chunk container
    """
    return self._store.set_virtual_ref(
        key, location, offset, length, checksum, validate_container
    )

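Example: a minimal sketch recording that a chunk lives inside a pre-existing object on S3, reusing a writable store. The key, bucket, offset, and length are placeholders, and the repository is assumed to have a virtual chunk container covering the s3:// prefix (otherwise validation fails):

# Chunk (0, 0, 0) of "array" is backed by a byte range of an external file.
store.set_virtual_ref(
    "array/c/0/0/0",
    "s3://my-bucket/path/to/file.nc",
    offset=1024,
    length=4096,
)
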
set_virtual_ref_async async #

set_virtual_ref_async(key, location, *, offset, length, checksum=None, validate_container=True)

Store a virtual reference to a chunk asynchronously.

Parameters:

Name Type Description Default
key str

The chunk to store the reference under. This is the fully qualified zarr key, e.g. 'array/c/0/0/0'

required
location str

The location of the chunk in storage. This is the absolute path to the chunk in storage, e.g. 's3://bucket/path/to/file.nc'

required
offset int

The offset in bytes from the start of the file at which the chunk starts

required
length int

The length of the chunk in bytes, measured from the given offset

required
checksum str | datetime | None

The etag or last_modified_at field of the object

None
validate_container bool

If set to true, fail for locations that don't match any existing virtual chunk container

True
Source code in icechunk-python/python/icechunk/store.py
async def set_virtual_ref_async(
    self,
    key: str,
    location: str,
    *,
    offset: int,
    length: int,
    checksum: str | datetime | None = None,
    validate_container: bool = True,
) -> None:
    """Store a virtual reference to a chunk asynchronously.

    Parameters
    ----------
    key : str
        The chunk to store the reference under. This is the fully qualified zarr key, e.g. 'array/c/0/0/0'
    location : str
        The location of the chunk in storage. This is the absolute path to the chunk in storage, e.g. 's3://bucket/path/to/file.nc'
    offset : int
        The offset in bytes from the start of the file at which the chunk starts
    length : int
        The length of the chunk in bytes, measured from the given offset
    checksum : str | datetime | None
        The etag or last_modified_at field of the object
    validate_container : bool
        If set to true, fail for locations that don't match any existing virtual chunk container
    """
    return await self._store.set_virtual_ref_async(
        key, location, offset, length, checksum, validate_container
    )

set_virtual_refs #

set_virtual_refs(array_path, chunks, *, validate_containers=True)

Store multiple virtual references for the same array.

Parameters:

Name Type Description Default
array_path str

The path to the array inside the Zarr store. Example: "/groupA/groupB/outputs/my-array"

required
chunks list[VirtualChunkSpec]

The list of virtual chunks to add

required
validate_containers bool

If set to true, ignore virtual references for locations that don't match any existing virtual chunk container

True

Returns:

Type Description
list[tuple[int, ...]] | None

If all virtual references were successfully updated, it returns None. If there were validation errors, it returns the chunk indices of all failed references.

Source code in icechunk-python/python/icechunk/store.py
def set_virtual_refs(
    self,
    array_path: str,
    chunks: list[VirtualChunkSpec],
    *,
    validate_containers: bool = True,
) -> list[tuple[int, ...]] | None:
    """Store multiple virtual references for the same array.

    Parameters
    ----------
    array_path : str
        The path to the array inside the Zarr store. Example: "/groupA/groupB/outputs/my-array"
    chunks : list[VirtualChunkSpec]
        The list of virtual chunks to add
    validate_containers: bool
        If set to true, ignore virtual references for locations that don't match any existing virtual chunk container


    Returns
    -------
    list[tuple[int, ...]] | None

        If all virtual references were successfully updated, it returns None.
        If there were validation errors, it returns the chunk indices of all failed references.
    """
    return self._store.set_virtual_refs(array_path, chunks, validate_containers)

set_virtual_refs_async async #

set_virtual_refs_async(array_path, chunks, *, validate_containers=True)

Store multiple virtual references for the same array asynchronously.

Parameters:

Name Type Description Default
array_path str

The path to the array inside the Zarr store. Example: "/groupA/groupB/outputs/my-array"

required
chunks list[VirtualChunkSpec]

The list of virtual chunks to add

required
validate_containers bool

If set to true, ignore virtual references for locations that don't match any existing virtual chunk container

True

Returns:

Type Description
list[tuple[int, ...]] | None

If all virtual references were successfully updated, it returns None. If there were validation errors, it returns the chunk indices of all failed references.

Source code in icechunk-python/python/icechunk/store.py
async def set_virtual_refs_async(
    self,
    array_path: str,
    chunks: list[VirtualChunkSpec],
    *,
    validate_containers: bool = True,
) -> list[tuple[int, ...]] | None:
    """Store multiple virtual references for the same array asynchronously.

    Parameters
    ----------
    array_path : str
        The path to the array inside the Zarr store. Example: "/groupA/groupB/outputs/my-array"
    chunks : list[VirtualChunkSpec]
        The list of virtual chunks to add
    validate_containers: bool
        If set to true, ignore virtual references for locations that don't match any existing virtual chunk container


    Returns
    -------
    list[tuple[int, ...]] | None

        If all virtual references were successfully updated, it returns None.
        If there were validation errors, it returns the chunk indices of all failed references.
    """
    return await self._store.set_virtual_refs_async(
        array_path, chunks, validate_containers
    )

sync_clear #

sync_clear()

Clear the store.

This will remove all contents from the current session, including all groups and all arrays. But it will not modify the repository history.

Source code in icechunk-python/python/icechunk/store.py
def sync_clear(self) -> None:
    """Clear the store.

    This will remove all contents from the current session,
    including all groups and all arrays. But it will not modify the repository history.
    """
    return self._store.sync_clear()

ManifestConfig #

Configuration for how Icechunk manages manifests

Methods:

Name Description
__init__

Create a new ManifestConfig object

Attributes:

Name Type Description
preload ManifestPreloadConfig | None

The configuration for how Icechunk manifests will be preloaded.

splitting ManifestSplittingConfig | None

The configuration for how Icechunk manifests will be split.

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class ManifestConfig:
    """Configuration for how Icechunk manifests"""

    def __init__(
        self,
        preload: ManifestPreloadConfig | None = None,
        splitting: ManifestSplittingConfig | None = None,
    ) -> None:
        """
        Create a new `ManifestConfig` object

        Parameters
        ----------
        preload: ManifestPreloadConfig | None
            The configuration for how Icechunk manifests will be preloaded.
        splitting: ManifestSplittingConfig | None
            The configuration for how Icechunk manifests will be split.
        """
        ...
    @property
    def preload(self) -> ManifestPreloadConfig | None:
        """
        The configuration for how Icechunk manifests will be preloaded.

        Returns
        -------
        ManifestPreloadConfig | None
            The configuration for how Icechunk manifests will be preloaded.
        """
        ...
    @preload.setter
    def preload(self, value: ManifestPreloadConfig | None) -> None:
        """
        Set the configuration for how Icechunk manifests will be preloaded.

        Parameters
        ----------
        value: ManifestPreloadConfig | None
            The configuration for how Icechunk manifests will be preloaded.
        """
        ...

    @property
    def splitting(self) -> ManifestSplittingConfig | None:
        """
        The configuration for how Icechunk manifests will be split.

        Returns
        -------
        ManifestSplittingConfig | None
            The configuration for how Icechunk manifests will be split.
        """
        ...

    @splitting.setter
    def splitting(self, value: ManifestSplittingConfig | None) -> None:
        """
        Set the configuration for how Icechunk manifests will be split.

        Parameters
        ----------
        value: ManifestSplittingConfig | None
            The configuration for how Icechunk manifests will be split.
        """
        ...

preload property writable #

preload

The configuration for how Icechunk manifests will be preloaded.

Returns:

Type Description
ManifestPreloadConfig | None

The configuration for how Icechunk manifests will be preloaded.

splitting property writable #

splitting

The configuration for how Icechunk manifests will be split.

Returns:

Type Description
ManifestSplittingConfig | None

The configuration for how Icechunk manifests will be split.

__init__ #

__init__(preload=None, splitting=None)

Create a new ManifestConfig object

Parameters:

Name Type Description Default
preload ManifestPreloadConfig | None

The configuration for how Icechunk manifests will be preloaded.

None
splitting ManifestSplittingConfig | None

The configuration for how Icechunk manifests will be split.

None
Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
def __init__(
    self,
    preload: ManifestPreloadConfig | None = None,
    splitting: ManifestSplittingConfig | None = None,
) -> None:
    """
    Create a new `ManifestConfig` object

    Parameters
    ----------
    preload: ManifestPreloadConfig | None
        The configuration for how Icechunk manifests will be preloaded.
    splitting: ManifestSplittingConfig | None
        The configuration for how Icechunk manifests will be split.
    """
    ...

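Example: a minimal sketch combining the classes above: preload manifests only for arrays whose name contains "temp", capped at 10,000 chunk references in total (the threshold and regex are illustrative, not recommendations):

from icechunk import (
    ManifestConfig,
    ManifestPreloadCondition,
    ManifestPreloadConfig,
)

config = ManifestConfig(
    preload=ManifestPreloadConfig(
        max_total_refs=10_000,
        preload_if=ManifestPreloadCondition.name_matches(".*temp.*"),
    ),
)
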
ManifestFileInfo #

Manifest file metadata

Attributes:

Name Type Description
id str

The manifest id

num_chunk_refs int

The number of chunk references contained in this manifest

size_bytes int

The size in bytes of the manifest file

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class ManifestFileInfo:
    """Manifest file metadata"""

    @property
    def id(self) -> str:
        """The manifest id"""
        ...
    @property
    def size_bytes(self) -> int:
        """The size in bytes of the"""
        ...
    @property
    def num_chunk_refs(self) -> int:
        """The number of chunk references contained in this manifest"""
        ...

id property #

id

The manifest id

num_chunk_refs property #

num_chunk_refs

The number of chunk references contained in this manifest

size_bytes property #

size_bytes

The size in bytes of the manifest file

ManifestPreloadCondition #

Configuration for conditions under which manifests will be preloaded on session creation

Methods:

Name Description
__and__

Create a preload condition that matches if both this condition and other match.

__or__

Create a preload condition that matches if either this condition or other match.

and_conditions

Create a preload condition that matches only if all passed conditions match

false

Create a preload condition that never matches any manifests

name_matches

Create a preload condition that matches if the array's name matches the passed regex.

num_refs

Create a preload condition that matches only if the number of chunk references in the manifest is within the given range.

or_conditions

Create a preload condition that matches if any of conditions matches

path_matches

Create a preload condition that matches if the full path to the array matches the passed regex.

true

Create a preload condition that always matches any manifest

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class ManifestPreloadCondition:
    """Configuration for conditions under which manifests will preload on session creation"""

    @staticmethod
    def or_conditions(
        conditions: list[ManifestPreloadCondition],
    ) -> ManifestPreloadCondition:
        """Create a preload condition that matches if any of `conditions` matches"""
        ...
    @staticmethod
    def and_conditions(
        conditions: list[ManifestPreloadCondition],
    ) -> ManifestPreloadCondition:
        """Create a preload condition that matches only if all passed `conditions` match"""
        ...
    @staticmethod
    def path_matches(regex: str) -> ManifestPreloadCondition:
        """Create a preload condition that matches if the full path to the array matches the passed regex.

        Array paths are absolute, as in `/path/to/my/array`
        """
        ...
    @staticmethod
    def name_matches(regex: str) -> ManifestPreloadCondition:
        """Create a preload condition that matches if the array's name matches the passed regex.

        For example, for an array `/model/outputs/temperature`, the following will match:
        ```
        name_matches(".*temp.*")
        ```
        """
        ...
    @staticmethod
    def num_refs(from_refs: int | None, to_refs: int | None) -> ManifestPreloadCondition:
        """Create a preload condition that matches only if the number of chunk references in the manifest is within the given range.

        from_refs is inclusive, to_refs is exclusive.
        """
        ...
    @staticmethod
    def true() -> ManifestPreloadCondition:
        """Create a preload condition that always matches any manifest"""
        ...
    @staticmethod
    def false() -> ManifestPreloadCondition:
        """Create a preload condition that never matches any manifests"""
        ...
    def __and__(self, other: ManifestPreloadCondition) -> ManifestPreloadCondition:
        """Create a preload condition that matches if both this condition and `other` match."""
        ...
    def __or__(self, other: ManifestPreloadCondition) -> ManifestPreloadCondition:
        """Create a preload condition that matches if either this condition or `other` match."""
        ...

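Example: conditions compose with & and |, equivalent to and_conditions and or_conditions. A minimal sketch matching arrays under /model/ whose manifests hold fewer than 1,000 chunk references (the path and threshold are illustrative):

from icechunk import ManifestPreloadCondition as Cond

# num_refs(None, 1_000): no lower bound, exclusive upper bound of 1_000.
condition = Cond.path_matches("^/model/.*") & Cond.num_refs(None, 1_000)
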
__and__ #

__and__(other)

Create a preload condition that matches if both this condition and other match.

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
def __and__(self, other: ManifestPreloadCondition) -> ManifestPreloadCondition:
    """Create a preload condition that matches if both this condition and `other` match."""
    ...

__or__ #

__or__(other)

Create a preload condition that matches if either this condition or other match.

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
def __or__(self, other: ManifestPreloadCondition) -> ManifestPreloadCondition:
    """Create a preload condition that matches if either this condition or `other` match."""
    ...

and_conditions staticmethod #

and_conditions(conditions)

Create a preload condition that matches only if all passed conditions match

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
@staticmethod
def and_conditions(
    conditions: list[ManifestPreloadCondition],
) -> ManifestPreloadCondition:
    """Create a preload condition that matches only if all passed `conditions` match"""
    ...

false staticmethod #

false()

Create a preload condition that never matches any manifests

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
@staticmethod
def false() -> ManifestPreloadCondition:
    """Create a preload condition that never matches any manifests"""
    ...

name_matches staticmethod #

name_matches(regex)

Create a preload condition that matches if the array's name matches the passed regex.

For example, for an array /model/outputs/temperature, the following will match:

name_matches(".*temp.*")

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
@staticmethod
def name_matches(regex: str) -> ManifestPreloadCondition:
    """Create a preload condition that matches if the array's name matches the passed regex.

    For example, for an array `/model/outputs/temperature`, the following will match:
    ```
    name_matches(".*temp.*")
    ```
    """
    ...

num_refs staticmethod #

num_refs(from_refs, to_refs)

Create a preload condition that matches only if the number of chunk references in the manifest is within the given range.

from_refs is inclusive, to_refs is exclusive.

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
@staticmethod
def num_refs(from_refs: int | None, to_refs: int | None) -> ManifestPreloadCondition:
    """Create a preload condition that matches only if the number of chunk references in the manifest is within the given range.

    from_refs is inclusive, to_refs is exclusive.
    """
    ...

or_conditions staticmethod #

or_conditions(conditions)

Create a preload condition that matches if any of conditions matches

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
@staticmethod
def or_conditions(
    conditions: list[ManifestPreloadCondition],
) -> ManifestPreloadCondition:
    """Create a preload condition that matches if any of `conditions` matches"""
    ...

path_matches staticmethod #

path_matches(regex)

Create a preload condition that matches if the full path to the array matches the passed regex.

Array paths are absolute, as in /path/to/my/array

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
@staticmethod
def path_matches(regex: str) -> ManifestPreloadCondition:
    """Create a preload condition that matches if the full path to the array matches the passed regex.

    Array paths are absolute, as in `/path/to/my/array`
    """
    ...

true staticmethod #

true()

Create a preload condition that always matches any manifest

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
@staticmethod
def true() -> ManifestPreloadCondition:
    """Create a preload condition that always matches any manifest"""
    ...

ManifestPreloadConfig #

Configuration for how Icechunk manifests are preloaded on session creation

Methods:

Name Description
__init__

Create a new ManifestPreloadConfig object

Attributes:

Name Type Description
max_arrays_to_scan int | None

The maximum number of arrays to scan when looking for manifests to preload.

max_total_refs int | None

The maximum number of references to preload.

preload_if ManifestPreloadCondition | None

The condition under which manifests will be preloaded.

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class ManifestPreloadConfig:
    """Configuration for how Icechunk manifest preload on session creation"""

    def __init__(
        self,
        max_total_refs: int | None = None,
        preload_if: ManifestPreloadCondition | None = None,
        max_arrays_to_scan: int | None = None,
    ) -> None:
        """
        Create a new `ManifestPreloadConfig` object

        Parameters
        ----------
        max_total_refs: int | None
            The maximum number of references to preload.
        preload_if: ManifestPreloadCondition | None
            The condition under which manifests will be preloaded.
        max_arrays_to_scan: int | None
            The maximum number of arrays to scan when looking for manifests to preload.
            Default is 50. Increase for repositories with many nested groups.
        """
        ...
    @property
    def max_total_refs(self) -> int | None:
        """
        The maximum number of references to preload.

        Returns
        -------
        int | None
            The maximum number of references to preload.
        """
        ...
    @max_total_refs.setter
    def max_total_refs(self, value: int | None) -> None:
        """
        Set the maximum number of references to preload.

        Parameters
        ----------
        value: int | None
            The maximum number of references to preload.
        """
        ...
    @property
    def preload_if(self) -> ManifestPreloadCondition | None:
        """
        The condition under which manifests will be preloaded.

        Returns
        -------
        ManifestPreloadCondition | None
            The condition under which manifests will be preloaded.
        """
        ...
    @preload_if.setter
    def preload_if(self, value: ManifestPreloadCondition | None) -> None:
        """
        Set the condition under which manifests will be preloaded.

        Parameters
        ----------
        value: ManifestPreloadCondition | None
            The condition under which manifests will be preloaded.
        """
        ...
    @property
    def max_arrays_to_scan(self) -> int | None:
        """
        The maximum number of arrays to scan when looking for manifests to preload.

        Returns
        -------
        int | None
            The maximum number of arrays to scan. Default is 50.
        """
        ...
    @max_arrays_to_scan.setter
    def max_arrays_to_scan(self, value: int | None) -> None:
        """
        Set the maximum number of arrays to scan when looking for manifests to preload.

        Parameters
        ----------
        value: int | None
            The maximum number of arrays to scan.
        """
        ...

max_arrays_to_scan property writable #

max_arrays_to_scan

The maximum number of arrays to scan when looking for manifests to preload.

Returns:

Type Description
int | None

The maximum number of arrays to scan. Default is 50.

max_total_refs property writable #

max_total_refs

The maximum number of references to preload.

Returns:

Type Description
int | None

The maximum number of references to preload.

preload_if property writable #

preload_if

The condition under which manifests will be preloaded.

Returns:

Type Description
ManifestPreloadCondition | None

The condition under which manifests will be preloaded.

__init__ #

__init__(max_total_refs=None, preload_if=None, max_arrays_to_scan=None)

Create a new ManifestPreloadConfig object

Parameters:

Name Type Description Default
max_total_refs int | None

The maximum number of references to preload.

None
preload_if ManifestPreloadCondition | None

The condition under which manifests will be preloaded.

None
max_arrays_to_scan int | None

The maximum number of arrays to scan when looking for manifests to preload. Default is 50. Increase for repositories with many nested groups.

None
Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
def __init__(
    self,
    max_total_refs: int | None = None,
    preload_if: ManifestPreloadCondition | None = None,
    max_arrays_to_scan: int | None = None,
) -> None:
    """
    Create a new `ManifestPreloadConfig` object

    Parameters
    ----------
    max_total_refs: int | None
        The maximum number of references to preload.
    preload_if: ManifestPreloadCondition | None
        The condition under which manifests will be preloaded.
    max_arrays_to_scan: int | None
        The maximum number of arrays to scan when looking for manifests to preload.
        Default is 50. Increase for repositories with many nested groups.
    """
    ...

ManifestSplitCondition #

Configuration for conditions under which manifests will be split into splits

Methods:

Name Description
AnyArray

Create a splitting condition that matches any array.

__and__

Create a splitting condition that matches if both this condition and other match

__or__

Create a splitting condition that matches if either this condition or other matches

and_conditions

Create a splitting condition that matches only if all passed conditions match

name_matches

Create a splitting condition that matches if the array's name matches the passed regex.

or_conditions

Create a splitting condition that matches if any of conditions matches

path_matches

Create a splitting condition that matches if the full path to the array matches the passed regex.

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class ManifestSplitCondition:
    """Configuration for conditions under which manifests will be split into splits"""

    @staticmethod
    def or_conditions(
        conditions: list[ManifestSplitCondition],
    ) -> ManifestSplitCondition:
        """Create a splitting condition that matches if any of `conditions` matches"""
        ...
    @staticmethod
    def and_conditions(
        conditions: list[ManifestSplitCondition],
    ) -> ManifestSplitCondition:
        """Create a splitting condition that matches only if all passed `conditions` match"""
        ...
    @staticmethod
    def path_matches(regex: str) -> ManifestSplitCondition:
        """Create a splitting condition that matches if the full path to the array matches the passed regex.

        Array paths are absolute, as in `/path/to/my/array`
        """
        ...
    @staticmethod
    def name_matches(regex: str) -> ManifestSplitCondition:
        """Create a splitting condition that matches if the array's name matches the passed regex.

        For example, for an array `/model/outputs/temperature`, the following will match:
        ```
        name_matches(".*temp.*")
        ```
        """
        ...

    @staticmethod
    def AnyArray() -> ManifestSplitCondition:
        """Create a splitting condition that matches any array."""
        ...

    def __or__(self, other: ManifestSplitCondition) -> ManifestSplitCondition:
        """Create a splitting condition that matches if either this condition or `other` matches"""
        ...

    def __and__(self, other: ManifestSplitCondition) -> ManifestSplitCondition:
        """Create a splitting condition that matches if both this condition and `other` match"""
        ...
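
Conditions compose with the `|` and `&` operators (equivalently, `or_conditions` and `and_conditions`). A short illustrative sketch using only the methods documented above:

>>> by_name = ManifestSplitCondition.name_matches(".*temp.*")
>>> by_path = ManifestSplitCondition.path_matches("^/model/")
>>> either = by_name | by_path
>>> both = by_name & by_path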

AnyArray staticmethod #

AnyArray()

Create a splitting condition that matches any array.

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
@staticmethod
def AnyArray() -> ManifestSplitCondition:
    """Create a splitting condition that matches any array."""
    ...

__and__ #

__and__(other)

Create a splitting condition that matches if both this condition and other match

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
def __and__(self, other: ManifestSplitCondition) -> ManifestSplitCondition:
    """Create a splitting condition that matches if both this condition and `other` match"""
    ...

__or__ #

__or__(other)

Create a splitting condition that matches if either this condition or other matches

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
def __or__(self, other: ManifestSplitCondition) -> ManifestSplitCondition:
    """Create a splitting condition that matches if either this condition or `other` matches"""
    ...

and_conditions staticmethod #

and_conditions(conditions)

Create a splitting condition that matches only if all passed conditions match

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
@staticmethod
def and_conditions(
    conditions: list[ManifestSplitCondition],
) -> ManifestSplitCondition:
    """Create a splitting condition that matches only if all passed `conditions` match"""
    ...

name_matches staticmethod #

name_matches(regex)

Create a splitting condition that matches if the array's name matches the passed regex.

For example, for an array /model/outputs/temperature, the following will match:

name_matches(".*temp.*")

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
@staticmethod
def name_matches(regex: str) -> ManifestSplitCondition:
    """Create a splitting condition that matches if the array's name matches the passed regex.

    For example, for an array `/model/outputs/temperature`, the following will match:
    ```
    name_matches(".*temp.*")
    ```
    """
    ...

or_conditions staticmethod #

or_conditions(conditions)

Create a splitting condition that matches if any of conditions matches

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
@staticmethod
def or_conditions(
    conditions: list[ManifestSplitCondition],
) -> ManifestSplitCondition:
    """Create a splitting condition that matches if any of `conditions` matches"""
    ...

path_matches staticmethod #

path_matches(regex)

Create a splitting condition that matches if the full path to the array matches the passed regex.

Array paths are absolute, as in /path/to/my/array

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
@staticmethod
def path_matches(regex: str) -> ManifestSplitCondition:
    """Create a splitting condition that matches if the full path to the array matches the passed regex.

    Array paths are absolute, as in `/path/to/my/array`
    """
    ...

ManifestSplitDimCondition #

Conditions for specifying dimensions along which to shard manifests.

Classes:

Name Description
Any

Split along any other unspecified dimension.

Axis

Split along specified integer axis.

DimensionName

Split along specified named dimension.

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class ManifestSplitDimCondition:
    """Conditions for specifying dimensions along which to shard manifests."""
    class Axis:
        """Split along specified integer axis."""
        def __init__(self, axis: int) -> None: ...

    class DimensionName:
        """Split along specified named dimension."""
        def __init__(self, regex: str) -> None: ...

    class Any:
        """Split along any other unspecified dimension."""
        def __init__(self) -> None: ...
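
These conditions serve as keys in the per-dimension size mapping passed to ManifestSplittingConfig.from_dict. An illustrative mapping (the sizes are arbitrary):

>>> sizes = {
...     ManifestSplitDimCondition.DimensionName("longitude"): 3,
...     ManifestSplitDimCondition.Axis(0): 5,
...     ManifestSplitDimCondition.Any(): 1,
... }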

Any #

Split along any other unspecified dimension.

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class Any:
    """Split along any other unspecified dimension."""
    def __init__(self) -> None: ...

Axis #

Split along specified integer axis.

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class Axis:
    """Split along specified integer axis."""
    def __init__(self, axis: int) -> None: ...

DimensionName #

Split along specified named dimension.

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class DimensionName:
    """Split along specified named dimension."""
    def __init__(self, regex: str) -> None: ...

ManifestSplittingConfig #

Configuration for manifest splitting.

Methods:

Name Description
__init__

Configuration for how Icechunk manifests will be split.

Attributes:

Name Type Description
split_sizes SplitSizes

Configuration for how Icechunk manifests will be split.

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class ManifestSplittingConfig:
    """Configuration for manifest splitting."""

    @staticmethod
    def from_dict(
        split_sizes: dict[
            ManifestSplitCondition,
            dict[
                ManifestSplitDimCondition.Axis
                | ManifestSplitDimCondition.DimensionName
                | ManifestSplitDimCondition.Any,
                int,
            ],
        ],
    ) -> ManifestSplittingConfig: ...
    def to_dict(
        config: ManifestSplittingConfig,
    ) -> dict[
        ManifestSplitCondition,
        dict[
            ManifestSplitDimCondition.Axis
            | ManifestSplitDimCondition.DimensionName
            | ManifestSplitDimCondition.Any,
            int,
        ],
    ]: ...
    def __init__(self, split_sizes: SplitSizes) -> None:
        """Configuration for how Icechunk manifests will be split.

        Parameters
        ----------
        split_sizes: tuple[tuple[ManifestSplitCondition, tuple[tuple[ManifestSplitDimCondition, int], ...]], ...]
            The configuration for how Icechunk manifests will be split.

        Examples
        --------

        Split manifests for the `temperature` array, with 3 chunks per shard along the `longitude` dimension.
        >>> ManifestSplittingConfig.from_dict(
        ...     {
        ...         ManifestSplitCondition.name_matches("temperature"): {
        ...             ManifestSplitDimCondition.DimensionName("longitude"): 3
        ...         }
        ...     }
        ... )
        """
        pass

    @property
    def split_sizes(self) -> SplitSizes:
        """
        Configuration for how Icechunk manifests will be split.

        Returns
        -------
        tuple[tuple[ManifestSplitCondition, tuple[tuple[ManifestSplitDimCondition, int], ...]], ...]
            The configuration for how Icechunk manifests will be split.
        """
        ...

    @split_sizes.setter
    def split_sizes(self, value: SplitSizes) -> None:
        """
        Set the sizes for how Icechunk manifests will be split.

        Parameters
        ----------
        value: tuple[tuple[ManifestSplitCondition, tuple[tuple[ManifestSplitDimCondition, int], ...]], ...]
            The configuration for how Icechunk manifests will be split.
        """
        ...

split_sizes property writable #

split_sizes

Configuration for how Icechunk manifests will be split.

Returns:

Type Description
tuple[tuple[ManifestSplitCondition, tuple[tuple[ManifestSplitDimCondition, int], ...]], ...]

The configuration for how Icechunk manifests will be split.

__init__ #

__init__(split_sizes)

Configuration for how Icechunk manifests will be split.

Parameters:

Name Type Description Default
split_sizes SplitSizes

The configuration for how Icechunk manifests will be split.

required

Examples:

Split manifests for the temperature array, with 3 chunks per shard along the longitude dimension.

>>> ManifestSplittingConfig.from_dict(
...     {
...         ManifestSplitCondition.name_matches("temperature"): {
...             ManifestSplitDimCondition.DimensionName("longitude"): 3
...         }
...     }
... )
Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
def __init__(self, split_sizes: SplitSizes) -> None:
    """Configuration for how Icechunk manifests will be split.

    Parameters
    ----------
    split_sizes: tuple[tuple[ManifestSplitCondition, tuple[tuple[ManifestSplitDimCondition, int], ...]], ...]
        The configuration for how Icechunk manifests will be split.

    Examples
    --------

    Split manifests for the `temperature` array, with 3 chunks per shard along the `longitude` dimension.
    >>> ManifestSplittingConfig.from_dict(
    ...     {
    ...         ManifestSplitCondition.name_matches("temperature"): {
    ...             ManifestSplitDimCondition.DimensionName("longitude"): 3
    ...         }
    ...     }
    ... )
    """
    pass
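
To take effect, a splitting configuration must be attached to the repository configuration. A hedged sketch, assuming ManifestConfig accepts a splitting argument and RepositoryConfig accepts a manifest argument:

>>> splitting = ManifestSplittingConfig.from_dict(
...     {ManifestSplitCondition.AnyArray(): {ManifestSplitDimCondition.Any(): 10}}
... )
>>> config = RepositoryConfig(manifest=ManifestConfig(splitting=splitting))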

RebaseFailedError #

Bases: IcechunkError

An error that occurs when a rebase operation fails

Methods:

Name Description
__init__

Create a new RebaseFailedError.

Attributes:

Name Type Description
conflicts list[Conflict]

The conflicts that occurred during the rebase operation

snapshot str

The snapshot ID that the session was rebased to

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class RebaseFailedError(IcechunkError):
    """An error that occurs when a rebase operation fails"""

    def __init__(self, snapshot: str, conflicts: list[Conflict]) -> None:
        """
        Create a new RebaseFailedError.

        Parameters
        ----------
        snapshot: str
            The snapshot ID that the session was rebased to.
        conflicts: list[Conflict]
            The conflicts that occurred during the rebase operation.
        """
        ...

    @property
    def snapshot(self) -> str:
        """The snapshot ID that the session was rebased to"""
        ...

    @property
    def conflicts(self) -> list[Conflict]:
        """The conflicts that occurred during the rebase operation

        Returns:
            list[Conflict]: The conflicts that occurred during the rebase operation
        """
        ...
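
A typical handling sketch, assuming a writable session whose rebase raises this error when conflicts cannot be resolved automatically:

>>> try:
...     session.rebase(ConflictDetector())
... except RebaseFailedError as e:
...     print(f"rebase onto {e.snapshot} failed")
...     for conflict in e.conflicts:
...         print(conflict)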

conflicts property #

conflicts

The conflicts that occurred during the rebase operation

Returns: list[Conflict]: The conflicts that occurred during the rebase operation

snapshot property #

snapshot

The snapshot ID that the session was rebased to

__init__ #

__init__(snapshot, conflicts)

Create a new RebaseFailedError.

Parameters:

Name Type Description Default
snapshot str

The snapshot ID that the session was rebased to.

required
conflicts list[Conflict]

The conflicts that occurred during the rebase operation.

required
Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
def __init__(self, snapshot: str, conflicts: list[Conflict]) -> None:
    """
    Create a new RebaseFailedError.

    Parameters
    ----------
    snapshot: str
        The snapshot ID that the session was rebased to.
    conflicts: list[Conflict]
        The conflicts that occurred during the rebase operation.
    """
    ...

Repository #

An Icechunk repository.

Methods:

Name Description
ancestry

Get the ancestry of a snapshot.

async_ancestry

Get the ancestry of a snapshot.

chunk_storage_stats

Calculate the total storage used for chunks, in bytes.

chunk_storage_stats_async

Calculate the total storage used for chunks, in bytes (async version).

create

Create a new Icechunk repository.

create_async

Create a new Icechunk repository asynchronously.

create_branch

Create a new branch at the given snapshot.

create_branch_async

Create a new branch at the given snapshot (async version).

create_tag

Create a new tag at the given snapshot.

create_tag_async

Create a new tag at the given snapshot (async version).

default_commit_metadata

Get the current configured default commit metadata for the repository.

delete_branch

Delete a branch.

delete_branch_async

Delete a branch (async version).

delete_tag

Delete a tag.

delete_tag_async

Delete a tag (async version).

diff

Compute an overview of the operations executed from version `from` to version `to`.

diff_async

Compute an overview of the operations executed from version `from` to version `to` (async version).

exists

Check if a repository exists at the given storage location.

exists_async

Check if a repository exists at the given storage location (async version).

expire_snapshots

Expire all snapshots older than a threshold.

expire_snapshots_async

Expire all snapshots older than a threshold (async version).

fetch_config

Fetch the configuration for the repository saved in storage.

fetch_config_async

Fetch the configuration for the repository saved in storage (async version).

fetch_spec_version

Fetch the spec version of a repository without fully opening it.

fetch_spec_version_async

Fetch the spec version of a repository without fully opening it (async version).

garbage_collect

Delete any objects no longer accessible from any branches or tags.

garbage_collect_async

Delete any objects no longer accessible from any branches or tags (async version).

get_metadata

Get the current configured repository metadata.

get_metadata_async

Get the current configured repository metadata.

list_branches

List the branches in the repository.

list_branches_async

List the branches in the repository (async version).

list_manifest_files

Get the manifest files used by the given snapshot ID

list_manifest_files_async

Get the manifest files used by the given snapshot ID

list_tags

List the tags in the repository.

list_tags_async

List the tags in the repository (async version).

lookup_branch

Get the tip snapshot ID of a branch.

lookup_branch_async

Get the tip snapshot ID of a branch (async version).

lookup_snapshot

Get the SnapshotInfo given a snapshot ID

lookup_snapshot_async

Get the SnapshotInfo given a snapshot ID (async version)

lookup_tag

Get the snapshot ID of a tag.

lookup_tag_async

Get the snapshot ID of a tag (async version).

open

Open an existing Icechunk repository.

open_async

Open an existing Icechunk repository asynchronously.

open_or_create

Open an existing Icechunk repository or create a new one if it does not exist.

open_or_create_async

Open an existing Icechunk repository or create a new one if it does not exist (async version).

ops_log

Get a summary of changes to the repository

ops_log_async

Get a summary of changes to the repository

readonly_session

Create a read-only session.

readonly_session_async

Create a read-only session (async version).

rearrange_session

Create a session to move/rename nodes in the Zarr hierarchy.

rearrange_session_async

Create a session to move/rename nodes in the Zarr hierarchy.

reopen

Reopen the repository with new configuration or credentials.

reopen_async

Reopen the repository with new configuration or credentials (async version).

reset_branch

Reset a branch to a specific snapshot.

reset_branch_async

Reset a branch to a specific snapshot (async version).

rewrite_manifests

Rewrite manifests for all arrays.

rewrite_manifests_async

Rewrite manifests for all arrays (async version).

save_config

Save the repository configuration to storage; this configuration will be used in future calls to Repository.open.

save_config_async

Save the repository configuration to storage (async version).

set_default_commit_metadata

Set the default commit metadata for the repository. This is useful for providing

set_metadata

Set the repository metadata; the passed dict will replace the complete metadata.

set_metadata_async

Set the repository metadata; the passed dict will replace the complete metadata.

total_chunks_storage

Calculate the total storage used for chunks, in bytes.

total_chunks_storage_async

Calculate the total storage used for chunks, in bytes (async version).

transaction

Create a transaction on a branch.

update_metadata

Update the repository metadata.

update_metadata_async

Update the repository metadata.

writable_session

Create a writable session on a branch.

writable_session_async

Create a writable session on a branch (async version).

Attributes:

Name Type Description
authorized_virtual_container_prefixes set[str]

Get all authorized virtual chunk container prefixes.

config RepositoryConfig

Get a copy of this repository's config.

metadata dict[str, Any]

Get the current configured repository metadata.

storage Storage

Get a copy of this repository's Storage instance.

Source code in icechunk-python/python/icechunk/repository.py
class Repository:
    """An Icechunk repository."""

    _repository: PyRepository

    def __init__(self, repository: PyRepository):
        self._repository = repository

    @classmethod
    def create(
        cls,
        storage: Storage,
        config: RepositoryConfig | None = None,
        authorize_virtual_chunk_access: dict[str, AnyCredential | None] | None = None,
        spec_version: int | None = None,
    ) -> Self:
        """
        Create a new Icechunk repository.
        If one already exists at the given store location, an error will be raised.

        !!! warning
            Attempting to create a Repo concurrently in the same location from multiple processes is not safe.
            Instead, create a Repo once and then open it concurrently.

        Parameters
        ----------
        storage : Storage
            The storage configuration for the repository.
        config : RepositoryConfig, optional
            The repository configuration. If not provided, a default configuration will be used.
        authorize_virtual_chunk_access : dict[str, AnyCredential | None], optional
            Authorize Icechunk to access virtual chunks in these containers. A mapping
            from container url_prefix to the credentials to use to access chunks in
            that container. If a credential is `None`, credentials will be fetched from the
            environment, or anonymous credentials will be used if the container allows it.
            As a security measure, Icechunk will block access to virtual chunks if the
            container is not authorized using this argument.
        spec_version : int, optional
            Use this version of the spec for the new repository. If not passed, the latest
            spec version available when this library version was released will be used.

        Returns
        -------
        Self
            An instance of the Repository class.
        """
        return cls(
            PyRepository.create(
                storage,
                config=config,
                authorize_virtual_chunk_access=authorize_virtual_chunk_access,
                spec_version=spec_version,
            )
        )
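
    # Illustrative usage (editor's sketch, not part of the library source);
    # assumes the `icechunk.local_filesystem_storage` helper for building a
    # Storage:
    #
    #     storage = icechunk.local_filesystem_storage("/tmp/my-repo")
    #     repo = Repository.create(storage)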

    @classmethod
    async def create_async(
        cls,
        storage: Storage,
        config: RepositoryConfig | None = None,
        authorize_virtual_chunk_access: dict[str, AnyCredential | None] | None = None,
        spec_version: int | None = None,
    ) -> Self:
        """
        Create a new Icechunk repository asynchronously.
        If one already exists at the given store location, an error will be raised.

        !!! warning
            Attempting to create a Repo concurrently in the same location from multiple processes is not safe.
            Instead, create a Repo once and then open it concurrently.

        Parameters
        ----------
        storage : Storage
            The storage configuration for the repository.
        config : RepositoryConfig, optional
            The repository configuration. If not provided, a default configuration will be used.
        authorize_virtual_chunk_access : dict[str, AnyCredential | None], optional
            Authorize Icechunk to access virtual chunks in these containers. A mapping
            from container url_prefix to the credentials to use to access chunks in
            that container. If a credential is `None`, credentials will be fetched from the
            environment, or anonymous credentials will be used if the container allows it.
            As a security measure, Icechunk will block access to virtual chunks if the
            container is not authorized using this argument.
        spec_version : int, optional
            Use this version of the spec for the new repository. If not passed, the latest
            spec version available when this library version was released will be used.

        Returns
        -------
        Self
            An instance of the Repository class.
        """
        return cls(
            await PyRepository.create_async(
                storage,
                config=config,
                authorize_virtual_chunk_access=authorize_virtual_chunk_access,
                spec_version=spec_version,
            )
        )

    @classmethod
    def open(
        cls,
        storage: Storage,
        config: RepositoryConfig | None = None,
        authorize_virtual_chunk_access: dict[str, AnyCredential | None] | None = None,
    ) -> Self:
        """
        Open an existing Icechunk repository.

        If no repository exists at the given storage location, an error will be raised.

        !!! warning
            This method must be used with care in a multiprocessing context.
            Read more in our [Parallel Write Guide](./parallel.md#uncooperative-distributed-writes).

        Parameters
        ----------
        storage : Storage
            The storage configuration for the repository.
        config : RepositoryConfig, optional
            The repository settings. If not provided, a default configuration will be
            loaded from the repository.
        authorize_virtual_chunk_access : dict[str, AnyCredential | None], optional
            Authorize Icechunk to access virtual chunks in these containers. A mapping
            from container url_prefix to the credentials to use to access chunks in
            that container. If a credential is `None`, credentials will be fetched from the
            environment, or anonymous credentials will be used if the container allows it.
            As a security measure, Icechunk will block access to virtual chunks if the
            container is not authorized using this argument.

        Returns
        -------
        Self
            An instance of the Repository class.
        """
        return cls(
            PyRepository.open(
                storage,
                config=config,
                authorize_virtual_chunk_access=authorize_virtual_chunk_access,
            )
        )

    @classmethod
    async def open_async(
        cls,
        storage: Storage,
        config: RepositoryConfig | None = None,
        authorize_virtual_chunk_access: dict[str, AnyCredential | None] | None = None,
    ) -> Self:
        """
        Open an existing Icechunk repository asynchronously.

        If no repository exists at the given storage location, an error will be raised.

        !!! warning
            This method must be used with care in a multiprocessing context.
            Read more in our [Parallel Write Guide](./parallel.md#uncooperative-distributed-writes).

        Parameters
        ----------
        storage : Storage
            The storage configuration for the repository.
        config : RepositoryConfig, optional
            The repository settings. If not provided, a default configuration will be
            loaded from the repository.
        authorize_virtual_chunk_access : dict[str, AnyCredential | None], optional
            Authorize Icechunk to access virtual chunks in these containers. A mapping
            from container url_prefix to the credentials to use to access chunks in
            that container. If a credential is `None`, credentials will be fetched from the
            environment, or anonymous credentials will be used if the container allows it.
            As a security measure, Icechunk will block access to virtual chunks if the
            container is not authorized using this argument.

        Returns
        -------
        Self
            An instance of the Repository class.
        """
        return cls(
            await PyRepository.open_async(
                storage,
                config=config,
                authorize_virtual_chunk_access=authorize_virtual_chunk_access,
            )
        )

    @classmethod
    def open_or_create(
        cls,
        storage: Storage,
        config: RepositoryConfig | None = None,
        authorize_virtual_chunk_access: dict[str, AnyCredential | None] | None = None,
        create_version: int | None = None,
    ) -> Self:
        """
        Open an existing Icechunk repository or create a new one if it does not exist.

        !!! warning
            This method must be used with care in a multiprocessing context.
            Read more in our [Parallel Write Guide](./parallel.md#uncooperative-distributed-writes).

            Attempting to create a Repo concurrently in the same location from multiple processes is not safe.
            Instead, create a Repo once and then open it concurrently.

        Parameters
        ----------
        storage : Storage
            The storage configuration for the repository.
        config : RepositoryConfig, optional
            The repository settings. If not provided, a default configuration will be
            loaded from the repository.
        authorize_virtual_chunk_access : dict[str, AnyCredential | None], optional
            Authorize Icechunk to access virtual chunks in these containers. A mapping
            from container url_prefix to the credentials to use to access chunks in
            that container. If a credential is `None`, credentials will be fetched from the
            environment, or anonymous credentials will be used if the container allows it.
            As a security measure, Icechunk will block access to virtual chunks if the
            container is not authorized using this argument.
        create_version : int, optional
            Use this version of the spec for the new repository, if it needs to be created.
            If not passed, the latest spec version available when this library version
            was released will be used.


        Returns
        -------
        Self
            An instance of the Repository class.
        """
        return cls(
            PyRepository.open_or_create(
                storage,
                config=config,
                authorize_virtual_chunk_access=authorize_virtual_chunk_access,
                create_version=create_version,
            )
        )

    @classmethod
    async def open_or_create_async(
        cls,
        storage: Storage,
        config: RepositoryConfig | None = None,
        authorize_virtual_chunk_access: dict[str, AnyCredential | None] | None = None,
        create_version: int | None = None,
    ) -> Self:
        """
        Open an existing Icechunk repository or create a new one if it does not exist (async version).

        !!! warning
            This method must be used with care in a multiprocessing context.
            Read more in our [Parallel Write Guide](./parallel.md#uncooperative-distributed-writes).

            Attempting to create a Repo concurrently in the same location from multiple processes is not safe.
            Instead, create a Repo once and then open it concurrently.

        Parameters
        ----------
        storage : Storage
            The storage configuration for the repository.
        config : RepositoryConfig, optional
            The repository settings. If not provided, a default configuration will be
            loaded from the repository.
        authorize_virtual_chunk_access : dict[str, AnyCredential | None], optional
            Authorize Icechunk to access virtual chunks in these containers. A mapping
            from container url_prefix to the credentials to use to access chunks in
            that container. If a credential is `None`, credentials will be fetched from the
            environment, or anonymous credentials will be used if the container allows it.
            As a security measure, Icechunk will block access to virtual chunks if the
            container is not authorized using this argument.
        create_version : int, optional
            Use this version of the spec for the new repository, if it needs to be created.
            If not passed, the latest spec version available when this library version
            was released will be used.

        Returns
        -------
        Self
            An instance of the Repository class.
        """
        return cls(
            await PyRepository.open_or_create_async(
                storage,
                config=config,
                authorize_virtual_chunk_access=authorize_virtual_chunk_access,
                create_version=create_version,
            )
        )

    @staticmethod
    def exists(storage: Storage) -> bool:
        """
        Check if a repository exists at the given storage location.

        Parameters
        ----------
        storage : Storage
            The storage configuration for the repository.

        Returns
        -------
        bool
            True if the repository exists, False otherwise.
        """
        return PyRepository.exists(storage)

    @staticmethod
    async def exists_async(storage: Storage) -> bool:
        """
        Check if a repository exists at the given storage location (async version).

        Parameters
        ----------
        storage : Storage
            The storage configuration for the repository.

        Returns
        -------
        bool
            True if the repository exists, False otherwise.
        """
        return await PyRepository.exists_async(storage)

    @staticmethod
    def fetch_spec_version(storage: Storage) -> int | None:
        """
        Fetch the spec version of a repository without fully opening it.

        This is useful for checking the repository format version before opening,
        for example to know what version of the library is needed to open it.

        Parameters
        ----------
        storage : Storage
            The storage configuration for the repository.

        Returns
        -------
        int | None
            The spec version of the repository if it exists, None if no repository
            exists at the given location.
        """
        return PyRepository.fetch_spec_version(storage)

    @staticmethod
    async def fetch_spec_version_async(storage: Storage) -> int | None:
        """
        Fetch the spec version of a repository without fully opening it (async version).

        This is useful for checking the repository format version before opening,
        for example to know what version of the library is needed to open it.

        Parameters
        ----------
        storage : Storage
            The storage configuration for the repository.

        Returns
        -------
        int | None
            The spec version of the repository if it exists, None if no repository
            exists at the given location.
        """
        return await PyRepository.fetch_spec_version_async(storage)

    def __getstate__(self) -> object:
        return {
            "_repository": self._repository.as_bytes(),
        }

    def __setstate__(self, state: object) -> None:
        if not isinstance(state, dict):
            raise ValueError("Invalid repository state")
        self._repository = PyRepository.from_bytes(state["_repository"])
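
    # __getstate__ and __setstate__ make Repository instances picklable, so a
    # repository can be shipped to worker processes. Illustrative round-trip
    # (editor's sketch, not part of the library source):
    #
    #     import pickle
    #     repo_copy = pickle.loads(pickle.dumps(repo))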

    @staticmethod
    def fetch_config(storage: Storage) -> RepositoryConfig | None:
        """
        Fetch the configuration for the repository saved in storage.

        Parameters
        ----------
        storage : Storage
            The storage configuration for the repository.

        Returns
        -------
        RepositoryConfig | None
            The repository configuration if it exists, None otherwise.
        """
        return PyRepository.fetch_config(storage)

    @staticmethod
    async def fetch_config_async(storage: Storage) -> RepositoryConfig | None:
        """
        Fetch the configuration for the repository saved in storage (async version).

        Parameters
        ----------
        storage : Storage
            The storage configuration for the repository.

        Returns
        -------
        RepositoryConfig | None
            The repository configuration if it exists, None otherwise.
        """
        return await PyRepository.fetch_config_async(storage)

    def save_config(self) -> None:
        """
        Save the repository configuration to storage; this configuration will be used in future calls to Repository.open.

        Returns
        -------
        None
        """
        return self._repository.save_config()

    async def save_config_async(self) -> None:
        """
        Save the repository configuration to storage (async version).

        Returns
        -------
        None
        """
        return await self._repository.save_config_async()

    @property
    def config(self) -> RepositoryConfig:
        """
        Get a copy of this repository's config.

        Returns
        -------
        RepositoryConfig
            The repository configuration.
        """
        return self._repository.config()

    @property
    def storage(self) -> Storage:
        """
        Get a copy of this repository's Storage instance.

        Returns
        -------
        Storage
            The repository storage instance.
        """
        return self._repository.storage()

    @property
    def authorized_virtual_container_prefixes(self) -> set[str]:
        """
        Get all authorized virtual chunk container prefixes.

        Returns
        -------
        url_prefixes: set[str]
            The set of authorized url prefixes for each virtual chunk container
        """
        return self._repository.authorized_virtual_container_prefixes

    def reopen(
        self,
        config: RepositoryConfig | None = None,
        authorize_virtual_chunk_access: dict[str, AnyCredential | None] | None = None,
    ) -> Self:
        """
        Reopen the repository with new configuration or credentials.

        Parameters
        ----------
        config : RepositoryConfig, optional
            The new repository configuration. If not provided, uses the existing configuration.
        authorize_virtual_chunk_access : dict[str, AnyCredential | None], optional
            New virtual chunk access credentials.

        Returns
        -------
        Self
            A new Repository instance with the updated configuration.
        """
        return self.__class__(
            self._repository.reopen(
                config=config,
                authorize_virtual_chunk_access=authorize_virtual_chunk_access,
            )
        )

    async def reopen_async(
        self,
        config: RepositoryConfig | None = None,
        authorize_virtual_chunk_access: dict[str, AnyCredential | None] | None = None,
    ) -> Self:
        """
        Reopen the repository with new configuration or credentials (async version).

        Parameters
        ----------
        config : RepositoryConfig, optional
            The new repository configuration. If not provided, uses the existing configuration.
        authorize_virtual_chunk_access : dict[str, AnyCredential | None], optional
            New virtual chunk access credentials.

        Returns
        -------
        Self
            A new Repository instance with the updated configuration.
        """
        return self.__class__(
            await self._repository.reopen_async(
                config=config,
                authorize_virtual_chunk_access=authorize_virtual_chunk_access,
            )
        )

    def set_default_commit_metadata(self, metadata: dict[str, Any]) -> None:
        """
        Set the default commit metadata for the repository. This is useful for providing
        additional static system context metadata to all commits.

        When a commit is made, the metadata will be merged with the metadata provided, with any
        duplicate keys being overwritten by the metadata provided in the commit.

        !!! warning
            This metadata is only applied to sessions that are created after this call. Any open
            writable sessions will not be affected and will not use the new default metadata.

        Parameters
        ----------
        metadata : dict[str, Any]
            The default commit metadata. Pass an empty dict to clear the default metadata.
        """
        return self._repository.set_default_commit_metadata(metadata)
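
    # Illustrative usage (editor's sketch, not part of the library source),
    # assuming `repo` is an open Repository:
    #
    #     repo.set_default_commit_metadata({"pipeline": "daily-etl"})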

    def default_commit_metadata(self) -> dict[str, Any]:
        """
        Get the current configured default commit metadata for the repository.

        Returns
        -------
        dict[str, Any]
            The default commit metadata.
        """
        return self._repository.default_commit_metadata()

    def get_metadata(self) -> dict[str, Any]:
        """
        Get the current configured repository metadata.

        Returns
        -------
        dict[str, Any]
            The repository level metadata.
        """
        return self._repository.get_metadata()

    @property
    def metadata(self) -> dict[str, Any]:
        """
        Get the current configured repository metadata.

        Returns
        -------
        dict[str, Any]
            The repository level metadata.
        """
        return self._repository.get_metadata()

    async def get_metadata_async(self) -> dict[str, Any]:
        """
        Get the current configured repository metadata.

        Returns
        -------
        dict[str, Any]
            The repository level metadata.
        """
        return await self._repository.get_metadata_async()

    def set_metadata(self, metadata: dict[str, Any]) -> None:
        """
        Set the repository metadata; the passed dict will replace the complete metadata.

        If you prefer to only update some metadata values, use Repository.update_metadata

        Parameters
        ----------
        metadata : dict[str, Any]
            The value to use as repository metadata.
        """
        self._repository.set_metadata(metadata)

    async def set_metadata_async(self, metadata: dict[str, Any]) -> None:
        """
        Set the repository metadata; the passed dict will replace the complete metadata.

        If you prefer to only update some metadata values, use Repository.update_metadata

        Parameters
        ----------
        metadata : dict[str, Any]
            The value to use as repository metadata.
        """
        await self._repository.set_metadata_async(metadata)

    def update_metadata(self, metadata: dict[str, Any]) -> dict[str, Any]:
        """
        Update the repository metadata.

        The passed dict will be merged with the current metadata, overriding existing keys.

        Parameters
        ----------
        metadata : dict[str, Any]
            The dict to merge into the repository metadata.
        """
        return self._repository.update_metadata(metadata)
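
    # Illustrative usage (editor's sketch, not part of the library source),
    # assuming `repo` is an open Repository:
    #
    #     repo.set_metadata({"project": "climate"})     # replaces all metadata
    #     repo.update_metadata({"owner": "data-team"})  # merges, keeping "project"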

    async def update_metadata_async(self, metadata: dict[str, Any]) -> dict[str, Any]:
        """
        Update the repository metadata.

        The passed dict will be merged with the current metadata, overriding existing keys.

        Parameters
        ----------
        metadata : dict[str, Any]
            The dict to merge into the repository metadata.
        """
        return await self._repository.update_metadata_async(metadata)

    def ancestry(
        self,
        *,
        branch: str | None = None,
        tag: str | None = None,
        snapshot_id: str | None = None,
    ) -> Iterator[SnapshotInfo]:
        """
        Get the ancestry of a snapshot.

        Parameters
        ----------
        branch : str, optional
            The branch to get the ancestry of.
        tag : str, optional
            The tag to get the ancestry of.
        snapshot_id : str, optional
            The snapshot ID to get the ancestry of.

        Returns
        -------
        list[SnapshotInfo]
            The ancestry of the snapshot, listing out the snapshots and their metadata.

        Notes
        -----
        Only one of the arguments can be specified.
        """

        # the returned object is both an Async and Sync iterator
        res = cast(
            Iterator[SnapshotInfo],
            self._repository.async_ancestry(
                branch=branch, tag=tag, snapshot_id=snapshot_id
            ),
        )
        return res
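
    # Illustrative usage (editor's sketch, not part of the library source),
    # assuming `repo` is an open Repository with a "main" branch and that
    # SnapshotInfo exposes `id` and `message`:
    #
    #     for snapshot in repo.ancestry(branch="main"):
    #         print(snapshot.id, snapshot.message)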

    def async_ancestry(
        self,
        *,
        branch: str | None = None,
        tag: str | None = None,
        snapshot_id: str | None = None,
    ) -> AsyncIterator[SnapshotInfo]:
        """
        Get the ancestry of a snapshot.

        Parameters
        ----------
        branch : str, optional
            The branch to get the ancestry of.
        tag : str, optional
            The tag to get the ancestry of.
        snapshot_id : str, optional
            The snapshot ID to get the ancestry of.

        Returns
        -------
        list[SnapshotInfo]
            The ancestry of the snapshot, listing out the snapshots and their metadata.

        Notes
        -----
        Only one of the arguments can be specified.
        """
        return self._repository.async_ancestry(
            branch=branch, tag=tag, snapshot_id=snapshot_id
        )

    def ops_log(self) -> Iterator[UpdateType]:
        """
        Get a summary of changes to the repository
        """

        # the returned object is both an Async and Sync iterator
        res = cast(
            Iterator[UpdateType],
            self._repository.async_ops_log(),
        )
        return res

    def ops_log_async(self) -> AsyncIterator[UpdateType]:
        """
        Get a summary of changes to the repository
        """

        # the returned object is both an Async and Sync iterator
        return self._repository.async_ops_log()

    def create_branch(self, branch: str, snapshot_id: str) -> None:
        """
        Create a new branch at the given snapshot.

        Parameters
        ----------
        branch : str
            The name of the branch to create.
        snapshot_id : str
            The snapshot ID to create the branch at.

        Returns
        -------
        None
        """
        self._repository.create_branch(branch, snapshot_id)
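
    # Illustrative usage (editor's sketch, not part of the library source),
    # assuming `repo` is an open Repository:
    #
    #     tip = repo.lookup_branch("main")
    #     repo.create_branch("experiment", tip)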

    async def create_branch_async(self, branch: str, snapshot_id: str) -> None:
        """
        Create a new branch at the given snapshot (async version).

        Parameters
        ----------
        branch : str
            The name of the branch to create.
        snapshot_id : str
            The snapshot ID to create the branch at.

        Returns
        -------
        None
        """
        await self._repository.create_branch_async(branch, snapshot_id)

    def list_branches(self) -> set[str]:
        """
        List the branches in the repository.

        Returns
        -------
        set[str]
            A set of branch names.
        """
        return self._repository.list_branches()

    async def list_branches_async(self) -> set[str]:
        """
        List the branches in the repository (async version).

        Returns
        -------
        set[str]
            A set of branch names.
        """
        return await self._repository.list_branches_async()

    def lookup_branch(self, branch: str) -> str:
        """
        Get the tip snapshot ID of a branch.

        Parameters
        ----------
        branch : str
            The branch to get the tip of.

        Returns
        -------
        str
            The snapshot ID of the tip of the branch.
        """
        return self._repository.lookup_branch(branch)

    async def lookup_branch_async(self, branch: str) -> str:
        """
        Get the tip snapshot ID of a branch (async version).

        Parameters
        ----------
        branch : str
            The branch to get the tip of.

        Returns
        -------
        str
            The snapshot ID of the tip of the branch.
        """
        return await self._repository.lookup_branch_async(branch)

    def lookup_snapshot(self, snapshot_id: str) -> SnapshotInfo:
        """
        Get the SnapshotInfo given a snapshot ID

        Parameters
        ----------
        snapshot_id : str
            The id of the snapshot to look up

        Returns
        -------
        SnapshotInfo
        """
        return self._repository.lookup_snapshot(snapshot_id)

    async def lookup_snapshot_async(self, snapshot_id: str) -> SnapshotInfo:
        """
        Get the SnapshotInfo given a snapshot ID (async version)

        Parameters
        ----------
        snapshot_id : str
            The id of the snapshot to look up

        Returns
        -------
        SnapshotInfo
        """
        return await self._repository.lookup_snapshot_async(snapshot_id)

    def list_manifest_files(self, snapshot_id: str) -> list[ManifestFileInfo]:
        """
        Get the manifest files used by the given snapshot ID

        Parameters
        ----------
        snapshot_id : str
            The id of the snapshot to get information for

        Returns
        -------
        list[ManifestFileInfo]
        """
        return self._repository.list_manifest_files(snapshot_id)

    async def list_manifest_files_async(self, snapshot_id: str) -> list[ManifestFileInfo]:
        """
        Get the manifest files used by the given snapshot ID

        Parameters
        ----------
        snapshot_id : str
            The id of the snapshot to get information for

        Returns
        -------
        list[ManifestFileInfo]
        """
        return await self._repository.list_manifest_files_async(snapshot_id)

    def reset_branch(
        self, branch: str, snapshot_id: str, *, from_snapshot_id: str | None = None
    ) -> None:
        """
        Reset a branch to a specific snapshot.

        This will permanently alter the history of the branch such that the tip of
        the branch is the specified snapshot.

        Parameters
        ----------
        branch : str
            The branch to reset.
        snapshot_id : str
            The snapshot ID to reset the branch to.
        from_snapshot_id : str | None
            If passed, the reset will only be executed if the branch currently
            points to from_snapshot_id.

        Returns
        -------
        None
        """
        self._repository.reset_branch(branch, snapshot_id, from_snapshot_id)

    async def reset_branch_async(
        self, branch: str, snapshot_id: str, *, from_snapshot_id: str | None = None
    ) -> None:
        """
        Reset a branch to a specific snapshot (async version).

        This will permanently alter the history of the branch such that the tip of
        the branch is the specified snapshot.

        Parameters
        ----------
        branch : str
            The branch to reset.
        snapshot_id : str
            The snapshot ID to reset the branch to.
        from_snapshot_id : str | None
            If passed, the reset will only be executed if the branch currently
            points to from_snapshot_id.

        Returns
        -------
        None
        """
        await self._repository.reset_branch_async(branch, snapshot_id, from_snapshot_id)

    def delete_branch(self, branch: str) -> None:
        """
        Delete a branch.

        Parameters
        ----------
        branch : str
            The branch to delete.

        Returns
        -------
        None
        """
        self._repository.delete_branch(branch)

    async def delete_branch_async(self, branch: str) -> None:
        """
        Delete a branch (async version).

        Parameters
        ----------
        branch : str
            The branch to delete.

        Returns
        -------
        None
        """
        await self._repository.delete_branch_async(branch)

    def delete_tag(self, tag: str) -> None:
        """
        Delete a tag.

        Parameters
        ----------
        tag : str
            The tag to delete.

        Returns
        -------
        None
        """
        self._repository.delete_tag(tag)

    async def delete_tag_async(self, tag: str) -> None:
        """
        Delete a tag (async version).

        Parameters
        ----------
        tag : str
            The tag to delete.

        Returns
        -------
        None
        """
        await self._repository.delete_tag_async(tag)

    def create_tag(self, tag: str, snapshot_id: str) -> None:
        """
        Create a new tag at the given snapshot.

        Parameters
        ----------
        tag : str
            The name of the tag to create.
        snapshot_id : str
            The snapshot ID to create the tag at.

        Returns
        -------
        None
        """
        self._repository.create_tag(tag, snapshot_id)

    async def create_tag_async(self, tag: str, snapshot_id: str) -> None:
        """
        Create a new tag at the given snapshot (async version).

        Parameters
        ----------
        tag : str
            The name of the tag to create.
        snapshot_id : str
            The snapshot ID to create the tag at.

        Returns
        -------
        None
        """
        await self._repository.create_tag_async(tag, snapshot_id)

    def list_tags(self) -> set[str]:
        """
        List the tags in the repository.

        Returns
        -------
        set[str]
            A set of tag names.
        """
        return self._repository.list_tags()

    async def list_tags_async(self) -> set[str]:
        """
        List the tags in the repository (async version).

        Returns
        -------
        set[str]
            A set of tag names.
        """
        return await self._repository.list_tags_async()

    def lookup_tag(self, tag: str) -> str:
        """
        Get the snapshot ID of a tag.

        Parameters
        ----------
        tag : str
            The tag to get the snapshot ID of.

        Returns
        -------
        str
            The snapshot ID of the tag.
        """
        return self._repository.lookup_tag(tag)

    async def lookup_tag_async(self, tag: str) -> str:
        """
        Get the snapshot ID of a tag (async version).

        Parameters
        ----------
        tag : str
            The tag to get the snapshot ID of.

        Returns
        -------
        str
            The snapshot ID of the tag.
        """
        return await self._repository.lookup_tag_async(tag)

    def diff(
        self,
        *,
        from_branch: str | None = None,
        from_tag: str | None = None,
        from_snapshot_id: str | None = None,
        to_branch: str | None = None,
        to_tag: str | None = None,
        to_snapshot_id: str | None = None,
    ) -> Diff:
        """
        Compute an overview of the operations executed from version `from` to version `to`.

        Both versions, `from` and `to`, must be identified. Identification can be done using a branch, tag or snapshot id.
        The styles used to identify the `from` and `to` versions can be different.

        The `from` version must be a member of the `ancestry` of `to`.

        Returns
        -------
        Diff
            The operations executed between the two versions
        """
        return self._repository.diff(
            from_branch=from_branch,
            from_tag=from_tag,
            from_snapshot_id=from_snapshot_id,
            to_branch=to_branch,
            to_tag=to_tag,
            to_snapshot_id=to_snapshot_id,
        )

    async def diff_async(
        self,
        *,
        from_branch: str | None = None,
        from_tag: str | None = None,
        from_snapshot_id: str | None = None,
        to_branch: str | None = None,
        to_tag: str | None = None,
        to_snapshot_id: str | None = None,
    ) -> Diff:
        """
        Compute an overview of the operations executed from version `from` to version `to` (async version).

        Both versions, `from` and `to`, must be identified. Identification can be done using a branch, tag or snapshot id.
        The styles used to identify the `from` and `to` versions can be different.

        The `from` version must be a member of the `ancestry` of `to`.

        Returns
        -------
        Diff
            The operations executed between the two versions
        """
        return await self._repository.diff_async(
            from_branch=from_branch,
            from_tag=from_tag,
            from_snapshot_id=from_snapshot_id,
            to_branch=to_branch,
            to_tag=to_tag,
            to_snapshot_id=to_snapshot_id,
        )

    def readonly_session(
        self,
        branch: str | None = None,
        *,
        tag: str | None = None,
        snapshot_id: str | None = None,
        as_of: datetime.datetime | None = None,
    ) -> Session:
        """
        Create a read-only session.

        This can be thought of as a read-only checkout of the repository at a given snapshot.
        When a branch or tag is provided, the session will be based on the tip of the branch or
        the snapshot ID of the tag.

        Parameters
        ----------
        branch : str, optional
            If provided, the branch to create the session on.
        tag : str, optional
            If provided, the tag to create the session on.
        snapshot_id : str, optional
            If provided, the snapshot ID to create the session on.
        as_of : datetime.datetime, optional
            When combined with the branch argument, it will open the session at the last
            snapshot that is at or before this datetime.

        Returns
        -------
        Session
            The read-only session, pointing to the specified snapshot, tag, or branch.

        Notes
        -----
        Only one of the arguments can be specified.
        """
        return Session(
            self._repository.readonly_session(
                branch=branch, tag=tag, snapshot_id=snapshot_id, as_of=as_of
            )
        )

    async def readonly_session_async(
        self,
        branch: str | None = None,
        *,
        tag: str | None = None,
        snapshot_id: str | None = None,
        as_of: datetime.datetime | None = None,
    ) -> Session:
        """
        Create a read-only session (async version).

        This can be thought of as a read-only checkout of the repository at a given snapshot.
        When a branch or tag is provided, the session will be based on the tip of the branch or
        the snapshot ID of the tag.

        Parameters
        ----------
        branch : str, optional
            If provided, the branch to create the session on.
        tag : str, optional
            If provided, the tag to create the session on.
        snapshot_id : str, optional
            If provided, the snapshot ID to create the session on.
        as_of : datetime.datetime, optional
            When combined with the branch argument, it will open the session at the last
            snapshot that is at or before this datetime.

        Returns
        -------
        Session
            The read-only session, pointing to the specified snapshot, tag, or branch.

        Notes
        -----
        Only one of the arguments can be specified.
        """
        return Session(
            await self._repository.readonly_session_async(
                branch=branch, tag=tag, snapshot_id=snapshot_id, as_of=as_of
            )
        )

    def writable_session(self, branch: str) -> Session:
        """
        Create a writable session on a branch.

        Like the read-only session, this can be thought of as a checkout of the repository at the
        tip of the branch. However, this session is writable and can be used to make changes to the
        repository. When ready, the changes can be committed to the branch, after which the session will
        become a read-only session on the new snapshot.

        Parameters
        ----------
        branch : str
            The branch to create the session on.

        Returns
        -------
        Session
            The writable session on the branch.
        """
        return Session(self._repository.writable_session(branch))
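
    # Minimal usage sketch (assumptions for illustration: the repo has a
    # "main" branch and zarr is installed alongside icechunk):
    #
    #     session = repo.writable_session("main")
    #     root = zarr.group(store=session.store)
    #     root.attrs["updated"] = True
    #     snapshot_id = session.commit("update root attributes")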

    async def writable_session_async(self, branch: str) -> Session:
        """
        Create a writable session on a branch (async version).

        Like the read-only session, this can be thought of as a checkout of the repository at the
        tip of the branch. However, this session is writable and can be used to make changes to the
        repository. When ready, the changes can be committed to the branch, after which the session will
        become a read-only session on the new snapshot.

        Parameters
        ----------
        branch : str
            The branch to create the session on.

        Returns
        -------
        Session
            The writable session on the branch.
        """
        return Session(await self._repository.writable_session_async(branch))

    def rearrange_session(self, branch: str) -> Session:
        """
        Create a session to move/rename nodes in the Zarr hierarchy.

        Like the read-only session, this can be thought of as a checkout of the repository at the
        tip of the branch. However, this session is writable and can be used to make changes to the
        repository. When ready, the changes can be committed to the branch, after which the session will
        become a read-only session on the new snapshot.

        This session only allows making changes through `Session.move`. If you want to modify data, and
        not only move nodes, use `Repository.writable_session` instead.

        Parameters
        ----------
        branch : str
            The branch to create the session on.

        Returns
        -------
        Session
            The writable session on the branch.
        """
        return Session(self._repository.rearrange_session(branch))

    async def rearrange_session_async(self, branch: str) -> Session:
        """
        Create a session to move/rename nodes in the Zarr hierarchy (async version).

        Like the read-only session, this can be thought of as a checkout of the repository at the
        tip of the branch. However, this session is writable and can be used to make changes to the
        repository. When ready, the changes can be committed to the branch, after which the session will
        become a read-only session on the new snapshot.

        This session only allows making changes through `Session.move`. If you want to modify data, and
        not only move nodes, use `Repository.writable_session` instead.

        Parameters
        ----------
        branch : str
            The branch to create the session on.

        Returns
        -------
        Session
            The writable session on the branch.
        """
        return Session(await self._repository.rearrange_session_async(branch))

    @contextmanager
    def transaction(
        self,
        branch: str,
        *,
        message: str,
        metadata: dict[str, Any] | None = None,
        rebase_with: ConflictSolver | None = None,
        rebase_tries: int = 1_000,
    ) -> Iterator[IcechunkStore]:
        """
        Create a transaction on a branch.

        This is a context manager that creates a writable session on the specified branch.
        When the context is exited, the session will be committed to the branch
        using the specified message.

        Parameters
        ----------
        branch : str
            The branch to create the transaction on.
        message : str
            The commit message to use when committing the session.
        metadata : dict[str, Any] | None, optional
            Additional metadata to store with the commit snapshot.
        rebase_with : ConflictSolver | None, optional
            If another session committed while the current session was writing, rebase using Session.rebase with this solver.
        rebase_tries : int, optional
            If another session committed while the current session was writing, retry Session.rebase up to this many times in a loop.

        Yields
        -------
        store : IcechunkStore
            A Zarr Store which can be used to interact with the data in the repository.
        """
        session = self.writable_session(branch)
        yield session.store
        session.commit(
            message=message,
            metadata=metadata,
            rebase_with=rebase_with,
            rebase_tries=rebase_tries,
        )
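
    # Minimal usage sketch (assuming the repo has a "main" branch and zarr is
    # available; the attribute name is illustrative):
    #
    #     with repo.transaction("main", message="set title") as store:
    #         zarr.group(store=store).attrs["title"] = "demo"
    #     # on exit, the session is committed to "main" with the given message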

    def expire_snapshots(
        self,
        older_than: datetime.datetime,
        *,
        delete_expired_branches: bool = False,
        delete_expired_tags: bool = False,
    ) -> set[str]:
        """Expire all snapshots older than a threshold.

        This processes snapshots found by navigating all references in
        the repo, tags first, branches later, both in lexicographical order.

        Returns the IDs of all snapshots considered expired and skipped
        from history. Notice that these snapshots are not necessarily
        available for garbage collection; they could still be pointed to
        by other refs.

        If `delete_expired_*` is set to True, branches or tags that, after the
        expiration process, point directly to expired snapshots will be
        deleted.

        Danger
        ------
        This is an administrative operation; it should be run
        carefully. The repository can still operate concurrently while
        `expire_snapshots` runs, but other readers can get inconsistent
        views of the repository history.

        Parameters
        ----------
        older_than: datetime.datetime
            Expire snapshots older than this time.
        delete_expired_branches: bool, optional
            Whether to delete any branches that now have only expired snapshots.
        delete_expired_tags: bool, optional
            Whether to delete any tags associated with expired snapshots

        Returns
        -------
        set of expired snapshot IDs
        """
        return self._repository.expire_snapshots(
            older_than,
            delete_expired_branches=delete_expired_branches,
            delete_expired_tags=delete_expired_tags,
        )

    async def expire_snapshots_async(
        self,
        older_than: datetime.datetime,
        *,
        delete_expired_branches: bool = False,
        delete_expired_tags: bool = False,
    ) -> set[str]:
        """Expire all snapshots older than a threshold (async version).

        This processes snapshots found by navigating all references in
        the repo, tags first, branches later, both in lexicographical order.

        Returns the IDs of all snapshots considered expired and skipped
        from history. Notice that these snapshots are not necessarily
        available for garbage collection; they could still be pointed to
        by other refs.

        If `delete_expired_*` is set to True, branches or tags that, after the
        expiration process, point directly to expired snapshots will be
        deleted.

        Danger
        ------
        This is an administrative operation; it should be run
        carefully. The repository can still operate concurrently while
        `expire_snapshots` runs, but other readers can get inconsistent
        views of the repository history.

        Parameters
        ----------
        older_than: datetime.datetime
            Expire snapshots older than this time.
        delete_expired_branches: bool, optional
            Whether to delete any branches that now have only expired snapshots.
        delete_expired_tags: bool, optional
            Whether to delete any tags associated with expired snapshots

        Returns
        -------
        set of expired snapshot IDs
        """
        return await self._repository.expire_snapshots_async(
            older_than,
            delete_expired_branches=delete_expired_branches,
            delete_expired_tags=delete_expired_tags,
        )

    def rewrite_manifests(
        self, message: str, *, branch: str, metadata: dict[str, Any] | None = None
    ) -> str:
        """
        Rewrite manifests for all arrays.

        This method will start a new writable session on the specified branch,
        rewrite manifests for all arrays, and then commit with the specified ``message``
        and ``metadata``.

        A JSON representation of the currently active splitting configuration will be
        stored in the commit's metadata under the key `"splitting_config"`.

        Parameters
        ----------
        message : str
            The message to write with the commit.
        branch: str
            The branch to commit to.
        metadata : dict[str, Any] | None, optional
            Additional metadata to store with the commit snapshot.

        Returns
        -------
        str
            The snapshot ID of the new commit.

        """
        return self._repository.rewrite_manifests(
            message, branch=branch, metadata=metadata
        )

    async def rewrite_manifests_async(
        self, message: str, *, branch: str, metadata: dict[str, Any] | None = None
    ) -> str:
        """
        Rewrite manifests for all arrays (async version).

        This method will start a new writable session on the specified branch,
        rewrite manifests for all arrays, and then commit with the specified ``message``
        and ``metadata``.

        A JSON representation of the currently active splitting configuration will be
        stored in the commit's metadata under the key `"splitting_config"`.

        Parameters
        ----------
        message : str
            The message to write with the commit.
        branch: str
            The branch to commit to.
        metadata : dict[str, Any] | None, optional
            Additional metadata to store with the commit snapshot.

        Returns
        -------
        str
            The snapshot ID of the new commit.

        """
        return await self._repository.rewrite_manifests_async(
            message, branch=branch, metadata=metadata
        )

    def garbage_collect(
        self,
        delete_object_older_than: datetime.datetime,
        *,
        dry_run: bool = False,
        max_snapshots_in_memory: int = 50,
        max_compressed_manifest_mem_bytes: int = 512 * 1024 * 1024,
        max_concurrent_manifest_fetches: int = 500,
    ) -> GCSummary:
        """Delete any objects no longer accessible from any branches or tags.

        Danger
        ------
        This is an administrative operation; it should be run
        carefully. The repository can still operate concurrently while
        `garbage_collect` runs, but other readers can get inconsistent
        views if they are trying to access the expired snapshots.

        Parameters
        ----------
        delete_object_older_than: datetime.datetime
            Delete objects older than this time.
        dry_run : bool
            Report results but don't delete any objects.
        max_snapshots_in_memory : int
            Don't prefetch more than this many Snapshots to memory.
        max_compressed_manifest_mem_bytes : int
            Don't use more than this memory to store compressed in-flight manifests.
        max_concurrent_manifest_fetches : int
            Don't run more than this many concurrent manifest fetches.

        Returns
        -------
        GCSummary
            Summary of objects deleted.
        """

        return self._repository.garbage_collect(
            delete_object_older_than,
            dry_run=dry_run,
            max_snapshots_in_memory=max_snapshots_in_memory,
            max_compressed_manifest_mem_bytes=max_compressed_manifest_mem_bytes,
            max_concurrent_manifest_fetches=max_concurrent_manifest_fetches,
        )

    async def garbage_collect_async(
        self,
        delete_object_older_than: datetime.datetime,
        *,
        dry_run: bool = False,
        max_snapshots_in_memory: int = 50,
        max_compressed_manifest_mem_bytes: int = 512 * 1024 * 1024,
        max_concurrent_manifest_fetches: int = 500,
    ) -> GCSummary:
        """Delete any objects no longer accessible from any branches or tags (async version).

        Danger
        ------
        This is an administrative operation; it should be run
        carefully. The repository can still operate concurrently while
        `garbage_collect` runs, but other readers can get inconsistent
        views if they are trying to access the expired snapshots.

        Parameters
        ----------
        delete_object_older_than: datetime.datetime
            Delete objects older than this time.
        dry_run : bool
            Report results but don't delete any objects.
        max_snapshots_in_memory : int
            Don't prefetch more than this many Snapshots to memory.
        max_compressed_manifest_mem_bytes : int
            Don't use more than this memory to store compressed in-flight manifests.
        max_concurrent_manifest_fetches : int
            Don't run more than this many concurrent manifest fetches.

        Returns
        -------
        GCSummary
            Summary of objects deleted.
        """

        return await self._repository.garbage_collect_async(
            delete_object_older_than,
            dry_run=dry_run,
            max_snapshots_in_memory=max_snapshots_in_memory,
            max_compressed_manifest_mem_bytes=max_compressed_manifest_mem_bytes,
            max_concurrent_manifest_fetches=max_concurrent_manifest_fetches,
        )

    def chunk_storage_stats(
        self,
        *,
        max_snapshots_in_memory: int = 50,
        max_compressed_manifest_mem_bytes: int = 512 * 1024 * 1024,
        max_concurrent_manifest_fetches: int = 500,
    ) -> ChunkStorageStats:
        """Calculate the total storage used for chunks, in bytes.

        It reports the storage needed to store all snapshots in the repository that
        are reachable from any branches or tags. Unreachable snapshots can be generated
        by using `reset_branch` or `expire_snapshots`. The chunks for these snapshots
        are not included in the result, and they should probably be deleted using
        `garbage_collect`.

        The result is a dataclass with attributes for storage consumed by different
        types of chunks (e.g. `native_bytes`, `virtual_bytes`, `total_bytes`).

        Parameters
        ----------
        max_snapshots_in_memory: int
            Don't prefetch more than this many Snapshots to memory.
        max_compressed_manifest_mem_bytes : int
            Don't use more than this memory to store compressed in-flight manifests.
        max_concurrent_manifest_fetches : int
            Don't run more than this many concurrent manifest fetches.
        """
        return self._repository.chunk_storage_stats(
            max_snapshots_in_memory=max_snapshots_in_memory,
            max_compressed_manifest_mem_bytes=max_compressed_manifest_mem_bytes,
            max_concurrent_manifest_fetches=max_concurrent_manifest_fetches,
        )

    async def chunk_storage_stats_async(
        self,
        *,
        max_snapshots_in_memory: int = 50,
        max_compressed_manifest_mem_bytes: int = 512 * 1024 * 1024,
        max_concurrent_manifest_fetches: int = 500,
    ) -> ChunkStorageStats:
        """Calculate the total storage used for chunks, in bytes (async version).

        It reports the storage needed to store all snapshots in the repository that
        are reachable from any branches or tags. Unreachable snapshots can be generated
        by using `reset_branch` or `expire_snapshots`. The chunks for these snapshots
        are not included in the result, and they should probably be deleted using
        `garbage_collect`.

        The result is a dataclass with attributes for storage consumed by different
        types of chunks (e.g. `native_bytes`, `virtual_bytes`, `total_bytes`).

        Parameters
        ----------
        max_snapshots_in_memory: int
            Don't prefetch more than this many Snapshots to memory.
        max_compressed_manifest_mem_bytes : int
            Don't use more than this memory to store compressed in-flight manifests.
        max_concurrent_manifest_fetches : int
            Don't run more than this many concurrent manifest fetches.
        """
        return await self._repository.chunk_storage_stats_async(
            max_snapshots_in_memory=max_snapshots_in_memory,
            max_compressed_manifest_mem_bytes=max_compressed_manifest_mem_bytes,
            max_concurrent_manifest_fetches=max_concurrent_manifest_fetches,
        )

    def total_chunks_storage(
        self,
        *,
        max_snapshots_in_memory: int = 50,
        max_compressed_manifest_mem_bytes: int = 512 * 1024 * 1024,
        max_concurrent_manifest_fetches: int = 500,
    ) -> int:
        """Calculate the total storage used for chunks, in bytes.

        It reports the storage needed to store all snapshots in the repository that
        are reachable from any branches or tags. Unreachable snapshots can be generated
        by using `reset_branch` or `expire_snapshots`. The chunks for these snapshots
        are not included in the result, and they should probably be deleted using
        `garbage_collect`.

        The result includes only native chunks; virtual and inline chunks are not counted.

        Parameters
        ----------
        max_snapshots_in_memory: int
            Don't prefetch more than this many Snapshots to memory.
        max_compressed_manifest_mem_bytes : int
            Don't use more than this memory to store compressed in-flight manifests.
        max_concurrent_manifest_fetches : int
            Don't run more than this many concurrent manifest fetches.
        """

        warnings.warn(
            "The ``total_chunks_storage`` method has been deprecated in favour of the ``chunk_storage_stats`` method. "
            "The new method is superior, as it actually calculates storage size occupied by inlined and virtual chunks in addition to native chunks. "
            "You can still access just the total native bytes: to keep your existing behaviour using API that will not be removed in a future version, "
            "please replace your existing ``.total_chunks_storage(**kwargs)`` method call with ``.chunk_storage_stats(**same_kwargs).native_bytes``.",
            DeprecationWarning,
            stacklevel=2,
        )

        stats = self._repository.chunk_storage_stats(
            max_snapshots_in_memory=max_snapshots_in_memory,
            max_compressed_manifest_mem_bytes=max_compressed_manifest_mem_bytes,
            max_concurrent_manifest_fetches=max_concurrent_manifest_fetches,
        )
        return stats.native_bytes

    async def total_chunks_storage_async(
        self,
        *,
        max_snapshots_in_memory: int = 50,
        max_compressed_manifest_mem_bytes: int = 512 * 1024 * 1024,
        max_concurrent_manifest_fetches: int = 500,
    ) -> int:
        """Calculate the total storage used for chunks, in bytes (async version).

        It reports the storage needed to store all snapshots in the repository that
        are reachable from any branches or tags. Unreachable snapshots can be generated
        by using `reset_branch` or `expire_snapshots`. The chunks for these snapshots
        are not included in the result, and they should probably be deleted using
        `garbage_collect`.

        The result includes only native chunks; virtual and inline chunks are not counted.

        Parameters
        ----------
        max_snapshots_in_memory: int
            Don't prefetch more than this many Snapshots to memory.
        max_compressed_manifest_mem_bytes : int
            Don't use more than this memory to store compressed in-flight manifests.
        max_concurrent_manifest_fetches : int
            Don't run more than this many concurrent manifest fetches.
        """

        warnings.warn(
            "The ``total_chunks_storage_async`` method has been deprecated in favour of the ``chunk_storage_stats_async`` method. "
            "The new method is superior, as it actually calculates storage size occupied by inlined and virtual chunks in addition to native chunks. "
            "You can still access just the total native bytes: to keep your existing behaviour using API that will not be removed in a future version, "
            "please replace your existing ``.total_chunks_storage_async(**kwargs)`` method call with ``.chunk_storage_stats_async(**same_kwargs).native_bytes``.",
            DeprecationWarning,
            stacklevel=2,
        )

        stats = await self._repository.chunk_storage_stats_async(
            max_snapshots_in_memory=max_snapshots_in_memory,
            max_compressed_manifest_mem_bytes=max_compressed_manifest_mem_bytes,
            max_concurrent_manifest_fetches=max_concurrent_manifest_fetches,
        )
        return stats.native_bytes

    def inspect_snapshot(self, snapshot_id: str, *, pretty: bool = True) -> str:
        # Return a string description of the given snapshot's contents,
        # pretty-printed by default.
        return self._repository.inspect_snapshot(snapshot_id, pretty=pretty)

    async def inspect_snapshot_async(
        self, snapshot_id: str, *, pretty: bool = True
    ) -> str:
        # Async version of inspect_snapshot.
        return await self._repository.inspect_snapshot_async(snapshot_id, pretty=pretty)

    @property
    def spec_version(self) -> int:
        # The version of the Icechunk spec in use by this repository.
        return self._repository.spec_version

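Taken together, a typical read-modify-commit cycle looks like the sketch below. The local-filesystem storage helper, the path, and the zarr calls are illustrative assumptions; any Storage configuration and Zarr operations follow the same pattern.

import icechunk
import zarr

# create the repository on first use, open it afterwards
storage = icechunk.local_filesystem_storage("/tmp/my-repo")  # assumed path
repo = icechunk.Repository.open_or_create(storage)

# write on a branch and commit a new snapshot
session = repo.writable_session("main")
zarr.group(store=session.store).attrs["n_updates"] = 1
snapshot_id = session.commit("first commit")

# read the committed snapshot back through a read-only session
ro = repo.readonly_session(snapshot_id=snapshot_id)
print(zarr.open_group(ro.store, mode="r").attrs["n_updates"])
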
authorized_virtual_container_prefixes property #

authorized_virtual_container_prefixes

Get all authorized virtual chunk container prefixes.

Returns:

Name Type Description
url_prefixes set[str]

The set of authorized url prefixes for each virtual chunk container

config property #

config

Get a copy of this repository's config.

Returns:

Type Description
RepositoryConfig

The repository configuration.

metadata property #

metadata

Get the current configured repository metadata.

Returns:

Type Description
dict[str, Any]

The repository level metadata.

storage property #

storage

Get a copy of this repository's Storage instance.

Returns:

Type Description
Storage

The repository storage instance.

ancestry #

ancestry(*, branch=None, tag=None, snapshot_id=None)

Get the ancestry of a snapshot.

Parameters:

Name Type Description Default
branch str

The branch to get the ancestry of.

None
tag str

The tag to get the ancestry of.

None
snapshot_id str

The snapshot ID to get the ancestry of.

None

Returns:

Type Description
Iterator[SnapshotInfo]

The ancestry of the snapshot, listing out the snapshots and their metadata.

Notes

Only one of the arguments can be specified.

Source code in icechunk-python/python/icechunk/repository.py
def ancestry(
    self,
    *,
    branch: str | None = None,
    tag: str | None = None,
    snapshot_id: str | None = None,
) -> Iterator[SnapshotInfo]:
    """
    Get the ancestry of a snapshot.

    Parameters
    ----------
    branch : str, optional
        The branch to get the ancestry of.
    tag : str, optional
        The tag to get the ancestry of.
    snapshot_id : str, optional
        The snapshot ID to get the ancestry of.

    Returns
    -------
    Iterator[SnapshotInfo]
        The ancestry of the snapshot, listing out the snapshots and their metadata.

    Notes
    -----
    Only one of the arguments can be specified.
    """

    # the returned object is both an Async and Sync iterator
    res = cast(
        Iterator[SnapshotInfo],
        self._repository.async_ancestry(
            branch=branch, tag=tag, snapshot_id=snapshot_id
        ),
    )
    return res
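
For example, a sketch of walking the history of the main branch; the SnapshotInfo attribute names used here are assumptions based on the SnapshotInfo docs:

for snapshot in repo.ancestry(branch="main"):
    print(snapshot.id, snapshot.written_at, snapshot.message)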

async_ancestry #

async_ancestry(*, branch=None, tag=None, snapshot_id=None)

Get the ancestry of a snapshot.

Parameters:

Name Type Description Default
branch str

The branch to get the ancestry of.

None
tag str

The tag to get the ancestry of.

None
snapshot_id str

The snapshot ID to get the ancestry of.

None

Returns:

Type Description
AsyncIterator[SnapshotInfo]

The ancestry of the snapshot, listing out the snapshots and their metadata.

Notes

Only one of the arguments can be specified.

Source code in icechunk-python/python/icechunk/repository.py
def async_ancestry(
    self,
    *,
    branch: str | None = None,
    tag: str | None = None,
    snapshot_id: str | None = None,
) -> AsyncIterator[SnapshotInfo]:
    """
    Get the ancestry of a snapshot.

    Parameters
    ----------
    branch : str, optional
        The branch to get the ancestry of.
    tag : str, optional
        The tag to get the ancestry of.
    snapshot_id : str, optional
        The snapshot ID to get the ancestry of.

    Returns
    -------
    AsyncIterator[SnapshotInfo]
        The ancestry of the snapshot, listing out the snapshots and their metadata.

    Notes
    -----
    Only one of the arguments can be specified.
    """
    return self._repository.async_ancestry(
        branch=branch, tag=tag, snapshot_id=snapshot_id
    )

chunk_storage_stats #

chunk_storage_stats(*, max_snapshots_in_memory=50, max_compressed_manifest_mem_bytes=512 * 1024 * 1024, max_concurrent_manifest_fetches=500)

Calculate the total storage used for chunks, in bytes.

It reports the storage needed to store all snapshots in the repository that are reachable from any branches or tags. Unreachable snapshots can be generated by using reset_branch or expire_snapshots. The chunks for these snapshots are not included in the result, and they should probably be deleted using garbage_collect.

The result is a dataclass with attributes for storage consumed by different types of chunks (e.g. native_bytes, virtual_bytes, total_bytes).

Parameters:

Name Type Description Default
max_snapshots_in_memory int

Don't prefetch more than this many Snapshots to memory.

50
max_compressed_manifest_mem_bytes int

Don't use more than this memory to store compressed in-flight manifests.

512 * 1024 * 1024
max_concurrent_manifest_fetches int

Don't run more than this many concurrent manifest fetches.

500
Source code in icechunk-python/python/icechunk/repository.py
def chunk_storage_stats(
    self,
    *,
    max_snapshots_in_memory: int = 50,
    max_compressed_manifest_mem_bytes: int = 512 * 1024 * 1024,
    max_concurrent_manifest_fetches: int = 500,
) -> ChunkStorageStats:
    """Calculate the total storage used for chunks, in bytes.

    It reports the storage needed to store all snapshots in the repository that
    are reachable from any branches or tags. Unreachable snapshots can be generated
    by using `reset_branch` or `expire_snapshots`. The chunks for these snapshots
    are not included in the result, and they should probably be deleted using
    `garbage_collect`.

    The result is a dataclass with attributes for storage consumed by different
    types of chunks (e.g. `native_bytes`, `virtual_bytes`, `total_bytes`).

    Parameters
    ----------
    max_snapshots_in_memory: int
        Don't prefetch more than this many Snapshots to memory.
    max_compressed_manifest_mem_bytes : int
        Don't use more than this memory to store compressed in-flight manifests.
    max_concurrent_manifest_fetches : int
        Don't run more than this many concurrent manifest fetches.
    """
    return self._repository.chunk_storage_stats(
        max_snapshots_in_memory=max_snapshots_in_memory,
        max_compressed_manifest_mem_bytes=max_compressed_manifest_mem_bytes,
        max_concurrent_manifest_fetches=max_concurrent_manifest_fetches,
    )
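
A sketch of reporting the breakdown, using the attribute names listed above:

stats = repo.chunk_storage_stats()
print("native: ", stats.native_bytes, "bytes")
print("virtual:", stats.virtual_bytes, "bytes")
print("total:  ", stats.total_bytes, "bytes")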

chunk_storage_stats_async async #

chunk_storage_stats_async(*, max_snapshots_in_memory=50, max_compressed_manifest_mem_bytes=512 * 1024 * 1024, max_concurrent_manifest_fetches=500)

Calculate the total storage used for chunks, in bytes (async version).

It reports the storage needed to store all snapshots in the repository that are reachable from any branches or tags. Unreachable snapshots can be generated by using reset_branch or expire_snapshots. The chunks for these snapshots are not included in the result, and they should probably be deleted using garbage_collect.

The result is a dataclass with attributes for storage consumed by different types of chunks (e.g. native_bytes, virtual_bytes, total_bytes).

Parameters:

Name Type Description Default
max_snapshots_in_memory int

Don't prefetch more than this many Snapshots to memory.

50
max_compressed_manifest_mem_bytes int

Don't use more than this memory to store compressed in-flight manifests.

512 * 1024 * 1024
max_concurrent_manifest_fetches int

Don't run more than this many concurrent manifest fetches.

500
Source code in icechunk-python/python/icechunk/repository.py
async def chunk_storage_stats_async(
    self,
    *,
    max_snapshots_in_memory: int = 50,
    max_compressed_manifest_mem_bytes: int = 512 * 1024 * 1024,
    max_concurrent_manifest_fetches: int = 500,
) -> ChunkStorageStats:
    """Calculate the total storage used for chunks, in bytes (async version).

    It reports the storage needed to store all snapshots in the repository that
    are reachable from any branches or tags. Unreachable snapshots can be generated
    by using `reset_branch` or `expire_snapshots`. The chunks for these snapshots
    are not included in the result, and they should probably be deleted using
    `garbage_collect`.

    The result is a dataclass with attributes for storage consumed by different
    types of chunks (e.g. `native_bytes`, `virtual_bytes`, `total_bytes`).

    Parameters
    ----------
    max_snapshots_in_memory: int
        Don't prefetch more than this many Snapshots to memory.
    max_compressed_manifest_mem_bytes : int
        Don't use more than this memory to store compressed in-flight manifests.
    max_concurrent_manifest_fetches : int
        Don't run more than this many concurrent manifest fetches.
    """
    return await self._repository.chunk_storage_stats_async(
        max_snapshots_in_memory=max_snapshots_in_memory,
        max_compressed_manifest_mem_bytes=max_compressed_manifest_mem_bytes,
        max_concurrent_manifest_fetches=max_concurrent_manifest_fetches,
    )

create classmethod #

create(storage, config=None, authorize_virtual_chunk_access=None, spec_version=None)

Create a new Icechunk repository. If one already exists at the given store location, an error will be raised.

Warning

Attempting to create a Repo concurrently in the same location from multiple processes is not safe. Instead, create a Repo once and then open it concurrently.

Parameters:

Name Type Description Default
storage Storage

The storage configuration for the repository.

required
config RepositoryConfig

The repository configuration. If not provided, a default configuration will be used.

None
authorize_virtual_chunk_access dict[str, AnyCredential | None]

Authorize Icechunk to access virtual chunks in these containers. A mapping from container url_prefix to the credentials to use to access chunks in that container. If a credential is None, it will be fetched from the environment, or anonymous credentials will be used if the container allows it. As a security measure, Icechunk will block access to virtual chunks if the container is not authorized using this argument.

None
spec_version int

Use this version of the spec for the new repository. If not passed, the latest version of the spec that was available before the library version release will be used.

None

Returns:

Type Description
Self

An instance of the Repository class.

Source code in icechunk-python/python/icechunk/repository.py
@classmethod
def create(
    cls,
    storage: Storage,
    config: RepositoryConfig | None = None,
    authorize_virtual_chunk_access: dict[str, AnyCredential | None] | None = None,
    spec_version: int | None = None,
) -> Self:
    """
    Create a new Icechunk repository.
    If one already exists at the given store location, an error will be raised.

    !!! warning
        Attempting to create a Repo concurrently in the same location from multiple processes is not safe.
        Instead, create a Repo once and then open it concurrently.

    Parameters
    ----------
    storage : Storage
        The storage configuration for the repository.
    config : RepositoryConfig, optional
        The repository configuration. If not provided, a default configuration will be used.
    authorize_virtual_chunk_access : dict[str, AnyCredential | None], optional
        Authorize Icechunk to access virtual chunks in these containers. A mapping
        from container url_prefix to the credentials to use to access chunks in
    that container. If a credential is `None`, it will be fetched from the
        environment, or anonymous credentials will be used if the container allows it.
        As a security measure, Icechunk will block access to virtual chunks if the
        container is not authorized using this argument.
    spec_version : int, optional
        Use this version of the spec for the new repository. If not passed, the latest version
        of the spec that was available before the library version release will be used.

    Returns
    -------
    Self
        An instance of the Repository class.
    """
    return cls(
        PyRepository.create(
            storage,
            config=config,
            authorize_virtual_chunk_access=authorize_virtual_chunk_access,
            spec_version=spec_version,
        )
    )
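
A minimal sketch of creating a fresh repository; the local-filesystem storage helper and path are illustrative, and any Storage works the same way:

import icechunk

storage = icechunk.local_filesystem_storage("/tmp/new-repo")  # assumed path
repo = icechunk.Repository.create(storage)  # raises if a repo already exists there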

create_async async classmethod #

create_async(storage, config=None, authorize_virtual_chunk_access=None, spec_version=None)

Create a new Icechunk repository asynchronously. If one already exists at the given store location, an error will be raised.

Warning

Attempting to create a Repo concurrently in the same location from multiple processes is not safe. Instead, create a Repo once and then open it concurrently.

Parameters:

Name Type Description Default
storage Storage

The storage configuration for the repository.

required
config RepositoryConfig

The repository configuration. If not provided, a default configuration will be used.

None
authorize_virtual_chunk_access dict[str, AnyCredential | None]

Authorize Icechunk to access virtual chunks in these containers. A mapping from container url_prefix to the credentials to use to access chunks in that container. If a credential is None, it will be fetched from the environment, or anonymous credentials will be used if the container allows it. As a security measure, Icechunk will block access to virtual chunks if the container is not authorized using this argument.

None
spec_version int

Use this version of the spec for the new repository. If not passed, the latest version of the spec that was available before the library version release will be used.

None

Returns:

Type Description
Self

An instance of the Repository class.

Source code in icechunk-python/python/icechunk/repository.py
@classmethod
async def create_async(
    cls,
    storage: Storage,
    config: RepositoryConfig | None = None,
    authorize_virtual_chunk_access: dict[str, AnyCredential | None] | None = None,
    spec_version: int | None = None,
) -> Self:
    """
    Create a new Icechunk repository asynchronously.
    If one already exists at the given store location, an error will be raised.

    !!! warning
        Attempting to create a Repo concurrently in the same location from multiple processes is not safe.
        Instead, create a Repo once and then open it concurrently.

    Parameters
    ----------
    storage : Storage
        The storage configuration for the repository.
    config : RepositoryConfig, optional
        The repository configuration. If not provided, a default configuration will be used.
    authorize_virtual_chunk_access : dict[str, AnyCredential | None], optional
        Authorize Icechunk to access virtual chunks in these containers. A mapping
        from container url_prefix to the credentials to use to access chunks in
    that container. If a credential is `None`, it will be fetched from the
        environment, or anonymous credentials will be used if the container allows it.
        As a security measure, Icechunk will block access to virtual chunks if the
        container is not authorized using this argument.
    spec_version : int, optional
        Use this version of the spec for the new repository. If not passed, the latest version
        of the spec that was available before the library version release will be used.

    Returns
    -------
    Self
        An instance of the Repository class.
    """
    return cls(
        await PyRepository.create_async(
            storage,
            config=config,
            authorize_virtual_chunk_access=authorize_virtual_chunk_access,
            spec_version=spec_version,
        )
    )

create_branch #

create_branch(branch, snapshot_id)

Create a new branch at the given snapshot.

Parameters:

Name Type Description Default
branch str

The name of the branch to create.

required
snapshot_id str

The snapshot ID to create the branch at.

required

Returns:

Type Description
None
Source code in icechunk-python/python/icechunk/repository.py
def create_branch(self, branch: str, snapshot_id: str) -> None:
    """
    Create a new branch at the given snapshot.

    Parameters
    ----------
    branch : str
        The name of the branch to create.
    snapshot_id : str
        The snapshot ID to create the branch at.

    Returns
    -------
    None
    """
    self._repository.create_branch(branch, snapshot_id)
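
For instance, a sketch of branching off the current tip of main (branch names are illustrative):

tip = repo.lookup_branch("main")
repo.create_branch("experiment", tip)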

create_branch_async async #

create_branch_async(branch, snapshot_id)

Create a new branch at the given snapshot (async version).

Parameters:

Name Type Description Default
branch str

The name of the branch to create.

required
snapshot_id str

The snapshot ID to create the branch at.

required

Returns:

Type Description
None
Source code in icechunk-python/python/icechunk/repository.py
async def create_branch_async(self, branch: str, snapshot_id: str) -> None:
    """
    Create a new branch at the given snapshot (async version).

    Parameters
    ----------
    branch : str
        The name of the branch to create.
    snapshot_id : str
        The snapshot ID to create the branch at.

    Returns
    -------
    None
    """
    await self._repository.create_branch_async(branch, snapshot_id)

create_tag #

create_tag(tag, snapshot_id)

Create a new tag at the given snapshot.

Parameters:

Name Type Description Default
tag str

The name of the tag to create.

required
snapshot_id str

The snapshot ID to create the tag at.

required

Returns:

Type Description
None
Source code in icechunk-python/python/icechunk/repository.py
def create_tag(self, tag: str, snapshot_id: str) -> None:
    """
    Create a new tag at the given snapshot.

    Parameters
    ----------
    tag : str
        The name of the tag to create.
    snapshot_id : str
        The snapshot ID to create the tag at.

    Returns
    -------
    None
    """
    self._repository.create_tag(tag, snapshot_id)
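
For instance, a sketch of tagging the current tip of main as a release (the tag name is illustrative):

repo.create_tag("v1.0.0", repo.lookup_branch("main"))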

create_tag_async async #

create_tag_async(tag, snapshot_id)

Create a new tag at the given snapshot (async version).

Parameters:

Name Type Description Default
tag str

The name of the tag to create.

required
snapshot_id str

The snapshot ID to create the tag at.

required

Returns:

Type Description
None
Source code in icechunk-python/python/icechunk/repository.py
async def create_tag_async(self, tag: str, snapshot_id: str) -> None:
    """
    Create a new tag at the given snapshot (async version).

    Parameters
    ----------
    tag : str
        The name of the tag to create.
    snapshot_id : str
        The snapshot ID to create the tag at.

    Returns
    -------
    None
    """
    await self._repository.create_tag_async(tag, snapshot_id)

default_commit_metadata #

default_commit_metadata()

Get the current configured default commit metadata for the repository.

Returns:

Type Description
dict[str, Any]

The default commit metadata.

Source code in icechunk-python/python/icechunk/repository.py
def default_commit_metadata(self) -> dict[str, Any]:
    """
    Get the current configured default commit metadata for the repository.

    Returns
    -------
    dict[str, Any]
        The default commit metadata.
    """
    return self._repository.default_commit_metadata()

delete_branch #

delete_branch(branch)

Delete a branch.

Parameters:

Name Type Description Default
branch str

The branch to delete.

required

Returns:

Type Description
None
Source code in icechunk-python/python/icechunk/repository.py
def delete_branch(self, branch: str) -> None:
    """
    Delete a branch.

    Parameters
    ----------
    branch : str
        The branch to delete.

    Returns
    -------
    None
    """
    self._repository.delete_branch(branch)

delete_branch_async async #

delete_branch_async(branch)

Delete a branch (async version).

Parameters:

Name Type Description Default
branch str

The branch to delete.

required

Returns:

Type Description
None
Source code in icechunk-python/python/icechunk/repository.py
async def delete_branch_async(self, branch: str) -> None:
    """
    Delete a branch (async version).

    Parameters
    ----------
    branch : str
        The branch to delete.

    Returns
    -------
    None
    """
    await self._repository.delete_branch_async(branch)

delete_tag #

delete_tag(tag)

Delete a tag.

Parameters:

Name Type Description Default
tag str

The tag to delete.

required

Returns:

Type Description
None
Source code in icechunk-python/python/icechunk/repository.py
def delete_tag(self, tag: str) -> None:
    """
    Delete a tag.

    Parameters
    ----------
    tag : str
        The tag to delete.

    Returns
    -------
    None
    """
    self._repository.delete_tag(tag)

delete_tag_async async #

delete_tag_async(tag)

Delete a tag (async version).

Parameters:

Name Type Description Default
tag str

The tag to delete.

required

Returns:

Type Description
None
Source code in icechunk-python/python/icechunk/repository.py
async def delete_tag_async(self, tag: str) -> None:
    """
    Delete a tag (async version).

    Parameters
    ----------
    tag : str
        The tag to delete.

    Returns
    -------
    None
    """
    await self._repository.delete_tag_async(tag)

diff #

diff(*, from_branch=None, from_tag=None, from_snapshot_id=None, to_branch=None, to_tag=None, to_snapshot_id=None)

Compute an overview of the operations executed from version from to version to.

Both versions, from and to, must be identified. Identification can be done using a branch, tag or snapshot id. The styles used to identify the from and to versions can be different.

The from version must be a member of the ancestry of to.

Returns:

Type Description
Diff

The operations executed between the two versions

Source code in icechunk-python/python/icechunk/repository.py
def diff(
    self,
    *,
    from_branch: str | None = None,
    from_tag: str | None = None,
    from_snapshot_id: str | None = None,
    to_branch: str | None = None,
    to_tag: str | None = None,
    to_snapshot_id: str | None = None,
) -> Diff:
    """
    Compute an overview of the operations executed from version `from` to version `to`.

    Both versions, `from` and `to`, must be identified. Identification can be done using a branch, tag or snapshot id.
    The styles used to identify the `from` and `to` versions can be different.

    The `from` version must be a member of the `ancestry` of `to`.

    Returns
    -------
    Diff
        The operations executed between the two versions
    """
    return self._repository.diff(
        from_branch=from_branch,
        from_tag=from_tag,
        from_snapshot_id=from_snapshot_id,
        to_branch=to_branch,
        to_tag=to_tag,
        to_snapshot_id=to_snapshot_id,
    )
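
A sketch of comparing a tagged release against the current tip of main (the tag and branch names are assumptions):

diff = repo.diff(from_tag="v1.0.0", to_branch="main")
print(diff)  # the operations executed between the two versions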

diff_async async #

diff_async(*, from_branch=None, from_tag=None, from_snapshot_id=None, to_branch=None, to_tag=None, to_snapshot_id=None)

Compute an overview of the operations executed from version from to version to (async version).

Both versions, from and to, must be identified. Identification can be done using a branch, tag or snapshot id. The styles used to identify the from and to versions can be different.

The from version must be a member of the ancestry of to.

Returns:

Type Description
Diff

The operations executed between the two versions

Source code in icechunk-python/python/icechunk/repository.py
async def diff_async(
    self,
    *,
    from_branch: str | None = None,
    from_tag: str | None = None,
    from_snapshot_id: str | None = None,
    to_branch: str | None = None,
    to_tag: str | None = None,
    to_snapshot_id: str | None = None,
) -> Diff:
    """
    Compute an overview of the operations executed from version `from` to version `to` (async version).

    Both versions, `from` and `to`, must be identified. Identification can be done using a branch, tag or snapshot id.
    The styles used to identify the `from` and `to` versions can be different.

    The `from` version must be a member of the `ancestry` of `to`.

    Returns
    -------
    Diff
        The operations executed between the two versions
    """
    return await self._repository.diff_async(
        from_branch=from_branch,
        from_tag=from_tag,
        from_snapshot_id=from_snapshot_id,
        to_branch=to_branch,
        to_tag=to_tag,
        to_snapshot_id=to_snapshot_id,
    )

exists staticmethod #

exists(storage)

Check if a repository exists at the given storage location.

Parameters:

Name Type Description Default
storage Storage

The storage configuration for the repository.

required

Returns:

Type Description
bool

True if the repository exists, False otherwise.

Source code in icechunk-python/python/icechunk/repository.py
@staticmethod
def exists(storage: Storage) -> bool:
    """
    Check if a repository exists at the given storage location.

    Parameters
    ----------
    storage : Storage
        The storage configuration for the repository.

    Returns
    -------
    bool
        True if the repository exists, False otherwise.
    """
    return PyRepository.exists(storage)
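
A sketch of an open-or-create check, assuming storage is a configured Storage instance:

if icechunk.Repository.exists(storage):
    repo = icechunk.Repository.open(storage)
else:
    repo = icechunk.Repository.create(storage)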

exists_async async staticmethod #

exists_async(storage)

Check if a repository exists at the given storage location (async version).

Parameters:

Name Type Description Default
storage Storage

The storage configuration for the repository.

required

Returns:

Type Description
bool

True if the repository exists, False otherwise.

Source code in icechunk-python/python/icechunk/repository.py
@staticmethod
async def exists_async(storage: Storage) -> bool:
    """
    Check if a repository exists at the given storage location (async version).

    Parameters
    ----------
    storage : Storage
        The storage configuration for the repository.

    Returns
    -------
    bool
        True if the repository exists, False otherwise.
    """
    return await PyRepository.exists_async(storage)

expire_snapshots #

expire_snapshots(older_than, *, delete_expired_branches=False, delete_expired_tags=False)

Expire all snapshots older than a threshold.

This processes snapshots found by navigating all references in the repo, tags first, branches later, both in lexicographical order.

Returns the ids of all snapshots considered expired and skipped from history. Notice that these snapshots are not necessarily available for garbage collection; they could still be pointed to by other refs.

If delete_expired_* is set to True, branches or tags that, after the expiration process, point to expired snapshots directly, will be deleted.

Danger

This is an administrative operation; it should be run carefully. The repository can still operate concurrently while expire_snapshots runs, but other readers can get inconsistent views of the repository history.

Parameters:

Name Type Description Default
older_than datetime

Expire snapshots older than this time.

required
delete_expired_branches bool

Whether to delete any branches that now have only expired snapshots.

False
delete_expired_tags bool

Whether to delete any tags associated with expired snapshots

False

Returns:

Type Description
set of expired snapshot IDs
Source code in icechunk-python/python/icechunk/repository.py
def expire_snapshots(
    self,
    older_than: datetime.datetime,
    *,
    delete_expired_branches: bool = False,
    delete_expired_tags: bool = False,
) -> set[str]:
    """Expire all snapshots older than a threshold.

    This processes snapshots found by navigating all references in
    the repo, tags first, branches later, both in lexicographical order.

    Returns the ids of all snapshots considered expired and skipped
    from history. Notice that these snapshots are not necessarily
    available for garbage collection; they could still be pointed to
    by other refs.

    If `delete_expired_*` is set to True, branches or tags that, after the
    expiration process, point to expired snapshots directly, will be
    deleted.

    Danger
    ------
    This is an administrative operation; it should be run
    carefully. The repository can still operate concurrently while
    `expire_snapshots` runs, but other readers can get inconsistent
    views of the repository history.

    Parameters
    ----------
    older_than: datetime.datetime
        Expire snapshots older than this time.
    delete_expired_branches: bool, optional
        Whether to delete any branches that now have only expired snapshots.
    delete_expired_tags: bool, optional
        Whether to delete any tags associated with expired snapshots

    Returns
    -------
    set of expired snapshot IDs
    """
    return self._repository.expire_snapshots(
        older_than,
        delete_expired_branches=delete_expired_branches,
        delete_expired_tags=delete_expired_tags,
    )
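
For example, a sketch that expires everything older than 30 days; it assumes `repo` is an open Repository and that you accept rewriting the visible history.

import datetime

cutoff = datetime.datetime.now(datetime.timezone.utc) - datetime.timedelta(days=30)
expired = repo.expire_snapshots(cutoff)
print(f"{len(expired)} snapshots expired")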

expire_snapshots_async async #

expire_snapshots_async(older_than, *, delete_expired_branches=False, delete_expired_tags=False)

Expire all snapshots older than a threshold (async version).

This processes snapshots found by navigating all references in the repo, tags first, branches later, both in lexicographical order.

Returns the ids of all snapshots considered expired and skipped from history. Notice that these snapshots are not necessarily available for garbage collection; they could still be pointed to by other refs.

If delete_expired_* is set to True, branches or tags that, after the expiration process, point to expired snapshots directly, will be deleted.

Danger

This is an administrative operation; it should be run carefully. The repository can still operate concurrently while expire_snapshots runs, but other readers can get inconsistent views of the repository history.

Parameters:

Name Type Description Default
older_than datetime

Expire snapshots older than this time.

required
delete_expired_branches bool

Whether to delete any branches that now have only expired snapshots.

False
delete_expired_tags bool

Whether to delete any tags associated with expired snapshots

False

Returns:

Type Description
set of expired snapshot IDs
Source code in icechunk-python/python/icechunk/repository.py
async def expire_snapshots_async(
    self,
    older_than: datetime.datetime,
    *,
    delete_expired_branches: bool = False,
    delete_expired_tags: bool = False,
) -> set[str]:
    """Expire all snapshots older than a threshold (async version).

    This processes snapshots found by navigating all references in
    the repo, tags first, branches later, both in lexicographical order.

    Returns the ids of all snapshots considered expired and skipped
    from history. Notice that these snapshots are not necessarily
    available for garbage collection; they could still be pointed to
    by other refs.

    If `delete_expired_*` is set to True, branches or tags that, after the
    expiration process, point to expired snapshots directly, will be
    deleted.

    Danger
    ------
    This is an administrative operation; it should be run
    carefully. The repository can still operate concurrently while
    `expire_snapshots` runs, but other readers can get inconsistent
    views of the repository history.

    Parameters
    ----------
    older_than: datetime.datetime
        Expire snapshots older than this time.
    delete_expired_branches: bool, optional
        Whether to delete any branches that now have only expired snapshots.
    delete_expired_tags: bool, optional
        Whether to delete any tags associated with expired snapshots

    Returns
    -------
    set of expired snapshot IDs
    """
    return await self._repository.expire_snapshots_async(
        older_than,
        delete_expired_branches=delete_expired_branches,
        delete_expired_tags=delete_expired_tags,
    )

fetch_config staticmethod #

fetch_config(storage)

Fetch the configuration for the repository saved in storage.

Parameters:

Name Type Description Default
storage Storage

The storage configuration for the repository.

required

Returns:

Type Description
RepositoryConfig | None

The repository configuration if it exists, None otherwise.

Source code in icechunk-python/python/icechunk/repository.py
@staticmethod
def fetch_config(storage: Storage) -> RepositoryConfig | None:
    """
    Fetch the configuration for the repository saved in storage.

    Parameters
    ----------
    storage : Storage
        The storage configuration for the repository.

    Returns
    -------
    RepositoryConfig | None
        The repository configuration if it exists, None otherwise.
    """
    return PyRepository.fetch_config(storage)

fetch_config_async async staticmethod #

fetch_config_async(storage)

Fetch the configuration for the repository saved in storage (async version).

Parameters:

Name Type Description Default
storage Storage

The storage configuration for the repository.

required

Returns:

Type Description
RepositoryConfig | None

The repository configuration if it exists, None otherwise.

Source code in icechunk-python/python/icechunk/repository.py
@staticmethod
async def fetch_config_async(storage: Storage) -> RepositoryConfig | None:
    """
    Fetch the configuration for the repository saved in storage (async version).

    Parameters
    ----------
    storage : Storage
        The storage configuration for the repository.

    Returns
    -------
    RepositoryConfig | None
        The repository configuration if it exists, None otherwise.
    """
    return await PyRepository.fetch_config_async(storage)

fetch_spec_version staticmethod #

fetch_spec_version(storage)

Fetch the spec version of a repository without fully opening it.

This is useful for checking the repository format version before opening, for example to know what version of the library is needed to open it.

Parameters:

Name Type Description Default
storage Storage

The storage configuration for the repository.

required

Returns:

Type Description
int | None

The spec version of the repository if it exists, None if no repository exists at the given location.

Source code in icechunk-python/python/icechunk/repository.py
@staticmethod
def fetch_spec_version(storage: Storage) -> int | None:
    """
    Fetch the spec version of a repository without fully opening it.

    This is useful for checking the repository format version before opening,
    for example to know what version of the library is needed to open it.

    Parameters
    ----------
    storage : Storage
        The storage configuration for the repository.

    Returns
    -------
    int | None
        The spec version of the repository if it exists, None if no repository
        exists at the given location.
    """
    return PyRepository.fetch_spec_version(storage)
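
A small sketch of the intended check-before-open pattern; `storage` stands for any configured Storage instance.

import icechunk

version = icechunk.Repository.fetch_spec_version(storage)
if version is None:
    print("no repository at this location")
else:
    print(f"repository uses spec version {version}")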

fetch_spec_version_async async staticmethod #

fetch_spec_version_async(storage)

Fetch the spec version of a repository without fully opening it (async version).

This is useful for checking the repository format version before opening, for example to know what version of the library is needed to open it.

Parameters:

Name Type Description Default
storage Storage

The storage configuration for the repository.

required

Returns:

Type Description
int | None

The spec version of the repository if it exists, None if no repository exists at the given location.

Source code in icechunk-python/python/icechunk/repository.py
@staticmethod
async def fetch_spec_version_async(storage: Storage) -> int | None:
    """
    Fetch the spec version of a repository without fully opening it (async version).

    This is useful for checking the repository format version before opening,
    for example to know what version of the library is needed to open it.

    Parameters
    ----------
    storage : Storage
        The storage configuration for the repository.

    Returns
    -------
    int | None
        The spec version of the repository if it exists, None if no repository
        exists at the given location.
    """
    return await PyRepository.fetch_spec_version_async(storage)

garbage_collect #

garbage_collect(delete_object_older_than, *, dry_run=False, max_snapshots_in_memory=50, max_compressed_manifest_mem_bytes=512 * 1024 * 1024, max_concurrent_manifest_fetches=500)

Delete any objects no longer accessible from any branches or tags.

Danger

This is an administrative operation; it should be run carefully. The repository can still operate concurrently while garbage_collect runs, but other readers can get inconsistent views if they are trying to access the expired snapshots.

Parameters:

Name Type Description Default
delete_object_older_than datetime

Delete objects older than this time.

required
dry_run bool

Report results but don't delete any objects

False
max_snapshots_in_memory int

Don't prefetch more than this many Snapshots to memory.

50
max_compressed_manifest_mem_bytes int

Don't use more than this memory to store compressed in-flight manifests.

512 * 1024 * 1024
max_concurrent_manifest_fetches int

Don't run more than this many concurrent manifest fetches.

500

Returns:

Type Description
GCSummary

Summary of objects deleted.

Source code in icechunk-python/python/icechunk/repository.py
def garbage_collect(
    self,
    delete_object_older_than: datetime.datetime,
    *,
    dry_run: bool = False,
    max_snapshots_in_memory: int = 50,
    max_compressed_manifest_mem_bytes: int = 512 * 1024 * 1024,
    max_concurrent_manifest_fetches: int = 500,
) -> GCSummary:
    """Delete any objects no longer accessible from any branches or tags.

    Danger
    ------
    This is an administrative operation; it should be run
    carefully. The repository can still operate concurrently while
    `garbage_collect` runs, but other readers can get inconsistent
    views if they are trying to access the expired snapshots.

    Parameters
    ----------
    delete_object_older_than: datetime.datetime
        Delete objects older than this time.
    dry_run : bool, optional
        Report results but don't delete any objects
    max_snapshots_in_memory : int
        Don't prefetch more than this many Snapshots to memory.
    max_compressed_manifest_mem_bytes : int
        Don't use more than this memory to store compressed in-flight manifests.
    max_concurrent_manifest_fetches : int
        Don't run more than this many concurrent manifest fetches.

    Returns
    -------
    GCSummary
        Summary of objects deleted.
    """

    return self._repository.garbage_collect(
        delete_object_older_than,
        dry_run=dry_run,
        max_snapshots_in_memory=max_snapshots_in_memory,
        max_compressed_manifest_mem_bytes=max_compressed_manifest_mem_bytes,
        max_concurrent_manifest_fetches=max_concurrent_manifest_fetches,
    )
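
A sketch of a cautious workflow: dry-run first to inspect what would be deleted, then run for real. Garbage collection is typically run after expire_snapshots has made snapshots unreachable; `repo` is assumed to be an open Repository.

import datetime

cutoff = datetime.datetime.now(datetime.timezone.utc) - datetime.timedelta(days=30)

# First pass: report only, delete nothing.
summary = repo.garbage_collect(cutoff, dry_run=True)
print(summary)  # GCSummary of what would be deleted

# Second pass: actually delete unreachable objects.
repo.garbage_collect(cutoff)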

garbage_collect_async async #

garbage_collect_async(delete_object_older_than, *, dry_run=False, max_snapshots_in_memory=50, max_compressed_manifest_mem_bytes=512 * 1024 * 1024, max_concurrent_manifest_fetches=500)

Delete any objects no longer accessible from any branches or tags (async version).

Danger

This is an administrative operation; it should be run carefully. The repository can still operate concurrently while garbage_collect runs, but other readers can get inconsistent views if they are trying to access the expired snapshots.

Parameters:

Name Type Description Default
delete_object_older_than datetime

Delete objects older than this time.

required
dry_run bool

Report results but don't delete any objects

False
max_snapshots_in_memory int

Don't prefetch more than this many Snapshots to memory.

50
max_compressed_manifest_mem_bytes int

Don't use more than this memory to store compressed in-flight manifests.

512 * 1024 * 1024
max_concurrent_manifest_fetches int

Don't run more than this many concurrent manifest fetches.

500

Returns:

Type Description
GCSummary

Summary of objects deleted.

Source code in icechunk-python/python/icechunk/repository.py
async def garbage_collect_async(
    self,
    delete_object_older_than: datetime.datetime,
    *,
    dry_run: bool = False,
    max_snapshots_in_memory: int = 50,
    max_compressed_manifest_mem_bytes: int = 512 * 1024 * 1024,
    max_concurrent_manifest_fetches: int = 500,
) -> GCSummary:
    """Delete any objects no longer accessible from any branches or tags (async version).

    Danger
    ------
    This is an administrative operation; it should be run
    carefully. The repository can still operate concurrently while
    `garbage_collect` runs, but other readers can get inconsistent
    views if they are trying to access the expired snapshots.

    Parameters
    ----------
    delete_object_older_than: datetime.datetime
        Delete objects older than this time.
    dry_run : bool, optional
        Report results but don't delete any objects
    max_snapshots_in_memory : int
        Don't prefetch more than this many Snapshots to memory.
    max_compressed_manifest_mem_bytes : int
        Don't use more than this memory to store compressed in-flight manifests.
    max_concurrent_manifest_fetches : int
        Don't run more than this many concurrent manifest fetches.

    Returns
    -------
    GCSummary
        Summary of objects deleted.
    """

    return await self._repository.garbage_collect_async(
        delete_object_older_than,
        dry_run=dry_run,
        max_snapshots_in_memory=max_snapshots_in_memory,
        max_compressed_manifest_mem_bytes=max_compressed_manifest_mem_bytes,
        max_concurrent_manifest_fetches=max_concurrent_manifest_fetches,
    )

get_metadata #

get_metadata()

Get the current configured repository metadata.

Returns:

Type Description
dict[str, Any]

The repository level metadata.

Source code in icechunk-python/python/icechunk/repository.py
def get_metadata(self) -> dict[str, Any]:
    """
    Get the current configured repository metadata.

    Returns
    -------
    dict[str, Any]
        The repository level metadata.
    """
    return self._repository.get_metadata()

get_metadata_async async #

get_metadata_async()

Get the current configured repository metadata (async version).

Returns:

Type Description
dict[str, Any]

The repository level metadata.

Source code in icechunk-python/python/icechunk/repository.py
async def get_metadata_async(self) -> dict[str, Any]:
    """
    Get the current configured repository metadata (async version).

    Returns
    -------
    dict[str, Any]
        The repository level metadata.
    """
    return await self._repository.get_metadata_async()

list_branches #

list_branches()

List the branches in the repository.

Returns:

Type Description
set[str]

A set of branch names.

Source code in icechunk-python/python/icechunk/repository.py
def list_branches(self) -> set[str]:
    """
    List the branches in the repository.

    Returns
    -------
    set[str]
        A set of branch names.
    """
    return self._repository.list_branches()
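
For example, assuming an open repository `repo` (tags work the same way through list_tags):

for name in sorted(repo.list_branches()):
    print(name)  # e.g. "main" plus any branches you created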

list_branches_async async #

list_branches_async()

List the branches in the repository (async version).

Returns:

Type Description
set[str]

A set of branch names.

Source code in icechunk-python/python/icechunk/repository.py
async def list_branches_async(self) -> set[str]:
    """
    List the branches in the repository (async version).

    Returns
    -------
    set[str]
        A set of branch names.
    """
    return await self._repository.list_branches_async()

list_manifest_files #

list_manifest_files(snapshot_id)

Get the manifest files used by the given snapshot ID

Parameters:

Name Type Description Default
snapshot_id str

The id of the snapshot to get information for

required

Returns:

Type Description
list[ManifestFileInfo]
Source code in icechunk-python/python/icechunk/repository.py
def list_manifest_files(self, snapshot_id: str) -> list[ManifestFileInfo]:
    """
    Get the manifest files used by the given snapshot ID

    Parameters
    ----------
    snapshot_id : str
        The id of the snapshot to get information for

    Returns
    -------
    list[ManifestFileInfo]
    """
    return self._repository.list_manifest_files(snapshot_id)
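
A sketch that inspects the manifests backing the tip of main, assuming an open `repo`:

tip = repo.lookup_branch("main")
for info in repo.list_manifest_files(tip):
    print(info)  # one ManifestFileInfo per manifest file in the snapshot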

list_manifest_files_async async #

list_manifest_files_async(snapshot_id)

Get the manifest files used by the given snapshot ID (async version)

Parameters:

Name Type Description Default
snapshot_id str

The id of the snapshot to get information for

required

Returns:

Type Description
list[ManifestFileInfo]
Source code in icechunk-python/python/icechunk/repository.py
async def list_manifest_files_async(self, snapshot_id: str) -> list[ManifestFileInfo]:
    """
    Get the manifest files used by the given snapshot ID (async version)

    Parameters
    ----------
    snapshot_id : str
        The id of the snapshot to get information for

    Returns
    -------
    list[ManifestFileInfo]
    """
    return await self._repository.list_manifest_files_async(snapshot_id)

list_tags #

list_tags()

List the tags in the repository.

Returns:

Type Description
set[str]

A set of tag names.

Source code in icechunk-python/python/icechunk/repository.py
def list_tags(self) -> set[str]:
    """
    List the tags in the repository.

    Returns
    -------
    set[str]
        A set of tag names.
    """
    return self._repository.list_tags()

list_tags_async async #

list_tags_async()

List the tags in the repository (async version).

Returns:

Type Description
set[str]

A set of tag names.

Source code in icechunk-python/python/icechunk/repository.py
async def list_tags_async(self) -> set[str]:
    """
    List the tags in the repository (async version).

    Returns
    -------
    set[str]
        A set of tag names.
    """
    return await self._repository.list_tags_async()

lookup_branch #

lookup_branch(branch)

Get the tip snapshot ID of a branch.

Parameters:

Name Type Description Default
branch str

The branch to get the tip of.

required

Returns:

Type Description
str

The snapshot ID of the tip of the branch.

Source code in icechunk-python/python/icechunk/repository.py
def lookup_branch(self, branch: str) -> str:
    """
    Get the tip snapshot ID of a branch.

    Parameters
    ----------
    branch : str
        The branch to get the tip of.

    Returns
    -------
    str
        The snapshot ID of the tip of the branch.
    """
    return self._repository.lookup_branch(branch)
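
For example, resolving the tip of main and then fetching its snapshot metadata (see the SnapshotInfo class for the available fields); `repo` is an open Repository:

snapshot_id = repo.lookup_branch("main")
info = repo.lookup_snapshot(snapshot_id)
print(info)  # SnapshotInfo for the branch tip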

lookup_branch_async async #

lookup_branch_async(branch)

Get the tip snapshot ID of a branch (async version).

Parameters:

Name Type Description Default
branch str

The branch to get the tip of.

required

Returns:

Type Description
str

The snapshot ID of the tip of the branch.

Source code in icechunk-python/python/icechunk/repository.py
async def lookup_branch_async(self, branch: str) -> str:
    """
    Get the tip snapshot ID of a branch (async version).

    Parameters
    ----------
    branch : str
        The branch to get the tip of.

    Returns
    -------
    str
        The snapshot ID of the tip of the branch.
    """
    return await self._repository.lookup_branch_async(branch)

lookup_snapshot #

lookup_snapshot(snapshot_id)

Get the SnapshotInfo given a snapshot ID

Parameters:

Name Type Description Default
snapshot_id str

The id of the snapshot to look up

required

Returns:

Type Description
SnapshotInfo
Source code in icechunk-python/python/icechunk/repository.py
def lookup_snapshot(self, snapshot_id: str) -> SnapshotInfo:
    """
    Get the SnapshotInfo given a snapshot ID

    Parameters
    ----------
    snapshot_id : str
        The id of the snapshot to look up

    Returns
    -------
    SnapshotInfo
    """
    return self._repository.lookup_snapshot(snapshot_id)

lookup_snapshot_async async #

lookup_snapshot_async(snapshot_id)

Get the SnapshotInfo given a snapshot ID (async version)

Parameters:

Name Type Description Default
snapshot_id str

The id of the snapshot to look up

required

Returns:

Type Description
SnapshotInfo
Source code in icechunk-python/python/icechunk/repository.py
async def lookup_snapshot_async(self, snapshot_id: str) -> SnapshotInfo:
    """
    Get the SnapshotInfo given a snapshot ID (async version)

    Parameters
    ----------
    snapshot_id : str
        The id of the snapshot to look up

    Returns
    -------
    SnapshotInfo
    """
    return await self._repository.lookup_snapshot_async(snapshot_id)

lookup_tag #

lookup_tag(tag)

Get the snapshot ID of a tag.

Parameters:

Name Type Description Default
tag str

The tag to get the snapshot ID of.

required

Returns:

Type Description
str

The snapshot ID of the tag.

Source code in icechunk-python/python/icechunk/repository.py
def lookup_tag(self, tag: str) -> str:
    """
    Get the snapshot ID of a tag.

    Parameters
    ----------
    tag : str
        The tag to get the snapshot ID of.

    Returns
    -------
    str
        The snapshot ID of the tag.
    """
    return self._repository.lookup_tag(tag)

lookup_tag_async async #

lookup_tag_async(tag)

Get the snapshot ID of a tag (async version).

Parameters:

Name Type Description Default
tag str

The tag to get the snapshot ID of.

required

Returns:

Type Description
str

The snapshot ID of the tag.

Source code in icechunk-python/python/icechunk/repository.py
async def lookup_tag_async(self, tag: str) -> str:
    """
    Get the snapshot ID of a tag (async version).

    Parameters
    ----------
    tag : str
        The tag to get the snapshot ID of.

    Returns
    -------
    str
        The snapshot ID of the tag.
    """
    return await self._repository.lookup_tag_async(tag)

open classmethod #

open(storage, config=None, authorize_virtual_chunk_access=None)

Open an existing Icechunk repository.

If no repository exists at the given storage location, an error will be raised.

Warning

This method must be used with care in a multiprocessing context. Read more in our Parallel Write Guide.

Parameters:

Name Type Description Default
storage Storage

The storage configuration for the repository.

required
config RepositoryConfig

The repository settings. If not provided, a default configuration will be loaded from the repository.

None
authorize_virtual_chunk_access dict[str, AnyCredential | None]

Authorize Icechunk to access virtual chunks in these containers. A mapping from container url_prefix to the credentials to use to access chunks in that container. If a credential is None, credentials will be fetched from the environment, or anonymous credentials will be used if the container allows it. As a security measure, Icechunk will block access to virtual chunks if the container is not authorized using this argument.

None

Returns:

Type Description
Self

An instance of the Repository class.

Source code in icechunk-python/python/icechunk/repository.py
@classmethod
def open(
    cls,
    storage: Storage,
    config: RepositoryConfig | None = None,
    authorize_virtual_chunk_access: dict[str, AnyCredential | None] | None = None,
) -> Self:
    """
    Open an existing Icechunk repository.

    If no repository exists at the given storage location, an error will be raised.

    !!! warning
        This method must be used with care in a multiprocessing context.
        Read more in our [Parallel Write Guide](./parallel.md#uncooperative-distributed-writes).

    Parameters
    ----------
    storage : Storage
        The storage configuration for the repository.
    config : RepositoryConfig, optional
        The repository settings. If not provided, a default configuration will be
        loaded from the repository.
    authorize_virtual_chunk_access : dict[str, AnyCredential | None], optional
        Authorize Icechunk to access virtual chunks in these containers. A mapping
        from container url_prefix to the credentials to use to access chunks in
        that container. If a credential is `None`, credentials will be fetched from the
        environment, or anonymous credentials will be used if the container allows it.
        As a security measure, Icechunk will block access to virtual chunks if the
        container is not authorized using this argument.

    Returns
    -------
    Self
        An instance of the Repository class.
    """
    return cls(
        PyRepository.open(
            storage,
            config=config,
            authorize_virtual_chunk_access=authorize_virtual_chunk_access,
        )
    )
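
A minimal sketch; the filesystem path is hypothetical, and any Storage variant (S3, GCS, Azure, in-memory) can be substituted.

import icechunk

storage = icechunk.local_filesystem_storage("/tmp/my-icechunk-repo")
repo = icechunk.Repository.open(storage)  # raises if no repository exists here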

open_async async classmethod #

open_async(storage, config=None, authorize_virtual_chunk_access=None)

Open an existing Icechunk repository asynchronously.

If no repository exists at the given storage location, an error will be raised.

Warning

This method must be used with care in a multiprocessing context. Read more in our Parallel Write Guide.

Parameters:

Name Type Description Default
storage Storage

The storage configuration for the repository.

required
config RepositoryConfig

The repository settings. If not provided, a default configuration will be loaded from the repository.

None
authorize_virtual_chunk_access dict[str, AnyCredential | None]

Authorize Icechunk to access virtual chunks in these containers. A mapping from container url_prefix to the credentials to use to access chunks in that container. If a credential is None, credentials will be fetched from the environment, or anonymous credentials will be used if the container allows it. As a security measure, Icechunk will block access to virtual chunks if the container is not authorized using this argument.

None

Returns:

Type Description
Self

An instance of the Repository class.

Source code in icechunk-python/python/icechunk/repository.py
@classmethod
async def open_async(
    cls,
    storage: Storage,
    config: RepositoryConfig | None = None,
    authorize_virtual_chunk_access: dict[str, AnyCredential | None] | None = None,
) -> Self:
    """
    Open an existing Icechunk repository asynchronously.

    If no repository exists at the given storage location, an error will be raised.

    !!! warning
        This method must be used with care in a multiprocessing context.
        Read more in our [Parallel Write Guide](./parallel.md#uncooperative-distributed-writes).

    Parameters
    ----------
    storage : Storage
        The storage configuration for the repository.
    config : RepositoryConfig, optional
        The repository settings. If not provided, a default configuration will be
        loaded from the repository.
    authorize_virtual_chunk_access : dict[str, AnyCredential | None], optional
        Authorize Icechunk to access virtual chunks in these containers. A mapping
        from container url_prefix to the credentials to use to access chunks in
        that container. If a credential is `None`, credentials will be fetched from the
        environment, or anonymous credentials will be used if the container allows it.
        As a security measure, Icechunk will block access to virtual chunks if the
        container is not authorized using this argument.

    Returns
    -------
    Self
        An instance of the Repository class.
    """
    return cls(
        await PyRepository.open_async(
            storage,
            config=config,
            authorize_virtual_chunk_access=authorize_virtual_chunk_access,
        )
    )

open_or_create classmethod #

open_or_create(storage, config=None, authorize_virtual_chunk_access=None, create_version=None)

Open an existing Icechunk repository or create a new one if it does not exist.

Warning

This method must be used with care in a multiprocessing context. Read more in our Parallel Write Guide.

Attempting to create a Repo concurrently in the same location from multiple processes is not safe. Instead, create a Repo once and then open it concurrently.

Parameters:

Name Type Description Default
storage Storage

The storage configuration for the repository.

required
config RepositoryConfig

The repository settings. If not provided, a default configuration will be loaded from the repository.

None
authorize_virtual_chunk_access dict[str, AnyCredential | None]

Authorize Icechunk to access virtual chunks in these containers. A mapping from container url_prefix to the credentials to use to access chunks in that container. If a credential is None, credentials will be fetched from the environment, or anonymous credentials will be used if the container allows it. As a security measure, Icechunk will block access to virtual chunks if the container is not authorized using this argument.

None
create_version int

Use this version of the spec for the new repository, if it needs to be created. If not passed, the latest version of the spec that was available before the library version release will be used.

None

Returns:

Type Description
Self

An instance of the Repository class.

Source code in icechunk-python/python/icechunk/repository.py
@classmethod
def open_or_create(
    cls,
    storage: Storage,
    config: RepositoryConfig | None = None,
    authorize_virtual_chunk_access: dict[str, AnyCredential | None] | None = None,
    create_version: int | None = None,
) -> Self:
    """
    Open an existing Icechunk repository or create a new one if it does not exist.

    !!! warning
        This method must be used with care in a multiprocessing context.
        Read more in our [Parallel Write Guide](./parallel.md#uncooperative-distributed-writes).

        Attempting to create a Repo concurrently in the same location from multiple processes is not safe.
        Instead, create a Repo once and then open it concurrently.

    Parameters
    ----------
    storage : Storage
        The storage configuration for the repository.
    config : RepositoryConfig, optional
        The repository settings. If not provided, a default configuration will be
        loaded from the repository.
    authorize_virtual_chunk_access : dict[str, AnyCredential | None], optional
        Authorize Icechunk to access virtual chunks in these containers. A mapping
        from container url_prefix to the credentials to use to access chunks in
        that container. If a credential is `None`, credentials will be fetched from the
        environment, or anonymous credentials will be used if the container allows it.
        As a security measure, Icechunk will block access to virtual chunks if the
        container is not authorized using this argument.
    create_version : int, optional
        Use this version of the spec for the new repository, if it needs to be created.
        If not passed, the latest version of the spec that was available before the
        library version release will be used.


    Returns
    -------
    Self
        An instance of the Repository class.
    """
    return cls(
        PyRepository.open_or_create(
            storage,
            config=config,
            authorize_virtual_chunk_access=authorize_virtual_chunk_access,
            create_version=create_version,
        )
    )

open_or_create_async async classmethod #

open_or_create_async(storage, config=None, authorize_virtual_chunk_access=None, create_version=None)

Open an existing Icechunk repository or create a new one if it does not exist (async version).

Warning

This method must be used with care in a multiprocessing context. Read more in our Parallel Write Guide.

Attempting to create a Repo concurrently in the same location from multiple processes is not safe. Instead, create a Repo once and then open it concurrently.

Parameters:

Name Type Description Default
storage Storage

The storage configuration for the repository.

required
config RepositoryConfig

The repository settings. If not provided, a default configuration will be loaded from the repository.

None
authorize_virtual_chunk_access dict[str, AnyCredential | None]

Authorize Icechunk to access virtual chunks in these containers. A mapping from container url_prefix to the credentials to use to access chunks in that container. If a credential is None, credentials will be fetched from the environment, or anonymous credentials will be used if the container allows it. As a security measure, Icechunk will block access to virtual chunks if the container is not authorized using this argument.

None
create_version int

Use this version of the spec for the new repository, if it needs to be created. If not passed, the latest version of the spec that was available before the library version release will be used.

None

Returns:

Type Description
Self

An instance of the Repository class.

Source code in icechunk-python/python/icechunk/repository.py
@classmethod
async def open_or_create_async(
    cls,
    storage: Storage,
    config: RepositoryConfig | None = None,
    authorize_virtual_chunk_access: dict[str, AnyCredential | None] | None = None,
    create_version: int | None = None,
) -> Self:
    """
    Open an existing Icechunk repository or create a new one if it does not exist (async version).

    !!! warning
        This method must be used with care in a multiprocessing context.
        Read more in our [Parallel Write Guide](./parallel.md#uncooperative-distributed-writes).

        Attempting to create a Repo concurrently in the same location from multiple processes is not safe.
        Instead, create a Repo once and then open it concurrently.

    Parameters
    ----------
    storage : Storage
        The storage configuration for the repository.
    config : RepositoryConfig, optional
        The repository settings. If not provided, a default configuration will be
        loaded from the repository.
    authorize_virtual_chunk_access : dict[str, AnyCredential | None], optional
        Authorize Icechunk to access virtual chunks in these containers. A mapping
        from container url_prefix to the credentials to use to access chunks in
        that container. If a credential is `None`, credentials will be fetched from the
        environment, or anonymous credentials will be used if the container allows it.
        As a security measure, Icechunk will block access to virtual chunks if the
        container is not authorized using this argument.
    create_version : int, optional
        Use this version of the spec for the new repository, if it needs to be created.
        If not passed, the latest version of the spec that was available before the
        library version release will be used.

    Returns
    -------
    Self
        An instance of the Repository class.
    """
    return cls(
        await PyRepository.open_or_create_async(
            storage,
            config=config,
            authorize_virtual_chunk_access=authorize_virtual_chunk_access,
            create_version=create_version,
        )
    )

ops_log #

ops_log()

Get a summary of changes to the repository

Source code in icechunk-python/python/icechunk/repository.py
def ops_log(self) -> Iterator[UpdateType]:
    """
    Get a summary of changes to the repository
    """

    # the returned object is both an Async and Sync iterator
    res = cast(
        Iterator[UpdateType],
        self._repository.async_ops_log(),
    )
    return res
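
For example, a sketch that walks the operations log of an open repository `repo`:

for update in repo.ops_log():
    print(update)  # one UpdateType entry per recorded repository change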

ops_log_async #

ops_log_async()

Get a summary of changes to the repository (async version)

Source code in icechunk-python/python/icechunk/repository.py
def ops_log_async(self) -> AsyncIterator[UpdateType]:
    """
    Get a summary of changes to the repository (async version)
    """

    # the returned object is both an Async and Sync iterator
    return self._repository.async_ops_log()

readonly_session #

readonly_session(branch=None, *, tag=None, snapshot_id=None, as_of=None)

Create a read-only session.

This can be thought of as a read-only checkout of the repository at a given snapshot. When a branch or tag is provided, the session will be based on the tip of the branch or the snapshot ID of the tag.

Parameters:

Name Type Description Default
branch str

If provided, the branch to create the session on.

None
tag str

If provided, the tag to create the session on.

None
snapshot_id str

If provided, the snapshot ID to create the session on.

None
as_of datetime | None

When combined with the branch argument, it will open the session at the last snapshot that is at or before this datetime

None

Returns:

Type Description
Session

The read-only session, pointing to the specified snapshot, tag, or branch.

Notes

Only one of the arguments can be specified.

Source code in icechunk-python/python/icechunk/repository.py
def readonly_session(
    self,
    branch: str | None = None,
    *,
    tag: str | None = None,
    snapshot_id: str | None = None,
    as_of: datetime.datetime | None = None,
) -> Session:
    """
    Create a read-only session.

    This can be thought of as a read-only checkout of the repository at a given snapshot.
    When a branch or tag is provided, the session will be based on the tip of the branch or
    the snapshot ID of the tag.

    Parameters
    ----------
    branch : str, optional
        If provided, the branch to create the session on.
    tag : str, optional
        If provided, the tag to create the session on.
    snapshot_id : str, optional
        If provided, the snapshot ID to create the session on.
    as_of: datetime.datetime, optional
        When combined with the branch argument, it will open the session at the last
        snapshot that is at or before this datetime

    Returns
    -------
    Session
        The read-only session, pointing to the specified snapshot, tag, or branch.

    Notes
    -----
    Only one of the arguments can be specified.
    """
    return Session(
        self._repository.readonly_session(
            branch=branch, tag=tag, snapshot_id=snapshot_id, as_of=as_of
        )
    )
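
A sketch of reading through a read-only session with zarr; the tag name is hypothetical.

import zarr

session = repo.readonly_session(tag="v1")
root = zarr.open_group(session.store, mode="r")
print(list(root.array_keys()))  # arrays stored at the root of the hierarchy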

readonly_session_async async #

readonly_session_async(branch=None, *, tag=None, snapshot_id=None, as_of=None)

Create a read-only session (async version).

This can be thought of as a read-only checkout of the repository at a given snapshot. When a branch or tag is provided, the session will be based on the tip of the branch or the snapshot ID of the tag.

Parameters:

Name Type Description Default
branch str

If provided, the branch to create the session on.

None
tag str

If provided, the tag to create the session on.

None
snapshot_id str

If provided, the snapshot ID to create the session on.

None
as_of datetime | None

When combined with the branch argument, it will open the session at the last snapshot that is at or before this datetime

None

Returns:

Type Description
Session

The read-only session, pointing to the specified snapshot, tag, or branch.

Notes

Only one of the arguments can be specified.

Source code in icechunk-python/python/icechunk/repository.py
async def readonly_session_async(
    self,
    branch: str | None = None,
    *,
    tag: str | None = None,
    snapshot_id: str | None = None,
    as_of: datetime.datetime | None = None,
) -> Session:
    """
    Create a read-only session (async version).

    This can be thought of as a read-only checkout of the repository at a given snapshot.
    When a branch or tag is provided, the session will be based on the tip of the branch or
    the snapshot ID of the tag.

    Parameters
    ----------
    branch : str, optional
        If provided, the branch to create the session on.
    tag : str, optional
        If provided, the tag to create the session on.
    snapshot_id : str, optional
        If provided, the snapshot ID to create the session on.
    as_of: datetime.datetime, optional
        When combined with the branch argument, it will open the session at the last
        snapshot that is at or before this datetime

    Returns
    -------
    Session
        The read-only session, pointing to the specified snapshot, tag, or branch.

    Notes
    -----
    Only one of the arguments can be specified.
    """
    return Session(
        await self._repository.readonly_session_async(
            branch=branch, tag=tag, snapshot_id=snapshot_id, as_of=as_of
        )
    )

rearrange_session #

rearrange_session(branch)

Create a session to move/rename nodes in the Zarr hierarchy.

Like the read-only session, this can be thought of as a checkout of the repository at the tip of the branch. However, this session is writable and can be used to make changes to the repository. When ready, the changes can be committed to the branch, after which the session will become a read-only session on the new snapshot.

This session only allows making changes through Session.move. If you want to modify data, and not just move nodes, use Repository.writable_session instead.

Parameters:

Name Type Description Default
branch str

The branch to create the session on.

required

Returns:

Type Description
Session

The writable session on the branch.

Source code in icechunk-python/python/icechunk/repository.py
def rearrange_session(self, branch: str) -> Session:
    """
    Create a session to move/rename nodes in the Zarr hierarchy.

    Like the read-only session, this can be thought of as a checkout of the repository at the
    tip of the branch. However, this session is writable and can be used to make changes to the
    repository. When ready, the changes can be committed to the branch, after which the session will
    become a read-only session on the new snapshot.

    This session only allows making changes through `Session.move`. If you want to modify data, and
    not just move nodes, use `Repository.writable_session` instead.

    Parameters
    ----------
    branch : str
        The branch to create the session on.

    Returns
    -------
    Session
        The writable session on the branch.
    """
    return Session(self._repository.rearrange_session(branch))

rearrange_session_async async #

rearrange_session_async(branch)

Create a session to move/rename nodes in the Zarr hierarchy (async version).

Like the read-only session, this can be thought of as a checkout of the repository at the tip of the branch. However, this session is writable and can be used to make changes to the repository. When ready, the changes can be committed to the branch, after which the session will become a read-only session on the new snapshot.

This session only allows making changes through Session.move. If you want to modify data, and not just move nodes, use Repository.writable_session instead.

Parameters:

Name Type Description Default
branch str

The branch to create the session on.

required

Returns:

Type Description
Session

The writable session on the branch.

Source code in icechunk-python/python/icechunk/repository.py
async def rearrange_session_async(self, branch: str) -> Session:
    """
    Create a session to move/rename nodes in the Zarr hierarchy (async version).

    Like the read-only session, this can be thought of as a checkout of the repository at the
    tip of the branch. However, this session is writable and can be used to make changes to the
    repository. When ready, the changes can be committed to the branch, after which the session will
    become a read-only session on the new snapshot.

    This session only allows making changes through `Session.move`. If you want to modify data, and
    not just move nodes, use `Repository.writable_session` instead.

    Parameters
    ----------
    branch : str
        The branch to create the session on.

    Returns
    -------
    Session
        The writable session on the branch.
    """
    return Session(await self._repository.rearrange_session_async(branch))

reopen #

reopen(config=None, authorize_virtual_chunk_access=None)

Reopen the repository with new configuration or credentials.

Parameters:

Name Type Description Default
config RepositoryConfig

The new repository configuration. If not provided, uses the existing configuration.

None
authorize_virtual_chunk_access dict[str, AnyCredential | None]

New virtual chunk access credentials.

None

Returns:

Type Description
Self

A new Repository instance with the updated configuration.

Source code in icechunk-python/python/icechunk/repository.py
def reopen(
    self,
    config: RepositoryConfig | None = None,
    authorize_virtual_chunk_access: dict[str, AnyCredential | None] | None = None,
) -> Self:
    """
    Reopen the repository with new configuration or credentials.

    Parameters
    ----------
    config : RepositoryConfig, optional
        The new repository configuration. If not provided, uses the existing configuration.
    authorize_virtual_chunk_access : dict[str, AnyCredential | None], optional
        New virtual chunk access credentials.

    Returns
    -------
    Self
        A new Repository instance with the updated configuration.
    """
    return self.__class__(
        self._repository.reopen(
            config=config,
            authorize_virtual_chunk_access=authorize_virtual_chunk_access,
        )
    )

reopen_async async #

reopen_async(config=None, authorize_virtual_chunk_access=None)

Reopen the repository with new configuration or credentials (async version).

Parameters:

Name Type Description Default
config RepositoryConfig

The new repository configuration. If not provided, uses the existing configuration.

None
authorize_virtual_chunk_access dict[str, AnyCredential | None]

New virtual chunk access credentials.

None

Returns:

Type Description
Self

A new Repository instance with the updated configuration.

Source code in icechunk-python/python/icechunk/repository.py
async def reopen_async(
    self,
    config: RepositoryConfig | None = None,
    authorize_virtual_chunk_access: dict[str, AnyCredential | None] | None = None,
) -> Self:
    """
    Reopen the repository with new configuration or credentials (async version).

    Parameters
    ----------
    config : RepositoryConfig, optional
        The new repository configuration. If not provided, uses the existing configuration.
    authorize_virtual_chunk_access : dict[str, AnyCredential | None], optional
        New virtual chunk access credentials.

    Returns
    -------
    Self
        A new Repository instance with the updated configuration.
    """
    return self.__class__(
        await self._repository.reopen_async(
            config=config,
            authorize_virtual_chunk_access=authorize_virtual_chunk_access,
        )
    )

reset_branch #

reset_branch(branch, snapshot_id, *, from_snapshot_id=None)

Reset a branch to a specific snapshot.

This will permanently alter the history of the branch such that the tip of the branch is the specified snapshot.

Parameters:

Name Type Description Default
branch str

The branch to reset.

required
snapshot_id str

The snapshot ID to reset the branch to.

required
from_snapshot_id str | None

If passed, the reset will only be executed if the branch currently points to from_snapshot_id.

None

Returns:

Type Description
None
Source code in icechunk-python/python/icechunk/repository.py
def reset_branch(
    self, branch: str, snapshot_id: str, *, from_snapshot_id: str | None = None
) -> None:
    """
    Reset a branch to a specific snapshot.

    This will permanently alter the history of the branch such that the tip of
    the branch is the specified snapshot.

    Parameters
    ----------
    branch : str
        The branch to reset.
    snapshot_id : str
        The snapshot ID to reset the branch to.
    from_snapshot_id : str | None
        If passed, the reset will only be executed if the branch currently
        points to from_snapshot_id.

    Returns
    -------
    None
    """
    self._repository.reset_branch(branch, snapshot_id, from_snapshot_id)
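
For example, a sketch that rolls main back one commit, using from_snapshot_id to guard against a concurrent update; it assumes the branch has at least two snapshots.

history = list(repo.ancestry(branch="main"))  # newest snapshot first
repo.reset_branch(
    "main",
    history[1].id,                   # the parent of the current tip
    from_snapshot_id=history[0].id,  # only reset if the tip hasn't moved
)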

reset_branch_async async #

reset_branch_async(branch, snapshot_id, *, from_snapshot_id=None)

Reset a branch to a specific snapshot (async version).

This will permanently alter the history of the branch such that the tip of the branch is the specified snapshot.

Parameters:

Name Type Description Default
branch str

The branch to reset.

required
snapshot_id str

The snapshot ID to reset the branch to.

required
from_snapshot_id str | None

If passed, the reset will only be executed if the branch currently points to from_snapshot_id.

None

Returns:

Type Description
None
Source code in icechunk-python/python/icechunk/repository.py
async def reset_branch_async(
    self, branch: str, snapshot_id: str, *, from_snapshot_id: str | None = None
) -> None:
    """
    Reset a branch to a specific snapshot (async version).

    This will permanently alter the history of the branch such that the tip of
    the branch is the specified snapshot.

    Parameters
    ----------
    branch : str
        The branch to reset.
    snapshot_id : str
        The snapshot ID to reset the branch to.
    from_snapshot_id : str | None
        If passed, the reset will only be executed if the branch currently
        points to from_snapshot_id.

    Returns
    -------
    None
    """
    await self._repository.reset_branch_async(branch, snapshot_id, from_snapshot_id)

rewrite_manifests #

rewrite_manifests(message, *, branch, metadata=None)

Rewrite manifests for all arrays.

This method will start a new writable session on the specified branch, rewrite manifests for all arrays, and then commit with the specified message and metadata.

A JSON representation of the currently active splitting configuration will be stored in the commit's metadata under the key "splitting_config".

Parameters:

Name Type Description Default
message str

The message to write with the commit.

required
branch str

The branch to commit to.

required
metadata dict[str, Any] | None

Additional metadata to store with the commit snapshot.

None

Returns:

Type Description
str

The snapshot ID of the new commit.

Source code in icechunk-python/python/icechunk/repository.py
def rewrite_manifests(
    self, message: str, *, branch: str, metadata: dict[str, Any] | None = None
) -> str:
    """
    Rewrite manifests for all arrays.

    This method will start a new writable session on the specified branch,
    rewrite manifests for all arrays, and then commit with the specified ``message``
    and ``metadata``.

    A JSON representation of the currently active splitting configuration will be
    stored in the commit's metadata under the key `"splitting_config"`.

    Parameters
    ----------
    message : str
        The message to write with the commit.
    branch: str
        The branch to commit to.
    metadata : dict[str, Any] | None, optional
        Additional metadata to store with the commit snapshot.

    Returns
    -------
    str
        The snapshot ID of the new commit.

    """
    return self._repository.rewrite_manifests(
        message, branch=branch, metadata=metadata
    )
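
For example, a minimal sketch; the commit message is arbitrary, and the splitting configuration in effect comes from the repository config.

new_snapshot = repo.rewrite_manifests(
    "consolidate manifests under current splitting config",
    branch="main",
)
print(new_snapshot)  # snapshot ID of the rewrite commit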

rewrite_manifests_async async #

rewrite_manifests_async(message, *, branch, metadata=None)

Rewrite manifests for all arrays (async version).

This method will start a new writable session on the specified branch, rewrite manifests for all arrays, and then commit with the specified message and metadata.

A JSON representation of the currently active splitting configuration will be stored in the commit's metadata under the key "splitting_config".

Parameters:

Name Type Description Default
message str

The message to write with the commit.

required
branch str

The branch to commit to.

required
metadata dict[str, Any] | None

Additional metadata to store with the commit snapshot.

None

Returns:

Type Description
str

The snapshot ID of the new commit.

Source code in icechunk-python/python/icechunk/repository.py
async def rewrite_manifests_async(
    self, message: str, *, branch: str, metadata: dict[str, Any] | None = None
) -> str:
    """
    Rewrite manifests for all arrays (async version).

    This method will start a new writable session on the specified branch,
    rewrite manifests for all arrays, and then commit with the specified ``message``
    and ``metadata``.

    A JSON representation of the currently active splitting configuration will be
    stored in the commit's metadata under the key `"splitting_config"`.

    Parameters
    ----------
    message : str
        The message to write with the commit.
    branch: str
        The branch to commit to.
    metadata : dict[str, Any] | None, optional
        Additional metadata to store with the commit snapshot.

    Returns
    -------
    str
        The snapshot ID of the new commit.

    """
    return await self._repository.rewrite_manifests_async(
        message, branch=branch, metadata=metadata
    )

save_config #

save_config()

Save the repository configuration to storage; this configuration will be used in future calls to Repository.open.

Returns:

Type Description
None
Source code in icechunk-python/python/icechunk/repository.py
def save_config(self) -> None:
    """
    Save the repository configuration to storage; this configuration will be used in future calls to Repository.open.

    Returns
    -------
    None
    """
    return self._repository.save_config()

save_config_async async #

save_config_async()

Save the repository configuration to storage (async version).

Returns:

Type Description
None
Source code in icechunk-python/python/icechunk/repository.py
async def save_config_async(self) -> None:
    """
    Save the repository configuration to storage (async version).

    Returns
    -------
    None
    """
    return await self._repository.save_config_async()

set_default_commit_metadata #

set_default_commit_metadata(metadata)

Set the default commit metadata for the repository. This is useful for providing additional static system context metadata to all commits.

When a commit is made, this default metadata will be merged with the metadata provided at commit time, with any duplicate keys overwritten by the commit-time values.

Warning

This metadata is only applied to sessions that are created after this call. Any open writable sessions will not be affected and will not use the new default metadata.

Parameters:

Name Type Description Default
metadata dict[str, Any]

The default commit metadata. Pass an empty dict to clear the default metadata.

required
Source code in icechunk-python/python/icechunk/repository.py
def set_default_commit_metadata(self, metadata: dict[str, Any]) -> None:
    """
    Set the default commit metadata for the repository. This is useful for providing
    additional static system context metadata to all commits.

    When a commit is made, this default metadata will be merged with the metadata provided
    at commit time, with any duplicate keys overwritten by the commit-time values.

    !!! warning
        This metadata is only applied to sessions that are created after this call. Any open
        writable sessions will not be affected and will not use the new default metadata.

    Parameters
    ----------
    metadata : dict[str, Any]
        The default commit metadata. Pass an empty dict to clear the default metadata.
    """
    return self._repository.set_default_commit_metadata(metadata)
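
Example (a minimal sketch; assumes `repo` is a Repository with a "main" branch):

repo.set_default_commit_metadata({"pipeline": "nightly", "author": "cron"})

# Sessions created after the call pick up the defaults.
session = repo.writable_session("main")
# ... write data through session.store ...
# Duplicate keys passed at commit time win over the defaults:
session.commit("nightly update", metadata={"author": "alice"})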

set_metadata #

set_metadata(metadata)

Set the repository metadata; the passed dict will replace the complete metadata.

To update only some metadata values, use Repository.update_metadata instead.

Parameters:

Name Type Description Default
metadata dict[str, Any]

The value to use as repository metadata.

required
Source code in icechunk-python/python/icechunk/repository.py
def set_metadata(self, metadata: dict[str, Any]) -> None:
    """
    Set the repository metadata; the passed dict will replace the complete metadata.

    To update only some metadata values, use Repository.update_metadata instead.

    Parameters
    ----------
    metadata : dict[str, Any]
        The value to use as repository metadata.
    """
    self._repository.set_metadata(metadata)

set_metadata_async async #

set_metadata_async(metadata)

Set the repository metadata; the passed dict will replace the complete metadata.

To update only some metadata values, use Repository.update_metadata instead.

Parameters:

Name Type Description Default
metadata dict[str, Any]

The value to use as repository metadata.

required
Source code in icechunk-python/python/icechunk/repository.py
async def set_metadata_async(self, metadata: dict[str, Any]) -> None:
    """
    Set the repository metadata; the passed dict will replace the complete metadata.

    To update only some metadata values, use Repository.update_metadata instead.

    Parameters
    ----------
    metadata : dict[str, Any]
        The value to use as repository metadata.
    """
    await self._repository.set_metadata_async(metadata)

total_chunks_storage #

total_chunks_storage(*, max_snapshots_in_memory=50, max_compressed_manifest_mem_bytes=512 * 1024 * 1024, max_concurrent_manifest_fetches=500)

Calculate the total storage used for chunks, in bytes.

It reports the storage needed to store all snapshots in the repository that are reachable from any branch or tag. Unreachable snapshots can be generated by using reset_branch or expire_snapshots. The chunks for these snapshots are not included in the result, and they should probably be deleted using garbage_collection.

The result includes only native chunks; virtual and inline chunks are not counted.

Parameters:

Name Type Description Default
max_snapshots_in_memory int

Don't prefetch more than this many snapshots into memory.

50
max_compressed_manifest_mem_bytes int

Don't use more than this memory to store compressed in-flight manifests.

512 * 1024 * 1024
max_concurrent_manifest_fetches int

Don't run more than this many concurrent manifest fetches.

500
Source code in icechunk-python/python/icechunk/repository.py
def total_chunks_storage(
    self,
    *,
    max_snapshots_in_memory: int = 50,
    max_compressed_manifest_mem_bytes: int = 512 * 1024 * 1024,
    max_concurrent_manifest_fetches: int = 500,
) -> int:
    """Calculate the total storage used for chunks, in bytes.

    It reports the storage needed to store all snapshots in the repository that
    are reachable from any branch or tag. Unreachable snapshots can be generated
    by using `reset_branch` or `expire_snapshots`. The chunks for these snapshots
    are not included in the result, and they should probably be deleted using
    `garbage_collection`.

    The result includes only native chunks; virtual and inline chunks are not counted.

    Parameters
    ----------
    max_snapshots_in_memory: int
        Don't prefetch more than this many snapshots into memory.
    max_compressed_manifest_mem_bytes : int
        Don't use more than this memory to store compressed in-flight manifests.
    max_concurrent_manifest_fetches : int
        Don't run more than this many concurrent manifest fetches.
    """

    warnings.warn(
        "The ``total_chunks_storage`` method has been deprecated in favour of the ``chunk_storage_stats`` method. "
        "The new method is superior, as it actually calculates storage size occupied by inlined and virtual chunks in addition to native chunks. "
        "You can still access just the total native bytes: to keep your existing behaviour using API that will not be removed in a future version, "
        "please replace your existing ``.total_chunks_storage(**kwargs)`` method call with ``.chunk_storage_stats(**same_kwargs).native_bytes``.",
        DeprecationWarning,
        stacklevel=2,
    )

    stats = self._repository.chunk_storage_stats(
        max_snapshots_in_memory=max_snapshots_in_memory,
        max_compressed_manifest_mem_bytes=max_compressed_manifest_mem_bytes,
        max_concurrent_manifest_fetches=max_concurrent_manifest_fetches,
    )
    return stats.native_bytes
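
Example (a migration sketch; assumes `repo` is an existing Repository):

stats = repo.chunk_storage_stats()  # accepts the same keyword arguments
print(stats.native_bytes)           # the value total_chunks_storage() returned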

total_chunks_storage_async async #

total_chunks_storage_async(*, max_snapshots_in_memory=50, max_compressed_manifest_mem_bytes=512 * 1024 * 1024, max_concurrent_manifest_fetches=500)

Calculate the total storage used for chunks, in bytes (async version).

It reports the storage needed to store all snapshots in the repository that are reachable from any branch or tag. Unreachable snapshots can be generated by using reset_branch or expire_snapshots. The chunks for these snapshots are not included in the result, and they should probably be deleted using garbage_collection.

The result includes only native chunks; virtual and inline chunks are not counted.

Parameters:

Name Type Description Default
max_snapshots_in_memory int

Don't prefetch more than this many snapshots into memory.

50
max_compressed_manifest_mem_bytes int

Don't use more than this memory to store compressed in-flight manifests.

512 * 1024 * 1024
max_concurrent_manifest_fetches int

Don't run more than this many concurrent manifest fetches.

500
Source code in icechunk-python/python/icechunk/repository.py
async def total_chunks_storage_async(
    self,
    *,
    max_snapshots_in_memory: int = 50,
    max_compressed_manifest_mem_bytes: int = 512 * 1024 * 1024,
    max_concurrent_manifest_fetches: int = 500,
) -> int:
    """Calculate the total storage used for chunks, in bytes (async version).

    It reports the storage needed to store all snapshots in the repository that
    are reachable from any branch or tag. Unreachable snapshots can be generated
    by using `reset_branch` or `expire_snapshots`. The chunks for these snapshots
    are not included in the result, and they should probably be deleted using
    `garbage_collection`.

    The result includes only native chunks; virtual and inline chunks are not counted.

    Parameters
    ----------
    max_snapshots_in_memory: int
        Don't prefetch more than this many snapshots into memory.
    max_compressed_manifest_mem_bytes : int
        Don't use more than this memory to store compressed in-flight manifests.
    max_concurrent_manifest_fetches : int
        Don't run more than this many concurrent manifest fetches.
    """

    warnings.warn(
        "The ``total_chunks_storage_async`` method has been deprecated in favour of the ``chunk_storage_stats_async`` method. "
        "The new method is superior, as it actually calculates storage size occupied by inlined and virtual chunks in addition to native chunks. "
        "You can still access just the total native bytes: to keep your existing behaviour using API that will not be removed in a future version, "
        "please replace your existing ``.total_chunks_storage_async(**kwargs)`` method call with ``.chunk_storage_stats_async(**same_kwargs).native_bytes``.",
        DeprecationWarning,
        stacklevel=2,
    )

    stats = await self._repository.chunk_storage_stats_async(
        max_snapshots_in_memory=max_snapshots_in_memory,
        max_compressed_manifest_mem_bytes=max_compressed_manifest_mem_bytes,
        max_concurrent_manifest_fetches=max_concurrent_manifest_fetches,
    )
    return stats.native_bytes

transaction #

transaction(branch, *, message, metadata=None, rebase_with=None, rebase_tries=1000)

Create a transaction on a branch.

This is a context manager that creates a writable session on the specified branch. When the context is exited, the session will be committed to the branch using the specified message.

Parameters:

Name Type Description Default
branch str

The branch to create the transaction on.

required
message str

The commit message to use when committing the session.

required
metadata dict[str, Any] | None

Additional metadata to store with the commit snapshot.

None
rebase_with ConflictSolver | None

If another session committed while the current session was writing, use Session.rebase with this solver.

None
rebase_tries int

If another session committed while the current session was writing, use Session.rebase up to this many times in a loop.

1000

Yields:

Name Type Description
store IcechunkStore

A Zarr Store which can be used to interact with the data in the repository.

Source code in icechunk-python/python/icechunk/repository.py
@contextmanager
def transaction(
    self,
    branch: str,
    *,
    message: str,
    metadata: dict[str, Any] | None = None,
    rebase_with: ConflictSolver | None = None,
    rebase_tries: int = 1_000,
) -> Iterator[IcechunkStore]:
    """
    Create a transaction on a branch.

    This is a context manager that creates a writable session on the specified branch.
    When the context is exited, the session will be committed to the branch
    using the specified message.

    Parameters
    ----------
    branch : str
        The branch to create the transaction on.
    message : str
        The commit message to use when committing the session.
    metadata : dict[str, Any] | None, optional
        Additional metadata to store with the commit snapshot.
    rebase_with : ConflictSolver | None, optional
        If another session committed while the current session was writing, use Session.rebase with this solver.
    rebase_tries : int, optional
        If another session committed while the current session was writing, use Session.rebase up to this many times in a loop.

    Yields
    -------
    store : IcechunkStore
        A Zarr Store which can be used to interact with the data in the repository.
    """
    session = self.writable_session(branch)
    yield session.store
    session.commit(
        message=message,
        metadata=metadata,
        rebase_with=rebase_with,
        rebase_tries=rebase_tries,
    )
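
Example (a minimal sketch; assumes `repo` is a Repository with a "main" branch and zarr v3 is installed):

import zarr

with repo.transaction("main", message="add temperature array") as store:
    root = zarr.group(store=store)
    root.create_array("temperature", shape=(100,), dtype="float32")
# Leaving the block commits the session to "main" with the given message.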

update_metadata #

update_metadata(metadata)

Update the repository metadata.

The passed dict will be merged with the current metadata, overriding existing keys.

Parameters:

Name Type Description Default
metadata dict[str, Any]

The dict to merge into the repository metadata.

required
Source code in icechunk-python/python/icechunk/repository.py
def update_metadata(self, metadata: dict[str, Any]) -> dict[str, Any]:
    """
    Update the repository metadata.

    The passed dict will be merged with the current metadata, overriding existing keys.

    Parameters
    ----------
    metadata : dict[str, Any]
        The dict to merge into the repository metadata.
    """
    return self._repository.update_metadata(metadata)
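
Example (a minimal sketch contrasting the two metadata APIs; assumes `repo` is an existing Repository):

repo.set_metadata({"project": "climate", "version": 1})  # replaces everything
repo.update_metadata({"version": 2})                     # merges; "project" is kept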

update_metadata_async async #

update_metadata_async(metadata)

Update the repository metadata.

The passed dict will be merged with the current metadata, overriding existing keys.

Parameters:

Name Type Description Default
metadata dict[str, Any]

The dict to merge into the repository metadata.

required
Source code in icechunk-python/python/icechunk/repository.py
async def update_metadata_async(self, metadata: dict[str, Any]) -> dict[str, Any]:
    """
    Update the repository metadata.

    The passed dict will be merged with the current metadata, overriding existing keys.

    Parameters
    ----------
    metadata : dict[str, Any]
        The dict to merge into the repository metadata.
    """
    return await self._repository.update_metadata_async(metadata)

writable_session #

writable_session(branch)

Create a writable session on a branch.

Like the read-only session, this can be thought of as a checkout of the repository at the tip of the branch. However, this session is writable and can be used to make changes to the repository. When ready, the changes can be committed to the branch, after which the session will become a read-only session on the new snapshot.

Parameters:

Name Type Description Default
branch str

The branch to create the session on.

required

Returns:

Type Description
Session

The writable session on the branch.

Source code in icechunk-python/python/icechunk/repository.py
def writable_session(self, branch: str) -> Session:
    """
    Create a writable session on a branch.

    Like the read-only session, this can be thought of as a checkout of the repository at the
    tip of the branch. However, this session is writable and can be used to make changes to the
    repository. When ready, the changes can be committed to the branch, after which the session will
    become a read-only session on the new snapshot.

    Parameters
    ----------
    branch : str
        The branch to create the session on.

    Returns
    -------
    Session
        The writable session on the branch.
    """
    return Session(self._repository.writable_session(branch))
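
Example (a minimal sketch; assumes `repo` is a Repository with a "main" branch and zarr v3 is installed):

import zarr

session = repo.writable_session("main")
root = zarr.group(store=session.store)
root.attrs["updated"] = True
snapshot_id = session.commit("update root attributes")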

writable_session_async async #

writable_session_async(branch)

Create a writable session on a branch (async version).

Like the read-only session, this can be thought of as a checkout of the repository at the tip of the branch. However, this session is writable and can be used to make changes to the repository. When ready, the changes can be committed to the branch, after which the session will become a read-only session on the new snapshot.

Parameters:

Name Type Description Default
branch str

The branch to create the session on.

required

Returns:

Type Description
Session

The writable session on the branch.

Source code in icechunk-python/python/icechunk/repository.py
async def writable_session_async(self, branch: str) -> Session:
    """
    Create a writable session on a branch (async version).

    Like the read-only session, this can be thought of as a checkout of the repository at the
    tip of the branch. However, this session is writable and can be used to make changes to the
    repository. When ready, the changes can be committed to the branch, after which the session will
    become a read-only session on the new snapshot.

    Parameters
    ----------
    branch : str
        The branch to create the session on.

    Returns
    -------
    Session
        The writable session on the branch.
    """
    return Session(await self._repository.writable_session_async(branch))

RepositoryConfig #

Configuration for an Icechunk repository

Methods:

Name Description
__init__

Create a new RepositoryConfig object

clear_virtual_chunk_containers

Clear all virtual chunk containers from the repository.

default

Create a default repository config instance

get_virtual_chunk_container

Get the virtual chunk container for the repository associated with the given name.

merge

Merge another RepositoryConfig with this one.

set_virtual_chunk_container

Set the virtual chunk container for the repository.

Attributes:

Name Type Description
caching CachingConfig | None

The caching configuration for the repository.

compression CompressionConfig | None

The compression configuration for the repository.

get_partial_values_concurrency int | None

The number of concurrent requests to make when getting partial values from storage.

inline_chunk_threshold_bytes int | None

The maximum size of a chunk that will be stored inline in the repository. Chunks larger than this size will be written to storage.

manifest ManifestConfig | None

The manifest configuration for the repository.

max_concurrent_requests int | None

The maximum number of concurrent HTTP requests Icechunk will do for this repo.

storage StorageSettings | None

The storage configuration for the repository.

virtual_chunk_containers dict[str, VirtualChunkContainer] | None

The virtual chunk containers for the repository.

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class RepositoryConfig:
    """Configuration for an Icechunk repository"""

    def __init__(
        self,
        inline_chunk_threshold_bytes: int | None = None,
        get_partial_values_concurrency: int | None = None,
        compression: CompressionConfig | None = None,
        max_concurrent_requests: int | None = None,
        caching: CachingConfig | None = None,
        storage: StorageSettings | None = None,
        virtual_chunk_containers: dict[str, VirtualChunkContainer] | None = None,
        manifest: ManifestConfig | None = None,
    ) -> None:
        """
        Create a new `RepositoryConfig` object

        Parameters
        ----------
        inline_chunk_threshold_bytes: int | None
            The maximum size of a chunk that will be stored inline in the repository.
        get_partial_values_concurrency: int | None
            The number of concurrent requests to make when getting partial values from storage.
        compression: CompressionConfig | None
            The compression configuration for the repository.
        max_concurrent_requests: int | None
            The maximum number of concurrent HTTP requests Icechunk will do for this repo.
            Default is 256.
        caching: CachingConfig | None
            The caching configuration for the repository.
        storage: StorageSettings | None
            The storage configuration for the repository.
        virtual_chunk_containers: dict[str, VirtualChunkContainer] | None
            The virtual chunk containers for the repository.
        manifest: ManifestConfig | None
            The manifest configuration for the repository.
        """
        ...
    @staticmethod
    def default() -> RepositoryConfig:
        """Create a default repository config instance"""
        ...
    @property
    def inline_chunk_threshold_bytes(self) -> int | None:
        """
        The maximum size of a chunk that will be stored inline in the repository. Chunks larger than this size will be written to storage.
        """
        ...
    @inline_chunk_threshold_bytes.setter
    def inline_chunk_threshold_bytes(self, value: int | None) -> None:
        """
        Set the maximum size of a chunk that will be stored inline in the repository. Chunks larger than this size will be written to storage.
        """
        ...
    @property
    def get_partial_values_concurrency(self) -> int | None:
        """
        The number of concurrent requests to make when getting partial values from storage.

        Returns
        -------
        int | None
            The number of concurrent requests to make when getting partial values from storage.
        """
        ...
    @get_partial_values_concurrency.setter
    def get_partial_values_concurrency(self, value: int | None) -> None:
        """
        Set the number of concurrent requests to make when getting partial values from storage.

        Parameters
        ----------
        value: int | None
            The number of concurrent requests to make when getting partial values from storage.
        """
        ...
    @property
    def compression(self) -> CompressionConfig | None:
        """
        The compression configuration for the repository.

        Returns
        -------
        CompressionConfig | None
            The compression configuration for the repository.
        """
        ...
    @compression.setter
    def compression(self, value: CompressionConfig | None) -> None:
        """
        Set the compression configuration for the repository.

        Parameters
        ----------
        value: CompressionConfig | None
            The compression configuration for the repository.
        """
        ...
    @property
    def max_concurrent_requests(self) -> int | None:
        """
        The maximum number of concurrent HTTP requests Icechunk will do for this repo.

        Returns
        -------
        int | None
            The maximum number of concurrent HTTP requests Icechunk will do for this repo.
        """
        ...
    @max_concurrent_requests.setter
    def max_concurrent_requests(self, value: int | None) -> None:
        """
        Set the maximum number of concurrent HTTP requests Icechunk should do for this repo.

        Parameters
        ----------
        value: int | None
            The maximum allowed.
        """
        ...
    @property
    def caching(self) -> CachingConfig | None:
        """
        The caching configuration for the repository.

        Returns
        -------
        CachingConfig | None
            The caching configuration for the repository.
        """
        ...
    @caching.setter
    def caching(self, value: CachingConfig | None) -> None:
        """
        Set the caching configuration for the repository.

        Parameters
        ----------
        value: CachingConfig | None
            The caching configuration for the repository.
        """
        ...
    @property
    def storage(self) -> StorageSettings | None:
        """
        The storage configuration for the repository.

        Returns
        -------
        StorageSettings | None
            The storage configuration for the repository.
        """
        ...
    @storage.setter
    def storage(self, value: StorageSettings | None) -> None:
        """
        Set the storage configuration for the repository.

        Parameters
        ----------
        value: StorageSettings | None
            The storage configuration for the repository.
        """
        ...
    @property
    def manifest(self) -> ManifestConfig | None:
        """
        The manifest configuration for the repository.

        Returns
        -------
        ManifestConfig | None
            The manifest configuration for the repository.
        """
        ...
    @manifest.setter
    def manifest(self, value: ManifestConfig | None) -> None:
        """
        Set the manifest configuration for the repository.

        Parameters
        ----------
        value: ManifestConfig | None
            The manifest configuration for the repository.
        """
        ...
    @property
    def virtual_chunk_containers(self) -> dict[str, VirtualChunkContainer] | None:
        """
        The virtual chunk containers for the repository.

        Returns
        -------
        dict[str, VirtualChunkContainer] | None
            The virtual chunk containers for the repository.
        """
        ...
    def get_virtual_chunk_container(self, name: str) -> VirtualChunkContainer | None:
        """
        Get the virtual chunk container for the repository associated with the given name.

        Parameters
        ----------
        name: str
            The name of the virtual chunk container to get.

        Returns
        -------
        VirtualChunkContainer | None
            The virtual chunk container for the repository associated with the given name.
        """
        ...
    def set_virtual_chunk_container(self, cont: VirtualChunkContainer) -> None:
        """
        Set the virtual chunk container for the repository.

        Parameters
        ----------
        cont: VirtualChunkContainer
            The virtual chunk container to set.
        """
        ...
    def clear_virtual_chunk_containers(self) -> None:
        """
        Clear all virtual chunk containers from the repository.
        """
        ...
    def merge(self, other: RepositoryConfig) -> RepositoryConfig:
        """
        Merge another RepositoryConfig with this one.

        When merging, values from the other config take precedence. For nested configs
        (compression, caching, manifest, storage), the merge is applied recursively.
        For virtual_chunk_containers, entries from the other config extend this one.

        Parameters
        ----------
        other: RepositoryConfig
            The configuration to merge with this one.

        Returns
        -------
        RepositoryConfig
            A new merged configuration.
        """
        ...

caching property writable #

caching

The caching configuration for the repository.

Returns:

Type Description
CachingConfig | None

The caching configuration for the repository.

compression property writable #

compression

The compression configuration for the repository.

Returns:

Type Description
CompressionConfig | None

The compression configuration for the repository.

get_partial_values_concurrency property writable #

get_partial_values_concurrency

The number of concurrent requests to make when getting partial values from storage.

Returns:

Type Description
int | None

The number of concurrent requests to make when getting partial values from storage.

inline_chunk_threshold_bytes property writable #

inline_chunk_threshold_bytes

The maximum size of a chunk that will be stored inline in the repository. Chunks larger than this size will be written to storage.

manifest property writable #

manifest

The manifest configuration for the repository.

Returns:

Type Description
ManifestConfig | None

The manifest configuration for the repository.

max_concurrent_requests property writable #

max_concurrent_requests

The maximum number of concurrent HTTP requests Icechunk will do for this repo.

Returns:

Type Description
int | None

The maximum number of concurrent HTTP requests Icechunk will do for this repo.

storage property writable #

storage

The storage configuration for the repository.

Returns:

Type Description
StorageSettings | None

The storage configuration for the repository.

virtual_chunk_containers property #

virtual_chunk_containers

The virtual chunk containers for the repository.

Returns:

Type Description
dict[str, VirtualChunkContainer] | None

The virtual chunk containers for the repository.

__init__ #

__init__(inline_chunk_threshold_bytes=None, get_partial_values_concurrency=None, compression=None, max_concurrent_requests=None, caching=None, storage=None, virtual_chunk_containers=None, manifest=None)

Create a new RepositoryConfig object

Parameters:

Name Type Description Default
inline_chunk_threshold_bytes int | None

The maximum size of a chunk that will be stored inline in the repository.

None
get_partial_values_concurrency int | None

The number of concurrent requests to make when getting partial values from storage.

None
compression CompressionConfig | None

The compression configuration for the repository.

None
max_concurrent_requests int | None

The maximum number of concurrent HTTP requests Icechunk will do for this repo. Default is 256.

None
caching CachingConfig | None

The caching configuration for the repository.

None
storage StorageSettings | None

The storage configuration for the repository.

None
virtual_chunk_containers dict[str, VirtualChunkContainer] | None

The virtual chunk containers for the repository.

None
manifest ManifestConfig | None

The manifest configuration for the repository.

None
Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
def __init__(
    self,
    inline_chunk_threshold_bytes: int | None = None,
    get_partial_values_concurrency: int | None = None,
    compression: CompressionConfig | None = None,
    max_concurrent_requests: int | None = None,
    caching: CachingConfig | None = None,
    storage: StorageSettings | None = None,
    virtual_chunk_containers: dict[str, VirtualChunkContainer] | None = None,
    manifest: ManifestConfig | None = None,
) -> None:
    """
    Create a new `RepositoryConfig` object

    Parameters
    ----------
    inline_chunk_threshold_bytes: int | None
        The maximum size of a chunk that will be stored inline in the repository.
    get_partial_values_concurrency: int | None
        The number of concurrent requests to make when getting partial values from storage.
    compression: CompressionConfig | None
        The compression configuration for the repository.
    max_concurrent_requests: int | None
        The maximum number of concurrent HTTP requests Icechunk will do for this repo.
        Default is 256.
    caching: CachingConfig | None
        The caching configuration for the repository.
    storage: StorageSettings | None
        The storage configuration for the repository.
    virtual_chunk_containers: dict[str, VirtualChunkContainer] | None
        The virtual chunk containers for the repository.
    manifest: ManifestConfig | None
        The manifest configuration for the repository.
    """
    ...
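
Example (a minimal sketch; the values shown are illustrative, not recommendations):

import icechunk

config = icechunk.RepositoryConfig(
    inline_chunk_threshold_bytes=512,  # inline chunks of up to 512 bytes
    max_concurrent_requests=128,
)
config.get_partial_values_concurrency = 10  # attributes are writable properties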

clear_virtual_chunk_containers #

clear_virtual_chunk_containers()

Clear all virtual chunk containers from the repository.

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
def clear_virtual_chunk_containers(self) -> None:
    """
    Clear all virtual chunk containers from the repository.
    """
    ...

default staticmethod #

default()

Create a default repository config instance

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
@staticmethod
def default() -> RepositoryConfig:
    """Create a default repository config instance"""
    ...

get_virtual_chunk_container #

get_virtual_chunk_container(name)

Get the virtual chunk container for the repository associated with the given name.

Parameters:

Name Type Description Default
name str

The name of the virtual chunk container to get.

required

Returns:

Type Description
VirtualChunkContainer | None

The virtual chunk container for the repository associated with the given name.

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
def get_virtual_chunk_container(self, name: str) -> VirtualChunkContainer | None:
    """
    Get the virtual chunk container for the repository associated with the given name.

    Parameters
    ----------
    name: str
        The name of the virtual chunk container to get.

    Returns
    -------
    VirtualChunkContainer | None
        The virtual chunk container for the repository associated with the given name.
    """
    ...

merge #

merge(other)

Merge another RepositoryConfig with this one.

When merging, values from the other config take precedence. For nested configs (compression, caching, manifest, storage), the merge is applied recursively. For virtual_chunk_containers, entries from the other config extend this one.

Parameters:

Name Type Description Default
other RepositoryConfig

The configuration to merge with this one.

required

Returns:

Type Description
RepositoryConfig

A new merged configuration.

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
def merge(self, other: RepositoryConfig) -> RepositoryConfig:
    """
    Merge another RepositoryConfig with this one.

    When merging, values from the other config take precedence. For nested configs
    (compression, caching, manifest, storage), the merge is applied recursively.
    For virtual_chunk_containers, entries from the other config extend this one.

    Parameters
    ----------
    other: RepositoryConfig
        The configuration to merge with this one.

    Returns
    -------
    RepositoryConfig
        A new merged configuration.
    """
    ...
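
Example (a minimal sketch of the precedence rules):

import icechunk

base = icechunk.RepositoryConfig(inline_chunk_threshold_bytes=512)
override = icechunk.RepositoryConfig(inline_chunk_threshold_bytes=1024)
merged = base.merge(override)
print(merged.inline_chunk_threshold_bytes)  # 1024: values from `override` win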

set_virtual_chunk_container #

set_virtual_chunk_container(cont)

Set the virtual chunk container for the repository.

Parameters:

Name Type Description Default
cont VirtualChunkContainer

The virtual chunk container to set.

required
Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
def set_virtual_chunk_container(self, cont: VirtualChunkContainer) -> None:
    """
    Set the virtual chunk container for the repository.

    Parameters
    ----------
    cont: VirtualChunkContainer
        The virtual chunk container to set.
    """
    ...

S3Credentials #

Credentials for an S3 storage backend

Classes:

Name Description
Anonymous

Does not sign requests, useful for public buckets

FromEnv

Uses credentials from environment variables

Refreshable

Allows for an outside authority to pass in a function that can be used to provide credentials.

Static

Uses S3 credentials without expiration

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class S3Credentials:
    """Credentials for an S3 storage backend"""
    class FromEnv:
        """Uses credentials from environment variables"""
        def __init__(self) -> None: ...

    class Anonymous:
        """Does not sign requests, useful for public buckets"""
        def __init__(self) -> None: ...

    class Static:
        """Uses s3 credentials without expiration

        Parameters
        ----------
        credentials: S3StaticCredentials
            The credentials to use for authentication.
        """
        def __init__(self, credentials: S3StaticCredentials) -> None: ...

    class Refreshable:
        """Allows for an outside authority to pass in a function that can be used to provide credentials.

        This is useful for credentials that have an expiration time, or are otherwise not known ahead of time.

        Parameters
        ----------
        pickled_function: bytes
            The pickled function to use to provide credentials.
        current: S3StaticCredentials
            The initial credentials. They will be returned the first time credentials
            are requested and then deleted.
        """
        def __init__(
            self, pickled_function: bytes, current: S3StaticCredentials | None = None
        ) -> None: ...

Anonymous #

Does not sign requests, useful for public buckets

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class Anonymous:
    """Does not sign requests, useful for public buckets"""
    def __init__(self) -> None: ...

FromEnv #

Uses credentials from environment variables

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class FromEnv:
    """Uses credentials from environment variables"""
    def __init__(self) -> None: ...

Refreshable #

Allows for an outside authority to pass in a function that can be used to provide credentials.

This is useful for credentials that have an expiration time, or are otherwise not known ahead of time.

Parameters:

Name Type Description Default
pickled_function bytes

The pickled function to use to provide credentials.

required
current S3StaticCredentials | None

The initial credentials. They will be returned the first time credentials are requested and then deleted.

None
Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class Refreshable:
    """Allows for an outside authority to pass in a function that can be used to provide credentials.

    This is useful for credentials that have an expiration time, or are otherwise not known ahead of time.

    Parameters
    ----------
    pickled_function: bytes
        The pickled function to use to provide credentials.
    current: S3StaticCredentials
        The initial credentials. They will be returned the first time credentials
        are requested and then deleted.
    """
    def __init__(
        self, pickled_function: bytes, current: S3StaticCredentials | None = None
    ) -> None: ...
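
Example (a hedged sketch; `fetch_credentials` is a hypothetical user-defined function and must be picklable, e.g. defined at module level):

import pickle

import icechunk

def fetch_credentials() -> icechunk.S3StaticCredentials:
    # Call your own credential authority (e.g. STS) here.
    return icechunk.S3StaticCredentials(
        access_key_id="...", secret_access_key="..."  # placeholders
    )

creds = icechunk.S3Credentials.Refreshable(pickle.dumps(fetch_credentials))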

Static #

Uses S3 credentials without expiration

Parameters:

Name Type Description Default
credentials S3StaticCredentials

The credentials to use for authentication.

required
Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class Static:
    """Uses s3 credentials without expiration

    Parameters
    ----------
    credentials: S3StaticCredentials
        The credentials to use for authentication.
    """
    def __init__(self, credentials: S3StaticCredentials) -> None: ...

S3Options #

Options for accessing an S3-compatible storage backend

Methods:

Name Description
__init__

Create a new S3Options object

Attributes:

Name Type Description
allow_http bool

Whether HTTP requests are allowed for the storage backend.

anonymous bool

Whether to use anonymous credentials (unsigned requests).

endpoint_url str | None

Optional endpoint URL for the storage backend.

force_path_style bool

Whether to force path-style bucket addressing.

network_stream_timeout_seconds int | None

Timeout in seconds for idle network streams.

region str | None

Optional region to use for the storage backend.

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class S3Options:
    """Options for accessing an S3-compatible storage backend"""
    def __init__(
        self,
        region: str | None = None,
        endpoint_url: str | None = None,
        allow_http: bool = False,
        anonymous: bool = False,
        force_path_style: bool = False,
        network_stream_timeout_seconds: int | None = None,
        requester_pays: bool = False,
    ) -> None:
        """
        Create a new `S3Options` object

        Parameters
        ----------
        region: str | None
            Optional, the region to use for the storage backend.
        endpoint_url: str | None
            Optional, the endpoint URL to use for the storage backend.
        allow_http: bool
            Whether to allow HTTP requests to the storage backend.
        anonymous: bool
            Whether to use anonymous credentials to the storage backend. When `True`, S3 requests will not be signed.
        force_path_style: bool
            Whether to force use of path-style addressing for buckets.
        network_stream_timeout_seconds: int | None
            Time out requests if no bytes can be transmitted during this period of time.
            If set to 0, the timeout is disabled. Default is 60 seconds.
        requester_pays: bool
            Enable requester pays for S3 buckets
        """

    @property
    def region(self) -> str | None:
        """
        Optional region to use for the storage backend.

        Returns
        -------
        str | None
            The region configured for the storage backend.
        """
        ...

    @region.setter
    def region(self, value: str | None) -> None:
        """
        Set the region to use for the storage backend.

        Parameters
        ----------
        value: str | None
            The region to use for the storage backend.
        """
        ...

    @property
    def endpoint_url(self) -> str | None:
        """
        Optional endpoint URL for the storage backend.

        Returns
        -------
        str | None
            The endpoint URL configured for the storage backend.
        """
        ...

    @endpoint_url.setter
    def endpoint_url(self, value: str | None) -> None:
        """
        Set the endpoint URL for the storage backend.

        Parameters
        ----------
        value: str | None
            The endpoint URL to use for the storage backend.
        """
        ...

    @property
    def allow_http(self) -> bool:
        """
        Whether HTTP requests are allowed for the storage backend.

        Returns
        -------
        bool
            ``True`` when HTTP requests to the storage backend are permitted.
        """
        ...

    @allow_http.setter
    def allow_http(self, value: bool) -> None:
        """
        Set whether HTTP requests are allowed for the storage backend.

        Parameters
        ----------
        value: bool
            ``True`` to allow HTTP requests to the storage backend, ``False`` otherwise.
        """
        ...

    @property
    def anonymous(self) -> bool:
        """
        Whether to use anonymous credentials (unsigned requests).

        Returns
        -------
        bool
            ``True`` when anonymous access is configured.
        """
        ...

    @anonymous.setter
    def anonymous(self, value: bool) -> None:
        """
        Set whether to use anonymous credentials.

        Parameters
        ----------
        value: bool
            ``True`` to perform unsigned requests, ``False`` to sign requests.
        """
        ...

    @property
    def force_path_style(self) -> bool:
        """
        Whether to force path-style bucket addressing.

        Returns
        -------
        bool
            ``True`` when path-style addressing is forced.
        """
        ...

    @force_path_style.setter
    def force_path_style(self, value: bool) -> None:
        """
        Set whether to force path-style bucket addressing.

        Parameters
        ----------
        value: bool
            ``True`` to always use path-style addressing, ``False`` to allow virtual-host style.
        """
        ...

    @property
    def network_stream_timeout_seconds(self) -> int | None:
        """
        Timeout in seconds for idle network streams.

        Returns
        -------
        int | None
            The timeout duration; ``0`` disables the timeout and ``None`` uses the default.
        """
        ...

    @network_stream_timeout_seconds.setter
    def network_stream_timeout_seconds(self, value: int | None) -> None:
        """
        Set the timeout for idle network streams.

        Parameters
        ----------
        value: int | None
            Timeout duration in seconds. Use ``0`` to disable or ``None`` for the default.
        """
        ...

allow_http property writable #

allow_http

Whether HTTP requests are allowed for the storage backend.

Returns:

Type Description
bool

True when HTTP requests to the storage backend are permitted.

anonymous property writable #

anonymous

Whether to use anonymous credentials (unsigned requests).

Returns:

Type Description
bool

True when anonymous access is configured.

endpoint_url property writable #

endpoint_url

Optional endpoint URL for the storage backend.

Returns:

Type Description
str | None

The endpoint URL configured for the storage backend.

force_path_style property writable #

force_path_style

Whether to force path-style bucket addressing.

Returns:

Type Description
bool

True when path-style addressing is forced.

network_stream_timeout_seconds property writable #

network_stream_timeout_seconds

Timeout in seconds for idle network streams.

Returns:

Type Description
int | None

The timeout duration; 0 disables the timeout and None uses the default.

region property writable #

region

Optional region to use for the storage backend.

Returns:

Type Description
str | None

The region configured for the storage backend.

__init__ #

__init__(region=None, endpoint_url=None, allow_http=False, anonymous=False, force_path_style=False, network_stream_timeout_seconds=None, requester_pays=False)

Create a new S3Options object

Parameters:

Name Type Description Default
region str | None

Optional, the region to use for the storage backend.

None
endpoint_url str | None

Optional, the endpoint URL to use for the storage backend.

None
allow_http bool

Whether to allow HTTP requests to the storage backend.

False
anonymous bool

Whether to use anonymous credentials to the storage backend. When True, S3 requests will not be signed.

False
force_path_style bool

Whether to force use of path-style addressing for buckets.

False
network_stream_timeout_seconds int | None

Time out requests if no bytes can be transmitted during this period of time. If set to 0, the timeout is disabled. Default is 60 seconds.

None
requester_pays bool

Enable requester pays for S3 buckets

False
Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
def __init__(
    self,
    region: str | None = None,
    endpoint_url: str | None = None,
    allow_http: bool = False,
    anonymous: bool = False,
    force_path_style: bool = False,
    network_stream_timeout_seconds: int | None = None,
    requester_pays: bool = False,
) -> None:
    """
    Create a new `S3Options` object

    Parameters
    ----------
    region: str | None
        Optional, the region to use for the storage backend.
    endpoint_url: str | None
        Optional, the endpoint URL to use for the storage backend.
    allow_http: bool
        Whether to allow HTTP requests to the storage backend.
    anonymous: bool
        Whether to use anonymous credentials to the storage backend. When `True`, S3 requests will not be signed.
    force_path_style: bool
        Whether to force use of path-style addressing for buckets.
    network_stream_timeout_seconds: int | None
        Time out requests if no bytes can be transmitted during this period of time.
        If set to 0, the timeout is disabled. Default is 60 seconds.
    requester_pays: bool
        Enable requester pays for S3 buckets
    """

S3StaticCredentials #

Credentials for an S3 storage backend

Attributes:

Name Type Description
access_key_id str

The access key ID to use for authentication.

secret_access_key str

The secret access key to use for authentication.

session_token str | None

The session token to use for authentication.

expires_after datetime | None

Optional, the expiration time of the credentials.

Methods:

Name Description
__init__

Create a new S3StaticCredentials object

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class S3StaticCredentials:
    """Credentials for an S3 storage backend

    Attributes:
        access_key_id: str
            The access key ID to use for authentication.
        secret_access_key: str
            The secret access key to use for authentication.
        session_token: str | None
            The session token to use for authentication.
        expires_after: datetime.datetime | None
            Optional, the expiration time of the credentials.
    """

    access_key_id: str
    secret_access_key: str
    session_token: str | None
    expires_after: datetime.datetime | None

    def __init__(
        self,
        access_key_id: str,
        secret_access_key: str,
        session_token: str | None = None,
        expires_after: datetime.datetime | None = None,
    ):
        """
        Create a new `S3StaticCredentials` object

        Parameters
        ----------
        access_key_id: str
            The access key ID to use for authentication.
        secret_access_key: str
            The secret access key to use for authentication.
        session_token: str | None
            Optional, the session token to use for authentication.
        expires_after: datetime.datetime | None
            Optional, the expiration time of the credentials.
        """
        ...

__init__ #

__init__(access_key_id, secret_access_key, session_token=None, expires_after=None)

Create a new S3StaticCredentials object

Parameters:

Name Type Description Default
access_key_id str

The access key ID to use for authentication.

required
secret_access_key str

The secret access key to use for authentication.

required
session_token str | None

Optional, the session token to use for authentication.

None
expires_after datetime | None

Optional, the expiration time of the credentials.

None
Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
def __init__(
    self,
    access_key_id: str,
    secret_access_key: str,
    session_token: str | None = None,
    expires_after: datetime.datetime | None = None,
):
    """
    Create a new `S3StaticCredentials` object

    Parameters
    ----------
    access_key_id: str
        The access key ID to use for authentication.
    secret_access_key: str
        The secret access key to use for authentication.
    session_token: str | None
        Optional, the session token to use for authentication.
    expires_after: datetime.datetime | None
        Optional, the expiration time of the credentials.
    """
    ...
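
Example (a minimal sketch; the key values are placeholders, never hard-code real credentials):

import datetime

import icechunk

creds = icechunk.S3StaticCredentials(
    access_key_id="AKIA...",  # placeholder
    secret_access_key="...",  # placeholder
    expires_after=datetime.datetime(2030, 1, 1, tzinfo=datetime.timezone.utc),
)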

Session #

A session object that allows for reading and writing data from an Icechunk repository.

Methods:

Name Description
all_virtual_chunk_locations

Return the location URLs of all virtual chunks.

all_virtual_chunk_locations_async

Return the location URLs of all virtual chunks (async version).

allow_pickling

Context manager to allow unpickling this store if writable.

amend

Commit the changes in the session to the repository by amending/overwriting the previous commit.

amend_async

Commit the changes in the session to the repository by amending/overwriting the previous commit.

chunk_coordinates

Return an async iterator over all initialized chunks for the array at array_path

chunk_type

Return the chunk type for the specified coordinates

chunk_type_async

Return the chunk type for the specified coordinates

commit

Commit the changes in the session to the repository.

commit_async

Commit the changes in the session to the repository (async version).

discard_changes

When the session is writable, discard any uncommitted changes.

flush

Save the changes in the session to a new snapshot without modifying the current branch.

flush_async

Save the changes in the session to a new snapshot without modifying the current branch.

fork

Create a child session that can be pickled to a worker job and later merged.

merge

Merge the changes for this session with the changes from another session.

merge_async

Merge the changes for this session with the changes from another session (async version).

move

Move or rename a node (array or group) in the hierarchy.

move_async

Async version of move.

rebase

Rebase the session to the latest ancestry of the branch.

rebase_async

Rebase the session to the latest ancestry of the branch (async version).

reindex_array

Reindex chunks in an array by applying a transformation function.

roll_array

Roll (circular shift) all chunks in an array by the given chunk offset.

shift_array

Shift all chunks in an array by the given chunk offset.

status

Compute an overview of the current session changes

Attributes:

Name Type Description
branch str | None

The branch that the session is based on. This is only set if the session is writable.

config RepositoryConfig

Get the repository configuration.

has_uncommitted_changes bool

Whether the session has uncommitted changes. This is only possibly true if the session is writable.

mode SessionMode

The mode of this session.

read_only bool

Whether the session is read-only.

snapshot_id str

The base snapshot ID of the session.

store IcechunkStore

Get a zarr Store object for reading and writing data from the repository using zarr python.

Source code in icechunk-python/python/icechunk/session.py
class Session:
    """A session object that allows for reading and writing data from an Icechunk repository."""

    _session: PySession
    _allow_changes: bool

    def __init__(self, session: PySession):
        self._session = session
        self._allow_changes = False

    def __eq__(self, value: object) -> bool:
        if not isinstance(value, Session):
            return False
        return self._session == value._session

    def __getstate__(self) -> object:
        if not self.read_only:
            raise ValueError(
                "You must opt-in to pickle writable sessions in a distributed context "
                "using Session.fork(). "
                "See https://icechunk.io/en/stable/parallel/#distributed-writes for more. "
                "If you are using xarray's `Dataset.to_zarr` method to write dask arrays, "
                "please use `icechunk.xarray.to_icechunk` instead. "
                "If you are using dask & distributed or multi-processing to read/write from the same repository, "
                "then pass a readonly session created using Repository.readonly_session for the read step. "
                "Alternatively, make sure to pass the ForkSession created by Session.fork() for the read step. "
            )
        state = {
            "_session": self._session.as_bytes(),
            "_allow_changes": self._allow_changes,
        }
        return state

    def __setstate__(self, state: object) -> None:
        if not isinstance(state, dict):
            raise ValueError("Invalid state")
        self._session = PySession.from_bytes(state["_session"])
        self._allow_changes = state["_allow_changes"]

    @contextlib.contextmanager
    def allow_pickling(self) -> Generator[None, None, None]:
        """
        Context manager to allow unpickling this store if writable.
        """
        raise RuntimeError(
            "The allow_pickling context manager has been removed. "
            "Use the new `Session.fork` API instead. "
            # FIXME: Add link to docs
            "Better yet, use `to_icechunk` if that will fit your needs."
        )

    @property
    def read_only(self) -> bool:
        """
        Whether the session is read-only.

        Returns
        -------
        bool
            True if the session is read-only, False otherwise.
        """
        return self._session.read_only

    @property
    def mode(self) -> SessionMode:
        """
        The mode of this session.

        Returns
        -------
        SessionMode
            The session mode - one of READONLY, WRITABLE, or REARRANGE.
        """
        return self._session.mode

    @property
    def snapshot_id(self) -> str:
        """
        The base snapshot ID of the session.

        Returns
        -------
        str
            The base snapshot ID of the session.
        """
        return self._session.snapshot_id

    @property
    def branch(self) -> str | None:
        """
        The branch that the session is based on. This is only set if the session is writable.

        Returns
        -------
        str or None
            The branch that the session is based on if the session is writable, None otherwise.
        """
        return self._session.branch

    @property
    def has_uncommitted_changes(self) -> bool:
        """
        Whether the session has uncommitted changes. This can only be true if the session is writable.

        Returns
        -------
        bool
            True if the session has uncommitted changes, False otherwise.
        """
        return self._session.has_uncommitted_changes

    def status(self) -> Diff:
        """
        Compute an overview of the current session changes

        Returns
        -------
        Diff
            The operations executed in the current session but still not committed.
        """
        return self._session.status()

    def discard_changes(self) -> None:
        """
        When the session is writable, discard any uncommitted changes.
        """
        self._session.discard_changes()

    @property
    def store(self) -> IcechunkStore:
        """
        Get a zarr Store object for reading and writing data from the repository using zarr python.

        Returns
        -------
        IcechunkStore
            A zarr Store object for reading and writing data from the repository.
        """
        return IcechunkStore(self._session.store, for_fork=False)

    @property
    def config(self) -> RepositoryConfig:
        """
        Get the repository configuration.

        Notice that changes to the returned object won't be reflected in the repository.
        To change configuration values, use `Repository.reopen`.

        Returns
        -------
        RepositoryConfig
            The config for the repository that owns this session.
        """
        return self._session.config

    def move(self, from_path: str, to_path: str) -> None:
        """Move or rename a node (array or group) in the hierarchy.

        This is a metadata-only operation—no data is copied. Requires a rearrange session.

        Parameters
        ----------
        from_path : str
            The current path of the node (e.g., "/data/raw").
        to_path : str
            The new path for the node (e.g., "/data/v1").

        Examples
        --------
        >>> session = repo.rearrange_session("main")
        >>> session.move("/data/raw", "/data/v1")
        >>> session.commit("Renamed raw to v1")
        """
        return self._session.move_node(from_path, to_path)

    async def move_async(self, from_path: str, to_path: str) -> None:
        """Async version of :meth:`move`."""
        return await self._session.move_node_async(from_path, to_path)

    def all_virtual_chunk_locations(self) -> list[str]:
        """
        Return the location URLs of all virtual chunks.

        Returns
        -------
        list of str
            The location URLs of all virtual chunks.
        """
        return self._session.all_virtual_chunk_locations()

    def reindex_array(
        self,
        array_path: str,
        shift_chunk: Callable[[Iterable[int]], Iterable[int] | None],
    ) -> None:
        """Reindex chunks in an array by applying a transformation function.

        Parameters
        ----------
        array_path : str
            Path to the array.
        shift_chunk : Callable
            Function that receives chunk coordinates and returns new coordinates,
            or None to discard the chunk.
        """
        return self._session.reindex_array(array_path, shift_chunk)

    def shift_array(
        self,
        array_path: str,
        chunk_offset: Iterable[int],
    ) -> tuple[int, ...]:
        """Shift all chunks in an array by the given chunk offset.

        Chunks that shift out of bounds are discarded. Vacated positions retain
        stale chunk references — the caller typically writes new data there.

        Parameters
        ----------
        array_path : str
            The path to the array to shift.
        chunk_offset : Iterable[int]
            Offset added to each chunk coordinate. A chunk at index ``x`` moves
            to ``x + chunk_offset``. For a 3D array, ``chunk_offset=(1, 0, -2)``
            moves the chunk at ``(i, j, k)`` to ``(i+1, j, k-2)``.

        Returns
        -------
        tuple[int, ...]
            The shift in element space (``chunk_offset * chunk_size`` per dimension).
            For example, with ``chunk_size=10`` and ``chunk_offset=(2,)``, returns
            ``(20,)`` — useful for slicing the region that needs new data.

        Notes
        -----
        To shift right while preserving all data, first resize the array using zarr's
        array.resize(), then use shift_array.
        """
        return tuple(self._session.shift_array(array_path, list(chunk_offset)))

    def roll_array(
        self,
        array_path: str,
        chunk_offset: Iterable[int],
    ) -> tuple[int, ...]:
        """Roll (circular shift) all chunks in an array by the given chunk offset.

        Chunks that shift out of one end wrap around to the other side.
        No data is lost — this is a circular buffer operation.

        Parameters
        ----------
        array_path : str
            The path to the array to roll.
        chunk_offset : Iterable[int]
            Offset added to each chunk coordinate (with wraparound). A chunk at
            index ``x`` moves to ``(x + chunk_offset) % num_chunks``.

        Returns
        -------
        tuple[int, ...]
            The index shift in element space (chunk_offset * chunk_size for each dimension).
        """
        return tuple(self._session.roll_array(array_path, list(chunk_offset)))

    async def all_virtual_chunk_locations_async(self) -> list[str]:
        """
        Return the location URLs of all virtual chunks (async version).

        Returns
        -------
        list of str
            The location URLs of all virtual chunks.
        """
        return await self._session.all_virtual_chunk_locations_async()

    async def chunk_coordinates(
        self, array_path: str, batch_size: int = 1000
    ) -> AsyncIterator[tuple[int, ...]]:
        """
        Return an async iterator to all initialized chunks for the array at array_path

        Returns
        -------
        an async iterator to chunk coordinates as tuples
        """
        # We do unbatching here to improve speed. Switching to rust to get
        # a batch is much faster than switching for every element
        async for batch in self._session.chunk_coordinates(array_path, batch_size):
            for coord in batch:
                yield tuple(coord)

    def chunk_type(
        self,
        array_path: str,
        chunk_coordinates: Sequence[int],
    ) -> ChunkType:
        """
        Return the chunk type for the specified coordinates

        Parameters
        ----------
        array_path : str
            The path to the array inside the Zarr store. Example: "/groupA/groupB/outputs/my-array".
        chunk_coordinates: Sequence[int]
            A sequence of integers (list or tuple) used to locate the chunk. Example: [0, 1, 5].

        Returns
        -------
        ChunkType
            One of the supported chunk types.
        """
        return self._session.chunk_type(array_path, chunk_coordinates)

    async def chunk_type_async(
        self,
        array_path: str,
        chunk_coordinates: Sequence[int],
    ) -> ChunkType:
        """
        Return the chunk type for the specified coordinates

        Parameters
        ----------
        array_path : str
            The path to the array inside the Zarr store. Example: "/groupA/groupB/outputs/my-array".
        chunk_coordinates: Sequence[int]
            A sequence of integers (list or tuple) used to locate the chunk. Example: [0, 1, 5].

        Returns
        -------
        ChunkType
            One of the supported chunk types.
        """
        return await self._session.chunk_type_async(array_path, chunk_coordinates)

    def merge(self, *others: "ForkSession") -> None:
        """
        Merge the changes from one or more forked sessions into this session.

        Parameters
        ----------
        others : ForkSession
            The forked sessions to merge changes from.
        """
        for other in others:
            if not isinstance(other, ForkSession):
                raise TypeError(
                    "Sessions can only be merged with a ForkSession created with Session.fork(). "
                    f"Received {type(other).__name__} instead."
                )
            self._session.merge(other._session)
        self._allow_changes = False

    async def merge_async(self, *others: "ForkSession") -> None:
        """
        Merge the changes from one or more forked sessions into this session (async version).

        Parameters
        ----------
        others : ForkSession
            The forked sessions to merge changes from.
        """
        for other in others:
            if not isinstance(other, ForkSession):
                raise TypeError(
                    "Sessions can only be merged with a ForkSession created with Session.fork(). "
                    f"Received {type(other).__name__} instead."
                )
            await self._session.merge_async(other._session)
        self._allow_changes = False

    def commit(
        self,
        message: str,
        metadata: dict[str, Any] | None = None,
        rebase_with: ConflictSolver | None = None,
        rebase_tries: int = 1_000,
        allow_empty: bool = False,
    ) -> str:
        """
        Commit the changes in the session to the repository.

        When successful, the writable session is completed and the session is now read-only and based on the new commit. The snapshot ID of the new commit is returned.

        If the session is out of date, this will raise a ConflictError exception depicting the conflict that occurred. The session will need to be rebased before committing.

        Parameters
        ----------
        message : str
            The message to write with the commit.
        metadata : dict[str, Any] | None, optional
            Additional metadata to store with the commit snapshot.
        rebase_with : ConflictSolver | None, optional
            If another session committed while the current session was writing, rebase with this solver before retrying the commit.
        rebase_tries : int, optional
            If another session committed while the current session was writing, retry the rebase-and-commit cycle up to this many times.
        allow_empty : bool, optional
            If True, allow creating a commit even if there are no changes. Default is False.

        Returns
        -------
        str
            The snapshot ID of the new commit.

        Raises
        ------
        icechunk.ConflictError
            If the session is out of date and a conflict occurs.
        icechunk.NoChangesToCommitError
            If there are no changes to commit and allow_empty is False.
        """
        if self._allow_changes:
            warnings.warn(
                "Committing a session after forking, and without merging will not work. "
                "Merge back in the remote changes first using Session.merge().",
                UserWarning,
                stacklevel=2,
            )
        return self._session.commit(
            message,
            metadata,
            rebase_with=rebase_with,
            rebase_tries=rebase_tries,
            allow_empty=allow_empty,
        )

    async def commit_async(
        self,
        message: str,
        metadata: dict[str, Any] | None = None,
        rebase_with: ConflictSolver | None = None,
        rebase_tries: int = 1_000,
        allow_empty: bool = False,
    ) -> str:
        """
        Commit the changes in the session to the repository (async version).

        When successful, the writable session is completed and the session is now read-only and based on the new commit. The snapshot ID of the new commit is returned.

        If the session is out of date, this will raise a ConflictError exception depicting the conflict that occurred. The session will need to be rebased before committing.

        Parameters
        ----------
        message : str
            The message to write with the commit.
        metadata : dict[str, Any] | None, optional
            Additional metadata to store with the commit snapshot.
        rebase_with : ConflictSolver | None, optional
            If another session committed while the current session was writing, rebase with this solver before retrying the commit.
        rebase_tries : int, optional
            If another session committed while the current session was writing, retry the rebase-and-commit cycle up to this many times.
        allow_empty : bool, optional
            If True, allow creating a commit even if there are no changes. Default is False.

        Returns
        -------
        str
            The snapshot ID of the new commit.

        Raises
        ------
        icechunk.ConflictError
            If the session is out of date and a conflict occurs.
        icechunk.NoChangesToCommitError
            If there are no changes to commit and allow_empty is False.
        """
        if self._allow_changes:
            warnings.warn(
                "Committing a session after forking, and without merging will not work. "
                "Merge back in the remote changes first using Session.merge().",
                UserWarning,
                stacklevel=2,
            )
        return await self._session.commit_async(
            message,
            metadata,
            rebase_with=rebase_with,
            rebase_tries=rebase_tries,
            allow_empty=allow_empty,
        )

    def amend(
        self,
        message: str,
        metadata: dict[str, Any] | None = None,
        allow_empty: bool = False,
    ) -> str:
        """
        Commit the changes in the session to the repository, by amending/overwriting the previous commit.

        When successful, the writable session is completed and the session is now read-only and based on the new commit. The snapshot ID of the new commit is returned.

        If the session is out of date, this will raise a ConflictError exception depicting the conflict that occurred. The session will need to be rebased before committing.

        This operation doesn't create a new commit in the repo ancestry. It replaces the previous commit.

        The first commit to the repo cannot be amended.

        Parameters
        ----------
        message : str
            The message to write with the commit.
        metadata : dict[str, Any] | None, optional
            Additional metadata to store with the commit snapshot.
        allow_empty : bool, optional
            If True, allow amending even if no data changes have been made to the session.
            This is useful when you only want to update the commit message. Default is False.

        Returns
        -------
        str
            The snapshot ID of the new commit.

        Raises
        ------
        icechunk.ConflictError
            If the session is out of date and a conflict occurs.
        """
        if self._allow_changes:
            warnings.warn(
                "Committing a session after forking, and without merging will not work. "
                "Merge back in the remote changes first using Session.merge().",
                UserWarning,
                stacklevel=2,
            )
        return self._session.amend(message, metadata, allow_empty=allow_empty)

    async def amend_async(
        self,
        message: str,
        metadata: dict[str, Any] | None = None,
        allow_empty: bool = False,
    ) -> str:
        """
        Commit the changes in the session to the repository, by amending/overwriting the previous commit.

        When successful, the writable session is completed and the session is now read-only and based on the new commit. The snapshot ID of the new commit is returned.

        If the session is out of date, this will raise a ConflictError exception depicting the conflict that occurred. The session will need to be rebased before committing.

        This operation doesn't create a new commit in the repo ancestry. It replaces the previous commit.

        The first commit to the repo cannot be amended.

        Parameters
        ----------
        message : str
            The message to write with the commit.
        metadata : dict[str, Any] | None, optional
            Additional metadata to store with the commit snapshot.
        allow_empty : bool, optional
            If True, allow amending even if no data changes have been made to the session.
            This is useful when you only want to update the commit message. Default is False.

        Returns
        -------
        str
            The snapshot ID of the new commit.

        Raises
        ------
        icechunk.ConflictError
            If the session is out of date and a conflict occurs.
        """
        if self._allow_changes:
            warnings.warn(
                "Committing a session after forking, and without merging will not work. "
                "Merge back in the remote changes first using Session.merge().",
                UserWarning,
                stacklevel=2,
            )
        return await self._session.amend_async(message, metadata, allow_empty=allow_empty)

    def flush(
        self,
        message: str,
        metadata: dict[str, Any] | None = None,
    ) -> str:
        """
        Save the changes in the session to a new snapshot without modifying the current branch.

        When successful, the writable session is completed and the session is now read-only and based on the new snapshot. The ID of the new snapshot is returned.

        Parameters
        ----------
        message : str
            The message to write with the commit.
        metadata : dict[str, Any] | None, optional
            Additional metadata to store with the commit snapshot.

        Returns
        -------
        str
            The ID of the new snapshot.
        """
        if self._allow_changes:
            warnings.warn(
                "Committing a session after forking, and without merging will not work. "
                "Merge back in the remote changes first using Session.merge().",
                UserWarning,
                stacklevel=2,
            )
        return self._session.flush(message, metadata)

    async def flush_async(
        self,
        message: str,
        metadata: dict[str, Any] | None = None,
    ) -> str:
        """
        Save the changes in the session to a new snapshot without modifying the current branch.

        When successful, the writable session is completed and the session is now read-only and based on the new snapshot. The ID of the new snapshot is returned.

        Parameters
        ----------
        message : str
            The message to write with the commit.
        metadata : dict[str, Any] | None, optional
            Additional metadata to store with the commit snapshot.

        Returns
        -------
        str
            The ID of the new snapshot.
        """
        if self._allow_changes:
            warnings.warn(
                "Flushing a session after forking, and without merging will not work. "
                "Merge back in the remote changes first using Session.merge().",
                UserWarning,
                stacklevel=2,
            )
        return await self._session.flush_async(message, metadata)

    def rebase(self, solver: ConflictSolver) -> None:
        """
        Rebase the session to the latest ancestry of the branch.

        This method will iteratively crawl the ancestry of the branch and apply the changes from the branch to the session. If a conflict is detected, the conflict solver will be used to optionally resolve the conflict. When complete, the session will be based on the latest commit of the branch and the session will be ready to attempt another commit.

        When a conflict is detected and a resolution is not possible with the provided solver, a RebaseFailedError exception will be raised. This exception will contain the snapshot ID that the rebase failed on and a list of conflicts that occurred.

        Parameters
        ----------
        solver : ConflictSolver
            The conflict solver to use when a conflict is detected.

        Raises
        ------
        RebaseFailedError
            When a conflict is detected and the solver fails to resolve it.
        """
        self._session.rebase(solver)

    async def rebase_async(self, solver: ConflictSolver) -> None:
        """
        Rebase the session to the latest ancestry of the branch (async version).

        This method will iteratively crawl the ancestry of the branch and apply the changes from the branch to the session. If a conflict is detected, the conflict solver will be used to optionally resolve the conflict. When complete, the session will be based on the latest commit of the branch and the session will be ready to attempt another commit.

        When a conflict is detected and a resolution is not possible with the provided solver, a RebaseFailedError exception will be raised. This exception will contain the snapshot ID that the rebase failed on and a list of conflicts that occurred.

        Parameters
        ----------
        solver : ConflictSolver
            The conflict solver to use when a conflict is detected.

        Raises
        ------
        RebaseFailedError
            When a conflict is detected and the solver fails to resolve it.
        """
        await self._session.rebase_async(solver)

    def fork(self) -> "ForkSession":
        """
        Create a child session that can be pickled to a worker job and later merged.

        This method supports Icechunk's distributed, collaborative jobs. A coordinator task creates a new session using
        `Repository.writable_session`. Then `Session.fork` is called repeatedly to create as many serializable sessions
        as worker jobs. Each new `ForkSession` is pickled to the worker that uses it to do all its writes.
        Finally, the `ForkSessions` are pickled back to the coordinator that uses `ForkSession.merge` to merge them
        back into the original session and `commit`.

        Learn more about collaborative writes at https://icechunk.io/en/latest/parallel/

        Raises
        ------
        ValueError
            When `self` already has uncommitted changes.
        ValueError
            When `self` is read-only.
        """
        if self.has_uncommitted_changes:
            raise ValueError(
                "Cannot fork a Session with uncommitted changes. "
                "Make a commit, create a new Session, and then fork that to execute distributed writes."
            )
        if self.read_only:
            raise ValueError(
                "You should not need to fork a read-only session. Read-only sessions can be pickled and transmitted directly."
            )
        self._allow_changes = True
        # force a deep-copy of the underlying Session,
        # so that multiple forks can be created and
        # used independently in a local session.
        # See test_dask.py::test_fork_session_deep_copies for an example
        return ForkSession(PySession.from_bytes(self._session.as_bytes()))
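
Taken together, a minimal end-to-end sketch of the session lifecycle (assuming zarr v3 and in-memory storage; the array name "temps", its shape, and all values are illustrative, not part of the API above):

import icechunk
import zarr

# Create a throwaway repository and open a writable session on "main".
repo = icechunk.Repository.create(icechunk.in_memory_storage())
session = repo.writable_session("main")

# Write through zarr against the session's store.
group = zarr.group(store=session.store)
temps = group.create_array("temps", shape=(4, 4), chunks=(2, 2), dtype="f4")
temps[:] = 0.0

# Committing completes the writable session; it is read-only afterwards.
snapshot_id = session.commit("initialize temps")
print(session.read_only)  # True

The sketches further down reuse repo, zarr, and the "temps" array from this one.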

branch property #

branch

The branch that the session is based on. This is only set if the session is writable.

Returns:

Type Description
str or None

The branch that the session is based on if the session is writable, None otherwise.

config property #

config

Get the repository configuration.

Notice that changes to the returned object won't be reflected in the repository. To change configuration values, use Repository.reopen.

Returns:

Type Description
RepositoryConfig

The config for the repository that owns this session.

has_uncommitted_changes property #

has_uncommitted_changes

Whether the session has uncommitted changes. This can only be true if the session is writable.

Returns:

Type Description
bool

True if the session has uncommitted changes, False otherwise.

mode property #

mode

The mode of this session.

Returns:

Type Description
SessionMode

The session mode - one of READONLY, WRITABLE, or REARRANGE.

read_only property #

read_only

Whether the session is read-only.

Returns:

Type Description
bool

True if the session is read-only, False otherwise.

snapshot_id property #

snapshot_id

The base snapshot ID of the session.

Returns:

Type Description
str

The base snapshot ID of the session.

store property #

store

Get a zarr Store object for reading and writing data from the repository using zarr python.

Returns:

Type Description
IcechunkStore

A zarr Store object for reading and writing data from the repository.

all_virtual_chunk_locations #

all_virtual_chunk_locations()

Return the location URLs of all virtual chunks.

Returns:

Type Description
list of str

The location URLs of all virtual chunks.

Source code in icechunk-python/python/icechunk/session.py
def all_virtual_chunk_locations(self) -> list[str]:
    """
    Return the location URLs of all virtual chunks.

    Returns
    -------
    list of str
        The location URLs of all virtual chunks.
    """
    return self._session.all_virtual_chunk_locations()

all_virtual_chunk_locations_async async #

all_virtual_chunk_locations_async()

Return the location URLs of all virtual chunks (async version).

Returns:

Type Description
list of str

The location URLs of all virtual chunks.

Source code in icechunk-python/python/icechunk/session.py
async def all_virtual_chunk_locations_async(self) -> list[str]:
    """
    Return the location URLs of all virtual chunks (async version).

    Returns
    -------
    list of str
        The location URLs of all virtual chunks.
    """
    return await self._session.all_virtual_chunk_locations_async()

allow_pickling #

allow_pickling()

Removed. Use Session.fork instead to pickle writable sessions.

Source code in icechunk-python/python/icechunk/session.py
@contextlib.contextmanager
def allow_pickling(self) -> Generator[None, None, None]:
    """
    Removed. Use `Session.fork` instead to pickle writable sessions.
    """
    raise RuntimeError(
        "The allow_pickling context manager has been removed. "
        "Use the new `Session.fork` API instead. "
        # FIXME: Add link to docs
        "Better yet, use `to_icechunk` if that will fit your needs."
    )

amend #

amend(message, metadata=None, allow_empty=False)

Commit the changes in the session to the repository, by amending/overwriting the previous commit.

When successful, the writable session is completed and the session is now read-only and based on the new commit. The snapshot ID of the new commit is returned.

If the session is out of date, this will raise a ConflictError exception depicting the conflict that occurred. The session will need to be rebased before committing.

This operation doesn't create a new commit in the repo ancestry. It replaces the previous commit.

The first commit to the repo cannot be amended.

Parameters:

Name Type Description Default
message str

The message to write with the commit.

required
metadata dict[str, Any] | None

Additional metadata to store with the commit snapshot.

None
allow_empty bool

If True, allow amending even if no data changes have been made to the session. This is useful when you only want to update the commit message. Default is False.

False

Returns:

Type Description
str

The snapshot ID of the new commit.

Raises:

Type Description
ConflictError

If the session is out of date and a conflict occurs.

Source code in icechunk-python/python/icechunk/session.py
def amend(
    self,
    message: str,
    metadata: dict[str, Any] | None = None,
    allow_empty: bool = False,
) -> str:
    """
    Commit the changes in the session to the repository, by amending/overwriting the previous commit.

    When successful, the writable session is completed and the session is now read-only and based on the new commit. The snapshot ID of the new commit is returned.

    If the session is out of date, this will raise a ConflictError exception depicting the conflict that occurred. The session will need to be rebased before committing.

    This operation doesn't create a new commit in the repo ancestry. It replaces the previous commit.

    The first commit to the repo cannot be amended.

    Parameters
    ----------
    message : str
        The message to write with the commit.
    metadata : dict[str, Any] | None, optional
        Additional metadata to store with the commit snapshot.
    allow_empty : bool, optional
        If True, allow amending even if no data changes have been made to the session.
        This is useful when you only want to update the commit message. Default is False.

    Returns
    -------
    str
        The snapshot ID of the new commit.

    Raises
    ------
    icechunk.ConflictError
        If the session is out of date and a conflict occurs.
    """
    if self._allow_changes:
        warnings.warn(
            "Committing a session after forking, and without merging will not work. "
            "Merge back in the remote changes first using Session.merge().",
            UserWarning,
            stacklevel=2,
        )
    return self._session.amend(message, metadata, allow_empty=allow_empty)
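
As a sketch, continuing the example above (and assuming the branch already carries more than one commit, since the first commit cannot be amended): a follow-up writable session can fold a small fix into the previous commit instead of appending a new snapshot.

session = repo.writable_session("main")
arr = zarr.open_array(store=session.store, path="temps")
arr[0, 0] = 1.5  # small correction

# Replaces the previous commit on "main" rather than adding a new one.
snapshot_id = session.amend("initialize temps, corrected corner value")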

amend_async async #

amend_async(message, metadata=None, allow_empty=False)

Commit the changes in the session to the repository, by amending/overwriting the previous commit.

When successful, the writable session is completed and the session is now read-only and based on the new commit. The snapshot ID of the new commit is returned.

If the session is out of date, this will raise a ConflictError exception depicting the conflict that occurred. The session will need to be rebased before committing.

This operation doesn't create a new commit in the repo ancestry. It replaces the previous commit.

The first commit to the repo cannot be amended.

Parameters:

Name Type Description Default
message str

The message to write with the commit.

required
metadata dict[str, Any] | None

Additional metadata to store with the commit snapshot.

None
allow_empty bool

If True, allow amending even if no data changes have been made to the session. This is useful when you only want to update the commit message. Default is False.

False

Returns:

Type Description
str

The snapshot ID of the new commit.

Raises:

Type Description
ConflictError

If the session is out of date and a conflict occurs.

Source code in icechunk-python/python/icechunk/session.py
async def amend_async(
    self,
    message: str,
    metadata: dict[str, Any] | None = None,
    allow_empty: bool = False,
) -> str:
    """
    Commit the changes in the session to the repository, by amending/overwriting the previous commit.

    When successful, the writable session is completed and the session is now read-only and based on the new commit. The snapshot ID of the new commit is returned.

    If the session is out of date, this will raise a ConflictError exception depicting the conflict that occurred. The session will need to be rebased before committing.

    This operation doesn't create a new commit in the repo ancestry. It replaces the previous commit.

    The first commit to the repo cannot be amended.

    Parameters
    ----------
    message : str
        The message to write with the commit.
    metadata : dict[str, Any] | None, optional
        Additional metadata to store with the commit snapshot.
    allow_empty : bool, optional
        If True, allow amending even if no data changes have been made to the session.
        This is useful when you only want to update the commit message. Default is False.

    Returns
    -------
    str
        The snapshot ID of the new commit.

    Raises
    ------
    icechunk.ConflictError
        If the session is out of date and a conflict occurs.
    """
    if self._allow_changes:
        warnings.warn(
            "Committing a session after forking, and without merging will not work. "
            "Merge back in the remote changes first using Session.merge().",
            UserWarning,
            stacklevel=2,
        )
    return await self._session.amend_async(message, metadata, allow_empty=allow_empty)

chunk_coordinates async #

chunk_coordinates(array_path, batch_size=1000)

Return an async iterator over the coordinates of all initialized chunks for the array at array_path.

Returns:

Type Description
An async iterator of chunk coordinates as tuples.
Source code in icechunk-python/python/icechunk/session.py
async def chunk_coordinates(
    self, array_path: str, batch_size: int = 1000
) -> AsyncIterator[tuple[int, ...]]:
    """
    Return an async iterator to all initialized chunks for the array at array_path

    Returns
    -------
    an async iterator to chunk coordinates as tuples
    """
    # We do unbatching here to improve speed. Switching to rust to get
    # a batch is much faster than switching for every element
    async for batch in self._session.chunk_coordinates(array_path, batch_size):
        for coord in batch:
            yield tuple(coord)
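
Because this is an async generator, it has to be consumed from a coroutine; a minimal sketch, reusing the session and array from the earlier example:

import asyncio

async def initialized_chunks(session, array_path="/temps"):
    # Collect the coordinates of every chunk that has been written.
    return [coord async for coord in session.chunk_coordinates(array_path)]

coords = asyncio.run(initialized_chunks(session))
print(coords)  # e.g. [(0, 0), (0, 1), (1, 0), (1, 1)]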

chunk_type #

chunk_type(array_path, chunk_coordinates)

Return the chunk type for the specified coordinates

Parameters:

Name Type Description Default
array_path str

The path to the array inside the Zarr store. Example: "/groupA/groupB/outputs/my-array".

required
chunk_coordinates Sequence[int]

A sequence of integers (list or tuple) used to locate the chunk. Example: [0, 1, 5].

required

Returns:

Type Description
ChunkType

One of the supported chunk types.

Source code in icechunk-python/python/icechunk/session.py
def chunk_type(
    self,
    array_path: str,
    chunk_coordinates: Sequence[int],
) -> ChunkType:
    """
    Return the chunk type for the specified coordinates

    Parameters
    ----------
    array_path : str
        The path to the array inside the Zarr store. Example: "/groupA/groupB/outputs/my-array".
    chunk_coordinates: Sequence[int]
        A sequence of integers (list or tuple) used to locate the chunk. Example: [0, 1, 5].

    Returns
    -------
    ChunkType
        One of the supported chunk types.
    """
    return self._session.chunk_type(array_path, chunk_coordinates)

chunk_type_async async #

chunk_type_async(array_path, chunk_coordinates)

Return the chunk type for the specified coordinates

Parameters:

Name Type Description Default
array_path str

The path to the array inside the Zarr store. Example: "/groupA/groupB/outputs/my-array".

required
chunk_coordinates Sequence[int]

A sequence of integers (list or tuple) used to locate the chunk. Example: [0, 1, 5].

required

Returns:

Type Description
ChunkType

One of the supported chunk types.

Source code in icechunk-python/python/icechunk/session.py
async def chunk_type_async(
    self,
    array_path: str,
    chunk_coordinates: Sequence[int],
) -> ChunkType:
    """
    Return the chunk type for the specified coordinates

    Parameters
    ----------
    array_path : str
        The path to the array inside the Zarr store. Example: "/groupA/groupB/outputs/my-array".
    chunk_coordinates: Sequence[int]
        A sequence of integers (list or tuple) used to locate the chunk. Example: [0, 1, 5].

    Returns
    -------
    ChunkType
        One of the supported chunk types.
    """
    return await self._session.chunk_type_async(array_path, chunk_coordinates)

commit #

commit(message, metadata=None, rebase_with=None, rebase_tries=1000, allow_empty=False)

Commit the changes in the session to the repository.

When successful, the writable session is completed and the session is now read-only and based on the new commit. The snapshot ID of the new commit is returned.

If the session is out of date, this will raise a ConflictError exception depicting the conflict that occurred. The session will need to be rebased before committing.

Parameters:

Name Type Description Default
message str

The message to write with the commit.

required
metadata dict[str, Any] | None

Additional metadata to store with the commit snapshot.

None
rebase_with ConflictSolver | None

If another session committed while the current session was writing, rebase with this solver before retrying the commit.

None
rebase_tries int

If another session committed while the current session was writing, retry the rebase-and-commit cycle up to this many times.

1000
allow_empty bool

If True, allow creating a commit even if there are no changes. Default is False.

False

Returns:

Type Description
str

The snapshot ID of the new commit.

Raises:

Type Description
ConflictError

If the session is out of date and a conflict occurs.

NoChangesToCommitError

If there are no changes to commit and allow_empty is False.

Source code in icechunk-python/python/icechunk/session.py
def commit(
    self,
    message: str,
    metadata: dict[str, Any] | None = None,
    rebase_with: ConflictSolver | None = None,
    rebase_tries: int = 1_000,
    allow_empty: bool = False,
) -> str:
    """
    Commit the changes in the session to the repository.

    When successful, the writable session is completed and the session is now read-only and based on the new commit. The snapshot ID of the new commit is returned.

    If the session is out of date, this will raise a ConflictError exception depicting the conflict that occurred. The session will need to be rebased before committing.

    Parameters
    ----------
    message : str
        The message to write with the commit.
    metadata : dict[str, Any] | None, optional
        Additional metadata to store with the commit snapshot.
    rebase_with : ConflictSolver | None, optional
        If another session committed while the current session was writing, rebase with this solver before retrying the commit.
    rebase_tries : int, optional
        If another session committed while the current session was writing, retry the rebase-and-commit cycle up to this many times.
    allow_empty : bool, optional
        If True, allow creating a commit even if there are no changes. Default is False.

    Returns
    -------
    str
        The snapshot ID of the new commit.

    Raises
    ------
    icechunk.ConflictError
        If the session is out of date and a conflict occurs.
    icechunk.NoChangesToCommitError
        If there are no changes to commit and allow_empty is False.
    """
    if self._allow_changes:
        warnings.warn(
            "Committing a session after forking, and without merging will not work. "
            "Merge back in the remote changes first using Session.merge().",
            UserWarning,
            stacklevel=2,
        )
    return self._session.commit(
        message,
        metadata,
        rebase_with=rebase_with,
        rebase_tries=rebase_tries,
        allow_empty=allow_empty,
    )
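
A common pattern is to let commit retry automatically when another writer lands first; a sketch using BasicConflictSolver, continuing the earlier example (the write shown is illustrative):

session = repo.writable_session("main")
arr = zarr.open_array(store=session.store, path="temps")
arr[1, :] = 2.0

# If another session committed in the meantime, rebase with the solver
# and retry, up to rebase_tries attempts, before raising.
snapshot_id = session.commit(
    "update row 1",
    rebase_with=icechunk.BasicConflictSolver(),
    rebase_tries=5,
)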

commit_async async #

commit_async(message, metadata=None, rebase_with=None, rebase_tries=1000, allow_empty=False)

Commit the changes in the session to the repository (async version).

When successful, the writable session is completed and the session is now read-only and based on the new commit. The snapshot ID of the new commit is returned.

If the session is out of date, this will raise a ConflictError exception depicting the conflict that occurred. The session will need to be rebased before committing.

Parameters:

Name Type Description Default
message str

The message to write with the commit.

required
metadata dict[str, Any] | None

Additional metadata to store with the commit snapshot.

None
rebase_with ConflictSolver | None

If another session committed while the current session was writing, rebase with this solver before retrying the commit.

None
rebase_tries int

If another session committed while the current session was writing, retry the rebase-and-commit cycle up to this many times.

1000
allow_empty bool

If True, allow creating a commit even if there are no changes. Default is False.

False

Returns:

Type Description
str

The snapshot ID of the new commit.

Raises:

Type Description
ConflictError

If the session is out of date and a conflict occurs.

NoChangesToCommitError

If there are no changes to commit and allow_empty is False.

Source code in icechunk-python/python/icechunk/session.py
async def commit_async(
    self,
    message: str,
    metadata: dict[str, Any] | None = None,
    rebase_with: ConflictSolver | None = None,
    rebase_tries: int = 1_000,
    allow_empty: bool = False,
) -> str:
    """
    Commit the changes in the session to the repository (async version).

    When successful, the writable session is completed and the session is now read-only and based on the new commit. The snapshot ID of the new commit is returned.

    If the session is out of date, this will raise a ConflictError exception depicting the conflict that occurred. The session will need to be rebased before committing.

    Parameters
    ----------
    message : str
        The message to write with the commit.
    metadata : dict[str, Any] | None, optional
        Additional metadata to store with the commit snapshot.
    rebase_with : ConflictSolver | None, optional
        If another session committed while the current session was writing, rebase with this solver before retrying the commit.
    rebase_tries : int, optional
        If another session committed while the current session was writing, retry the rebase-and-commit cycle up to this many times.
    allow_empty : bool, optional
        If True, allow creating a commit even if there are no changes. Default is False.

    Returns
    -------
    str
        The snapshot ID of the new commit.

    Raises
    ------
    icechunk.ConflictError
        If the session is out of date and a conflict occurs.
    icechunk.NoChangesToCommitError
        If there are no changes to commit and allow_empty is False.
    """
    if self._allow_changes:
        warnings.warn(
            "Committing a session after forking, and without merging will not work. "
            "Merge back in the remote changes first using Session.merge().",
            UserWarning,
            stacklevel=2,
        )
    return await self._session.commit_async(
        message,
        metadata,
        rebase_with=rebase_with,
        rebase_tries=rebase_tries,
        allow_empty=allow_empty,
    )

discard_changes #

discard_changes()

When the session is writable, discard any uncommitted changes.

Source code in icechunk-python/python/icechunk/session.py
def discard_changes(self) -> None:
    """
    When the session is writable, discard any uncommitted changes.
    """
    self._session.discard_changes()
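
status() pairs naturally with discard_changes() when inspecting and then abandoning work in progress; a sketch continuing the earlier example:

session = repo.writable_session("main")
arr = zarr.open_array(store=session.store, path="temps")
arr[2, :] = -1.0

print(session.status())                 # diff of the uncommitted operations
session.discard_changes()               # drop them all
print(session.has_uncommitted_changes)  # False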

flush #

flush(message, metadata=None)

Save the changes in the session to a new snapshot without modifying the current branch.

When successful, the writable session is completed and the session is now read-only and based on the new snapshot. The ID of the new snapshot is returned.

Parameters:

Name Type Description Default
message str

The message to write with the commit.

required
metadata dict[str, Any] | None

Additional metadata to store with the commit snapshot.

None

Returns:

Type Description
str

The ID of the new snapshot.

Source code in icechunk-python/python/icechunk/session.py
def flush(
    self,
    message: str,
    metadata: dict[str, Any] | None = None,
) -> str:
    """
    Save the changes in the session to a new snapshot without modifying the current branch.

    When successful, the writable session is completed and the session is now read-only and based on the new snapshot. The ID of the new snapshot is returned.

    Parameters
    ----------
    message : str
        The message to write with the commit.
    metadata : dict[str, Any] | None, optional
        Additional metadata to store with the commit snapshot.

    Returns
    -------
    str
        The ID of the new snapshot.
    """
    if self._allow_changes:
        warnings.warn(
            "Committing a session after forking, and without merging will not work. "
            "Merge back in the remote changes first using Session.merge().",
            UserWarning,
            stacklevel=2,
        )
    return self._session.flush(message, metadata)
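
flush writes a snapshot without advancing the branch, so the result is reachable only by its ID; a sketch continuing the earlier example:

session = repo.writable_session("main")
arr = zarr.open_array(store=session.store, path="temps")
arr[3, :] = 9.0

# "main" still points at the previous commit after this call.
snapshot_id = session.flush("experimental row 3")

# The detached snapshot can be opened directly by ID.
readonly = repo.readonly_session(snapshot_id=snapshot_id)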

flush_async async #

flush_async(message, metadata=None)

Save the changes in the session to a new snapshot without modifying the current branch.

When successful, the writable session is completed and the session is now read-only and based on the new snapshot. The ID of the new snapshot is returned.

Parameters:

Name Type Description Default
message str

The message to write with the commit.

required
metadata dict[str, Any] | None

Additional metadata to store with the commit snapshot.

None

Returns:

Type Description
str

The ID of the new snapshot.

Source code in icechunk-python/python/icechunk/session.py
async def flush_async(
    self,
    message: str,
    metadata: dict[str, Any] | None = None,
) -> str:
    """
    Save the changes in the session to a new snapshot without modifying the current branch.

    When successful, the writable session is completed and the session is now read-only and based on the new snapshot. The ID of the new snapshot is returned.

    Parameters
    ----------
    message : str
        The message to write with the commit.
    metadata : dict[str, Any] | None, optional
        Additional metadata to store with the commit snapshot.

    Returns
    -------
    str
        The ID of the new snapshot.
    """
    if self._allow_changes:
        warnings.warn(
            "Flushing a session after forking, and without merging will not work. "
            "Merge back in the remote changes first using Session.merge().",
            UserWarning,
            stacklevel=2,
        )
    return await self._session.flush_async(message, metadata)

fork #

fork()

Create a child session that can be pickled to a worker job and later merged.

This method supports Icechunk's distributed, collaborative jobs. A coordinator task creates a new session using Repository.writable_session. Then Session.fork is called repeatedly to create as many serializable sessions as worker jobs. Each new ForkSession is pickled to the worker that uses it to do all its writes. Finally, the ForkSessions are pickled back to the coordinator that uses ForkSession.merge to merge them back into the original session and commit.

Learn more about collaborative writes at https://icechunk.io/en/latest/parallel/

Raises:

Type Description
ValueError

When self already has uncommitted changes.

ValueError

When self is read-only.

Source code in icechunk-python/python/icechunk/session.py
def fork(self) -> "ForkSession":
    """
    Create a child session that can be pickled to a worker job and later merged.

    This method supports Icechunk's distributed, collaborative jobs. A coordinator task creates a new session using
    `Repository.writable_session`. Then `Session.fork` is called repeatedly to create as many serializable sessions
    as worker jobs. Each new `ForkSession` is pickled to the worker that uses it to do all its writes.
    Finally, the `ForkSessions` are pickled back to the coordinator that uses `ForkSession.merge` to merge them
    back into the original session and `commit`.

    Learn more about collaborative writes at https://icechunk.io/en/latest/parallel/

    Raises
    ------
    ValueError
        When `self` already has uncommitted changes.
    ValueError
        When `self` is read-only.
    """
    if self.has_uncommitted_changes:
        raise ValueError(
            "Cannot fork a Session with uncommitted changes. "
            "Make a commit, create a new Session, and then fork that to execute distributed writes."
        )
    if self.read_only:
        raise ValueError(
            "You should not need to fork a read-only session. Read-only sessions can be pickled and transmitted directly."
        )
    self._allow_changes = True
    # force a deep-copy of the underlying Session,
    # so that multiple forks can be created and
    # used independently in a local session.
    # See test_dask.py::test_fork_session_deep_copies for an example
    return ForkSession(PySession.from_bytes(self._session.as_bytes()))
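
The full coordinator/worker round trip, sketched with a process pool standing in for a real cluster; the worker function is illustrative, and the sketch assumes ForkSession exposes the same store property as Session:

from concurrent.futures import ProcessPoolExecutor

def write_row(fork, row):
    # Runs in a worker process; ForkSession objects are picklable.
    arr = zarr.open_array(store=fork.store, path="temps")
    arr[row, :] = float(row)
    return fork  # ship the fork's changeset back to the coordinator

session = repo.writable_session("main")
forks = [session.fork() for _ in range(2)]
with ProcessPoolExecutor() as pool:
    done = list(pool.map(write_row, forks, [0, 1]))

session.merge(*done)  # fold the worker changesets back in
session.commit("parallel row writes")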

merge #

merge(*others)

Merge the changes from one or more forked sessions into this session.

Parameters:

Name Type Description Default
others ForkSession

The forked sessions to merge changes from.

()
Source code in icechunk-python/python/icechunk/session.py
def merge(self, *others: "ForkSession") -> None:
    """
    Merge the changes from one or more forked sessions into this session.

    Parameters
    ----------
    others : ForkSession
        The forked sessions to merge changes from.
    """
    for other in others:
        if not isinstance(other, ForkSession):
            raise TypeError(
                "Sessions can only be merged with a ForkSession created with Session.fork(). "
                f"Received {type(other).__name__} instead."
            )
        self._session.merge(other._session)
    self._allow_changes = False

merge_async async #

merge_async(*others)

Merge the changes from one or more forked sessions into this session (async version).

Parameters:

Name Type Description Default
others ForkSession

The forked sessions to merge changes from.

()
Source code in icechunk-python/python/icechunk/session.py
async def merge_async(self, *others: "ForkSession") -> None:
    """
    Merge the changes from one or more forked sessions into this session (async version).

    Parameters
    ----------
    others : ForkSession
        The forked sessions to merge changes from.
    """
    for other in others:
        if not isinstance(other, ForkSession):
            raise TypeError(
                "Sessions can only be merged with a ForkSession created with Session.fork(). "
                f"Received {type(other).__name__} instead."
            )
        await self._session.merge_async(other._session)
    self._allow_changes = False

move #

move(from_path, to_path)

Move or rename a node (array or group) in the hierarchy.

This is a metadata-only operation—no data is copied. Requires a rearrange session.

Parameters:

Name Type Description Default
from_path str

The current path of the node (e.g., "/data/raw").

required
to_path str

The new path for the node (e.g., "/data/v1").

required

Examples:

>>> session = repo.rearrange_session("main")
>>> session.move("/data/raw", "/data/v1")
>>> session.commit("Renamed raw to v1")
Source code in icechunk-python/python/icechunk/session.py
def move(self, from_path: str, to_path: str) -> None:
    """Move or rename a node (array or group) in the hierarchy.

    This is a metadata-only operation—no data is copied. Requires a rearrange session.

    Parameters
    ----------
    from_path : str
        The current path of the node (e.g., "/data/raw").
    to_path : str
        The new path for the node (e.g., "/data/v1").

    Examples
    --------
    >>> session = repo.rearrange_session("main")
    >>> session.move("/data/raw", "/data/v1")
    >>> session.commit("Renamed raw to v1")
    """
    return self._session.move_node(from_path, to_path)

move_async async #

move_async(from_path, to_path)

Async version of move.

Source code in icechunk-python/python/icechunk/session.py
async def move_async(self, from_path: str, to_path: str) -> None:
    """Async version of :meth:`move`."""
    return await self._session.move_node_async(from_path, to_path)

rebase #

rebase(solver)

Rebase the session to the latest ancestry of the branch.

This method will iteratively crawl the ancestry of the branch and apply the changes from the branch to the session. If a conflict is detected, the conflict solver will be used to optionally resolve the conflict. When complete, the session will be based on the latest commit of the branch and the session will be ready to attempt another commit.

When a conflict is detected and a resolution is not possible with the provided solver, a RebaseFailedError exception will be raised. This exception will contain the snapshot ID that the rebase failed on and a list of conflicts that occurred.

Parameters:

Name Type Description Default
solver ConflictSolver

The conflict solver to use when a conflict is detected.

required

Raises:

Type Description
RebaseFailedError

When a conflict is detected and the solver fails to resolve it.

Source code in icechunk-python/python/icechunk/session.py
def rebase(self, solver: ConflictSolver) -> None:
    """
    Rebase the session to the latest ancestry of the branch.

    This method will iteratively crawl the ancestry of the branch and apply the changes from the branch to the session. If a conflict is detected, the conflict solver will be used to optionally resolve the conflict. When complete, the session will be based on the latest commit of the branch and the session will be ready to attempt another commit.

    When a conflict is detected and a resolution is not possible with the provided solver, a RebaseFailedError exception will be raised. This exception will contain the snapshot ID that the rebase failed on and a list of conflicts that occurred.

    Parameters
    ----------
    solver : ConflictSolver
        The conflict solver to use when a conflict is detected.

    Raises
    ------
    RebaseFailedError
        When a conflict is detected and the solver fails to resolve it.
    """
    self._session.rebase(solver)
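
This is the manual form of what commit(rebase_with=...) automates; a sketch of the retry pattern, continuing the earlier example:

session = repo.writable_session("main")
arr = zarr.open_array(store=session.store, path="temps")
arr[0, :] = 7.0

try:
    session.commit("update row 0")
except icechunk.ConflictError:
    # Another session committed first: replay our changes on top of it.
    session.rebase(icechunk.BasicConflictSolver())
    session.commit("update row 0")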

rebase_async async #

rebase_async(solver)

Rebase the session to the latest ancestry of the branch (async version).

This method will iteratively crawl the ancestry of the branch and apply the changes from the branch to the session. If a conflict is detected, the conflict solver will be used to optionally resolve the conflict. When complete, the session will be based on the latest commit of the branch and the session will be ready to attempt another commit.

When a conflict is detected and a resolution is not possible with the provided solver, a RebaseFailedError exception will be raised. This exception will contain the snapshot ID that the rebase failed on and a list of conflicts that occurred.

Parameters:

Name Type Description Default
solver ConflictSolver

The conflict solver to use when a conflict is detected.

required

Raises:

Type Description
RebaseFailedError

When a conflict is detected and the solver fails to resolve it.

Source code in icechunk-python/python/icechunk/session.py
async def rebase_async(self, solver: ConflictSolver) -> None:
    """
    Rebase the session to the latest ancestry of the branch (async version).

    This method will iteratively crawl the ancestry of the branch and apply the changes from the branch to the session. If a conflict is detected, the conflict solver will be used to optionally resolve the conflict. When complete, the session will be based on the latest commit of the branch and the session will be ready to attempt another commit.

    When a conflict is detected and a resolution is not possible with the provided solver, a RebaseFailedError exception will be raised. This exception will contain the snapshot ID that the rebase failed on and a list of conflicts that occurred.

    Parameters
    ----------
    solver : ConflictSolver
        The conflict solver to use when a conflict is detected.

    Raises
    ------
    RebaseFailedError
        When a conflict is detected and the solver fails to resolve it.
    """
    await self._session.rebase_async(solver)

reindex_array #

reindex_array(array_path, shift_chunk)

Reindex chunks in an array by applying a transformation function.

Parameters:

Name Type Description Default
array_path str

Path to the array.

required
shift_chunk Callable

Function that receives chunk coordinates and returns new coordinates, or None to discard the chunk.

required
Source code in icechunk-python/python/icechunk/session.py
def reindex_array(
    self,
    array_path: str,
    shift_chunk: Callable[[Iterable[int]], Iterable[int] | None],
) -> None:
    """Reindex chunks in an array by applying a transformation function.

    Parameters
    ----------
    array_path : str
        Path to the array.
    shift_chunk : Callable
        Function that receives chunk coordinates and returns new coordinates,
        or None to discard the chunk.
    """
    return self._session.reindex_array(array_path, shift_chunk)
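
A sketch: the callable sees each chunk's coordinates and returns the new coordinates, or None to drop the chunk. It is shown here on a plain writable session; depending on the Icechunk version, these rearrangement operations may instead require repo.rearrange_session.

def drop_first_chunk_column(coord):
    i, j = coord
    if j == 0:
        return None        # discard chunks in the first chunk column
    return (i, j - 1)      # slide everything else one chunk left

session = repo.writable_session("main")  # or a rearrange session; see above
session.reindex_array("/temps", drop_first_chunk_column)
session.commit("drop first chunk column of temps")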

roll_array #

roll_array(array_path, chunk_offset)

Roll (circular shift) all chunks in an array by the given chunk offset.

Chunks that shift out of one end wrap around to the other side. No data is lost — this is a circular buffer operation.

Parameters:

Name Type Description Default
array_path str

The path to the array to roll.

required
chunk_offset Iterable[int]

Offset added to each chunk coordinate (with wraparound). A chunk at index x moves to (x + chunk_offset) % num_chunks.

required

Returns:

Type Description
tuple[int, ...]

The index shift in element space (chunk_offset * chunk_size for each dimension).

Source code in icechunk-python/python/icechunk/session.py
def roll_array(
    self,
    array_path: str,
    chunk_offset: Iterable[int],
) -> tuple[int, ...]:
    """Roll (circular shift) all chunks in an array by the given chunk offset.

    Chunks that shift out of one end wrap around to the other side.
    No data is lost — this is a circular buffer operation.

    Parameters
    ----------
    array_path : str
        The path to the array to roll.
    chunk_offset : Iterable[int]
        Offset added to each chunk coordinate (with wraparound). A chunk at
        index ``x`` moves to ``(x + chunk_offset) % num_chunks``.

    Returns
    -------
    tuple[int, ...]
        The index shift in element space (chunk_offset * chunk_size for each dimension).
    """
    return tuple(self._session.roll_array(array_path, list(chunk_offset)))
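
A minimal sketch, assuming a 2-D array at a hypothetical path with a chunk size of 10 along axis 0:

```python
# Roll all chunks forward by one along axis 0; the last chunk wraps to the front.
shift = session.roll_array("data/temperature", (1, 0))
# With chunk size 10 on axis 0, `shift` is (10, 0): elements [0:10] now hold
# the wrapped-around chunk and would typically be overwritten with new data.
```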

shift_array #

shift_array(array_path, chunk_offset)

Shift all chunks in an array by the given chunk offset.

Chunks that shift out of bounds are discarded. Vacated positions retain stale chunk references — the caller typically writes new data there.

Parameters:

Name Type Description Default
array_path str

The path to the array to shift.

required
chunk_offset Iterable[int]

Offset added to each chunk coordinate. A chunk at index x moves to x + chunk_offset. For a 3D array, chunk_offset=(1, 0, -2) moves the chunk at (i, j, k) to (i+1, j, k-2).

required

Returns:

Type Description
tuple[int, ...]

The shift in element space (chunk_offset * chunk_size per dimension). For example, with chunk_size=10 and chunk_offset=(2,), returns (20,) — useful for slicing the region that needs new data.

Notes

To shift right while preserving all data, first resize the array using zarr's array.resize(), then use shift_array.

Source code in icechunk-python/python/icechunk/session.py
def shift_array(
    self,
    array_path: str,
    chunk_offset: Iterable[int],
) -> tuple[int, ...]:
    """Shift all chunks in an array by the given chunk offset.

    Chunks that shift out of bounds are discarded. Vacated positions retain
    stale chunk references — the caller typically writes new data there.

    Parameters
    ----------
    array_path : str
        The path to the array to shift.
    chunk_offset : Iterable[int]
        Offset added to each chunk coordinate. A chunk at index ``x`` moves
        to ``x + chunk_offset``. For a 3D array, ``chunk_offset=(1, 0, -2)``
        moves the chunk at ``(i, j, k)`` to ``(i+1, j, k-2)``.

    Returns
    -------
    tuple[int, ...]
        The shift in element space (``chunk_offset * chunk_size`` per dimension).
        For example, with ``chunk_size=10`` and ``chunk_offset=(2,)``, returns
        ``(20,)`` — useful for slicing the region that needs new data.

    Notes
    -----
    To shift right while preserving all data, first resize the array using zarr's
    array.resize(), then use shift_array.
    """
    return tuple(self._session.shift_array(array_path, list(chunk_offset)))
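
Putting the note above together, a sketch that grows a 1-D array by one chunk and shifts everything right, freeing the first chunk for new data (the path and the chunk size of 10 are assumptions):

```python
import zarr

array = zarr.open_array(session.store, path="data/temperature")
array.resize((array.shape[0] + 10,))  # make room for one more chunk
shift = session.shift_array("data/temperature", (1,))
# shift == (10,): write fresh data into array[0:10]
```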

status #

status()

Compute an overview of the current session changes

Returns:

Type Description
Diff

The operations executed in the current session but still not committed.

Source code in icechunk-python/python/icechunk/session.py
def status(self) -> Diff:
    """
    Compute an overview of the current session changes

    Returns
    -------
    Diff
        The operations executed in the current session but still not committed.
    """
    return self._session.status()
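
For example, to inspect pending changes before committing:

```python
diff = session.status()
print(diff)  # summary of groups, arrays, and chunks changed but not yet committed
```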

SessionMode #

Bases: Enum

Enum for session access modes

Attributes:

Name Type Description
READONLY int

Session can only read data

WRITABLE int

Session can read and write data

REARRANGE int

Session can only move nodes and reindex arrays

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class SessionMode(Enum):
    """Enum for session access modes

    Attributes
    ----------
    READONLY: int
        Session can only read data
    WRITABLE: int
        Session can read and write data
    REARRANGE: int
        Session can only move nodes and reindex arrays
    """

    READONLY = 0
    WRITABLE = 1
    REARRANGE = 2

SnapshotInfo #

Metadata for a snapshot

Attributes:

Name Type Description
id str

The snapshot ID

manifests list[ManifestFileInfo]

The manifests linked to this snapshot

message str

The commit message of the snapshot

metadata dict[str, Any]

The metadata of the snapshot

parent_id str | None

The parent snapshot ID

written_at datetime

The timestamp when the snapshot was written

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class SnapshotInfo:
    """Metadata for a snapshot"""
    @property
    def id(self) -> str:
        """The snapshot ID"""
        ...
    @property
    def parent_id(self) -> str | None:
        """The snapshot ID"""
        ...
    @property
    def written_at(self) -> datetime.datetime:
        """
        The timestamp when the snapshot was written
        """
        ...
    @property
    def message(self) -> str:
        """
        The commit message of the snapshot
        """
        ...
    @property
    def metadata(self) -> dict[str, Any]:
        """
        The metadata of the snapshot
        """
        ...
    @property
    def manifests(self) -> list[ManifestFileInfo]:
        """
        The manifests linked to this snapshot
        """
        ...

id property #

id

The snapshot ID

manifests property #

manifests

The manifests linked to this snapshot

message property #

message

The commit message of the snapshot

metadata property #

metadata

The metadata of the snapshot

parent_id property #

parent_id

The parent snapshot ID

written_at property #

written_at

The timestamp when the snapshot was written
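
SnapshotInfo objects are what the repository's history iterators yield. A sketch of walking a branch's history, assuming `Repository.ancestry` and an already-built `storage`:

```python
import icechunk as ic

repo = ic.Repository.open(storage)
for snap in repo.ancestry(branch="main"):
    print(snap.id, snap.written_at.isoformat(), snap.message)
```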

Storage #

Storage configuration for an IcechunkStore

Currently supports memory, filesystem, S3, Azure Blob, and Google Cloud Storage backends. Use the following methods to create a Storage object with the desired backend.

Ex:

storage = icechunk.in_memory_storage()
storage = icechunk.local_filesystem_storage("/path/to/root")
storage = icechunk.s3_storage("bucket", "prefix", ...)
storage = icechunk.gcs_storage("bucket", "prefix", ...)
storage = icechunk.azure_storage("container", "prefix", ...)

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class Storage:
    """Storage configuration for an IcechunkStore

    Currently supports memory, filesystem, S3, Azure Blob, and Google Cloud Storage backends.
    Use the following methods to create a Storage object with the desired backend.

    Ex:
    ```
    storage = icechunk.in_memory_storage()
    storage = icechunk.local_filesystem_storage("/path/to/root")
    storage = icechunk.s3_storage("bucket", "prefix", ...)
    storage = icechunk.gcs_storage("bucket", "prefix", ...)
    storage = icechunk.azure_storage("container", "prefix", ...)
    ```
    """

    @classmethod
    def new_s3(
        cls,
        config: S3Options,
        bucket: str,
        prefix: str | None,
        credentials: AnyS3Credential | None = None,
    ) -> Storage: ...
    @classmethod
    def new_s3_object_store(
        cls,
        config: S3Options,
        bucket: str,
        prefix: str | None,
        credentials: AnyS3Credential | None = None,
    ) -> Storage: ...
    @classmethod
    def new_tigris(
        cls,
        config: S3Options,
        bucket: str,
        prefix: str | None,
        use_weak_consistency: bool,
        credentials: AnyS3Credential | None = None,
    ) -> Storage: ...
    @classmethod
    def new_in_memory(cls) -> Storage: ...
    @classmethod
    def new_local_filesystem(cls, path: str) -> Storage: ...
    @classmethod
    def new_gcs(
        cls,
        bucket: str,
        prefix: str | None,
        credentials: AnyGcsCredential | None = None,
        *,
        config: dict[str, str] | None = None,
    ) -> Storage: ...
    @classmethod
    def new_r2(
        cls,
        bucket: str | None,
        prefix: str | None,
        account_id: str | None,
        credentials: AnyS3Credential | None = None,
        *,
        config: S3Options,
    ) -> Storage: ...
    @classmethod
    def new_azure_blob(
        cls,
        account: str,
        container: str,
        prefix: str,
        credentials: AnyAzureCredential | None = None,
        *,
        config: dict[str, str] | None = None,
    ) -> Storage: ...
    @classmethod
    def new_http(
        cls,
        base_url: str,
        config: dict[str, str] | None = None,
    ) -> Storage: ...
    @classmethod
    def new_redirect(
        cls,
        base_url: str,
    ) -> Storage: ...
    def __repr__(self) -> str: ...
    def default_settings(self) -> StorageSettings: ...
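
In practice a Storage is built with one of the module-level helpers and handed to the repository constructors. A minimal sketch with a hypothetical path:

```python
import icechunk

storage = icechunk.local_filesystem_storage("/tmp/my-repo")
repo = icechunk.Repository.open_or_create(storage)
```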

StorageConcurrencySettings #

Configuration for how Icechunk uses its Storage instance

Methods:

Name Description
__init__

Create a new StorageConcurrencySettings object

Attributes:

Name Type Description
ideal_concurrent_request_size int | None

The ideal concurrent request size.

max_concurrent_requests_for_object int | None

The maximum number of concurrent requests for an object.

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class StorageConcurrencySettings:
    """Configuration for how Icechunk uses its Storage instance"""

    def __init__(
        self,
        max_concurrent_requests_for_object: int | None = None,
        ideal_concurrent_request_size: int | None = None,
    ) -> None:
        """
        Create a new `StorageConcurrencySettings` object

        Parameters
        ----------
        max_concurrent_requests_for_object: int | None
            The maximum number of concurrent requests for an object.
        ideal_concurrent_request_size: int | None
            The ideal concurrent request size.
        """
        ...
    @property
    def max_concurrent_requests_for_object(self) -> int | None:
        """
        The maximum number of concurrent requests for an object.

        Returns
        -------
        int | None
            The maximum number of concurrent requests for an object.
        """
        ...
    @max_concurrent_requests_for_object.setter
    def max_concurrent_requests_for_object(self, value: int | None) -> None:
        """
        Set the maximum number of concurrent requests for an object.

        Parameters
        ----------
        value: int | None
            The maximum number of concurrent requests for an object.
        """
        ...
    @property
    def ideal_concurrent_request_size(self) -> int | None:
        """
        The ideal concurrent request size.

        Returns
        -------
        int | None
            The ideal concurrent request size.
        """
        ...
    @ideal_concurrent_request_size.setter
    def ideal_concurrent_request_size(self, value: int | None) -> None:
        """
        Set the ideal concurrent request size.

        Parameters
        ----------
        value: int | None
            The ideal concurrent request size.
        """
        ...

ideal_concurrent_request_size property writable #

ideal_concurrent_request_size

The ideal concurrent request size.

Returns:

Type Description
int | None

The ideal concurrent request size.

max_concurrent_requests_for_object property writable #

max_concurrent_requests_for_object

The maximum number of concurrent requests for an object.

Returns:

Type Description
int | None

The maximum number of concurrent requests for an object.

__init__ #

__init__(max_concurrent_requests_for_object=None, ideal_concurrent_request_size=None)

Create a new StorageConcurrencySettings object

Parameters:

Name Type Description Default
max_concurrent_requests_for_object int | None

The maximum number of concurrent requests for an object.

None
ideal_concurrent_request_size int | None

The ideal concurrent request size.

None
Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
def __init__(
    self,
    max_concurrent_requests_for_object: int | None = None,
    ideal_concurrent_request_size: int | None = None,
) -> None:
    """
    Create a new `StorageConcurrencySettings` object

    Parameters
    ----------
    max_concurrent_requests_for_object: int | None
        The maximum number of concurrent requests for an object.
    ideal_concurrent_request_size: int | None
        The ideal concurrent request size.
    """
    ...
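
A sketch with arbitrary illustrative values:

```python
import icechunk as ic

# Cap parallel range requests per object and target ~16 MiB request sizes.
concurrency = ic.StorageConcurrencySettings(
    max_concurrent_requests_for_object=12,
    ideal_concurrent_request_size=16 * 1024 * 1024,
)
```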

StorageRetriesSettings #

Configuration for how Icechunk retries requests.

Icechunk retries failed requests with an exponential backoff algorithm.

Methods:

Name Description
__init__

Create a new StorageRetriesSettings object

Attributes:

Name Type Description
initial_backoff_ms int | None

The initial backoff duration in milliseconds.

max_backoff_ms int | None

The maximum backoff duration in milliseconds.

max_tries int | None

The maximum number of tries, including the initial one.

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class StorageRetriesSettings:
    """Configuration for how Icechunk retries requests.

    Icechunk retries failed requests with an exponential backoff algorithm."""

    def __init__(
        self,
        max_tries: int | None = None,
        initial_backoff_ms: int | None = None,
        max_backoff_ms: int | None = None,
    ) -> None:
        """
        Create a new `StorageRetriesSettings` object

        Parameters
        ----------
        max_tries: int | None
            The maximum number of tries, including the initial one. Set to 1 to disable retries
        initial_backoff_ms: int | None
            The initial backoff duration in milliseconds
        max_backoff_ms: int | None
            The limit to backoff duration in milliseconds
        """
        ...
    @property
    def max_tries(self) -> int | None:
        """
        The maximum number of tries, including the initial one.

        Returns
        -------
        int | None
            The maximum number of tries.
        """
        ...
    @max_tries.setter
    def max_tries(self, value: int | None) -> None:
        """
        Set the maximum number of tries. Set to 1 to disable retries.

        Parameters
        ----------
        value: int | None
            The maximum number of tries
        """
        ...
    @property
    def initial_backoff_ms(self) -> int | None:
        """
        The initial backoff duration in milliseconds.

        Returns
        -------
        int | None
            The initial backoff duration in milliseconds.
        """
        ...
    @initial_backoff_ms.setter
    def initial_backoff_ms(self, value: int | None) -> None:
        """
        Set the initial backoff duration in milliseconds.

        Parameters
        ----------
        value: int | None
            The initial backoff duration in milliseconds.
        """
        ...
    @property
    def max_backoff_ms(self) -> int | None:
        """
        The maximum backoff duration in milliseconds.

        Returns
        -------
        int | None
            The maximum backoff duration in milliseconds.
        """
        ...
    @max_backoff_ms.setter
    def max_backoff_ms(self, value: int | None) -> None:
        """
        Set the maximum backoff duration in milliseconds.

        Parameters
        ----------
        value: int | None
            The maximum backoff duration in milliseconds.
        """
        ...

initial_backoff_ms property writable #

initial_backoff_ms

The initial backoff duration in milliseconds.

Returns:

Type Description
int | None

The initial backoff duration in milliseconds.

max_backoff_ms property writable #

max_backoff_ms

The maximum backoff duration in milliseconds.

Returns:

Type Description
int | None

The maximum backoff duration in milliseconds.

max_tries property writable #

max_tries

The maximum number of tries, including the initial one.

Returns:

Type Description
int | None

The maximum number of tries.

__init__ #

__init__(max_tries=None, initial_backoff_ms=None, max_backoff_ms=None)

Create a new StorageRetriesSettings object

Parameters:

Name Type Description Default
max_tries int | None

The maximum number of tries, including the initial one. Set to 1 to disable retries

None
initial_backoff_ms int | None

The initial backoff duration in milliseconds

None
max_backoff_ms int | None

The limit to backoff duration in milliseconds

None
Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
def __init__(
    self,
    max_tries: int | None = None,
    initial_backoff_ms: int | None = None,
    max_backoff_ms: int | None = None,
) -> None:
    """
    Create a new `StorageRetriesSettings` object

    Parameters
    ----------
    max_tries: int | None
        The maximum number of tries, including the initial one. Set to 1 to disable retries
    initial_backoff_ms: int | None
        The initial backoff duration in milliseconds
    max_backoff_ms: int | None
        The limit to backoff duration in milliseconds
    """
    ...
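
A sketch with arbitrary illustrative values:

```python
import icechunk as ic

# Retry each failed request up to 5 times, backing off from 200 ms to 10 s.
retries = ic.StorageRetriesSettings(
    max_tries=5,
    initial_backoff_ms=200,
    max_backoff_ms=10_000,
)
```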

StorageSettings #

Configuration for how Icechunk uses its Storage instance

Methods:

Name Description
__init__

Create a new StorageSettings object

Attributes:

Name Type Description
chunks_storage_class str | None

Chunk objects in object store will use this storage class or self.storage_class if None

concurrency StorageConcurrencySettings | None

The configuration for how much concurrency the Icechunk store uses

metadata_storage_class str | None

Metadata objects in object store will use this storage class or self.storage_class if None

minimum_size_for_multipart_upload int | None

Use object store's multipart upload for objects larger than this size in bytes

retries StorageRetriesSettings | None

The configuration for how Icechunk retries failed requests.

storage_class str | None

All objects in object store will use this storage class or the default if None

unsafe_use_conditional_create bool | None

True if Icechunk will use conditional PUT operations for creation in the object store

unsafe_use_conditional_update bool | None

True if Icechunk will use conditional PUT operations for updates in the object store

unsafe_use_metadata bool | None

True if Icechunk will write object metadata in the object store

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class StorageSettings:
    """Configuration for how Icechunk uses its Storage instance"""

    def __init__(
        self,
        concurrency: StorageConcurrencySettings | None = None,
        retries: StorageRetriesSettings | None = None,
        unsafe_use_conditional_create: bool | None = None,
        unsafe_use_conditional_update: bool | None = None,
        unsafe_use_metadata: bool | None = None,
        storage_class: str | None = None,
        metadata_storage_class: str | None = None,
        chunks_storage_class: str | None = None,
        minimum_size_for_multipart_upload: int | None = None,
    ) -> None:
        """
        Create a new `StorageSettings` object

        Parameters
        ----------
        concurrency: StorageConcurrencySettings | None
            The configuration for how Icechunk uses its Storage instance.

        retries: StorageRetriesSettings | None
            The configuration for how Icechunk retries failed requests.

        unsafe_use_conditional_update: bool | None
            If set to False, Icechunk loses some of its consistency guarantees.
            This is only useful in object stores that don't support the feature.
            Use it at your own risk.

        unsafe_use_conditional_create: bool | None
            If set to False, Icechunk loses some of its consistency guarantees.
            This is only useful in object stores that don't support the feature.
            Use at your own risk.

        unsafe_use_metadata: bool | None
            Don't write metadata fields in Icechunk files.
            This is only useful in object stores that don't support the feature.
            Use at your own risk.

        storage_class: str | None
            Store all objects using this object store storage class.
            If None, the object store default will be used.
            Currently not supported in GCS.
            Example: STANDARD_IA

        metadata_storage_class: str | None
            Store metadata objects using this object store storage class.
            Currently not supported in GCS.
            Defaults to storage_class.

        chunks_storage_class: str | None
            Store chunk objects using this object store storage class.
            Currently not supported in GCS.
            Defaults to storage_class.

        minimum_size_for_multipart_upload: int | None
            Use object store's multipart upload for objects larger than this size in bytes.
            Default: 100 MB if None is passed.
        """
        ...
    @property
    def concurrency(self) -> StorageConcurrencySettings | None:
        """
        The configuration for how much concurrency the Icechunk store uses

        Returns
        -------
        StorageConcurrencySettings | None
            The configuration for how Icechunk uses its Storage instance.
        """

    @concurrency.setter
    def concurrency(self, value: StorageConcurrencySettings | None) -> None: ...
    @property
    def retries(self) -> StorageRetriesSettings | None:
        """
        The configuration for how Icechunk retries failed requests.

        Returns
        -------
        StorageRetriesSettings | None
            The configuration for how Icechunk retries failed requests.
        """

    @retries.setter
    def retries(self, value: StorageRetriesSettings | None) -> None: ...
    @property
    def unsafe_use_conditional_update(self) -> bool | None:
        """True if Icechunk will use conditional PUT operations for updates in the object store"""
        ...

    @unsafe_use_conditional_update.setter
    def unsafe_use_conditional_update(self, value: bool) -> None: ...
    @property
    def unsafe_use_conditional_create(self) -> bool | None:
        """True if Icechunk will use conditional PUT operations for creation in the object store"""
        ...

    @unsafe_use_conditional_create.setter
    def unsafe_use_conditional_create(self, value: bool) -> None: ...
    @property
    def unsafe_use_metadata(self) -> bool | None:
        """True if Icechunk will write object metadata in the object store"""
        ...

    @unsafe_use_metadata.setter
    def unsafe_use_metadata(self, value: bool) -> None: ...
    @property
    def storage_class(self) -> str | None:
        """All objects in object store will use this storage class or the default if None"""
        ...

    @storage_class.setter
    def storage_class(self, value: str) -> None: ...
    @property
    def metadata_storage_class(self) -> str | None:
        """Metadata objects in object store will use this storage class or self.storage_class if None"""
        ...

    @metadata_storage_class.setter
    def metadata_storage_class(self, value: str) -> None: ...
    @property
    def chunks_storage_class(self) -> str | None:
        """Chunk objects in object store will use this storage class or self.storage_class if None"""
        ...

    @chunks_storage_class.setter
    def chunks_storage_class(self, value: str) -> None: ...
    @property
    def minimum_size_for_multipart_upload(self) -> int | None:
        """Use object store's multipart upload for objects larger than this size in bytes"""
        ...

    @minimum_size_for_multipart_upload.setter
    def minimum_size_for_multipart_upload(self, value: int) -> None: ...

chunks_storage_class property writable #

chunks_storage_class

Chunk objects in object store will use this storage class or self.storage_class if None

concurrency property writable #

concurrency

The configuration for how much concurrency the Icechunk store uses

Returns:

Type Description
StorageConcurrencySettings | None

The configuration for how Icechunk uses its Storage instance.

metadata_storage_class property writable #

metadata_storage_class

Metadata objects in object store will use this storage class or self.storage_class if None

minimum_size_for_multipart_upload property writable #

minimum_size_for_multipart_upload

Use object store's multipart upload for objects larger than this size in bytes

retries property writable #

retries

The configuration for how Icechunk retries failed requests.

Returns:

Type Description
StorageRetriesSettings | None

The configuration for how Icechunk retries failed requests.

storage_class property writable #

storage_class

All objects in object store will use this storage class or the default if None

unsafe_use_conditional_create property writable #

unsafe_use_conditional_create

True if Icechunk will use conditional PUT operations for creation in the object store

unsafe_use_conditional_update property writable #

unsafe_use_conditional_update

True if Icechunk will use conditional PUT operations for updates in the object store

unsafe_use_metadata property writable #

unsafe_use_metadata

True if Icechunk will write object metadata in the object store

__init__ #

__init__(concurrency=None, retries=None, unsafe_use_conditional_create=None, unsafe_use_conditional_update=None, unsafe_use_metadata=None, storage_class=None, metadata_storage_class=None, chunks_storage_class=None, minimum_size_for_multipart_upload=None)

Create a new StorageSettings object

Parameters:

Name Type Description Default
concurrency StorageConcurrencySettings | None

The configuration for how Icechunk uses its Storage instance.

None
retries StorageRetriesSettings | None

The configuration for how Icechunk retries failed requests.

None
unsafe_use_conditional_update bool | None

If set to False, Icechunk loses some of its consistency guarantees. This is only useful in object stores that don't support the feature. Use it at your own risk.

None
unsafe_use_conditional_create bool | None

If set to False, Icechunk loses some of its consistency guarantees. This is only useful in object stores that don't support the feature. Use at your own risk.

None
unsafe_use_metadata bool | None

Don't write metadata fields in Icechunk files. This is only useful in object stores that don't support the feature. Use at your own risk.

None
storage_class str | None

Store all objects using this object store storage class. If None, the object store default will be used. Currently not supported in GCS. Example: STANDARD_IA

None
metadata_storage_class str | None

Store metadata objects using this object store storage class. Currently not supported in GCS. Defaults to storage_class.

None
chunks_storage_class str | None

Store chunk objects using this object store storage class. Currently not supported in GCS. Defaults to storage_class.

None
minimum_size_for_multipart_upload int | None

Use object store's multipart upload for objects larger than this size in bytes. Default: 100 MB if None is passed.

None
Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
def __init__(
    self,
    concurrency: StorageConcurrencySettings | None = None,
    retries: StorageRetriesSettings | None = None,
    unsafe_use_conditional_create: bool | None = None,
    unsafe_use_conditional_update: bool | None = None,
    unsafe_use_metadata: bool | None = None,
    storage_class: str | None = None,
    metadata_storage_class: str | None = None,
    chunks_storage_class: str | None = None,
    minimum_size_for_multipart_upload: int | None = None,
) -> None:
    """
    Create a new `StorageSettings` object

    Parameters
    ----------
    concurrency: StorageConcurrencySettings | None
        The configuration for how Icechunk uses its Storage instance.

    retries: StorageRetriesSettings | None
        The configuration for how Icechunk retries failed requests.

    unsafe_use_conditional_update: bool | None
        If set to False, Icechunk loses some of its consistency guarantees.
        This is only useful in object stores that don't support the feature.
        Use it at your own risk.

    unsafe_use_conditional_create: bool | None
        If set to False, Icechunk loses some of its consistency guarantees.
        This is only useful in object stores that don't support the feature.
        Use at your own risk.

    unsafe_use_metadata: bool | None
        Don't write metadata fields in Icechunk files.
        This is only useful in object stores that don't support the feature.
        Use at your own risk.

    storage_class: str | None
        Store all objects using this object store storage class.
        If None, the object store default will be used.
        Currently not supported in GCS.
        Example: STANDARD_IA

    metadata_storage_class: str | None
        Store metadata objects using this object store storage class.
        Currently not supported in GCS.
        Defaults to storage_class.

    chunks_storage_class: str | None
        Store chunk objects using this object store storage class.
        Currently not supported in GCS.
        Defaults to storage_class.

    minimum_size_for_multipart_upload: int | None
        Use object store's multipart upload for objects larger than this size in bytes.
        Default: 100 MB if None is passed.
    """
    ...
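
These settings are typically attached to a RepositoryConfig before opening the repo. A sketch combining the examples above; the `config.storage` slot is assumed to be the StorageSettings attribute of RepositoryConfig, and STANDARD_IA is the S3 example from the docstring:

```python
import icechunk as ic

config = ic.RepositoryConfig.default()
config.storage = ic.StorageSettings(
    concurrency=ic.StorageConcurrencySettings(max_concurrent_requests_for_object=12),
    retries=ic.StorageRetriesSettings(max_tries=5),
    storage_class="STANDARD_IA",
)
repo = ic.Repository.open(storage, config=config)  # `storage` from a helper below
```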

VersionSelection #

Bases: Enum

Enum for selecting which version of a conflict to use

Attributes:

Name Type Description
Fail int

Fail the rebase operation

UseOurs int

Use the version from the source store

UseTheirs int

Use the version from the target store

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class VersionSelection(Enum):
    """Enum for selecting the which version of a conflict

    Attributes
    ----------
    Fail: int
        Fail the rebase operation
    UseOurs: int
        Use the version from the source store
    UseTheirs: int
        Use the version from the target store
    """

    Fail = 0
    UseOurs = 1
    UseTheirs = 2
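
VersionSelection values are consumed by BasicConflictSolver. For example, to prefer this session's chunks during a rebase (assuming a writable `session` with pending changes):

```python
import icechunk as ic

solver = ic.BasicConflictSolver(on_chunk_conflict=ic.VersionSelection.UseOurs)
session.rebase(solver)
```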

VirtualChunkContainer #

A virtual chunk container is a configuration that allows Icechunk to read virtual references from a storage backend.

Attributes:

Name Type Description
url_prefix str

The prefix of URLs that will use this container's configuration for reading virtual references.

store ObjectStoreConfig

The storage backend to use for the virtual chunk container.

Methods:

Name Description
__init__

Create a new VirtualChunkContainer object

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class VirtualChunkContainer:
    """A virtual chunk container is a configuration that allows Icechunk to read virtual references from a storage backend.

    Attributes
    ----------
    url_prefix: str
        The prefix of URLs that will use this container's configuration for reading virtual references.
    store: ObjectStoreConfig
        The storage backend to use for the virtual chunk container.
    """

    name: str
    url_prefix: str
    store: ObjectStoreConfig

    def __init__(self, url_prefix: str, store: AnyObjectStoreConfig):
        """
        Create a new `VirtualChunkContainer` object

        Parameters
        ----------
        url_prefix: str
            The prefix of URLs that will use this container's configuration for reading virtual references.
        store: ObjectStoreConfig
            The storage backend to use for the virtual chunk container.
        """

__init__ #

__init__(url_prefix, store)

Create a new VirtualChunkContainer object

Parameters:

Name Type Description Default
url_prefix str

The prefix of URLs that will use this container's configuration for reading virtual references.

required
store AnyObjectStoreConfig

The storage backend to use for the virtual chunk container.

required
Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
def __init__(self, url_prefix: str, store: AnyObjectStoreConfig):
    """
    Create a new `VirtualChunkContainer` object

    Parameters
    ----------
    url_prefix: str
        The prefix of URLs that will use this container's configuration for reading virtual references.
    store: ObjectStoreConfig
        The storage backend to use for the virtual chunk container.
    """

VirtualChunkSpec #

The specification for a virtual chunk reference.

Attributes:

Name Type Description
etag_checksum str | None

Optional object store e-tag for the containing object.

index list[int]

The chunk index, in chunk coordinates space

last_updated_at_checksum datetime | None

Optional timestamp for the containing object.

length int

The length of the chunk in bytes

location str

The URL to the virtual chunk data, something like 's3://bucket/foo.nc'

offset int

The chunk offset within the referenced object, in bytes

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class VirtualChunkSpec:
    """The specification for a virtual chunk reference."""
    @property
    def index(self) -> list[int]:
        """The chunk index, in chunk coordinates space"""
        ...
    @property
    def location(self) -> str:
        """The URL to the virtual chunk data, something like 's3://bucket/foo.nc'"""
        ...
    @property
    def offset(self) -> int:
        """The chunk offset within the pointed object, in bytes"""
        ...
    @property
    def length(self) -> int:
        """The length of the chunk in bytes"""
        ...
    @property
    def etag_checksum(self) -> str | None:
        """Optional object store e-tag for the containing object.

        Icechunk will refuse to serve data from this chunk if the etag has changed.
        """
        ...
    @property
    def last_updated_at_checksum(self) -> datetime.datetime | None:
        """Optional timestamp for the containing object.

        Icechunk will refuse to serve data from this chunk if it has been modified in object store after this time.
        """
        ...

    def __init__(
        self,
        index: list[int],
        location: str,
        offset: int,
        length: int,
        etag_checksum: str | None = None,
        last_updated_at_checksum: datetime.datetime | None = None,
    ) -> None: ...

etag_checksum property #

etag_checksum

Optional object store e-tag for the containing object.

Icechunk will refuse to serve data from this chunk if the etag has changed.

index property #

index

The chunk index, in chunk coordinates space

last_updated_at_checksum property #

last_updated_at_checksum

Optional timestamp for the containing object.

Icechunk will refuse to serve data from this chunk if it has been modified in object store after this time.

length property #

length

The length of the chunk in bytes

location property #

location

The URL to the virtual chunk data, something like 's3://bucket/foo.nc'

offset property #

offset

The chunk offset within the referenced object, in bytes
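
A sketch of registering a virtual chunk reference. The array path, byte range, and the `set_virtual_refs` call on the session's store are assumptions for illustration:

```python
import datetime
import icechunk as ic

chunks = [
    ic.VirtualChunkSpec(
        index=[0, 0],
        location="s3://somebucket/data/foo.nc",
        offset=8192,
        length=4096,
        last_updated_at_checksum=datetime.datetime(
            2024, 1, 1, tzinfo=datetime.timezone.utc
        ),
    ),
]
session.store.set_virtual_refs("data/temperature", chunks)
```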

_upgrade_icechunk_repository #

_upgrade_icechunk_repository(repo, *, dry_run=True, delete_unused_v1_files=False)

Migrate a repository to the latest version of Icechunk.

This is an administrative operation, and must be executed in isolation from other readers and writers. Other processes running concurrently on the same repo may see undefined behavior.

At this time, this function supports only migration from Icechunk spec version 1 to Icechunk spec version 2. This means Icechunk versions 1.x to 2.x.

The operation is usually fast, but it can take several minutes if there is a very large version history (thousands of snapshots).

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
def _upgrade_icechunk_repository(
    repo: PyRepository, *, dry_run: bool = True, delete_unused_v1_files: bool = False
) -> None:
    """
    Migrate a repository to the latest version of Icechunk.

    This is an administrative operation, and must be executed in isolation from
    other readers and writers. Other processes running concurrently on the same
    repo may see undefined behavior.

    At this time, this function supports only migration from Icechunk spec version 1
    to Icechunk spec version 2. This means Icechunk versions 1.x to 2.x.

    The operation is usually fast, but it can take several minutes if there is a very
    large version history (thousands of snapshots).
    """
    ...
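
A sketch of a cautious migration: inspect the dry run first, then apply. Whether the public Repository object can be passed directly as the `repo` argument is an assumption here:

```python
import icechunk as ic

ic._upgrade_icechunk_repository(repo, dry_run=True)   # report only, change nothing
ic._upgrade_icechunk_repository(repo, dry_run=False)  # perform the migration
```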

azure_credentials #

azure_credentials(*, access_key=None, sas_token=None, bearer_token=None, from_env=None)

Create credentials for an Azure Blob Storage object store.

If all arguments are None, credentials are fetched from the operating system environment.

Source code in icechunk-python/python/icechunk/credentials.py
def azure_credentials(
    *,
    access_key: str | None = None,
    sas_token: str | None = None,
    bearer_token: str | None = None,
    from_env: bool | None = None,
) -> AnyAzureCredential:
    """Create credentials Azure Blob Storage object store.

    If all arguments are None, credentials are fetched from the operative system environment.
    """
    if (from_env is None or from_env) and (
        access_key is None and sas_token is None and bearer_token is None
    ):
        return azure_from_env_credentials()

    if (access_key is not None or sas_token is not None or bearer_token is not None) and (
        from_env is None or not from_env
    ):
        return AzureCredentials.Static(
            azure_static_credentials(
                access_key=access_key,
                sas_token=sas_token,
                bearer_token=bearer_token,
            )
        )

    raise ValueError("Conflicting arguments to azure_credentials function")

azure_from_env_credentials #

azure_from_env_credentials()

Instruct Azure Blob Storage object store to fetch credentials from the operating system environment.

Source code in icechunk-python/python/icechunk/credentials.py
def azure_from_env_credentials() -> AzureCredentials.FromEnv:
    """Instruct Azure Blob Storage object store to fetch credentials from the operative system environment."""
    return AzureCredentials.FromEnv()

azure_static_credentials #

azure_static_credentials(*, access_key=None, sas_token=None, bearer_token=None)

Create static credentials for an Azure Blob Storage object store.

Source code in icechunk-python/python/icechunk/credentials.py
def azure_static_credentials(
    *,
    access_key: str | None = None,
    sas_token: str | None = None,
    bearer_token: str | None = None,
) -> AnyAzureStaticCredential:
    """Create static credentials Azure Blob Storage object store."""
    if [access_key, sas_token, bearer_token].count(None) != 2:
        raise ValueError("Conflicting arguments to azure_static_credentials function")
    if access_key is not None:
        return AzureStaticCredentials.AccessKey(access_key)
    if sas_token is not None:
        return AzureStaticCredentials.SasToken(sas_token)
    if bearer_token is not None:
        return AzureStaticCredentials.BearerToken(bearer_token)
    raise ValueError(
        "No valid static credential provided for Azure Blob Storage object store"
    )

azure_storage #

azure_storage(*, account, container, prefix, access_key=None, sas_token=None, bearer_token=None, from_env=None, config=None)

Create a Storage instance that saves data in Azure Blob Storage object store.

Parameters:

Name Type Description Default
account str

The account to which the caller must have access privileges

required
container str

The container where the repository will store its data

required
prefix str

The prefix within the container that is the root directory of the repository

required
access_key str | None

Azure Blob Storage credential access key

None
sas_token str | None

Azure Blob Storage credential SAS token

None
bearer_token str | None

Azure Blob Storage credential bearer token

None
from_env bool | None

Fetch credentials from the operating system environment

None
config dict[str, str] | None

A dictionary of options for the Azure Blob Storage object store. See https://docs.rs/object_store/latest/object_store/azure/enum.AzureConfigKey.html#variants for a list of possible configuration keys.

None
Source code in icechunk-python/python/icechunk/storage.py
def azure_storage(
    *,
    account: str,
    container: str,
    prefix: str,
    access_key: str | None = None,
    sas_token: str | None = None,
    bearer_token: str | None = None,
    from_env: bool | None = None,
    config: dict[str, str] | None = None,
) -> Storage:
    """Create a Storage instance that saves data in Azure Blob Storage object store.

    Parameters
    ----------
    account: str
        The account to which the caller must have access privileges
    container: str
        The container where the repository will store its data
    prefix: str
        The prefix within the container that is the root directory of the repository
    access_key: str | None
        Azure Blob Storage credential access key
    sas_token: str | None
        Azure Blob Storage credential SAS token
    bearer_token: str | None
        Azure Blob Storage credential bearer token
    from_env: bool | None
        Fetch credentials from the operating system environment
    config: dict[str, str] | None
        A dictionary of options for the Azure Blob Storage object store. See https://docs.rs/object_store/latest/object_store/azure/enum.AzureConfigKey.html#variants for a list of possible configuration keys.
    """
    credentials = azure_credentials(
        access_key=access_key,
        sas_token=sas_token,
        bearer_token=bearer_token,
        from_env=from_env,
    )
    return Storage.new_azure_blob(
        account=account,
        container=container,
        prefix=prefix,
        credentials=credentials,
        config=config,
    )
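
A sketch using environment-provided credentials; the account, container, and prefix values are hypothetical:

```python
import icechunk as ic

storage = ic.azure_storage(
    account="myaccount",
    container="mycontainer",
    prefix="repos/demo",
    from_env=True,
)
```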

containers_credentials #

containers_credentials(m)

Build a map of credentials for virtual chunk containers.

Parameters:

Name Type Description Default
m Mapping[str, AnyS3Credential | AnyGcsCredential | AnyAzureCredential | None]

A mapping from container URL prefixes to credentials.

required

Examples:

import icechunk as ic

config = ic.RepositoryConfig.default()
config.inline_chunk_threshold_bytes = 512

virtual_store_config = ic.s3_store(
    region="us-east-1",
    endpoint_url="http://localhost:9000",
    allow_http=True,
    s3_compatible=True,
    force_path_style=True,
)
container = ic.VirtualChunkContainer("s3://somebucket", virtual_store_config)
config.set_virtual_chunk_container(container)
credentials = ic.containers_credentials(
    {"s3://somebucket": ic.s3_credentials(access_key_id="ACCESS_KEY", secret_access_key="SECRET"}
)

repo = ic.Repository.create(
    storage=ic.local_filesystem_storage(store_path),
    config=config,
    authorize_virtual_chunk_access=credentials,
)
Source code in icechunk-python/python/icechunk/credentials.py
def containers_credentials(
    m: Mapping[str, AnyS3Credential | AnyGcsCredential | AnyAzureCredential | None],
) -> dict[str, AnyCredential | None]:
    """Build a map of credentials for virtual chunk containers.

    Parameters
    ----------
    m: Mapping[str, AnyS3Credential | AnyGcsCredential | AnyAzureCredential | None]
        A mapping from container URL prefixes to credentials.

    Examples
    --------
    ```python
    import icechunk as ic

    config = ic.RepositoryConfig.default()
    config.inline_chunk_threshold_bytes = 512

    virtual_store_config = ic.s3_store(
        region="us-east-1",
        endpoint_url="http://localhost:9000",
        allow_http=True,
        s3_compatible=True,
        force_path_style=True,
    )
    container = ic.VirtualChunkContainer("s3://somebucket", virtual_store_config)
    config.set_virtual_chunk_container(container)
    credentials = ic.containers_credentials(
        {"s3://somebucket": ic.s3_credentials(access_key_id="ACCESS_KEY", secret_access_key="SECRET"}
    )

    repo = ic.Repository.create(
        storage=ic.local_filesystem_storage(store_path),
        config=config,
        authorize_virtual_chunk_access=credentials,
    )
    ```

    """
    res: dict[str, AnyCredential | None] = {}
    for name, cred in m.items():
        if cred is None:
            res[name] = None
        elif isinstance(cred, AnyS3Credential):
            res[name] = Credentials.S3(cred)
        elif (
            isinstance(cred, GcsCredentials.FromEnv)
            or isinstance(cred, GcsCredentials.Static)
            or isinstance(cred, GcsCredentials.Refreshable)
            or isinstance(cred, GcsCredentials.Anonymous)
        ):
            res[name] = Credentials.Gcs(cast(GcsCredentials, cred))
        elif isinstance(cred, AzureCredentials.FromEnv) or isinstance(
            cred, AzureCredentials.Static
        ):
            res[name] = Credentials.Azure(cast(AzureCredentials, cred))
        else:
            raise ValueError(f"Unknown credential type {type(cred)}")
    return res

gcs_credentials #

gcs_credentials(*, service_account_file=None, service_account_key=None, application_credentials=None, bearer_token=None, from_env=None, anonymous=None, get_credentials=None, scatter_initial_credentials=False)

Create credentials for a Google Cloud Storage object store.

If all arguments are None, credentials are fetched from the operating system environment.

Source code in icechunk-python/python/icechunk/credentials.py
def gcs_credentials(
    *,
    service_account_file: str | None = None,
    service_account_key: str | None = None,
    application_credentials: str | None = None,
    bearer_token: str | None = None,
    from_env: bool | None = None,
    anonymous: bool | None = None,
    get_credentials: Callable[[], GcsBearerCredential] | None = None,
    scatter_initial_credentials: bool = False,
) -> AnyGcsCredential:
    """Create credentials Google Cloud Storage object store.

    If all arguments are None, credentials are fetched from the operative system environment.
    """
    if anonymous is not None and anonymous:
        return gcs_anonymous_credentials()

    if (from_env is None or from_env) and (
        service_account_file is None
        and service_account_key is None
        and application_credentials is None
        and bearer_token is None
    ):
        return gcs_from_env_credentials()

    if (
        service_account_file is not None
        or service_account_key is not None
        or application_credentials is not None
        or bearer_token is not None
    ) and (from_env is None or not from_env):
        return GcsCredentials.Static(
            gcs_static_credentials(
                service_account_file=service_account_file,
                service_account_key=service_account_key,
                application_credentials=application_credentials,
                bearer_token=bearer_token,
            )
        )

    if get_credentials is not None:
        return gcs_refreshable_credentials(
            get_credentials, scatter_initial_credentials=scatter_initial_credentials
        )

    raise ValueError("Conflicting arguments to gcs_credentials function")

gcs_from_env_credentials #

gcs_from_env_credentials()

Instruct Google Cloud Storage object store to fetch credentials from the operating system environment.

Source code in icechunk-python/python/icechunk/credentials.py
def gcs_from_env_credentials() -> GcsCredentials.FromEnv:
    """Instruct Google Cloud Storage object store to fetch credentials from the operative system environment."""
    return GcsCredentials.FromEnv()

gcs_refreshable_credentials #

gcs_refreshable_credentials(get_credentials, scatter_initial_credentials=False)

Create refreshable credentials for Google Cloud Storage object store.

Parameters:

Name Type Description Default
get_credentials Callable[[], GcsBearerCredential]

Use this function to get and refresh the credentials. The function must be picklable.

required
scatter_initial_credentials bool

Immediately call and store the value returned by get_credentials. This is useful if the repo or session will be pickled to generate many copies. Passing scatter_initial_credentials=True will ensure all those copies don't need to call get_credentials immediately. After the initial set of credentials has expired, the cached value is no longer used. Notice that credentials obtained are stored, and they can be sent over the network if you pickle the session/repo.

False
Source code in icechunk-python/python/icechunk/credentials.py
def gcs_refreshable_credentials(
    get_credentials: Callable[[], GcsBearerCredential],
    scatter_initial_credentials: bool = False,
) -> GcsCredentials.Refreshable:
    """Create refreshable credentials for Google Cloud Storage object store.

    Parameters
    ----------
    get_credentials: Callable[[], GcsBearerCredential]
        Use this function to get and refresh the credentials. The function must be picklable.
    scatter_initial_credentials: bool, optional
        Immediately call and store the value returned by get_credentials. This is useful if the
        repo or session will be pickled to generate many copies. Passing scatter_initial_credentials=True will
        ensure all those copies don't need to call get_credentials immediately. After the initial
        set of credentials has expired, the cached value is no longer used. Notice that credentials
        obtained are stored, and they can be sent over the network if you pickle the session/repo.
    """

    current = get_credentials() if scatter_initial_credentials else None
    return GcsCredentials.Refreshable(pickle.dumps(get_credentials), current)
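
A sketch of a refreshable credential; the token provider and the exact GcsBearerCredential constructor arguments are assumptions for illustration:

```python
import icechunk as ic

def fetch_token() -> ic.GcsBearerCredential:
    token = my_token_provider()  # hypothetical: returns a fresh OAuth2 token
    return ic.GcsBearerCredential(bearer=token)  # assumed keyword argument

creds = ic.gcs_refreshable_credentials(fetch_token, scatter_initial_credentials=True)
```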

gcs_static_credentials #

gcs_static_credentials(*, service_account_file=None, service_account_key=None, application_credentials=None, bearer_token=None)

Create static credentials for a Google Cloud Storage object store.

Source code in icechunk-python/python/icechunk/credentials.py
def gcs_static_credentials(
    *,
    service_account_file: str | None = None,
    service_account_key: str | None = None,
    application_credentials: str | None = None,
    bearer_token: str | None = None,
) -> AnyGcsStaticCredential:
    """Create static credentials Google Cloud Storage object store."""
    if service_account_file is not None:
        return GcsStaticCredentials.ServiceAccount(service_account_file)
    if service_account_key is not None:
        return GcsStaticCredentials.ServiceAccountKey(service_account_key)
    if application_credentials is not None:
        return GcsStaticCredentials.ApplicationCredentials(application_credentials)
    if bearer_token is not None:
        return GcsStaticCredentials.BearerToken(bearer_token)
    raise ValueError("Conflicting arguments to gcs_static_credentials function")

gcs_storage #

gcs_storage(*, bucket, prefix, service_account_file=None, service_account_key=None, application_credentials=None, bearer_token=None, anonymous=None, from_env=None, config=None, get_credentials=None, scatter_initial_credentials=False)

Create a Storage instance that saves data in Google Cloud Storage object store.

Parameters:

Name Type Description Default
bucket str

The bucket where the repository will store its data

required
prefix str | None

The prefix within the bucket that is the root directory of the repository

required
service_account_file str | None

The path to the service account file

None
service_account_key str | None

The service account key

None
application_credentials str | None

The path to the application credentials file

None
bearer_token str | None

The bearer token to use for the object store

None
anonymous bool | None

If set to True requests to the object store will not be signed

None
from_env bool | None

Fetch credentials from the operating system environment

None
config dict[str, str] | None

A dictionary of options for the Google Cloud Storage object store. See https://docs.rs/object_store/latest/object_store/gcp/enum.GoogleConfigKey.html#variants for a list of possible configuration keys.

None
get_credentials Callable[[], GcsBearerCredential] | None

Use this function to get and refresh object store credentials

None
scatter_initial_credentials bool

Immediately call and store the value returned by get_credentials. This is useful if the repo or session will be pickled to generate many copies. Passing scatter_initial_credentials=True will ensure all those copies don't need to call get_credentials immediately. After the initial set of credentials has expired, the cached value is no longer used. Notice that credentials obtained are stored, and they can be sent over the network if you pickle the session/repo.

False
Source code in icechunk-python/python/icechunk/storage.py
def gcs_storage(
    *,
    bucket: str,
    prefix: str | None,
    service_account_file: str | None = None,
    service_account_key: str | None = None,
    application_credentials: str | None = None,
    bearer_token: str | None = None,
    anonymous: bool | None = None,
    from_env: bool | None = None,
    config: dict[str, str] | None = None,
    get_credentials: Callable[[], GcsBearerCredential] | None = None,
    scatter_initial_credentials: bool = False,
) -> Storage:
    """Create a Storage instance that saves data in Google Cloud Storage object store.

    Parameters
    ----------
    bucket: str
        The bucket where the repository will store its data
    prefix: str | None
        The prefix within the bucket that is the root directory of the repository
    service_account_file: str | None
        The path to the service account file
    service_account_key: str | None
        The service account key
    application_credentials: str | None
        The path to the application credentials file
    bearer_token: str | None
        The bearer token to use for the object store
    anonymous: bool | None
        If set to True requests to the object store will not be signed
    from_env: bool | None
        Fetch credentials from the operating system environment
    config: dict[str, str] | None
        A dictionary of options for the Google Cloud Storage object store. See https://docs.rs/object_store/latest/object_store/gcp/enum.GoogleConfigKey.html#variants for a list of possible configuration keys.
    get_credentials: Callable[[], GcsBearerCredential] | None
        Use this function to get and refresh object store credentials
    scatter_initial_credentials: bool, optional
        Immediately call and store the value returned by get_credentials. This is useful if the
        repo or session will be pickled to generate many copies. Passing scatter_initial_credentials=True will
        ensure all those copies don't need to call get_credentials immediately. After the initial
        set of credentials has expired, the cached value is no longer used. Notice that credentials
        obtained are stored, and they can be sent over the network if you pickle the session/repo.
    """
    credentials = gcs_credentials(
        service_account_file=service_account_file,
        service_account_key=service_account_key,
        application_credentials=application_credentials,
        bearer_token=bearer_token,
        from_env=from_env,
        anonymous=anonymous,
        get_credentials=get_credentials,
        scatter_initial_credentials=scatter_initial_credentials,
    )
    return Storage.new_gcs(
        bucket=bucket,
        prefix=prefix,
        credentials=credentials,
        config=config,
    )

gcs_store #

gcs_store(opts=None)

Build an ObjectStoreConfig instance for Google Cloud Storage object stores.

Parameters:

Name Type Description Default
opts dict[str, str] | None

A dictionary of options for the Google Cloud Storage object store. See https://docs.rs/object_store/latest/object_store/gcp/enum.GoogleConfigKey.html#variants for a list of possible configuration keys.

None
Source code in icechunk-python/python/icechunk/storage.py
def gcs_store(
    opts: dict[str, str] | None = None,
) -> ObjectStoreConfig.Gcs:
    """Build an ObjectStoreConfig instance for Google Cloud Storage object stores.

    Parameters
    ----------
    opts: dict[str, str] | None
        A dictionary of options for the Google Cloud Storage object store. See https://docs.rs/object_store/latest/object_store/gcp/enum.GoogleConfigKey.html#variants for a list of possible configuration keys.
    """
    return ObjectStoreConfig.Gcs(opts)

http_storage #

http_storage(base_url, opts=None)

Create a read-only Storage instance that reads data from an HTTP(S) server

Parameters:

Name Type Description Default
base_url str

The URL path to the root of the repository

required
opts dict[str, str] | None

A dictionary of options for the HTTP object store. See https://docs.rs/object_store/latest/object_store/client/enum.ClientConfigKey.html#variants for a list of possible keys in snake case format.

None
Source code in icechunk-python/python/icechunk/storage.py
def http_storage(base_url: str, opts: dict[str, str] | None = None) -> Storage:
    """Create a read-only Storage instance that reads data from an HTTP(s) server

    Parameters
    ----------
    base_url: str
        The URL path to the root of the repository
    opts: dict[str, str] | None
        A dictionary of options for the HTTP object store. See https://docs.rs/object_store/latest/object_store/client/enum.ClientConfigKey.html#variants for a list of possible keys in snake case format.
    """
    return Storage.new_http(base_url, opts)
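
Example (a minimal sketch; the URL is a placeholder). Because the resulting Storage is read-only, it is suited to opening an existing repository rather than writing one:

import icechunk

# Point at the root of a repository served over HTTP(S).
storage = icechunk.http_storage("https://server.example.com/my-repo")
repo = icechunk.Repository.open(storage)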

http_store #

http_store(opts=None)

Build an ObjectStoreConfig instance for HTTP object stores.

Parameters:

Name Type Description Default
opts dict[str, str] | None

A dictionary of options for the HTTP object store. See https://docs.rs/object_store/latest/object_store/client/enum.ClientConfigKey.html#variants for a list of possible keys in snake case format.

None
Source code in icechunk-python/python/icechunk/storage.py
def http_store(
    opts: dict[str, str] | None = None,
) -> ObjectStoreConfig.Http:
    """Build an ObjectStoreConfig instance for HTTP object stores.

    Parameters
    ----------
    opts: dict[str, str] | None
        A dictionary of options for the HTTP object store. See https://docs.rs/object_store/latest/object_store/client/enum.ClientConfigKey.html#variants for a list of possible keys in snake case format.
    """
    return ObjectStoreConfig.Http(opts)

in_memory_storage #

in_memory_storage()

Create a Storage instance that saves data in memory.

This Storage implementation is used for tests. Data will be lost after the process finishes, and can only be accessed through the Storage instance returned. Different instances don't share data.

Source code in icechunk-python/python/icechunk/storage.py
def in_memory_storage() -> Storage:
    """Create a Storage instance that saves data in memory.

    This Storage implementation is used for tests. Data will be lost after the process finishes, and can only be accessed through the Storage instance returned. Different instances don't share data."""
    return Storage.new_in_memory()
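
Example (a minimal sketch; handy for tests, since all data is lost when the process exits):

import icechunk

# Every call returns an independent, empty in-memory store.
storage = icechunk.in_memory_storage()
repo = icechunk.Repository.create(storage)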

initialize_logs #

initialize_logs()

Initialize the logging system for the library.

Reads the value of the environment variable ICECHUNK_LOG to obtain the filters. This is automatically called on import icechunk.

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
def initialize_logs() -> None:
    """
    Initialize the logging system for the library.

    Reads the value of the environment variable ICECHUNK_LOG to obtain the filters.
    This is automatically called on `import icechunk`.
    """
    ...

local_filesystem_storage #

local_filesystem_storage(path)

Create a Storage instance that saves data in the local file system.

This Storage instance is not recommended for production data

Source code in icechunk-python/python/icechunk/storage.py
def local_filesystem_storage(path: str) -> Storage:
    """Create a Storage instance that saves data in the local file system.

    This Storage instance is not recommended for production data
    """
    return Storage.new_local_filesystem(path)
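
Example (a minimal sketch using a temporary directory, in keeping with the note above that this backend is not recommended for production data):

import tempfile

import icechunk

# Store the repository under a throwaway local directory.
storage = icechunk.local_filesystem_storage(tempfile.mkdtemp())
repo = icechunk.Repository.open_or_create(storage)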

local_filesystem_store #

local_filesystem_store(path)

Build an ObjectStoreConfig instance for local file stores.

Parameters:

Name Type Description Default
path str

The root directory for the store.

required
Source code in icechunk-python/python/icechunk/storage.py
def local_filesystem_store(
    path: str,
) -> ObjectStoreConfig.LocalFileSystem:
    """Build an ObjectStoreConfig instance for local file stores.

    Parameters
    ----------
    path: str
        The root directory for the store.
    """
    return ObjectStoreConfig.LocalFileSystem(path)

r2_storage #

r2_storage(*, bucket=None, prefix=None, account_id=None, endpoint_url=None, region=None, allow_http=False, access_key_id=None, secret_access_key=None, session_token=None, expires_after=None, anonymous=None, from_env=None, get_credentials=None, scatter_initial_credentials=False, network_stream_timeout_seconds=60)

Create a Storage instance that saves data in Cloudflare R2 object store.

Parameters:

Name Type Description Default
bucket str | None

The bucket name

None
prefix str | None

The prefix within the bucket that is the root directory of the repository

None
account_id str | None

Cloudflare account ID. When provided, a default endpoint URL is constructed as https://<ACCOUNT_ID>.r2.cloudflarestorage.com. If not provided, endpoint_url must be provided instead.

None
endpoint_url str | None

Endpoint where the object store serves data, example: https://<ACCOUNT_ID>.r2.cloudflarestorage.com

None
region str | None

The region to use in the object store; if None, the default region 'auto' will be used

None
allow_http bool

If the object store can be accessed using http protocol instead of https

False
access_key_id str | None

S3 credential access key

None
secret_access_key str | None

S3 credential secret access key

None
session_token str | None

Optional S3 credential session token

None
expires_after datetime | None

Optional expiration for the object store credentials

None
anonymous bool | None

If set to True, requests to the object store will not be signed

None
from_env bool | None

Fetch credentials from the operating system environment

None
get_credentials Callable[[], S3StaticCredentials] | None

Use this function to get and refresh object store credentials

None
scatter_initial_credentials bool

Immediately call and store the value returned by get_credentials. This is useful if the repo or session will be pickled to generate many copies. Passing scatter_initial_credentials=True will ensure all those copies don't need to call get_credentials immediately. After the initial set of credentials has expired, the cached value is no longer used. Notice that credentials obtained are stored, and they can be sent over the network if you pickle the session/repo.

False
network_stream_timeout_seconds int

Timeout requests if no bytes can be transmitted during this period of time. If set to 0, timeout is disabled.

60
Source code in icechunk-python/python/icechunk/storage.py
def r2_storage(
    *,
    bucket: str | None = None,
    prefix: str | None = None,
    account_id: str | None = None,
    endpoint_url: str | None = None,
    region: str | None = None,
    allow_http: bool = False,
    access_key_id: str | None = None,
    secret_access_key: str | None = None,
    session_token: str | None = None,
    expires_after: datetime | None = None,
    anonymous: bool | None = None,
    from_env: bool | None = None,
    get_credentials: Callable[[], S3StaticCredentials] | None = None,
    scatter_initial_credentials: bool = False,
    network_stream_timeout_seconds: int = 60,
) -> Storage:
    """Create a Storage instance that saves data in Tigris object store.

    Parameters
    ----------
    bucket: str | None
        The bucket name
    prefix: str | None
        The prefix within the bucket that is the root directory of the repository
    account_id: str | None
        Cloudflare account ID. When provided, a default endpoint URL is constructed as
        `https://<ACCOUNT_ID>.r2.cloudflarestorage.com`. If not provided, `endpoint_url`
        must be provided instead.
    endpoint_url: str | None
        Endpoint where the object store serves data, example: `https://<ACCOUNT_ID>.r2.cloudflarestorage.com`
    region: str | None
        The region to use in the object store; if `None`, the default region 'auto' will be used
    allow_http: bool
        If the object store can be accessed using http protocol instead of https
    access_key_id: str | None
        S3 credential access key
    secret_access_key: str | None
        S3 credential secret access key
    session_token: str | None
        Optional S3 credential session token
    expires_after: datetime | None
        Optional expiration for the object store credentials
    anonymous: bool | None
        If set to True, requests to the object store will not be signed
    from_env: bool | None
        Fetch credentials from the operating system environment
    get_credentials: Callable[[], S3StaticCredentials] | None
        Use this function to get and refresh object store credentials
    scatter_initial_credentials: bool, optional
        Immediately call and store the value returned by get_credentials. This is useful if the
        repo or session will be pickled to generate many copies. Passing scatter_initial_credentials=True will
        ensure all those copies don't need to call get_credentials immediately. After the initial
        set of credentials has expired, the cached value is no longer used. Notice that credentials
        obtained are stored, and they can be sent over the network if you pickle the session/repo.
    network_stream_timeout_seconds: int
        Timeout requests if no bytes can be transmitted during this period of time.
        If set to 0, timeout is disabled.
    """
    credentials = s3_credentials(
        access_key_id=access_key_id,
        secret_access_key=secret_access_key,
        session_token=session_token,
        expires_after=expires_after,
        anonymous=anonymous,
        from_env=from_env,
        get_credentials=get_credentials,
        scatter_initial_credentials=scatter_initial_credentials,
    )
    options = S3Options(
        region=region,
        endpoint_url=endpoint_url,
        allow_http=allow_http,
        network_stream_timeout_seconds=network_stream_timeout_seconds,
        anonymous=anonymous or False,
    )
    return Storage.new_r2(
        config=options,
        bucket=bucket,
        prefix=prefix,
        account_id=account_id,
        credentials=credentials,
    )
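
Example (a minimal sketch; the account id, bucket, and keys are placeholders). Passing account_id lets Icechunk construct the default https://<ACCOUNT_ID>.r2.cloudflarestorage.com endpoint:

import icechunk

storage = icechunk.r2_storage(
    bucket="my-bucket",
    prefix="my-repo",
    account_id="<ACCOUNT_ID>",                # placeholder Cloudflare account ID
    access_key_id="<ACCESS_KEY_ID>",          # placeholder R2 API token credentials
    secret_access_key="<SECRET_ACCESS_KEY>",
)
repo = icechunk.Repository.open_or_create(storage)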

s3_anonymous_credentials #

s3_anonymous_credentials()

Create no-signature credentials for S3 and S3 compatible object stores.

Source code in icechunk-python/python/icechunk/credentials.py
def s3_anonymous_credentials() -> S3Credentials.Anonymous:
    """Create no-signature credentials for S3 and S3 compatible object stores."""
    return S3Credentials.Anonymous()

s3_credentials #

s3_credentials(*, access_key_id=None, secret_access_key=None, session_token=None, expires_after=None, anonymous=None, from_env=None, get_credentials=None, scatter_initial_credentials=False)

Create credentials for S3 and S3 compatible object stores.

If all arguments are None, credentials are fetched from the environment.

Parameters:

Name Type Description Default
access_key_id str | None

S3 credential access key

None
secret_access_key str | None

S3 credential secret access key

None
session_token str | None

Optional S3 credential session token

None
expires_after datetime | None

Optional expiration for the object store credentials

None
anonymous bool | None

If set to True, requests to the object store will not be signed

None
from_env bool | None

Fetch credentials from the operating system environment

None
get_credentials Callable[[], S3StaticCredentials] | None

Use this function to get and refresh object store credentials

None
scatter_initial_credentials bool

Immediately call and store the value returned by get_credentials. This is useful if the repo or session will be pickled to generate many copies. Passing scatter_initial_credentials=True will ensure all those copies don't need to call get_credentials immediately. After the initial set of credentials has expired, the cached value is no longer used. Notice that credentials obtained are stored, and they can be sent over the network if you pickle the session/repo.

False
Source code in icechunk-python/python/icechunk/credentials.py
def s3_credentials(
    *,
    access_key_id: str | None = None,
    secret_access_key: str | None = None,
    session_token: str | None = None,
    expires_after: datetime | None = None,
    anonymous: bool | None = None,
    from_env: bool | None = None,
    get_credentials: Callable[[], S3StaticCredentials] | None = None,
    scatter_initial_credentials: bool = False,
) -> AnyS3Credential:
    """Create credentials for S3 and S3 compatible object stores.

    If all arguments are None, credentials are fetched from the environment.

    Parameters
    ----------
    access_key_id: str | None
        S3 credential access key
    secret_access_key: str | None
        S3 credential secret access key
    session_token: str | None
        Optional S3 credential session token
    expires_after: datetime | None
        Optional expiration for the object store credentials
    anonymous: bool | None
        If set to True, requests to the object store will not be signed
    from_env: bool | None
        Fetch credentials from the operating system environment
    get_credentials: Callable[[], S3StaticCredentials] | None
        Use this function to get and refresh object store credentials
    scatter_initial_credentials: bool, optional
        Immediately call and store the value returned by get_credentials. This is useful if the
        repo or session will be pickled to generate many copies. Passing scatter_initial_credentials=True will
        ensure all those copies don't need to call get_credentials immediately. After the initial
        set of credentials has expired, the cached value is no longer used. Notice that credentials
        obtained are stored, and they can be sent over the network if you pickle the session/repo.
    """
    if (
        (from_env is None or from_env)
        and access_key_id is None
        and secret_access_key is None
        and session_token is None
        and expires_after is None
        and not anonymous
        and get_credentials is None
    ):
        return s3_from_env_credentials()

    if (
        anonymous
        and access_key_id is None
        and secret_access_key is None
        and session_token is None
        and expires_after is None
        and not from_env
        and get_credentials is None
    ):
        return s3_anonymous_credentials()

    if (
        get_credentials is not None
        and access_key_id is None
        and secret_access_key is None
        and session_token is None
        and expires_after is None
        and not from_env
        and not anonymous
    ):
        return s3_refreshable_credentials(
            get_credentials, scatter_initial_credentials=scatter_initial_credentials
        )

    if (
        access_key_id
        and secret_access_key
        and not from_env
        and not anonymous
        and get_credentials is None
    ):
        return s3_static_credentials(
            access_key_id=access_key_id,
            secret_access_key=secret_access_key,
            session_token=session_token,
            expires_after=expires_after,
        )

    raise ValueError("Conflicting arguments to s3_credentials function")
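
Example (each call below exercises a different branch of the dispatch logic shown above; the static keys are placeholders):

import icechunk

env_creds = icechunk.s3_credentials(from_env=True)    # returns S3Credentials.FromEnv
anon_creds = icechunk.s3_credentials(anonymous=True)  # returns S3Credentials.Anonymous
static_creds = icechunk.s3_credentials(               # returns S3Credentials.Static
    access_key_id="<ACCESS_KEY_ID>",
    secret_access_key="<SECRET_ACCESS_KEY>",
)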

s3_from_env_credentials #

s3_from_env_credentials()

Instruct S3 and S3 compatible object stores to gather credentials from the operating system environment.

Source code in icechunk-python/python/icechunk/credentials.py
def s3_from_env_credentials() -> S3Credentials.FromEnv:
    """Instruct S3 and S3 compatible object stores to gather credentials from the operative system environment."""
    return S3Credentials.FromEnv()

s3_refreshable_credentials #

s3_refreshable_credentials(get_credentials, scatter_initial_credentials=False)

Create refreshable credentials for S3 and S3 compatible object stores.

Parameters:

Name Type Description Default
get_credentials Callable[[], S3StaticCredentials]

Use this function to get and refresh the credentials. The function must be picklable.

required
scatter_initial_credentials bool

Immediately call and store the value returned by get_credentials. This is useful if the repo or session will be pickled to generate many copies. Passing scatter_initial_credentials=True will ensure all those copies don't need to call get_credentials immediately. After the initial set of credentials has expired, the cached value is no longer used. Notice that credentials obtained are stored, and they can be sent over the network if you pickle the session/repo.

False
Source code in icechunk-python/python/icechunk/credentials.py
def s3_refreshable_credentials(
    get_credentials: Callable[[], S3StaticCredentials],
    scatter_initial_credentials: bool = False,
) -> S3Credentials.Refreshable:
    """Create refreshable credentials for S3 and S3 compatible object stores.

    Parameters
    ----------
    get_credentials: Callable[[], S3StaticCredentials]
        Use this function to get and refresh the credentials. The function must be picklable.
    scatter_initial_credentials: bool, optional
        Immediately call and store the value returned by get_credentials. This is useful if the
        repo or session will be pickled to generate many copies. Passing scatter_initial_credentials=True will
        ensure all those copies don't need to call get_credentials immediately. After the initial
        set of credentials has expired, the cached value is no longer used. Notice that credentials
        obtained are stored, and they can be sent over the network if you pickle the session/repo.
    """
    current = get_credentials() if scatter_initial_credentials else None
    return S3Credentials.Refreshable(pickle.dumps(get_credentials), current)
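
Example (a minimal sketch; fetch_credentials is a hypothetical provider that returns hard-coded placeholders here, where a real implementation would call a credential service such as STS). Note that it is a picklable top-level function, not a lambda or closure:

from datetime import datetime, timedelta, timezone

import icechunk

def fetch_credentials() -> icechunk.S3StaticCredentials:
    # Hypothetical provider: replace the placeholders with a real lookup.
    return icechunk.S3StaticCredentials(
        access_key_id="<ACCESS_KEY_ID>",
        secret_access_key="<SECRET_ACCESS_KEY>",
        expires_after=datetime.now(timezone.utc) + timedelta(hours=1),
    )

credentials = icechunk.s3_refreshable_credentials(
    fetch_credentials,
    scatter_initial_credentials=True,  # fetch once now so pickled copies start warm
)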

s3_static_credentials #

s3_static_credentials(*, access_key_id, secret_access_key, session_token=None, expires_after=None)

Create static credentials for S3 and S3 compatible object stores.

Parameters:

Name Type Description Default
access_key_id str

S3 credential access key

required
secret_access_key str

S3 credential secret access key

required
session_token str | None

Optional S3 credential session token

None
expires_after datetime | None

Optional expiration for the object store credentials

None
Source code in icechunk-python/python/icechunk/credentials.py
def s3_static_credentials(
    *,
    access_key_id: str,
    secret_access_key: str,
    session_token: str | None = None,
    expires_after: datetime | None = None,
) -> S3Credentials.Static:
    """Create static credentials for S3 and S3 compatible object stores.

    Parameters
    ----------
    access_key_id: str
        S3 credential access key
    secret_access_key: str
        S3 credential secret access key
    session_token: str | None
        Optional S3 credential session token
    expires_after: datetime | None
        Optional expiration for the object store credentials
    """
    return S3Credentials.Static(
        S3StaticCredentials(
            access_key_id=access_key_id,
            secret_access_key=secret_access_key,
            session_token=session_token,
            expires_after=expires_after,
        )
    )

s3_storage #

s3_storage(*, bucket, prefix, region=None, endpoint_url=None, allow_http=False, access_key_id=None, secret_access_key=None, session_token=None, expires_after=None, anonymous=None, from_env=None, get_credentials=None, scatter_initial_credentials=False, force_path_style=False, network_stream_timeout_seconds=60, requester_pays=False)

Create a Storage instance that saves data in S3 or S3 compatible object stores.

Parameters:

Name Type Description Default
bucket str

The bucket where the repository will store its data

required
prefix str | None

The prefix within the bucket that is the root directory of the repository

required
region str | None

The region to use in the object store; if None, a default region will be used

None
endpoint_url str | None

Optional endpoint where the object store serves data, example: http://localhost:9000

None
allow_http bool

If the object store can be accessed using http protocol instead of https

False
access_key_id str | None

S3 credential access key

None
secret_access_key str | None

S3 credential secret access key

None
session_token str | None

Optional S3 credential session token

None
expires_after datetime | None

Optional expiration for the object store credentials

None
anonymous bool | None

If set to True, requests to the object store will not be signed

None
from_env bool | None

Fetch credentials from the operating system environment

None
get_credentials Callable[[], S3StaticCredentials] | None

Use this function to get and refresh object store credentials

None
scatter_initial_credentials bool

Immediately call and store the value returned by get_credentials. This is useful if the repo or session will be pickled to generate many copies. Passing scatter_initial_credentials=True will ensure all those copies don't need to call get_credentials immediately. After the initial set of credentials has expired, the cached value is no longer used. Notice that credentials obtained are stored, and they can be sent over the network if you pickle the session/repo.

False
force_path_style bool

Whether to force using path-style addressing for buckets

False
network_stream_timeout_seconds int

Timeout requests if no bytes can be transmitted during this period of time. If set to 0, timeout is disabled.

60
requester_pays bool

Enable requester pays for S3 buckets

False
Source code in icechunk-python/python/icechunk/storage.py
def s3_storage(
    *,
    bucket: str,
    prefix: str | None,
    region: str | None = None,
    endpoint_url: str | None = None,
    allow_http: bool = False,
    access_key_id: str | None = None,
    secret_access_key: str | None = None,
    session_token: str | None = None,
    expires_after: datetime | None = None,
    anonymous: bool | None = None,
    from_env: bool | None = None,
    get_credentials: Callable[[], S3StaticCredentials] | None = None,
    scatter_initial_credentials: bool = False,
    force_path_style: bool = False,
    network_stream_timeout_seconds: int = 60,
    requester_pays: bool = False,
) -> Storage:
    """Create a Storage instance that saves data in S3 or S3 compatible object stores.

    Parameters
    ----------
    bucket: str
        The bucket where the repository will store its data
    prefix: str | None
        The prefix within the bucket that is the root directory of the repository
    region: str | None
        The region to use in the object store; if `None`, a default region will be used
    endpoint_url: str | None
        Optional endpoint where the object store serves data, example: http://localhost:9000
    allow_http: bool
        If the object store can be accessed using http protocol instead of https
    access_key_id: str | None
        S3 credential access key
    secret_access_key: str | None
        S3 credential secret access key
    session_token: str | None
        Optional S3 credential session token
    expires_after: datetime | None
        Optional expiration for the object store credentials
    anonymous: bool | None
        If set to True, requests to the object store will not be signed
    from_env: bool | None
        Fetch credentials from the operating system environment
    get_credentials: Callable[[], S3StaticCredentials] | None
        Use this function to get and refresh object store credentials
    scatter_initial_credentials: bool, optional
        Immediately call and store the value returned by get_credentials. This is useful if the
        repo or session will be pickled to generate many copies. Passing scatter_initial_credentials=True will
        ensure all those copies don't need to call get_credentials immediately. After the initial
        set of credentials has expired, the cached value is no longer used. Notice that credentials
        obtained are stored, and they can be sent over the network if you pickle the session/repo.
    force_path_style: bool
        Whether to force using path-style addressing for buckets
    network_stream_timeout_seconds: int
        Timeout requests if no bytes can be transmitted during this period of time.
        If set to 0, timeout is disabled.
    requester_pays: bool
        Enable requester pays for S3 buckets
    """

    credentials = s3_credentials(
        access_key_id=access_key_id,
        secret_access_key=secret_access_key,
        session_token=session_token,
        expires_after=expires_after,
        anonymous=anonymous,
        from_env=from_env,
        get_credentials=get_credentials,
        scatter_initial_credentials=scatter_initial_credentials,
    )
    options = S3Options(
        region=region,
        endpoint_url=endpoint_url,
        allow_http=allow_http,
        force_path_style=force_path_style,
        network_stream_timeout_seconds=network_stream_timeout_seconds,
        requester_pays=requester_pays,
        anonymous=anonymous or False,
    )
    return Storage.new_s3(
        config=options,
        bucket=bucket,
        prefix=prefix,
        credentials=credentials,
    )
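
Example (a minimal sketch pointing at a local MinIO server; the endpoint, bucket, and keys are placeholders):

import icechunk

storage = icechunk.s3_storage(
    bucket="my-bucket",
    prefix="my-repo",
    region="us-east-1",
    endpoint_url="http://localhost:9000",  # placeholder S3-compatible endpoint
    allow_http=True,                       # the endpoint above is plain HTTP
    force_path_style=True,                 # MinIO typically needs path-style addressing
    access_key_id="<ACCESS_KEY_ID>",
    secret_access_key="<SECRET_ACCESS_KEY>",
)
repo = icechunk.Repository.open_or_create(storage)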

s3_store #

s3_store(region=None, endpoint_url=None, allow_http=False, anonymous=False, s3_compatible=False, force_path_style=False, network_stream_timeout_seconds=60, requester_pays=False)

Build an ObjectStoreConfig instance for S3 or S3 compatible object stores.

Source code in icechunk-python/python/icechunk/storage.py
def s3_store(
    region: str | None = None,
    endpoint_url: str | None = None,
    allow_http: bool = False,
    anonymous: bool = False,
    s3_compatible: bool = False,
    force_path_style: bool = False,
    network_stream_timeout_seconds: int = 60,
    requester_pays: bool = False,
) -> ObjectStoreConfig.S3Compatible | ObjectStoreConfig.S3:
    """Build an ObjectStoreConfig instance for S3 or S3 compatible object stores."""

    options = S3Options(
        region=region,
        endpoint_url=endpoint_url,
        allow_http=allow_http,
        force_path_style=force_path_style,
        network_stream_timeout_seconds=network_stream_timeout_seconds,
        requester_pays=requester_pays,
        anonymous=anonymous,
    )
    return (
        ObjectStoreConfig.S3Compatible(options)
        if s3_compatible
        else ObjectStoreConfig.S3(options)
    )

set_logs_filter #

set_logs_filter(log_filter_directive)

Set filters and log levels for the different modules.

Examples:

  • set_logs_filter("trace")  # trace level for all modules
  • set_logs_filter("error")  # error level for all modules
  • set_logs_filter("icechunk=debug,info")  # debug level for icechunk, info for everything else

Full spec for the log_filter_directive syntax is documented in https://docs.rs/tracing-subscriber/latest/tracing_subscriber/filter/struct.EnvFilter.html#directives

Parameters:

Name Type Description Default
log_filter_directive str | None

The comma separated list of directives for modules and log levels. If None, the directive will be read from the environment variable ICECHUNK_LOG

required
Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
def set_logs_filter(log_filter_directive: str | None) -> None:
    """
    Set filters and log levels for the different modules.

    Examples:
      - set_logs_filter("trace")  # trace level for all modules
      - set_logs_filter("error")  # error level for all modules
      - set_logs_filter("icechunk=debug,info")  # debug level for icechunk, info for everything else

    Full spec for the log_filter_directive syntax is documented in
    https://docs.rs/tracing-subscriber/latest/tracing_subscriber/filter/struct.EnvFilter.html#directives

    Parameters
    ----------
    log_filter_directive: str | None
        The comma separated list of directives for modules and log levels.
        If None, the directive will be read from the environment variable
        ICECHUNK_LOG
    """
    ...

spec_version #

spec_version()

The version of the Icechunk specification that the library is compatible with.

Returns:

  int: The version of the Icechunk specification that the library is compatible with

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
def spec_version() -> int:
    """
    The version of the Icechunk specification that the library is compatible with.

    Returns:
        int: The version of the Icechunk specification that the library is compatible with
    """
    ...

tigris_storage #

tigris_storage(*, bucket, prefix, region=None, endpoint_url=None, use_weak_consistency=False, allow_http=False, access_key_id=None, secret_access_key=None, session_token=None, expires_after=None, anonymous=None, from_env=None, get_credentials=None, scatter_initial_credentials=False, network_stream_timeout_seconds=60)

Create a Storage instance that saves data in Tigris object store.

Parameters:

Name Type Description Default
bucket str

The bucket where the repository will store its data

required
prefix str | None

The prefix within the bucket that is the root directory of the repository

required
region str | None

The region to use in the object store; if None, a default region will be used

None
endpoint_url str | None

Optional endpoint where the object store serves data, example: http://localhost:9000

None
use_weak_consistency bool

If set to True, it will return a Storage instance that is read-only and can read from the closest Tigris region. Behavior is undefined if objects haven't propagated to the region yet. This option is for experts only.

False
allow_http bool

If the object store can be accessed using http protocol instead of https

False
access_key_id str | None

S3 credential access key

None
secret_access_key str | None

S3 credential secret access key

None
session_token str | None

Optional S3 credential session token

None
expires_after datetime | None

Optional expiration for the object store credentials

None
anonymous bool | None

If set to True, requests to the object store will not be signed

None
from_env bool | None

Fetch credentials from the operating system environment

None
get_credentials Callable[[], S3StaticCredentials] | None

Use this function to get and refresh object store credentials

None
scatter_initial_credentials bool

Immediately call and store the value returned by get_credentials. This is useful if the repo or session will be pickled to generate many copies. Passing scatter_initial_credentials=True will ensure all those copies don't need to call get_credentials immediately. After the initial set of credentials has expired, the cached value is no longer used. Notice that credentials obtained are stored, and they can be sent over the network if you pickle the session/repo.

False
network_stream_timeout_seconds int

Timeout requests if no bytes can be transmitted during this period of time. If set to 0, timeout is disabled.

60
Source code in icechunk-python/python/icechunk/storage.py
def tigris_storage(
    *,
    bucket: str,
    prefix: str | None,
    region: str | None = None,
    endpoint_url: str | None = None,
    use_weak_consistency: bool = False,
    allow_http: bool = False,
    access_key_id: str | None = None,
    secret_access_key: str | None = None,
    session_token: str | None = None,
    expires_after: datetime | None = None,
    anonymous: bool | None = None,
    from_env: bool | None = None,
    get_credentials: Callable[[], S3StaticCredentials] | None = None,
    scatter_initial_credentials: bool = False,
    network_stream_timeout_seconds: int = 60,
) -> Storage:
    """Create a Storage instance that saves data in Tigris object store.

    Parameters
    ----------
    bucket: str
        The bucket where the repository will store its data
    prefix: str | None
        The prefix within the bucket that is the root directory of the repository
    region: str | None
        The region to use in the object store; if `None`, a default region will be used
    endpoint_url: str | None
        Optional endpoint where the object store serves data, example: http://localhost:9000
    use_weak_consistency: bool
        If set to True, it will return a Storage instance that is read-only and can read from
        the closest Tigris region. Behavior is undefined if objects haven't propagated to the region yet.
        This option is for experts only.
    allow_http: bool
        If the object store can be accessed using http protocol instead of https
    access_key_id: str | None
        S3 credential access key
    secret_access_key: str | None
        S3 credential secret access key
    session_token: str | None
        Optional S3 credential session token
    expires_after: datetime | None
        Optional expiration for the object store credentials
    anonymous: bool | None
        If set to True, requests to the object store will not be signed
    from_env: bool | None
        Fetch credentials from the operating system environment
    get_credentials: Callable[[], S3StaticCredentials] | None
        Use this function to get and refresh object store credentials
    scatter_initial_credentials: bool, optional
        Immediately call and store the value returned by get_credentials. This is useful if the
        repo or session will be pickled to generate many copies. Passing scatter_initial_credentials=True will
        ensure all those copies don't need to call get_credentials immediately. After the initial
        set of credentials has expired, the cached value is no longer used. Notice that credentials
        obtained are stored, and they can be sent over the network if you pickle the session/repo.
    network_stream_timeout_seconds: int
        Timeout requests if no bytes can be transmitted during this period of time.
        If set to 0, timeout is disabled.
    """
    credentials = s3_credentials(
        access_key_id=access_key_id,
        secret_access_key=secret_access_key,
        session_token=session_token,
        expires_after=expires_after,
        anonymous=anonymous,
        from_env=from_env,
        get_credentials=get_credentials,
        scatter_initial_credentials=scatter_initial_credentials,
    )
    options = S3Options(
        region=region,
        endpoint_url=endpoint_url,
        allow_http=allow_http,
        network_stream_timeout_seconds=network_stream_timeout_seconds,
        anonymous=anonymous or False,
    )
    return Storage.new_tigris(
        config=options,
        bucket=bucket,
        prefix=prefix,
        use_weak_consistency=use_weak_consistency,
        credentials=credentials,
    )
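
Example (a minimal sketch; the bucket and keys are placeholders):

import icechunk

storage = icechunk.tigris_storage(
    bucket="my-bucket",
    prefix="my-repo",
    access_key_id="<ACCESS_KEY_ID>",
    secret_access_key="<SECRET_ACCESS_KEY>",
)
repo = icechunk.Repository.open_or_create(storage)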

icechunk.xarray #

Functions:

Name Description
to_icechunk

Write an Xarray object to a group of an Icechunk store.

to_icechunk #

to_icechunk(obj, session, *, group=None, mode=None, safe_chunks=True, align_chunks=False, append_dim=None, region=None, encoding=None, chunkmanager_store_kwargs=None, split_every=None)

Write an Xarray object to a group of an Icechunk store.

Parameters:

Name Type Description Default
obj DataArray | Dataset

Xarray object to write

required
session Session

Writable Icechunk Session

required
mode "w", "w-", "a", "a-", r+", None

Persistence mode: "w" means create (overwrite if exists); "w-" means create (fail if exists); "a" means override all existing variables including dimension coordinates (create if does not exist); "a-" means only append those variables that have append_dim. "r+" means modify existing array values only (raise an error if any metadata or shapes would change). The default mode is "a" if append_dim is set. Otherwise, it is "r+" if region is set and w- otherwise.

"w"
group str

Group path. (a.k.a. path in zarr terminology.)

None
encoding dict

Nested dictionary with variable names as keys and dictionaries of variable specific encodings as values, e.g., {"my_variable": {"dtype": "int16", "scale_factor": 0.1,}, ...}

None
append_dim hashable

If set, the dimension along which the data will be appended. All other dimensions on overridden variables must remain the same size.

None
region dict or auto

Optional mapping from dimension names to either a) "auto", or b) integer slices, indicating the region of existing zarr array(s) in which to write this dataset's data.

If "auto" is provided the existing store will be opened and the region inferred by matching indexes. "auto" can be used as a single string, which will automatically infer the region for all dimensions, or as dictionary values for specific dimensions mixed together with explicit slices for other dimensions.

Alternatively integer slices can be provided; for example, {'x': slice(0, 1000), 'y': slice(10000, 11000)} would indicate that values should be written to the region 0:1000 along x and 10000:11000 along y.

Users are expected to ensure that the specified region aligns with Zarr chunk boundaries, and that dask chunks are also aligned. Xarray makes limited checks that these multiple chunk boundaries line up. It is possible to write incomplete chunks and corrupt the data with this option if you are not careful.

None
safe_chunks bool

If True, only allow writes when there is a many-to-one relationship between Zarr chunks (specified in encoding) and Dask chunks. Set False to override this restriction; however, data may become corrupted if Zarr arrays are written in parallel. In addition to the many-to-one relationship validation, it also detects partial chunk writes when using the region parameter; these partial chunks are considered unsafe in mode "r+" but safe in mode "a". Note: Even with these validations it can still be unsafe to write two or more chunked arrays in the same location in parallel if they are not writing in independent regions.

True
align_chunks bool

If True, rechunks the Dask array to align with Zarr chunks before writing. This ensures each Dask chunk maps to one or more contiguous Zarr chunks, which avoids race conditions. Internally, the process sets safe_chunks=False and tries to preserve the original Dask chunking as much as possible. Note: While this alignment avoids write conflicts stemming from chunk boundary misalignment, it does not protect against race conditions if multiple uncoordinated processes write to the same Zarr array concurrently.

False
chunkmanager_store_kwargs dict

Additional keyword arguments passed on to the ChunkManager.store method used to store chunked arrays. For example, for a dask array, additional kwargs will eventually be passed to dask.array.store(). Experimental API that should not be relied upon.

None
split_every int | None

Number of tasks to merge at every level of the tree reduction.

None

Returns:

Type Description
None
Notes

Two restrictions apply to the use of region:

  • If region is set, all variables in a dataset must have at least one dimension in common with the region. Other variables should be written in a separate single call to to_icechunk().
  • Dimensions cannot be included in both region and append_dim at the same time. To create empty arrays to fill in with region, use the _XarrayDatasetWriter directly.
Source code in icechunk-python/python/icechunk/xarray.py
def to_icechunk(
    obj: DataArray | Dataset,
    session: Session,
    *,
    group: str | None = None,
    mode: ZarrWriteModes | None = None,
    safe_chunks: bool = True,
    align_chunks: bool = False,
    append_dim: Hashable | None = None,
    region: Region = None,
    encoding: Mapping[Any, Any] | None = None,
    chunkmanager_store_kwargs: MutableMapping[Any, Any] | None = None,
    split_every: int | None = None,
) -> None:
    """
    Write an Xarray object to a group of an Icechunk store.

    Parameters
    ----------
    obj: DataArray or Dataset
        Xarray object to write
    session : icechunk.Session
        Writable Icechunk Session
    mode : {"w", "w-", "a", "a-", r+", None}, optional
        Persistence mode: "w" means create (overwrite if exists);
        "w-" means create (fail if exists);
        "a" means override all existing variables including dimension coordinates (create if does not exist);
        "a-" means only append those variables that have ``append_dim``.
        "r+" means modify existing array *values* only (raise an error if
        any metadata or shapes would change).
        The default mode is "a" if ``append_dim`` is set. Otherwise, it is
        "r+" if ``region`` is set and ``w-`` otherwise.
    group : str, optional
        Group path. (a.k.a. `path` in zarr terminology.)
    encoding : dict, optional
        Nested dictionary with variable names as keys and dictionaries of
        variable specific encodings as values, e.g.,
        ``{"my_variable": {"dtype": "int16", "scale_factor": 0.1,}, ...}``
    append_dim : hashable, optional
        If set, the dimension along which the data will be appended. All
        other dimensions on overridden variables must remain the same size.
    region : dict or "auto", optional
        Optional mapping from dimension names to either a) ``"auto"``, or b) integer
        slices, indicating the region of existing zarr array(s) in which to write
        this dataset's data.

        If ``"auto"`` is provided the existing store will be opened and the region
        inferred by matching indexes. ``"auto"`` can be used as a single string,
        which will automatically infer the region for all dimensions, or as
        dictionary values for specific dimensions mixed together with explicit
        slices for other dimensions.

        Alternatively integer slices can be provided; for example, ``{'x': slice(0,
        1000), 'y': slice(10000, 11000)}`` would indicate that values should be
        written to the region ``0:1000`` along ``x`` and ``10000:11000`` along
        ``y``.

        Users are expected to ensure that the specified region aligns with
        Zarr chunk boundaries, and that dask chunks are also aligned.
        Xarray makes limited checks that these multiple chunk boundaries line up.
        It is possible to write incomplete chunks and corrupt the data with this
        option if you are not careful.
    safe_chunks : bool, default: True
        If True, only allow writes when there is a many-to-one relationship
        between Zarr chunks (specified in encoding) and Dask chunks.
        Set False to override this restriction; however, data may become corrupted
        if Zarr arrays are written in parallel.
        In addition to the many-to-one relationship validation, it also detects partial
        chunk writes when using the region parameter;
        these partial chunks are considered unsafe in the mode "r+" but safe in
        the mode "a".
        Note: Even with these validations it can still be unsafe to write
        two or more chunked arrays in the same location in parallel if they are
        not writing in independent regions.
    align_chunks: bool, default False
        If True, rechunks the Dask array to align with Zarr chunks before writing.
        This ensures each Dask chunk maps to one or more contiguous Zarr chunks,
        which avoids race conditions.
        Internally, the process sets safe_chunks=False and tries to preserve
        the original Dask chunking as much as possible.
        Note: While this alignment avoids write conflicts stemming from chunk
        boundary misalignment, it does not protect against race conditions
        if multiple uncoordinated processes write to the same
        Zarr array concurrently.
    chunkmanager_store_kwargs : dict, optional
        Additional keyword arguments passed on to the `ChunkManager.store` method used to store
        chunked arrays. For example, for a dask array, additional kwargs will eventually be passed to
        `dask.array.store()`. Experimental API that should not be relied upon.
    split_every: int, optional
        Number of tasks to merge at every level of the tree reduction.

    Returns
    -------
    None

    Notes
    -----
    Two restrictions apply to the use of ``region``:

      - If ``region`` is set, _all_ variables in a dataset must have at
        least one dimension in common with the region. Other variables
        should be written in a separate single call to ``to_icechunk()``.
      - Dimensions cannot be included in both ``region`` and
        ``append_dim`` at the same time. To create empty arrays to fill
        in with ``region``, use the `_XarrayDatasetWriter` directly.
    """

    as_dataset = _make_dataset(obj)

    # This ugliness is needed so that we allow users to call `to_icechunk` with a dirty Session
    # for _serial_ writes
    is_dask = is_dask_collection(obj)
    fork: Session | ForkSession
    if is_dask:
        if session.has_uncommitted_changes:
            raise ValueError(
                "Calling `to_icechunk` is not allowed on a Session with uncommitted changes. Please commit first."
            )
        fork = session.fork()
    else:
        fork = session

    writer = _XarrayDatasetWriter(
        as_dataset, store=fork.store, safe_chunks=safe_chunks, align_chunks=align_chunks
    )

    writer._open_group(group=group, mode=mode, append_dim=append_dim, region=region)

    # write metadata
    writer.write_metadata(encoding)
    # write in-memory arrays
    writer.write_eager()
    # eagerly write dask arrays
    maybe_fork_session = writer.write_lazy(
        chunkmanager_store_kwargs=chunkmanager_store_kwargs,
        split_every=split_every,
    )
    if is_dask:
        if maybe_fork_session is None:
            raise RuntimeError(
                "Logic bug! Please open at issue at https://github.com/earth-mover/icechunk"
            )
        session.merge(maybe_fork_session)
    else:
        if maybe_fork_session is not None:
            raise RuntimeError(
                "Unexpected write of dask arrays! Please open at issue at https://github.com/earth-mover/icechunk"
            )
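
Example (a minimal sketch writing a small in-memory dataset; the variable names and storage backend are placeholders):

import xarray as xr

import icechunk
from icechunk.xarray import to_icechunk

ds = xr.Dataset({"temperature": (("x",), [1.0, 2.0, 3.0])})
repo = icechunk.Repository.create(icechunk.in_memory_storage())
session = repo.writable_session("main")
to_icechunk(ds, session, mode="w")  # serial write; no dask arrays involved
session.commit("write temperature")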

icechunk.dask #

Functions:

Name Description
computing_meta

A decorator to handle the dask-specific computing_meta flag.

store_dask

A version of dask.array.store for Icechunk stores.

computing_meta #

computing_meta(func)

A decorator to handle the dask-specific computing_meta flag.

If computing_meta is True in the keyword arguments, the decorated function will return a placeholder meta object (np.array([object()], dtype=object)). Otherwise, it will execute the original function.

Source code in icechunk-python/python/icechunk/dask.py
def computing_meta(func: Callable[P, R]) -> Callable[P, Any]:
    """
    A decorator to handle the dask-specific `computing_meta` flag.

    If `computing_meta` is True in the keyword arguments, the decorated
    function will return a placeholder meta object (np.array([object()], dtype=object)).
    Otherwise, it will execute the original function.
    """

    @functools.wraps(func)
    def wrapper(*args: P.args, **kwargs: P.kwargs) -> Any:
        if kwargs.get("computing_meta", False):
            return np.array([object()], dtype=object)
        return func(*args, **kwargs)

    return wrapper

store_dask #

store_dask(*, sources, targets, regions=None, split_every=None, **store_kwargs)

A version of dask.array.store for Icechunk stores.

This method will eagerly execute writes to the Icechunk store, and will merge the changesets corresponding to each write task. The store object passed in will be updated in-place with the fully merged changeset.

For distributed or multi-processing writes, this method must be called within the Session.allow_pickling() context. All Zarr arrays in targets must also be created within this context since they contain a reference to the Session.

Parameters:

Name Type Description Default
sources list[Array]

List of dask arrays to write.

required
targets list of `zarr.Array`

Corresponding list of Zarr array objects to write to.

required
regions list[tuple[slice, ...]] | None

Corresponding region for each of targets to write to.

None
split_every int | None

Number of changesets to merge at a given time.

None
**store_kwargs Any

Arbitrary keyword arguments passed to dask.array.store. Notably compute, return_stored, load_stored, and lock are unsupported.

{}
Source code in icechunk-python/python/icechunk/dask.py
def store_dask(
    *,
    sources: list[Array],
    targets: "list[zarr.Array[ArrayV3Metadata]]",
    regions: list[tuple[slice, ...]] | None = None,
    split_every: int | None = None,
    **store_kwargs: Any,
) -> ForkSession:
    """
    A version of ``dask.array.store`` for Icechunk stores.

    This method will eagerly execute writes to the Icechunk store, and will
    merge the changesets corresponding to each write task. The `store` object
    passed in will be updated in-place with the fully merged changeset.

    For distributed or multi-processing writes, this method must be called within
    the `Session.allow_pickling()` context. All Zarr arrays in `targets` must also
    be created within this context since they contain a reference to the Session.

    Parameters
    ----------
    sources: list of `dask.array.Array`
        List of dask arrays to write.
    targets : list of `zarr.Array`
        Corresponding list of Zarr array objects to write to.
    regions: list of tuple of slice, optional
        Corresponding region for each of `targets` to write to.
    split_every: int, optional
        Number of changesets to merge at a given time.
    **store_kwargs:
        Arbitrary keyword arguments passed to `dask.array.store`. Notably `compute`,
        `return_stored`, `load_stored`, and `lock` are unsupported.
    """
    _assert_correct_dask_version()
    stored_arrays = dask.array.store(
        sources=sources,
        targets=targets,  # type: ignore[arg-type]
        regions=regions,
        compute=False,
        return_stored=True,
        load_stored=False,
        lock=False,
        **store_kwargs,
    )
    return session_merge_reduction(stored_arrays, split_every=split_every, **store_kwargs)
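
Example (a minimal sketch mirroring the fork/merge flow that to_icechunk uses above; the array shapes, chunk sizes, and zarr group layout are placeholders):

import dask.array
import zarr

import icechunk
from icechunk.dask import store_dask

repo = icechunk.Repository.create(icechunk.in_memory_storage())
session = repo.writable_session("main")
fork = session.fork()

# Create the target array through the fork's store so it can be pickled
# into the dask write tasks.
group = zarr.group(store=fork.store, overwrite=True)
target = group.create_array("data", shape=(100,), chunks=(10,), dtype="f8")
source = dask.array.random.random((100,), chunks=(10,))

fork_session = store_dask(sources=[source], targets=[target])
session.merge(fork_session)  # fold the write tasks' changesets back in
session.commit("write data")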