
Python API Reference

icechunk #

Modules:

Name Description
credentials
dask
distributed
repository
session
storage
store
xarray

Classes:

Name Description
AzureCredentials

Credentials for an Azure storage backend

AzureStaticCredentials

Credentials for an Azure storage backend

BasicConflictSolver

A basic conflict solver that allows for simple configuration of resolution behavior

CachingConfig

Configuration for how Icechunk caches its metadata files

ChunkType

Enum for Zarr chunk types

CompressionAlgorithm

Enum for selecting the compression algorithm used by Icechunk to write its metadata files

CompressionConfig

Configuration for how Icechunk compresses its metadata files

Conflict

A conflict detected between snapshots

ConflictDetector

A conflict solver that can be used to detect conflicts between two stores, but does not resolve them

ConflictError

An error that occurs when a conflict is detected

ConflictSolver

An abstract conflict solver that can be used to detect or resolve conflicts between two stores

ConflictType

Type of conflict detected

Diff

The result of comparing two snapshots

ForkSession
GCSummary

Summarizes the results of a garbage collection operation on an Icechunk repo

GcsBearerCredential

Credentials for a Google Cloud Storage backend

GcsCredentials

Credentials for a Google Cloud Storage backend

GcsStaticCredentials

Credentials for a Google Cloud Storage backend

IcechunkError

Base class for all Icechunk errors

IcechunkStore
ManifestConfig

Configuration for how Icechunk manages its manifests

ManifestFileInfo

Manifest file metadata

ManifestPreloadCondition

Configuration for conditions under which manifests will preload on session creation

ManifestPreloadConfig

Configuration for how Icechunk preloads manifests on session creation

ManifestSplitCondition

Configuration for conditions under which manifests will be split

ManifestSplitDimCondition

Conditions for specifying dimensions along which to shard manifests.

ManifestSplittingConfig

Configuration for manifest splitting.

RebaseFailedError

An error that occurs when a rebase operation fails

Repository

An Icechunk repository.

RepositoryConfig

Configuration for an Icechunk repository

S3Credentials

Credentials for an S3 storage backend

S3Options

Options for accessing an S3-compatible storage backend

S3StaticCredentials

Credentials for an S3 storage backend

Session

A session object that allows for reading and writing data from an Icechunk repository.

SessionMode

Enum for session access modes

SnapshotInfo

Metadata for a snapshot

Storage

Storage configuration for an IcechunkStore

StorageConcurrencySettings

Configuration for how Icechunk uses its Storage instance

StorageRetriesSettings

Configuration for how Icechunk retries requests.

StorageSettings

Configuration for how Icechunk uses its Storage instance

VersionSelection

Enum for selecting which version of a conflict to use

VirtualChunkContainer

A virtual chunk container is a configuration that allows Icechunk to read virtual references from a storage backend.

VirtualChunkSpec

The specification for a virtual chunk reference.

Functions:

Name Description
_upgrade_icechunk_repository

Migrate a repository to the latest version of Icechunk.

azure_credentials

Create credentials for an Azure Blob Storage object store.

azure_from_env_credentials

Instruct the Azure Blob Storage object store to fetch credentials from the operating system environment.

azure_static_credentials

Create static credentials for an Azure Blob Storage object store.

azure_storage

Create a Storage instance that saves data in an Azure Blob Storage object store.

containers_credentials

Build a map of credentials for virtual chunk containers.

gcs_credentials

Create credentials for a Google Cloud Storage object store.

gcs_from_env_credentials

Instruct the Google Cloud Storage object store to fetch credentials from the operating system environment.

gcs_refreshable_credentials

Create refreshable credentials for a Google Cloud Storage object store.

gcs_static_credentials

Create static credentials for a Google Cloud Storage object store.

gcs_storage

Create a Storage instance that saves data in a Google Cloud Storage object store.

gcs_store

Build an ObjectStoreConfig instance for Google Cloud Storage object stores.

http_storage

Create a read-only Storage instance that reads data from an HTTP(S) server

http_store

Build an ObjectStoreConfig instance for HTTP object stores.

in_memory_storage

Create a Storage instance that saves data in memory.

initialize_logs

Initialize the logging system for the library.

local_filesystem_storage

Create a Storage instance that saves data in the local file system.

local_filesystem_store

Build an ObjectStoreConfig instance for local file stores.

r2_storage

Create a Storage instance that saves data in a Cloudflare R2 object store.

s3_anonymous_credentials

Create no-signature credentials for S3 and S3-compatible object stores.

s3_credentials

Create credentials for S3 and S3-compatible object stores.

s3_from_env_credentials

Instruct S3 and S3-compatible object stores to gather credentials from the operating system environment.

s3_refreshable_credentials

Create refreshable credentials for S3 and S3-compatible object stores.

s3_static_credentials

Create static credentials for S3 and S3-compatible object stores.

s3_storage

Create a Storage instance that saves data in S3 or S3-compatible object stores.

s3_store

Build an ObjectStoreConfig instance for S3 or S3-compatible object stores.

set_logs_filter

Set filters and log levels for the different modules.

spec_version

The version of the Icechunk specification that the library is compatible with.

tigris_storage

Create a Storage instance that saves data in a Tigris object store.
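
As a quick orientation before the individual entries below, here is a minimal end-to-end sketch that combines several of these pieces: it creates a repository on the local filesystem, writes an array through a session's store, and commits. The path and array names are illustrative only.

import icechunk
import zarr

# Create (or open) a repository backed by the local filesystem.
storage = icechunk.local_filesystem_storage("/tmp/icechunk-example")
repo = icechunk.Repository.open_or_create(storage)

# Start a writable session on the main branch and write through zarr.
session = repo.writable_session("main")
root = zarr.group(store=session.store, overwrite=True)
arr = root.create_array("data", shape=(100,), dtype="int32")
arr[:] = 42

# Commit the changes as a new snapshot.
snapshot_id = session.commit("write example data")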

AzureCredentials #

Credentials for an Azure storage backend

This can be used to authenticate with an Azure storage backend.

Classes:

Name Description
FromEnv

Uses credentials from environment variables

Static

Uses Azure credentials without expiration

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class AzureCredentials:
    """Credentials for an azure storage backend

    This can be used to authenticate with an azure storage backend.
    """
    class FromEnv:
        """Uses credentials from environment variables"""
        def __init__(self) -> None: ...

    class Static:
        """Uses azure credentials without expiration"""
        def __init__(self, credentials: AnyAzureStaticCredential) -> None: ...

FromEnv #

Uses credentials from environment variables

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class FromEnv:
    """Uses credentials from environment variables"""
    def __init__(self) -> None: ...

Static #

Uses Azure credentials without expiration

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class Static:
    """Uses azure credentials without expiration"""
    def __init__(self, credentials: AnyAzureStaticCredential) -> None: ...
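
A short sketch of how these credential variants are typically consumed via the azure_storage helper listed above; the keyword names below (account, container, prefix, from_env) are assumptions about that helper's signature.

import icechunk

# Fetch credentials from environment variables; from_env=True selects the
# AzureCredentials.FromEnv variant under the hood (assumed behavior).
storage = icechunk.azure_storage(
    account="myaccount",
    container="mycontainer",
    prefix="path/to/repo",
    from_env=True,
)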

AzureStaticCredentials #

Credentials for an Azure storage backend

Classes:

Name Description
AccessKey

Credentials for an Azure storage backend using an access key

BearerToken

Credentials for an Azure storage backend using a bearer token

SasToken

Credentials for an Azure storage backend using a shared access signature token

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class AzureStaticCredentials:
    """Credentials for an azure storage backend"""
    class AccessKey:
        """Credentials for an azure storage backend using an access key

        Parameters
        ----------
        key: str
            The access key to use for authentication.
        """
        def __init__(self, key: str) -> None: ...

    class SasToken:
        """Credentials for an azure storage backend using a shared access signature token

        Parameters
        ----------
        token: str
            The shared access signature token to use for authentication.
        """
        def __init__(self, token: str) -> None: ...

    class BearerToken:
        """Credentials for an azure storage backend using a bearer token

        Parameters
        ----------
        token: str
            The bearer token to use for authentication.
        """
        def __init__(self, token: str) -> None: ...

AccessKey #

Credentials for an Azure storage backend using an access key

Parameters:

Name Type Description Default
key str

The access key to use for authentication.

required
Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class AccessKey:
    """Credentials for an azure storage backend using an access key

    Parameters
    ----------
    key: str
        The access key to use for authentication.
    """
    def __init__(self, key: str) -> None: ...

BearerToken #

Credentials for an Azure storage backend using a bearer token

Parameters:

Name Type Description Default
token str

The bearer token to use for authentication.

required
Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class BearerToken:
    """Credentials for an azure storage backend using a bearer token

    Parameters
    ----------
    token: str
        The bearer token to use for authentication.
    """
    def __init__(self, token: str) -> None: ...

SasToken #

Credentials for an Azure storage backend using a shared access signature token

Parameters:

Name Type Description Default
token str

The shared access signature token to use for authentication.

required
Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class SasToken:
    """Credentials for an azure storage backend using a shared access signature token

    Parameters
    ----------
    token: str
        The shared access signature token to use for authentication.
    """
    def __init__(self, token: str) -> None: ...
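
The constructors shown in the stubs above compose directly. For instance, a static SAS-token credential (the token value is a placeholder):

import icechunk

# Wrap a shared access signature token as a non-expiring static credential.
credentials = icechunk.AzureCredentials.Static(
    icechunk.AzureStaticCredentials.SasToken("<sas-token>")
)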

BasicConflictSolver #

Bases: ConflictSolver

A basic conflict solver that allows for simple configuration of resolution behavior

This conflict solver allows for simple configuration of resolution behavior for conflicts that may occur during a rebase operation. It will attempt to resolve a limited set of conflicts based on the configuration options provided.

  • When a chunk conflict is encountered, the behavior is determined by the on_chunk_conflict option
  • When an array is deleted that has been updated, fail_on_delete_of_updated_array will determine whether to fail the rebase operation
  • When a group is deleted that has been updated, fail_on_delete_of_updated_group will determine whether to fail the rebase operation

Methods:

Name Description
__init__

Create a BasicConflictSolver object with the given configuration options

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class BasicConflictSolver(ConflictSolver):
    """A basic conflict solver that allows for simple configuration of resolution behavior

    This conflict solver allows for simple configuration of resolution behavior for conflicts that may occur during a rebase operation.
    It will attempt to resolve a limited set of conflicts based on the configuration options provided.

    - When a chunk conflict is encountered, the behavior is determined by the `on_chunk_conflict` option
    - When an array is deleted that has been updated, `fail_on_delete_of_updated_array` will determine whether to fail the rebase operation
    - When a group is deleted that has been updated, `fail_on_delete_of_updated_group` will determine whether to fail the rebase operation
    """

    def __init__(
        self,
        *,
        on_chunk_conflict: VersionSelection = VersionSelection.UseOurs,
        fail_on_delete_of_updated_array: bool = False,
        fail_on_delete_of_updated_group: bool = False,
    ) -> None:
        """Create a BasicConflictSolver object with the given configuration options

        Parameters
        ----------
        on_chunk_conflict: VersionSelection
            The behavior to use when a chunk conflict is encountered, by default VersionSelection.UseOurs
        fail_on_delete_of_updated_array: bool
            Whether to fail when an array is deleted that has been updated, by default False
        fail_on_delete_of_updated_group: bool
            Whether to fail when a group is deleted that has been updated, by default False
        """
        ...
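
For example, a solver that keeps this session's chunks on conflict can be passed to commit through its rebase_with parameter (shown in the Session API below); `session` here is assumed to be a writable Session with pending changes.

import icechunk

solver = icechunk.BasicConflictSolver(
    on_chunk_conflict=icechunk.VersionSelection.UseOurs,
    fail_on_delete_of_updated_array=True,
)
# Attempt the commit, rebasing with the solver if the branch has moved.
session.commit("my changes", rebase_with=solver)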

__init__ #

__init__(*, on_chunk_conflict=VersionSelection.UseOurs, fail_on_delete_of_updated_array=False, fail_on_delete_of_updated_group=False)

Create a BasicConflictSolver object with the given configuration options

Parameters:

Name Type Description Default
on_chunk_conflict VersionSelection

The behavior to use when a chunk conflict is encountered, by default VersionSelection.UseOurs

UseOurs
fail_on_delete_of_updated_array bool

Whether to fail when an array is deleted that has been updated, by default False

False
fail_on_delete_of_updated_group bool

Whether to fail when a group is deleted that has been updated, by default False

False
Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
def __init__(
    self,
    *,
    on_chunk_conflict: VersionSelection = VersionSelection.UseOurs,
    fail_on_delete_of_updated_array: bool = False,
    fail_on_delete_of_updated_group: bool = False,
) -> None:
    """Create a BasicConflictSolver object with the given configuration options

    Parameters
    ----------
    on_chunk_conflict: VersionSelection
        The behavior to use when a chunk conflict is encountered, by default VersionSelection.UseOurs
    fail_on_delete_of_updated_array: bool
        Whether to fail when an array is deleted that has been updated, by default False
    fail_on_delete_of_updated_group: bool
        Whether to fail when a group is deleted that has been updated, by default False
    """
    ...

CachingConfig #

Configuration for how Icechunk caches its metadata files

Methods:

Name Description
__init__

Create a new CachingConfig object

Attributes:

Name Type Description
num_bytes_attributes int | None

The number of bytes of attributes to cache.

num_bytes_chunks int | None

The number of bytes of chunks to cache.

num_chunk_refs int | None

The number of chunk references to cache.

num_snapshot_nodes int | None

The number of snapshot nodes to cache.

num_transaction_changes int | None

The number of transaction changes to cache.

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class CachingConfig:
    """Configuration for how Icechunk caches its metadata files"""

    def __init__(
        self,
        num_snapshot_nodes: int | None = None,
        num_chunk_refs: int | None = None,
        num_transaction_changes: int | None = None,
        num_bytes_attributes: int | None = None,
        num_bytes_chunks: int | None = None,
    ) -> None:
        """
        Create a new `CachingConfig` object

        Parameters
        ----------
        num_snapshot_nodes: int | None
            The number of snapshot nodes to cache.
        num_chunk_refs: int | None
            The number of chunk references to cache.
        num_transaction_changes: int | None
            The number of transaction changes to cache.
        num_bytes_attributes: int | None
            The number of bytes of attributes to cache.
        num_bytes_chunks: int | None
            The number of bytes of chunks to cache.
        """
    @property
    def num_snapshot_nodes(self) -> int | None:
        """
        The number of snapshot nodes to cache.

        Returns
        -------
        int | None
            The number of snapshot nodes to cache.
        """
        ...
    @num_snapshot_nodes.setter
    def num_snapshot_nodes(self, value: int | None) -> None:
        """
        Set the number of snapshot nodes to cache.

        Parameters
        ----------
        value: int | None
            The number of snapshot nodes to cache.
        """
        ...
    @property
    def num_chunk_refs(self) -> int | None:
        """
        The number of chunk references to cache.

        Returns
        -------
        int | None
            The number of chunk references to cache.
        """
        ...
    @num_chunk_refs.setter
    def num_chunk_refs(self, value: int | None) -> None:
        """
        Set the number of chunk references to cache.

        Parameters
        ----------
        value: int | None
            The number of chunk references to cache.
        """
        ...
    @property
    def num_transaction_changes(self) -> int | None:
        """
        The number of transaction changes to cache.

        Returns
        -------
        int | None
            The number of transaction changes to cache.
        """
        ...
    @num_transaction_changes.setter
    def num_transaction_changes(self, value: int | None) -> None:
        """
        Set the number of transaction changes to cache.

        Parameters
        ----------
        value: int | None
            The number of transaction changes to cache.
        """
        ...
    @property
    def num_bytes_attributes(self) -> int | None:
        """
        The number of bytes of attributes to cache.

        Returns
        -------
        int | None
            The number of bytes of attributes to cache.
        """
        ...
    @num_bytes_attributes.setter
    def num_bytes_attributes(self, value: int | None) -> None:
        """
        Set the number of bytes of attributes to cache.

        Parameters
        ----------
        value: int | None
            The number of bytes of attributes to cache.
        """
        ...
    @property
    def num_bytes_chunks(self) -> int | None:
        """
        The number of bytes of chunks to cache.

        Returns
        -------
        int | None
            The number of bytes of chunks to cache.
        """
        ...
    @num_bytes_chunks.setter
    def num_bytes_chunks(self, value: int | None) -> None:
        """
        Set the number of bytes of chunks to cache.

        Parameters
        ----------
        value: int | None
            The number of bytes of chunks to cache.
        """
        ...
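
A sketch of tuning the cache through the repository configuration; attaching a CachingConfig via RepositoryConfig.caching is an assumption based on the configuration API, and the numbers are illustrative.

import icechunk

config = icechunk.RepositoryConfig.default()
config.caching = icechunk.CachingConfig(num_chunk_refs=500_000)
# `storage` is an existing Storage instance defined elsewhere.
repo = icechunk.Repository.open(storage, config=config)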

num_bytes_attributes property writable #

num_bytes_attributes

The number of bytes of attributes to cache.

Returns:

Type Description
int | None

The number of bytes of attributes to cache.

num_bytes_chunks property writable #

num_bytes_chunks

The number of bytes of chunks to cache.

Returns:

Type Description
int | None

The number of bytes of chunks to cache.

num_chunk_refs property writable #

num_chunk_refs

The number of chunk references to cache.

Returns:

Type Description
int | None

The number of chunk references to cache.

num_snapshot_nodes property writable #

num_snapshot_nodes

The number of snapshot nodes to cache.

Returns:

Type Description
int | None

The number of snapshot nodes to cache.

num_transaction_changes property writable #

num_transaction_changes

The number of transaction changes to cache.

Returns:

Type Description
int | None

The number of transaction changes to cache.

__init__ #

__init__(num_snapshot_nodes=None, num_chunk_refs=None, num_transaction_changes=None, num_bytes_attributes=None, num_bytes_chunks=None)

Create a new CachingConfig object

Parameters:

Name Type Description Default
num_snapshot_nodes int | None

The number of snapshot nodes to cache.

None
num_chunk_refs int | None

The number of chunk references to cache.

None
num_transaction_changes int | None

The number of transaction changes to cache.

None
num_bytes_attributes int | None

The number of bytes of attributes to cache.

None
num_bytes_chunks int | None

The number of bytes of chunks to cache.

None
Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
def __init__(
    self,
    num_snapshot_nodes: int | None = None,
    num_chunk_refs: int | None = None,
    num_transaction_changes: int | None = None,
    num_bytes_attributes: int | None = None,
    num_bytes_chunks: int | None = None,
) -> None:
    """
    Create a new `CachingConfig` object

    Parameters
    ----------
    num_snapshot_nodes: int | None
        The number of snapshot nodes to cache.
    num_chunk_refs: int | None
        The number of chunk references to cache.
    num_transaction_changes: int | None
        The number of transaction changes to cache.
    num_bytes_attributes: int | None
        The number of bytes of attributes to cache.
    num_bytes_chunks: int | None
        The number of bytes of chunks to cache.
    """

ChunkType #

Bases: Enum

Enum for Zarr chunk types

Attributes:

Name Type Description
Uninitialized int

Chunk doesn't have a materialized type yet

Native int

Regular Zarr chunks

Virtual int

Chunk conforming to the VirtualiZarr spec

Inline int

Chunk is stored inline in the manifest

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class ChunkType(Enum):
    """Enum for Zarr chunk types

    Attributes
    ----------
    Uninitialized: int
        Chunk doesn't have a materialized type yet
    Native: int
        Regular Zarr chunks
    Virtual: int
        Chunk conforming to the VirtualiZarr spec
    Inline: int
        Chunk is stored inline in the manifest
    """

    UNINITIALIZED = 0
    NATIVE = 1
    VIRTUAL = 2
    INLINE = 3

CompressionAlgorithm #

Bases: Enum

Enum for selecting the compression algorithm used by Icechunk to write its metadata files

Attributes:

Name Type Description
Zstd int

The Zstd compression algorithm.

Methods:

Name Description
default

The default compression algorithm used by Icechunk to write its metadata files.

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class CompressionAlgorithm(Enum):
    """Enum for selecting the compression algorithm used by Icechunk to write its metadata files

    Attributes
    ----------
    Zstd: int
        The Zstd compression algorithm.
    """

    Zstd = 0

    def __init__(self) -> None: ...
    @staticmethod
    def default() -> CompressionAlgorithm:
        """
        The default compression algorithm used by Icechunk to write its metadata files.

        Returns
        -------
        CompressionAlgorithm
            The default compression algorithm.
        """
        ...

default staticmethod #

default()

The default compression algorithm used by Icechunk to write its metadata files.

Returns:

Type Description
CompressionAlgorithm

The default compression algorithm.

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
@staticmethod
def default() -> CompressionAlgorithm:
    """
    The default compression algorithm used by Icechunk to write its metadata files.

    Returns
    -------
    CompressionAlgorithm
        The default compression algorithm.
    """
    ...

CompressionConfig #

Configuration for how Icechunk compresses its metadata files

Methods:

Name Description
__init__

Create a new CompressionConfig object

default

The default compression configuration used by Icechunk to write its metadata files.

Attributes:

Name Type Description
algorithm CompressionAlgorithm | None

The compression algorithm used by Icechunk to write its metadata files.

level int | None

The compression level used by Icechunk to write its metadata files.

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class CompressionConfig:
    """Configuration for how Icechunk compresses its metadata files"""

    def __init__(
        self, algorithm: CompressionAlgorithm | None = None, level: int | None = None
    ) -> None:
        """
        Create a new `CompressionConfig` object

        Parameters
        ----------
        algorithm: CompressionAlgorithm | None
            The compression algorithm to use.
        level: int | None
            The compression level to use.
        """
        ...
    @property
    def algorithm(self) -> CompressionAlgorithm | None:
        """
        The compression algorithm used by Icechunk to write its metadata files.

        Returns
        -------
        CompressionAlgorithm | None
            The compression algorithm used by Icechunk to write its metadata files.
        """
        ...
    @algorithm.setter
    def algorithm(self, value: CompressionAlgorithm | None) -> None:
        """
        Set the compression algorithm used by Icechunk to write its metadata files.

        Parameters
        ----------
        value: CompressionAlgorithm | None
            The compression algorithm to use.
        """
        ...
    @property
    def level(self) -> int | None:
        """
        The compression level used by Icechunk to write its metadata files.

        Returns
        -------
        int | None
            The compression level used by Icechunk to write its metadata files.
        """
        ...
    @level.setter
    def level(self, value: int | None) -> None:
        """
        Set the compression level used by Icechunk to write its metadata files.

        Parameters
        ----------
        value: int | None
            The compression level to use.
        """
        ...
    @staticmethod
    def default() -> CompressionConfig:
        """
        The default compression configuration used by Icechunk to write its metadata files.

        Returns
        -------
        CompressionConfig
        """

algorithm property writable #

algorithm

The compression algorithm used by Icechunk to write its metadata files.

Returns:

Type Description
CompressionAlgorithm | None

The compression algorithm used by Icechunk to write its metadata files.

level property writable #

level

The compression level used by Icechunk to write its metadata files.

Returns:

Type Description
int | None

The compression level used by Icechunk to write its metadata files.

__init__ #

__init__(algorithm=None, level=None)

Create a new CompressionConfig object

Parameters:

Name Type Description Default
algorithm CompressionAlgorithm | None

The compression algorithm to use.

None
level int | None

The compression level to use.

None
Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
def __init__(
    self, algorithm: CompressionAlgorithm | None = None, level: int | None = None
) -> None:
    """
    Create a new `CompressionConfig` object

    Parameters
    ----------
    algorithm: CompressionAlgorithm | None
        The compression algorithm to use.
    level: int | None
        The compression level to use.
    """
    ...

default staticmethod #

default()

The default compression configuration used by Icechunk to write its metadata files.

Returns:

Type Description
CompressionConfig
Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
@staticmethod
def default() -> CompressionConfig:
    """
    The default compression configuration used by Icechunk to write its metadata files.

    Returns
    -------
    CompressionConfig
    """

Conflict #

A conflict detected between snapshots

Methods:

Name Description
__init__

Create a new Conflict.

Attributes:

Name Type Description
conflict_type ConflictType

The type of conflict detected

conflicted_chunks list[list[int]] | None

If the conflict is a chunk conflict, this will return the list of chunk indices that are in conflict

path str

The path of the node that caused the conflict

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class Conflict:
    """A conflict detected between snapshots"""

    def __init__(
        self,
        conflict_type: ConflictType,
        path: str,
        conflicted_chunks: list[list[int]] | None = None,
    ) -> None:
        """
        Create a new Conflict.

        Parameters
        ----------
        conflict_type: ConflictType
            The type of conflict.
        path: str
            The path of the node that caused the conflict.
        conflicted_chunks: list[list[int]] | None
            If the conflict is a chunk conflict, the list of chunk indices in conflict.
        """
        ...

    @property
    def conflict_type(self) -> ConflictType:
        """The type of conflict detected

        Returns:
            ConflictType: The type of conflict detected
        """
        ...

    @property
    def path(self) -> str:
        """The path of the node that caused the conflict

        Returns:
            str: The path of the node that caused the conflict
        """
        ...

    @property
    def conflicted_chunks(self) -> list[list[int]] | None:
        """If the conflict is a chunk conflict, this will return the list of chunk indices that are in conflict

        Returns:
            list[list[int]] | None: The list of chunk indices that are in conflict
        """
        ...

conflict_type property #

conflict_type

The type of conflict detected

Returns: ConflictType: The type of conflict detected

conflicted_chunks property #

conflicted_chunks

If the conflict is a chunk conflict, this will return the list of chunk indices that are in conflict

Returns: list[list[int]] | None: The list of chunk indices that are in conflict

path property #

path

The path of the node that caused the conflict

Returns: str: The path of the node that caused the conflict

__init__ #

__init__(conflict_type, path, conflicted_chunks=None)

Create a new Conflict.

Parameters:

Name Type Description Default
conflict_type ConflictType

The type of conflict.

required
path str

The path of the node that caused the conflict.

required
conflicted_chunks list[list[int]] | None

If the conflict is a chunk conflict, the list of chunk indices in conflict.

None
Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
def __init__(
    self,
    conflict_type: ConflictType,
    path: str,
    conflicted_chunks: list[list[int]] | None = None,
) -> None:
    """
    Create a new Conflict.

    Parameters
    ----------
    conflict_type: ConflictType
        The type of conflict.
    path: str
        The path of the node that caused the conflict.
    conflicted_chunks: list[list[int]] | None
        If the conflict is a chunk conflict, the list of chunk indices in conflict.
    """
    ...

ConflictDetector #

Bases: ConflictSolver

A conflict solver that can be used to detect conflicts between two stores, but does not resolve them

Where the BasicConflictSolver will attempt to resolve conflicts, the ConflictDetector will only detect them. This means that during a rebase operation the ConflictDetector will raise a RebaseFailedError if any conflicts are detected, allowing the rebase operation to be retried with a different conflict resolution strategy. Otherwise, if no conflicts are detected, the rebase operation will succeed.

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class ConflictDetector(ConflictSolver):
    """A conflict solver that can be used to detect conflicts between two stores, but does not resolve them

    Where the `BasicConflictSolver` will attempt to resolve conflicts, the `ConflictDetector` will only detect them. This means
    that during a rebase operation the `ConflictDetector` will raise a `RebaseFailed` error if any conflicts are detected, and
    allow the rebase operation to be retried with a different conflict resolution strategy. Otherwise, if no conflicts are detected
    the rebase operation will succeed.
    """

    def __init__(self) -> None: ...
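
A sketch of detect-only rebasing during a commit; that RebaseFailedError exposes the detected Conflict objects through a conflicts attribute is an assumption, and `session` is assumed to be a writable Session.

import icechunk

try:
    session.commit("my changes", rebase_with=icechunk.ConflictDetector())
except icechunk.RebaseFailedError as e:
    # Inspect the conflicts, then retry with a real resolution strategy.
    for conflict in e.conflicts:
        print(conflict.conflict_type, conflict.path, conflict.conflicted_chunks)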

ConflictError #

Bases: Exception

An error that occurs when a conflict is detected

Methods:

Name Description
__init__

Create a new ConflictError.

Attributes:

Name Type Description
actual_parent str

The actual parent snapshot ID of the branch that the session attempted to commit to.

expected_parent str

The expected parent snapshot ID.

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class ConflictError(Exception):
    """An error that occurs when a conflict is detected"""

    def __init__(
        self,
        expected_parent: str | None = None,
        actual_parent: str | None = None,
    ) -> None:
        """
        Create a new ConflictError.

        Parameters
        ----------
        expected_parent: str | None
            The expected parent snapshot ID.
        actual_parent: str | None
            The actual parent snapshot ID of the branch.
        """
        ...

    @property
    def expected_parent(self) -> str:
        """The expected parent snapshot ID.

        This is the snapshot ID that the session was based on when the
        commit operation was called.
        """
        ...
    @property
    def actual_parent(self) -> str:
        """
        The actual parent snapshot ID of the branch that the session attempted to commit to.

        When the session is based on a branch, this is the snapshot ID of the branch tip. If this
        error is raised, it means the branch was modified and committed by another session after
        the session was created.
        """
        ...
    ...
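
The properties below make this error actionable when a commit races with another writer; a minimal sketch, with `session` assumed to be a writable Session:

import icechunk

try:
    session.commit("my changes")
except icechunk.ConflictError as e:
    print(f"branch moved: expected {e.expected_parent}, found {e.actual_parent}")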

actual_parent property #

actual_parent

The actual parent snapshot ID of the branch that the session attempted to commit to.

When the session is based on a branch, this is the snapshot ID of the branch tip. If this error is raised, it means the branch was modified and committed by another session after the session was created.

expected_parent property #

expected_parent

The expected parent snapshot ID.

This is the snapshot ID that the session was based on when the commit operation was called.

__init__ #

__init__(expected_parent=None, actual_parent=None)

Create a new ConflictError.

Parameters:

Name Type Description Default
expected_parent str | None

The expected parent snapshot ID.

None
actual_parent str | None

The actual parent snapshot ID of the branch.

None
Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
def __init__(
    self,
    expected_parent: str | None = None,
    actual_parent: str | None = None,
) -> None:
    """
    Create a new ConflictError.

    Parameters
    ----------
    expected_parent: str | None
        The expected parent snapshot ID.
    actual_parent: str | None
        The actual parent snapshot ID of the branch.
    """
    ...

ConflictSolver #

An abstract conflict solver that can be used to detect or resolve conflicts between two stores

This should never be used directly, but should be subclassed to provide specific conflict resolution behavior

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class ConflictSolver:
    """An abstract conflict solver that can be used to detect or resolve conflicts between two stores

    This should never be used directly, but should be subclassed to provide specific conflict resolution behavior
    """

    ...

ConflictType #

Bases: Enum

Type of conflict detected

Attributes:

Name Type Description
ChunkDoubleUpdate

A chunk update conflicts with an existing chunk update

ChunksUpdatedInDeletedArray

Chunks are updated in a deleted array

ChunksUpdatedInUpdatedArray

Chunks are updated in an updated array

DeleteOfUpdatedArray

A delete is attempted on an updated array

DeleteOfUpdatedGroup

A delete is attempted on an updated group

MoveOperationCannotBeRebased

A move operation cannot be rebased

NewNodeConflictsWithExistingNode

A new node conflicts with an existing node

NewNodeInInvalidGroup

A new node is in an invalid group

ZarrMetadataDoubleUpdate

A zarr metadata update conflicts with an existing zarr metadata update

ZarrMetadataUpdateOfDeletedArray

A zarr metadata update is attempted on a deleted array

ZarrMetadataUpdateOfDeletedGroup

A zarr metadata update is attempted on a deleted group

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class ConflictType(Enum):
    """Type of conflict detected"""

    NewNodeConflictsWithExistingNode = (1,)
    """A new node conflicts with an existing node"""

    NewNodeInInvalidGroup = (2,)
    """A new node is in an invalid group"""

    ZarrMetadataDoubleUpdate = (3,)
    """A zarr metadata update conflicts with an existing zarr metadata update"""

    ZarrMetadataUpdateOfDeletedArray = (4,)
    """A zarr metadata update is attempted on a deleted array"""

    ZarrMetadataUpdateOfDeletedGroup = (5,)
    """A zarr metadata update is attempted on a deleted group"""

    ChunkDoubleUpdate = (6,)
    """A chunk update conflicts with an existing chunk update"""

    ChunksUpdatedInDeletedArray = (7,)
    """Chunks are updated in a deleted array"""

    ChunksUpdatedInUpdatedArray = (8,)
    """Chunks are updated in an updated array"""

    DeleteOfUpdatedArray = (9,)
    """A delete is attempted on an updated array"""

    DeleteOfUpdatedGroup = (10,)
    """A delete is attempted on an updated group"""

    MoveOperationCannotBeRebased = (11,)
    """Move operation cannot be rebased"""

ChunkDoubleUpdate class-attribute instance-attribute #

ChunkDoubleUpdate = (6,)

A chunk update conflicts with an existing chunk update

ChunksUpdatedInDeletedArray class-attribute instance-attribute #

ChunksUpdatedInDeletedArray = (7,)

Chunks are updated in a deleted array

ChunksUpdatedInUpdatedArray class-attribute instance-attribute #

ChunksUpdatedInUpdatedArray = (8,)

Chunks are updated in an updated array

DeleteOfUpdatedArray class-attribute instance-attribute #

DeleteOfUpdatedArray = (9,)

A delete is attempted on an updated array

DeleteOfUpdatedGroup class-attribute instance-attribute #

DeleteOfUpdatedGroup = (10,)

A delete is attempted on an updated group

NewNodeConflictsWithExistingNode class-attribute instance-attribute #

NewNodeConflictsWithExistingNode = (1,)

A new node conflicts with an existing node

NewNodeInInvalidGroup class-attribute instance-attribute #

NewNodeInInvalidGroup = (2,)

A new node is in an invalid group

ZarrMetadataDoubleUpdate class-attribute instance-attribute #

ZarrMetadataDoubleUpdate = (3,)

A zarr metadata update conflicts with an existing zarr metadata update

ZarrMetadataUpdateOfDeletedArray class-attribute instance-attribute #

ZarrMetadataUpdateOfDeletedArray = (4,)

A zarr metadata update is attempted on a deleted array

ZarrMetadataUpdateOfDeletedGroup class-attribute instance-attribute #

ZarrMetadataUpdateOfDeletedGroup = (5,)

A zarr metadata update is attempted on a deleted group

Diff #

The result of comparing two snapshots

Methods:

Name Description
is_empty

Returns True if the diff contains no changes.

Attributes:

Name Type Description
deleted_arrays set[str]

The arrays that were deleted in the target ref.

deleted_groups set[str]

The groups that were deleted in the target ref.

moved_nodes list[tuple[str, str]]

The list of node moves, in order of application, as tuples (from_path, to_path).

new_arrays set[str]

The arrays that were added to the target ref.

new_groups set[str]

The groups that were added to the target ref.

updated_arrays set[str]

The arrays that were updated via zarr metadata in the target ref.

updated_chunks dict[str, list[list[int]]]

The chunk indices that had data updated in the target ref, keyed by the path to the array.

updated_groups set[str]

The groups that were updated via zarr metadata in the target ref.

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class Diff:
    """The result of comparing two snapshots"""
    def is_empty(self) -> bool:
        """
        Returns True if the diff contains no changes.
        """
        ...
    @property
    def new_groups(self) -> set[str]:
        """
        The groups that were added to the target ref.
        """
        ...
    @property
    def new_arrays(self) -> set[str]:
        """
        The arrays that were added to the target ref.
        """
        ...
    @property
    def deleted_groups(self) -> set[str]:
        """
        The groups that were deleted in the target ref.
        """
        ...
    @property
    def deleted_arrays(self) -> set[str]:
        """
        The arrays that were deleted in the target ref.
        """
        ...
    @property
    def updated_groups(self) -> set[str]:
        """
        The groups that were updated via zarr metadata in the target ref.
        """
        ...
    @property
    def updated_arrays(self) -> set[str]:
        """
        The arrays that were updated via zarr metadata in the target ref.
        """
        ...
    @property
    def updated_chunks(self) -> dict[str, list[list[int]]]:
        """
        The chunk indices that had data updated in the target ref, keyed by the path to the array.
        """
        ...
    @property
    def moved_nodes(self) -> list[tuple[str, str]]:
        """
        The list of node moves, in order of application, as tuples (from_path, to_path).
        """
        ...

deleted_arrays property #

deleted_arrays

The arrays that were deleted in the target ref.

deleted_groups property #

deleted_groups

The groups that were deleted in the target ref.

moved_nodes property #

moved_nodes

The list of node moves, in order of application, as tuples (from_path, to_path).

new_arrays property #

new_arrays

The arrays that were added to the target ref.

new_groups property #

new_groups

The groups that were added to the target ref.

updated_arrays property #

updated_arrays

The arrays that were updated via zarr metadata in the target ref.

updated_chunks property #

updated_chunks

The chunk indices that had data updated in the target ref, keyed by the path to the array.

updated_groups property #

updated_groups

The groups that were updated via zarr metadata in the target ref.

is_empty #

is_empty()

Returns True if the diff contains no changes.

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
def is_empty(self) -> bool:
    """
    Returns True if the diff contains no changes.
    """
    ...

ForkSession #

Bases: Session

Methods:

Name Description
merge_async

Merge the changes for this fork session with the changes from other fork sessions (async version).

Attributes:

Name Type Description
store IcechunkStore

Get a zarr Store object for reading and writing data from the repository using zarr python.

Source code in icechunk-python/python/icechunk/session.py
class ForkSession(Session):
    def __getstate__(self) -> object:
        state = {"_session": self._session.as_bytes()}
        return state

    def __setstate__(self, state: object) -> None:
        if not isinstance(state, dict):
            raise ValueError("Invalid state")
        self._session = PySession.from_bytes(state["_session"])

    def merge(self, *others: Self) -> None:
        for other in others:
            if not isinstance(other, ForkSession):
                raise TypeError(
                    f"A ForkSession can only be merged with another ForkSession. Received {type(other)} instead."
                )
            self._session.merge(other._session)

    async def merge_async(self, *others: Self) -> None:
        """
        Merge the changes for this fork session with the changes from other fork sessions (async version).

        Parameters
        ----------
        others : ForkSession
            The other fork sessions to merge changes from.
        """
        for other in others:
            if not isinstance(other, ForkSession):
                raise TypeError(
                    f"A ForkSession can only be merged with another ForkSession. Received {type(other)} instead."
                )
            await self._session.merge_async(other._session)

    def commit(
        self,
        message: str,
        metadata: dict[str, Any] | None = None,
        rebase_with: ConflictSolver | None = None,
        rebase_tries: int = 1_000,
        allow_empty: bool = False,
    ) -> NoReturn:
        raise TypeError(
            "Cannot commit a fork of a Session. If you are using uncooperative writes, "
            "please send the Repository object to your workers, not a Session. "
            "See https://icechunk.io/en/stable/icechunk-python/parallel/#distributed-writes for more."
        )

    async def commit_async(
        self,
        message: str,
        metadata: dict[str, Any] | None = None,
        rebase_with: ConflictSolver | None = None,
        rebase_tries: int = 1_000,
        allow_empty: bool = False,
    ) -> NoReturn:
        raise TypeError(
            "Cannot commit a fork of a Session. If you are using uncooperative writes, "
            "please send the Repository object to your workers, not a Session. "
            "See https://icechunk.io/en/stable/icechunk-python/parallel/#distributed-writes for more."
        )

    def flush(
        self,
        message: str,
        metadata: dict[str, Any] | None = None,
    ) -> NoReturn:
        raise TypeError(
            "Cannot flush a fork of a Session. If you are using uncooperative writes, "
            "please send the Repository object to your workers, not a Session. "
            "See https://icechunk.io/en/stable/icechunk-python/parallel/#distributed-writes for more."
        )

    async def flush_async(
        self,
        message: str,
        metadata: dict[str, Any] | None = None,
    ) -> NoReturn:
        raise TypeError(
            "Cannot flush a fork of a Session. If you are using uncooperative writes, "
            "please send the Repository object to your workers, not a Session. "
            "See https://icechunk.io/en/stable/icechunk-python/parallel/#distributed-writes for more."
        )

    @property
    def store(self) -> IcechunkStore:
        """
        Get a zarr Store object for reading and writing data from the repository using zarr python.

        Returns
        -------
        IcechunkStore
            A zarr Store object for reading and writing data from the repository.
        """
        return IcechunkStore(self._session.store, for_fork=True)

store property #

store

Get a zarr Store object for reading and writing data from the repository using zarr python.

Returns:

Type Description
IcechunkStore

A zarr Store object for reading and writing data from the repository.

merge_async async #

merge_async(*others)

Merge the changes for this fork session with the changes from other fork sessions (async version).

Parameters:

Name Type Description Default
others ForkSession

The other fork sessions to merge changes from.

()
Source code in icechunk-python/python/icechunk/session.py
async def merge_async(self, *others: Self) -> None:
    """
    Merge the changes for this fork session with the changes from other fork sessions (async version).

    Parameters
    ----------
    others : ForkSession
        The other fork sessions to merge changes from.
    """
    for other in others:
        if not isinstance(other, ForkSession):
            raise TypeError(
                f"A ForkSession can only be merged with another ForkSession. Received {type(other)} instead."
            )
        await self._session.merge_async(other._session)

GCSummary #

Summarizes the results of a garbage collection operation on an Icechunk repo

Attributes:

Name Type Description
attributes_deleted int

How many attributes were deleted.

bytes_deleted int

How many bytes were deleted.

chunks_deleted int

How many chunks were deleted.

manifests_deleted int

How many manifests were deleted.

snapshots_deleted int

How many snapshots were deleted.

transaction_logs_deleted int

How many transaction logs were deleted.

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class GCSummary:
    """Summarizes the results of a garbage collection operation on an icechunk repo"""
    @property
    def bytes_deleted(self) -> int:
        """
        How many bytes were deleted.
        """
        ...
    @property
    def chunks_deleted(self) -> int:
        """
        How many chunks were deleted.
        """
        ...
    @property
    def manifests_deleted(self) -> int:
        """
        How many manifests were deleted.
        """
        ...
    @property
    def snapshots_deleted(self) -> int:
        """
        How many snapshots were deleted.
        """
        ...
    @property
    def attributes_deleted(self) -> int:
        """
        How many attributes were deleted.
        """
        ...
    @property
    def transaction_logs_deleted(self) -> int:
        """
        How many transaction logs were deleted.
        """
        ...
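
A sketch of reading the summary after a collection run; the exact Repository.garbage_collect signature is an assumption, and `repo` is an existing Repository.

from datetime import datetime, timedelta, timezone

cutoff = datetime.now(timezone.utc) - timedelta(days=7)
summary = repo.garbage_collect(cutoff)
print(summary.snapshots_deleted, summary.chunks_deleted, summary.bytes_deleted)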

attributes_deleted property #

attributes_deleted

How many attributes were deleted.

bytes_deleted property #

bytes_deleted

How many bytes were deleted.

chunks_deleted property #

chunks_deleted

How many chunks were deleted.

manifests_deleted property #

manifests_deleted

How many manifests were deleted.

snapshots_deleted property #

snapshots_deleted

How many snapshots were deleted.

transaction_logs_deleted property #

transaction_logs_deleted

How many transaction logs were deleted.

GcsBearerCredential #

Credentials for a Google Cloud Storage backend

This is a bearer token that has an expiration time.

Methods:

Name Description
__init__

Create a GcsBearerCredential object

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class GcsBearerCredential:
    """Credentials for a google cloud storage backend

    This is a bearer token that has an expiration time.
    """

    def __init__(
        self, bearer: str, *, expires_after: datetime.datetime | None = None
    ) -> None:
        """Create a GcsBearerCredential object

        Parameters
        ----------
        bearer: str
            The bearer token to use for authentication.
        expires_after: datetime.datetime | None
            The expiration time of the bearer token.
        """

    @property
    def bearer(self) -> str: ...
    @property
    def expires_after(self) -> datetime.datetime | None: ...
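
A sketch of a refresh callback built on this class; fetch_token() is a hypothetical helper standing in for whatever OAuth machinery produces the token, and passing the callable to gcs_refreshable_credentials follows that helper's description above.

import icechunk

def get_gcs_credential() -> icechunk.GcsBearerCredential:
    token, expiry = fetch_token()  # hypothetical: obtain a fresh bearer token
    return icechunk.GcsBearerCredential(token, expires_after=expiry)

credentials = icechunk.gcs_refreshable_credentials(get_gcs_credential)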

__init__ #

__init__(bearer, *, expires_after=None)

Create a GcsBearerCredential object

Parameters:

Name Type Description Default
bearer str

The bearer token to use for authentication.

required
expires_after datetime | None

The expiration time of the bearer token.

None
Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
def __init__(
    self, bearer: str, *, expires_after: datetime.datetime | None = None
) -> None:
    """Create a GcsBearerCredential object

    Parameters
    ----------
    bearer: str
        The bearer token to use for authentication.
    expires_after: datetime.datetime | None
        The expiration time of the bearer token.
    """

GcsCredentials #

Credentials for a Google Cloud Storage backend

This can be used to authenticate with a Google Cloud Storage backend.

Classes:

Name Description
Anonymous

Uses anonymous credentials

FromEnv

Uses credentials from environment variables

Refreshable

Allows for an outside authority to pass in a function that can be used to provide credentials.

Static

Uses GCS credentials without expiration

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class GcsCredentials:
    """Credentials for a google cloud storage backend

    This can be used to authenticate with a google cloud storage backend.
    """
    class Anonymous:
        """Uses anonymous credentials"""
        def __init__(self) -> None: ...

    class FromEnv:
        """Uses credentials from environment variables"""
        def __init__(self) -> None: ...

    class Static:
        """Uses gcs credentials without expiration"""
        def __init__(self, credentials: AnyGcsStaticCredential) -> None: ...

    class Refreshable:
        """Allows for an outside authority to pass in a function that can be used to provide credentials.

        This is useful for credentials that have an expiration time, or are otherwise not known ahead of time.
        """
        def __init__(
            self, pickled_function: bytes, current: GcsBearerCredential | None = None
        ) -> None: ...

Anonymous #

Uses anonymous credentials

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class Anonymous:
    """Uses anonymous credentials"""
    def __init__(self) -> None: ...

FromEnv #

Uses credentials from environment variables

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class FromEnv:
    """Uses credentials from environment variables"""
    def __init__(self) -> None: ...

Refreshable #

Allows for an outside authority to pass in a function that can be used to provide credentials.

This is useful for credentials that have an expiration time, or are otherwise not known ahead of time.

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class Refreshable:
    """Allows for an outside authority to pass in a function that can be used to provide credentials.

    This is useful for credentials that have an expiration time, or are otherwise not known ahead of time.
    """
    def __init__(
        self, pickled_function: bytes, current: GcsBearerCredential | None = None
    ) -> None: ...

Static #

Uses GCS credentials without expiration

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class Static:
    """Uses gcs credentials without expiration"""
    def __init__(self, credentials: AnyGcsStaticCredential) -> None: ...

GcsStaticCredentials #

Credentials for a Google Cloud Storage backend

Classes:

Name Description
ApplicationCredentials

Credentials for a Google Cloud Storage backend using application default credentials

BearerToken

Credentials for a Google Cloud Storage backend using a bearer token

ServiceAccount

Credentials for a Google Cloud Storage backend using a service account JSON file

ServiceAccountKey

Credentials for a Google Cloud Storage backend using a serialized service account key

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class GcsStaticCredentials:
    """Credentials for a google cloud storage backend"""
    class ServiceAccount:
        """Credentials for a google cloud storage backend using a service account json file

        Parameters
        ----------
        path: str
            The path to the service account json file.
        """
        def __init__(self, path: str) -> None: ...

    class ServiceAccountKey:
        """Credentials for a google cloud storage backend using a a serialized service account key

        Parameters
        ----------
        key: str
            The serialized service account key.
        """
        def __init__(self, key: str) -> None: ...

    class ApplicationCredentials:
        """Credentials for a google cloud storage backend using application default credentials

        Parameters
        ----------
        path: str
            The path to the application default credentials (ADC) file.
        """
        def __init__(self, path: str) -> None: ...

    class BearerToken:
        """Credentials for a google cloud storage backend using a bearer token

        Parameters
        ----------
        token: str
            The bearer token to use for authentication.
        """
        def __init__(self, token: str) -> None: ...
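
A sketch of pairing a static credential with the gcs_storage helper listed above; the bucket, prefix, and service_account_file keyword names are assumptions about that helper's signature.

import icechunk

storage = icechunk.gcs_storage(
    bucket="my-bucket",
    prefix="path/to/repo",
    service_account_file="/path/to/service-account.json",
)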

ApplicationCredentials #

Credentials for a Google Cloud Storage backend using application default credentials

Parameters:

Name Type Description Default
path str

The path to the application default credentials (ADC) file.

required
Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class ApplicationCredentials:
    """Credentials for a google cloud storage backend using application default credentials

    Parameters
    ----------
    path: str
        The path to the application default credentials (ADC) file.
    """
    def __init__(self, path: str) -> None: ...

BearerToken #

Credentials for a Google Cloud Storage backend using a bearer token

Parameters:

Name Type Description Default
token str

The bearer token to use for authentication.

required
Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class BearerToken:
    """Credentials for a google cloud storage backend using a bearer token

    Parameters
    ----------
    token: str
        The bearer token to use for authentication.
    """
    def __init__(self, token: str) -> None: ...

ServiceAccount #

Credentials for a Google Cloud Storage backend using a service account JSON file

Parameters:

Name Type Description Default
path str

The path to the service account JSON file.

required
Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class ServiceAccount:
    """Credentials for a google cloud storage backend using a service account json file

    Parameters
    ----------
    path: str
        The path to the service account json file.
    """
    def __init__(self, path: str) -> None: ...

ServiceAccountKey #

Credentials for a Google Cloud Storage backend using a serialized service account key

Parameters:

Name Type Description Default
key str

The serialized service account key.

required
Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class ServiceAccountKey:
    """Credentials for a google cloud storage backend using a a serialized service account key

    Parameters
    ----------
    key: str
        The serialized service account key.
    """
    def __init__(self, key: str) -> None: ...

IcechunkError #

Bases: Exception

Base class for all Icechunk errors

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class IcechunkError(Exception):
    """Base class for all Icechunk errors"""

    @property
    def message(self) -> str: ...

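Example: because every Icechunk exception derives from IcechunkError, one except clause can handle any library failure. A minimal sketch (opening a repository that does not exist, so Repository.open raises):

import icechunk

try:
    repo = icechunk.Repository.open(icechunk.in_memory_storage())
except icechunk.IcechunkError as e:
    # message exposes the error description carried by the Rust core
    print(f"Icechunk operation failed: {e.message}")
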
IcechunkStore #

Bases: Store, SyncMixin

Methods:

Name Description
__init__

Create a new IcechunkStore.

clear

Clear the store.

delete

Remove a key from the store

delete_dir

Delete a prefix

exists

Check if a key exists in the store.

get

Retrieve the value associated with a given key.

get_partial_values

Retrieve possibly partial values from given key_ranges.

is_empty

Check if the directory is empty.

list

Retrieve all keys in the store.

list_dir

Retrieve all keys and prefixes with a given prefix and which do not contain the character “/” after the given prefix.

list_prefix

Retrieve all keys in the store that begin with a given prefix. Keys are returned relative to the root of the store.

set

Store a (key, value) pair.

set_if_not_exists

Store a key to value if the key is not already present.

set_partial_values

Store values at a given key, starting at byte range_start.

set_virtual_ref

Store a virtual reference to a chunk.

set_virtual_ref_async

Store a virtual reference to a chunk asynchronously.

set_virtual_refs

Store multiple virtual references for the same array.

set_virtual_refs_async

Store multiple virtual references for the same array asynchronously.

sync_clear

Clear the store.

Attributes:

Name Type Description
supports_listing bool

Does the store support listing?

supports_partial_writes Literal[False]

Does the store support partial writes?

supports_writes bool

Does the store support writes?

Source code in icechunk-python/python/icechunk/store.py
class IcechunkStore(Store, SyncMixin):
    _store: PyStore
    _for_fork: bool

    def __init__(
        self,
        store: PyStore,
        for_fork: bool,
        read_only: bool | None = None,
        *args: Any,
        **kwargs: Any,
    ):
        """Create a new IcechunkStore.

        This should not be called directly; instead use the `create`, `open_existing` or `open_or_create` class methods.
        """
        read_only = read_only if read_only is not None else store.read_only
        super().__init__(read_only=read_only)
        if store is None:
            raise ValueError(
                "An IcechunkStore should not be created with the default constructor, instead use either the create or open_existing class methods."
            )
        self._store = store
        self._is_open = True
        self._for_fork = for_fork

    def __eq__(self, value: object) -> bool:
        if not isinstance(value, IcechunkStore):
            return False
        return self._store == value._store

    def __getstate__(self) -> object:
        # for read_only sessions we allow pickling; this allows distributed reads without forking
        writable = not self.session.read_only
        if writable and not self._for_fork:
            raise ValueError(
                "You must opt-in to pickle writable sessions in a distributed context "
                "using Session.fork(). "
                # link to docs
                "If you are using xarray's `Dataset.to_zarr` method to write dask arrays, "
                "please use `icechunk.xarray.to_icechunk` instead. "
            )
        d = self.__dict__.copy()
        # we serialize the Rust store as bytes
        d["_store"] = self._store.as_bytes()
        d["_for_fork"] = self._for_fork
        return d

    def __setstate__(self, state: Any) -> None:
        # we have to deserialize the bytes of the Rust store
        store_repr = state["_store"]
        state["_store"] = PyStore.from_bytes(store_repr)
        self.__dict__ = state

    def with_read_only(self, read_only: bool = False) -> Store:
        new_store = IcechunkStore(store=self._store, for_fork=False, read_only=read_only)
        new_store._is_open = False
        return new_store

    @property
    def session(self) -> "Session":
        from icechunk.session import ForkSession, Session

        if self._for_fork:
            return ForkSession(self._store.session)
        else:
            return Session(self._store.session)

    async def clear(self) -> None:
        """Clear the store.

        This will remove all contents from the current session,
        including all groups and all arrays. But it will not modify the repository history.
        """
        return await self._store.clear()

    def sync_clear(self) -> None:
        """Clear the store.

        This will remove all contents from the current session,
        including all groups and all arrays. But it will not modify the repository history.
        """
        return self._store.sync_clear()

    async def is_empty(self, prefix: str) -> bool:
        """
        Check if the directory is empty.

        Parameters
        ----------
        prefix : str
            Prefix of keys to check.

        Returns
        -------
        bool
            True if the store is empty, False otherwise.
        """
        return await self._store.is_empty(prefix)

    async def get(
        self,
        key: str,
        prototype: BufferPrototype,
        byte_range: ByteRequest | None = None,
    ) -> Buffer | None:
        """Retrieve the value associated with a given key.

        Parameters
        ----------
        key : str
        byte_range : ByteRequest, optional

            ByteRequest may be one of the following. If not provided, all data associated with the key is retrieved.

            - RangeByteRequest(int, int): Request a specific range of bytes in the form (start, end). The end is exclusive. If the given range is zero-length or starts after the end of the object, an error will be returned. Additionally, if the range ends after the end of the object, the entire remainder of the object will be returned. Otherwise, the exact requested range will be returned.
            - OffsetByteRequest(int): Request all bytes starting from a given byte offset. This is equivalent to bytes={int}- as an HTTP header.
            - SuffixByteRequest(int): Request the last int bytes. Note that here, int is the size of the request, not the byte offset. This is equivalent to bytes=-{int} as an HTTP header.

        Returns
        -------
        Buffer
        """

        try:
            result = await self._store.get(key, _byte_request_to_tuple(byte_range))
        except KeyError as _e:
            # Zarr python expects None to be returned if the key does not exist
            # but an IcechunkStore returns an error if the key does not exist
            return None

        return prototype.buffer.from_bytes(result)

    async def get_partial_values(
        self,
        prototype: BufferPrototype,
        key_ranges: Iterable[tuple[str, ByteRequest | None]],
    ) -> list[Buffer | None]:
        """Retrieve possibly partial values from given key_ranges.

        Parameters
        ----------
        key_ranges : Iterable[tuple[str, ByteRequest | None]]
            Ordered set of (key, range) pairs; a key may occur multiple times with different ranges

        Returns
        -------
        List of values in the order of the key_ranges; may contain None for missing keys
        """
        # NOTE: pyo3 has no implicit conversion from an Iterable to a Rust iterable, so we convert it
        # to a list here first. Possible opportunity for optimization.
        ranges = [(k[0], _byte_request_to_tuple(k[1])) for k in key_ranges]
        result = await self._store.get_partial_values(list(ranges))
        return [prototype.buffer.from_bytes(r) for r in result]

    async def exists(self, key: str) -> bool:
        """Check if a key exists in the store.

        Parameters
        ----------
        key : str

        Returns
        -------
        bool
        """
        return await self._store.exists(key)

    @property
    def supports_writes(self) -> bool:
        """Does the store support writes?"""
        return self._store.supports_writes

    async def set(self, key: str, value: Buffer) -> None:
        """Store a (key, value) pair.

        Parameters
        ----------
        key : str
        value : Buffer
        """
        if not isinstance(value, Buffer):
            raise TypeError(
                f"IcechunkStore.set(): `value` must be a Buffer instance. Got an instance of {type(value)} instead."
            )
        return await self._store.set(key, value.to_bytes())

    async def set_if_not_exists(self, key: str, value: Buffer) -> None:
        """
        Store a key to ``value`` if the key is not already present.

        Parameters
        ----------
        key : str
        value : Buffer
        """
        return await self._store.set_if_not_exists(key, value.to_bytes())

    def set_virtual_ref(
        self,
        key: str,
        location: str,
        *,
        offset: int,
        length: int,
        checksum: str | datetime | None = None,
        validate_container: bool = True,
    ) -> None:
        """Store a virtual reference to a chunk.

        Parameters
        ----------
        key : str
            The chunk to store the reference under. This is the fully qualified zarr key, e.g. 'array/c/0/0/0'
        location : str
            The location of the chunk in storage. This is the absolute path to the chunk in storage, e.g. 's3://bucket/path/to/file.nc'
        offset : int
            The offset in bytes from the start of the file at which the chunk starts
        length : int
            The length of the chunk in bytes, measured from the given offset
        checksum : str | datetime | None
            The etag or last_modified_at field of the object
        validate_container : bool
            If set to true, fail for locations that don't match any existing virtual chunk container
        """
        return self._store.set_virtual_ref(
            key, location, offset, length, checksum, validate_container
        )

    async def set_virtual_ref_async(
        self,
        key: str,
        location: str,
        *,
        offset: int,
        length: int,
        checksum: str | datetime | None = None,
        validate_container: bool = True,
    ) -> None:
        """Store a virtual reference to a chunk asynchronously.

        Parameters
        ----------
        key : str
            The chunk to store the reference under. This is the fully qualified zarr key, e.g. 'array/c/0/0/0'
        location : str
            The location of the chunk in storage. This is the absolute path to the chunk in storage, e.g. 's3://bucket/path/to/file.nc'
        offset : int
            The offset in bytes from the start of the file at which the chunk starts
        length : int
            The length of the chunk in bytes, measured from the given offset
        checksum : str | datetime | None
            The etag or last_modified_at field of the object
        validate_container : bool
            If set to true, fail for locations that don't match any existing virtual chunk container
        """
        return await self._store.set_virtual_ref_async(
            key, location, offset, length, checksum, validate_container
        )

    def set_virtual_refs(
        self,
        array_path: str,
        chunks: list[VirtualChunkSpec],
        *,
        validate_containers: bool = True,
    ) -> list[tuple[int, ...]] | None:
        """Store multiple virtual references for the same array.

        Parameters
        ----------
        array_path : str
            The path to the array inside the Zarr store. Example: "/groupA/groupB/outputs/my-array"
        chunks : list[VirtualChunkSpec]
            The list of virtual chunks to add
        validate_containers: bool
            If set to true, ignore virtual references for locations that don't match any existing virtual chunk container


        Returns
        -------
        list[tuple[int, ...]] | None

            If all virtual references were successfully updated, it returns None.
            If there were validation errors, it returns the chunk indices of all failed references.
        """
        return self._store.set_virtual_refs(array_path, chunks, validate_containers)

    async def set_virtual_refs_async(
        self,
        array_path: str,
        chunks: list[VirtualChunkSpec],
        *,
        validate_containers: bool = True,
    ) -> list[tuple[int, ...]] | None:
        """Store multiple virtual references for the same array asynchronously.

        Parameters
        ----------
        array_path : str
            The path to the array inside the Zarr store. Example: "/groupA/groupB/outputs/my-array"
        chunks : list[VirtualChunkSpec]
            The list of virtual chunks to add
        validate_containers: bool
            If set to true, ignore virtual references for locations that don't match any existing virtual chunk container


        Returns
        -------
        list[tuple[int, ...]] | None

            If all virtual references were successfully updated, it returns None.
            If there were validation errors, it returns the chunk indices of all failed references.
        """
        return await self._store.set_virtual_refs_async(
            array_path, chunks, validate_containers
        )

    async def delete(self, key: str) -> None:
        """Remove a key from the store

        Parameters
        ----------
        key : str
        """
        return await self._store.delete(key)

    async def delete_dir(self, prefix: str) -> None:
        """Delete a prefix

        Parameters
        ----------
        prefix : str
        """
        return await self._store.delete_dir(prefix)

    @property
    def supports_partial_writes(self) -> Literal[False]:
        """Does the store support partial writes?

        Partial writes are no longer used by Zarr, so this is always false.
        """
        return self._store.supports_partial_writes  # type: ignore[return-value]

    async def set_partial_values(
        self, key_start_values: Iterable[tuple[str, int, BytesLike]]
    ) -> None:
        """Store values at a given key, starting at byte range_start.

        Parameters
        ----------
        key_start_values : list[tuple[str, int, BytesLike]]
            Set of (key, range_start, value) triples. A key may occur multiple times with
            different range_starts; range_starts (considering the length of the respective
            values) must not specify overlapping ranges for the same key.
        """
        # NOTE: pyo3 does not implicitly convert an Iterable to a Rust iterable, so we convert it
        # to a list here first. Possible opportunity for optimization.
        # NOTE: currently we only implement the case where the values are bytes
        return await self._store.set_partial_values(list(key_start_values))  # type: ignore[arg-type]

    @property
    def supports_listing(self) -> bool:
        """Does the store support listing?"""
        return self._store.supports_listing

    @property
    def supports_consolidated_metadata(self) -> bool:
        return self._store.supports_consolidated_metadata

    @property
    def supports_deletes(self) -> bool:
        return self._store.supports_deletes

    def list(self) -> AsyncIterator[str]:
        """Retrieve all keys in the store.

        Returns
        -------
        AsyncIterator[str]
        """
        # This method should be async, like overridden methods in child classes.
        # However, that's not straightforward:
        # https://stackoverflow.com/questions/68905848

        # The zarr spec specifies that this and other
        # listing methods should not be async, so we need to
        # wrap the async method in a sync method.
        return self._store.list()

    def list_prefix(self, prefix: str) -> AsyncIterator[str]:
        """Retrieve all keys in the store that begin with a given prefix. Keys are returned relative
        to the root of the store.

        Parameters
        ----------
        prefix : str

        Returns
        -------
        AsyncIterator[str]
        """
        # The zarr spec specifies that this and other
        # listing methods should not be async, so we need to
        # wrap the async method in a sync method.
        return self._store.list_prefix(prefix)

    def list_dir(self, prefix: str) -> AsyncIterator[str]:
        """
        Retrieve all keys and prefixes with a given prefix and which do not contain the character
        “/” after the given prefix.

        Parameters
        ----------
        prefix : str

        Returns
        -------
        AsyncIterator[str]
        """
        # The zarr spec specifies that this and other
        # listing methods should not be async, so we need to
        # wrap the async method in a sync method.
        return self._store.list_dir(prefix)

    async def getsize(self, key: str) -> int:
        return await self._store.getsize(key)

    async def getsize_prefix(self, prefix: str) -> int:
        return await self._store.getsize_prefix(prefix)

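Example: an IcechunkStore is normally obtained from a Session rather than constructed directly. A minimal sketch that writes an array through zarr and commits (the array name, shape, and dtype are illustrative):

import zarr
import icechunk

# An in-memory repository keeps the sketch self-contained.
storage = icechunk.in_memory_storage()
repo = icechunk.Repository.create(storage)
session = repo.writable_session("main")
store = session.store  # an IcechunkStore

group = zarr.group(store=store)
arr = group.create_array("my-array", shape=(10,), dtype="int32")
arr[:] = 42
session.commit("write my-array")
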
supports_listing property #

supports_listing

Does the store support listing?

supports_partial_writes property #

supports_partial_writes

Does the store support partial writes?

Partial writes are no longer used by Zarr, so this is always false.

supports_writes property #

supports_writes

Does the store support writes?

__init__ #

__init__(store, for_fork, read_only=None, *args, **kwargs)

Create a new IcechunkStore.

This should not be called directly; instead use the create, open_existing or open_or_create class methods.

Source code in icechunk-python/python/icechunk/store.py
def __init__(
    self,
    store: PyStore,
    for_fork: bool,
    read_only: bool | None = None,
    *args: Any,
    **kwargs: Any,
):
    """Create a new IcechunkStore.

    This should not be called directly; instead use the `create`, `open_existing` or `open_or_create` class methods.
    """
    read_only = read_only if read_only is not None else store.read_only
    super().__init__(read_only=read_only)
    if store is None:
        raise ValueError(
            "An IcechunkStore should not be created with the default constructor, instead use either the create or open_existing class methods."
        )
    self._store = store
    self._is_open = True
    self._for_fork = for_fork

clear async #

clear()

Clear the store.

This will remove all contents from the current session, including all groups and all arrays. But it will not modify the repository history.

Source code in icechunk-python/python/icechunk/store.py
async def clear(self) -> None:
    """Clear the store.

    This will remove all contents from the current session,
    including all groups and all arrays. But it will not modify the repository history.
    """
    return await self._store.clear()

delete async #

delete(key)

Remove a key from the store

Parameters:

Name Type Description Default
key str
required
Source code in icechunk-python/python/icechunk/store.py
async def delete(self, key: str) -> None:
    """Remove a key from the store

    Parameters
    ----------
    key : str
    """
    return await self._store.delete(key)

delete_dir async #

delete_dir(prefix)

Delete a prefix

Parameters:

Name Type Description Default
prefix str
required
Source code in icechunk-python/python/icechunk/store.py
async def delete_dir(self, prefix: str) -> None:
    """Delete a prefix

    Parameters
    ----------
    prefix : str
    """
    return await self._store.delete_dir(prefix)

exists async #

exists(key)

Check if a key exists in the store.

Parameters:

Name Type Description Default
key str
required

Returns:

Type Description
bool
Source code in icechunk-python/python/icechunk/store.py
async def exists(self, key: str) -> bool:
    """Check if a key exists in the store.

    Parameters
    ----------
    key : str

    Returns
    -------
    bool
    """
    return await self._store.exists(key)

get async #

get(key, prototype, byte_range=None)

Retrieve the value associated with a given key.

Parameters:

Name Type Description Default
key str
required
byte_range ByteRequest

ByteRequest may be one of the following. If not provided, all data associated with the key is retrieved.

  • RangeByteRequest(int, int): Request a specific range of bytes in the form (start, end). The end is exclusive. If the given range is zero-length or starts after the end of the object, an error will be returned. Additionally, if the range ends after the end of the object, the entire remainder of the object will be returned. Otherwise, the exact requested range will be returned.
  • OffsetByteRequest(int): Request all bytes starting from a given byte offset. This is equivalent to bytes={int}- as an HTTP header.
  • SuffixByteRequest(int): Request the last int bytes. Note that here, int is the size of the request, not the byte offset. This is equivalent to bytes=-{int} as an HTTP header.
None

Returns:

Type Description
Buffer
Source code in icechunk-python/python/icechunk/store.py
async def get(
    self,
    key: str,
    prototype: BufferPrototype,
    byte_range: ByteRequest | None = None,
) -> Buffer | None:
    """Retrieve the value associated with a given key.

    Parameters
    ----------
    key : str
    byte_range : ByteRequest, optional

        ByteRequest may be one of the following. If not provided, all data associated with the key is retrieved.

        - RangeByteRequest(int, int): Request a specific range of bytes in the form (start, end). The end is exclusive. If the given range is zero-length or starts after the end of the object, an error will be returned. Additionally, if the range ends after the end of the object, the entire remainder of the object will be returned. Otherwise, the exact requested range will be returned.
        - OffsetByteRequest(int): Request all bytes starting from a given byte offset. This is equivalent to bytes={int}- as an HTTP header.
        - SuffixByteRequest(int): Request the last int bytes. Note that here, int is the size of the request, not the byte offset. This is equivalent to bytes=-{int} as an HTTP header.

    Returns
    -------
    Buffer
    """

    try:
        result = await self._store.get(key, _byte_request_to_tuple(byte_range))
    except KeyError as _e:
        # Zarr python expects None to be returned if the key does not exist
        # but an IcechunkStore returns an error if the key does not exist
        return None

    return prototype.buffer.from_bytes(result)

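Example: a minimal sketch calling get directly, reusing the store from the sketch above; this is the low-level entry point zarr itself uses, with the buffer prototype taken from zarr's buffer module:

import asyncio
from zarr.core.buffer import default_buffer_prototype

async def read_key(store, key):
    # Returns None when the key does not exist, mirroring zarr's expectations.
    buf = await store.get(key, default_buffer_prototype())
    return None if buf is None else buf.to_bytes()

raw = asyncio.run(read_key(store, "zarr.json"))
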
get_partial_values async #

get_partial_values(prototype, key_ranges)

Retrieve possibly partial values from given key_ranges.

Parameters:

Name Type Description Default
key_ranges Iterable[tuple[str, ByteRequest | None]]

Ordered set of (key, range) pairs; a key may occur multiple times with different ranges

required

Returns:

Type Description
List of values in the order of the key_ranges; may contain None for missing keys
Source code in icechunk-python/python/icechunk/store.py
async def get_partial_values(
    self,
    prototype: BufferPrototype,
    key_ranges: Iterable[tuple[str, ByteRequest | None]],
) -> list[Buffer | None]:
    """Retrieve possibly partial values from given key_ranges.

    Parameters
    ----------
    key_ranges : Iterable[tuple[str, ByteRequest | None]]
        Ordered set of (key, range) pairs; a key may occur multiple times with different ranges

    Returns
    -------
    List of values in the order of the key_ranges; may contain None for missing keys
    """
    # NOTE: pyo3 has no implicit conversion from an Iterable to a Rust iterable, so we convert it
    # to a list here first. Possible opportunity for optimization.
    ranges = [(k[0], _byte_request_to_tuple(k[1])) for k in key_ranges]
    result = await self._store.get_partial_values(list(ranges))
    return [prototype.buffer.from_bytes(r) for r in result]

is_empty async #

is_empty(prefix)

Check if the directory is empty.

Parameters:

Name Type Description Default
prefix str

Prefix of keys to check.

required

Returns:

Type Description
bool

True if the store is empty, False otherwise.

Source code in icechunk-python/python/icechunk/store.py
async def is_empty(self, prefix: str) -> bool:
    """
    Check if the directory is empty.

    Parameters
    ----------
    prefix : str
        Prefix of keys to check.

    Returns
    -------
    bool
        True if the store is empty, False otherwise.
    """
    return await self._store.is_empty(prefix)

list #

list()

Retrieve all keys in the store.

Returns:

Type Description
AsyncIterator[str]
Source code in icechunk-python/python/icechunk/store.py
def list(self) -> AsyncIterator[str]:
    """Retrieve all keys in the store.

    Returns
    -------
    AsyncIterator[str]
    """
    # This method should be async, like overridden methods in child classes.
    # However, that's not straightforward:
    # https://stackoverflow.com/questions/68905848

    # The zarr spec specifies that this and other
    # listing methods should not be async, so we need to
    # wrap the async method in a sync method.
    return self._store.list()

list_dir #

list_dir(prefix)

Retrieve all keys and prefixes with a given prefix and which do not contain the character “/” after the given prefix.

Parameters:

Name Type Description Default
prefix str
required

Returns:

Type Description
AsyncIterator[str]
Source code in icechunk-python/python/icechunk/store.py
def list_dir(self, prefix: str) -> AsyncIterator[str]:
    """
    Retrieve all keys and prefixes with a given prefix and which do not contain the character
    “/” after the given prefix.

    Parameters
    ----------
    prefix : str

    Returns
    -------
    AsyncIterator[str]
    """
    # The zarr spec specifies that this and other
    # listing methods should not be async, so we need to
    # wrap the async method in a sync method.
    return self._store.list_dir(prefix)

list_prefix #

list_prefix(prefix)

Retrieve all keys in the store that begin with a given prefix. Keys are returned relative to the root of the store.

Parameters:

Name Type Description Default
prefix str
required

Returns:

Type Description
AsyncIterator[str]
Source code in icechunk-python/python/icechunk/store.py
def list_prefix(self, prefix: str) -> AsyncIterator[str]:
    """Retrieve all keys in the store that begin with a given prefix. Keys are returned relative
    to the root of the store.

    Parameters
    ----------
    prefix : str

    Returns
    -------
    AsyncIterator[str]
    """
    # The zarr spec specifies that this and other
    # listing methods should not be async, so we need to
    # wrap the async method in a sync method.
    return self._store.list_prefix(prefix)

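Example: the listing methods are synchronous but return async iterators. A minimal sketch collecting all keys under a prefix, reusing the store from the earlier sketch (the prefix is illustrative):

import asyncio

async def keys_under(store, prefix):
    return [key async for key in store.list_prefix(prefix)]

keys = asyncio.run(keys_under(store, "my-array/"))
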
set async #

set(key, value)

Store a (key, value) pair.

Parameters:

Name Type Description Default
key str
required
value Buffer
required
Source code in icechunk-python/python/icechunk/store.py
async def set(self, key: str, value: Buffer) -> None:
    """Store a (key, value) pair.

    Parameters
    ----------
    key : str
    value : Buffer
    """
    if not isinstance(value, Buffer):
        raise TypeError(
            f"IcechunkStore.set(): `value` must be a Buffer instance. Got an instance of {type(value)} instead."
        )
    return await self._store.set(key, value.to_bytes())

set_if_not_exists async #

set_if_not_exists(key, value)

Store a key to value if the key is not already present.

Parameters:

Name Type Description Default
key str
required
value Buffer
required
Source code in icechunk-python/python/icechunk/store.py
async def set_if_not_exists(self, key: str, value: Buffer) -> None:
    """
    Store a key to ``value`` if the key is not already present.

    Parameters
    ----------
    key : str
    value : Buffer
    """
    return await self._store.set_if_not_exists(key, value.to_bytes())

set_partial_values async #

set_partial_values(key_start_values)

Store values at a given key, starting at byte range_start.

Parameters:

Name Type Description Default
key_start_values list[tuple[str, int, BytesLike]]

Set of (key, range_start, value) triples. A key may occur multiple times with different range_starts; range_starts (considering the length of the respective values) must not specify overlapping ranges for the same key.

required
Source code in icechunk-python/python/icechunk/store.py
async def set_partial_values(
    self, key_start_values: Iterable[tuple[str, int, BytesLike]]
) -> None:
    """Store values at a given key, starting at byte range_start.

    Parameters
    ----------
    key_start_values : list[tuple[str, int, BytesLike]]
        Set of (key, range_start, value) triples. A key may occur multiple times with
        different range_starts; range_starts (considering the length of the respective
        values) must not specify overlapping ranges for the same key.
    """
    # NOTE: pyo3 does not implicitly convert an Iterable to a Rust iterable, so we convert it
    # to a list here first. Possible opportunity for optimization.
    # NOTE: currently we only implement the case where the values are bytes
    return await self._store.set_partial_values(list(key_start_values))  # type: ignore[arg-type]

set_virtual_ref #

set_virtual_ref(key, location, *, offset, length, checksum=None, validate_container=True)

Store a virtual reference to a chunk.

Parameters:

Name Type Description Default
key str

The chunk to store the reference under. This is the fully qualified zarr key, e.g. 'array/c/0/0/0'

required
location str

The location of the chunk in storage. This is the absolute path to the chunk in storage, e.g. 's3://bucket/path/to/file.nc'

required
offset int

The offset in bytes from the start of the file at which the chunk starts

required
length int

The length of the chunk in bytes, measured from the given offset

required
checksum str | datetime | None

The etag or last_modified_at field of the object

None
validate_container bool

If set to true, fail for locations that don't match any existing virtual chunk container

True
Source code in icechunk-python/python/icechunk/store.py
def set_virtual_ref(
    self,
    key: str,
    location: str,
    *,
    offset: int,
    length: int,
    checksum: str | datetime | None = None,
    validate_container: bool = True,
) -> None:
    """Store a virtual reference to a chunk.

    Parameters
    ----------
    key : str
        The chunk to store the reference under. This is the fully qualified zarr key, e.g. 'array/c/0/0/0'
    location : str
        The location of the chunk in storage. This is the absolute path to the chunk in storage, e.g. 's3://bucket/path/to/file.nc'
    offset : int
        The offset in bytes from the start of the file at which the chunk starts
    length : int
        The length of the chunk in bytes, measured from the given offset
    checksum : str | datetime | None
        The etag or last_modified_at field of the object
    validate_container : bool
        If set to true, fail for locations that don't match any existing virtual chunk container
    """
    return self._store.set_virtual_ref(
        key, location, offset, length, checksum, validate_container
    )

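Example: a minimal sketch recording that a chunk lives inside a pre-existing object on S3, reusing a writable store. The key, bucket, offset, and length are placeholders, and the repository is assumed to have a virtual chunk container covering the s3:// prefix (otherwise validation fails):

# Chunk (0, 0, 0) of "array" is backed by a byte range of an external file.
store.set_virtual_ref(
    "array/c/0/0/0",
    "s3://my-bucket/path/to/file.nc",
    offset=1024,
    length=4096,
)
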
set_virtual_ref_async async #

set_virtual_ref_async(key, location, *, offset, length, checksum=None, validate_container=True)

Store a virtual reference to a chunk asynchronously.

Parameters:

Name Type Description Default
key str

The chunk to store the reference under. This is the fully qualified zarr key, e.g. 'array/c/0/0/0'

required
location str

The location of the chunk in storage. This is the absolute path to the chunk in storage, e.g. 's3://bucket/path/to/file.nc'

required
offset int

The offset in bytes from the start of the file at which the chunk starts

required
length int

The length of the chunk in bytes, measured from the given offset

required
checksum str | datetime | None

The etag or last_modified_at field of the object

None
validate_container bool

If set to true, fail for locations that don't match any existing virtual chunk container

True
Source code in icechunk-python/python/icechunk/store.py
async def set_virtual_ref_async(
    self,
    key: str,
    location: str,
    *,
    offset: int,
    length: int,
    checksum: str | datetime | None = None,
    validate_container: bool = True,
) -> None:
    """Store a virtual reference to a chunk asynchronously.

    Parameters
    ----------
    key : str
        The chunk to store the reference under. This is the fully qualified zarr key, e.g. 'array/c/0/0/0'
    location : str
        The location of the chunk in storage. This is the absolute path to the chunk in storage, e.g. 's3://bucket/path/to/file.nc'
    offset : int
        The offset in bytes from the start of the file at which the chunk starts
    length : int
        The length of the chunk in bytes, measured from the given offset
    checksum : str | datetime | None
        The etag or last_modified_at field of the object
    validate_container : bool
        If set to true, fail for locations that don't match any existing virtual chunk container
    """
    return await self._store.set_virtual_ref_async(
        key, location, offset, length, checksum, validate_container
    )

set_virtual_refs #

set_virtual_refs(array_path, chunks, *, validate_containers=True)

Store multiple virtual references for the same array.

Parameters:

Name Type Description Default
array_path str

The path to the array inside the Zarr store. Example: "/groupA/groupB/outputs/my-array"

required
chunks list[VirtualChunkSpec]

The list of virtual chunks to add

required
validate_containers bool

If set to true, ignore virtual references for locations that don't match any existing virtual chunk container

True

Returns:

Type Description
list[tuple[int, ...]] | None

If all virtual references were successfully updated, it returns None. If there were validation errors, it returns the chunk indices of all failed references.

Source code in icechunk-python/python/icechunk/store.py
def set_virtual_refs(
    self,
    array_path: str,
    chunks: list[VirtualChunkSpec],
    *,
    validate_containers: bool = True,
) -> list[tuple[int, ...]] | None:
    """Store multiple virtual references for the same array.

    Parameters
    ----------
    array_path : str
        The path to the array inside the Zarr store. Example: "/groupA/groupB/outputs/my-array"
    chunks : list[VirtualChunkSpec]
        The list of virtual chunks to add
    validate_containers: bool
        If set to true, ignore virtual references for locations that don't match any existing virtual chunk container


    Returns
    -------
    list[tuple[int, ...]] | None

        If all virtual references were successfully updated, it returns None.
        If there were validation errors, it returns the chunk indices of all failed references.
    """
    return self._store.set_virtual_refs(array_path, chunks, validate_containers)

set_virtual_refs_async async #

set_virtual_refs_async(array_path, chunks, *, validate_containers=True)

Store multiple virtual references for the same array asynchronously.

Parameters:

Name Type Description Default
array_path str

The path to the array inside the Zarr store. Example: "/groupA/groupB/outputs/my-array"

required
chunks list[VirtualChunkSpec]

The list of virtual chunks to add

required
validate_containers bool

If set to true, ignore virtual references for locations that don't match any existing virtual chunk container

True

Returns:

Type Description
list[tuple[int, ...]] | None

If all virtual references were successfully updated, it returns None. If there were validation errors, it returns the chunk indices of all failed references.

Source code in icechunk-python/python/icechunk/store.py
async def set_virtual_refs_async(
    self,
    array_path: str,
    chunks: list[VirtualChunkSpec],
    *,
    validate_containers: bool = True,
) -> list[tuple[int, ...]] | None:
    """Store multiple virtual references for the same array asynchronously.

    Parameters
    ----------
    array_path : str
        The path to the array inside the Zarr store. Example: "/groupA/groupB/outputs/my-array"
    chunks : list[VirtualChunkSpec]
        The list of virtual chunks to add
    validate_containers: bool
        If set to true, ignore virtual references for locations that don't match any existing virtual chunk container


    Returns
    -------
    list[tuple[int, ...]] | None

        If all virtual references were successfully updated, it returns None.
        If there were validation errors, it returns the chunk indices of all failed references.
    """
    return await self._store.set_virtual_refs_async(
        array_path, chunks, validate_containers
    )

sync_clear #

sync_clear()

Clear the store.

This will remove all contents from the current session, including all groups and all arrays. But it will not modify the repository history.

Source code in icechunk-python/python/icechunk/store.py
def sync_clear(self) -> None:
    """Clear the store.

    This will remove all contents from the current session,
    including all groups and all arrays. But it will not modify the repository history.
    """
    return self._store.sync_clear()

ManifestConfig #

Configuration for how Icechunk manages manifests

Methods:

Name Description
__init__

Create a new ManifestConfig object

Attributes:

Name Type Description
preload ManifestPreloadConfig | None

The configuration for how Icechunk manifests will be preloaded.

splitting ManifestSplittingConfig | None

The configuration for how Icechunk manifests will be split.

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class ManifestConfig:
    """Configuration for how Icechunk manifests"""

    def __init__(
        self,
        preload: ManifestPreloadConfig | None = None,
        splitting: ManifestSplittingConfig | None = None,
    ) -> None:
        """
        Create a new `ManifestConfig` object

        Parameters
        ----------
        preload: ManifestPreloadConfig | None
            The configuration for how Icechunk manifests will be preloaded.
        splitting: ManifestSplittingConfig | None
            The configuration for how Icechunk manifests will be split.
        """
        ...
    @property
    def preload(self) -> ManifestPreloadConfig | None:
        """
        The configuration for how Icechunk manifests will be preloaded.

        Returns
        -------
        ManifestPreloadConfig | None
            The configuration for how Icechunk manifests will be preloaded.
        """
        ...
    @preload.setter
    def preload(self, value: ManifestPreloadConfig | None) -> None:
        """
        Set the configuration for how Icechunk manifests will be preloaded.

        Parameters
        ----------
        value: ManifestPreloadConfig | None
            The configuration for how Icechunk manifests will be preloaded.
        """
        ...

    @property
    def splitting(self) -> ManifestSplittingConfig | None:
        """
        The configuration for how Icechunk manifests will be split.

        Returns
        -------
        ManifestSplittingConfig | None
            The configuration for how Icechunk manifests will be split.
        """
        ...

    @splitting.setter
    def splitting(self, value: ManifestSplittingConfig | None) -> None:
        """
        Set the configuration for how Icechunk manifests will be split.

        Parameters
        ----------
        value: ManifestSplittingConfig | None
            The configuration for how Icechunk manifests will be split.
        """
        ...

preload property writable #

preload

The configuration for how Icechunk manifests will be preloaded.

Returns:

Type Description
ManifestPreloadConfig | None

The configuration for how Icechunk manifests will be preloaded.

splitting property writable #

splitting

The configuration for how Icechunk manifests will be split.

Returns:

Type Description
ManifestSplittingConfig | None

The configuration for how Icechunk manifests will be split.

__init__ #

__init__(preload=None, splitting=None)

Create a new ManifestConfig object

Parameters:

Name Type Description Default
preload ManifestPreloadConfig | None

The configuration for how Icechunk manifests will be preloaded.

None
splitting ManifestSplittingConfig | None

The configuration for how Icechunk manifests will be split.

None
Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
def __init__(
    self,
    preload: ManifestPreloadConfig | None = None,
    splitting: ManifestSplittingConfig | None = None,
) -> None:
    """
    Create a new `ManifestConfig` object

    Parameters
    ----------
    preload: ManifestPreloadConfig | None
        The configuration for how Icechunk manifests will be preloaded.
    splitting: ManifestSplittingConfig | None
        The configuration for how Icechunk manifests will be split.
    """
    ...

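Example: a minimal sketch combining the classes above: preload manifests only for arrays whose name contains "temp", capped at 10,000 chunk references in total (the threshold and regex are illustrative, not recommendations):

from icechunk import (
    ManifestConfig,
    ManifestPreloadCondition,
    ManifestPreloadConfig,
)

config = ManifestConfig(
    preload=ManifestPreloadConfig(
        max_total_refs=10_000,
        preload_if=ManifestPreloadCondition.name_matches(".*temp.*"),
    ),
)
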
ManifestFileInfo #

Manifest file metadata

Attributes:

Name Type Description
id str

The manifest id

num_chunk_refs int

The number of chunk references contained in this manifest

size_bytes int

The size in bytes of the manifest file

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class ManifestFileInfo:
    """Manifest file metadata"""

    @property
    def id(self) -> str:
        """The manifest id"""
        ...
    @property
    def size_bytes(self) -> int:
        """The size in bytes of the"""
        ...
    @property
    def num_chunk_refs(self) -> int:
        """The number of chunk references contained in this manifest"""
        ...

id property #

id

The manifest id

num_chunk_refs property #

num_chunk_refs

The number of chunk references contained in this manifest

size_bytes property #

size_bytes

The size in bytes of the manifest file

ManifestPreloadCondition #

Configuration for conditions under which manifests will be preloaded on session creation

Methods:

Name Description
__and__

Create a preload condition that matches if both this condition and other match.

__or__

Create a preload condition that matches if either this condition or other match.

and_conditions

Create a preload condition that matches only if all passed conditions match

false

Create a preload condition that never matches any manifests

name_matches

Create a preload condition that matches if the array's name matches the passed regex.

num_refs

Create a preload condition that matches only if the number of chunk references in the manifest is within the given range.

or_conditions

Create a preload condition that matches if any of conditions matches

path_matches

Create a preload condition that matches if the full path to the array matches the passed regex.

true

Create a preload condition that always matches any manifest

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class ManifestPreloadCondition:
    """Configuration for conditions under which manifests will preload on session creation"""

    @staticmethod
    def or_conditions(
        conditions: list[ManifestPreloadCondition],
    ) -> ManifestPreloadCondition:
        """Create a preload condition that matches if any of `conditions` matches"""
        ...
    @staticmethod
    def and_conditions(
        conditions: list[ManifestPreloadCondition],
    ) -> ManifestPreloadCondition:
        """Create a preload condition that matches only if all passed `conditions` match"""
        ...
    @staticmethod
    def path_matches(regex: str) -> ManifestPreloadCondition:
        """Create a preload condition that matches if the full path to the array matches the passed regex.

        Array paths are absolute, as in `/path/to/my/array`
        """
        ...
    @staticmethod
    def name_matches(regex: str) -> ManifestPreloadCondition:
        """Create a preload condition that matches if the array's name matches the passed regex.

        For example, for an array `/model/outputs/temperature`, the following will match:
        ```
        name_matches(".*temp.*")
        ```
        """
        ...
    @staticmethod
    def num_refs(from_refs: int | None, to_refs: int | None) -> ManifestPreloadCondition:
        """Create a preload condition that matches only if the number of chunk references in the manifest is within the given range.

        from_refs is inclusive, to_refs is exclusive.
        """
        ...
    @staticmethod
    def true() -> ManifestPreloadCondition:
        """Create a preload condition that always matches any manifest"""
        ...
    @staticmethod
    def false() -> ManifestPreloadCondition:
        """Create a preload condition that never matches any manifests"""
        ...
    def __and__(self, other: ManifestPreloadCondition) -> ManifestPreloadCondition:
        """Create a preload condition that matches if both this condition and `other` match."""
        ...
    def __or__(self, other: ManifestPreloadCondition) -> ManifestPreloadCondition:
        """Create a preload condition that matches if either this condition or `other` match."""
        ...

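Example: conditions compose with & and |, equivalent to and_conditions and or_conditions. A minimal sketch matching arrays under /model/ whose manifests hold fewer than 1,000 chunk references (the path and threshold are illustrative):

from icechunk import ManifestPreloadCondition as Cond

# num_refs(None, 1_000): no lower bound, exclusive upper bound of 1_000.
condition = Cond.path_matches("^/model/.*") & Cond.num_refs(None, 1_000)
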
__and__ #

__and__(other)

Create a preload condition that matches if both this condition and other match.

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
def __and__(self, other: ManifestPreloadCondition) -> ManifestPreloadCondition:
    """Create a preload condition that matches if both this condition and `other` match."""
    ...

__or__ #

__or__(other)

Create a preload condition that matches if either this condition or other match.

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
def __or__(self, other: ManifestPreloadCondition) -> ManifestPreloadCondition:
    """Create a preload condition that matches if either this condition or `other` match."""
    ...

and_conditions staticmethod #

and_conditions(conditions)

Create a preload condition that matches only if all passed conditions match

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
@staticmethod
def and_conditions(
    conditions: list[ManifestPreloadCondition],
) -> ManifestPreloadCondition:
    """Create a preload condition that matches only if all passed `conditions` match"""
    ...

false staticmethod #

false()

Create a preload condition that never matches any manifests

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
@staticmethod
def false() -> ManifestPreloadCondition:
    """Create a preload condition that never matches any manifests"""
    ...

name_matches staticmethod #

name_matches(regex)

Create a preload condition that matches if the array's name matches the passed regex.

For example, for an array /model/outputs/temperature, the following will match:

name_matches(".*temp.*")

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
@staticmethod
def name_matches(regex: str) -> ManifestPreloadCondition:
    """Create a preload condition that matches if the array's name matches the passed regex.

    For example, for an array `/model/outputs/temperature`, the following will match:
    ```
    name_matches(".*temp.*")
    ```
    """
    ...

num_refs staticmethod #

num_refs(from_refs, to_refs)

Create a preload condition that matches only if the number of chunk references in the manifest is within the given range.

from_refs is inclusive, to_refs is exclusive.

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
@staticmethod
def num_refs(from_refs: int | None, to_refs: int | None) -> ManifestPreloadCondition:
    """Create a preload condition that matches only if the number of chunk references in the manifest is within the given range.

    from_refs is inclusive, to_refs is exclusive.
    """
    ...

or_conditions staticmethod #

or_conditions(conditions)

Create a preload condition that matches if any of conditions matches

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
@staticmethod
def or_conditions(
    conditions: list[ManifestPreloadCondition],
) -> ManifestPreloadCondition:
    """Create a preload condition that matches if any of `conditions` matches"""
    ...

path_matches staticmethod #

path_matches(regex)

Create a preload condition that matches if the full path to the array matches the passed regex.

Array paths are absolute, as in /path/to/my/array

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
@staticmethod
def path_matches(regex: str) -> ManifestPreloadCondition:
    """Create a preload condition that matches if the full path to the array matches the passed regex.

    Array paths are absolute, as in `/path/to/my/array`
    """
    ...

true staticmethod #

true()

Create a preload condition that always matches any manifest

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
@staticmethod
def true() -> ManifestPreloadCondition:
    """Create a preload condition that always matches any manifest"""
    ...

ManifestPreloadConfig #

Configuration for how Icechunk manifests are preloaded on session creation

Methods:

Name Description
__init__

Create a new ManifestPreloadConfig object

Attributes:

Name Type Description
max_arrays_to_scan int | None

The maximum number of arrays to scan when looking for manifests to preload.

max_total_refs int | None

The maximum number of references to preload.

preload_if ManifestPreloadCondition | None

The condition under which manifests will be preloaded.

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class ManifestPreloadConfig:
    """Configuration for how Icechunk manifest preload on session creation"""

    def __init__(
        self,
        max_total_refs: int | None = None,
        preload_if: ManifestPreloadCondition | None = None,
        max_arrays_to_scan: int | None = None,
    ) -> None:
        """
        Create a new `ManifestPreloadConfig` object

        Parameters
        ----------
        max_total_refs: int | None
            The maximum number of references to preload.
        preload_if: ManifestPreloadCondition | None
            The condition under which manifests will be preloaded.
        max_arrays_to_scan: int | None
            The maximum number of arrays to scan when looking for manifests to preload.
            Default is 50. Increase for repositories with many nested groups.
        """
        ...
    @property
    def max_total_refs(self) -> int | None:
        """
        The maximum number of references to preload.

        Returns
        -------
        int | None
            The maximum number of references to preload.
        """
        ...
    @max_total_refs.setter
    def max_total_refs(self, value: int | None) -> None:
        """
        Set the maximum number of references to preload.

        Parameters
        ----------
        value: int | None
            The maximum number of references to preload.
        """
        ...
    @property
    def preload_if(self) -> ManifestPreloadCondition | None:
        """
        The condition under which manifests will be preloaded.

        Returns
        -------
        ManifestPreloadCondition | None
            The condition under which manifests will be preloaded.
        """
        ...
    @preload_if.setter
    def preload_if(self, value: ManifestPreloadCondition | None) -> None:
        """
        Set the condition under which manifests will be preloaded.

        Parameters
        ----------
        value: ManifestPreloadCondition | None
            The condition under which manifests will be preloaded.
        """
        ...
    @property
    def max_arrays_to_scan(self) -> int | None:
        """
        The maximum number of arrays to scan when looking for manifests to preload.

        Returns
        -------
        int | None
            The maximum number of arrays to scan. Default is 50.
        """
        ...
    @max_arrays_to_scan.setter
    def max_arrays_to_scan(self, value: int | None) -> None:
        """
        Set the maximum number of arrays to scan when looking for manifests to preload.

        Parameters
        ----------
        value: int | None
            The maximum number of arrays to scan.
        """
        ...

max_arrays_to_scan property writable #

max_arrays_to_scan

The maximum number of arrays to scan when looking for manifests to preload.

Returns:

Type Description
int | None

The maximum number of arrays to scan. Default is 50.

max_total_refs property writable #

max_total_refs

The maximum number of references to preload.

Returns:

Type Description
int | None

The maximum number of references to preload.

preload_if property writable #

preload_if

The condition under which manifests will be preloaded.

Returns:

Type Description
ManifestPreloadCondition | None

The condition under which manifests will be preloaded.

__init__ #

__init__(max_total_refs=None, preload_if=None, max_arrays_to_scan=None)

Create a new ManifestPreloadConfig object

Parameters:

Name Type Description Default
max_total_refs int | None

The maximum number of references to preload.

None
preload_if ManifestPreloadCondition | None

The condition under which manifests will be preloaded.

None
max_arrays_to_scan int | None

The maximum number of arrays to scan when looking for manifests to preload. Default is 50. Increase for repositories with many nested groups.

None
Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
def __init__(
    self,
    max_total_refs: int | None = None,
    preload_if: ManifestPreloadCondition | None = None,
    max_arrays_to_scan: int | None = None,
) -> None:
    """
    Create a new `ManifestPreloadConfig` object

    Parameters
    ----------
    max_total_refs: int | None
        The maximum number of references to preload.
    preload_if: ManifestPreloadCondition | None
        The condition under which manifests will be preloaded.
    max_arrays_to_scan: int | None
        The maximum number of arrays to scan when looking for manifests to preload.
        Default is 50. Increase for repositories with many nested groups.
    """
    ...

ManifestSplitCondition #

Configuration for conditions under which manifests will be split into splits

Methods:

Name Description
AnyArray

Create a splitting condition that matches any array.

__and__

Create a splitting condition that matches if both this condition and other match

__or__

Create a splitting condition that matches if either this condition or other matches

and_conditions

Create a splitting condition that matches only if all passed conditions match

name_matches

Create a splitting condition that matches if the array's name matches the passed regex.

or_conditions

Create a splitting condition that matches if any of conditions matches

path_matches

Create a splitting condition that matches if the full path to the array matches the passed regex.

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class ManifestSplitCondition:
    """Configuration for conditions under which manifests will be split into splits"""

    @staticmethod
    def or_conditions(
        conditions: list[ManifestSplitCondition],
    ) -> ManifestSplitCondition:
        """Create a splitting condition that matches if any of `conditions` matches"""
        ...
    @staticmethod
    def and_conditions(
        conditions: list[ManifestSplitCondition],
    ) -> ManifestSplitCondition:
        """Create a splitting condition that matches only if all passed `conditions` match"""
        ...
    @staticmethod
    def path_matches(regex: str) -> ManifestSplitCondition:
        """Create a splitting condition that matches if the full path to the array matches the passed regex.

        Array paths are absolute, as in `/path/to/my/array`
        """
        ...
    @staticmethod
    def name_matches(regex: str) -> ManifestSplitCondition:
        """Create a splitting condition that matches if the array's name matches the passed regex.

        For example, for an array `/model/outputs/temperature`, the following will match:
        ```
        name_matches(".*temp.*")
        ```
        """
        ...

    @staticmethod
    def AnyArray() -> ManifestSplitCondition:
        """Create a splitting condition that matches any array."""
        ...

    def __or__(self, other: ManifestSplitCondition) -> ManifestSplitCondition:
        """Create a splitting condition that matches if either this condition or `other` matches"""
        ...

    def __and__(self, other: ManifestSplitCondition) -> ManifestSplitCondition:
        """Create a splitting condition that matches if both this condition and `other` match"""
        ...
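
Conditions compose with the `|` and `&` operators (equivalently, `or_conditions` and `and_conditions`). A short illustrative sketch using only the methods documented above:

>>> by_name = ManifestSplitCondition.name_matches(".*temp.*")
>>> by_path = ManifestSplitCondition.path_matches("^/model/")
>>> either = by_name | by_path
>>> both = by_name & by_path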

AnyArray staticmethod #

AnyArray()

Create a splitting condition that matches any array.

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
@staticmethod
def AnyArray() -> ManifestSplitCondition:
    """Create a splitting condition that matches any array."""
    ...

__and__ #

__and__(other)

Create a splitting condition that matches if both this condition and other match

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
def __and__(self, other: ManifestSplitCondition) -> ManifestSplitCondition:
    """Create a splitting condition that matches if both this condition and `other` match"""
    ...

__or__ #

__or__(other)

Create a splitting condition that matches if either this condition or other matches

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
def __or__(self, other: ManifestSplitCondition) -> ManifestSplitCondition:
    """Create a splitting condition that matches if either this condition or `other` matches"""
    ...

and_conditions staticmethod #

and_conditions(conditions)

Create a splitting condition that matches only if all passed conditions match

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
@staticmethod
def and_conditions(
    conditions: list[ManifestSplitCondition],
) -> ManifestSplitCondition:
    """Create a splitting condition that matches only if all passed `conditions` match"""
    ...

name_matches staticmethod #

name_matches(regex)

Create a splitting condition that matches if the array's name matches the passed regex.

For example, for an array /model/outputs/temperature, the following will match:

name_matches(".*temp.*")

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
@staticmethod
def name_matches(regex: str) -> ManifestSplitCondition:
    """Create a splitting condition that matches if the array's name matches the passed regex.

    For example, for an array `/model/outputs/temperature`, the following will match:
    ```
    name_matches(".*temp.*")
    ```
    """
    ...

or_conditions staticmethod #

or_conditions(conditions)

Create a splitting condition that matches if any of conditions matches

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
@staticmethod
def or_conditions(
    conditions: list[ManifestSplitCondition],
) -> ManifestSplitCondition:
    """Create a splitting condition that matches if any of `conditions` matches"""
    ...

path_matches staticmethod #

path_matches(regex)

Create a splitting condition that matches if the full path to the array matches the passed regex.

Array paths are absolute, as in /path/to/my/array

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
@staticmethod
def path_matches(regex: str) -> ManifestSplitCondition:
    """Create a splitting condition that matches if the full path to the array matches the passed regex.

    Array paths are absolute, as in `/path/to/my/array`
    """
    ...

ManifestSplitDimCondition #

Conditions for specifying dimensions along which to shard manifests.

Classes:

Name Description
Any

Split along any other unspecified dimension.

Axis

Split along specified integer axis.

DimensionName

Split along specified named dimension.

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class ManifestSplitDimCondition:
    """Conditions for specifying dimensions along which to shard manifests."""
    class Axis:
        """Split along specified integer axis."""
        def __init__(self, axis: int) -> None: ...

    class DimensionName:
        """Split along specified named dimension."""
        def __init__(self, regex: str) -> None: ...

    class Any:
        """Split along any other unspecified dimension."""
        def __init__(self) -> None: ...
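
These conditions serve as keys in the per-dimension size mapping passed to ManifestSplittingConfig.from_dict. An illustrative mapping (the sizes are arbitrary):

>>> sizes = {
...     ManifestSplitDimCondition.DimensionName("longitude"): 3,
...     ManifestSplitDimCondition.Axis(0): 5,
...     ManifestSplitDimCondition.Any(): 1,
... }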

Any #

Split along any other unspecified dimension.

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class Any:
    """Split along any other unspecified dimension."""
    def __init__(self) -> None: ...

Axis #

Split along specified integer axis.

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class Axis:
    """Split along specified integer axis."""
    def __init__(self, axis: int) -> None: ...

DimensionName #

Split along specified named dimension.

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class DimensionName:
    """Split along specified named dimension."""
    def __init__(self, regex: str) -> None: ...

ManifestSplittingConfig #

Configuration for manifest splitting.

Methods:

Name Description
__init__

Configuration for how Icechunk manifests will be split.

Attributes:

Name Type Description
split_sizes SplitSizes

Configuration for how Icechunk manifests will be split.

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class ManifestSplittingConfig:
    """Configuration for manifest splitting."""

    @staticmethod
    def from_dict(
        split_sizes: dict[
            ManifestSplitCondition,
            dict[
                ManifestSplitDimCondition.Axis
                | ManifestSplitDimCondition.DimensionName
                | ManifestSplitDimCondition.Any,
                int,
            ],
        ],
    ) -> ManifestSplittingConfig: ...
    def to_dict(
        config: ManifestSplittingConfig,
    ) -> dict[
        ManifestSplitCondition,
        dict[
            ManifestSplitDimCondition.Axis
            | ManifestSplitDimCondition.DimensionName
            | ManifestSplitDimCondition.Any,
            int,
        ],
    ]: ...
    def __init__(self, split_sizes: SplitSizes) -> None:
        """Configuration for how Icechunk manifests will be split.

        Parameters
        ----------
        split_sizes: tuple[tuple[ManifestSplitCondition, tuple[tuple[ManifestSplitDimCondition, int], ...]], ...]
            The configuration for how Icechunk manifests will be split.

        Examples
        --------

        Split manifests for the `temperature` array, with 3 chunks per shard along the `longitude` dimension.
        >>> ManifestSplittingConfig.from_dict(
        ...     {
        ...         ManifestSplitCondition.name_matches("temperature"): {
        ...             ManifestSplitDimCondition.DimensionName("longitude"): 3
        ...         }
        ...     }
        ... )
        """
        pass

    @property
    def split_sizes(self) -> SplitSizes:
        """
        Configuration for how Icechunk manifests will be split.

        Returns
        -------
        tuple[tuple[ManifestSplitCondition, tuple[tuple[ManifestSplitDimCondition, int], ...]], ...]
            The configuration for how Icechunk manifests will be split.
        """
        ...

    @split_sizes.setter
    def split_sizes(self, value: SplitSizes) -> None:
        """
        Set the sizes for how Icechunk manifests will be split.

        Parameters
        ----------
        value: tuple[tuple[ManifestSplitCondition, tuple[tuple[ManifestSplitDimCondition, int], ...]], ...]
            The configuration for how Icechunk manifests will be split.
        """
        ...

split_sizes property writable #

split_sizes

Configuration for how Icechunk manifests will be split.

Returns:

Type Description
tuple[tuple[ManifestSplitCondition, tuple[tuple[ManifestSplitDimCondition, int], ...]], ...]

The configuration for how Icechunk manifests will be split.

__init__ #

__init__(split_sizes)

Configuration for how Icechunk manifests will be split.

Parameters:

Name Type Description Default
split_sizes SplitSizes

The configuration for how Icechunk manifests will be split.

required

Examples:

Split manifests for the temperature array, with 3 chunks per shard along the longitude dimension.

>>> ManifestSplittingConfig.from_dict(
...     {
...         ManifestSplitCondition.name_matches("temperature"): {
...             ManifestSplitDimCondition.DimensionName("longitude"): 3
...         }
...     }
... )
Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
def __init__(self, split_sizes: SplitSizes) -> None:
    """Configuration for how Icechunk manifests will be split.

    Parameters
    ----------
    split_sizes: tuple[tuple[ManifestSplitCondition, tuple[tuple[ManifestSplitDimCondition, int], ...]], ...]
        The configuration for how Icechunk manifests will be split.

    Examples
    --------

    Split manifests for the `temperature` array, with 3 chunks per shard along the `longitude` dimension.
    >>> ManifestSplittingConfig.from_dict(
    ...     {
    ...         ManifestSplitCondition.name_matches("temperature"): {
    ...             ManifestSplitDimCondition.DimensionName("longitude"): 3
    ...         }
    ...     }
    ... )
    """
    pass
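
To take effect, a splitting configuration must be attached to the repository configuration. A hedged sketch, assuming ManifestConfig accepts a splitting argument and RepositoryConfig accepts a manifest argument:

>>> splitting = ManifestSplittingConfig.from_dict(
...     {ManifestSplitCondition.AnyArray(): {ManifestSplitDimCondition.Any(): 10}}
... )
>>> config = RepositoryConfig(manifest=ManifestConfig(splitting=splitting))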

RebaseFailedError #

Bases: IcechunkError

An error that occurs when a rebase operation fails

Methods:

Name Description
__init__

Create a new RebaseFailedError.

Attributes:

Name Type Description
conflicts list[Conflict]

The conflicts that occurred during the rebase operation

snapshot str

The snapshot ID that the session was rebased to

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class RebaseFailedError(IcechunkError):
    """An error that occurs when a rebase operation fails"""

    def __init__(self, snapshot: str, conflicts: list[Conflict]) -> None:
        """
        Create a new RebaseFailedError.

        Parameters
        ----------
        snapshot: str
            The snapshot ID that the session was rebased to.
        conflicts: list[Conflict]
            The conflicts that occurred during the rebase operation.
        """
        ...

    @property
    def snapshot(self) -> str:
        """The snapshot ID that the session was rebased to"""
        ...

    @property
    def conflicts(self) -> list[Conflict]:
        """The conflicts that occurred during the rebase operation

        Returns:
            list[Conflict]: The conflicts that occurred during the rebase operation
        """
        ...
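
A typical handling sketch, assuming a writable session whose rebase raises this error when conflicts cannot be resolved automatically:

>>> try:
...     session.rebase(ConflictDetector())
... except RebaseFailedError as e:
...     print(f"rebase onto {e.snapshot} failed")
...     for conflict in e.conflicts:
...         print(conflict)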

conflicts property #

conflicts

The conflicts that occurred during the rebase operation

Returns: list[Conflict]: The conflicts that occurred during the rebase operation

snapshot property #

snapshot

The snapshot ID that the session was rebased to

__init__ #

__init__(snapshot, conflicts)

Create a new RebaseFailedError.

Parameters:

Name Type Description Default
snapshot str

The snapshot ID that the session was rebased to.

required
conflicts list[Conflict]

The conflicts that occurred during the rebase operation.

required
Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
def __init__(self, snapshot: str, conflicts: list[Conflict]) -> None:
    """
    Create a new RebaseFailedError.

    Parameters
    ----------
    snapshot: str
        The snapshot ID that the session was rebased to.
    conflicts: list[Conflict]
        The conflicts that occurred during the rebase operation.
    """
    ...

Repository #

An Icechunk repository.

Methods:

Name Description
ancestry

Get the ancestry of a snapshot.

async_ancestry

Get the ancestry of a snapshot.

chunk_storage_stats

Calculate the total storage used for chunks, in bytes.

chunk_storage_stats_async

Calculate the total storage used for chunks, in bytes (async version).

create

Create a new Icechunk repository.

create_async

Create a new Icechunk repository asynchronously.

create_branch

Create a new branch at the given snapshot.

create_branch_async

Create a new branch at the given snapshot (async version).

create_tag

Create a new tag at the given snapshot.

create_tag_async

Create a new tag at the given snapshot (async version).

default_commit_metadata

Get the current configured default commit metadata for the repository.

delete_branch

Delete a branch.

delete_branch_async

Delete a branch (async version).

delete_tag

Delete a tag.

delete_tag_async

Delete a tag (async version).

diff

Compute an overview of the operations executed from version `from` to version `to`.

diff_async

Compute an overview of the operations executed from version `from` to version `to` (async version).

exists

Check if a repository exists at the given storage location.

exists_async

Check if a repository exists at the given storage location (async version).

expire_snapshots

Expire all snapshots older than a threshold.

expire_snapshots_async

Expire all snapshots older than a threshold (async version).

fetch_config

Fetch the configuration for the repository saved in storage.

fetch_config_async

Fetch the configuration for the repository saved in storage (async version).

fetch_spec_version

Fetch the spec version of a repository without fully opening it.

fetch_spec_version_async

Fetch the spec version of a repository without fully opening it (async version).

garbage_collect

Delete any objects no longer accessible from any branches or tags.

garbage_collect_async

Delete any objects no longer accessible from any branches or tags (async version).

get_metadata

Get the current configured repository metadata.

get_metadata_async

Get the current configured repository metadata.

list_branches

List the branches in the repository.

list_branches_async

List the branches in the repository (async version).

list_manifest_files

Get the manifest files used by the given snapshot ID

list_manifest_files_async

Get the manifest files used by the given snapshot ID

list_tags

List the tags in the repository.

list_tags_async

List the tags in the repository (async version).

lookup_branch

Get the tip snapshot ID of a branch.

lookup_branch_async

Get the tip snapshot ID of a branch (async version).

lookup_snapshot

Get the SnapshotInfo given a snapshot ID

lookup_snapshot_async

Get the SnapshotInfo given a snapshot ID (async version)

lookup_tag

Get the snapshot ID of a tag.

lookup_tag_async

Get the snapshot ID of a tag (async version).

open

Open an existing Icechunk repository.

open_async

Open an existing Icechunk repository asynchronously.

open_or_create

Open an existing Icechunk repository or create a new one if it does not exist.

open_or_create_async

Open an existing Icechunk repository or create a new one if it does not exist (async version).

ops_log

Get a summary of changes to the repository

ops_log_async

Get a summary of changes to the repository

readonly_session

Create a read-only session.

readonly_session_async

Create a read-only session (async version).

rearrange_session

Create a session to move/rename nodes in the Zarr hierarchy.

rearrange_session_async

Create a session to move/rename nodes in the Zarr hierarchy.

reopen

Reopen the repository with new configuration or credentials.

reopen_async

Reopen the repository with new configuration or credentials (async version).

reset_branch

Reset a branch to a specific snapshot.

reset_branch_async

Reset a branch to a specific snapshot (async version).

rewrite_manifests

Rewrite manifests for all arrays.

rewrite_manifests_async

Rewrite manifests for all arrays (async version).

save_config

Save the repository configuration to storage; this configuration will be used in future calls to Repository.open.

save_config_async

Save the repository configuration to storage (async version).

set_default_commit_metadata

Set the default commit metadata for the repository. This is useful for providing

set_metadata

Set the repository metadata; the passed dict will replace the complete metadata.

set_metadata_async

Set the repository metadata; the passed dict will replace the complete metadata.

total_chunks_storage

Calculate the total storage used for chunks, in bytes.

total_chunks_storage_async

Calculate the total storage used for chunks, in bytes (async version).

transaction

Create a transaction on a branch.

update_metadata

Update the repository metadata.

update_metadata_async

Update the repository metadata.

writable_session

Create a writable session on a branch.

writable_session_async

Create a writable session on a branch (async version).

Attributes:

Name Type Description
authorized_virtual_container_prefixes set[str]

Get all authorized virtual chunk container prefixes.

config RepositoryConfig

Get a copy of this repository's config.

metadata dict[str, Any]

Get the current configured repository metadata.

storage Storage

Get a copy of this repository's Storage instance.

Source code in icechunk-python/python/icechunk/repository.py
class Repository:
    """An Icechunk repository."""

    _repository: PyRepository

    def __init__(self, repository: PyRepository):
        self._repository = repository

    @classmethod
    def create(
        cls,
        storage: Storage,
        config: RepositoryConfig | None = None,
        authorize_virtual_chunk_access: dict[str, AnyCredential | None] | None = None,
        spec_version: int | None = None,
    ) -> Self:
        """
        Create a new Icechunk repository.
        If one already exists at the given store location, an error will be raised.

        !!! warning
            Attempting to create a Repo concurrently in the same location from multiple processes is not safe.
            Instead, create a Repo once and then open it concurrently.

        Parameters
        ----------
        storage : Storage
            The storage configuration for the repository.
        config : RepositoryConfig, optional
            The repository configuration. If not provided, a default configuration will be used.
        authorize_virtual_chunk_access : dict[str, AnyCredential | None], optional
            Authorize Icechunk to access virtual chunks in these containers. A mapping
            from container url_prefix to the credentials to use to access chunks in
            that container. If a credential is `None`, credentials will be fetched from the
            environment, or anonymous credentials will be used if the container allows it.
            As a security measure, Icechunk will block access to virtual chunks if the
            container is not authorized using this argument.
        spec_version : int, optional
            Use this version of the spec for the new repository. If not passed, the latest
            spec version available when this library version was released will be used.

        Returns
        -------
        Self
            An instance of the Repository class.
        """
        return cls(
            PyRepository.create(
                storage,
                config=config,
                authorize_virtual_chunk_access=authorize_virtual_chunk_access,
                spec_version=spec_version,
            )
        )
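
    # Illustrative usage (editor's sketch, not part of the library source);
    # assumes the `icechunk.local_filesystem_storage` helper for building a
    # Storage:
    #
    #     storage = icechunk.local_filesystem_storage("/tmp/my-repo")
    #     repo = Repository.create(storage)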

    @classmethod
    async def create_async(
        cls,
        storage: Storage,
        config: RepositoryConfig | None = None,
        authorize_virtual_chunk_access: dict[str, AnyCredential | None] | None = None,
        spec_version: int | None = None,
    ) -> Self:
        """
        Create a new Icechunk repository asynchronously.
        If one already exists at the given store location, an error will be raised.

        !!! warning
            Attempting to create a Repo concurrently in the same location from multiple processes is not safe.
            Instead, create a Repo once and then open it concurrently.

        Parameters
        ----------
        storage : Storage
            The storage configuration for the repository.
        config : RepositoryConfig, optional
            The repository configuration. If not provided, a default configuration will be used.
        authorize_virtual_chunk_access : dict[str, AnyCredential | None], optional
            Authorize Icechunk to access virtual chunks in these containers. A mapping
            from container url_prefix to the credentials to use to access chunks in
            that container. If a credential is `None`, credentials will be fetched from the
            environment, or anonymous credentials will be used if the container allows it.
            As a security measure, Icechunk will block access to virtual chunks if the
            container is not authorized using this argument.
        spec_version : int, optional
            Use this version of the spec for the new repository. If not passed, the latest
            spec version available when this library version was released will be used.

        Returns
        -------
        Self
            An instance of the Repository class.
        """
        return cls(
            await PyRepository.create_async(
                storage,
                config=config,
                authorize_virtual_chunk_access=authorize_virtual_chunk_access,
                spec_version=spec_version,
            )
        )

    @classmethod
    def open(
        cls,
        storage: Storage,
        config: RepositoryConfig | None = None,
        authorize_virtual_chunk_access: dict[str, AnyCredential | None] | None = None,
    ) -> Self:
        """
        Open an existing Icechunk repository.

        If no repository exists at the given storage location, an error will be raised.

        !!! warning
            This method must be used with care in a multiprocessing context.
            Read more in our [Parallel Write Guide](./parallel.md#uncooperative-distributed-writes).

        Parameters
        ----------
        storage : Storage
            The storage configuration for the repository.
        config : RepositoryConfig, optional
            The repository settings. If not provided, a default configuration will be
            loaded from the repository.
        authorize_virtual_chunk_access : dict[str, AnyCredential | None], optional
            Authorize Icechunk to access virtual chunks in these containers. A mapping
            from container url_prefix to the credentials to use to access chunks in
            that container. If a credential is `None`, credentials will be fetched from the
            environment, or anonymous credentials will be used if the container allows it.
            As a security measure, Icechunk will block access to virtual chunks if the
            container is not authorized using this argument.

        Returns
        -------
        Self
            An instance of the Repository class.
        """
        return cls(
            PyRepository.open(
                storage,
                config=config,
                authorize_virtual_chunk_access=authorize_virtual_chunk_access,
            )
        )

    @classmethod
    async def open_async(
        cls,
        storage: Storage,
        config: RepositoryConfig | None = None,
        authorize_virtual_chunk_access: dict[str, AnyCredential | None] | None = None,
    ) -> Self:
        """
        Open an existing Icechunk repository asynchronously.

        If no repository exists at the given storage location, an error will be raised.

        !!! warning
            This method must be used with care in a multiprocessing context.
            Read more in our [Parallel Write Guide](./parallel.md#uncooperative-distributed-writes).

        Parameters
        ----------
        storage : Storage
            The storage configuration for the repository.
        config : RepositoryConfig, optional
            The repository settings. If not provided, a default configuration will be
            loaded from the repository.
        authorize_virtual_chunk_access : dict[str, AnyCredential | None], optional
            Authorize Icechunk to access virtual chunks in these containers. A mapping
            from container url_prefix to the credentials to use to access chunks in
            that container. If a credential is `None`, credentials will be fetched from the
            environment, or anonymous credentials will be used if the container allows it.
            As a security measure, Icechunk will block access to virtual chunks if the
            container is not authorized using this argument.

        Returns
        -------
        Self
            An instance of the Repository class.
        """
        return cls(
            await PyRepository.open_async(
                storage,
                config=config,
                authorize_virtual_chunk_access=authorize_virtual_chunk_access,
            )
        )

    @classmethod
    def open_or_create(
        cls,
        storage: Storage,
        config: RepositoryConfig | None = None,
        authorize_virtual_chunk_access: dict[str, AnyCredential | None] | None = None,
        create_version: int | None = None,
    ) -> Self:
        """
        Open an existing Icechunk repository or create a new one if it does not exist.

        !!! warning
            This method must be used with care in a multiprocessing context.
            Read more in our [Parallel Write Guide](./parallel.md#uncooperative-distributed-writes).

            Attempting to create a Repo concurrently in the same location from multiple processes is not safe.
            Instead, create a Repo once and then open it concurrently.

        Parameters
        ----------
        storage : Storage
            The storage configuration for the repository.
        config : RepositoryConfig, optional
            The repository settings. If not provided, a default configuration will be
            loaded from the repository.
        authorize_virtual_chunk_access : dict[str, AnyCredential | None], optional
            Authorize Icechunk to access virtual chunks in these containers. A mapping
            from container url_prefix to the credentials to use to access chunks in
            that container. If a credential is `None`, credentials will be fetched from the
            environment, or anonymous credentials will be used if the container allows it.
            As a security measure, Icechunk will block access to virtual chunks if the
            container is not authorized using this argument.
        create_version : int, optional
            Use this version of the spec for the new repository, if it needs to be created.
            If not passed, the latest spec version available when this library version
            was released will be used.


        Returns
        -------
        Self
            An instance of the Repository class.
        """
        return cls(
            PyRepository.open_or_create(
                storage,
                config=config,
                authorize_virtual_chunk_access=authorize_virtual_chunk_access,
                create_version=create_version,
            )
        )

    @classmethod
    async def open_or_create_async(
        cls,
        storage: Storage,
        config: RepositoryConfig | None = None,
        authorize_virtual_chunk_access: dict[str, AnyCredential | None] | None = None,
        create_version: int | None = None,
    ) -> Self:
        """
        Open an existing Icechunk repository or create a new one if it does not exist (async version).

        !!! warning
            This method must be used with care in a multiprocessing context.
            Read more in our [Parallel Write Guide](./parallel.md#uncooperative-distributed-writes).

            Attempting to create a Repo concurrently in the same location from multiple processes is not safe.
            Instead, create a Repo once and then open it concurrently.

        Parameters
        ----------
        storage : Storage
            The storage configuration for the repository.
        config : RepositoryConfig, optional
            The repository settings. If not provided, a default configuration will be
            loaded from the repository.
        authorize_virtual_chunk_access : dict[str, AnyCredential | None], optional
            Authorize Icechunk to access virtual chunks in these containers. A mapping
            from container url_prefix to the credentials to use to access chunks in
            that container. If a credential is `None`, credentials will be fetched from the
            environment, or anonymous credentials will be used if the container allows it.
            As a security measure, Icechunk will block access to virtual chunks if the
            container is not authorized using this argument.
        create_version : int, optional
            Use this version of the spec for the new repository, if it needs to be created.
            If not passed, the latest spec version available when this library version
            was released will be used.

        Returns
        -------
        Self
            An instance of the Repository class.
        """
        return cls(
            await PyRepository.open_or_create_async(
                storage,
                config=config,
                authorize_virtual_chunk_access=authorize_virtual_chunk_access,
                create_version=create_version,
            )
        )

    @staticmethod
    def exists(storage: Storage) -> bool:
        """
        Check if a repository exists at the given storage location.

        Parameters
        ----------
        storage : Storage
            The storage configuration for the repository.

        Returns
        -------
        bool
            True if the repository exists, False otherwise.
        """
        return PyRepository.exists(storage)

    @staticmethod
    async def exists_async(storage: Storage) -> bool:
        """
        Check if a repository exists at the given storage location (async version).

        Parameters
        ----------
        storage : Storage
            The storage configuration for the repository.

        Returns
        -------
        bool
            True if the repository exists, False otherwise.
        """
        return await PyRepository.exists_async(storage)

    @staticmethod
    def fetch_spec_version(storage: Storage) -> int | None:
        """
        Fetch the spec version of a repository without fully opening it.

        This is useful for checking the repository format version before opening,
        for example to know what version of the library is needed to open it.

        Parameters
        ----------
        storage : Storage
            The storage configuration for the repository.

        Returns
        -------
        int | None
            The spec version of the repository if it exists, None if no repository
            exists at the given location.
        """
        return PyRepository.fetch_spec_version(storage)

    @staticmethod
    async def fetch_spec_version_async(storage: Storage) -> int | None:
        """
        Fetch the spec version of a repository without fully opening it (async version).

        This is useful for checking the repository format version before opening,
        for example to know what version of the library is needed to open it.

        Parameters
        ----------
        storage : Storage
            The storage configuration for the repository.

        Returns
        -------
        int | None
            The spec version of the repository if it exists, None if no repository
            exists at the given location.
        """
        return await PyRepository.fetch_spec_version_async(storage)

    def __getstate__(self) -> object:
        return {
            "_repository": self._repository.as_bytes(),
        }

    def __setstate__(self, state: object) -> None:
        if not isinstance(state, dict):
            raise ValueError("Invalid repository state")
        self._repository = PyRepository.from_bytes(state["_repository"])
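
    # __getstate__ and __setstate__ make Repository instances picklable, so a
    # repository can be shipped to worker processes. Illustrative round-trip
    # (editor's sketch, not part of the library source):
    #
    #     import pickle
    #     repo_copy = pickle.loads(pickle.dumps(repo))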

    @staticmethod
    def fetch_config(storage: Storage) -> RepositoryConfig | None:
        """
        Fetch the configuration for the repository saved in storage.

        Parameters
        ----------
        storage : Storage
            The storage configuration for the repository.

        Returns
        -------
        RepositoryConfig | None
            The repository configuration if it exists, None otherwise.
        """
        return PyRepository.fetch_config(storage)

    @staticmethod
    async def fetch_config_async(storage: Storage) -> RepositoryConfig | None:
        """
        Fetch the configuration for the repository saved in storage (async version).

        Parameters
        ----------
        storage : Storage
            The storage configuration for the repository.

        Returns
        -------
        RepositoryConfig | None
            The repository configuration if it exists, None otherwise.
        """
        return await PyRepository.fetch_config_async(storage)

    def save_config(self) -> None:
        """
        Save the repository configuration to storage; this configuration will be used in future calls to Repository.open.

        Returns
        -------
        None
        """
        return self._repository.save_config()

    async def save_config_async(self) -> None:
        """
        Save the repository configuration to storage (async version).

        Returns
        -------
        None
        """
        return await self._repository.save_config_async()

    @property
    def config(self) -> RepositoryConfig:
        """
        Get a copy of this repository's config.

        Returns
        -------
        RepositoryConfig
            The repository configuration.
        """
        return self._repository.config()

    @property
    def storage(self) -> Storage:
        """
        Get a copy of this repository's Storage instance.

        Returns
        -------
        Storage
            The repository storage instance.
        """
        return self._repository.storage()

    @property
    def authorized_virtual_container_prefixes(self) -> set[str]:
        """
        Get all authorized virtual chunk container prefixes.

        Returns
        -------
        url_prefixes: set[str]
            The set of authorized url prefixes for each virtual chunk container
        """
        return self._repository.authorized_virtual_container_prefixes

    def reopen(
        self,
        config: RepositoryConfig | None = None,
        authorize_virtual_chunk_access: dict[str, AnyCredential | None] | None = None,
    ) -> Self:
        """
        Reopen the repository with new configuration or credentials.

        Parameters
        ----------
        config : RepositoryConfig, optional
            The new repository configuration. If not provided, uses the existing configuration.
        authorize_virtual_chunk_access : dict[str, AnyCredential | None], optional
            New virtual chunk access credentials.

        Returns
        -------
        Self
            A new Repository instance with the updated configuration.
        """
        return self.__class__(
            self._repository.reopen(
                config=config,
                authorize_virtual_chunk_access=authorize_virtual_chunk_access,
            )
        )

    async def reopen_async(
        self,
        config: RepositoryConfig | None = None,
        authorize_virtual_chunk_access: dict[str, AnyCredential | None] | None = None,
    ) -> Self:
        """
        Reopen the repository with new configuration or credentials (async version).

        Parameters
        ----------
        config : RepositoryConfig, optional
            The new repository configuration. If not provided, uses the existing configuration.
        authorize_virtual_chunk_access : dict[str, AnyCredential | None], optional
            New virtual chunk access credentials.

        Returns
        -------
        Self
            A new Repository instance with the updated configuration.
        """
        return self.__class__(
            await self._repository.reopen_async(
                config=config,
                authorize_virtual_chunk_access=authorize_virtual_chunk_access,
            )
        )

    def set_default_commit_metadata(self, metadata: dict[str, Any]) -> None:
        """
        Set the default commit metadata for the repository. This is useful for providing
        additional static system context metadata to all commits.

        When a commit is made, the metadata will be merged with the metadata provided, with any
        duplicate keys being overwritten by the metadata provided in the commit.

        !!! warning
            This metadata is only applied to sessions that are created after this call. Any open
            writable sessions will not be affected and will not use the new default metadata.

        Parameters
        ----------
        metadata : dict[str, Any]
            The default commit metadata. Pass an empty dict to clear the default metadata.
        """
        return self._repository.set_default_commit_metadata(metadata)
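
    # Illustrative usage (editor's sketch, not part of the library source),
    # assuming `repo` is an open Repository:
    #
    #     repo.set_default_commit_metadata({"pipeline": "daily-etl"})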

    def default_commit_metadata(self) -> dict[str, Any]:
        """
        Get the current configured default commit metadata for the repository.

        Returns
        -------
        dict[str, Any]
            The default commit metadata.
        """
        return self._repository.default_commit_metadata()

    def get_metadata(self) -> dict[str, Any]:
        """
        Get the current configured repository metadata.

        Returns
        -------
        dict[str, Any]
            The repository level metadata.
        """
        return self._repository.get_metadata()

    @property
    def metadata(self) -> dict[str, Any]:
        """
        Get the current configured repository metadata.

        Returns
        -------
        dict[str, Any]
            The repository level metadata.
        """
        return self._repository.get_metadata()

    async def get_metadata_async(self) -> dict[str, Any]:
        """
        Get the current configured repository metadata.

        Returns
        -------
        dict[str, Any]
            The repository level metadata.
        """
        return await self._repository.get_metadata_async()

    def set_metadata(self, metadata: dict[str, Any]) -> None:
        """
        Set the repository metadata; the passed dict will replace the complete metadata.

        If you prefer to only update some metadata values, use Repository.update_metadata

        Parameters
        ----------
        metadata : dict[str, Any]
            The value to use as repository metadata.
        """
        self._repository.set_metadata(metadata)

    async def set_metadata_async(self, metadata: dict[str, Any]) -> None:
        """
        Set the repository metadata; the passed dict will replace the complete metadata.

        If you prefer to only update some metadata values, use Repository.update_metadata

        Parameters
        ----------
        metadata : dict[str, Any]
            The value to use as repository metadata.
        """
        await self._repository.set_metadata_async(metadata)

    def update_metadata(self, metadata: dict[str, Any]) -> dict[str, Any]:
        """
        Update the repository metadata.

        The passed dict will be merged with the current metadata, overriding existing keys.

        Parameters
        ----------
        metadata : dict[str, Any]
            The dict to merge into the repository metadata.
        """
        return self._repository.update_metadata(metadata)
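
    # Illustrative usage (editor's sketch, not part of the library source),
    # assuming `repo` is an open Repository:
    #
    #     repo.set_metadata({"project": "climate"})     # replaces all metadata
    #     repo.update_metadata({"owner": "data-team"})  # merges, keeping "project"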

    async def update_metadata_async(self, metadata: dict[str, Any]) -> dict[str, Any]:
        """
        Update the repository metadata.

        The passed dict will be merged with the current metadata, overriding existing keys.

        Parameters
        ----------
        metadata : dict[str, Any]
            The dict to merge into the repository metadata.
        """
        return await self._repository.update_metadata_async(metadata)

    def ancestry(
        self,
        *,
        branch: str | None = None,
        tag: str | None = None,
        snapshot_id: str | None = None,
    ) -> Iterator[SnapshotInfo]:
        """
        Get the ancestry of a snapshot.

        Parameters
        ----------
        branch : str, optional
            The branch to get the ancestry of.
        tag : str, optional
            The tag to get the ancestry of.
        snapshot_id : str, optional
            The snapshot ID to get the ancestry of.

        Returns
        -------
        list[SnapshotInfo]
            The ancestry of the snapshot, listing out the snapshots and their metadata.

        Notes
        -----
        Only one of the arguments can be specified.
        """

        # the returned object is both an Async and Sync iterator
        res = cast(
            Iterator[SnapshotInfo],
            self._repository.async_ancestry(
                branch=branch, tag=tag, snapshot_id=snapshot_id
            ),
        )
        return res
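
    # Illustrative usage (editor's sketch, not part of the library source),
    # assuming `repo` is an open Repository with a "main" branch and that
    # SnapshotInfo exposes `id` and `message`:
    #
    #     for snapshot in repo.ancestry(branch="main"):
    #         print(snapshot.id, snapshot.message)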

    def async_ancestry(
        self,
        *,
        branch: str | None = None,
        tag: str | None = None,
        snapshot_id: str | None = None,
    ) -> AsyncIterator[SnapshotInfo]:
        """
        Get the ancestry of a snapshot.

        Parameters
        ----------
        branch : str, optional
            The branch to get the ancestry of.
        tag : str, optional
            The tag to get the ancestry of.
        snapshot_id : str, optional
            The snapshot ID to get the ancestry of.

        Returns
        -------
        list[SnapshotInfo]
            The ancestry of the snapshot, listing out the snapshots and their metadata.

        Notes
        -----
        Only one of the arguments can be specified.
        """
        return self._repository.async_ancestry(
            branch=branch, tag=tag, snapshot_id=snapshot_id
        )

    def ops_log(self) -> Iterator[UpdateType]:
        """
        Get a summary of changes to the repository
        """

        # the returned object is both an Async and Sync iterator
        res = cast(
            Iterator[UpdateType],
            self._repository.async_ops_log(),
        )
        return res

    def ops_log_async(self) -> AsyncIterator[UpdateType]:
        """
        Get a summary of changes to the repository
        """

        # the returned object is both an Async and Sync iterator
        return self._repository.async_ops_log()

    def create_branch(self, branch: str, snapshot_id: str) -> None:
        """
        Create a new branch at the given snapshot.

        Parameters
        ----------
        branch : str
            The name of the branch to create.
        snapshot_id : str
            The snapshot ID to create the branch at.

        Returns
        -------
        None
        """
        self._repository.create_branch(branch, snapshot_id)
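
    # Illustrative usage (editor's sketch, not part of the library source),
    # assuming `repo` is an open Repository:
    #
    #     tip = repo.lookup_branch("main")
    #     repo.create_branch("experiment", tip)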

    async def create_branch_async(self, branch: str, snapshot_id: str) -> None:
        """
        Create a new branch at the given snapshot (async version).

        Parameters
        ----------
        branch : str
            The name of the branch to create.
        snapshot_id : str
            The snapshot ID to create the branch at.

        Returns
        -------
        None
        """
        await self._repository.create_branch_async(branch, snapshot_id)

    def list_branches(self) -> set[str]:
        """
        List the branches in the repository.

        Returns
        -------
        set[str]
            A set of branch names.
        """
        return self._repository.list_branches()

    async def list_branches_async(self) -> set[str]:
        """
        List the branches in the repository (async version).

        Returns
        -------
        set[str]
            A set of branch names.
        """
        return await self._repository.list_branches_async()

    def lookup_branch(self, branch: str) -> str:
        """
        Get the tip snapshot ID of a branch.

        Parameters
        ----------
        branch : str
            The branch to get the tip of.

        Returns
        -------
        str
            The snapshot ID of the tip of the branch.
        """
        return self._repository.lookup_branch(branch)

    async def lookup_branch_async(self, branch: str) -> str:
        """
        Get the tip snapshot ID of a branch (async version).

        Parameters
        ----------
        branch : str
            The branch to get the tip of.

        Returns
        -------
        str
            The snapshot ID of the tip of the branch.
        """
        return await self._repository.lookup_branch_async(branch)

    def lookup_snapshot(self, snapshot_id: str) -> SnapshotInfo:
        """
        Get the SnapshotInfo given a snapshot ID

        Parameters
        ----------
        snapshot_id : str
            The id of the snapshot to look up

        Returns
        -------
        SnapshotInfo
        """
        return self._repository.lookup_snapshot(snapshot_id)

    async def lookup_snapshot_async(self, snapshot_id: str) -> SnapshotInfo:
        """
        Get the SnapshotInfo given a snapshot ID (async version)

        Parameters
        ----------
        snapshot_id : str
            The id of the snapshot to look up

        Returns
        -------
        SnapshotInfo
        """
        return await self._repository.lookup_snapshot_async(snapshot_id)

    def list_manifest_files(self, snapshot_id: str) -> list[ManifestFileInfo]:
        """
        Get the manifest files used by the given snapshot ID

        Parameters
        ----------
        snapshot_id : str
            The id of the snapshot to get information for

        Returns
        -------
        list[ManifestFileInfo]
        """
        return self._repository.list_manifest_files(snapshot_id)

    async def list_manifest_files_async(self, snapshot_id: str) -> list[ManifestFileInfo]:
        """
        Get the manifest files used by the given snapshot ID

        Parameters
        ----------
        snapshot_id : str
            The id of the snapshot to get information for

        Returns
        -------
        list[ManifestFileInfo]
        """
        return await self._repository.list_manifest_files_async(snapshot_id)

    def reset_branch(
        self, branch: str, snapshot_id: str, *, from_snapshot_id: str | None = None
    ) -> None:
        """
        Reset a branch to a specific snapshot.

        This will permanently alter the history of the branch such that the tip of
        the branch is the specified snapshot.

        Parameters
        ----------
        branch : str
            The branch to reset.
        snapshot_id : str
            The snapshot ID to reset the branch to.
        from_snapshot_id : str | None
            If passed, the reset will only be executed if the branch currently
            points to from_snapshot_id.

        Returns
        -------
        None
        """
        self._repository.reset_branch(branch, snapshot_id, from_snapshot_id)

    async def reset_branch_async(
        self, branch: str, snapshot_id: str, *, from_snapshot_id: str | None = None
    ) -> None:
        """
        Reset a branch to a specific snapshot (async version).

        This will permanently alter the history of the branch such that the tip of
        the branch is the specified snapshot.

        Parameters
        ----------
        branch : str
            The branch to reset.
        snapshot_id : str
            The snapshot ID to reset the branch to.
        from_snapshot_id : str | None
            If passed, the reset will only be executed if the branch currently
            points to from_snapshot_id.

        Returns
        -------
        None
        """
        await self._repository.reset_branch_async(branch, snapshot_id, from_snapshot_id)

    def delete_branch(self, branch: str) -> None:
        """
        Delete a branch.

        Parameters
        ----------
        branch : str
            The branch to delete.

        Returns
        -------
        None
        """
        self._repository.delete_branch(branch)

    async def delete_branch_async(self, branch: str) -> None:
        """
        Delete a branch (async version).

        Parameters
        ----------
        branch : str
            The branch to delete.

        Returns
        -------
        None
        """
        await self._repository.delete_branch_async(branch)

    def delete_tag(self, tag: str) -> None:
        """
        Delete a tag.

        Parameters
        ----------
        tag : str
            The tag to delete.

        Returns
        -------
        None
        """
        self._repository.delete_tag(tag)

    async def delete_tag_async(self, tag: str) -> None:
        """
        Delete a tag (async version).

        Parameters
        ----------
        tag : str
            The tag to delete.

        Returns
        -------
        None
        """
        await self._repository.delete_tag_async(tag)

    def create_tag(self, tag: str, snapshot_id: str) -> None:
        """
        Create a new tag at the given snapshot.

        Parameters
        ----------
        tag : str
            The name of the tag to create.
        snapshot_id : str
            The snapshot ID to create the tag at.

        Returns
        -------
        None
        """
        self._repository.create_tag(tag, snapshot_id)

    async def create_tag_async(self, tag: str, snapshot_id: str) -> None:
        """
        Create a new tag at the given snapshot (async version).

        Parameters
        ----------
        tag : str
            The name of the tag to create.
        snapshot_id : str
            The snapshot ID to create the tag at.

        Returns
        -------
        None
        """
        await self._repository.create_tag_async(tag, snapshot_id)

    def list_tags(self) -> set[str]:
        """
        List the tags in the repository.

        Returns
        -------
        set[str]
            A set of tag names.
        """
        return self._repository.list_tags()

    async def list_tags_async(self) -> set[str]:
        """
        List the tags in the repository (async version).

        Returns
        -------
        set[str]
            A set of tag names.
        """
        return await self._repository.list_tags_async()

    def lookup_tag(self, tag: str) -> str:
        """
        Get the snapshot ID of a tag.

        Parameters
        ----------
        tag : str
            The tag to get the snapshot ID of.

        Returns
        -------
        str
            The snapshot ID of the tag.
        """
        return self._repository.lookup_tag(tag)

    async def lookup_tag_async(self, tag: str) -> str:
        """
        Get the snapshot ID of a tag (async version).

        Parameters
        ----------
        tag : str
            The tag to get the snapshot ID of.

        Returns
        -------
        str
            The snapshot ID of the tag.
        """
        return await self._repository.lookup_tag_async(tag)

    def diff(
        self,
        *,
        from_branch: str | None = None,
        from_tag: str | None = None,
        from_snapshot_id: str | None = None,
        to_branch: str | None = None,
        to_tag: str | None = None,
        to_snapshot_id: str | None = None,
    ) -> Diff:
        """
        Compute an overview of the operations executed from version `from` to version `to`.

        Both versions, `from` and `to`, must be identified. Identification can be done using a branch, tag or snapshot id.
        The styles used to identify the `from` and `to` versions can be different.

        The `from` version must be a member of the `ancestry` of `to`.

        Returns
        -------
        Diff
            The operations executed between the two versions
        """
        return self._repository.diff(
            from_branch=from_branch,
            from_tag=from_tag,
            from_snapshot_id=from_snapshot_id,
            to_branch=to_branch,
            to_tag=to_tag,
            to_snapshot_id=to_snapshot_id,
        )

    async def diff_async(
        self,
        *,
        from_branch: str | None = None,
        from_tag: str | None = None,
        from_snapshot_id: str | None = None,
        to_branch: str | None = None,
        to_tag: str | None = None,
        to_snapshot_id: str | None = None,
    ) -> Diff:
        """
        Compute an overview of the operations executed from version `from` to version `to` (async version).

        Both versions, `from` and `to`, must be identified. Identification can be done using a branch, tag or snapshot id.
        The styles used to identify the `from` and `to` versions can be different.

        The `from` version must be a member of the `ancestry` of `to`.

        Returns
        -------
        Diff
            The operations executed between the two versions
        """
        return await self._repository.diff_async(
            from_branch=from_branch,
            from_tag=from_tag,
            from_snapshot_id=from_snapshot_id,
            to_branch=to_branch,
            to_tag=to_tag,
            to_snapshot_id=to_snapshot_id,
        )

    def readonly_session(
        self,
        branch: str | None = None,
        *,
        tag: str | None = None,
        snapshot_id: str | None = None,
        as_of: datetime.datetime | None = None,
    ) -> Session:
        """
        Create a read-only session.

        This can be thought of as a read-only checkout of the repository at a given snapshot.
        When a branch or tag is provided, the session will be based on the tip of the branch or
        the snapshot ID of the tag.

        Parameters
        ----------
        branch : str, optional
            If provided, the branch to create the session on.
        tag : str, optional
            If provided, the tag to create the session on.
        snapshot_id : str, optional
            If provided, the snapshot ID to create the session on.
        as_of : datetime.datetime, optional
            When combined with the branch argument, it will open the session at the last
            snapshot that is at or before this datetime.

        Returns
        -------
        Session
            The read-only session, pointing to the specified snapshot, tag, or branch.

        Notes
        -----
        Only one of the arguments can be specified.
        """
        return Session(
            self._repository.readonly_session(
                branch=branch, tag=tag, snapshot_id=snapshot_id, as_of=as_of
            )
        )

    async def readonly_session_async(
        self,
        branch: str | None = None,
        *,
        tag: str | None = None,
        snapshot_id: str | None = None,
        as_of: datetime.datetime | None = None,
    ) -> Session:
        """
        Create a read-only session (async version).

        This can be thought of as a read-only checkout of the repository at a given snapshot.
        When a branch or tag is provided, the session will be based on the tip of the branch or
        the snapshot ID of the tag.

        Parameters
        ----------
        branch : str, optional
            If provided, the branch to create the session on.
        tag : str, optional
            If provided, the tag to create the session on.
        snapshot_id : str, optional
            If provided, the snapshot ID to create the session on.
        as_of : datetime.datetime, optional
            When combined with the branch argument, it will open the session at the last
            snapshot that is at or before this datetime.

        Returns
        -------
        Session
            The read-only session, pointing to the specified snapshot, tag, or branch.

        Notes
        -----
        Only one of the arguments can be specified.
        """
        return Session(
            await self._repository.readonly_session_async(
                branch=branch, tag=tag, snapshot_id=snapshot_id, as_of=as_of
            )
        )

    def writable_session(self, branch: str) -> Session:
        """
        Create a writable session on a branch.

        Like the read-only session, this can be thought of as a checkout of the repository at the
        tip of the branch. However, this session is writable and can be used to make changes to the
        repository. When ready, the changes can be committed to the branch, after which the session will
        become a read-only session on the new snapshot.

        Parameters
        ----------
        branch : str
            The branch to create the session on.

        Returns
        -------
        Session
            The writable session on the branch.
        """
        return Session(self._repository.writable_session(branch))
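
    # Minimal usage sketch (assumptions for illustration: the repo has a
    # "main" branch and zarr is installed alongside icechunk):
    #
    #     session = repo.writable_session("main")
    #     root = zarr.group(store=session.store)
    #     root.attrs["updated"] = True
    #     snapshot_id = session.commit("update root attributes")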

    async def writable_session_async(self, branch: str) -> Session:
        """
        Create a writable session on a branch (async version).

        Like the read-only session, this can be thought of as a checkout of the repository at the
        tip of the branch. However, this session is writable and can be used to make changes to the
        repository. When ready, the changes can be committed to the branch, after which the session will
        become a read-only session on the new snapshot.

        Parameters
        ----------
        branch : str
            The branch to create the session on.

        Returns
        -------
        Session
            The writable session on the branch.
        """
        return Session(await self._repository.writable_session_async(branch))

    def rearrange_session(self, branch: str) -> Session:
        """
        Create a session to move/rename nodes in the Zarr hierarchy.

        Like the read-only session, this can be thought of as a checkout of the repository at the
        tip of the branch. However, this session is writable and can be used to make changes to the
        repository. When ready, the changes can be committed to the branch, after which the session will
        become a read-only session on the new snapshot.

        This session only allows making changes through `Session.move`. If you want to modify data, and
        not only move nodes, use `Repository.writable_session` instead.

        Parameters
        ----------
        branch : str
            The branch to create the session on.

        Returns
        -------
        Session
            The writable session on the branch.
        """
        return Session(self._repository.rearrange_session(branch))

    async def rearrange_session_async(self, branch: str) -> Session:
        """
        Create a session to move/rename nodes in the Zarr hierarchy (async version).

        Like the read-only session, this can be thought of as a checkout of the repository at the
        tip of the branch. However, this session is writable and can be used to make changes to the
        repository. When ready, the changes can be committed to the branch, after which the session will
        become a read-only session on the new snapshot.

        This session only allows making changes through `Session.move`. If you want to modify data, and
        not only move nodes, use `Repository.writable_session` instead.

        Parameters
        ----------
        branch : str
            The branch to create the session on.

        Returns
        -------
        Session
            The writable session on the branch.
        """
        return Session(await self._repository.rearrange_session_async(branch))

    @contextmanager
    def transaction(
        self,
        branch: str,
        *,
        message: str,
        metadata: dict[str, Any] | None = None,
        rebase_with: ConflictSolver | None = None,
        rebase_tries: int = 1_000,
    ) -> Iterator[IcechunkStore]:
        """
        Create a transaction on a branch.

        This is a context manager that creates a writable session on the specified branch.
        When the context is exited, the session will be committed to the branch
        using the specified message.

        Parameters
        ----------
        branch : str
            The branch to create the transaction on.
        message : str
            The commit message to use when committing the session.
        metadata : dict[str, Any] | None, optional
            Additional metadata to store with the commit snapshot.
        rebase_with : ConflictSolver | None, optional
            If another session committed while the current session was writing, rebase using Session.rebase with this solver.
        rebase_tries : int, optional
            If another session committed while the current session was writing, retry Session.rebase up to this many times in a loop.

        Yields
        -------
        store : IcechunkStore
            A Zarr Store which can be used to interact with the data in the repository.
        """
        session = self.writable_session(branch)
        yield session.store
        session.commit(
            message=message,
            metadata=metadata,
            rebase_with=rebase_with,
            rebase_tries=rebase_tries,
        )
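
    # Minimal usage sketch (assuming the repo has a "main" branch and zarr is
    # available; the attribute name is illustrative):
    #
    #     with repo.transaction("main", message="set title") as store:
    #         zarr.group(store=store).attrs["title"] = "demo"
    #     # on exit, the session is committed to "main" with the given message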

    def expire_snapshots(
        self,
        older_than: datetime.datetime,
        *,
        delete_expired_branches: bool = False,
        delete_expired_tags: bool = False,
    ) -> set[str]:
        """Expire all snapshots older than a threshold.

        This processes snapshots found by navigating all references in
        the repo, tags first, branches later, both in lexicographical order.

        Returns the IDs of all snapshots considered expired and skipped
        from history. Notice that these snapshots are not necessarily
        available for garbage collection; they could still be pointed to
        by other refs.

        If `delete_expired_*` is set to True, branches or tags that, after the
        expiration process, point directly to expired snapshots will be
        deleted.

        Danger
        ------
        This is an administrative operation; it should be run
        carefully. The repository can still operate concurrently while
        `expire_snapshots` runs, but other readers can get inconsistent
        views of the repository history.

        Parameters
        ----------
        older_than: datetime.datetime
            Expire snapshots older than this time.
        delete_expired_branches: bool, optional
            Whether to delete any branches that now have only expired snapshots.
        delete_expired_tags: bool, optional
            Whether to delete any tags associated with expired snapshots

        Returns
        -------
        set of expired snapshot IDs
        """
        return self._repository.expire_snapshots(
            older_than,
            delete_expired_branches=delete_expired_branches,
            delete_expired_tags=delete_expired_tags,
        )

    async def expire_snapshots_async(
        self,
        older_than: datetime.datetime,
        *,
        delete_expired_branches: bool = False,
        delete_expired_tags: bool = False,
    ) -> set[str]:
        """Expire all snapshots older than a threshold (async version).

        This processes snapshots found by navigating all references in
        the repo, tags first, branches later, both in lexicographical order.

        Returns the IDs of all snapshots considered expired and skipped
        from history. Notice that these snapshots are not necessarily
        available for garbage collection; they could still be pointed to
        by other refs.

        If `delete_expired_*` is set to True, branches or tags that, after the
        expiration process, point directly to expired snapshots will be
        deleted.

        Danger
        ------
        This is an administrative operation; it should be run
        carefully. The repository can still operate concurrently while
        `expire_snapshots` runs, but other readers can get inconsistent
        views of the repository history.

        Parameters
        ----------
        older_than: datetime.datetime
            Expire snapshots older than this time.
        delete_expired_branches: bool, optional
            Whether to delete any branches that now have only expired snapshots.
        delete_expired_tags: bool, optional
            Whether to delete any tags associated with expired snapshots

        Returns
        -------
        set of expired snapshot IDs
        """
        return await self._repository.expire_snapshots_async(
            older_than,
            delete_expired_branches=delete_expired_branches,
            delete_expired_tags=delete_expired_tags,
        )

    def rewrite_manifests(
        self, message: str, *, branch: str, metadata: dict[str, Any] | None = None
    ) -> str:
        """
        Rewrite manifests for all arrays.

        This method will start a new writable session on the specified branch,
        rewrite manifests for all arrays, and then commit with the specified ``message``
        and ``metadata``.

        A JSON representation of the currently active splitting configuration will be
        stored in the commit's metadata under the key `"splitting_config"`.

        Parameters
        ----------
        message : str
            The message to write with the commit.
        branch: str
            The branch to commit to.
        metadata : dict[str, Any] | None, optional
            Additional metadata to store with the commit snapshot.

        Returns
        -------
        str
            The snapshot ID of the new commit.

        """
        return self._repository.rewrite_manifests(
            message, branch=branch, metadata=metadata
        )

    async def rewrite_manifests_async(
        self, message: str, *, branch: str, metadata: dict[str, Any] | None = None
    ) -> str:
        """
        Rewrite manifests for all arrays (async version).

        This method will start a new writable session on the specified branch,
        rewrite manifests for all arrays, and then commit with the specified ``message``
        and ``metadata``.

        A JSON representation of the currently active splitting configuration will be
        stored in the commit's metadata under the key `"splitting_config"`.

        Parameters
        ----------
        message : str
            The message to write with the commit.
        branch: str
            The branch to commit to.
        metadata : dict[str, Any] | None, optional
            Additional metadata to store with the commit snapshot.

        Returns
        -------
        str
            The snapshot ID of the new commit.

        """
        return await self._repository.rewrite_manifests_async(
            message, branch=branch, metadata=metadata
        )

    def garbage_collect(
        self,
        delete_object_older_than: datetime.datetime,
        *,
        dry_run: bool = False,
        max_snapshots_in_memory: int = 50,
        max_compressed_manifest_mem_bytes: int = 512 * 1024 * 1024,
        max_concurrent_manifest_fetches: int = 500,
    ) -> GCSummary:
        """Delete any objects no longer accessible from any branches or tags.

        Danger
        ------
        This is an administrative operation; it should be run
        carefully. The repository can still operate concurrently while
        `garbage_collect` runs, but other readers can get inconsistent
        views if they are trying to access the expired snapshots.

        Parameters
        ----------
        delete_object_older_than: datetime.datetime
            Delete objects older than this time.
        dry_run : bool
            Report results but don't delete any objects.
        max_snapshots_in_memory : int
            Don't prefetch more than this many Snapshots to memory.
        max_compressed_manifest_mem_bytes : int
            Don't use more than this memory to store compressed in-flight manifests.
        max_concurrent_manifest_fetches : int
            Don't run more than this many concurrent manifest fetches.

        Returns
        -------
        GCSummary
            Summary of objects deleted.
        """

        return self._repository.garbage_collect(
            delete_object_older_than,
            dry_run=dry_run,
            max_snapshots_in_memory=max_snapshots_in_memory,
            max_compressed_manifest_mem_bytes=max_compressed_manifest_mem_bytes,
            max_concurrent_manifest_fetches=max_concurrent_manifest_fetches,
        )

    async def garbage_collect_async(
        self,
        delete_object_older_than: datetime.datetime,
        *,
        dry_run: bool = False,
        max_snapshots_in_memory: int = 50,
        max_compressed_manifest_mem_bytes: int = 512 * 1024 * 1024,
        max_concurrent_manifest_fetches: int = 500,
    ) -> GCSummary:
        """Delete any objects no longer accessible from any branches or tags (async version).

        Danger
        ------
        This is an administrative operation; it should be run
        carefully. The repository can still operate concurrently while
        `garbage_collect` runs, but other readers can get inconsistent
        views if they are trying to access the expired snapshots.

        Parameters
        ----------
        delete_object_older_than: datetime.datetime
            Delete objects older than this time.
        dry_run : bool
            Report results but don't delete any objects.
        max_snapshots_in_memory : int
            Don't prefetch more than this many Snapshots to memory.
        max_compressed_manifest_mem_bytes : int
            Don't use more than this memory to store compressed in-flight manifests.
        max_concurrent_manifest_fetches : int
            Don't run more than this many concurrent manifest fetches.

        Returns
        -------
        GCSummary
            Summary of objects deleted.
        """

        return await self._repository.garbage_collect_async(
            delete_object_older_than,
            dry_run=dry_run,
            max_snapshots_in_memory=max_snapshots_in_memory,
            max_compressed_manifest_mem_bytes=max_compressed_manifest_mem_bytes,
            max_concurrent_manifest_fetches=max_concurrent_manifest_fetches,
        )

    def chunk_storage_stats(
        self,
        *,
        max_snapshots_in_memory: int = 50,
        max_compressed_manifest_mem_bytes: int = 512 * 1024 * 1024,
        max_concurrent_manifest_fetches: int = 500,
    ) -> ChunkStorageStats:
        """Calculate the total storage used for chunks, in bytes.

        It reports the storage needed to store all snapshots in the repository that
        are reachable from any branches or tags. Unreachable snapshots can be generated
        by using `reset_branch` or `expire_snapshots`. The chunks for these snapshots
        are not included in the result, and they should probably be deleted using
        `garbage_collect`.

        The result is a dataclass with attributes for storage consumed by different
        types of chunks (e.g. `native_bytes`, `virtual_bytes`, `total_bytes`).

        Parameters
        ----------
        max_snapshots_in_memory: int
            Don't prefetch more than this many Snapshots to memory.
        max_compressed_manifest_mem_bytes : int
            Don't use more than this memory to store compressed in-flight manifests.
        max_concurrent_manifest_fetches : int
            Don't run more than this many concurrent manifest fetches.
        """
        return self._repository.chunk_storage_stats(
            max_snapshots_in_memory=max_snapshots_in_memory,
            max_compressed_manifest_mem_bytes=max_compressed_manifest_mem_bytes,
            max_concurrent_manifest_fetches=max_concurrent_manifest_fetches,
        )

    async def chunk_storage_stats_async(
        self,
        *,
        max_snapshots_in_memory: int = 50,
        max_compressed_manifest_mem_bytes: int = 512 * 1024 * 1024,
        max_concurrent_manifest_fetches: int = 500,
    ) -> ChunkStorageStats:
        """Calculate the total storage used for chunks, in bytes (async version).

        It reports the storage needed to store all snapshots in the repository that
        are reachable from any branches or tags. Unreachable snapshots can be generated
        by using `reset_branch` or `expire_snapshots`. The chunks for these snapshots
        are not included in the result, and they should probably be deleted using
        `garbage_collect`.

        The result is a dataclass with attributes for storage consumed by different
        types of chunks (e.g. `native_bytes`, `virtual_bytes`, `total_bytes`).

        Parameters
        ----------
        max_snapshots_in_memory: int
            Don't prefetch more than this many Snapshots to memory.
        max_compressed_manifest_mem_bytes : int
            Don't use more than this memory to store compressed in-flight manifests.
        max_concurrent_manifest_fetches : int
            Don't run more than this many concurrent manifest fetches.
        """
        return await self._repository.chunk_storage_stats_async(
            max_snapshots_in_memory=max_snapshots_in_memory,
            max_compressed_manifest_mem_bytes=max_compressed_manifest_mem_bytes,
            max_concurrent_manifest_fetches=max_concurrent_manifest_fetches,
        )

    def total_chunks_storage(
        self,
        *,
        max_snapshots_in_memory: int = 50,
        max_compressed_manifest_mem_bytes: int = 512 * 1024 * 1024,
        max_concurrent_manifest_fetches: int = 500,
    ) -> int:
        """Calculate the total storage used for chunks, in bytes.

        It reports the storage needed to store all snapshots in the repository that
        are reachable from any branches or tags. Unreachable snapshots can be generated
        by using `reset_branch` or `expire_snapshots`. The chunks for these snapshots
        are not included in the result, and they should probably be deleted using
        `garbage_collect`.

        The result includes only native chunks; virtual and inline chunks are not counted.

        Parameters
        ----------
        max_snapshots_in_memory: int
            Don't prefetch more than this many Snapshots to memory.
        max_compressed_manifest_mem_bytes : int
            Don't use more than this memory to store compressed in-flight manifests.
        max_concurrent_manifest_fetches : int
            Don't run more than this many concurrent manifest fetches.
        """

        warnings.warn(
            "The ``total_chunks_storage`` method has been deprecated in favour of the ``chunk_storage_stats`` method. "
            "The new method is superior, as it actually calculates storage size occupied by inlined and virtual chunks in addition to native chunks. "
            "You can still access just the total native bytes: to keep your existing behaviour using API that will not be removed in a future version, "
            "please replace your existing ``.total_chunks_storage(**kwargs)`` method call with ``.chunk_storage_stats(**same_kwargs).native_bytes``.",
            DeprecationWarning,
            stacklevel=2,
        )

        stats = self._repository.chunk_storage_stats(
            max_snapshots_in_memory=max_snapshots_in_memory,
            max_compressed_manifest_mem_bytes=max_compressed_manifest_mem_bytes,
            max_concurrent_manifest_fetches=max_concurrent_manifest_fetches,
        )
        return stats.native_bytes

    async def total_chunks_storage_async(
        self,
        *,
        max_snapshots_in_memory: int = 50,
        max_compressed_manifest_mem_bytes: int = 512 * 1024 * 1024,
        max_concurrent_manifest_fetches: int = 500,
    ) -> int:
        """Calculate the total storage used for chunks, in bytes (async version).

        It reports the storage needed to store all snapshots in the repository that
        are reachable from any branches or tags. Unreachable snapshots can be generated
        by using `reset_branch` or `expire_snapshots`. The chunks for these snapshots
        are not included in the result, and they should probably be deleted using
        `garbage_collect`.

        The result includes only native chunks; virtual and inline chunks are not counted.

        Parameters
        ----------
        max_snapshots_in_memory: int
            Don't prefetch more than this many Snapshots to memory.
        max_compressed_manifest_mem_bytes : int
            Don't use more than this memory to store compressed in-flight manifests.
        max_concurrent_manifest_fetches : int
            Don't run more than this many concurrent manifest fetches.
        """

        warnings.warn(
            "The ``total_chunks_storage_async`` method has been deprecated in favour of the ``chunk_storage_stats_async`` method. "
            "The new method is superior, as it actually calculates storage size occupied by inlined and virtual chunks in addition to native chunks. "
            "You can still access just the total native bytes: to keep your existing behaviour using API that will not be removed in a future version, "
            "please replace your existing ``.total_chunks_storage_async(**kwargs)`` method call with ``.chunk_storage_stats_async(**same_kwargs).native_bytes``.",
            DeprecationWarning,
            stacklevel=2,
        )

        stats = await self._repository.chunk_storage_stats_async(
            max_snapshots_in_memory=max_snapshots_in_memory,
            max_compressed_manifest_mem_bytes=max_compressed_manifest_mem_bytes,
            max_concurrent_manifest_fetches=max_concurrent_manifest_fetches,
        )
        return stats.native_bytes

    def inspect_snapshot(self, snapshot_id: str, *, pretty: bool = True) -> str:
        # Return a string description of the given snapshot's contents,
        # pretty-printed by default.
        return self._repository.inspect_snapshot(snapshot_id, pretty=pretty)

    async def inspect_snapshot_async(
        self, snapshot_id: str, *, pretty: bool = True
    ) -> str:
        # Async version of inspect_snapshot.
        return await self._repository.inspect_snapshot_async(snapshot_id, pretty=pretty)

    @property
    def spec_version(self) -> int:
        # The version of the Icechunk spec in use by this repository.
        return self._repository.spec_version

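Taken together, a typical read-modify-commit cycle looks like the sketch below. The local-filesystem storage helper, the path, and the zarr calls are illustrative assumptions; any Storage configuration and Zarr operations follow the same pattern.

import icechunk
import zarr

# create the repository on first use, open it afterwards
storage = icechunk.local_filesystem_storage("/tmp/my-repo")  # assumed path
repo = icechunk.Repository.open_or_create(storage)

# write on a branch and commit a new snapshot
session = repo.writable_session("main")
zarr.group(store=session.store).attrs["n_updates"] = 1
snapshot_id = session.commit("first commit")

# read the committed snapshot back through a read-only session
ro = repo.readonly_session(snapshot_id=snapshot_id)
print(zarr.open_group(ro.store, mode="r").attrs["n_updates"])
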
authorized_virtual_container_prefixes property #

authorized_virtual_container_prefixes

Get all authorized virtual chunk container prefixes.

Returns:

Name Type Description
url_prefixes set[str]

The set of authorized url prefixes for each virtual chunk container

config property #

config

Get a copy of this repository's config.

Returns:

Type Description
RepositoryConfig

The repository configuration.

metadata property #

metadata

Get the current configured repository metadata.

Returns:

Type Description
dict[str, Any]

The repository level metadata.

storage property #

storage

Get a copy of this repository's Storage instance.

Returns:

Type Description
Storage

The repository storage instance.

ancestry #

ancestry(*, branch=None, tag=None, snapshot_id=None)

Get the ancestry of a snapshot.

Parameters:

Name Type Description Default
branch str

The branch to get the ancestry of.

None
tag str

The tag to get the ancestry of.

None
snapshot_id str

The snapshot ID to get the ancestry of.

None

Returns:

Type Description
Iterator[SnapshotInfo]

The ancestry of the snapshot, listing out the snapshots and their metadata.

Notes

Only one of the arguments can be specified.

Source code in icechunk-python/python/icechunk/repository.py
def ancestry(
    self,
    *,
    branch: str | None = None,
    tag: str | None = None,
    snapshot_id: str | None = None,
) -> Iterator[SnapshotInfo]:
    """
    Get the ancestry of a snapshot.

    Parameters
    ----------
    branch : str, optional
        The branch to get the ancestry of.
    tag : str, optional
        The tag to get the ancestry of.
    snapshot_id : str, optional
        The snapshot ID to get the ancestry of.

    Returns
    -------
    Iterator[SnapshotInfo]
        The ancestry of the snapshot, listing out the snapshots and their metadata.

    Notes
    -----
    Only one of the arguments can be specified.
    """

    # the returned object is both an Async and Sync iterator
    res = cast(
        Iterator[SnapshotInfo],
        self._repository.async_ancestry(
            branch=branch, tag=tag, snapshot_id=snapshot_id
        ),
    )
    return res
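
For example, a sketch of walking the history of the main branch; the SnapshotInfo attribute names used here are assumptions based on the SnapshotInfo docs:

for snapshot in repo.ancestry(branch="main"):
    print(snapshot.id, snapshot.written_at, snapshot.message)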

async_ancestry #

async_ancestry(*, branch=None, tag=None, snapshot_id=None)

Get the ancestry of a snapshot.

Parameters:

Name Type Description Default
branch str

The branch to get the ancestry of.

None
tag str

The tag to get the ancestry of.

None
snapshot_id str

The snapshot ID to get the ancestry of.

None

Returns:

Type Description
AsyncIterator[SnapshotInfo]

The ancestry of the snapshot, listing out the snapshots and their metadata.

Notes

Only one of the arguments can be specified.

Source code in icechunk-python/python/icechunk/repository.py
def async_ancestry(
    self,
    *,
    branch: str | None = None,
    tag: str | None = None,
    snapshot_id: str | None = None,
) -> AsyncIterator[SnapshotInfo]:
    """
    Get the ancestry of a snapshot.

    Parameters
    ----------
    branch : str, optional
        The branch to get the ancestry of.
    tag : str, optional
        The tag to get the ancestry of.
    snapshot_id : str, optional
        The snapshot ID to get the ancestry of.

    Returns
    -------
    AsyncIterator[SnapshotInfo]
        The ancestry of the snapshot, listing out the snapshots and their metadata.

    Notes
    -----
    Only one of the arguments can be specified.
    """
    return self._repository.async_ancestry(
        branch=branch, tag=tag, snapshot_id=snapshot_id
    )

chunk_storage_stats #

chunk_storage_stats(*, max_snapshots_in_memory=50, max_compressed_manifest_mem_bytes=512 * 1024 * 1024, max_concurrent_manifest_fetches=500)

Calculate the total storage used for chunks, in bytes.

It reports the storage needed to store all snapshots in the repository that are reachable from any branches or tags. Unreachable snapshots can be generated by using reset_branch or expire_snapshots. The chunks for these snapshots are not included in the result, and they should probably be deleted using garbage_collect.

The result is a dataclass with attributes for storage consumed by different types of chunks (e.g. native_bytes, virtual_bytes, total_bytes).

Parameters:

Name Type Description Default
max_snapshots_in_memory int

Don't prefetch more than this many Snapshots to memory.

50
max_compressed_manifest_mem_bytes int

Don't use more than this memory to store compressed in-flight manifests.

512 * 1024 * 1024
max_concurrent_manifest_fetches int

Don't run more than this many concurrent manifest fetches.

500
Source code in icechunk-python/python/icechunk/repository.py
def chunk_storage_stats(
    self,
    *,
    max_snapshots_in_memory: int = 50,
    max_compressed_manifest_mem_bytes: int = 512 * 1024 * 1024,
    max_concurrent_manifest_fetches: int = 500,
) -> ChunkStorageStats:
    """Calculate the total storage used for chunks, in bytes.

    It reports the storage needed to store all snapshots in the repository that
    are reachable from any branches or tags. Unreachable snapshots can be generated
    by using `reset_branch` or `expire_snapshots`. The chunks for these snapshots
    are not included in the result, and they should probably be deleted using
    `garbage_collect`.

    The result is a dataclass with attributes for storage consumed by different
    types of chunks (e.g. `native_bytes`, `virtual_bytes`, `total_bytes`).

    Parameters
    ----------
    max_snapshots_in_memory: int
        Don't prefetch more than this many Snapshots to memory.
    max_compressed_manifest_mem_bytes : int
        Don't use more than this memory to store compressed in-flight manifests.
    max_concurrent_manifest_fetches : int
        Don't run more than this many concurrent manifest fetches.
    """
    return self._repository.chunk_storage_stats(
        max_snapshots_in_memory=max_snapshots_in_memory,
        max_compressed_manifest_mem_bytes=max_compressed_manifest_mem_bytes,
        max_concurrent_manifest_fetches=max_concurrent_manifest_fetches,
    )
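
A sketch of reporting the breakdown, using the attribute names listed above:

stats = repo.chunk_storage_stats()
print("native: ", stats.native_bytes, "bytes")
print("virtual:", stats.virtual_bytes, "bytes")
print("total:  ", stats.total_bytes, "bytes")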

chunk_storage_stats_async async #

chunk_storage_stats_async(*, max_snapshots_in_memory=50, max_compressed_manifest_mem_bytes=512 * 1024 * 1024, max_concurrent_manifest_fetches=500)

Calculate the total storage used for chunks, in bytes (async version).

It reports the storage needed to store all snapshots in the repository that are reachable from any branches or tags. Unreachable snapshots can be generated by using reset_branch or expire_snapshots. The chunks for these snapshots are not included in the result, and they should probably be deleted using garbage_collect.

The result is a dataclass with attributes for storage consumed by different types of chunks (e.g. native_bytes, virtual_bytes, total_bytes).

Parameters:

Name Type Description Default
max_snapshots_in_memory int

Don't prefetch more than this many Snapshots to memory.

50
max_compressed_manifest_mem_bytes int

Don't use more than this memory to store compressed in-flight manifests.

512 * 1024 * 1024
max_concurrent_manifest_fetches int

Don't run more than this many concurrent manifest fetches.

500
Source code in icechunk-python/python/icechunk/repository.py
async def chunk_storage_stats_async(
    self,
    *,
    max_snapshots_in_memory: int = 50,
    max_compressed_manifest_mem_bytes: int = 512 * 1024 * 1024,
    max_concurrent_manifest_fetches: int = 500,
) -> ChunkStorageStats:
    """Calculate the total storage used for chunks, in bytes (async version).

    It reports the storage needed to store all snapshots in the repository that
    are reachable from any branches or tags. Unreachable snapshots can be generated
    by using `reset_branch` or `expire_snapshots`. The chunks for these snapshots
    are not included in the result, and they should probably be deleted using
    `garbage_collect`.

    The result is a dataclass with attributes for storage consumed by different
    types of chunks (e.g. `native_bytes`, `virtual_bytes`, `total_bytes`).

    Parameters
    ----------
    max_snapshots_in_memory: int
        Don't prefetch more than this many Snapshots to memory.
    max_compressed_manifest_mem_bytes : int
        Don't use more than this memory to store compressed in-flight manifests.
    max_concurrent_manifest_fetches : int
        Don't run more than this many concurrent manifest fetches.
    """
    return await self._repository.chunk_storage_stats_async(
        max_snapshots_in_memory=max_snapshots_in_memory,
        max_compressed_manifest_mem_bytes=max_compressed_manifest_mem_bytes,
        max_concurrent_manifest_fetches=max_concurrent_manifest_fetches,
    )

create classmethod #

create(storage, config=None, authorize_virtual_chunk_access=None, spec_version=None)

Create a new Icechunk repository. If one already exists at the given store location, an error will be raised.

Warning

Attempting to create a Repo concurrently in the same location from multiple processes is not safe. Instead, create a Repo once and then open it concurrently.

Parameters:

Name Type Description Default
storage Storage

The storage configuration for the repository.

required
config RepositoryConfig

The repository configuration. If not provided, a default configuration will be used.

None
authorize_virtual_chunk_access dict[str, AnyCredential | None]

Authorize Icechunk to access virtual chunks in these containers. A mapping from container url_prefix to the credentials to use to access chunks in that container. If a credential is None, it will be fetched from the environment, or anonymous credentials will be used if the container allows it. As a security measure, Icechunk will block access to virtual chunks if the container is not authorized using this argument.

None
spec_version int

Use this version of the spec for the new repository. If not passed, the latest version of the spec that was available before the library version release will be used.

None

Returns:

Type Description
Self

An instance of the Repository class.

Source code in icechunk-python/python/icechunk/repository.py
@classmethod
def create(
    cls,
    storage: Storage,
    config: RepositoryConfig | None = None,
    authorize_virtual_chunk_access: dict[str, AnyCredential | None] | None = None,
    spec_version: int | None = None,
) -> Self:
    """
    Create a new Icechunk repository.
    If one already exists at the given store location, an error will be raised.

    !!! warning
        Attempting to create a Repo concurrently in the same location from multiple processes is not safe.
        Instead, create a Repo once and then open it concurrently.

    Parameters
    ----------
    storage : Storage
        The storage configuration for the repository.
    config : RepositoryConfig, optional
        The repository configuration. If not provided, a default configuration will be used.
    authorize_virtual_chunk_access : dict[str, AnyCredential | None], optional
        Authorize Icechunk to access virtual chunks in these containers. A mapping
        from container url_prefix to the credentials to use to access chunks in
    that container. If a credential is `None`, it will be fetched from the
        environment, or anonymous credentials will be used if the container allows it.
        As a security measure, Icechunk will block access to virtual chunks if the
        container is not authorized using this argument.
    spec_version : int, optional
        Use this version of the spec for the new repository. If not passed, the latest version
        of the spec that was available before the library version release will be used.

    Returns
    -------
    Self
        An instance of the Repository class.
    """
    return cls(
        PyRepository.create(
            storage,
            config=config,
            authorize_virtual_chunk_access=authorize_virtual_chunk_access,
            spec_version=spec_version,
        )
    )
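
A minimal sketch of creating a fresh repository; the local-filesystem storage helper and path are illustrative, and any Storage works the same way:

import icechunk

storage = icechunk.local_filesystem_storage("/tmp/new-repo")  # assumed path
repo = icechunk.Repository.create(storage)  # raises if a repo already exists there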

create_async async classmethod #

create_async(storage, config=None, authorize_virtual_chunk_access=None, spec_version=None)

Create a new Icechunk repository asynchronously. If one already exists at the given store location, an error will be raised.

Warning

Attempting to create a Repo concurrently in the same location from multiple processes is not safe. Instead, create a Repo once and then open it concurrently.

Parameters:

Name Type Description Default
storage Storage

The storage configuration for the repository.

required
config RepositoryConfig

The repository configuration. If not provided, a default configuration will be used.

None
authorize_virtual_chunk_access dict[str, AnyCredential | None]

Authorize Icechunk to access virtual chunks in these containers. A mapping from container url_prefix to the credentials to use to access chunks in that container. If a credential is None, it will be fetched from the environment, or anonymous credentials will be used if the container allows it. As a security measure, Icechunk will block access to virtual chunks if the container is not authorized using this argument.

None
spec_version int

Use this version of the spec for the new repository. If not passed, the latest version of the spec that was available before the library version release will be used.

None

Returns:

Type Description
Self

An instance of the Repository class.

Source code in icechunk-python/python/icechunk/repository.py
@classmethod
async def create_async(
    cls,
    storage: Storage,
    config: RepositoryConfig | None = None,
    authorize_virtual_chunk_access: dict[str, AnyCredential | None] | None = None,
    spec_version: int | None = None,
) -> Self:
    """
    Create a new Icechunk repository asynchronously.
    If one already exists at the given store location, an error will be raised.

    !!! warning
        Attempting to create a Repo concurrently in the same location from multiple processes is not safe.
        Instead, create a Repo once and then open it concurrently.

    Parameters
    ----------
    storage : Storage
        The storage configuration for the repository.
    config : RepositoryConfig, optional
        The repository configuration. If not provided, a default configuration will be used.
    authorize_virtual_chunk_access : dict[str, AnyCredential | None], optional
        Authorize Icechunk to access virtual chunks in these containers. A mapping
        from container url_prefix to the credentials to use to access chunks in
    that container. If a credential is `None`, it will be fetched from the
        environment, or anonymous credentials will be used if the container allows it.
        As a security measure, Icechunk will block access to virtual chunks if the
        container is not authorized using this argument.
    spec_version : int, optional
        Use this version of the spec for the new repository. If not passed, the latest version
        of the spec that was available before the library version release will be used.

    Returns
    -------
    Self
        An instance of the Repository class.
    """
    return cls(
        await PyRepository.create_async(
            storage,
            config=config,
            authorize_virtual_chunk_access=authorize_virtual_chunk_access,
            spec_version=spec_version,
        )
    )

create_branch #

create_branch(branch, snapshot_id)

Create a new branch at the given snapshot.

Parameters:

Name Type Description Default
branch str

The name of the branch to create.

required
snapshot_id str

The snapshot ID to create the branch at.

required

Returns:

Type Description
None
Source code in icechunk-python/python/icechunk/repository.py
def create_branch(self, branch: str, snapshot_id: str) -> None:
    """
    Create a new branch at the given snapshot.

    Parameters
    ----------
    branch : str
        The name of the branch to create.
    snapshot_id : str
        The snapshot ID to create the branch at.

    Returns
    -------
    None
    """
    self._repository.create_branch(branch, snapshot_id)
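
For instance, a sketch of branching off the current tip of main (branch names are illustrative):

tip = repo.lookup_branch("main")
repo.create_branch("experiment", tip)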

create_branch_async async #

create_branch_async(branch, snapshot_id)

Create a new branch at the given snapshot (async version).

Parameters:

Name Type Description Default
branch str

The name of the branch to create.

required
snapshot_id str

The snapshot ID to create the branch at.

required

Returns:

Type Description
None
Source code in icechunk-python/python/icechunk/repository.py
async def create_branch_async(self, branch: str, snapshot_id: str) -> None:
    """
    Create a new branch at the given snapshot (async version).

    Parameters
    ----------
    branch : str
        The name of the branch to create.
    snapshot_id : str
        The snapshot ID to create the branch at.

    Returns
    -------
    None
    """
    await self._repository.create_branch_async(branch, snapshot_id)

create_tag #

create_tag(tag, snapshot_id)

Create a new tag at the given snapshot.

Parameters:

Name Type Description Default
tag str

The name of the tag to create.

required
snapshot_id str

The snapshot ID to create the tag at.

required

Returns:

Type Description
None
Source code in icechunk-python/python/icechunk/repository.py
def create_tag(self, tag: str, snapshot_id: str) -> None:
    """
    Create a new tag at the given snapshot.

    Parameters
    ----------
    tag : str
        The name of the tag to create.
    snapshot_id : str
        The snapshot ID to create the tag at.

    Returns
    -------
    None
    """
    self._repository.create_tag(tag, snapshot_id)
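
For instance, a sketch of tagging the current tip of main as a release (the tag name is illustrative):

repo.create_tag("v1.0.0", repo.lookup_branch("main"))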

create_tag_async async #

create_tag_async(tag, snapshot_id)

Create a new tag at the given snapshot (async version).

Parameters:

Name Type Description Default
tag str

The name of the tag to create.

required
snapshot_id str

The snapshot ID to create the tag at.

required

Returns:

Type Description
None
Source code in icechunk-python/python/icechunk/repository.py
async def create_tag_async(self, tag: str, snapshot_id: str) -> None:
    """
    Create a new tag at the given snapshot (async version).

    Parameters
    ----------
    tag : str
        The name of the tag to create.
    snapshot_id : str
        The snapshot ID to create the tag at.

    Returns
    -------
    None
    """
    await self._repository.create_tag_async(tag, snapshot_id)

default_commit_metadata #

default_commit_metadata()

Get the current configured default commit metadata for the repository.

Returns:

Type Description
dict[str, Any]

The default commit metadata.

Source code in icechunk-python/python/icechunk/repository.py
def default_commit_metadata(self) -> dict[str, Any]:
    """
    Get the current configured default commit metadata for the repository.

    Returns
    -------
    dict[str, Any]
        The default commit metadata.
    """
    return self._repository.default_commit_metadata()

delete_branch #

delete_branch(branch)

Delete a branch.

Parameters:

Name Type Description Default
branch str

The branch to delete.

required

Returns:

Type Description
None
Source code in icechunk-python/python/icechunk/repository.py
def delete_branch(self, branch: str) -> None:
    """
    Delete a branch.

    Parameters
    ----------
    branch : str
        The branch to delete.

    Returns
    -------
    None
    """
    self._repository.delete_branch(branch)

delete_branch_async async #

delete_branch_async(branch)

Delete a branch (async version).

Parameters:

Name Type Description Default
branch str

The branch to delete.

required

Returns:

Type Description
None
Source code in icechunk-python/python/icechunk/repository.py
async def delete_branch_async(self, branch: str) -> None:
    """
    Delete a branch (async version).

    Parameters
    ----------
    branch : str
        The branch to delete.

    Returns
    -------
    None
    """
    await self._repository.delete_branch_async(branch)

delete_tag #

delete_tag(tag)

Delete a tag.

Parameters:

Name Type Description Default
tag str

The tag to delete.

required

Returns:

Type Description
None
Source code in icechunk-python/python/icechunk/repository.py
def delete_tag(self, tag: str) -> None:
    """
    Delete a tag.

    Parameters
    ----------
    tag : str
        The tag to delete.

    Returns
    -------
    None
    """
    self._repository.delete_tag(tag)

delete_tag_async async #

delete_tag_async(tag)

Delete a tag (async version).

Parameters:

Name Type Description Default
tag str

The tag to delete.

required

Returns:

Type Description
None
Source code in icechunk-python/python/icechunk/repository.py
async def delete_tag_async(self, tag: str) -> None:
    """
    Delete a tag (async version).

    Parameters
    ----------
    tag : str
        The tag to delete.

    Returns
    -------
    None
    """
    await self._repository.delete_tag_async(tag)

diff #

diff(*, from_branch=None, from_tag=None, from_snapshot_id=None, to_branch=None, to_tag=None, to_snapshot_id=None)

Compute an overview of the operations executed from version from to version to.

Both versions, from and to, must be identified. Identification can be done using a branch, tag or snapshot id. The styles used to identify the from and to versions can be different.

The from version must be a member of the ancestry of to.

Returns:

Type Description
Diff

The operations executed between the two versions

Source code in icechunk-python/python/icechunk/repository.py
def diff(
    self,
    *,
    from_branch: str | None = None,
    from_tag: str | None = None,
    from_snapshot_id: str | None = None,
    to_branch: str | None = None,
    to_tag: str | None = None,
    to_snapshot_id: str | None = None,
) -> Diff:
    """
    Compute an overview of the operations executed from version `from` to version `to`.

    Both versions, `from` and `to`, must be identified. Identification can be done using a branch, tag or snapshot id.
    The styles used to identify the `from` and `to` versions can be different.

    The `from` version must be a member of the `ancestry` of `to`.

    Returns
    -------
    Diff
        The operations executed between the two versions
    """
    return self._repository.diff(
        from_branch=from_branch,
        from_tag=from_tag,
        from_snapshot_id=from_snapshot_id,
        to_branch=to_branch,
        to_tag=to_tag,
        to_snapshot_id=to_snapshot_id,
    )
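
A sketch of comparing a tagged release against the current tip of main (the tag and branch names are assumptions):

diff = repo.diff(from_tag="v1.0.0", to_branch="main")
print(diff)  # the operations executed between the two versions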

diff_async async #

diff_async(*, from_branch=None, from_tag=None, from_snapshot_id=None, to_branch=None, to_tag=None, to_snapshot_id=None)

Compute an overview of the operations executed from version from to version to (async version).

Both versions, from and to, must be identified. Identification can be done using a branch, tag or snapshot id. The styles used to identify the from and to versions can be different.

The from version must be a member of the ancestry of to.

Returns:

Type Description
Diff

The operations executed between the two versions

Source code in icechunk-python/python/icechunk/repository.py
async def diff_async(
    self,
    *,
    from_branch: str | None = None,
    from_tag: str | None = None,
    from_snapshot_id: str | None = None,
    to_branch: str | None = None,
    to_tag: str | None = None,
    to_snapshot_id: str | None = None,
) -> Diff:
    """
    Compute an overview of the operations executed from version `from` to version `to` (async version).

    Both versions, `from` and `to`, must be identified. Identification can be done using a branch, tag or snapshot id.
    The styles used to identify the `from` and `to` versions can be different.

    The `from` version must be a member of the `ancestry` of `to`.

    Returns
    -------
    Diff
        The operations executed between the two versions
    """
    return await self._repository.diff_async(
        from_branch=from_branch,
        from_tag=from_tag,
        from_snapshot_id=from_snapshot_id,
        to_branch=to_branch,
        to_tag=to_tag,
        to_snapshot_id=to_snapshot_id,
    )

exists staticmethod #

exists(storage)

Check if a repository exists at the given storage location.

Parameters:

Name Type Description Default
storage Storage

The storage configuration for the repository.

required

Returns:

Type Description
bool

True if the repository exists, False otherwise.

Source code in icechunk-python/python/icechunk/repository.py
@staticmethod
def exists(storage: Storage) -> bool:
    """
    Check if a repository exists at the given storage location.

    Parameters
    ----------
    storage : Storage
        The storage configuration for the repository.

    Returns
    -------
    bool
        True if the repository exists, False otherwise.
    """
    return PyRepository.exists(storage)
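
A sketch of an open-or-create check, assuming storage is a configured Storage instance:

if icechunk.Repository.exists(storage):
    repo = icechunk.Repository.open(storage)
else:
    repo = icechunk.Repository.create(storage)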

exists_async async staticmethod #

exists_async(storage)

Check if a repository exists at the given storage location (async version).

Parameters:

Name Type Description Default
storage Storage

The storage configuration for the repository.

required

Returns:

Type Description
bool

True if the repository exists, False otherwise.

Source code in icechunk-python/python/icechunk/repository.py
@staticmethod
async def exists_async(storage: Storage) -> bool:
    """
    Check if a repository exists at the given storage location (async version).

    Parameters
    ----------
    storage : Storage
        The storage configuration for the repository.

    Returns
    -------
    bool
        True if the repository exists, False otherwise.
    """
    return await PyRepository.exists_async(storage)

expire_snapshots #

expire_snapshots(older_than, *, delete_expired_branches=False, delete_expired_tags=False)

Expire all snapshots older than a threshold.

This processes snapshots found by navigating all references in the repo, tags first, branches later, both in lexicographical order.

Returns the ids of all snapshots considered expired and skipped from history. Notice that these snapshots are not necessarily available for garbage collection; they could still be pointed to by other refs.

If delete_expired_* is set to True, branches or tags that, after the expiration process, point to expired snapshots directly, will be deleted.

Danger

This is an administrative operation; it should be run carefully. The repository can still operate concurrently while expire_snapshots runs, but other readers can get inconsistent views of the repository history.

Parameters:

Name Type Description Default
older_than datetime

Expire snapshots older than this time.

required
delete_expired_branches bool

Whether to delete any branches that now have only expired snapshots.

False
delete_expired_tags bool

Whether to delete any tags associated with expired snapshots

False

Returns:

Type Description
set of expired snapshot IDs
Source code in icechunk-python/python/icechunk/repository.py
def expire_snapshots(
    self,
    older_than: datetime.datetime,
    *,
    delete_expired_branches: bool = False,
    delete_expired_tags: bool = False,
) -> set[str]:
    """Expire all snapshots older than a threshold.

    This processes snapshots found by navigating all references in
    the repo, tags first, branches later, both in lexicographical order.

    Returns the ids of all snapshots considered expired and skipped
    from history. Notice that these snapshots are not necessarily
    available for garbage collection; they could still be pointed to
    by other refs.

    If `delete_expired_*` is set to True, branches or tags that, after the
    expiration process, point to expired snapshots directly, will be
    deleted.

    Danger
    ------
    This is an administrative operation; it should be run
    carefully. The repository can still operate concurrently while
    `expire_snapshots` runs, but other readers can get inconsistent
    views of the repository history.

    Parameters
    ----------
    older_than: datetime.datetime
        Expire snapshots older than this time.
    delete_expired_branches: bool, optional
        Whether to delete any branches that now have only expired snapshots.
    delete_expired_tags: bool, optional
        Whether to delete any tags associated with expired snapshots

    Returns
    -------
    set of expired snapshot IDs
    """
    return self._repository.expire_snapshots(
        older_than,
        delete_expired_branches=delete_expired_branches,
        delete_expired_tags=delete_expired_tags,
    )
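
For example, a sketch that expires everything older than 30 days; it assumes `repo` is an open Repository and that you accept rewriting the visible history.

import datetime

cutoff = datetime.datetime.now(datetime.timezone.utc) - datetime.timedelta(days=30)
expired = repo.expire_snapshots(cutoff)
print(f"{len(expired)} snapshots expired")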

expire_snapshots_async async #

expire_snapshots_async(older_than, *, delete_expired_branches=False, delete_expired_tags=False)

Expire all snapshots older than a threshold (async version).

This processes snapshots found by navigating all references in the repo, tags first, branches later, both in lexicographical order.

Returns the ids of all snapshots considered expired and skipped from history. Notice that these snapshots are not necessarily available for garbage collection; they could still be pointed to by other refs.

If delete_expired_* is set to True, branches or tags that, after the expiration process, point to expired snapshots directly, will be deleted.

Danger

This is an administrative operation; it should be run carefully. The repository can still operate concurrently while expire_snapshots runs, but other readers can get inconsistent views of the repository history.

Parameters:

Name Type Description Default
older_than datetime

Expire snapshots older than this time.

required
delete_expired_branches bool

Whether to delete any branches that now have only expired snapshots.

False
delete_expired_tags bool

Whether to delete any tags associated with expired snapshots

False

Returns:

Type Description
set of expired snapshot IDs
Source code in icechunk-python/python/icechunk/repository.py
async def expire_snapshots_async(
    self,
    older_than: datetime.datetime,
    *,
    delete_expired_branches: bool = False,
    delete_expired_tags: bool = False,
) -> set[str]:
    """Expire all snapshots older than a threshold (async version).

    This processes snapshots found by navigating all references in
    the repo, tags first, branches later, both in lexicographical order.

    Returns the ids of all snapshots considered expired and skipped
    from history. Notice that these snapshots are not necessarily
    available for garbage collection; they could still be pointed to
    by other refs.

    If `delete_expired_*` is set to True, branches or tags that, after the
    expiration process, point to expired snapshots directly, will be
    deleted.

    Danger
    ------
    This is an administrative operation; it should be run
    carefully. The repository can still operate concurrently while
    `expire_snapshots` runs, but other readers can get inconsistent
    views of the repository history.

    Parameters
    ----------
    older_than: datetime.datetime
        Expire snapshots older than this time.
    delete_expired_branches: bool, optional
        Whether to delete any branches that now have only expired snapshots.
    delete_expired_tags: bool, optional
        Whether to delete any tags associated with expired snapshots

    Returns
    -------
    set of expired snapshot IDs
    """
    return await self._repository.expire_snapshots_async(
        older_than,
        delete_expired_branches=delete_expired_branches,
        delete_expired_tags=delete_expired_tags,
    )

fetch_config staticmethod #

fetch_config(storage)

Fetch the configuration for the repository saved in storage.

Parameters:

Name Type Description Default
storage Storage

The storage configuration for the repository.

required

Returns:

Type Description
RepositoryConfig | None

The repository configuration if it exists, None otherwise.

Source code in icechunk-python/python/icechunk/repository.py
@staticmethod
def fetch_config(storage: Storage) -> RepositoryConfig | None:
    """
    Fetch the configuration for the repository saved in storage.

    Parameters
    ----------
    storage : Storage
        The storage configuration for the repository.

    Returns
    -------
    RepositoryConfig | None
        The repository configuration if it exists, None otherwise.
    """
    return PyRepository.fetch_config(storage)

fetch_config_async async staticmethod #

fetch_config_async(storage)

Fetch the configuration for the repository saved in storage (async version).

Parameters:

Name Type Description Default
storage Storage

The storage configuration for the repository.

required

Returns:

Type Description
RepositoryConfig | None

The repository configuration if it exists, None otherwise.

Source code in icechunk-python/python/icechunk/repository.py
@staticmethod
async def fetch_config_async(storage: Storage) -> RepositoryConfig | None:
    """
    Fetch the configuration for the repository saved in storage (async version).

    Parameters
    ----------
    storage : Storage
        The storage configuration for the repository.

    Returns
    -------
    RepositoryConfig | None
        The repository configuration if it exists, None otherwise.
    """
    return await PyRepository.fetch_config_async(storage)

fetch_spec_version staticmethod #

fetch_spec_version(storage)

Fetch the spec version of a repository without fully opening it.

This is useful for checking the repository format version before opening, for example to know what version of the library is needed to open it.

Parameters:

Name Type Description Default
storage Storage

The storage configuration for the repository.

required

Returns:

Type Description
int | None

The spec version of the repository if it exists, None if no repository exists at the given location.

Source code in icechunk-python/python/icechunk/repository.py
@staticmethod
def fetch_spec_version(storage: Storage) -> int | None:
    """
    Fetch the spec version of a repository without fully opening it.

    This is useful for checking the repository format version before opening,
    for example to know what version of the library is needed to open it.

    Parameters
    ----------
    storage : Storage
        The storage configuration for the repository.

    Returns
    -------
    int | None
        The spec version of the repository if it exists, None if no repository
        exists at the given location.
    """
    return PyRepository.fetch_spec_version(storage)
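
A small sketch of the intended check-before-open pattern; `storage` stands for any configured Storage instance.

import icechunk

version = icechunk.Repository.fetch_spec_version(storage)
if version is None:
    print("no repository at this location")
else:
    print(f"repository uses spec version {version}")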

fetch_spec_version_async async staticmethod #

fetch_spec_version_async(storage)

Fetch the spec version of a repository without fully opening it (async version).

This is useful for checking the repository format version before opening, for example to know what version of the library is needed to open it.

Parameters:

Name Type Description Default
storage Storage

The storage configuration for the repository.

required

Returns:

Type Description
int | None

The spec version of the repository if it exists, None if no repository exists at the given location.

Source code in icechunk-python/python/icechunk/repository.py
@staticmethod
async def fetch_spec_version_async(storage: Storage) -> int | None:
    """
    Fetch the spec version of a repository without fully opening it (async version).

    This is useful for checking the repository format version before opening,
    for example to know what version of the library is needed to open it.

    Parameters
    ----------
    storage : Storage
        The storage configuration for the repository.

    Returns
    -------
    int | None
        The spec version of the repository if it exists, None if no repository
        exists at the given location.
    """
    return await PyRepository.fetch_spec_version_async(storage)

garbage_collect #

garbage_collect(delete_object_older_than, *, dry_run=False, max_snapshots_in_memory=50, max_compressed_manifest_mem_bytes=512 * 1024 * 1024, max_concurrent_manifest_fetches=500)

Delete any objects no longer accessible from any branches or tags.

Danger

This is an administrative operation; it should be run carefully. The repository can still operate concurrently while garbage_collect runs, but other readers can get inconsistent views if they are trying to access the expired snapshots.

Parameters:

Name Type Description Default
delete_object_older_than datetime

Delete objects older than this time.

required
dry_run bool

Report results but don't delete any objects

False
max_snapshots_in_memory int

Don't prefetch more than this many Snapshots to memory.

50
max_compressed_manifest_mem_bytes int

Don't use more than this memory to store compressed in-flight manifests.

512 * 1024 * 1024
max_concurrent_manifest_fetches int

Don't run more than this many concurrent manifest fetches.

500

Returns:

Type Description
GCSummary

Summary of objects deleted.

Source code in icechunk-python/python/icechunk/repository.py
def garbage_collect(
    self,
    delete_object_older_than: datetime.datetime,
    *,
    dry_run: bool = False,
    max_snapshots_in_memory: int = 50,
    max_compressed_manifest_mem_bytes: int = 512 * 1024 * 1024,
    max_concurrent_manifest_fetches: int = 500,
) -> GCSummary:
    """Delete any objects no longer accessible from any branches or tags.

    Danger
    ------
    This is an administrative operation; it should be run
    carefully. The repository can still operate concurrently while
    `garbage_collect` runs, but other readers can get inconsistent
    views if they are trying to access the expired snapshots.

    Parameters
    ----------
    delete_object_older_than: datetime.datetime
        Delete objects older than this time.
    dry_run : bool, optional
        Report results but don't delete any objects
    max_snapshots_in_memory : int
        Don't prefetch more than this many Snapshots to memory.
    max_compressed_manifest_mem_bytes : int
        Don't use more than this memory to store compressed in-flight manifests.
    max_concurrent_manifest_fetches : int
        Don't run more than this many concurrent manifest fetches.

    Returns
    -------
    GCSummary
        Summary of objects deleted.
    """

    return self._repository.garbage_collect(
        delete_object_older_than,
        dry_run=dry_run,
        max_snapshots_in_memory=max_snapshots_in_memory,
        max_compressed_manifest_mem_bytes=max_compressed_manifest_mem_bytes,
        max_concurrent_manifest_fetches=max_concurrent_manifest_fetches,
    )
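
A sketch of a cautious workflow: dry-run first to inspect what would be deleted, then run for real. Garbage collection is typically run after expire_snapshots has made snapshots unreachable; `repo` is assumed to be an open Repository.

import datetime

cutoff = datetime.datetime.now(datetime.timezone.utc) - datetime.timedelta(days=30)

# First pass: report only, delete nothing.
summary = repo.garbage_collect(cutoff, dry_run=True)
print(summary)  # GCSummary of what would be deleted

# Second pass: actually delete unreachable objects.
repo.garbage_collect(cutoff)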

garbage_collect_async async #

garbage_collect_async(delete_object_older_than, *, dry_run=False, max_snapshots_in_memory=50, max_compressed_manifest_mem_bytes=512 * 1024 * 1024, max_concurrent_manifest_fetches=500)

Delete any objects no longer accessible from any branches or tags (async version).

Danger

This is an administrative operation; it should be run carefully. The repository can still operate concurrently while garbage_collect runs, but other readers can get inconsistent views if they are trying to access the expired snapshots.

Parameters:

Name Type Description Default
delete_object_older_than datetime

Delete objects older than this time.

required
dry_run bool

Report results but don't delete any objects

False
max_snapshots_in_memory int

Don't prefetch more than this many Snapshots to memory.

50
max_compressed_manifest_mem_bytes int

Don't use more than this memory to store compressed in-flight manifests.

512 * 1024 * 1024
max_concurrent_manifest_fetches int

Don't run more than this many concurrent manifest fetches.

500

Returns:

Type Description
GCSummary

Summary of objects deleted.

Source code in icechunk-python/python/icechunk/repository.py
async def garbage_collect_async(
    self,
    delete_object_older_than: datetime.datetime,
    *,
    dry_run: bool = False,
    max_snapshots_in_memory: int = 50,
    max_compressed_manifest_mem_bytes: int = 512 * 1024 * 1024,
    max_concurrent_manifest_fetches: int = 500,
) -> GCSummary:
    """Delete any objects no longer accessible from any branches or tags (async version).

    Danger
    ------
    This is an administrative operation; it should be run
    carefully. The repository can still operate concurrently while
    `garbage_collect` runs, but other readers can get inconsistent
    views if they are trying to access the expired snapshots.

    Parameters
    ----------
    delete_object_older_than: datetime.datetime
        Delete objects older than this time.
    dry_run : bool, optional
        Report results but don't delete any objects
    max_snapshots_in_memory : int
        Don't prefetch more than this many Snapshots to memory.
    max_compressed_manifest_mem_bytes : int
        Don't use more than this memory to store compressed in-flight manifests.
    max_concurrent_manifest_fetches : int
        Don't run more than this many concurrent manifest fetches.

    Returns
    -------
    GCSummary
        Summary of objects deleted.
    """

    return await self._repository.garbage_collect_async(
        delete_object_older_than,
        dry_run=dry_run,
        max_snapshots_in_memory=max_snapshots_in_memory,
        max_compressed_manifest_mem_bytes=max_compressed_manifest_mem_bytes,
        max_concurrent_manifest_fetches=max_concurrent_manifest_fetches,
    )

get_metadata #

get_metadata()

Get the current configured repository metadata.

Returns:

Type Description
dict[str, Any]

The repository level metadata.

Source code in icechunk-python/python/icechunk/repository.py
def get_metadata(self) -> dict[str, Any]:
    """
    Get the current configured repository metadata.

    Returns
    -------
    dict[str, Any]
        The repository level metadata.
    """
    return self._repository.get_metadata()

get_metadata_async async #

get_metadata_async()

Get the current configured repository metadata (async version).

Returns:

Type Description
dict[str, Any]

The repository level metadata.

Source code in icechunk-python/python/icechunk/repository.py
async def get_metadata_async(self) -> dict[str, Any]:
    """
    Get the current configured repository metadata (async version).

    Returns
    -------
    dict[str, Any]
        The repository level metadata.
    """
    return await self._repository.get_metadata_async()

list_branches #

list_branches()

List the branches in the repository.

Returns:

Type Description
set[str]

A set of branch names.

Source code in icechunk-python/python/icechunk/repository.py
def list_branches(self) -> set[str]:
    """
    List the branches in the repository.

    Returns
    -------
    set[str]
        A set of branch names.
    """
    return self._repository.list_branches()
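
For example, assuming an open repository `repo` (tags work the same way through list_tags):

for name in sorted(repo.list_branches()):
    print(name)  # e.g. "main" plus any branches you created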

list_branches_async async #

list_branches_async()

List the branches in the repository (async version).

Returns:

Type Description
set[str]

A set of branch names.

Source code in icechunk-python/python/icechunk/repository.py
async def list_branches_async(self) -> set[str]:
    """
    List the branches in the repository (async version).

    Returns
    -------
    set[str]
        A set of branch names.
    """
    return await self._repository.list_branches_async()

list_manifest_files #

list_manifest_files(snapshot_id)

Get the manifest files used by the given snapshot ID

Parameters:

Name Type Description Default
snapshot_id str

The id of the snapshot to get information for

required

Returns:

Type Description
list[ManifestFileInfo]
Source code in icechunk-python/python/icechunk/repository.py
def list_manifest_files(self, snapshot_id: str) -> list[ManifestFileInfo]:
    """
    Get the manifest files used by the given snapshot ID

    Parameters
    ----------
    snapshot_id : str
        The id of the snapshot to get information for

    Returns
    -------
    list[ManifestFileInfo]
    """
    return self._repository.list_manifest_files(snapshot_id)
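
A sketch that inspects the manifests backing the tip of main, assuming an open `repo`:

tip = repo.lookup_branch("main")
for info in repo.list_manifest_files(tip):
    print(info)  # one ManifestFileInfo per manifest file in the snapshot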

list_manifest_files_async async #

list_manifest_files_async(snapshot_id)

Get the manifest files used by the given snapshot ID (async version)

Parameters:

Name Type Description Default
snapshot_id str

The id of the snapshot to get information for

required

Returns:

Type Description
list[ManifestFileInfo]
Source code in icechunk-python/python/icechunk/repository.py
async def list_manifest_files_async(self, snapshot_id: str) -> list[ManifestFileInfo]:
    """
    Get the manifest files used by the given snapshot ID (async version)

    Parameters
    ----------
    snapshot_id : str
        The id of the snapshot to get information for

    Returns
    -------
    list[ManifestFileInfo]
    """
    return await self._repository.list_manifest_files_async(snapshot_id)

list_tags #

list_tags()

List the tags in the repository.

Returns:

Type Description
set[str]

A set of tag names.

Source code in icechunk-python/python/icechunk/repository.py
def list_tags(self) -> set[str]:
    """
    List the tags in the repository.

    Returns
    -------
    set[str]
        A set of tag names.
    """
    return self._repository.list_tags()

list_tags_async async #

list_tags_async()

List the tags in the repository (async version).

Returns:

Type Description
set[str]

A set of tag names.

Source code in icechunk-python/python/icechunk/repository.py
async def list_tags_async(self) -> set[str]:
    """
    List the tags in the repository (async version).

    Returns
    -------
    set[str]
        A set of tag names.
    """
    return await self._repository.list_tags_async()

lookup_branch #

lookup_branch(branch)

Get the tip snapshot ID of a branch.

Parameters:

Name Type Description Default
branch str

The branch to get the tip of.

required

Returns:

Type Description
str

The snapshot ID of the tip of the branch.

Source code in icechunk-python/python/icechunk/repository.py
def lookup_branch(self, branch: str) -> str:
    """
    Get the tip snapshot ID of a branch.

    Parameters
    ----------
    branch : str
        The branch to get the tip of.

    Returns
    -------
    str
        The snapshot ID of the tip of the branch.
    """
    return self._repository.lookup_branch(branch)
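
For example, resolving the tip of main and then fetching its snapshot metadata (see the SnapshotInfo class for the available fields); `repo` is an open Repository:

snapshot_id = repo.lookup_branch("main")
info = repo.lookup_snapshot(snapshot_id)
print(info)  # SnapshotInfo for the branch tip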

lookup_branch_async async #

lookup_branch_async(branch)

Get the tip snapshot ID of a branch (async version).

Parameters:

Name Type Description Default
branch str

The branch to get the tip of.

required

Returns:

Type Description
str

The snapshot ID of the tip of the branch.

Source code in icechunk-python/python/icechunk/repository.py
async def lookup_branch_async(self, branch: str) -> str:
    """
    Get the tip snapshot ID of a branch (async version).

    Parameters
    ----------
    branch : str
        The branch to get the tip of.

    Returns
    -------
    str
        The snapshot ID of the tip of the branch.
    """
    return await self._repository.lookup_branch_async(branch)

lookup_snapshot #

lookup_snapshot(snapshot_id)

Get the SnapshotInfo given a snapshot ID

Parameters:

Name Type Description Default
snapshot_id str

The id of the snapshot to look up

required

Returns:

Type Description
SnapshotInfo
Source code in icechunk-python/python/icechunk/repository.py
def lookup_snapshot(self, snapshot_id: str) -> SnapshotInfo:
    """
    Get the SnapshotInfo given a snapshot ID

    Parameters
    ----------
    snapshot_id : str
        The id of the snapshot to look up

    Returns
    -------
    SnapshotInfo
    """
    return self._repository.lookup_snapshot(snapshot_id)

lookup_snapshot_async async #

lookup_snapshot_async(snapshot_id)

Get the SnapshotInfo given a snapshot ID (async version)

Parameters:

Name Type Description Default
snapshot_id str

The id of the snapshot to look up

required

Returns:

Type Description
SnapshotInfo
Source code in icechunk-python/python/icechunk/repository.py
async def lookup_snapshot_async(self, snapshot_id: str) -> SnapshotInfo:
    """
    Get the SnapshotInfo given a snapshot ID (async version)

    Parameters
    ----------
    snapshot_id : str
        The id of the snapshot to look up

    Returns
    -------
    SnapshotInfo
    """
    return await self._repository.lookup_snapshot_async(snapshot_id)

lookup_tag #

lookup_tag(tag)

Get the snapshot ID of a tag.

Parameters:

Name Type Description Default
tag str

The tag to get the snapshot ID of.

required

Returns:

Type Description
str

The snapshot ID of the tag.

Source code in icechunk-python/python/icechunk/repository.py
def lookup_tag(self, tag: str) -> str:
    """
    Get the snapshot ID of a tag.

    Parameters
    ----------
    tag : str
        The tag to get the snapshot ID of.

    Returns
    -------
    str
        The snapshot ID of the tag.
    """
    return self._repository.lookup_tag(tag)

lookup_tag_async async #

lookup_tag_async(tag)

Get the snapshot ID of a tag (async version).

Parameters:

Name Type Description Default
tag str

The tag to get the snapshot ID of.

required

Returns:

Type Description
str

The snapshot ID of the tag.

Source code in icechunk-python/python/icechunk/repository.py
async def lookup_tag_async(self, tag: str) -> str:
    """
    Get the snapshot ID of a tag (async version).

    Parameters
    ----------
    tag : str
        The tag to get the snapshot ID of.

    Returns
    -------
    str
        The snapshot ID of the tag.
    """
    return await self._repository.lookup_tag_async(tag)

open classmethod #

open(storage, config=None, authorize_virtual_chunk_access=None)

Open an existing Icechunk repository.

If no repository exists at the given storage location, an error will be raised.

Warning

This method must be used with care in a multiprocessing context. Read more in our Parallel Write Guide.

Parameters:

Name Type Description Default
storage Storage

The storage configuration for the repository.

required
config RepositoryConfig

The repository settings. If not provided, a default configuration will be loaded from the repository.

None
authorize_virtual_chunk_access dict[str, AnyCredential | None]

Authorize Icechunk to access virtual chunks in these containers. A mapping from container url_prefix to the credentials to use to access chunks in that container. If a credential is None, credentials will be fetched from the environment, or anonymous credentials will be used if the container allows it. As a security measure, Icechunk will block access to virtual chunks if the container is not authorized using this argument.

None

Returns:

Type Description
Self

An instance of the Repository class.

Source code in icechunk-python/python/icechunk/repository.py
@classmethod
def open(
    cls,
    storage: Storage,
    config: RepositoryConfig | None = None,
    authorize_virtual_chunk_access: dict[str, AnyCredential | None] | None = None,
) -> Self:
    """
    Open an existing Icechunk repository.

    If no repository exists at the given storage location, an error will be raised.

    !!! warning
        This method must be used with care in a multiprocessing context.
        Read more in our [Parallel Write Guide](./parallel.md#uncooperative-distributed-writes).

    Parameters
    ----------
    storage : Storage
        The storage configuration for the repository.
    config : RepositoryConfig, optional
        The repository settings. If not provided, a default configuration will be
        loaded from the repository.
    authorize_virtual_chunk_access : dict[str, AnyCredential | None], optional
        Authorize Icechunk to access virtual chunks in these containers. A mapping
        from container url_prefix to the credentials to use to access chunks in
        that container. If a credential is `None`, credentials will be fetched from the
        environment, or anonymous credentials will be used if the container allows it.
        As a security measure, Icechunk will block access to virtual chunks if the
        container is not authorized using this argument.

    Returns
    -------
    Self
        An instance of the Repository class.
    """
    return cls(
        PyRepository.open(
            storage,
            config=config,
            authorize_virtual_chunk_access=authorize_virtual_chunk_access,
        )
    )
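
A minimal sketch; the filesystem path is hypothetical, and any Storage variant (S3, GCS, Azure, in-memory) can be substituted.

import icechunk

storage = icechunk.local_filesystem_storage("/tmp/my-icechunk-repo")
repo = icechunk.Repository.open(storage)  # raises if no repository exists here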

open_async async classmethod #

open_async(storage, config=None, authorize_virtual_chunk_access=None)

Open an existing Icechunk repository asynchronously.

If no repository exists at the given storage location, an error will be raised.

Warning

This method must be used with care in a multiprocessing context. Read more in our Parallel Write Guide.

Parameters:

Name Type Description Default
storage Storage

The storage configuration for the repository.

required
config RepositoryConfig

The repository settings. If not provided, a default configuration will be loaded from the repository.

None
authorize_virtual_chunk_access dict[str, AnyCredential | None]

Authorize Icechunk to access virtual chunks in these containers. A mapping from container url_prefix to the credentials to use to access chunks in that container. If a credential is None, credentials will be fetched from the environment, or anonymous credentials will be used if the container allows it. As a security measure, Icechunk will block access to virtual chunks if the container is not authorized using this argument.

None

Returns:

Type Description
Self

An instance of the Repository class.

Source code in icechunk-python/python/icechunk/repository.py
@classmethod
async def open_async(
    cls,
    storage: Storage,
    config: RepositoryConfig | None = None,
    authorize_virtual_chunk_access: dict[str, AnyCredential | None] | None = None,
) -> Self:
    """
    Open an existing Icechunk repository asynchronously.

    If no repository exists at the given storage location, an error will be raised.

    !!! warning
        This method must be used with care in a multiprocessing context.
        Read more in our [Parallel Write Guide](./parallel.md#uncooperative-distributed-writes).

    Parameters
    ----------
    storage : Storage
        The storage configuration for the repository.
    config : RepositoryConfig, optional
        The repository settings. If not provided, a default configuration will be
        loaded from the repository.
    authorize_virtual_chunk_access : dict[str, AnyCredential | None], optional
        Authorize Icechunk to access virtual chunks in these containers. A mapping
        from container url_prefix to the credentials to use to access chunks in
        that container. If a credential is `None`, credentials will be fetched from the
        environment, or anonymous credentials will be used if the container allows it.
        As a security measure, Icechunk will block access to virtual chunks if the
        container is not authorized using this argument.

    Returns
    -------
    Self
        An instance of the Repository class.
    """
    return cls(
        await PyRepository.open_async(
            storage,
            config=config,
            authorize_virtual_chunk_access=authorize_virtual_chunk_access,
        )
    )

open_or_create classmethod #

open_or_create(storage, config=None, authorize_virtual_chunk_access=None, create_version=None)

Open an existing Icechunk repository or create a new one if it does not exist.

Warning

This method must be used with care in a multiprocessing context. Read more in our Parallel Write Guide.

Attempting to create a Repo concurrently in the same location from multiple processes is not safe. Instead, create a Repo once and then open it concurrently.

Parameters:

Name Type Description Default
storage Storage

The storage configuration for the repository.

required
config RepositoryConfig

The repository settings. If not provided, a default configuration will be loaded from the repository.

None
authorize_virtual_chunk_access dict[str, AnyCredential | None]

Authorize Icechunk to access virtual chunks in these containers. A mapping from container url_prefix to the credentials to use to access chunks in that container. If a credential is None, credentials will be fetched from the environment, or anonymous credentials will be used if the container allows it. As a security measure, Icechunk will block access to virtual chunks if the container is not authorized using this argument.

None
create_version int

Use this version of the spec for the new repository, if it needs to be created. If not passed, the latest version of the spec that was available before the library version release will be used.

None

Returns:

Type Description
Self

An instance of the Repository class.

Source code in icechunk-python/python/icechunk/repository.py
@classmethod
def open_or_create(
    cls,
    storage: Storage,
    config: RepositoryConfig | None = None,
    authorize_virtual_chunk_access: dict[str, AnyCredential | None] | None = None,
    create_version: int | None = None,
) -> Self:
    """
    Open an existing Icechunk repository or create a new one if it does not exist.

    !!! warning
        This method must be used with care in a multiprocessing context.
        Read more in our [Parallel Write Guide](./parallel.md#uncooperative-distributed-writes).

        Attempting to create a Repo concurrently in the same location from multiple processes is not safe.
        Instead, create a Repo once and then open it concurrently.

    Parameters
    ----------
    storage : Storage
        The storage configuration for the repository.
    config : RepositoryConfig, optional
        The repository settings. If not provided, a default configuration will be
        loaded from the repository.
    authorize_virtual_chunk_access : dict[str, AnyCredential | None], optional
        Authorize Icechunk to access virtual chunks in these containers. A mapping
        from container url_prefix to the credentials to use to access chunks in
        that container. If a credential is `None`, credentials will be fetched from the
        environment, or anonymous credentials will be used if the container allows it.
        As a security measure, Icechunk will block access to virtual chunks if the
        container is not authorized using this argument.
    create_version : int, optional
        Use this version of the spec for the new repository, if it needs to be created.
        If not passed, the latest version of the spec that was available before the
        library version release will be used.


    Returns
    -------
    Self
        An instance of the Repository class.
    """
    return cls(
        PyRepository.open_or_create(
            storage,
            config=config,
            authorize_virtual_chunk_access=authorize_virtual_chunk_access,
            create_version=create_version,
        )
    )

open_or_create_async async classmethod #

open_or_create_async(storage, config=None, authorize_virtual_chunk_access=None, create_version=None)

Open an existing Icechunk repository or create a new one if it does not exist (async version).

Warning

This method must be used with care in a multiprocessing context. Read more in our Parallel Write Guide.

Attempting to create a Repo concurrently in the same location from multiple processes is not safe. Instead, create a Repo once and then open it concurrently.

Parameters:

Name Type Description Default
storage Storage

The storage configuration for the repository.

required
config RepositoryConfig

The repository settings. If not provided, a default configuration will be loaded from the repository.

None
authorize_virtual_chunk_access dict[str, AnyCredential | None]

Authorize Icechunk to access virtual chunks in these containers. A mapping from container url_prefix to the credentials to use to access chunks in that container. If a credential is None, credentials will be fetched from the environment, or anonymous credentials will be used if the container allows it. As a security measure, Icechunk will block access to virtual chunks if the container is not authorized using this argument.

None
create_version int

Use this version of the spec for the new repository, if it needs to be created. If not passed, the latest version of the spec that was available before the library version release will be used.

None

Returns:

Type Description
Self

An instance of the Repository class.

Source code in icechunk-python/python/icechunk/repository.py
@classmethod
async def open_or_create_async(
    cls,
    storage: Storage,
    config: RepositoryConfig | None = None,
    authorize_virtual_chunk_access: dict[str, AnyCredential | None] | None = None,
    create_version: int | None = None,
) -> Self:
    """
    Open an existing Icechunk repository or create a new one if it does not exist (async version).

    !!! warning
        This method must be used with care in a multiprocessing context.
        Read more in our [Parallel Write Guide](./parallel.md#uncooperative-distributed-writes).

        Attempting to create a Repo concurrently in the same location from multiple processes is not safe.
        Instead, create a Repo once and then open it concurrently.

    Parameters
    ----------
    storage : Storage
        The storage configuration for the repository.
    config : RepositoryConfig, optional
        The repository settings. If not provided, a default configuration will be
        loaded from the repository.
    authorize_virtual_chunk_access : dict[str, AnyCredential | None], optional
        Authorize Icechunk to access virtual chunks in these containers. A mapping
        from container url_prefix to the credentials to use to access chunks in
        that container. If a credential is `None`, credentials will be fetched from the
        environment, or anonymous credentials will be used if the container allows it.
        As a security measure, Icechunk will block access to virtual chunks if the
        container is not authorized using this argument.
    create_version : int, optional
        Use this version of the spec for the new repository, if it needs to be created.
        If not passed, the latest version of the spec that was available before the
        library version release will be used.

    Returns
    -------
    Self
        An instance of the Repository class.
    """
    return cls(
        await PyRepository.open_or_create_async(
            storage,
            config=config,
            authorize_virtual_chunk_access=authorize_virtual_chunk_access,
            create_version=create_version,
        )
    )

ops_log #

ops_log()

Get a summary of changes to the repository

Source code in icechunk-python/python/icechunk/repository.py
def ops_log(self) -> Iterator[UpdateType]:
    """
    Get a summary of changes to the repository
    """

    # the returned object is both an Async and Sync iterator
    res = cast(
        Iterator[UpdateType],
        self._repository.async_ops_log(),
    )
    return res
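
For example, a sketch that walks the operations log of an open repository `repo`:

for update in repo.ops_log():
    print(update)  # one UpdateType entry per recorded repository change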

ops_log_async #

ops_log_async()

Get a summary of changes to the repository (async version)

Source code in icechunk-python/python/icechunk/repository.py
def ops_log_async(self) -> AsyncIterator[UpdateType]:
    """
    Get a summary of changes to the repository (async version)
    """

    # the returned object is both an Async and Sync iterator
    return self._repository.async_ops_log()

readonly_session #

readonly_session(branch=None, *, tag=None, snapshot_id=None, as_of=None)

Create a read-only session.

This can be thought of as a read-only checkout of the repository at a given snapshot. When a branch or tag is provided, the session will be based on the tip of the branch or the snapshot ID of the tag.

Parameters:

Name Type Description Default
branch str

If provided, the branch to create the session on.

None
tag str

If provided, the tag to create the session on.

None
snapshot_id str

If provided, the snapshot ID to create the session on.

None
as_of datetime | None

When combined with the branch argument, it will open the session at the last snapshot that is at or before this datetime

None

Returns:

Type Description
Session

The read-only session, pointing to the specified snapshot, tag, or branch.

Notes

Only one of the arguments can be specified.

Source code in icechunk-python/python/icechunk/repository.py
def readonly_session(
    self,
    branch: str | None = None,
    *,
    tag: str | None = None,
    snapshot_id: str | None = None,
    as_of: datetime.datetime | None = None,
) -> Session:
    """
    Create a read-only session.

    This can be thought of as a read-only checkout of the repository at a given snapshot.
    When a branch or tag is provided, the session will be based on the tip of the branch or
    the snapshot ID of the tag.

    Parameters
    ----------
    branch : str, optional
        If provided, the branch to create the session on.
    tag : str, optional
        If provided, the tag to create the session on.
    snapshot_id : str, optional
        If provided, the snapshot ID to create the session on.
    as_of: datetime.datetime, optional
        When combined with the branch argument, it will open the session at the last
        snapshot that is at or before this datetime

    Returns
    -------
    Session
        The read-only session, pointing to the specified snapshot, tag, or branch.

    Notes
    -----
    Only one of the arguments can be specified.
    """
    return Session(
        self._repository.readonly_session(
            branch=branch, tag=tag, snapshot_id=snapshot_id, as_of=as_of
        )
    )
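
A sketch of reading through a read-only session with zarr; the tag name is hypothetical.

import zarr

session = repo.readonly_session(tag="v1")
root = zarr.open_group(session.store, mode="r")
print(list(root.array_keys()))  # arrays stored at the root of the hierarchy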

readonly_session_async async #

readonly_session_async(branch=None, *, tag=None, snapshot_id=None, as_of=None)

Create a read-only session (async version).

This can be thought of as a read-only checkout of the repository at a given snapshot. When a branch or tag is provided, the session will be based on the tip of the branch or the snapshot ID of the tag.

Parameters:

Name Type Description Default
branch str

If provided, the branch to create the session on.

None
tag str

If provided, the tag to create the session on.

None
snapshot_id str

If provided, the snapshot ID to create the session on.

None
as_of datetime | None

When combined with the branch argument, it will open the session at the last snapshot that is at or before this datetime

None

Returns:

Type Description
Session

The read-only session, pointing to the specified snapshot, tag, or branch.

Notes

Only one of the arguments can be specified.

Source code in icechunk-python/python/icechunk/repository.py
async def readonly_session_async(
    self,
    branch: str | None = None,
    *,
    tag: str | None = None,
    snapshot_id: str | None = None,
    as_of: datetime.datetime | None = None,
) -> Session:
    """
    Create a read-only session (async version).

    This can be thought of as a read-only checkout of the repository at a given snapshot.
    When a branch or tag is provided, the session will be based on the tip of the branch or
    the snapshot ID of the tag.

    Parameters
    ----------
    branch : str, optional
        If provided, the branch to create the session on.
    tag : str, optional
        If provided, the tag to create the session on.
    snapshot_id : str, optional
        If provided, the snapshot ID to create the session on.
    as_of: datetime.datetime, optional
        When combined with the branch argument, it will open the session at the last
        snapshot that is at or before this datetime

    Returns
    -------
    Session
        The read-only session, pointing to the specified snapshot, tag, or branch.

    Notes
    -----
    Only one of the arguments can be specified.
    """
    return Session(
        await self._repository.readonly_session_async(
            branch=branch, tag=tag, snapshot_id=snapshot_id, as_of=as_of
        )
    )

rearrange_session #

rearrange_session(branch)

Create a session to move/rename nodes in the Zarr hierarchy.

Like the read-only session, this can be thought of as a checkout of the repository at the tip of the branch. However, this session is writable and can be used to make changes to the repository. When ready, the changes can be committed to the branch, after which the session will become a read-only session on the new snapshot.

This session only allows making changes through Session.move. If you want to modify data, and not just move nodes, use Repository.writable_session instead.

Parameters:

Name Type Description Default
branch str

The branch to create the session on.

required

Returns:

Type Description
Session

The writable session on the branch.

Source code in icechunk-python/python/icechunk/repository.py
def rearrange_session(self, branch: str) -> Session:
    """
    Create a session to move/rename nodes in the Zarr hierarchy.

    Like the read-only session, this can be thought of as a checkout of the repository at the
    tip of the branch. However, this session is writable and can be used to make changes to the
    repository. When ready, the changes can be committed to the branch, after which the session will
    become a read-only session on the new snapshot.

    This session only allows making changes through `Session.move`. If you want to modify data, and
    not just move nodes, use `Repository.writable_session` instead.

    Parameters
    ----------
    branch : str
        The branch to create the session on.

    Returns
    -------
    Session
        The writable session on the branch.
    """
    return Session(self._repository.rearrange_session(branch))

rearrange_session_async async #

rearrange_session_async(branch)

Create a session to move/rename nodes in the Zarr hierarchy (async version).

Like the read-only session, this can be thought of as a checkout of the repository at the tip of the branch. However, this session is writable and can be used to make changes to the repository. When ready, the changes can be committed to the branch, after which the session will become a read-only session on the new snapshot.

This session only allows making changes through Session.move. If you want to modify data, and not just move nodes, use Repository.writable_session instead.

Parameters:

Name Type Description Default
branch str

The branch to create the session on.

required

Returns:

Type Description
Session

The writable session on the branch.

Source code in icechunk-python/python/icechunk/repository.py
async def rearrange_session_async(self, branch: str) -> Session:
    """
    Create a session to move/rename nodes in the Zarr hierarchy (async version).

    Like the read-only session, this can be thought of as a checkout of the repository at the
    tip of the branch. However, this session is writable and can be used to make changes to the
    repository. When ready, the changes can be committed to the branch, after which the session will
    become a read-only session on the new snapshot.

    This session only allows making changes through `Session.move`. If you want to modify data, and
    not just move nodes, use `Repository.writable_session` instead.

    Parameters
    ----------
    branch : str
        The branch to create the session on.

    Returns
    -------
    Session
        The writable session on the branch.
    """
    return Session(await self._repository.rearrange_session_async(branch))

reopen #

reopen(config=None, authorize_virtual_chunk_access=None)

Reopen the repository with new configuration or credentials.

Parameters:

Name Type Description Default
config RepositoryConfig

The new repository configuration. If not provided, uses the existing configuration.

None
authorize_virtual_chunk_access dict[str, AnyCredential | None]

New virtual chunk access credentials.

None

Returns:

Type Description
Self

A new Repository instance with the updated configuration.

Source code in icechunk-python/python/icechunk/repository.py
def reopen(
    self,
    config: RepositoryConfig | None = None,
    authorize_virtual_chunk_access: dict[str, AnyCredential | None] | None = None,
) -> Self:
    """
    Reopen the repository with new configuration or credentials.

    Parameters
    ----------
    config : RepositoryConfig, optional
        The new repository configuration. If not provided, uses the existing configuration.
    authorize_virtual_chunk_access : dict[str, AnyCredential | None], optional
        New virtual chunk access credentials.

    Returns
    -------
    Self
        A new Repository instance with the updated configuration.
    """
    return self.__class__(
        self._repository.reopen(
            config=config,
            authorize_virtual_chunk_access=authorize_virtual_chunk_access,
        )
    )

reopen_async async #

reopen_async(config=None, authorize_virtual_chunk_access=None)

Reopen the repository with new configuration or credentials (async version).

Parameters:

Name Type Description Default
config RepositoryConfig

The new repository configuration. If not provided, uses the existing configuration.

None
authorize_virtual_chunk_access dict[str, AnyCredential | None]

New virtual chunk access credentials.

None

Returns:

Type Description
Self

A new Repository instance with the updated configuration.

Source code in icechunk-python/python/icechunk/repository.py
async def reopen_async(
    self,
    config: RepositoryConfig | None = None,
    authorize_virtual_chunk_access: dict[str, AnyCredential | None] | None = None,
) -> Self:
    """
    Reopen the repository with new configuration or credentials (async version).

    Parameters
    ----------
    config : RepositoryConfig, optional
        The new repository configuration. If not provided, uses the existing configuration.
    authorize_virtual_chunk_access : dict[str, AnyCredential | None], optional
        New virtual chunk access credentials.

    Returns
    -------
    Self
        A new Repository instance with the updated configuration.
    """
    return self.__class__(
        await self._repository.reopen_async(
            config=config,
            authorize_virtual_chunk_access=authorize_virtual_chunk_access,
        )
    )

reset_branch #

reset_branch(branch, snapshot_id, *, from_snapshot_id=None)

Reset a branch to a specific snapshot.

This will permanently alter the history of the branch such that the tip of the branch is the specified snapshot.

Parameters:

Name Type Description Default
branch str

The branch to reset.

required
snapshot_id str

The snapshot ID to reset the branch to.

required
from_snapshot_id str | None

If passed, the reset will only be executed if the branch currently points to from_snapshot_id.

None

Returns:

Type Description
None
Source code in icechunk-python/python/icechunk/repository.py
def reset_branch(
    self, branch: str, snapshot_id: str, *, from_snapshot_id: str | None = None
) -> None:
    """
    Reset a branch to a specific snapshot.

    This will permanently alter the history of the branch such that the tip of
    the branch is the specified snapshot.

    Parameters
    ----------
    branch : str
        The branch to reset.
    snapshot_id : str
        The snapshot ID to reset the branch to.
    from_snapshot_id : str | None
        If passed, the reset will only be executed if the branch currently
        points to from_snapshot_id.

    Returns
    -------
    None
    """
    self._repository.reset_branch(branch, snapshot_id, from_snapshot_id)
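
For example, a sketch that rolls main back one commit, using from_snapshot_id to guard against a concurrent update; it assumes the branch has at least two snapshots.

history = list(repo.ancestry(branch="main"))  # newest snapshot first
repo.reset_branch(
    "main",
    history[1].id,                   # the parent of the current tip
    from_snapshot_id=history[0].id,  # only reset if the tip hasn't moved
)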

reset_branch_async async #

reset_branch_async(branch, snapshot_id, *, from_snapshot_id=None)

Reset a branch to a specific snapshot (async version).

This will permanently alter the history of the branch such that the tip of the branch is the specified snapshot.

Parameters:

Name Type Description Default
branch str

The branch to reset.

required
snapshot_id str

The snapshot ID to reset the branch to.

required
from_snapshot_id str | None

If passed, the reset will only be executed if the branch currently points to from_snapshot_id.

None

Returns:

Type Description
None
Source code in icechunk-python/python/icechunk/repository.py
async def reset_branch_async(
    self, branch: str, snapshot_id: str, *, from_snapshot_id: str | None = None
) -> None:
    """
    Reset a branch to a specific snapshot (async version).

    This will permanently alter the history of the branch such that the tip of
    the branch is the specified snapshot.

    Parameters
    ----------
    branch : str
        The branch to reset.
    snapshot_id : str
        The snapshot ID to reset the branch to.
    from_snapshot_id : str | None
        If passed, the reset will only be executed if the branch currently
        points to from_snapshot_id.

    Returns
    -------
    None
    """
    await self._repository.reset_branch_async(branch, snapshot_id, from_snapshot_id)

rewrite_manifests #

rewrite_manifests(message, *, branch, metadata=None)

Rewrite manifests for all arrays.

This method will start a new writable session on the specified branch, rewrite manifests for all arrays, and then commit with the specified message and metadata.

A JSON representation of the currently active splitting configuration will be stored in the commit's metadata under the key "splitting_config".

Parameters:

Name Type Description Default
message str

The message to write with the commit.

required
branch str

The branch to commit to.

required
metadata dict[str, Any] | None

Additional metadata to store with the commit snapshot.

None

Returns:

Type Description
str

The snapshot ID of the new commit.

Source code in icechunk-python/python/icechunk/repository.py
def rewrite_manifests(
    self, message: str, *, branch: str, metadata: dict[str, Any] | None = None
) -> str:
    """
    Rewrite manifests for all arrays.

    This method will start a new writable session on the specified branch,
    rewrite manifests for all arrays, and then commit with the specified ``message``
    and ``metadata``.

    A JSON representation of the currently active splitting configuration will be
    stored in the commit's metadata under the key `"splitting_config"`.

    Parameters
    ----------
    message : str
        The message to write with the commit.
    branch: str
        The branch to commit to.
    metadata : dict[str, Any] | None, optional
        Additional metadata to store with the commit snapshot.

    Returns
    -------
    str
        The snapshot ID of the new commit.

    """
    return self._repository.rewrite_manifests(
        message, branch=branch, metadata=metadata
    )
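
For example, a minimal sketch; the commit message is arbitrary, and the splitting configuration in effect comes from the repository config.

new_snapshot = repo.rewrite_manifests(
    "consolidate manifests under current splitting config",
    branch="main",
)
print(new_snapshot)  # snapshot ID of the rewrite commit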

rewrite_manifests_async async #

rewrite_manifests_async(message, *, branch, metadata=None)

Rewrite manifests for all arrays (async version).

This method will start a new writable session on the specified branch, rewrite manifests for all arrays, and then commit with the specified message and metadata.

A JSON representation of the currently active splitting configuration will be stored in the commit's metadata under the key "splitting_config".

Parameters:

Name Type Description Default
message str

The message to write with the commit.

required
branch str

The branch to commit to.

required
metadata dict[str, Any] | None

Additional metadata to store with the commit snapshot.

None

Returns:

Type Description
str

The snapshot ID of the new commit.

Source code in icechunk-python/python/icechunk/repository.py
async def rewrite_manifests_async(
    self, message: str, *, branch: str, metadata: dict[str, Any] | None = None
) -> str:
    """
    Rewrite manifests for all arrays (async version).

    This method will start a new writable session on the specified branch,
    rewrite manifests for all arrays, and then commit with the specified ``message``
    and ``metadata``.

    A JSON representation of the currently active splitting configuration will be
    stored in the commit's metadata under the key `"splitting_config"`.

    Parameters
    ----------
    message : str
        The message to write with the commit.
    branch: str
        The branch to commit to.
    metadata : dict[str, Any] | None, optional
        Additional metadata to store with the commit snapshot.

    Returns
    -------
    str
        The snapshot ID of the new commit.

    """
    return await self._repository.rewrite_manifests_async(
        message, branch=branch, metadata=metadata
    )

save_config #

save_config()

Save the repository configuration to storage; this configuration will be used in future calls to Repository.open.

Returns:

Type Description
None
Source code in icechunk-python/python/icechunk/repository.py
def save_config(self) -> None:
    """
    Save the repository configuration to storage; this configuration will be used in future calls to Repository.open.

    Returns
    -------
    None
    """
    return self._repository.save_config()

save_config_async async #

save_config_async()

Save the repository configuration to storage (async version).

Returns:

Type Description
None
Source code in icechunk-python/python/icechunk/repository.py
async def save_config_async(self) -> None:
    """
    Save the repository configuration to storage (async version).

    Returns
    -------
    None
    """
    return await self._repository.save_config_async()

set_default_commit_metadata #

set_default_commit_metadata(metadata)

Set the default commit metadata for the repository. This is useful for providing additional static system context metadata to all commits.

When a commit is made, this default metadata will be merged with the metadata provided at commit time, with any duplicate keys overwritten by the commit-time values.

Warning

This metadata is only applied to sessions that are created after this call. Any open writable sessions will not be affected and will not use the new default metadata.

Parameters:

Name Type Description Default
metadata dict[str, Any]

The default commit metadata. Pass an empty dict to clear the default metadata.

required
Source code in icechunk-python/python/icechunk/repository.py
def set_default_commit_metadata(self, metadata: dict[str, Any]) -> None:
    """
    Set the default commit metadata for the repository. This is useful for providing
    additional static system context metadata to all commits.

    When a commit is made, this default metadata will be merged with the metadata provided
    at commit time, with any duplicate keys overwritten by the commit-time values.

    !!! warning
        This metadata is only applied to sessions that are created after this call. Any open
        writable sessions will not be affected and will not use the new default metadata.

    Parameters
    ----------
    metadata : dict[str, Any]
        The default commit metadata. Pass an empty dict to clear the default metadata.
    """
    return self._repository.set_default_commit_metadata(metadata)
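
Example (a minimal sketch; assumes `repo` is a Repository with a "main" branch):

repo.set_default_commit_metadata({"pipeline": "nightly", "author": "cron"})

# Sessions created after the call pick up the defaults.
session = repo.writable_session("main")
# ... write data through session.store ...
# Duplicate keys passed at commit time win over the defaults:
session.commit("nightly update", metadata={"author": "alice"})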

set_metadata #

set_metadata(metadata)

Set the repository metadata; the passed dict will replace the complete metadata.

To update only some metadata values, use Repository.update_metadata instead.

Parameters:

Name Type Description Default
metadata dict[str, Any]

The value to use as repository metadata.

required
Source code in icechunk-python/python/icechunk/repository.py
def set_metadata(self, metadata: dict[str, Any]) -> None:
    """
    Set the repository metadata; the passed dict will replace the complete metadata.

    To update only some metadata values, use Repository.update_metadata instead.

    Parameters
    ----------
    metadata : dict[str, Any]
        The value to use as repository metadata.
    """
    self._repository.set_metadata(metadata)

set_metadata_async async #

set_metadata_async(metadata)

Set the repository metadata; the passed dict will replace the complete metadata.

To update only some metadata values, use Repository.update_metadata instead.

Parameters:

Name Type Description Default
metadata dict[str, Any]

The value to use as repository metadata.

required
Source code in icechunk-python/python/icechunk/repository.py
async def set_metadata_async(self, metadata: dict[str, Any]) -> None:
    """
    Set the repository metadata; the passed dict will replace the complete metadata.

    To update only some metadata values, use Repository.update_metadata instead.

    Parameters
    ----------
    metadata : dict[str, Any]
        The value to use as repository metadata.
    """
    await self._repository.set_metadata_async(metadata)

total_chunks_storage #

total_chunks_storage(*, max_snapshots_in_memory=50, max_compressed_manifest_mem_bytes=512 * 1024 * 1024, max_concurrent_manifest_fetches=500)

Calculate the total storage used for chunks, in bytes.

It reports the storage needed to store all snapshots in the repository that are reachable from any branch or tag. Unreachable snapshots can be generated by using reset_branch or expire_snapshots. The chunks for these snapshots are not included in the result, and they should probably be deleted using garbage_collection.

The result includes only native chunks; virtual and inline chunks are not counted.

Parameters:

Name Type Description Default
max_snapshots_in_memory int

Don't prefetch more than this many snapshots into memory.

50
max_compressed_manifest_mem_bytes int

Don't use more than this memory to store compressed in-flight manifests.

512 * 1024 * 1024
max_concurrent_manifest_fetches int

Don't run more than this many concurrent manifest fetches.

500
Source code in icechunk-python/python/icechunk/repository.py
def total_chunks_storage(
    self,
    *,
    max_snapshots_in_memory: int = 50,
    max_compressed_manifest_mem_bytes: int = 512 * 1024 * 1024,
    max_concurrent_manifest_fetches: int = 500,
) -> int:
    """Calculate the total storage used for chunks, in bytes.

    It reports the storage needed to store all snapshots in the repository that
    are reachable from any branch or tag. Unreachable snapshots can be generated
    by using `reset_branch` or `expire_snapshots`. The chunks for these snapshots
    are not included in the result, and they should probably be deleted using
    `garbage_collection`.

    The result includes only native chunks; virtual and inline chunks are not counted.

    Parameters
    ----------
    max_snapshots_in_memory: int
        Don't prefetch more than this many snapshots into memory.
    max_compressed_manifest_mem_bytes : int
        Don't use more than this memory to store compressed in-flight manifests.
    max_concurrent_manifest_fetches : int
        Don't run more than this many concurrent manifest fetches.
    """

    warnings.warn(
        "The ``total_chunks_storage`` method has been deprecated in favour of the ``chunk_storage_stats`` method. "
        "The new method is superior, as it actually calculates storage size occupied by inlined and virtual chunks in addition to native chunks. "
        "You can still access just the total native bytes: to keep your existing behaviour using API that will not be removed in a future version, "
        "please replace your existing ``.total_chunks_storage(**kwargs)`` method call with ``.chunk_storage_stats(**same_kwargs).native_bytes``.",
        DeprecationWarning,
        stacklevel=2,
    )

    stats = self._repository.chunk_storage_stats(
        max_snapshots_in_memory=max_snapshots_in_memory,
        max_compressed_manifest_mem_bytes=max_compressed_manifest_mem_bytes,
        max_concurrent_manifest_fetches=max_concurrent_manifest_fetches,
    )
    return stats.native_bytes
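
Example (a migration sketch; assumes `repo` is an existing Repository):

stats = repo.chunk_storage_stats()  # accepts the same keyword arguments
print(stats.native_bytes)           # the value total_chunks_storage() returned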

total_chunks_storage_async async #

total_chunks_storage_async(*, max_snapshots_in_memory=50, max_compressed_manifest_mem_bytes=512 * 1024 * 1024, max_concurrent_manifest_fetches=500)

Calculate the total storage used for chunks, in bytes (async version).

It reports the storage needed to store all snapshots in the repository that are reachable from any branch or tag. Unreachable snapshots can be generated by using reset_branch or expire_snapshots. The chunks for these snapshots are not included in the result, and they should probably be deleted using garbage_collection.

The result includes only native chunks; virtual and inline chunks are not counted.

Parameters:

Name Type Description Default
max_snapshots_in_memory int

Don't prefetch more than this many snapshots into memory.

50
max_compressed_manifest_mem_bytes int

Don't use more than this memory to store compressed in-flight manifests.

512 * 1024 * 1024
max_concurrent_manifest_fetches int

Don't run more than this many concurrent manifest fetches.

500
Source code in icechunk-python/python/icechunk/repository.py
async def total_chunks_storage_async(
    self,
    *,
    max_snapshots_in_memory: int = 50,
    max_compressed_manifest_mem_bytes: int = 512 * 1024 * 1024,
    max_concurrent_manifest_fetches: int = 500,
) -> int:
    """Calculate the total storage used for chunks, in bytes (async version).

    It reports the storage needed to store all snapshots in the repository that
    are reachable from any branch or tag. Unreachable snapshots can be generated
    by using `reset_branch` or `expire_snapshots`. The chunks for these snapshots
    are not included in the result, and they should probably be deleted using
    `garbage_collection`.

    The result includes only native chunks; virtual and inline chunks are not counted.

    Parameters
    ----------
    max_snapshots_in_memory: int
        Don't prefetch more than this many snapshots into memory.
    max_compressed_manifest_mem_bytes : int
        Don't use more than this memory to store compressed in-flight manifests.
    max_concurrent_manifest_fetches : int
        Don't run more than this many concurrent manifest fetches.
    """

    warnings.warn(
        "The ``total_chunks_storage_async`` method has been deprecated in favour of the ``chunk_storage_stats_async`` method. "
        "The new method is superior, as it actually calculates storage size occupied by inlined and virtual chunks in addition to native chunks. "
        "You can still access just the total native bytes: to keep your existing behaviour using API that will not be removed in a future version, "
        "please replace your existing ``.total_chunks_storage_async(**kwargs)`` method call with ``.chunk_storage_stats_async(**same_kwargs).native_bytes``.",
        DeprecationWarning,
        stacklevel=2,
    )

    stats = await self._repository.chunk_storage_stats_async(
        max_snapshots_in_memory=max_snapshots_in_memory,
        max_compressed_manifest_mem_bytes=max_compressed_manifest_mem_bytes,
        max_concurrent_manifest_fetches=max_concurrent_manifest_fetches,
    )
    return stats.native_bytes

transaction #

transaction(branch, *, message, metadata=None, rebase_with=None, rebase_tries=1000)

Create a transaction on a branch.

This is a context manager that creates a writable session on the specified branch. When the context is exited, the session will be committed to the branch using the specified message.

Parameters:

Name Type Description Default
branch str

The branch to create the transaction on.

required
message str

The commit message to use when committing the session.

required
metadata dict[str, Any] | None

Additional metadata to store with the commit snapshot.

None
rebase_with ConflictSolver | None

If another session committed while the current session was writing, use Session.rebase with this solver.

None
rebase_tries int

If another session committed while the current session was writing, use Session.rebase up to this many times in a loop.

1000

Yields:

Name Type Description
store IcechunkStore

A Zarr Store which can be used to interact with the data in the repository.

Source code in icechunk-python/python/icechunk/repository.py
@contextmanager
def transaction(
    self,
    branch: str,
    *,
    message: str,
    metadata: dict[str, Any] | None = None,
    rebase_with: ConflictSolver | None = None,
    rebase_tries: int = 1_000,
) -> Iterator[IcechunkStore]:
    """
    Create a transaction on a branch.

    This is a context manager that creates a writable session on the specified branch.
    When the context is exited, the session will be committed to the branch
    using the specified message.

    Parameters
    ----------
    branch : str
        The branch to create the transaction on.
    message : str
        The commit message to use when committing the session.
    metadata : dict[str, Any] | None, optional
        Additional metadata to store with the commit snapshot.
    rebase_with : ConflictSolver | None, optional
        If another session committed while the current session was writing, use Session.rebase with this solver.
    rebase_tries : int, optional
        If another session committed while the current session was writing, use Session.rebase up to this many times in a loop.

    Yields
    -------
    store : IcechunkStore
        A Zarr Store which can be used to interact with the data in the repository.
    """
    session = self.writable_session(branch)
    yield session.store
    session.commit(
        message=message,
        metadata=metadata,
        rebase_with=rebase_with,
        rebase_tries=rebase_tries,
    )
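
Example (a minimal sketch; assumes `repo` is a Repository with a "main" branch and zarr v3 is installed):

import zarr

with repo.transaction("main", message="add temperature array") as store:
    root = zarr.group(store=store)
    root.create_array("temperature", shape=(100,), dtype="float32")
# Leaving the block commits the session to "main" with the given message.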

update_metadata #

update_metadata(metadata)

Update the repository metadata.

The passed dict will be merged with the current metadata, overriding existing keys.

Parameters:

Name Type Description Default
metadata dict[str, Any]

The dict to merge into the repository metadata.

required
Source code in icechunk-python/python/icechunk/repository.py
def update_metadata(self, metadata: dict[str, Any]) -> dict[str, Any]:
    """
    Update the repository metadata.

    The passed dict will be merged with the current metadata, overriding existing keys.

    Parameters
    ----------
    metadata : dict[str, Any]
        The dict to merge into the repository metadata.
    """
    return self._repository.update_metadata(metadata)
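
Example (a minimal sketch contrasting the two metadata APIs; assumes `repo` is an existing Repository):

repo.set_metadata({"project": "climate", "version": 1})  # replaces everything
repo.update_metadata({"version": 2})                     # merges; "project" is kept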

update_metadata_async async #

update_metadata_async(metadata)

Update the repository metadata.

The passed dict will be merged with the current metadata, overriding existing keys.

Parameters:

Name Type Description Default
metadata dict[str, Any]

The dict to merge into the repository metadata.

required
Source code in icechunk-python/python/icechunk/repository.py
async def update_metadata_async(self, metadata: dict[str, Any]) -> dict[str, Any]:
    """
    Update the repository metadata.

    The passed dict will be merged with the current metadata, overriding existing keys.

    Parameters
    ----------
    metadata : dict[str, Any]
        The dict to merge into the repository metadata.
    """
    return await self._repository.update_metadata_async(metadata)

writable_session #

writable_session(branch)

Create a writable session on a branch.

Like the read-only session, this can be thought of as a checkout of the repository at the tip of the branch. However, this session is writable and can be used to make changes to the repository. When ready, the changes can be committed to the branch, after which the session will become a read-only session on the new snapshot.

Parameters:

Name Type Description Default
branch str

The branch to create the session on.

required

Returns:

Type Description
Session

The writable session on the branch.

Source code in icechunk-python/python/icechunk/repository.py
def writable_session(self, branch: str) -> Session:
    """
    Create a writable session on a branch.

    Like the read-only session, this can be thought of as a checkout of the repository at the
    tip of the branch. However, this session is writable and can be used to make changes to the
    repository. When ready, the changes can be committed to the branch, after which the session will
    become a read-only session on the new snapshot.

    Parameters
    ----------
    branch : str
        The branch to create the session on.

    Returns
    -------
    Session
        The writable session on the branch.
    """
    return Session(self._repository.writable_session(branch))
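
Example (a minimal sketch; assumes `repo` is a Repository with a "main" branch and zarr v3 is installed):

import zarr

session = repo.writable_session("main")
root = zarr.group(store=session.store)
root.attrs["updated"] = True
snapshot_id = session.commit("update root attributes")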

writable_session_async async #

writable_session_async(branch)

Create a writable session on a branch (async version).

Like the read-only session, this can be thought of as a checkout of the repository at the tip of the branch. However, this session is writable and can be used to make changes to the repository. When ready, the changes can be committed to the branch, after which the session will become a read-only session on the new snapshot.

Parameters:

Name Type Description Default
branch str

The branch to create the session on.

required

Returns:

Type Description
Session

The writable session on the branch.

Source code in icechunk-python/python/icechunk/repository.py
async def writable_session_async(self, branch: str) -> Session:
    """
    Create a writable session on a branch (async version).

    Like the read-only session, this can be thought of as a checkout of the repository at the
    tip of the branch. However, this session is writable and can be used to make changes to the
    repository. When ready, the changes can be committed to the branch, after which the session will
    become a read-only session on the new snapshot.

    Parameters
    ----------
    branch : str
        The branch to create the session on.

    Returns
    -------
    Session
        The writable session on the branch.
    """
    return Session(await self._repository.writable_session_async(branch))

RepositoryConfig #

Configuration for an Icechunk repository

Methods:

Name Description
__init__

Create a new RepositoryConfig object

clear_virtual_chunk_containers

Clear all virtual chunk containers from the repository.

default

Create a default repository config instance

get_virtual_chunk_container

Get the virtual chunk container for the repository associated with the given name.

merge

Merge another RepositoryConfig with this one.

set_virtual_chunk_container

Set the virtual chunk container for the repository.

Attributes:

Name Type Description
caching CachingConfig | None

The caching configuration for the repository.

compression CompressionConfig | None

The compression configuration for the repository.

get_partial_values_concurrency int | None

The number of concurrent requests to make when getting partial values from storage.

inline_chunk_threshold_bytes int | None

The maximum size of a chunk that will be stored inline in the repository. Chunks larger than this size will be written to storage.

manifest ManifestConfig | None

The manifest configuration for the repository.

max_concurrent_requests int | None

The maximum number of concurrent HTTP requests Icechunk will do for this repo.

storage StorageSettings | None

The storage configuration for the repository.

virtual_chunk_containers dict[str, VirtualChunkContainer] | None

The virtual chunk containers for the repository.

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class RepositoryConfig:
    """Configuration for an Icechunk repository"""

    def __init__(
        self,
        inline_chunk_threshold_bytes: int | None = None,
        get_partial_values_concurrency: int | None = None,
        compression: CompressionConfig | None = None,
        max_concurrent_requests: int | None = None,
        caching: CachingConfig | None = None,
        storage: StorageSettings | None = None,
        virtual_chunk_containers: dict[str, VirtualChunkContainer] | None = None,
        manifest: ManifestConfig | None = None,
    ) -> None:
        """
        Create a new `RepositoryConfig` object

        Parameters
        ----------
        inline_chunk_threshold_bytes: int | None
            The maximum size of a chunk that will be stored inline in the repository.
        get_partial_values_concurrency: int | None
            The number of concurrent requests to make when getting partial values from storage.
        compression: CompressionConfig | None
            The compression configuration for the repository.
        max_concurrent_requests: int | None
            The maximum number of concurrent HTTP requests Icechunk will do for this repo.
            Default is 256.
        caching: CachingConfig | None
            The caching configuration for the repository.
        storage: StorageSettings | None
            The storage configuration for the repository.
        virtual_chunk_containers: dict[str, VirtualChunkContainer] | None
            The virtual chunk containers for the repository.
        manifest: ManifestConfig | None
            The manifest configuration for the repository.
        """
        ...
    @staticmethod
    def default() -> RepositoryConfig:
        """Create a default repository config instance"""
        ...
    @property
    def inline_chunk_threshold_bytes(self) -> int | None:
        """
        The maximum size of a chunk that will be stored inline in the repository. Chunks larger than this size will be written to storage.
        """
        ...
    @inline_chunk_threshold_bytes.setter
    def inline_chunk_threshold_bytes(self, value: int | None) -> None:
        """
        Set the maximum size of a chunk that will be stored inline in the repository. Chunks larger than this size will be written to storage.
        """
        ...
    @property
    def get_partial_values_concurrency(self) -> int | None:
        """
        The number of concurrent requests to make when getting partial values from storage.

        Returns
        -------
        int | None
            The number of concurrent requests to make when getting partial values from storage.
        """
        ...
    @get_partial_values_concurrency.setter
    def get_partial_values_concurrency(self, value: int | None) -> None:
        """
        Set the number of concurrent requests to make when getting partial values from storage.

        Parameters
        ----------
        value: int | None
            The number of concurrent requests to make when getting partial values from storage.
        """
        ...
    @property
    def compression(self) -> CompressionConfig | None:
        """
        The compression configuration for the repository.

        Returns
        -------
        CompressionConfig | None
            The compression configuration for the repository.
        """
        ...
    @compression.setter
    def compression(self, value: CompressionConfig | None) -> None:
        """
        Set the compression configuration for the repository.

        Parameters
        ----------
        value: CompressionConfig | None
            The compression configuration for the repository.
        """
        ...
    @property
    def max_concurrent_requests(self) -> int | None:
        """
        The maximum number of concurrent HTTP requests Icechunk will do for this repo.

        Returns
        -------
        int | None
            The maximum number of concurrent HTTP requests Icechunk will do for this repo.
        """
        ...
    @max_concurrent_requests.setter
    def max_concurrent_requests(self, value: int | None) -> None:
        """
        Set the maximum number of concurrent HTTP requests Icechunk should do for this repo.

        Parameters
        ----------
        value: int | None
            The maximum allowed.
        """
        ...
    @property
    def caching(self) -> CachingConfig | None:
        """
        The caching configuration for the repository.

        Returns
        -------
        CachingConfig | None
            The caching configuration for the repository.
        """
        ...
    @caching.setter
    def caching(self, value: CachingConfig | None) -> None:
        """
        Set the caching configuration for the repository.

        Parameters
        ----------
        value: CachingConfig | None
            The caching configuration for the repository.
        """
        ...
    @property
    def storage(self) -> StorageSettings | None:
        """
        The storage configuration for the repository.

        Returns
        -------
        StorageSettings | None
            The storage configuration for the repository.
        """
        ...
    @storage.setter
    def storage(self, value: StorageSettings | None) -> None:
        """
        Set the storage configuration for the repository.

        Parameters
        ----------
        value: StorageSettings | None
            The storage configuration for the repository.
        """
        ...
    @property
    def manifest(self) -> ManifestConfig | None:
        """
        The manifest configuration for the repository.

        Returns
        -------
        ManifestConfig | None
            The manifest configuration for the repository.
        """
        ...
    @manifest.setter
    def manifest(self, value: ManifestConfig | None) -> None:
        """
        Set the manifest configuration for the repository.

        Parameters
        ----------
        value: ManifestConfig | None
            The manifest configuration for the repository.
        """
        ...
    @property
    def virtual_chunk_containers(self) -> dict[str, VirtualChunkContainer] | None:
        """
        The virtual chunk containers for the repository.

        Returns
        -------
        dict[str, VirtualChunkContainer] | None
            The virtual chunk containers for the repository.
        """
        ...
    def get_virtual_chunk_container(self, name: str) -> VirtualChunkContainer | None:
        """
        Get the virtual chunk container for the repository associated with the given name.

        Parameters
        ----------
        name: str
            The name of the virtual chunk container to get.

        Returns
        -------
        VirtualChunkContainer | None
            The virtual chunk container for the repository associated with the given name.
        """
        ...
    def set_virtual_chunk_container(self, cont: VirtualChunkContainer) -> None:
        """
        Set the virtual chunk container for the repository.

        Parameters
        ----------
        cont: VirtualChunkContainer
            The virtual chunk container to set.
        """
        ...
    def clear_virtual_chunk_containers(self) -> None:
        """
        Clear all virtual chunk containers from the repository.
        """
        ...
    def merge(self, other: RepositoryConfig) -> RepositoryConfig:
        """
        Merge another RepositoryConfig with this one.

        When merging, values from the other config take precedence. For nested configs
        (compression, caching, manifest, storage), the merge is applied recursively.
        For virtual_chunk_containers, entries from the other config extend this one.

        Parameters
        ----------
        other: RepositoryConfig
            The configuration to merge with this one.

        Returns
        -------
        RepositoryConfig
            A new merged configuration.
        """
        ...

caching property writable #

caching

The caching configuration for the repository.

Returns:

Type Description
CachingConfig | None

The caching configuration for the repository.

compression property writable #

compression

The compression configuration for the repository.

Returns:

Type Description
CompressionConfig | None

The compression configuration for the repository.

get_partial_values_concurrency property writable #

get_partial_values_concurrency

The number of concurrent requests to make when getting partial values from storage.

Returns:

Type Description
int | None

The number of concurrent requests to make when getting partial values from storage.

inline_chunk_threshold_bytes property writable #

inline_chunk_threshold_bytes

The maximum size of a chunk that will be stored inline in the repository. Chunks larger than this size will be written to storage.

manifest property writable #

manifest

The manifest configuration for the repository.

Returns:

Type Description
ManifestConfig | None

The manifest configuration for the repository.

max_concurrent_requests property writable #

max_concurrent_requests

The maximum number of concurrent HTTP requests Icechunk will do for this repo.

Returns:

Type Description
int | None

The maximum number of concurrent HTTP requests Icechunk will do for this repo.

storage property writable #

storage

The storage configuration for the repository.

Returns:

Type Description
StorageSettings | None

The storage configuration for the repository.

virtual_chunk_containers property #

virtual_chunk_containers

The virtual chunk containers for the repository.

Returns:

Type Description
dict[str, VirtualChunkContainer] | None

The virtual chunk containers for the repository.

__init__ #

__init__(inline_chunk_threshold_bytes=None, get_partial_values_concurrency=None, compression=None, max_concurrent_requests=None, caching=None, storage=None, virtual_chunk_containers=None, manifest=None)

Create a new RepositoryConfig object

Parameters:

Name Type Description Default
inline_chunk_threshold_bytes int | None

The maximum size of a chunk that will be stored inline in the repository.

None
get_partial_values_concurrency int | None

The number of concurrent requests to make when getting partial values from storage.

None
compression CompressionConfig | None

The compression configuration for the repository.

None
max_concurrent_requests int | None

The maximum number of concurrent HTTP requests Icechunk will do for this repo. Default is 256.

None
caching CachingConfig | None

The caching configuration for the repository.

None
storage StorageSettings | None

The storage configuration for the repository.

None
virtual_chunk_containers dict[str, VirtualChunkContainer] | None

The virtual chunk containers for the repository.

None
manifest ManifestConfig | None

The manifest configuration for the repository.

None
Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
def __init__(
    self,
    inline_chunk_threshold_bytes: int | None = None,
    get_partial_values_concurrency: int | None = None,
    compression: CompressionConfig | None = None,
    max_concurrent_requests: int | None = None,
    caching: CachingConfig | None = None,
    storage: StorageSettings | None = None,
    virtual_chunk_containers: dict[str, VirtualChunkContainer] | None = None,
    manifest: ManifestConfig | None = None,
) -> None:
    """
    Create a new `RepositoryConfig` object

    Parameters
    ----------
    inline_chunk_threshold_bytes: int | None
        The maximum size of a chunk that will be stored inline in the repository.
    get_partial_values_concurrency: int | None
        The number of concurrent requests to make when getting partial values from storage.
    compression: CompressionConfig | None
        The compression configuration for the repository.
    max_concurrent_requests: int | None
        The maximum number of concurrent HTTP requests Icechunk will do for this repo.
        Default is 256.
    caching: CachingConfig | None
        The caching configuration for the repository.
    storage: StorageSettings | None
        The storage configuration for the repository.
    virtual_chunk_containers: dict[str, VirtualChunkContainer] | None
        The virtual chunk containers for the repository.
    manifest: ManifestConfig | None
        The manifest configuration for the repository.
    """
    ...
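
Example (a minimal sketch; the values shown are illustrative, not recommendations):

import icechunk

config = icechunk.RepositoryConfig(
    inline_chunk_threshold_bytes=512,  # inline chunks of up to 512 bytes
    max_concurrent_requests=128,
)
config.get_partial_values_concurrency = 10  # attributes are writable properties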

clear_virtual_chunk_containers #

clear_virtual_chunk_containers()

Clear all virtual chunk containers from the repository.

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
def clear_virtual_chunk_containers(self) -> None:
    """
    Clear all virtual chunk containers from the repository.
    """
    ...

default staticmethod #

default()

Create a default repository config instance

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
@staticmethod
def default() -> RepositoryConfig:
    """Create a default repository config instance"""
    ...

get_virtual_chunk_container #

get_virtual_chunk_container(name)

Get the virtual chunk container for the repository associated with the given name.

Parameters:

Name Type Description Default
name str

The name of the virtual chunk container to get.

required

Returns:

Type Description
VirtualChunkContainer | None

The virtual chunk container for the repository associated with the given name.

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
def get_virtual_chunk_container(self, name: str) -> VirtualChunkContainer | None:
    """
    Get the virtual chunk container for the repository associated with the given name.

    Parameters
    ----------
    name: str
        The name of the virtual chunk container to get.

    Returns
    -------
    VirtualChunkContainer | None
        The virtual chunk container for the repository associated with the given name.
    """
    ...

merge #

merge(other)

Merge another RepositoryConfig with this one.

When merging, values from the other config take precedence. For nested configs (compression, caching, manifest, storage), the merge is applied recursively. For virtual_chunk_containers, entries from the other config extend this one.

Parameters:

Name Type Description Default
other RepositoryConfig

The configuration to merge with this one.

required

Returns:

Type Description
RepositoryConfig

A new merged configuration.

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
def merge(self, other: RepositoryConfig) -> RepositoryConfig:
    """
    Merge another RepositoryConfig with this one.

    When merging, values from the other config take precedence. For nested configs
    (compression, caching, manifest, storage), the merge is applied recursively.
    For virtual_chunk_containers, entries from the other config extend this one.

    Parameters
    ----------
    other: RepositoryConfig
        The configuration to merge with this one.

    Returns
    -------
    RepositoryConfig
        A new merged configuration.
    """
    ...
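
Example (a minimal sketch of the precedence rules):

import icechunk

base = icechunk.RepositoryConfig(inline_chunk_threshold_bytes=512)
override = icechunk.RepositoryConfig(inline_chunk_threshold_bytes=1024)
merged = base.merge(override)
print(merged.inline_chunk_threshold_bytes)  # 1024: values from `override` win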

set_virtual_chunk_container #

set_virtual_chunk_container(cont)

Set the virtual chunk container for the repository.

Parameters:

Name Type Description Default
cont VirtualChunkContainer

The virtual chunk container to set.

required
Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
def set_virtual_chunk_container(self, cont: VirtualChunkContainer) -> None:
    """
    Set the virtual chunk container for the repository.

    Parameters
    ----------
    cont: VirtualChunkContainer
        The virtual chunk container to set.
    """
    ...

S3Credentials #

Credentials for an S3 storage backend

Classes:

Name Description
Anonymous

Does not sign requests, useful for public buckets

FromEnv

Uses credentials from environment variables

Refreshable

Allows for an outside authority to pass in a function that can be used to provide credentials.

Static

Uses S3 credentials without expiration

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class S3Credentials:
    """Credentials for an S3 storage backend"""
    class FromEnv:
        """Uses credentials from environment variables"""
        def __init__(self) -> None: ...

    class Anonymous:
        """Does not sign requests, useful for public buckets"""
        def __init__(self) -> None: ...

    class Static:
        """Uses s3 credentials without expiration

        Parameters
        ----------
        credentials: S3StaticCredentials
            The credentials to use for authentication.
        """
        def __init__(self, credentials: S3StaticCredentials) -> None: ...

    class Refreshable:
        """Allows for an outside authority to pass in a function that can be used to provide credentials.

        This is useful for credentials that have an expiration time, or are otherwise not known ahead of time.

        Parameters
        ----------
        pickled_function: bytes
            The pickled function to use to provide credentials.
        current: S3StaticCredentials
            The initial credentials. They will be returned the first time credentials
            are requested and then deleted.
        """
        def __init__(
            self, pickled_function: bytes, current: S3StaticCredentials | None = None
        ) -> None: ...

Anonymous #

Does not sign requests, useful for public buckets

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class Anonymous:
    """Does not sign requests, useful for public buckets"""
    def __init__(self) -> None: ...

FromEnv #

Uses credentials from environment variables

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class FromEnv:
    """Uses credentials from environment variables"""
    def __init__(self) -> None: ...

Refreshable #

Allows for an outside authority to pass in a function that can be used to provide credentials.

This is useful for credentials that have an expiration time, or are otherwise not known ahead of time.

Parameters:

Name Type Description Default
pickled_function bytes

The pickled function to use to provide credentials.

required
current S3StaticCredentials | None

The initial credentials. They will be returned the first time credentials are requested and then deleted.

None
Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class Refreshable:
    """Allows for an outside authority to pass in a function that can be used to provide credentials.

    This is useful for credentials that have an expiration time, or are otherwise not known ahead of time.

    Parameters
    ----------
    pickled_function: bytes
        The pickled function to use to provide credentials.
    current: S3StaticCredentials
        The initial credentials. They will be returned the first time credentials
        are requested and then deleted.
    """
    def __init__(
        self, pickled_function: bytes, current: S3StaticCredentials | None = None
    ) -> None: ...
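
Example (a hedged sketch; `fetch_credentials` is a hypothetical user-defined function and must be picklable, e.g. defined at module level):

import pickle

import icechunk

def fetch_credentials() -> icechunk.S3StaticCredentials:
    # Call your own credential authority (e.g. STS) here.
    return icechunk.S3StaticCredentials(
        access_key_id="...", secret_access_key="..."  # placeholders
    )

creds = icechunk.S3Credentials.Refreshable(pickle.dumps(fetch_credentials))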

Static #

Uses S3 credentials without expiration

Parameters:

Name Type Description Default
credentials S3StaticCredentials

The credentials to use for authentication.

required
Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class Static:
    """Uses s3 credentials without expiration

    Parameters
    ----------
    credentials: S3StaticCredentials
        The credentials to use for authentication.
    """
    def __init__(self, credentials: S3StaticCredentials) -> None: ...

S3Options #

Options for accessing an S3-compatible storage backend

Methods:

Name Description
__init__

Create a new S3Options object

Attributes:

Name Type Description
allow_http bool

Whether HTTP requests are allowed for the storage backend.

anonymous bool

Whether to use anonymous credentials (unsigned requests).

endpoint_url str | None

Optional endpoint URL for the storage backend.

force_path_style bool

Whether to force path-style bucket addressing.

network_stream_timeout_seconds int | None

Timeout in seconds for idle network streams.

region str | None

Optional region to use for the storage backend.

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class S3Options:
    """Options for accessing an S3-compatible storage backend"""
    def __init__(
        self,
        region: str | None = None,
        endpoint_url: str | None = None,
        allow_http: bool = False,
        anonymous: bool = False,
        force_path_style: bool = False,
        network_stream_timeout_seconds: int | None = None,
        requester_pays: bool = False,
    ) -> None:
        """
        Create a new `S3Options` object

        Parameters
        ----------
        region: str | None
            Optional, the region to use for the storage backend.
        endpoint_url: str | None
            Optional, the endpoint URL to use for the storage backend.
        allow_http: bool
            Whether to allow HTTP requests to the storage backend.
        anonymous: bool
            Whether to use anonymous credentials to the storage backend. When `True`, S3 requests will not be signed.
        force_path_style: bool
            Whether to force use of path-style addressing for buckets.
        network_stream_timeout_seconds: int | None
            Time out requests if no bytes can be transmitted during this period of time.
            If set to 0, the timeout is disabled. Default is 60 seconds.
        requester_pays: bool
            Enable requester pays for S3 buckets
        """

    @property
    def region(self) -> str | None:
        """
        Optional region to use for the storage backend.

        Returns
        -------
        str | None
            The region configured for the storage backend.
        """
        ...

    @region.setter
    def region(self, value: str | None) -> None:
        """
        Set the region to use for the storage backend.

        Parameters
        ----------
        value: str | None
            The region to use for the storage backend.
        """
        ...

    @property
    def endpoint_url(self) -> str | None:
        """
        Optional endpoint URL for the storage backend.

        Returns
        -------
        str | None
            The endpoint URL configured for the storage backend.
        """
        ...

    @endpoint_url.setter
    def endpoint_url(self, value: str | None) -> None:
        """
        Set the endpoint URL for the storage backend.

        Parameters
        ----------
        value: str | None
            The endpoint URL to use for the storage backend.
        """
        ...

    @property
    def allow_http(self) -> bool:
        """
        Whether HTTP requests are allowed for the storage backend.

        Returns
        -------
        bool
            ``True`` when HTTP requests to the storage backend are permitted.
        """
        ...

    @allow_http.setter
    def allow_http(self, value: bool) -> None:
        """
        Set whether HTTP requests are allowed for the storage backend.

        Parameters
        ----------
        value: bool
            ``True`` to allow HTTP requests to the storage backend, ``False`` otherwise.
        """
        ...

    @property
    def anonymous(self) -> bool:
        """
        Whether to use anonymous credentials (unsigned requests).

        Returns
        -------
        bool
            ``True`` when anonymous access is configured.
        """
        ...

    @anonymous.setter
    def anonymous(self, value: bool) -> None:
        """
        Set whether to use anonymous credentials.

        Parameters
        ----------
        value: bool
            ``True`` to perform unsigned requests, ``False`` to sign requests.
        """
        ...

    @property
    def force_path_style(self) -> bool:
        """
        Whether to force path-style bucket addressing.

        Returns
        -------
        bool
            ``True`` when path-style addressing is forced.
        """
        ...

    @force_path_style.setter
    def force_path_style(self, value: bool) -> None:
        """
        Set whether to force path-style bucket addressing.

        Parameters
        ----------
        value: bool
            ``True`` to always use path-style addressing, ``False`` to allow virtual-host style.
        """
        ...

    @property
    def network_stream_timeout_seconds(self) -> int | None:
        """
        Timeout in seconds for idle network streams.

        Returns
        -------
        int | None
            The timeout duration; ``0`` disables the timeout and ``None`` uses the default.
        """
        ...

    @network_stream_timeout_seconds.setter
    def network_stream_timeout_seconds(self, value: int | None) -> None:
        """
        Set the timeout for idle network streams.

        Parameters
        ----------
        value: int | None
            Timeout duration in seconds. Use ``0`` to disable or ``None`` for the default.
        """
        ...

allow_http property writable #

allow_http

Whether HTTP requests are allowed for the storage backend.

Returns:

Type Description
bool

True when HTTP requests to the storage backend are permitted.

anonymous property writable #

anonymous

Whether to use anonymous credentials (unsigned requests).

Returns:

Type Description
bool

True when anonymous access is configured.

endpoint_url property writable #

endpoint_url

Optional endpoint URL for the storage backend.

Returns:

Type Description
str | None

The endpoint URL configured for the storage backend.

force_path_style property writable #

force_path_style

Whether to force path-style bucket addressing.

Returns:

Type Description
bool

True when path-style addressing is forced.

network_stream_timeout_seconds property writable #

network_stream_timeout_seconds

Timeout in seconds for idle network streams.

Returns:

Type Description
int | None

The timeout duration; 0 disables the timeout and None uses the default.

region property writable #

region

Optional region to use for the storage backend.

Returns:

Type Description
str | None

The region configured for the storage backend.

__init__ #

__init__(region=None, endpoint_url=None, allow_http=False, anonymous=False, force_path_style=False, network_stream_timeout_seconds=None, requester_pays=False)

Create a new S3Options object

Parameters:

Name Type Description Default
region str | None

Optional, the region to use for the storage backend.

None
endpoint_url str | None

Optional, the endpoint URL to use for the storage backend.

None
allow_http bool

Whether to allow HTTP requests to the storage backend.

False
anonymous bool

Whether to use anonymous credentials to the storage backend. When True, S3 requests will not be signed.

False
force_path_style bool

Whether to force use of path-style addressing for buckets.

False
network_stream_timeout_seconds int | None

Time out requests if no bytes can be transmitted during this period of time. If set to 0, the timeout is disabled. Default is 60 seconds.

None
requester_pays bool

Enable requester pays for S3 buckets

False
Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
def __init__(
    self,
    region: str | None = None,
    endpoint_url: str | None = None,
    allow_http: bool = False,
    anonymous: bool = False,
    force_path_style: bool = False,
    network_stream_timeout_seconds: int | None = None,
    requester_pays: bool = False,
) -> None:
    """
    Create a new `S3Options` object

    Parameters
    ----------
    region: str | None
        Optional, the region to use for the storage backend.
    endpoint_url: str | None
        Optional, the endpoint URL to use for the storage backend.
    allow_http: bool
        Whether to allow HTTP requests to the storage backend.
    anonymous: bool
        Whether to use anonymous credentials to the storage backend. When `True`, S3 requests will not be signed.
    force_path_style: bool
        Whether to force use of path-style addressing for buckets.
    network_stream_timeout_seconds: int | None
        Time out requests if no bytes can be transmitted during this period of time.
        If set to 0, the timeout is disabled. Default is 60 seconds.
    requester_pays: bool
        Enable requester pays for S3 buckets
    """

S3StaticCredentials #

Credentials for an S3 storage backend

Attributes:

Name Type Description
access_key_id str

The access key ID to use for authentication.

secret_access_key str

The secret access key to use for authentication.

session_token str | None

The session token to use for authentication.

expires_after datetime | None

Optional, the expiration time of the credentials.

Methods:

Name Description
__init__

Create a new S3StaticCredentials object

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class S3StaticCredentials:
    """Credentials for an S3 storage backend

    Attributes:
        access_key_id: str
            The access key ID to use for authentication.
        secret_access_key: str
            The secret access key to use for authentication.
        session_token: str | None
            The session token to use for authentication.
        expires_after: datetime.datetime | None
            Optional, the expiration time of the credentials.
    """

    access_key_id: str
    secret_access_key: str
    session_token: str | None
    expires_after: datetime.datetime | None

    def __init__(
        self,
        access_key_id: str,
        secret_access_key: str,
        session_token: str | None = None,
        expires_after: datetime.datetime | None = None,
    ):
        """
        Create a new `S3StaticCredentials` object

        Parameters
        ----------
        access_key_id: str
            The access key ID to use for authentication.
        secret_access_key: str
            The secret access key to use for authentication.
        session_token: str | None
            Optional, the session token to use for authentication.
        expires_after: datetime.datetime | None
            Optional, the expiration time of the credentials.
        """
        ...

__init__ #

__init__(access_key_id, secret_access_key, session_token=None, expires_after=None)

Create a new S3StaticCredentials object

Parameters:

Name Type Description Default
access_key_id str

The access key ID to use for authentication.

required
secret_access_key str

The secret access key to use for authentication.

required
session_token str | None

Optional, the session token to use for authentication.

None
expires_after datetime | None

Optional, the expiration time of the credentials.

None
Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
def __init__(
    self,
    access_key_id: str,
    secret_access_key: str,
    session_token: str | None = None,
    expires_after: datetime.datetime | None = None,
):
    """
    Create a new `S3StaticCredentials` object

    Parameters
    ----------
    access_key_id: str
        The access key ID to use for authentication.
    secret_access_key: str
        The secret access key to use for authentication.
    session_token: str | None
        Optional, the session token to use for authentication.
    expires_after: datetime.datetime | None
        Optional, the expiration time of the credentials.
    """
    ...
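
Example (a minimal sketch; the key values are placeholders, never hard-code real credentials):

import datetime

import icechunk

creds = icechunk.S3StaticCredentials(
    access_key_id="AKIA...",  # placeholder
    secret_access_key="...",  # placeholder
    expires_after=datetime.datetime(2030, 1, 1, tzinfo=datetime.timezone.utc),
)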

Session #

A session object that allows for reading and writing data from an Icechunk repository.

Methods:

Name Description
all_virtual_chunk_locations

Return the location URLs of all virtual chunks.

all_virtual_chunk_locations_async

Return the location URLs of all virtual chunks (async version).

allow_pickling

Context manager to allow unpickling this store if writable.

amend

Commit the changes in the session to the repository by amending/overwriting the previous commit.

amend_async

Commit the changes in the session to the repository by amending/overwriting the previous commit.

chunk_coordinates

Return an async iterator over all initialized chunks for the array at array_path

chunk_type

Return the chunk type for the specified coordinates

chunk_type_async

Return the chunk type for the specified coordinates

commit

Commit the changes in the session to the repository.

commit_async

Commit the changes in the session to the repository (async version).

discard_changes

When the session is writable, discard any uncommitted changes.

flush

Save the changes in the session to a new snapshot without modifying the current branch.

flush_async

Save the changes in the session to a new snapshot without modifying the current branch.

fork

Create a child session that can be pickled to a worker job and later merged.

merge

Merge the changes for this session with the changes from another session.

merge_async

Merge the changes for this session with the changes from another session (async version).

move

Move or rename a node (array or group) in the hierarchy.

move_async

Async version of move.

rebase

Rebase the session to the latest ancestry of the branch.

rebase_async

Rebase the session to the latest ancestry of the branch (async version).

reindex_array

Reindex chunks in an array by applying a transformation function.

roll_array

Roll (circular shift) all chunks in an array by the given chunk offset.

shift_array

Shift all chunks in an array by the given chunk offset.

status

Compute an overview of the current session changes

Attributes:

Name Type Description
branch str | None

The branch that the session is based on. This is only set if the session is writable.

config RepositoryConfig

Get the repository configuration.

has_uncommitted_changes bool

Whether the session has uncommitted changes. This is only possibly true if the session is writable.

mode SessionMode

The mode of this session.

read_only bool

Whether the session is read-only.

snapshot_id str

The base snapshot ID of the session.

store IcechunkStore

Get a zarr Store object for reading and writing data from the repository using zarr python.

Source code in icechunk-python/python/icechunk/session.py
class Session:
    """A session object that allows for reading and writing data from an Icechunk repository."""

    _session: PySession
    _allow_changes: bool

    def __init__(self, session: PySession):
        self._session = session
        self._allow_changes = False

    def __eq__(self, value: object) -> bool:
        if not isinstance(value, Session):
            return False
        return self._session == value._session

    def __getstate__(self) -> object:
        if not self.read_only:
            raise ValueError(
                "You must opt-in to pickle writable sessions in a distributed context "
                "using Session.fork(). "
                "See https://icechunk.io/en/stable/parallel/#distributed-writes for more. "
                "If you are using xarray's `Dataset.to_zarr` method to write dask arrays, "
                "please use `icechunk.xarray.to_icechunk` instead. "
                "If you are using dask & distributed or multi-processing to read/write from the same repository, "
                "then pass a readonly session created using Repository.readonly_session for the read step. "
                "Alternatively, make sure to pass the ForkSession created by Session.fork() for the read step. "
            )
        state = {
            "_session": self._session.as_bytes(),
            "_allow_changes": self._allow_changes,
        }
        return state

    def __setstate__(self, state: object) -> None:
        if not isinstance(state, dict):
            raise ValueError("Invalid state")
        self._session = PySession.from_bytes(state["_session"])
        self._allow_changes = state["_allow_changes"]

    @contextlib.contextmanager
    def allow_pickling(self) -> Generator[None, None, None]:
        """
        Context manager to allow unpickling this store if writable.
        """
        raise RuntimeError(
            "The allow_pickling context manager has been removed. "
            "Use the new `Session.fork` API instead. "
            # FIXME: Add link to docs
            "Better yet, use `to_icechunk` if that will fit your needs."
        )

    @property
    def read_only(self) -> bool:
        """
        Whether the session is read-only.

        Returns
        -------
        bool
            True if the session is read-only, False otherwise.
        """
        return self._session.read_only

    @property
    def mode(self) -> SessionMode:
        """
        The mode of this session.

        Returns
        -------
        SessionMode
            The session mode - one of READONLY, WRITABLE, or REARRANGE.
        """
        return self._session.mode

    @property
    def snapshot_id(self) -> str:
        """
        The base snapshot ID of the session.

        Returns
        -------
        str
            The base snapshot ID of the session.
        """
        return self._session.snapshot_id

    @property
    def branch(self) -> str | None:
        """
        The branch that the session is based on. This is only set if the session is writable.

        Returns
        -------
        str or None
            The branch that the session is based on if the session is writable, None otherwise.
        """
        return self._session.branch

    @property
    def has_uncommitted_changes(self) -> bool:
        """
        Whether the session has uncommitted changes. This can only be true if the session is writable.

        Returns
        -------
        bool
            True if the session has uncommitted changes, False otherwise.
        """
        return self._session.has_uncommitted_changes

    def status(self) -> Diff:
        """
        Compute an overview of the current session changes

        Returns
        -------
        Diff
            The operations executed in the current session but still not committed.
        """
        return self._session.status()

    def discard_changes(self) -> None:
        """
        When the session is writable, discard any uncommitted changes.
        """
        self._session.discard_changes()

    @property
    def store(self) -> IcechunkStore:
        """
        Get a zarr Store object for reading and writing data from the repository using zarr python.

        Returns
        -------
        IcechunkStore
            A zarr Store object for reading and writing data from the repository.
        """
        return IcechunkStore(self._session.store, for_fork=False)

    @property
    def config(self) -> RepositoryConfig:
        """
        Get the repository configuration.

        Notice that changes to the returned object won't be reflected in the repository.
        To change configuration values, use `Repository.reopen`.

        Returns
        -------
        RepositoryConfig
            The config for the repository that owns this session.
        """
        return self._session.config

    def move(self, from_path: str, to_path: str) -> None:
        """Move or rename a node (array or group) in the hierarchy.

        This is a metadata-only operation—no data is copied. Requires a rearrange session.

        Parameters
        ----------
        from_path : str
            The current path of the node (e.g., "/data/raw").
        to_path : str
            The new path for the node (e.g., "/data/v1").

        Examples
        --------
        >>> session = repo.rearrange_session("main")
        >>> session.move("/data/raw", "/data/v1")
        >>> session.commit("Renamed raw to v1")
        """
        return self._session.move_node(from_path, to_path)

    async def move_async(self, from_path: str, to_path: str) -> None:
        """Async version of :meth:`move`."""
        return await self._session.move_node_async(from_path, to_path)

    def all_virtual_chunk_locations(self) -> list[str]:
        """
        Return the location URLs of all virtual chunks.

        Returns
        -------
        list of str
            The location URLs of all virtual chunks.
        """
        return self._session.all_virtual_chunk_locations()

    def reindex_array(
        self,
        array_path: str,
        shift_chunk: Callable[[Iterable[int]], Iterable[int] | None],
    ) -> None:
        """Reindex chunks in an array by applying a transformation function.

        Parameters
        ----------
        array_path : str
            Path to the array.
        shift_chunk : Callable
            Function that receives chunk coordinates and returns new coordinates,
            or None to discard the chunk.
        """
        return self._session.reindex_array(array_path, shift_chunk)

    def shift_array(
        self,
        array_path: str,
        chunk_offset: Iterable[int],
    ) -> tuple[int, ...]:
        """Shift all chunks in an array by the given chunk offset.

        Chunks that shift out of bounds are discarded. Vacated positions retain
        stale chunk references — the caller typically writes new data there.

        Parameters
        ----------
        array_path : str
            The path to the array to shift.
        chunk_offset : Iterable[int]
            Offset added to each chunk coordinate. A chunk at index ``x`` moves
            to ``x + chunk_offset``. For a 3D array, ``chunk_offset=(1, 0, -2)``
            moves the chunk at ``(i, j, k)`` to ``(i+1, j, k-2)``.

        Returns
        -------
        tuple[int, ...]
            The shift in element space (``chunk_offset * chunk_size`` per dimension).
            For example, with ``chunk_size=10`` and ``chunk_offset=(2,)``, returns
            ``(20,)`` — useful for slicing the region that needs new data.

        Notes
        -----
        To shift right while preserving all data, first resize the array using zarr's
        array.resize(), then use shift_array.
        """
        return tuple(self._session.shift_array(array_path, list(chunk_offset)))

    def roll_array(
        self,
        array_path: str,
        chunk_offset: Iterable[int],
    ) -> tuple[int, ...]:
        """Roll (circular shift) all chunks in an array by the given chunk offset.

        Chunks that shift out of one end wrap around to the other side.
        No data is lost — this is a circular buffer operation.

        Parameters
        ----------
        array_path : str
            The path to the array to roll.
        chunk_offset : Iterable[int]
            Offset added to each chunk coordinate (with wraparound). A chunk at
            index ``x`` moves to ``(x + chunk_offset) % num_chunks``.

        Returns
        -------
        tuple[int, ...]
            The index shift in element space (chunk_offset * chunk_size for each dimension).
        """
        return tuple(self._session.roll_array(array_path, list(chunk_offset)))

    async def all_virtual_chunk_locations_async(self) -> list[str]:
        """
        Return the location URLs of all virtual chunks (async version).

        Returns
        -------
        list of str
            The location URLs of all virtual chunks.
        """
        return await self._session.all_virtual_chunk_locations_async()

    async def chunk_coordinates(
        self, array_path: str, batch_size: int = 1000
    ) -> AsyncIterator[tuple[int, ...]]:
        """
        Return an async iterator to all initialized chunks for the array at array_path

        Returns
        -------
        an async iterator to chunk coordinates as tuples
        """
        # We do unbatching here to improve speed. Switching to rust to get
        # a batch is much faster than switching for every element
        async for batch in self._session.chunk_coordinates(array_path, batch_size):
            for coord in batch:
                yield tuple(coord)

    def chunk_type(
        self,
        array_path: str,
        chunk_coordinates: Sequence[int],
    ) -> ChunkType:
        """
        Return the chunk type for the specified coordinates

        Parameters
        ----------
        array_path : str
            The path to the array inside the Zarr store. Example: "/groupA/groupB/outputs/my-array".
        chunk_coordinates: Sequence[int]
            A sequence of integers (list or tuple) used to locate the chunk. Example: [0, 1, 5].

        Returns
        -------
        ChunkType
            One of the supported chunk types.
        """
        return self._session.chunk_type(array_path, chunk_coordinates)

    async def chunk_type_async(
        self,
        array_path: str,
        chunk_coordinates: Sequence[int],
    ) -> ChunkType:
        """
        Return the chunk type for the specified coordinates

        Parameters
        ----------
        array_path : str
            The path to the array inside the Zarr store. Example: "/groupA/groupB/outputs/my-array".
        chunk_coordinates: Sequence[int]
            A sequence of integers (list or tuple) used to locate the chunk. Example: [0, 1, 5].

        Returns
        -------
        ChunkType
            One of the supported chunk types.
        """
        return await self._session.chunk_type_async(array_path, chunk_coordinates)

    def merge(self, *others: "ForkSession") -> None:
        """
        Merge the changes from one or more forked sessions into this session.

        Parameters
        ----------
        others : ForkSession
            The forked sessions to merge changes from.
        """
        for other in others:
            if not isinstance(other, ForkSession):
                raise TypeError(
                    "Sessions can only be merged with a ForkSession created with Session.fork(). "
                    f"Received {type(other).__name__} instead."
                )
            self._session.merge(other._session)
        self._allow_changes = False

    async def merge_async(self, *others: "ForkSession") -> None:
        """
        Merge the changes from one or more forked sessions into this session (async version).

        Parameters
        ----------
        others : ForkSession
            The forked sessions to merge changes from.
        """
        for other in others:
            if not isinstance(other, ForkSession):
                raise TypeError(
                    "Sessions can only be merged with a ForkSession created with Session.fork(). "
                    f"Received {type(other).__name__} instead."
                )
            await self._session.merge_async(other._session)
        self._allow_changes = False

    def commit(
        self,
        message: str,
        metadata: dict[str, Any] | None = None,
        rebase_with: ConflictSolver | None = None,
        rebase_tries: int = 1_000,
        allow_empty: bool = False,
    ) -> str:
        """
        Commit the changes in the session to the repository.

        When successful, the writable session is completed and the session is now read-only and based on the new commit. The snapshot ID of the new commit is returned.

        If the session is out of date, this will raise a ConflictError exception depicting the conflict that occurred. The session will need to be rebased before committing.

        Parameters
        ----------
        message : str
            The message to write with the commit.
        metadata : dict[str, Any] | None, optional
            Additional metadata to store with the commit snapshot.
        rebase_with : ConflictSolver | None, optional
            If another session committed while the current session was writing, rebase with this solver before retrying the commit.
        rebase_tries : int, optional
            If another session committed while the current session was writing, retry the rebase-and-commit cycle up to this many times.
        allow_empty : bool, optional
            If True, allow creating a commit even if there are no changes. Default is False.

        Returns
        -------
        str
            The snapshot ID of the new commit.

        Raises
        ------
        icechunk.ConflictError
            If the session is out of date and a conflict occurs.
        icechunk.NoChangesToCommitError
            If there are no changes to commit and allow_empty is False.
        """
        if self._allow_changes:
            warnings.warn(
                "Committing a session after forking, and without merging will not work. "
                "Merge back in the remote changes first using Session.merge().",
                UserWarning,
                stacklevel=2,
            )
        return self._session.commit(
            message,
            metadata,
            rebase_with=rebase_with,
            rebase_tries=rebase_tries,
            allow_empty=allow_empty,
        )

    async def commit_async(
        self,
        message: str,
        metadata: dict[str, Any] | None = None,
        rebase_with: ConflictSolver | None = None,
        rebase_tries: int = 1_000,
        allow_empty: bool = False,
    ) -> str:
        """
        Commit the changes in the session to the repository (async version).

        When successful, the writable session is completed and the session is now read-only and based on the new commit. The snapshot ID of the new commit is returned.

        If the session is out of date, this will raise a ConflictError exception depicting the conflict that occurred. The session will need to be rebased before committing.

        Parameters
        ----------
        message : str
            The message to write with the commit.
        metadata : dict[str, Any] | None, optional
            Additional metadata to store with the commit snapshot.
        rebase_with : ConflictSolver | None, optional
            If another session committed while the current session was writing, rebase with this solver before retrying the commit.
        rebase_tries : int, optional
            If another session committed while the current session was writing, retry the rebase-and-commit cycle up to this many times.
        allow_empty : bool, optional
            If True, allow creating a commit even if there are no changes. Default is False.

        Returns
        -------
        str
            The snapshot ID of the new commit.

        Raises
        ------
        icechunk.ConflictError
            If the session is out of date and a conflict occurs.
        icechunk.NoChangesToCommitError
            If there are no changes to commit and allow_empty is False.
        """
        if self._allow_changes:
            warnings.warn(
                "Committing a session after forking, and without merging will not work. "
                "Merge back in the remote changes first using Session.merge().",
                UserWarning,
                stacklevel=2,
            )
        return await self._session.commit_async(
            message,
            metadata,
            rebase_with=rebase_with,
            rebase_tries=rebase_tries,
            allow_empty=allow_empty,
        )

    def amend(
        self,
        message: str,
        metadata: dict[str, Any] | None = None,
        allow_empty: bool = False,
    ) -> str:
        """
        Commit the changes in the session to the repository, by amending/overwriting the previous commit.

        When successful, the writable session is completed and the session is now read-only and based on the new commit. The snapshot ID of the new commit is returned.

        If the session is out of date, this will raise a ConflictError exception depicting the conflict that occurred. The session will need to be rebased before committing.

        This operation doesn't create a new commit in the repo ancestry. It replaces the previous commit.

        The first commit to the repo cannot be amended.

        Parameters
        ----------
        message : str
            The message to write with the commit.
        metadata : dict[str, Any] | None, optional
            Additional metadata to store with the commit snapshot.
        allow_empty : bool, optional
            If True, allow amending even if no data changes have been made to the session.
            This is useful when you only want to update the commit message. Default is False.

        Returns
        -------
        str
            The snapshot ID of the new commit.

        Raises
        ------
        icechunk.ConflictError
            If the session is out of date and a conflict occurs.
        """
        if self._allow_changes:
            warnings.warn(
                "Committing a session after forking, and without merging will not work. "
                "Merge back in the remote changes first using Session.merge().",
                UserWarning,
                stacklevel=2,
            )
        return self._session.amend(message, metadata, allow_empty=allow_empty)

    async def amend_async(
        self,
        message: str,
        metadata: dict[str, Any] | None = None,
        allow_empty: bool = False,
    ) -> str:
        """
        Commit the changes in the session to the repository, by amending/overwriting the previous commit.

        When successful, the writable session is completed and the session is now read-only and based on the new commit. The snapshot ID of the new commit is returned.

        If the session is out of date, this will raise a ConflictError exception depicting the conflict that occurred. The session will need to be rebased before committing.

        This operation doesn't create a new commit in the repo ancestry. It replaces the previous commit.

        The first commit to the repo cannot be amended.

        Parameters
        ----------
        message : str
            The message to write with the commit.
        metadata : dict[str, Any] | None, optional
            Additional metadata to store with the commit snapshot.
        allow_empty : bool, optional
            If True, allow amending even if no data changes have been made to the session.
            This is useful when you only want to update the commit message. Default is False.

        Returns
        -------
        str
            The snapshot ID of the new commit.

        Raises
        ------
        icechunk.ConflictError
            If the session is out of date and a conflict occurs.
        """
        if self._allow_changes:
            warnings.warn(
                "Committing a session after forking, and without merging will not work. "
                "Merge back in the remote changes first using Session.merge().",
                UserWarning,
                stacklevel=2,
            )
        return await self._session.amend_async(message, metadata, allow_empty=allow_empty)

    def flush(
        self,
        message: str,
        metadata: dict[str, Any] | None = None,
    ) -> str:
        """
        Save the changes in the session to a new snapshot without modifying the current branch.

        When successful, the writable session is completed and the session is now read-only and based on the new snapshot. The ID of the new snapshot is returned.

        Parameters
        ----------
        message : str
            The message to write with the commit.
        metadata : dict[str, Any] | None, optional
            Additional metadata to store with the commit snapshot.

        Returns
        -------
        str
            The ID of the new snapshot.
        """
        if self._allow_changes:
            warnings.warn(
                "Committing a session after forking, and without merging will not work. "
                "Merge back in the remote changes first using Session.merge().",
                UserWarning,
                stacklevel=2,
            )
        return self._session.flush(message, metadata)

    async def flush_async(
        self,
        message: str,
        metadata: dict[str, Any] | None = None,
    ) -> str:
        """
        Save the changes in the session to a new snapshot without modifying the current branch.

        When successful, the writable session is completed and the session is now read-only and based on the new snapshot. The ID of the new snapshot is returned.

        Parameters
        ----------
        message : str
            The message to write with the commit.
        metadata : dict[str, Any] | None, optional
            Additional metadata to store with the commit snapshot.

        Returns
        -------
        str
            The ID of the new snapshot.
        """
        if self._allow_changes:
            warnings.warn(
                "Flushing a session after forking, and without merging will not work. "
                "Merge back in the remote changes first using Session.merge().",
                UserWarning,
                stacklevel=2,
            )
        return await self._session.flush_async(message, metadata)

    def rebase(self, solver: ConflictSolver) -> None:
        """
        Rebase the session to the latest ancestry of the branch.

        This method will iteratively crawl the ancestry of the branch and apply the changes from the branch to the session. If a conflict is detected, the conflict solver will be used to optionally resolve the conflict. When complete, the session will be based on the latest commit of the branch and the session will be ready to attempt another commit.

        When a conflict is detected and a resolution is not possible with the provided solver, a RebaseFailedError exception will be raised. This exception will contain the snapshot ID that the rebase failed on and a list of conflicts that occurred.

        Parameters
        ----------
        solver : ConflictSolver
            The conflict solver to use when a conflict is detected.

        Raises
        ------
        RebaseFailedError
            When a conflict is detected and the solver fails to resolve it.
        """
        self._session.rebase(solver)

    async def rebase_async(self, solver: ConflictSolver) -> None:
        """
        Rebase the session to the latest ancestry of the branch (async version).

        This method will iteratively crawl the ancestry of the branch and apply the changes from the branch to the session. If a conflict is detected, the conflict solver will be used to optionally resolve the conflict. When complete, the session will be based on the latest commit of the branch and the session will be ready to attempt another commit.

        When a conflict is detected and a resolution is not possible with the provided solver, a RebaseFailedError exception will be raised. This exception will contain the snapshot ID that the rebase failed on and a list of conflicts that occurred.

        Parameters
        ----------
        solver : ConflictSolver
            The conflict solver to use when a conflict is detected.

        Raises
        ------
        RebaseFailedError
            When a conflict is detected and the solver fails to resolve it.
        """
        await self._session.rebase_async(solver)

    def fork(self) -> "ForkSession":
        """
        Create a child session that can be pickled to a worker job and later merged.

        This method supports Icechunk's distributed, collaborative jobs. A coordinator task creates a new session using
        `Repository.writable_session`. Then `Session.fork` is called repeatedly to create as many serializable sessions
        as worker jobs. Each new `ForkSession` is pickled to the worker that uses it to do all its writes.
        Finally, the `ForkSessions` are pickled back to the coordinator that uses `ForkSession.merge` to merge them
        back into the original session and `commit`.

        Learn more about collaborative writes at https://icechunk.io/en/latest/parallel/

        Raises
        ------
        ValueError
            When `self` already has uncommitted changes.
        ValueError
            When `self` is read-only.
        """
        if self.has_uncommitted_changes:
            raise ValueError(
                "Cannot fork a Session with uncommitted changes. "
                "Make a commit, create a new Session, and then fork that to execute distributed writes."
            )
        if self.read_only:
            raise ValueError(
                "You should not need to fork a read-only session. Read-only sessions can be pickled and transmitted directly."
            )
        self._allow_changes = True
        # force a deep-copy of the underlying Session,
        # so that multiple forks can be created and
        # used independently in a local session.
        # See test_dask.py::test_fork_session_deep_copies for an example
        return ForkSession(PySession.from_bytes(self._session.as_bytes()))
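
Taken together, a minimal end-to-end sketch of the session lifecycle (assuming zarr v3 and in-memory storage; the array name "temps", its shape, and all values are illustrative, not part of the API above):

import icechunk
import zarr

# Create a throwaway repository and open a writable session on "main".
repo = icechunk.Repository.create(icechunk.in_memory_storage())
session = repo.writable_session("main")

# Write through zarr against the session's store.
group = zarr.group(store=session.store)
temps = group.create_array("temps", shape=(4, 4), chunks=(2, 2), dtype="f4")
temps[:] = 0.0

# Committing completes the writable session; it is read-only afterwards.
snapshot_id = session.commit("initialize temps")
print(session.read_only)  # True

The sketches further down reuse repo, zarr, and the "temps" array from this one.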

branch property #

branch

The branch that the session is based on. This is only set if the session is writable.

Returns:

Type Description
str or None

The branch that the session is based on if the session is writable, None otherwise.

config property #

config

Get the repository configuration.

Notice that changes to the returned object won't be reflected in the repository. To change configuration values, use Repository.reopen.

Returns:

Type Description
RepositoryConfig

The config for the repository that owns this session.

has_uncommitted_changes property #

has_uncommitted_changes

Whether the session has uncommitted changes. This can only be true if the session is writable.

Returns:

Type Description
bool

True if the session has uncommitted changes, False otherwise.

mode property #

mode

The mode of this session.

Returns:

Type Description
SessionMode

The session mode - one of READONLY, WRITABLE, or REARRANGE.

read_only property #

read_only

Whether the session is read-only.

Returns:

Type Description
bool

True if the session is read-only, False otherwise.

snapshot_id property #

snapshot_id

The base snapshot ID of the session.

Returns:

Type Description
str

The base snapshot ID of the session.

store property #

store

Get a zarr Store object for reading and writing data from the repository using zarr python.

Returns:

Type Description
IcechunkStore

A zarr Store object for reading and writing data from the repository.

all_virtual_chunk_locations #

all_virtual_chunk_locations()

Return the location URLs of all virtual chunks.

Returns:

Type Description
list of str

The location URLs of all virtual chunks.

Source code in icechunk-python/python/icechunk/session.py
def all_virtual_chunk_locations(self) -> list[str]:
    """
    Return the location URLs of all virtual chunks.

    Returns
    -------
    list of str
        The location URLs of all virtual chunks.
    """
    return self._session.all_virtual_chunk_locations()

all_virtual_chunk_locations_async async #

all_virtual_chunk_locations_async()

Return the location URLs of all virtual chunks (async version).

Returns:

Type Description
list of str

The location URLs of all virtual chunks.

Source code in icechunk-python/python/icechunk/session.py
async def all_virtual_chunk_locations_async(self) -> list[str]:
    """
    Return the location URLs of all virtual chunks (async version).

    Returns
    -------
    list of str
        The location URLs of all virtual chunks.
    """
    return await self._session.all_virtual_chunk_locations_async()

allow_pickling #

allow_pickling()

Removed. Use Session.fork instead to pickle writable sessions.

Source code in icechunk-python/python/icechunk/session.py
@contextlib.contextmanager
def allow_pickling(self) -> Generator[None, None, None]:
    """
    Removed. Use `Session.fork` instead to pickle writable sessions.
    """
    raise RuntimeError(
        "The allow_pickling context manager has been removed. "
        "Use the new `Session.fork` API instead. "
        # FIXME: Add link to docs
        "Better yet, use `to_icechunk` if that will fit your needs."
    )

amend #

amend(message, metadata=None, allow_empty=False)

Commit the changes in the session to the repository, by amending/overwriting the previous commit.

When successful, the writable session is completed and the session is now read-only and based on the new commit. The snapshot ID of the new commit is returned.

If the session is out of date, this will raise a ConflictError exception depicting the conflict that occurred. The session will need to be rebased before committing.

This operation doesn't create a new commit in the repo ancestry. It replaces the previous commit.

The first commit to the repo cannot be amended.

Parameters:

Name Type Description Default
message str

The message to write with the commit.

required
metadata dict[str, Any] | None

Additional metadata to store with the commit snapshot.

None
allow_empty bool

If True, allow amending even if no data changes have been made to the session. This is useful when you only want to update the commit message. Default is False.

False

Returns:

Type Description
str

The snapshot ID of the new commit.

Raises:

Type Description
ConflictError

If the session is out of date and a conflict occurs.

Source code in icechunk-python/python/icechunk/session.py
def amend(
    self,
    message: str,
    metadata: dict[str, Any] | None = None,
    allow_empty: bool = False,
) -> str:
    """
    Commit the changes in the session to the repository, by amending/overwriting the previous commit.

    When successful, the writable session is completed and the session is now read-only and based on the new commit. The snapshot ID of the new commit is returned.

    If the session is out of date, this will raise a ConflictError exception depicting the conflict that occurred. The session will need to be rebased before committing.

    This operation doesn't create a new commit in the repo ancestry. It replaces the previous commit.

    The first commit to the repo cannot be amended.

    Parameters
    ----------
    message : str
        The message to write with the commit.
    metadata : dict[str, Any] | None, optional
        Additional metadata to store with the commit snapshot.
    allow_empty : bool, optional
        If True, allow amending even if no data changes have been made to the session.
        This is useful when you only want to update the commit message. Default is False.

    Returns
    -------
    str
        The snapshot ID of the new commit.

    Raises
    ------
    icechunk.ConflictError
        If the session is out of date and a conflict occurs.
    """
    if self._allow_changes:
        warnings.warn(
            "Committing a session after forking, and without merging will not work. "
            "Merge back in the remote changes first using Session.merge().",
            UserWarning,
            stacklevel=2,
        )
    return self._session.amend(message, metadata, allow_empty=allow_empty)
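
As a sketch, continuing the example above (and assuming the branch already carries more than one commit, since the first commit cannot be amended): a follow-up writable session can fold a small fix into the previous commit instead of appending a new snapshot.

session = repo.writable_session("main")
arr = zarr.open_array(store=session.store, path="temps")
arr[0, 0] = 1.5  # small correction

# Replaces the previous commit on "main" rather than adding a new one.
snapshot_id = session.amend("initialize temps, corrected corner value")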

amend_async async #

amend_async(message, metadata=None, allow_empty=False)

Commit the changes in the session to the repository, by amending/overwriting the previous commit.

When successful, the writable session is completed and the session is now read-only and based on the new commit. The snapshot ID of the new commit is returned.

If the session is out of date, this will raise a ConflictError exception depicting the conflict that occurred. The session will need to be rebased before committing.

This operation doesn't create a new commit in the repo ancestry. It replaces the previous commit.

The first commit to the repo cannot be amended.

Parameters:

Name Type Description Default
message str

The message to write with the commit.

required
metadata dict[str, Any] | None

Additional metadata to store with the commit snapshot.

None
allow_empty bool

If True, allow amending even if no data changes have been made to the session. This is useful when you only want to update the commit message. Default is False.

False

Returns:

Type Description
str

The snapshot ID of the new commit.

Raises:

Type Description
ConflictError

If the session is out of date and a conflict occurs.

Source code in icechunk-python/python/icechunk/session.py
async def amend_async(
    self,
    message: str,
    metadata: dict[str, Any] | None = None,
    allow_empty: bool = False,
) -> str:
    """
    Commit the changes in the session to the repository, by amending/overwriting the previous commit.

    When successful, the writable session is completed and the session is now read-only and based on the new commit. The snapshot ID of the new commit is returned.

    If the session is out of date, this will raise a ConflictError exception depicting the conflict that occurred. The session will need to be rebased before committing.

    This operation doesn't create a new commit in the repo ancestry. It replaces the previous commit.

    The first commit to the repo cannot be amended.

    Parameters
    ----------
    message : str
        The message to write with the commit.
    metadata : dict[str, Any] | None, optional
        Additional metadata to store with the commit snapshot.
    allow_empty : bool, optional
        If True, allow amending even if no data changes have been made to the session.
        This is useful when you only want to update the commit message. Default is False.

    Returns
    -------
    str
        The snapshot ID of the new commit.

    Raises
    ------
    icechunk.ConflictError
        If the session is out of date and a conflict occurs.
    """
    if self._allow_changes:
        warnings.warn(
            "Committing a session after forking, and without merging will not work. "
            "Merge back in the remote changes first using Session.merge().",
            UserWarning,
            stacklevel=2,
        )
    return await self._session.amend_async(message, metadata, allow_empty=allow_empty)

chunk_coordinates async #

chunk_coordinates(array_path, batch_size=1000)

Return an async iterator over the coordinates of all initialized chunks for the array at array_path.

Returns:

Type Description
An async iterator of chunk coordinates as tuples.
Source code in icechunk-python/python/icechunk/session.py
async def chunk_coordinates(
    self, array_path: str, batch_size: int = 1000
) -> AsyncIterator[tuple[int, ...]]:
    """
    Return an async iterator to all initialized chunks for the array at array_path

    Returns
    -------
    an async iterator to chunk coordinates as tuples
    """
    # We do unbatching here to improve speed. Switching to rust to get
    # a batch is much faster than switching for every element
    async for batch in self._session.chunk_coordinates(array_path, batch_size):
        for coord in batch:
            yield tuple(coord)
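
Because this is an async generator, it has to be consumed from a coroutine; a minimal sketch, reusing the session and array from the earlier example:

import asyncio

async def initialized_chunks(session, array_path="/temps"):
    # Collect the coordinates of every chunk that has been written.
    return [coord async for coord in session.chunk_coordinates(array_path)]

coords = asyncio.run(initialized_chunks(session))
print(coords)  # e.g. [(0, 0), (0, 1), (1, 0), (1, 1)]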

chunk_type #

chunk_type(array_path, chunk_coordinates)

Return the chunk type for the specified coordinates

Parameters:

Name Type Description Default
array_path str

The path to the array inside the Zarr store. Example: "/groupA/groupB/outputs/my-array".

required
chunk_coordinates Sequence[int]

A sequence of integers (list or tuple) used to locate the chunk. Example: [0, 1, 5].

required

Returns:

Type Description
ChunkType

One of the supported chunk types.

Source code in icechunk-python/python/icechunk/session.py
def chunk_type(
    self,
    array_path: str,
    chunk_coordinates: Sequence[int],
) -> ChunkType:
    """
    Return the chunk type for the specified coordinates

    Parameters
    ----------
    array_path : str
        The path to the array inside the Zarr store. Example: "/groupA/groupB/outputs/my-array".
    chunk_coordinates: Sequence[int]
        A sequence of integers (list or tuple) used to locate the chunk. Example: [0, 1, 5].

    Returns
    -------
    ChunkType
        One of the supported chunk types.
    """
    return self._session.chunk_type(array_path, chunk_coordinates)

chunk_type_async async #

chunk_type_async(array_path, chunk_coordinates)

Return the chunk type for the specified coordinates

Parameters:

Name Type Description Default
array_path str

The path to the array inside the Zarr store. Example: "/groupA/groupB/outputs/my-array".

required
chunk_coordinates Sequence[int]

A sequence of integers (list or tuple) used to locate the chunk. Example: [0, 1, 5].

required

Returns:

Type Description
ChunkType

One of the supported chunk types.

Source code in icechunk-python/python/icechunk/session.py
async def chunk_type_async(
    self,
    array_path: str,
    chunk_coordinates: Sequence[int],
) -> ChunkType:
    """
    Return the chunk type for the specified coordinates

    Parameters
    ----------
    array_path : str
        The path to the array inside the Zarr store. Example: "/groupA/groupB/outputs/my-array".
    chunk_coordinates: Sequence[int]
        A sequence of integers (list or tuple) used to locate the chunk. Example: [0, 1, 5].

    Returns
    -------
    ChunkType
        One of the supported chunk types.
    """
    return await self._session.chunk_type_async(array_path, chunk_coordinates)

commit #

commit(message, metadata=None, rebase_with=None, rebase_tries=1000, allow_empty=False)

Commit the changes in the session to the repository.

When successful, the writable session is completed and the session is now read-only and based on the new commit. The snapshot ID of the new commit is returned.

If the session is out of date, this will raise a ConflictError exception depicting the conflict that occurred. The session will need to be rebased before committing.

Parameters:

Name Type Description Default
message str

The message to write with the commit.

required
metadata dict[str, Any] | None

Additional metadata to store with the commit snapshot.

None
rebase_with ConflictSolver | None

If another session committed while the current session was writing, rebase with this solver before retrying the commit.

None
rebase_tries int

If another session committed while the current session was writing, retry the rebase-and-commit cycle up to this many times.

1000
allow_empty bool

If True, allow creating a commit even if there are no changes. Default is False.

False

Returns:

Type Description
str

The snapshot ID of the new commit.

Raises:

Type Description
ConflictError

If the session is out of date and a conflict occurs.

NoChangesToCommitError

If there are no changes to commit and allow_empty is False.

Source code in icechunk-python/python/icechunk/session.py
def commit(
    self,
    message: str,
    metadata: dict[str, Any] | None = None,
    rebase_with: ConflictSolver | None = None,
    rebase_tries: int = 1_000,
    allow_empty: bool = False,
) -> str:
    """
    Commit the changes in the session to the repository.

    When successful, the writable session is completed and the session is now read-only and based on the new commit. The snapshot ID of the new commit is returned.

    If the session is out of date, this will raise a ConflictError exception depicting the conflict that occurred. The session will need to be rebased before committing.

    Parameters
    ----------
    message : str
        The message to write with the commit.
    metadata : dict[str, Any] | None, optional
        Additional metadata to store with the commit snapshot.
    rebase_with : ConflictSolver | None, optional
        If another session committed while the current session was writing, rebase with this solver before retrying the commit.
    rebase_tries : int, optional
        If another session committed while the current session was writing, retry the rebase-and-commit cycle up to this many times.
    allow_empty : bool, optional
        If True, allow creating a commit even if there are no changes. Default is False.

    Returns
    -------
    str
        The snapshot ID of the new commit.

    Raises
    ------
    icechunk.ConflictError
        If the session is out of date and a conflict occurs.
    icechunk.NoChangesToCommitError
        If there are no changes to commit and allow_empty is False.
    """
    if self._allow_changes:
        warnings.warn(
            "Committing a session after forking, and without merging will not work. "
            "Merge back in the remote changes first using Session.merge().",
            UserWarning,
            stacklevel=2,
        )
    return self._session.commit(
        message,
        metadata,
        rebase_with=rebase_with,
        rebase_tries=rebase_tries,
        allow_empty=allow_empty,
    )
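
A common pattern is to let commit retry automatically when another writer lands first; a sketch using BasicConflictSolver, continuing the earlier example (the write shown is illustrative):

session = repo.writable_session("main")
arr = zarr.open_array(store=session.store, path="temps")
arr[1, :] = 2.0

# If another session committed in the meantime, rebase with the solver
# and retry, up to rebase_tries attempts, before raising.
snapshot_id = session.commit(
    "update row 1",
    rebase_with=icechunk.BasicConflictSolver(),
    rebase_tries=5,
)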

commit_async async #

commit_async(message, metadata=None, rebase_with=None, rebase_tries=1000, allow_empty=False)

Commit the changes in the session to the repository (async version).

When successful, the writable session is completed and the session is now read-only and based on the new commit. The snapshot ID of the new commit is returned.

If the session is out of date, this will raise a ConflictError exception depicting the conflict that occurred. The session will need to be rebased before committing.

Parameters:

Name Type Description Default
message str

The message to write with the commit.

required
metadata dict[str, Any] | None

Additional metadata to store with the commit snapshot.

None
rebase_with ConflictSolver | None

If another session committed while the current session was writing, rebase with this solver before retrying the commit.

None
rebase_tries int

If another session committed while the current session was writing, retry the rebase-and-commit cycle up to this many times.

1000
allow_empty bool

If True, allow creating a commit even if there are no changes. Default is False.

False

Returns:

Type Description
str

The snapshot ID of the new commit.

Raises:

Type Description
ConflictError

If the session is out of date and a conflict occurs.

NoChangesToCommitError

If there are no changes to commit and allow_empty is False.

Source code in icechunk-python/python/icechunk/session.py
async def commit_async(
    self,
    message: str,
    metadata: dict[str, Any] | None = None,
    rebase_with: ConflictSolver | None = None,
    rebase_tries: int = 1_000,
    allow_empty: bool = False,
) -> str:
    """
    Commit the changes in the session to the repository (async version).

    When successful, the writable session is completed and the session is now read-only and based on the new commit. The snapshot ID of the new commit is returned.

    If the session is out of date, this will raise a ConflictError exception depicting the conflict that occurred. The session will need to be rebased before committing.

    Parameters
    ----------
    message : str
        The message to write with the commit.
    metadata : dict[str, Any] | None, optional
        Additional metadata to store with the commit snapshot.
    rebase_with : ConflictSolver | None, optional
        If another session committed while the current session was writing, rebase with this solver before retrying the commit.
    rebase_tries : int, optional
        If another session committed while the current session was writing, retry the rebase-and-commit cycle up to this many times.
    allow_empty : bool, optional
        If True, allow creating a commit even if there are no changes. Default is False.

    Returns
    -------
    str
        The snapshot ID of the new commit.

    Raises
    ------
    icechunk.ConflictError
        If the session is out of date and a conflict occurs.
    icechunk.NoChangesToCommitError
        If there are no changes to commit and allow_empty is False.
    """
    if self._allow_changes:
        warnings.warn(
            "Committing a session after forking, and without merging will not work. "
            "Merge back in the remote changes first using Session.merge().",
            UserWarning,
            stacklevel=2,
        )
    return await self._session.commit_async(
        message,
        metadata,
        rebase_with=rebase_with,
        rebase_tries=rebase_tries,
        allow_empty=allow_empty,
    )

discard_changes #

discard_changes()

When the session is writable, discard any uncommitted changes.

Source code in icechunk-python/python/icechunk/session.py
def discard_changes(self) -> None:
    """
    When the session is writable, discard any uncommitted changes.
    """
    self._session.discard_changes()
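
status() pairs naturally with discard_changes() when inspecting and then abandoning work in progress; a sketch continuing the earlier example:

session = repo.writable_session("main")
arr = zarr.open_array(store=session.store, path="temps")
arr[2, :] = -1.0

print(session.status())                 # diff of the uncommitted operations
session.discard_changes()               # drop them all
print(session.has_uncommitted_changes)  # False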

flush #

flush(message, metadata=None)

Save the changes in the session to a new snapshot without modifying the current branch.

When successful, the writable session is completed and the session is now read-only and based on the new snapshot. The ID of the new snapshot is returned.

Parameters:

Name Type Description Default
message str

The message to write with the commit.

required
metadata dict[str, Any] | None

Additional metadata to store with the commit snapshot.

None

Returns:

Type Description
str

The ID of the new snapshot.

Source code in icechunk-python/python/icechunk/session.py
def flush(
    self,
    message: str,
    metadata: dict[str, Any] | None = None,
) -> str:
    """
    Save the changes in the session to a new snapshot without modifying the current branch.

    When successful, the writable session is completed and the session is now read-only and based on the new snapshot. The ID of the new snapshot is returned.

    Parameters
    ----------
    message : str
        The message to write with the commit.
    metadata : dict[str, Any] | None, optional
        Additional metadata to store with the commit snapshot.

    Returns
    -------
    str
        The ID of the new snapshot.
    """
    if self._allow_changes:
        warnings.warn(
            "Committing a session after forking, and without merging will not work. "
            "Merge back in the remote changes first using Session.merge().",
            UserWarning,
            stacklevel=2,
        )
    return self._session.flush(message, metadata)
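
flush writes a snapshot without advancing the branch, so the result is reachable only by its ID; a sketch continuing the earlier example:

session = repo.writable_session("main")
arr = zarr.open_array(store=session.store, path="temps")
arr[3, :] = 9.0

# "main" still points at the previous commit after this call.
snapshot_id = session.flush("experimental row 3")

# The detached snapshot can be opened directly by ID.
readonly = repo.readonly_session(snapshot_id=snapshot_id)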

flush_async async #

flush_async(message, metadata=None)

Save the changes in the session to a new snapshot without modifying the current branch.

When successful, the writable session is completed and the session is now read-only and based on the new snapshot. The ID of the new snapshot is returned.

Parameters:

Name Type Description Default
message str

The message to write with the commit.

required
metadata dict[str, Any] | None

Additional metadata to store with the commit snapshot.

None

Returns:

Type Description
str

The ID of the new snapshot.

Source code in icechunk-python/python/icechunk/session.py
async def flush_async(
    self,
    message: str,
    metadata: dict[str, Any] | None = None,
) -> str:
    """
    Save the changes in the session to a new snapshot without modifying the current branch.

    When successful, the writable session is completed and the session is now read-only and based on the new snapshot. The ID of the new snapshot is returned.

    Parameters
    ----------
    message : str
        The message to write with the commit.
    metadata : dict[str, Any] | None, optional
        Additional metadata to store with the commit snapshot.

    Returns
    -------
    str
        The ID of the new snapshot.
    """
    if self._allow_changes:
        warnings.warn(
            "Flushing a session after forking, and without merging will not work. "
            "Merge back in the remote changes first using Session.merge().",
            UserWarning,
            stacklevel=2,
        )
    return await self._session.flush_async(message, metadata)

fork #

fork()

Create a child session that can be pickled to a worker job and later merged.

This method supports Icechunk's distributed, collaborative jobs. A coordinator task creates a new session using Repository.writable_session. Then Session.fork is called repeatedly to create as many serializable sessions as worker jobs. Each new ForkSession is pickled to the worker that uses it to do all its writes. Finally, the ForkSessions are pickled back to the coordinator that uses ForkSession.merge to merge them back into the original session and commit.

Learn more about collaborative writes at https://icechunk.io/en/latest/parallel/

Raises:

Type Description
ValueError

When self already has uncommitted changes.

ValueError

When self is read-only.

Source code in icechunk-python/python/icechunk/session.py
def fork(self) -> "ForkSession":
    """
    Create a child session that can be pickled to a worker job and later merged.

    This method supports Icechunk's distributed, collaborative jobs. A coordinator task creates a new session using
    `Repository.writable_session`. Then `Session.fork` is called repeatedly to create as many serializable sessions
    as worker jobs. Each new `ForkSession` is pickled to the worker that uses it to do all its writes.
    Finally, the `ForkSessions` are pickled back to the coordinator that uses `ForkSession.merge` to merge them
    back into the original session and `commit`.

    Learn more about collaborative writes at https://icechunk.io/en/latest/parallel/

    Raises
    ------
    ValueError
        When `self` already has uncommitted changes.
    ValueError
        When `self` is read-only.
    """
    if self.has_uncommitted_changes:
        raise ValueError(
            "Cannot fork a Session with uncommitted changes. "
            "Make a commit, create a new Session, and then fork that to execute distributed writes."
        )
    if self.read_only:
        raise ValueError(
            "You should not need to fork a read-only session. Read-only sessions can be pickled and transmitted directly."
        )
    self._allow_changes = True
    # force a deep-copy of the underlying Session,
    # so that multiple forks can be created and
    # used independently in a local session.
    # See test_dask.py::test_fork_session_deep_copies for an example
    return ForkSession(PySession.from_bytes(self._session.as_bytes()))
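
The full coordinator/worker round trip, sketched with a process pool standing in for a real cluster; the worker function is illustrative, and the sketch assumes ForkSession exposes the same store property as Session:

from concurrent.futures import ProcessPoolExecutor

def write_row(fork, row):
    # Runs in a worker process; ForkSession objects are picklable.
    arr = zarr.open_array(store=fork.store, path="temps")
    arr[row, :] = float(row)
    return fork  # ship the fork's changeset back to the coordinator

session = repo.writable_session("main")
forks = [session.fork() for _ in range(2)]
with ProcessPoolExecutor() as pool:
    done = list(pool.map(write_row, forks, [0, 1]))

session.merge(*done)  # fold the worker changesets back in
session.commit("parallel row writes")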

merge #

merge(*others)

Merge the changes from one or more forked sessions into this session.

Parameters:

Name Type Description Default
others ForkSession

The forked sessions to merge changes from.

()
Source code in icechunk-python/python/icechunk/session.py
def merge(self, *others: "ForkSession") -> None:
    """
    Merge the changes from one or more forked sessions into this session.

    Parameters
    ----------
    others : ForkSession
        The forked sessions to merge changes from.
    """
    for other in others:
        if not isinstance(other, ForkSession):
            raise TypeError(
                "Sessions can only be merged with a ForkSession created with Session.fork(). "
                f"Received {type(other).__name__} instead."
            )
        self._session.merge(other._session)
    self._allow_changes = False

merge_async async #

merge_async(*others)

Merge the changes from one or more forked sessions into this session (async version).

Parameters:

Name Type Description Default
others ForkSession

The forked sessions to merge changes from.

()
Source code in icechunk-python/python/icechunk/session.py
async def merge_async(self, *others: "ForkSession") -> None:
    """
    Merge the changes from one or more forked sessions into this session (async version).

    Parameters
    ----------
    others : ForkSession
        The forked sessions to merge changes from.
    """
    for other in others:
        if not isinstance(other, ForkSession):
            raise TypeError(
                "Sessions can only be merged with a ForkSession created with Session.fork(). "
                f"Received {type(other).__name__} instead."
            )
        await self._session.merge_async(other._session)
    self._allow_changes = False

move #

move(from_path, to_path)

Move or rename a node (array or group) in the hierarchy.

This is a metadata-only operation—no data is copied. Requires a rearrange session.

Parameters:

Name Type Description Default
from_path str

The current path of the node (e.g., "/data/raw").

required
to_path str

The new path for the node (e.g., "/data/v1").

required

Examples:

>>> session = repo.rearrange_session("main")
>>> session.move("/data/raw", "/data/v1")
>>> session.commit("Renamed raw to v1")
Source code in icechunk-python/python/icechunk/session.py
def move(self, from_path: str, to_path: str) -> None:
    """Move or rename a node (array or group) in the hierarchy.

    This is a metadata-only operation—no data is copied. Requires a rearrange session.

    Parameters
    ----------
    from_path : str
        The current path of the node (e.g., "/data/raw").
    to_path : str
        The new path for the node (e.g., "/data/v1").

    Examples
    --------
    >>> session = repo.rearrange_session("main")
    >>> session.move("/data/raw", "/data/v1")
    >>> session.commit("Renamed raw to v1")
    """
    return self._session.move_node(from_path, to_path)

move_async async #

move_async(from_path, to_path)

Async version of move.

Source code in icechunk-python/python/icechunk/session.py
async def move_async(self, from_path: str, to_path: str) -> None:
    """Async version of :meth:`move`."""
    return await self._session.move_node_async(from_path, to_path)

rebase #

rebase(solver)

Rebase the session to the latest ancestry of the branch.

This method will iteratively crawl the ancestry of the branch and apply the changes from the branch to the session. If a conflict is detected, the conflict solver will be used to optionally resolve the conflict. When complete, the session will be based on the latest commit of the branch and the session will be ready to attempt another commit.

When a conflict is detected and a resolution is not possible with the provided solver, a RebaseFailedError exception will be raised. This exception will contain the snapshot ID that the rebase failed on and a list of conflicts that occurred.

Parameters:

Name Type Description Default
solver ConflictSolver

The conflict solver to use when a conflict is detected.

required

Raises:

Type Description
RebaseFailedError

When a conflict is detected and the solver fails to resolve it.

Source code in icechunk-python/python/icechunk/session.py
def rebase(self, solver: ConflictSolver) -> None:
    """
    Rebase the session to the latest ancestry of the branch.

    This method will iteratively crawl the ancestry of the branch and apply the changes from the branch to the session. If a conflict is detected, the conflict solver will be used to optionally resolve the conflict. When complete, the session will be based on the latest commit of the branch and the session will be ready to attempt another commit.

    When a conflict is detected and a resolution is not possible with the provided solver, a RebaseFailedError exception will be raised. This exception will contain the snapshot ID that the rebase failed on and a list of conflicts that occurred.

    Parameters
    ----------
    solver : ConflictSolver
        The conflict solver to use when a conflict is detected.

    Raises
    ------
    RebaseFailedError
        When a conflict is detected and the solver fails to resolve it.
    """
    self._session.rebase(solver)
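
This is the manual form of what commit(rebase_with=...) automates; a sketch of the retry pattern, continuing the earlier example:

session = repo.writable_session("main")
arr = zarr.open_array(store=session.store, path="temps")
arr[0, :] = 7.0

try:
    session.commit("update row 0")
except icechunk.ConflictError:
    # Another session committed first: replay our changes on top of it.
    session.rebase(icechunk.BasicConflictSolver())
    session.commit("update row 0")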

rebase_async async #

rebase_async(solver)

Rebase the session to the latest ancestry of the branch (async version).

This method will iteratively crawl the ancestry of the branch and apply the changes from the branch to the session. If a conflict is detected, the conflict solver will be used to optionally resolve the conflict. When complete, the session will be based on the latest commit of the branch and the session will be ready to attempt another commit.

When a conflict is detected and a resolution is not possible with the provided solver, a RebaseFailedError exception will be raised. This exception will contain the snapshot ID that the rebase failed on and a list of conflicts that occurred.

Parameters:

Name Type Description Default
solver ConflictSolver

The conflict solver to use when a conflict is detected.

required

Raises:

Type Description
RebaseFailedError

When a conflict is detected and the solver fails to resolve it.

Source code in icechunk-python/python/icechunk/session.py
async def rebase_async(self, solver: ConflictSolver) -> None:
    """
    Rebase the session to the latest ancestry of the branch (async version).

    This method will iteratively crawl the ancestry of the branch and apply the changes from the branch to the session. If a conflict is detected, the conflict solver will be used to optionally resolve the conflict. When complete, the session will be based on the latest commit of the branch and the session will be ready to attempt another commit.

    When a conflict is detected and a resolution is not possible with the provided solver, a RebaseFailedError exception will be raised. This exception will contain the snapshot ID that the rebase failed on and a list of conflicts that occurred.

    Parameters
    ----------
    solver : ConflictSolver
        The conflict solver to use when a conflict is detected.

    Raises
    ------
    RebaseFailedError
        When a conflict is detected and the solver fails to resolve it.
    """
    await self._session.rebase_async(solver)

reindex_array #

reindex_array(array_path, shift_chunk)

Reindex chunks in an array by applying a transformation function.

Parameters:

Name Type Description Default
array_path str

Path to the array.

required
shift_chunk Callable

Function that receives chunk coordinates and returns new coordinates, or None to discard the chunk.

required
Source code in icechunk-python/python/icechunk/session.py
def reindex_array(
    self,
    array_path: str,
    shift_chunk: Callable[[Iterable[int]], Iterable[int] | None],
) -> None:
    """Reindex chunks in an array by applying a transformation function.

    Parameters
    ----------
    array_path : str
        Path to the array.
    shift_chunk : Callable
        Function that receives chunk coordinates and returns new coordinates,
        or None to discard the chunk.
    """
    return self._session.reindex_array(array_path, shift_chunk)
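
A sketch: the callable sees each chunk's coordinates and returns the new coordinates, or None to drop the chunk. It is shown here on a plain writable session; depending on the Icechunk version, these rearrangement operations may instead require repo.rearrange_session.

def drop_first_chunk_column(coord):
    i, j = coord
    if j == 0:
        return None        # discard chunks in the first chunk column
    return (i, j - 1)      # slide everything else one chunk left

session = repo.writable_session("main")  # or a rearrange session; see above
session.reindex_array("/temps", drop_first_chunk_column)
session.commit("drop first chunk column of temps")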

roll_array #

roll_array(array_path, chunk_offset)

Roll (circular shift) all chunks in an array by the given chunk offset.

Chunks that shift out of one end wrap around to the other side. No data is lost — this is a circular buffer operation.

Parameters:

Name Type Description Default
array_path str

The path to the array to roll.

required
chunk_offset Iterable[int]

Offset added to each chunk coordinate (with wraparound). A chunk at index x moves to (x + chunk_offset) % num_chunks.

required

Returns:

Type Description
tuple[int, ...]

The index shift in element space (chunk_offset * chunk_size for each dimension).

Source code in icechunk-python/python/icechunk/session.py
def roll_array(
    self,
    array_path: str,
    chunk_offset: Iterable[int],
) -> tuple[int, ...]:
    """Roll (circular shift) all chunks in an array by the given chunk offset.

    Chunks that shift out of one end wrap around to the other side.
    No data is lost — this is a circular buffer operation.

    Parameters
    ----------
    array_path : str
        The path to the array to roll.
    chunk_offset : Iterable[int]
        Offset added to each chunk coordinate (with wraparound). A chunk at
        index ``x`` moves to ``(x + chunk_offset) % num_chunks``.

    Returns
    -------
    tuple[int, ...]
        The index shift in element space (chunk_offset * chunk_size for each dimension).
    """
    return tuple(self._session.roll_array(array_path, list(chunk_offset)))
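
A minimal sketch, assuming a 2-D array at a hypothetical path with a chunk size of 10 along axis 0:

```python
# Roll all chunks forward by one along axis 0; the last chunk wraps to the front.
shift = session.roll_array("data/temperature", (1, 0))
# With chunk size 10 on axis 0, `shift` is (10, 0): elements [0:10] now hold
# the wrapped-around chunk and would typically be overwritten with new data.
```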

shift_array #

shift_array(array_path, chunk_offset)

Shift all chunks in an array by the given chunk offset.

Chunks that shift out of bounds are discarded. Vacated positions retain stale chunk references — the caller typically writes new data there.

Parameters:

Name Type Description Default
array_path str

The path to the array to shift.

required
chunk_offset Iterable[int]

Offset added to each chunk coordinate. A chunk at index x moves to x + chunk_offset. For a 3D array, chunk_offset=(1, 0, -2) moves the chunk at (i, j, k) to (i+1, j, k-2).

required

Returns:

Type Description
tuple[int, ...]

The shift in element space (chunk_offset * chunk_size per dimension). For example, with chunk_size=10 and chunk_offset=(2,), returns (20,) — useful for slicing the region that needs new data.

Notes

To shift right while preserving all data, first resize the array using zarr's array.resize(), then use shift_array.

Source code in icechunk-python/python/icechunk/session.py
def shift_array(
    self,
    array_path: str,
    chunk_offset: Iterable[int],
) -> tuple[int, ...]:
    """Shift all chunks in an array by the given chunk offset.

    Chunks that shift out of bounds are discarded. Vacated positions retain
    stale chunk references — the caller typically writes new data there.

    Parameters
    ----------
    array_path : str
        The path to the array to shift.
    chunk_offset : Iterable[int]
        Offset added to each chunk coordinate. A chunk at index ``x`` moves
        to ``x + chunk_offset``. For a 3D array, ``chunk_offset=(1, 0, -2)``
        moves the chunk at ``(i, j, k)`` to ``(i+1, j, k-2)``.

    Returns
    -------
    tuple[int, ...]
        The shift in element space (``chunk_offset * chunk_size`` per dimension).
        For example, with ``chunk_size=10`` and ``chunk_offset=(2,)``, returns
        ``(20,)`` — useful for slicing the region that needs new data.

    Notes
    -----
    To shift right while preserving all data, first resize the array using zarr's
    array.resize(), then use shift_array.
    """
    return tuple(self._session.shift_array(array_path, list(chunk_offset)))
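
Putting the note above together, a sketch that grows a 1-D array by one chunk and shifts everything right, freeing the first chunk for new data (the path and the chunk size of 10 are assumptions):

```python
import zarr

array = zarr.open_array(session.store, path="data/temperature")
array.resize((array.shape[0] + 10,))  # make room for one more chunk
shift = session.shift_array("data/temperature", (1,))
# shift == (10,): write fresh data into array[0:10]
```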

status #

status()

Compute an overview of the current session changes

Returns:

Type Description
Diff

The operations executed in the current session but still not committed.

Source code in icechunk-python/python/icechunk/session.py
def status(self) -> Diff:
    """
    Compute an overview of the current session changes

    Returns
    -------
    Diff
        The operations executed in the current session but still not committed.
    """
    return self._session.status()
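
For example, to inspect pending changes before committing:

```python
diff = session.status()
print(diff)  # summary of groups, arrays, and chunks changed but not yet committed
```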

SessionMode #

Bases: Enum

Enum for session access modes

Attributes:

Name Type Description
READONLY int

Session can only read data

WRITABLE int

Session can read and write data

REARRANGE int

Session can only move nodes and reindex arrays

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class SessionMode(Enum):
    """Enum for session access modes

    Attributes
    ----------
    READONLY: int
        Session can only read data
    WRITABLE: int
        Session can read and write data
    REARRANGE: int
        Session can only move nodes and reindex arrays
    """

    READONLY = 0
    WRITABLE = 1
    REARRANGE = 2

SnapshotInfo #

Metadata for a snapshot

Attributes:

Name Type Description
id str

The snapshot ID

manifests list[ManifestFileInfo]

The manifests linked to this snapshot

message str

The commit message of the snapshot

metadata dict[str, Any]

The metadata of the snapshot

parent_id str | None

The parent snapshot ID

written_at datetime

The timestamp when the snapshot was written

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class SnapshotInfo:
    """Metadata for a snapshot"""
    @property
    def id(self) -> str:
        """The snapshot ID"""
        ...
    @property
    def parent_id(self) -> str | None:
        """The snapshot ID"""
        ...
    @property
    def written_at(self) -> datetime.datetime:
        """
        The timestamp when the snapshot was written
        """
        ...
    @property
    def message(self) -> str:
        """
        The commit message of the snapshot
        """
        ...
    @property
    def metadata(self) -> dict[str, Any]:
        """
        The metadata of the snapshot
        """
        ...
    @property
    def manifests(self) -> list[ManifestFileInfo]:
        """
        The manifests linked to this snapshot
        """
        ...

id property #

id

The snapshot ID

manifests property #

manifests

The manifests linked to this snapshot

message property #

message

The commit message of the snapshot

metadata property #

metadata

The metadata of the snapshot

parent_id property #

parent_id

The parent snapshot ID

written_at property #

written_at

The timestamp when the snapshot was written
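
SnapshotInfo objects are what the repository's history iterators yield. A sketch of walking a branch's history, assuming `Repository.ancestry` and an already-built `storage`:

```python
import icechunk as ic

repo = ic.Repository.open(storage)
for snap in repo.ancestry(branch="main"):
    print(snap.id, snap.written_at.isoformat(), snap.message)
```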

Storage #

Storage configuration for an IcechunkStore

Currently supports memory, filesystem, S3, Azure Blob, and Google Cloud Storage backends. Use the following methods to create a Storage object with the desired backend.

Ex:

storage = icechunk.in_memory_storage()
storage = icechunk.local_filesystem_storage("/path/to/root")
storage = icechunk.s3_storage("bucket", "prefix", ...)
storage = icechunk.gcs_storage("bucket", "prefix", ...)
storage = icechunk.azure_storage("container", "prefix", ...)

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class Storage:
    """Storage configuration for an IcechunkStore

    Currently supports memory, filesystem, S3, Azure Blob, and Google Cloud Storage backends.
    Use the following methods to create a Storage object with the desired backend.

    Ex:
    ```
    storage = icechunk.in_memory_storage()
    storage = icechunk.local_filesystem_storage("/path/to/root")
    storage = icechunk.s3_storage("bucket", "prefix", ...)
    storage = icechunk.gcs_storage("bucket", "prefix", ...)
    storage = icechunk.azure_storage("container", "prefix", ...)
    ```
    """

    @classmethod
    def new_s3(
        cls,
        config: S3Options,
        bucket: str,
        prefix: str | None,
        credentials: AnyS3Credential | None = None,
    ) -> Storage: ...
    @classmethod
    def new_s3_object_store(
        cls,
        config: S3Options,
        bucket: str,
        prefix: str | None,
        credentials: AnyS3Credential | None = None,
    ) -> Storage: ...
    @classmethod
    def new_tigris(
        cls,
        config: S3Options,
        bucket: str,
        prefix: str | None,
        use_weak_consistency: bool,
        credentials: AnyS3Credential | None = None,
    ) -> Storage: ...
    @classmethod
    def new_in_memory(cls) -> Storage: ...
    @classmethod
    def new_local_filesystem(cls, path: str) -> Storage: ...
    @classmethod
    def new_gcs(
        cls,
        bucket: str,
        prefix: str | None,
        credentials: AnyGcsCredential | None = None,
        *,
        config: dict[str, str] | None = None,
    ) -> Storage: ...
    @classmethod
    def new_r2(
        cls,
        bucket: str | None,
        prefix: str | None,
        account_id: str | None,
        credentials: AnyS3Credential | None = None,
        *,
        config: S3Options,
    ) -> Storage: ...
    @classmethod
    def new_azure_blob(
        cls,
        account: str,
        container: str,
        prefix: str,
        credentials: AnyAzureCredential | None = None,
        *,
        config: dict[str, str] | None = None,
    ) -> Storage: ...
    @classmethod
    def new_http(
        cls,
        base_url: str,
        config: dict[str, str] | None = None,
    ) -> Storage: ...
    @classmethod
    def new_redirect(
        cls,
        base_url: str,
    ) -> Storage: ...
    def __repr__(self) -> str: ...
    def default_settings(self) -> StorageSettings: ...
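
In practice a Storage is built with one of the module-level helpers and handed to the repository constructors. A minimal sketch with a hypothetical path:

```python
import icechunk

storage = icechunk.local_filesystem_storage("/tmp/my-repo")
repo = icechunk.Repository.open_or_create(storage)
```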

StorageConcurrencySettings #

Configuration for how Icechunk uses its Storage instance

Methods:

Name Description
__init__

Create a new StorageConcurrencySettings object

Attributes:

Name Type Description
ideal_concurrent_request_size int | None

The ideal concurrent request size.

max_concurrent_requests_for_object int | None

The maximum number of concurrent requests for an object.

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class StorageConcurrencySettings:
    """Configuration for how Icechunk uses its Storage instance"""

    def __init__(
        self,
        max_concurrent_requests_for_object: int | None = None,
        ideal_concurrent_request_size: int | None = None,
    ) -> None:
        """
        Create a new `StorageConcurrencySettings` object

        Parameters
        ----------
        max_concurrent_requests_for_object: int | None
            The maximum number of concurrent requests for an object.
        ideal_concurrent_request_size: int | None
            The ideal concurrent request size.
        """
        ...
    @property
    def max_concurrent_requests_for_object(self) -> int | None:
        """
        The maximum number of concurrent requests for an object.

        Returns
        -------
        int | None
            The maximum number of concurrent requests for an object.
        """
        ...
    @max_concurrent_requests_for_object.setter
    def max_concurrent_requests_for_object(self, value: int | None) -> None:
        """
        Set the maximum number of concurrent requests for an object.

        Parameters
        ----------
        value: int | None
            The maximum number of concurrent requests for an object.
        """
        ...
    @property
    def ideal_concurrent_request_size(self) -> int | None:
        """
        The ideal concurrent request size.

        Returns
        -------
        int | None
            The ideal concurrent request size.
        """
        ...
    @ideal_concurrent_request_size.setter
    def ideal_concurrent_request_size(self, value: int | None) -> None:
        """
        Set the ideal concurrent request size.

        Parameters
        ----------
        value: int | None
            The ideal concurrent request size.
        """
        ...

ideal_concurrent_request_size property writable #

ideal_concurrent_request_size

The ideal concurrent request size.

Returns:

Type Description
int | None

The ideal concurrent request size.

max_concurrent_requests_for_object property writable #

max_concurrent_requests_for_object

The maximum number of concurrent requests for an object.

Returns:

Type Description
int | None

The maximum number of concurrent requests for an object.

__init__ #

__init__(max_concurrent_requests_for_object=None, ideal_concurrent_request_size=None)

Create a new StorageConcurrencySettings object

Parameters:

Name Type Description Default
max_concurrent_requests_for_object int | None

The maximum number of concurrent requests for an object.

None
ideal_concurrent_request_size int | None

The ideal concurrent request size.

None
Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
def __init__(
    self,
    max_concurrent_requests_for_object: int | None = None,
    ideal_concurrent_request_size: int | None = None,
) -> None:
    """
    Create a new `StorageConcurrencySettings` object

    Parameters
    ----------
    max_concurrent_requests_for_object: int | None
        The maximum number of concurrent requests for an object.
    ideal_concurrent_request_size: int | None
        The ideal concurrent request size.
    """
    ...
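
A sketch with arbitrary illustrative values:

```python
import icechunk as ic

# Cap parallel range requests per object and target ~16 MiB request sizes.
concurrency = ic.StorageConcurrencySettings(
    max_concurrent_requests_for_object=12,
    ideal_concurrent_request_size=16 * 1024 * 1024,
)
```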

StorageRetriesSettings #

Configuration for how Icechunk retries requests.

Icechunk retries failed requests with an exponential backoff algorithm.

Methods:

Name Description
__init__

Create a new StorageRetriesSettings object

Attributes:

Name Type Description
initial_backoff_ms int | None

The initial backoff duration in milliseconds.

max_backoff_ms int | None

The maximum backoff duration in milliseconds.

max_tries int | None

The maximum number of tries, including the initial one.

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class StorageRetriesSettings:
    """Configuration for how Icechunk retries requests.

    Icechunk retries failed requests with an exponential backoff algorithm."""

    def __init__(
        self,
        max_tries: int | None = None,
        initial_backoff_ms: int | None = None,
        max_backoff_ms: int | None = None,
    ) -> None:
        """
        Create a new `StorageRetriesSettings` object

        Parameters
        ----------
        max_tries: int | None
            The maximum number of tries, including the initial one. Set to 1 to disable retries
        initial_backoff_ms: int | None
            The initial backoff duration in milliseconds
        max_backoff_ms: int | None
            The limit to backoff duration in milliseconds
        """
        ...
    @property
    def max_tries(self) -> int | None:
        """
        The maximum number of tries, including the initial one.

        Returns
        -------
        int | None
            The maximum number of tries.
        """
        ...
    @max_tries.setter
    def max_tries(self, value: int | None) -> None:
        """
        Set the maximum number of tries. Set to 1 to disable retries.

        Parameters
        ----------
        value: int | None
            The maximum number of tries
        """
        ...
    @property
    def initial_backoff_ms(self) -> int | None:
        """
        The initial backoff duration in milliseconds.

        Returns
        -------
        int | None
            The initial backoff duration in milliseconds.
        """
        ...
    @initial_backoff_ms.setter
    def initial_backoff_ms(self, value: int | None) -> None:
        """
        Set the initial backoff duration in milliseconds.

        Parameters
        ----------
        value: int | None
            The initial backoff duration in milliseconds.
        """
        ...
    @property
    def max_backoff_ms(self) -> int | None:
        """
        The maximum backoff duration in milliseconds.

        Returns
        -------
        int | None
            The maximum backoff duration in milliseconds.
        """
        ...
    @max_backoff_ms.setter
    def max_backoff_ms(self, value: int | None) -> None:
        """
        Set the maximum backoff duration in milliseconds.

        Parameters
        ----------
        value: int | None
            The maximum backoff duration in milliseconds.
        """
        ...

initial_backoff_ms property writable #

initial_backoff_ms

The initial backoff duration in milliseconds.

Returns:

Type Description
int | None

The initial backoff duration in milliseconds.

max_backoff_ms property writable #

max_backoff_ms

The maximum backoff duration in milliseconds.

Returns:

Type Description
int | None

The maximum backoff duration in milliseconds.

max_tries property writable #

max_tries

The maximum number of tries, including the initial one.

Returns:

Type Description
int | None

The maximum number of tries.

__init__ #

__init__(max_tries=None, initial_backoff_ms=None, max_backoff_ms=None)

Create a new StorageRetriesSettings object

Parameters:

Name Type Description Default
max_tries int | None

The maximum number of tries, including the initial one. Set to 1 to disable retries

None
initial_backoff_ms int | None

The initial backoff duration in milliseconds

None
max_backoff_ms int | None

The limit to backoff duration in milliseconds

None
Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
def __init__(
    self,
    max_tries: int | None = None,
    initial_backoff_ms: int | None = None,
    max_backoff_ms: int | None = None,
) -> None:
    """
    Create a new `StorageRetriesSettings` object

    Parameters
    ----------
    max_tries: int | None
        The maximum number of tries, including the initial one. Set to 1 to disable retries
    initial_backoff_ms: int | None
        The initial backoff duration in milliseconds
    max_backoff_ms: int | None
        The limit to backoff duration in milliseconds
    """
    ...
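
A sketch with arbitrary illustrative values:

```python
import icechunk as ic

# Retry each failed request up to 5 times, backing off from 200 ms to 10 s.
retries = ic.StorageRetriesSettings(
    max_tries=5,
    initial_backoff_ms=200,
    max_backoff_ms=10_000,
)
```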

StorageSettings #

Configuration for how Icechunk uses its Storage instance

Methods:

Name Description
__init__

Create a new StorageSettings object

Attributes:

Name Type Description
chunks_storage_class str | None

Chunk objects in object store will use this storage class or self.storage_class if None

concurrency StorageConcurrencySettings | None

The configuration for how much concurrency the Icechunk store uses

metadata_storage_class str | None

Metadata objects in object store will use this storage class or self.storage_class if None

minimum_size_for_multipart_upload int | None

Use object store's multipart upload for objects larger than this size in bytes

retries StorageRetriesSettings | None

The configuration for how Icechunk retries failed requests.

storage_class str | None

All objects in object store will use this storage class or the default if None

unsafe_use_conditional_create bool | None

True if Icechunk will use conditional PUT operations for creation in the object store

unsafe_use_conditional_update bool | None

True if Icechunk will use conditional PUT operations for updates in the object store

unsafe_use_metadata bool | None

True if Icechunk will write object metadata in the object store

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class StorageSettings:
    """Configuration for how Icechunk uses its Storage instance"""

    def __init__(
        self,
        concurrency: StorageConcurrencySettings | None = None,
        retries: StorageRetriesSettings | None = None,
        unsafe_use_conditional_create: bool | None = None,
        unsafe_use_conditional_update: bool | None = None,
        unsafe_use_metadata: bool | None = None,
        storage_class: str | None = None,
        metadata_storage_class: str | None = None,
        chunks_storage_class: str | None = None,
        minimum_size_for_multipart_upload: int | None = None,
    ) -> None:
        """
        Create a new `StorageSettings` object

        Parameters
        ----------
        concurrency: StorageConcurrencySettings | None
            The configuration for how Icechunk uses its Storage instance.

        retries: StorageRetriesSettings | None
            The configuration for how Icechunk retries failed requests.

        unsafe_use_conditional_update: bool | None
            If set to False, Icechunk loses some of its consistency guarantees.
            This is only useful in object stores that don't support the feature.
            Use it at your own risk.

        unsafe_use_conditional_create: bool | None
            If set to False, Icechunk loses some of its consistency guarantees.
            This is only useful in object stores that don't support the feature.
            Use at your own risk.

        unsafe_use_metadata: bool | None
            Don't write metadata fields in Icechunk files.
            This is only useful in object stores that don't support the feature.
            Use at your own risk.

        storage_class: str | None
            Store all objects using this object store storage class.
            If None, the object store default will be used.
            Currently not supported in GCS.
            Example: STANDARD_IA

        metadata_storage_class: str | None
            Store metadata objects using this object store storage class.
            Currently not supported in GCS.
            Defaults to storage_class.

        chunks_storage_class: str | None
            Store chunk objects using this object store storage class.
            Currently not supported in GCS.
            Defaults to storage_class.

        minimum_size_for_multipart_upload: int | None
            Use object store's multipart upload for objects larger than this size in bytes.
            Default: 100 MB if None is passed.
        """
        ...
    @property
    def concurrency(self) -> StorageConcurrencySettings | None:
        """
        The configuration for how much concurrency the Icechunk store uses

        Returns
        -------
        StorageConcurrencySettings | None
            The configuration for how Icechunk uses its Storage instance.
        """

    @concurrency.setter
    def concurrency(self, value: StorageConcurrencySettings | None) -> None: ...
    @property
    def retries(self) -> StorageRetriesSettings | None:
        """
        The configuration for how Icechunk retries failed requests.

        Returns
        -------
        StorageRetriesSettings | None
            The configuration for how Icechunk retries failed requests.
        """

    @retries.setter
    def retries(self, value: StorageRetriesSettings | None) -> None: ...
    @property
    def unsafe_use_conditional_update(self) -> bool | None:
        """True if Icechunk will use conditional PUT operations for updates in the object store"""
        ...

    @unsafe_use_conditional_update.setter
    def unsafe_use_conditional_update(self, value: bool) -> None: ...
    @property
    def unsafe_use_conditional_create(self) -> bool | None:
        """True if Icechunk will use conditional PUT operations for creation in the object store"""
        ...

    @unsafe_use_conditional_create.setter
    def unsafe_use_conditional_create(self, value: bool) -> None: ...
    @property
    def unsafe_use_metadata(self) -> bool | None:
        """True if Icechunk will write object metadata in the object store"""
        ...

    @unsafe_use_metadata.setter
    def unsafe_use_metadata(self, value: bool) -> None: ...
    @property
    def storage_class(self) -> str | None:
        """All objects in object store will use this storage class or the default if None"""
        ...

    @storage_class.setter
    def storage_class(self, value: str) -> None: ...
    @property
    def metadata_storage_class(self) -> str | None:
        """Metadata objects in object store will use this storage class or self.storage_class if None"""
        ...

    @metadata_storage_class.setter
    def metadata_storage_class(self, value: str) -> None: ...
    @property
    def chunks_storage_class(self) -> str | None:
        """Chunk objects in object store will use this storage class or self.storage_class if None"""
        ...

    @chunks_storage_class.setter
    def chunks_storage_class(self, value: str) -> None: ...
    @property
    def minimum_size_for_multipart_upload(self) -> int | None:
        """Use object store's multipart upload for objects larger than this size in bytes"""
        ...

    @minimum_size_for_multipart_upload.setter
    def minimum_size_for_multipart_upload(self, value: int) -> None: ...

chunks_storage_class property writable #

chunks_storage_class

Chunk objects in object store will use this storage class or self.storage_class if None

concurrency property writable #

concurrency

The configuration for how much concurrency the Icechunk store uses

Returns:

Type Description
StorageConcurrencySettings | None

The configuration for how Icechunk uses its Storage instance.

metadata_storage_class property writable #

metadata_storage_class

Metadata objects in object store will use this storage class or self.storage_class if None

minimum_size_for_multipart_upload property writable #

minimum_size_for_multipart_upload

Use object store's multipart upload for objects larger than this size in bytes

retries property writable #

retries

The configuration for how Icechunk retries failed requests.

Returns:

Type Description
StorageRetriesSettings | None

The configuration for how Icechunk retries failed requests.

storage_class property writable #

storage_class

All objects in object store will use this storage class or the default if None

unsafe_use_conditional_create property writable #

unsafe_use_conditional_create

True if Icechunk will use conditional PUT operations for creation in the object store

unsafe_use_conditional_update property writable #

unsafe_use_conditional_update

True if Icechunk will use conditional PUT operations for updates in the object store

unsafe_use_metadata property writable #

unsafe_use_metadata

True if Icechunk will write object metadata in the object store

__init__ #

__init__(concurrency=None, retries=None, unsafe_use_conditional_create=None, unsafe_use_conditional_update=None, unsafe_use_metadata=None, storage_class=None, metadata_storage_class=None, chunks_storage_class=None, minimum_size_for_multipart_upload=None)

Create a new StorageSettings object

Parameters:

Name Type Description Default
concurrency StorageConcurrencySettings | None

The configuration for how Icechunk uses its Storage instance.

None
retries StorageRetriesSettings | None

The configuration for how Icechunk retries failed requests.

None
unsafe_use_conditional_update bool | None

If set to False, Icechunk loses some of its consistency guarantees. This is only useful in object stores that don't support the feature. Use it at your own risk.

None
unsafe_use_conditional_create bool | None

If set to False, Icechunk loses some of its consistency guarantees. This is only useful in object stores that don't support the feature. Use at your own risk.

None
unsafe_use_metadata bool | None

Don't write metadata fields in Icechunk files. This is only useful in object stores that don't support the feature. Use at your own risk.

None
storage_class str | None

Store all objects using this object store storage class. If None, the object store default will be used. Currently not supported in GCS. Example: STANDARD_IA

None
metadata_storage_class str | None

Store metadata objects using this object store storage class. Currently not supported in GCS. Defaults to storage_class.

None
chunks_storage_class str | None

Store chunk objects using this object store storage class. Currently not supported in GCS. Defaults to storage_class.

None
minimum_size_for_multipart_upload int | None

Use object store's multipart upload for objects larger than this size in bytes. Default: 100 MB if None is passed.

None
Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
def __init__(
    self,
    concurrency: StorageConcurrencySettings | None = None,
    retries: StorageRetriesSettings | None = None,
    unsafe_use_conditional_create: bool | None = None,
    unsafe_use_conditional_update: bool | None = None,
    unsafe_use_metadata: bool | None = None,
    storage_class: str | None = None,
    metadata_storage_class: str | None = None,
    chunks_storage_class: str | None = None,
    minimum_size_for_multipart_upload: int | None = None,
) -> None:
    """
    Create a new `StorageSettings` object

    Parameters
    ----------
    concurrency: StorageConcurrencySettings | None
        The configuration for how Icechunk uses its Storage instance.

    retries: StorageRetriesSettings | None
        The configuration for how Icechunk retries failed requests.

    unsafe_use_conditional_update: bool | None
        If set to False, Icechunk loses some of its consistency guarantees.
        This is only useful in object stores that don't support the feature.
        Use it at your own risk.

    unsafe_use_conditional_create: bool | None
        If set to False, Icechunk loses some of its consistency guarantees.
        This is only useful in object stores that don't support the feature.
        Use at your own risk.

    unsafe_use_metadata: bool | None
        Don't write metadata fields in Icechunk files.
        This is only useful in object stores that don't support the feature.
        Use at your own risk.

    storage_class: str | None
        Store all objects using this object store storage class.
        If None, the object store default will be used.
        Currently not supported in GCS.
        Example: STANDARD_IA

    metadata_storage_class: str | None
        Store metadata objects using this object store storage class.
        Currently not supported in GCS.
        Defaults to storage_class.

    chunks_storage_class: str | None
        Store chunk objects using this object store storage class.
        Currently not supported in GCS.
        Defaults to storage_class.

    minimum_size_for_multipart_upload: int | None
        Use object store's multipart upload for objects larger than this size in bytes.
        Default: 100 MB if None is passed.
    """
    ...
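
These settings are typically attached to a RepositoryConfig before opening the repo. A sketch combining the examples above; the `config.storage` slot is assumed to be the StorageSettings attribute of RepositoryConfig, and STANDARD_IA is the S3 example from the docstring:

```python
import icechunk as ic

config = ic.RepositoryConfig.default()
config.storage = ic.StorageSettings(
    concurrency=ic.StorageConcurrencySettings(max_concurrent_requests_for_object=12),
    retries=ic.StorageRetriesSettings(max_tries=5),
    storage_class="STANDARD_IA",
)
repo = ic.Repository.open(storage, config=config)  # `storage` from a helper below
```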

VersionSelection #

Bases: Enum

Enum for selecting which version of a conflict to use

Attributes:

Name Type Description
Fail int

Fail the rebase operation

UseOurs int

Use the version from the source store

UseTheirs int

Use the version from the target store

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class VersionSelection(Enum):
    """Enum for selecting the which version of a conflict

    Attributes
    ----------
    Fail: int
        Fail the rebase operation
    UseOurs: int
        Use the version from the source store
    UseTheirs: int
        Use the version from the target store
    """

    Fail = 0
    UseOurs = 1
    UseTheirs = 2
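
VersionSelection values are consumed by BasicConflictSolver. For example, to prefer this session's chunks during a rebase (assuming a writable `session` with pending changes):

```python
import icechunk as ic

solver = ic.BasicConflictSolver(on_chunk_conflict=ic.VersionSelection.UseOurs)
session.rebase(solver)
```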

VirtualChunkContainer #

A virtual chunk container is a configuration that allows Icechunk to read virtual references from a storage backend.

Attributes:

Name Type Description
url_prefix str

The prefix of URLs that will use this container's configuration for reading virtual references.

store ObjectStoreConfig

The storage backend to use for the virtual chunk container.

Methods:

Name Description
__init__

Create a new VirtualChunkContainer object

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class VirtualChunkContainer:
    """A virtual chunk container is a configuration that allows Icechunk to read virtual references from a storage backend.

    Attributes
    ----------
    url_prefix: str
        The prefix of URLs that will use this container's configuration for reading virtual references.
    store: ObjectStoreConfig
        The storage backend to use for the virtual chunk container.
    """

    name: str
    url_prefix: str
    store: ObjectStoreConfig

    def __init__(self, url_prefix: str, store: AnyObjectStoreConfig):
        """
        Create a new `VirtualChunkContainer` object

        Parameters
        ----------
        url_prefix: str
            The prefix of URLs that will use this container's configuration for reading virtual references.
        store: ObjectStoreConfig
            The storage backend to use for the virtual chunk container.
        """

__init__ #

__init__(url_prefix, store)

Create a new VirtualChunkContainer object

Parameters:

Name Type Description Default
url_prefix str

The prefix of URLs that will use this container's configuration for reading virtual references.

required
store AnyObjectStoreConfig

The storage backend to use for the virtual chunk container.

required
Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
def __init__(self, url_prefix: str, store: AnyObjectStoreConfig):
    """
    Create a new `VirtualChunkContainer` object

    Parameters
    ----------
    url_prefix: str
        The prefix of URLs that will use this container's configuration for reading virtual references.
    store: ObjectStoreConfig
        The storage backend to use for the virtual chunk container.
    """

VirtualChunkSpec #

The specification for a virtual chunk reference.

Attributes:

Name Type Description
etag_checksum str | None

Optional object store e-tag for the containing object.

index list[int]

The chunk index, in chunk coordinates space

last_updated_at_checksum datetime | None

Optional timestamp for the containing object.

length int

The length of the chunk in bytes

location str

The URL to the virtual chunk data, something like 's3://bucket/foo.nc'

offset int

The chunk offset within the referenced object, in bytes

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
class VirtualChunkSpec:
    """The specification for a virtual chunk reference."""
    @property
    def index(self) -> list[int]:
        """The chunk index, in chunk coordinates space"""
        ...
    @property
    def location(self) -> str:
        """The URL to the virtual chunk data, something like 's3://bucket/foo.nc'"""
        ...
    @property
    def offset(self) -> int:
        """The chunk offset within the pointed object, in bytes"""
        ...
    @property
    def length(self) -> int:
        """The length of the chunk in bytes"""
        ...
    @property
    def etag_checksum(self) -> str | None:
        """Optional object store e-tag for the containing object.

        Icechunk will refuse to serve data from this chunk if the etag has changed.
        """
        ...
    @property
    def last_updated_at_checksum(self) -> datetime.datetime | None:
        """Optional timestamp for the containing object.

        Icechunk will refuse to serve data from this chunk if it has been modified in object store after this time.
        """
        ...

    def __init__(
        self,
        index: list[int],
        location: str,
        offset: int,
        length: int,
        etag_checksum: str | None = None,
        last_updated_at_checksum: datetime.datetime | None = None,
    ) -> None: ...

etag_checksum property #

etag_checksum

Optional object store e-tag for the containing object.

Icechunk will refuse to serve data from this chunk if the etag has changed.

index property #

index

The chunk index, in chunk coordinates space

last_updated_at_checksum property #

last_updated_at_checksum

Optional timestamp for the containing object.

Icechunk will refuse to serve data from this chunk if it has been modified in object store after this time.

length property #

length

The length of the chunk in bytes

location property #

location

The URL to the virtual chunk data, something like 's3://bucket/foo.nc'

offset property #

offset

The chunk offset within the referenced object, in bytes
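
A sketch of registering a virtual chunk reference. The array path, byte range, and the `set_virtual_refs` call on the session's store are assumptions for illustration:

```python
import datetime
import icechunk as ic

chunks = [
    ic.VirtualChunkSpec(
        index=[0, 0],
        location="s3://somebucket/data/foo.nc",
        offset=8192,
        length=4096,
        last_updated_at_checksum=datetime.datetime(
            2024, 1, 1, tzinfo=datetime.timezone.utc
        ),
    ),
]
session.store.set_virtual_refs("data/temperature", chunks)
```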

_upgrade_icechunk_repository #

_upgrade_icechunk_repository(repo, *, dry_run=True, delete_unused_v1_files=False)

Migrate a repository to the latest version of Icechunk.

This is an administrative operation, and must be executed in isolation from other readers and writers. Other processes running concurrently on the same repo may see undefined behavior.

At this time, this function supports only migration from Icechunk spec version 1 to Icechunk spec version 2. This means Icechunk versions 1.x to 2.x.

The operation is usually fast, but it can take several minutes if there is a very large version history (thousands of snapshots).

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
def _upgrade_icechunk_repository(
    repo: PyRepository, *, dry_run: bool = True, delete_unused_v1_files: bool = False
) -> None:
    """
    Migrate a repository to the latest version of Icechunk.

    This is an administrative operation, and must be executed in isolation from
    other readers and writers. Other processes running concurrently on the same
    repo may see undefined behavior.

    At this time, this function supports only migration from Icechunk spec version 1
    to Icechunk spec version 2. This means Icechunk versions 1.x to 2.x.

    The operation is usually fast, but it can take several minutes if there is a very
    large version history (thousands of snapshots).
    """
    ...
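
A sketch of a cautious migration: inspect the dry run first, then apply. Whether the public Repository object can be passed directly as the `repo` argument is an assumption here:

```python
import icechunk as ic

ic._upgrade_icechunk_repository(repo, dry_run=True)   # report only, change nothing
ic._upgrade_icechunk_repository(repo, dry_run=False)  # perform the migration
```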

azure_credentials #

azure_credentials(*, access_key=None, sas_token=None, bearer_token=None, from_env=None)

Create credentials for an Azure Blob Storage object store.

If all arguments are None, credentials are fetched from the operating system environment.

Source code in icechunk-python/python/icechunk/credentials.py
def azure_credentials(
    *,
    access_key: str | None = None,
    sas_token: str | None = None,
    bearer_token: str | None = None,
    from_env: bool | None = None,
) -> AnyAzureCredential:
    """Create credentials Azure Blob Storage object store.

    If all arguments are None, credentials are fetched from the operative system environment.
    """
    if (from_env is None or from_env) and (
        access_key is None and sas_token is None and bearer_token is None
    ):
        return azure_from_env_credentials()

    if (access_key is not None or sas_token is not None or bearer_token is not None) and (
        from_env is None or not from_env
    ):
        return AzureCredentials.Static(
            azure_static_credentials(
                access_key=access_key,
                sas_token=sas_token,
                bearer_token=bearer_token,
            )
        )

    raise ValueError("Conflicting arguments to azure_credentials function")

azure_from_env_credentials #

azure_from_env_credentials()

Instruct Azure Blob Storage object store to fetch credentials from the operating system environment.

Source code in icechunk-python/python/icechunk/credentials.py
def azure_from_env_credentials() -> AzureCredentials.FromEnv:
    """Instruct Azure Blob Storage object store to fetch credentials from the operative system environment."""
    return AzureCredentials.FromEnv()

azure_static_credentials #

azure_static_credentials(*, access_key=None, sas_token=None, bearer_token=None)

Create static credentials for an Azure Blob Storage object store.

Source code in icechunk-python/python/icechunk/credentials.py
def azure_static_credentials(
    *,
    access_key: str | None = None,
    sas_token: str | None = None,
    bearer_token: str | None = None,
) -> AnyAzureStaticCredential:
    """Create static credentials Azure Blob Storage object store."""
    if [access_key, sas_token, bearer_token].count(None) != 2:
        raise ValueError("Conflicting arguments to azure_static_credentials function")
    if access_key is not None:
        return AzureStaticCredentials.AccessKey(access_key)
    if sas_token is not None:
        return AzureStaticCredentials.SasToken(sas_token)
    if bearer_token is not None:
        return AzureStaticCredentials.BearerToken(bearer_token)
    raise ValueError(
        "No valid static credential provided for Azure Blob Storage object store"
    )

azure_storage #

azure_storage(*, account, container, prefix, access_key=None, sas_token=None, bearer_token=None, from_env=None, config=None)

Create a Storage instance that saves data in Azure Blob Storage object store.

Parameters:

Name Type Description Default
account str

The account to which the caller must have access privileges

required
container str

The container where the repository will store its data

required
prefix str

The prefix within the container that is the root directory of the repository

required
access_key str | None

Azure Blob Storage credential access key

None
sas_token str | None

Azure Blob Storage credential SAS token

None
bearer_token str | None

Azure Blob Storage credential bearer token

None
from_env bool | None

Fetch credentials from the operating system environment

None
config dict[str, str] | None

A dictionary of options for the Azure Blob Storage object store. See https://docs.rs/object_store/latest/object_store/azure/enum.AzureConfigKey.html#variants for a list of possible configuration keys.

None
Source code in icechunk-python/python/icechunk/storage.py
def azure_storage(
    *,
    account: str,
    container: str,
    prefix: str,
    access_key: str | None = None,
    sas_token: str | None = None,
    bearer_token: str | None = None,
    from_env: bool | None = None,
    config: dict[str, str] | None = None,
) -> Storage:
    """Create a Storage instance that saves data in Azure Blob Storage object store.

    Parameters
    ----------
    account: str
        The account to which the caller must have access privileges
    container: str
        The container where the repository will store its data
    prefix: str
        The prefix within the container that is the root directory of the repository
    access_key: str | None
        Azure Blob Storage credential access key
    sas_token: str | None
        Azure Blob Storage credential SAS token
    bearer_token: str | None
        Azure Blob Storage credential bearer token
    from_env: bool | None
        Fetch credentials from the operating system environment
    config: dict[str, str] | None
        A dictionary of options for the Azure Blob Storage object store. See https://docs.rs/object_store/latest/object_store/azure/enum.AzureConfigKey.html#variants for a list of possible configuration keys.
    """
    credentials = azure_credentials(
        access_key=access_key,
        sas_token=sas_token,
        bearer_token=bearer_token,
        from_env=from_env,
    )
    return Storage.new_azure_blob(
        account=account,
        container=container,
        prefix=prefix,
        credentials=credentials,
        config=config,
    )
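
A sketch using environment-provided credentials; the account, container, and prefix values are hypothetical:

```python
import icechunk as ic

storage = ic.azure_storage(
    account="myaccount",
    container="mycontainer",
    prefix="repos/demo",
    from_env=True,
)
```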

containers_credentials #

containers_credentials(m)

Build a map of credentials for virtual chunk containers.

Parameters:

Name Type Description Default
m Mapping[str, AnyS3Credential | AnyGcsCredential | AnyAzureCredential | None]

A mapping from container URL prefixes to credentials.

required

Examples:

import icechunk as ic

config = ic.RepositoryConfig.default()
config.inline_chunk_threshold_bytes = 512

virtual_store_config = ic.s3_store(
    region="us-east-1",
    endpoint_url="http://localhost:9000",
    allow_http=True,
    s3_compatible=True,
    force_path_style=True,
)
container = ic.VirtualChunkContainer("s3://somebucket", virtual_store_config)
config.set_virtual_chunk_container(container)
credentials = ic.containers_credentials(
    {"s3://somebucket": ic.s3_credentials(access_key_id="ACCESS_KEY", secret_access_key="SECRET"}
)

repo = ic.Repository.create(
    storage=ic.local_filesystem_storage(store_path),
    config=config,
    authorize_virtual_chunk_access=credentials,
)
Source code in icechunk-python/python/icechunk/credentials.py
def containers_credentials(
    m: Mapping[str, AnyS3Credential | AnyGcsCredential | AnyAzureCredential | None],
) -> dict[str, AnyCredential | None]:
    """Build a map of credentials for virtual chunk containers.

    Parameters
    ----------
    m: Mapping[str, AnyS3Credential | AnyGcsCredential | AnyAzureCredential | None]
        A mapping from container URL prefixes to credentials.

    Examples
    --------
    ```python
    import icechunk as ic

    config = ic.RepositoryConfig.default()
    config.inline_chunk_threshold_bytes = 512

    virtual_store_config = ic.s3_store(
        region="us-east-1",
        endpoint_url="http://localhost:9000",
        allow_http=True,
        s3_compatible=True,
        force_path_style=True,
    )
    container = ic.VirtualChunkContainer("s3://somebucket", virtual_store_config)
    config.set_virtual_chunk_container(container)
    credentials = ic.containers_credentials(
        {"s3://somebucket": ic.s3_credentials(access_key_id="ACCESS_KEY", secret_access_key="SECRET"}
    )

    repo = ic.Repository.create(
        storage=ic.local_filesystem_storage(store_path),
        config=config,
        authorize_virtual_chunk_access=credentials,
    )
    ```

    """
    res: dict[str, AnyCredential | None] = {}
    for name, cred in m.items():
        if cred is None:
            res[name] = None
        elif isinstance(cred, AnyS3Credential):
            res[name] = Credentials.S3(cred)
        elif (
            isinstance(cred, GcsCredentials.FromEnv)
            or isinstance(cred, GcsCredentials.Static)
            or isinstance(cred, GcsCredentials.Refreshable)
            or isinstance(cred, GcsCredentials.Anonymous)
        ):
            res[name] = Credentials.Gcs(cast(GcsCredentials, cred))
        elif isinstance(cred, AzureCredentials.FromEnv) or isinstance(
            cred, AzureCredentials.Static
        ):
            res[name] = Credentials.Azure(cast(AzureCredentials, cred))
        else:
            raise ValueError(f"Unknown credential type {type(cred)}")
    return res

gcs_credentials #

gcs_credentials(*, service_account_file=None, service_account_key=None, application_credentials=None, bearer_token=None, from_env=None, anonymous=None, get_credentials=None, scatter_initial_credentials=False)

Create credentials for a Google Cloud Storage object store.

If all arguments are None, credentials are fetched from the operating system environment.

Source code in icechunk-python/python/icechunk/credentials.py
def gcs_credentials(
    *,
    service_account_file: str | None = None,
    service_account_key: str | None = None,
    application_credentials: str | None = None,
    bearer_token: str | None = None,
    from_env: bool | None = None,
    anonymous: bool | None = None,
    get_credentials: Callable[[], GcsBearerCredential] | None = None,
    scatter_initial_credentials: bool = False,
) -> AnyGcsCredential:
    """Create credentials Google Cloud Storage object store.

    If all arguments are None, credentials are fetched from the operative system environment.
    """
    if anonymous is not None and anonymous:
        return gcs_anonymous_credentials()

    if (from_env is None or from_env) and (
        service_account_file is None
        and service_account_key is None
        and application_credentials is None
        and bearer_token is None
    ):
        return gcs_from_env_credentials()

    if (
        service_account_file is not None
        or service_account_key is not None
        or application_credentials is not None
        or bearer_token is not None
    ) and (from_env is None or not from_env):
        return GcsCredentials.Static(
            gcs_static_credentials(
                service_account_file=service_account_file,
                service_account_key=service_account_key,
                application_credentials=application_credentials,
                bearer_token=bearer_token,
            )
        )

    if get_credentials is not None:
        return gcs_refreshable_credentials(
            get_credentials, scatter_initial_credentials=scatter_initial_credentials
        )

    raise ValueError("Conflicting arguments to gcs_credentials function")

gcs_from_env_credentials #

gcs_from_env_credentials()

Instruct Google Cloud Storage object store to fetch credentials from the operating system environment.

Source code in icechunk-python/python/icechunk/credentials.py
def gcs_from_env_credentials() -> GcsCredentials.FromEnv:
    """Instruct Google Cloud Storage object store to fetch credentials from the operative system environment."""
    return GcsCredentials.FromEnv()

gcs_refreshable_credentials #

gcs_refreshable_credentials(get_credentials, scatter_initial_credentials=False)

Create refreshable credentials for Google Cloud Storage object store.

Parameters:

Name Type Description Default
get_credentials Callable[[], GcsBearerCredential]

Use this function to get and refresh the credentials. The function must be picklable.

required
scatter_initial_credentials bool

Immediately call and store the value returned by get_credentials. This is useful if the repo or session will be pickled to generate many copies. Passing scatter_initial_credentials=True will ensure all those copies don't need to call get_credentials immediately. After the initial set of credentials has expired, the cached value is no longer used. Notice that credentials obtained are stored, and they can be sent over the network if you pickle the session/repo.

False
Source code in icechunk-python/python/icechunk/credentials.py
def gcs_refreshable_credentials(
    get_credentials: Callable[[], GcsBearerCredential],
    scatter_initial_credentials: bool = False,
) -> GcsCredentials.Refreshable:
    """Create refreshable credentials for Google Cloud Storage object store.

    Parameters
    ----------
    get_credentials: Callable[[], GcsBearerCredential]
        Use this function to get and refresh the credentials. The function must be picklable.
    scatter_initial_credentials: bool, optional
        Immediately call and store the value returned by get_credentials. This is useful if the
        repo or session will be pickled to generate many copies. Passing scatter_initial_credentials=True will
        ensure all those copies don't need to call get_credentials immediately. After the initial
        set of credentials has expired, the cached value is no longer used. Notice that credentials
        obtained are stored, and they can be sent over the network if you pickle the session/repo.
    """

    current = get_credentials() if scatter_initial_credentials else None
    return GcsCredentials.Refreshable(pickle.dumps(get_credentials), current)
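
A sketch of a refreshable credential; the token provider and the exact GcsBearerCredential constructor arguments are assumptions for illustration:

```python
import icechunk as ic

def fetch_token() -> ic.GcsBearerCredential:
    token = my_token_provider()  # hypothetical: returns a fresh OAuth2 token
    return ic.GcsBearerCredential(bearer=token)  # assumed keyword argument

creds = ic.gcs_refreshable_credentials(fetch_token, scatter_initial_credentials=True)
```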

gcs_static_credentials #

gcs_static_credentials(*, service_account_file=None, service_account_key=None, application_credentials=None, bearer_token=None)

Create static credentials for a Google Cloud Storage object store.

Source code in icechunk-python/python/icechunk/credentials.py
def gcs_static_credentials(
    *,
    service_account_file: str | None = None,
    service_account_key: str | None = None,
    application_credentials: str | None = None,
    bearer_token: str | None = None,
) -> AnyGcsStaticCredential:
    """Create static credentials Google Cloud Storage object store."""
    if service_account_file is not None:
        return GcsStaticCredentials.ServiceAccount(service_account_file)
    if service_account_key is not None:
        return GcsStaticCredentials.ServiceAccountKey(service_account_key)
    if application_credentials is not None:
        return GcsStaticCredentials.ApplicationCredentials(application_credentials)
    if bearer_token is not None:
        return GcsStaticCredentials.BearerToken(bearer_token)
    raise ValueError("Conflicting arguments to gcs_static_credentials function")

gcs_storage #

gcs_storage(*, bucket, prefix, service_account_file=None, service_account_key=None, application_credentials=None, bearer_token=None, anonymous=None, from_env=None, config=None, get_credentials=None, scatter_initial_credentials=False)

Create a Storage instance that saves data in Google Cloud Storage object store.

Parameters:

Name Type Description Default
bucket str

The bucket where the repository will store its data

required
prefix str | None

The prefix within the bucket that is the root directory of the repository

required
service_account_file str | None

The path to the service account file

None
service_account_key str | None

The service account key

None
application_credentials str | None

The path to the application credentials file

None
bearer_token str | None

The bearer token to use for the object store

None
anonymous bool | None

If set to True requests to the object store will not be signed

None
from_env bool | None

Fetch credentials from the operating system environment

None
config dict[str, str] | None

A dictionary of options for the Google Cloud Storage object store. See https://docs.rs/object_store/latest/object_store/gcp/enum.GoogleConfigKey.html#variants for a list of possible configuration keys.

None
get_credentials Callable[[], GcsBearerCredential] | None

Use this function to get and refresh object store credentials

None
scatter_initial_credentials bool

Immediately call and store the value returned by get_credentials. This is useful if the repo or session will be pickled to generate many copies. Passing scatter_initial_credentials=True will ensure all those copies don't need to call get_credentials immediately. After the initial set of credentials has expired, the cached value is no longer used. Notice that credentials obtained are stored, and they can be sent over the network if you pickle the session/repo.

False
Source code in icechunk-python/python/icechunk/storage.py
def gcs_storage(
    *,
    bucket: str,
    prefix: str | None,
    service_account_file: str | None = None,
    service_account_key: str | None = None,
    application_credentials: str | None = None,
    bearer_token: str | None = None,
    anonymous: bool | None = None,
    from_env: bool | None = None,
    config: dict[str, str] | None = None,
    get_credentials: Callable[[], GcsBearerCredential] | None = None,
    scatter_initial_credentials: bool = False,
) -> Storage:
    """Create a Storage instance that saves data in Google Cloud Storage object store.

    Parameters
    ----------
    bucket: str
        The bucket where the repository will store its data
    prefix: str | None
        The prefix within the bucket that is the root directory of the repository
    service_account_file: str | None
        The path to the service account file
    service_account_key: str | None
        The service account key
    application_credentials: str | None
        The path to the application credentials file
    bearer_token: str | None
        The bearer token to use for the object store
    anonymous: bool | None
        If set to True requests to the object store will not be signed
    from_env: bool | None
        Fetch credentials from the operating system environment
    config: dict[str, str] | None
        A dictionary of options for the Google Cloud Storage object store. See https://docs.rs/object_store/latest/object_store/gcp/enum.GoogleConfigKey.html#variants for a list of possible configuration keys.
    get_credentials: Callable[[], GcsBearerCredential] | None
        Use this function to get and refresh object store credentials
    scatter_initial_credentials: bool, optional
        Immediately call and store the value returned by get_credentials. This is useful if the
        repo or session will be pickled to generate many copies. Passing scatter_initial_credentials=True will
        ensure all those copies don't need to call get_credentials immediately. After the initial
        set of credentials has expired, the cached value is no longer used. Notice that credentials
        obtained are stored, and they can be sent over the network if you pickle the session/repo.
    """
    credentials = gcs_credentials(
        service_account_file=service_account_file,
        service_account_key=service_account_key,
        application_credentials=application_credentials,
        bearer_token=bearer_token,
        from_env=from_env,
        anonymous=anonymous,
        get_credentials=get_credentials,
        scatter_initial_credentials=scatter_initial_credentials,
    )
    return Storage.new_gcs(
        bucket=bucket,
        prefix=prefix,
        credentials=credentials,
        config=config,
    )

gcs_store #

gcs_store(opts=None)

Build an ObjectStoreConfig instance for Google Cloud Storage object stores.

Parameters:

Name Type Description Default
opts dict[str, str] | None

A dictionary of options for the Google Cloud Storage object store. See https://docs.rs/object_store/latest/object_store/gcp/enum.GoogleConfigKey.html#variants for a list of possible configuration keys.

None
Source code in icechunk-python/python/icechunk/storage.py
def gcs_store(
    opts: dict[str, str] | None = None,
) -> ObjectStoreConfig.Gcs:
    """Build an ObjectStoreConfig instance for Google Cloud Storage object stores.

    Parameters
    ----------
    opts: dict[str, str] | None
        A dictionary of options for the Google Cloud Storage object store. See https://docs.rs/object_store/latest/object_store/gcp/enum.GoogleConfigKey.html#variants for a list of possible configuration keys.
    """
    return ObjectStoreConfig.Gcs(opts)

http_storage #

http_storage(base_url, opts=None)

Create a read-only Storage instance that reads data from an HTTP(S) server

Parameters:

Name Type Description Default
base_url str

The URL path to the root of the repository

required
opts dict[str, str] | None

A dictionary of options for the HTTP object store. See https://docs.rs/object_store/latest/object_store/client/enum.ClientConfigKey.html#variants for a list of possible keys in snake case format.

None
Source code in icechunk-python/python/icechunk/storage.py
def http_storage(base_url: str, opts: dict[str, str] | None = None) -> Storage:
    """Create a read-only Storage instance that reads data from an HTTP(s) server

    Parameters
    ----------
    base_url: str
        The URL path to the root of the repository
    opts: dict[str, str] | None
        A dictionary of options for the HTTP object store. See https://docs.rs/object_store/latest/object_store/client/enum.ClientConfigKey.html#variants for a list of possible keys in snake case format.
    """
    return Storage.new_http(base_url, opts)
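
Example (a minimal sketch; the URL is a placeholder). Because the resulting Storage is read-only, it is suited to opening an existing repository rather than writing one:

import icechunk

# Point at the root of a repository served over HTTP(S).
storage = icechunk.http_storage("https://server.example.com/my-repo")
repo = icechunk.Repository.open(storage)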

http_store #

http_store(opts=None)

Build an ObjectStoreConfig instance for HTTP object stores.

Parameters:

Name Type Description Default
opts dict[str, str] | None

A dictionary of options for the HTTP object store. See https://docs.rs/object_store/latest/object_store/client/enum.ClientConfigKey.html#variants for a list of possible keys in snake case format.

None
Source code in icechunk-python/python/icechunk/storage.py
def http_store(
    opts: dict[str, str] | None = None,
) -> ObjectStoreConfig.Http:
    """Build an ObjectStoreConfig instance for HTTP object stores.

    Parameters
    ----------
    opts: dict[str, str] | None
        A dictionary of options for the HTTP object store. See https://docs.rs/object_store/latest/object_store/client/enum.ClientConfigKey.html#variants for a list of possible keys in snake case format.
    """
    return ObjectStoreConfig.Http(opts)

in_memory_storage #

in_memory_storage()

Create a Storage instance that saves data in memory.

This Storage implementation is used for tests. Data will be lost after the process finishes, and can only be accessed through the Storage instance returned. Different instances don't share data.

Source code in icechunk-python/python/icechunk/storage.py
def in_memory_storage() -> Storage:
    """Create a Storage instance that saves data in memory.

    This Storage implementation is used for tests. Data will be lost after the process finishes, and can only be accessed through the Storage instance returned. Different instances don't share data."""
    return Storage.new_in_memory()
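
Example (a minimal sketch; handy for tests, since all data is lost when the process exits):

import icechunk

# Every call returns an independent, empty in-memory store.
storage = icechunk.in_memory_storage()
repo = icechunk.Repository.create(storage)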

initialize_logs #

initialize_logs()

Initialize the logging system for the library.

Reads the value of the environment variable ICECHUNK_LOG to obtain the filters. This is automatically called on import icechunk.

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
def initialize_logs() -> None:
    """
    Initialize the logging system for the library.

    Reads the value of the environment variable ICECHUNK_LOG to obtain the filters.
    This is automatically called on `import icechunk`.
    """
    ...

local_filesystem_storage #

local_filesystem_storage(path)

Create a Storage instance that saves data in the local file system.

This Storage instance is not recommended for production data

Source code in icechunk-python/python/icechunk/storage.py
def local_filesystem_storage(path: str) -> Storage:
    """Create a Storage instance that saves data in the local file system.

    This Storage instance is not recommended for production data
    """
    return Storage.new_local_filesystem(path)
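
Example (a minimal sketch using a temporary directory, in keeping with the note above that this backend is not recommended for production data):

import tempfile

import icechunk

# Store the repository under a throwaway local directory.
storage = icechunk.local_filesystem_storage(tempfile.mkdtemp())
repo = icechunk.Repository.open_or_create(storage)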

local_filesystem_store #

local_filesystem_store(path)

Build an ObjectStoreConfig instance for local file stores.

Parameters:

Name Type Description Default
path str

The root directory for the store.

required
Source code in icechunk-python/python/icechunk/storage.py
def local_filesystem_store(
    path: str,
) -> ObjectStoreConfig.LocalFileSystem:
    """Build an ObjectStoreConfig instance for local file stores.

    Parameters
    ----------
    path: str
        The root directory for the store.
    """
    return ObjectStoreConfig.LocalFileSystem(path)

r2_storage #

r2_storage(*, bucket=None, prefix=None, account_id=None, endpoint_url=None, region=None, allow_http=False, access_key_id=None, secret_access_key=None, session_token=None, expires_after=None, anonymous=None, from_env=None, get_credentials=None, scatter_initial_credentials=False, network_stream_timeout_seconds=60)

Create a Storage instance that saves data in Cloudflare R2 object store.

Parameters:

Name Type Description Default
bucket str | None

The bucket name

None
prefix str | None

The prefix within the bucket that is the root directory of the repository

None
account_id str | None

Cloudflare account ID. When provided, a default endpoint URL is constructed as https://<ACCOUNT_ID>.r2.cloudflarestorage.com. If not provided, endpoint_url must be provided instead.

None
endpoint_url str | None

Endpoint where the object store serves data, example: https://<ACCOUNT_ID>.r2.cloudflarestorage.com

None
region str | None

The region to use in the object store; if None, the default region 'auto' will be used

None
allow_http bool

If the object store can be accessed using http protocol instead of https

False
access_key_id str | None

S3 credential access key

None
secret_access_key str | None

S3 credential secret access key

None
session_token str | None

Optional S3 credential session token

None
expires_after datetime | None

Optional expiration for the object store credentials

None
anonymous bool | None

If set to True, requests to the object store will not be signed

None
from_env bool | None

Fetch credentials from the operating system environment

None
get_credentials Callable[[], S3StaticCredentials] | None

Use this function to get and refresh object store credentials

None
scatter_initial_credentials bool

Immediately call and store the value returned by get_credentials. This is useful if the repo or session will be pickled to generate many copies. Passing scatter_initial_credentials=True will ensure all those copies don't need to call get_credentials immediately. After the initial set of credentials has expired, the cached value is no longer used. Notice that credentials obtained are stored, and they can be sent over the network if you pickle the session/repo.

False
network_stream_timeout_seconds int

Timeout requests if no bytes can be transmitted during this period of time. If set to 0, timeout is disabled.

60
Source code in icechunk-python/python/icechunk/storage.py
def r2_storage(
    *,
    bucket: str | None = None,
    prefix: str | None = None,
    account_id: str | None = None,
    endpoint_url: str | None = None,
    region: str | None = None,
    allow_http: bool = False,
    access_key_id: str | None = None,
    secret_access_key: str | None = None,
    session_token: str | None = None,
    expires_after: datetime | None = None,
    anonymous: bool | None = None,
    from_env: bool | None = None,
    get_credentials: Callable[[], S3StaticCredentials] | None = None,
    scatter_initial_credentials: bool = False,
    network_stream_timeout_seconds: int = 60,
) -> Storage:
    """Create a Storage instance that saves data in Tigris object store.

    Parameters
    ----------
    bucket: str | None
        The bucket name
    prefix: str | None
        The prefix within the bucket that is the root directory of the repository
    account_id: str | None
        Cloudflare account ID. When provided, a default endpoint URL is constructed as
        `https://<ACCOUNT_ID>.r2.cloudflarestorage.com`. If not provided, `endpoint_url`
        must be provided instead.
    endpoint_url: str | None
        Endpoint where the object store serves data, example: `https://<ACCOUNT_ID>.r2.cloudflarestorage.com`
    region: str | None
        The region to use in the object store; if `None`, the default region 'auto' will be used
    allow_http: bool
        If the object store can be accessed using http protocol instead of https
    access_key_id: str | None
        S3 credential access key
    secret_access_key: str | None
        S3 credential secret access key
    session_token: str | None
        Optional S3 credential session token
    expires_after: datetime | None
        Optional expiration for the object store credentials
    anonymous: bool | None
        If set to True, requests to the object store will not be signed
    from_env: bool | None
        Fetch credentials from the operating system environment
    get_credentials: Callable[[], S3StaticCredentials] | None
        Use this function to get and refresh object store credentials
    scatter_initial_credentials: bool, optional
        Immediately call and store the value returned by get_credentials. This is useful if the
        repo or session will be pickled to generate many copies. Passing scatter_initial_credentials=True will
        ensure all those copies don't need to call get_credentials immediately. After the initial
        set of credentials has expired, the cached value is no longer used. Notice that credentials
        obtained are stored, and they can be sent over the network if you pickle the session/repo.
    network_stream_timeout_seconds: int
        Timeout requests if no bytes can be transmitted during this period of time.
        If set to 0, timeout is disabled.
    """
    credentials = s3_credentials(
        access_key_id=access_key_id,
        secret_access_key=secret_access_key,
        session_token=session_token,
        expires_after=expires_after,
        anonymous=anonymous,
        from_env=from_env,
        get_credentials=get_credentials,
        scatter_initial_credentials=scatter_initial_credentials,
    )
    options = S3Options(
        region=region,
        endpoint_url=endpoint_url,
        allow_http=allow_http,
        network_stream_timeout_seconds=network_stream_timeout_seconds,
        anonymous=anonymous or False,
    )
    return Storage.new_r2(
        config=options,
        bucket=bucket,
        prefix=prefix,
        account_id=account_id,
        credentials=credentials,
    )
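
Example (a minimal sketch; the account id, bucket, and keys are placeholders). Passing account_id lets Icechunk construct the default https://<ACCOUNT_ID>.r2.cloudflarestorage.com endpoint:

import icechunk

storage = icechunk.r2_storage(
    bucket="my-bucket",
    prefix="my-repo",
    account_id="<ACCOUNT_ID>",                # placeholder Cloudflare account ID
    access_key_id="<ACCESS_KEY_ID>",          # placeholder R2 API token credentials
    secret_access_key="<SECRET_ACCESS_KEY>",
)
repo = icechunk.Repository.open_or_create(storage)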

s3_anonymous_credentials #

s3_anonymous_credentials()

Create no-signature credentials for S3 and S3 compatible object stores.

Source code in icechunk-python/python/icechunk/credentials.py
def s3_anonymous_credentials() -> S3Credentials.Anonymous:
    """Create no-signature credentials for S3 and S3 compatible object stores."""
    return S3Credentials.Anonymous()

s3_credentials #

s3_credentials(*, access_key_id=None, secret_access_key=None, session_token=None, expires_after=None, anonymous=None, from_env=None, get_credentials=None, scatter_initial_credentials=False)

Create credentials for S3 and S3 compatible object stores.

If all arguments are None, credentials are fetched from the environment.

Parameters:

Name Type Description Default
access_key_id str | None

S3 credential access key

None
secret_access_key str | None

S3 credential secret access key

None
session_token str | None

Optional S3 credential session token

None
expires_after datetime | None

Optional expiration for the object store credentials

None
anonymous bool | None

If set to True, requests to the object store will not be signed

None
from_env bool | None

Fetch credentials from the operating system environment

None
get_credentials Callable[[], S3StaticCredentials] | None

Use this function to get and refresh object store credentials

None
scatter_initial_credentials bool

Immediately call and store the value returned by get_credentials. This is useful if the repo or session will be pickled to generate many copies. Passing scatter_initial_credentials=True will ensure all those copies don't need to call get_credentials immediately. After the initial set of credentials has expired, the cached value is no longer used. Notice that credentials obtained are stored, and they can be sent over the network if you pickle the session/repo.

False
Source code in icechunk-python/python/icechunk/credentials.py
def s3_credentials(
    *,
    access_key_id: str | None = None,
    secret_access_key: str | None = None,
    session_token: str | None = None,
    expires_after: datetime | None = None,
    anonymous: bool | None = None,
    from_env: bool | None = None,
    get_credentials: Callable[[], S3StaticCredentials] | None = None,
    scatter_initial_credentials: bool = False,
) -> AnyS3Credential:
    """Create credentials for S3 and S3 compatible object stores.

    If all arguments are None, credentials are fetched from the environment.

    Parameters
    ----------
    access_key_id: str | None
        S3 credential access key
    secret_access_key: str | None
        S3 credential secret access key
    session_token: str | None
        Optional S3 credential session token
    expires_after: datetime | None
        Optional expiration for the object store credentials
    anonymous: bool | None
        If set to True, requests to the object store will not be signed
    from_env: bool | None
        Fetch credentials from the operating system environment
    get_credentials: Callable[[], S3StaticCredentials] | None
        Use this function to get and refresh object store credentials
    scatter_initial_credentials: bool, optional
        Immediately call and store the value returned by get_credentials. This is useful if the
        repo or session will be pickled to generate many copies. Passing scatter_initial_credentials=True will
        ensure all those copies don't need to call get_credentials immediately. After the initial
        set of credentials has expired, the cached value is no longer used. Notice that credentials
        obtained are stored, and they can be sent over the network if you pickle the session/repo.
    """
    if (
        (from_env is None or from_env)
        and access_key_id is None
        and secret_access_key is None
        and session_token is None
        and expires_after is None
        and not anonymous
        and get_credentials is None
    ):
        return s3_from_env_credentials()

    if (
        anonymous
        and access_key_id is None
        and secret_access_key is None
        and session_token is None
        and expires_after is None
        and not from_env
        and get_credentials is None
    ):
        return s3_anonymous_credentials()

    if (
        get_credentials is not None
        and access_key_id is None
        and secret_access_key is None
        and session_token is None
        and expires_after is None
        and not from_env
        and not anonymous
    ):
        return s3_refreshable_credentials(
            get_credentials, scatter_initial_credentials=scatter_initial_credentials
        )

    if (
        access_key_id
        and secret_access_key
        and not from_env
        and not anonymous
        and get_credentials is None
    ):
        return s3_static_credentials(
            access_key_id=access_key_id,
            secret_access_key=secret_access_key,
            session_token=session_token,
            expires_after=expires_after,
        )

    raise ValueError("Conflicting arguments to s3_credentials function")
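
Example (each call below exercises a different branch of the dispatch logic shown above; the static keys are placeholders):

import icechunk

env_creds = icechunk.s3_credentials(from_env=True)    # returns S3Credentials.FromEnv
anon_creds = icechunk.s3_credentials(anonymous=True)  # returns S3Credentials.Anonymous
static_creds = icechunk.s3_credentials(               # returns S3Credentials.Static
    access_key_id="<ACCESS_KEY_ID>",
    secret_access_key="<SECRET_ACCESS_KEY>",
)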

s3_from_env_credentials #

s3_from_env_credentials()

Instruct S3 and S3 compatible object stores to gather credentials from the operating system environment.

Source code in icechunk-python/python/icechunk/credentials.py
def s3_from_env_credentials() -> S3Credentials.FromEnv:
    """Instruct S3 and S3 compatible object stores to gather credentials from the operative system environment."""
    return S3Credentials.FromEnv()

s3_refreshable_credentials #

s3_refreshable_credentials(get_credentials, scatter_initial_credentials=False)

Create refreshable credentials for S3 and S3 compatible object stores.

Parameters:

Name Type Description Default
get_credentials Callable[[], S3StaticCredentials]

Use this function to get and refresh the credentials. The function must be picklable.

required
scatter_initial_credentials bool

Immediately call and store the value returned by get_credentials. This is useful if the repo or session will be pickled to generate many copies. Passing scatter_initial_credentials=True will ensure all those copies don't need to call get_credentials immediately. After the initial set of credentials has expired, the cached value is no longer used. Notice that credentials obtained are stored, and they can be sent over the network if you pickle the session/repo.

False
Source code in icechunk-python/python/icechunk/credentials.py
def s3_refreshable_credentials(
    get_credentials: Callable[[], S3StaticCredentials],
    scatter_initial_credentials: bool = False,
) -> S3Credentials.Refreshable:
    """Create refreshable credentials for S3 and S3 compatible object stores.

    Parameters
    ----------
    get_credentials: Callable[[], S3StaticCredentials]
        Use this function to get and refresh the credentials. The function must be picklable.
    scatter_initial_credentials: bool, optional
        Immediately call and store the value returned by get_credentials. This is useful if the
        repo or session will be pickled to generate many copies. Passing scatter_initial_credentials=True will
        ensure all those copies don't need to call get_credentials immediately. After the initial
        set of credentials has expired, the cached value is no longer used. Notice that credentials
        obtained are stored, and they can be sent over the network if you pickle the session/repo.
    """
    current = get_credentials() if scatter_initial_credentials else None
    return S3Credentials.Refreshable(pickle.dumps(get_credentials), current)
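
Example (a minimal sketch; fetch_credentials is a hypothetical provider that returns hard-coded placeholders here, where a real implementation would call a credential service such as STS). Note that it is a picklable top-level function, not a lambda or closure:

from datetime import datetime, timedelta, timezone

import icechunk

def fetch_credentials() -> icechunk.S3StaticCredentials:
    # Hypothetical provider: replace the placeholders with a real lookup.
    return icechunk.S3StaticCredentials(
        access_key_id="<ACCESS_KEY_ID>",
        secret_access_key="<SECRET_ACCESS_KEY>",
        expires_after=datetime.now(timezone.utc) + timedelta(hours=1),
    )

credentials = icechunk.s3_refreshable_credentials(
    fetch_credentials,
    scatter_initial_credentials=True,  # fetch once now so pickled copies start warm
)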

s3_static_credentials #

s3_static_credentials(*, access_key_id, secret_access_key, session_token=None, expires_after=None)

Create static credentials for S3 and S3 compatible object stores.

Parameters:

Name Type Description Default
access_key_id str

S3 credential access key

required
secret_access_key str

S3 credential secret access key

required
session_token str | None

Optional S3 credential session token

None
expires_after datetime | None

Optional expiration for the object store credentials

None
Source code in icechunk-python/python/icechunk/credentials.py
def s3_static_credentials(
    *,
    access_key_id: str,
    secret_access_key: str,
    session_token: str | None = None,
    expires_after: datetime | None = None,
) -> S3Credentials.Static:
    """Create static credentials for S3 and S3 compatible object stores.

    Parameters
    ----------
    access_key_id: str
        S3 credential access key
    secret_access_key: str
        S3 credential secret access key
    session_token: str | None
        Optional S3 credential session token
    expires_after: datetime | None
        Optional expiration for the object store credentials
    """
    return S3Credentials.Static(
        S3StaticCredentials(
            access_key_id=access_key_id,
            secret_access_key=secret_access_key,
            session_token=session_token,
            expires_after=expires_after,
        )
    )

s3_storage #

s3_storage(*, bucket, prefix, region=None, endpoint_url=None, allow_http=False, access_key_id=None, secret_access_key=None, session_token=None, expires_after=None, anonymous=None, from_env=None, get_credentials=None, scatter_initial_credentials=False, force_path_style=False, network_stream_timeout_seconds=60, requester_pays=False)

Create a Storage instance that saves data in S3 or S3 compatible object stores.

Parameters:

Name Type Description Default
bucket str

The bucket where the repository will store its data

required
prefix str | None

The prefix within the bucket that is the root directory of the repository

required
region str | None

The region to use in the object store; if None, a default region will be used

None
endpoint_url str | None

Optional endpoint where the object store serves data, example: http://localhost:9000

None
allow_http bool

If the object store can be accessed using http protocol instead of https

False
access_key_id str | None

S3 credential access key

None
secret_access_key str | None

S3 credential secret access key

None
session_token str | None

Optional S3 credential session token

None
expires_after datetime | None

Optional expiration for the object store credentials

None
anonymous bool | None

If set to True, requests to the object store will not be signed

None
from_env bool | None

Fetch credentials from the operating system environment

None
get_credentials Callable[[], S3StaticCredentials] | None

Use this function to get and refresh object store credentials

None
scatter_initial_credentials bool

Immediately call and store the value returned by get_credentials. This is useful if the repo or session will be pickled to generate many copies. Passing scatter_initial_credentials=True will ensure all those copies don't need to call get_credentials immediately. After the initial set of credentials has expired, the cached value is no longer used. Notice that credentials obtained are stored, and they can be sent over the network if you pickle the session/repo.

False
force_path_style bool

Whether to force using path-style addressing for buckets

False
network_stream_timeout_seconds int

Timeout requests if no bytes can be transmitted during this period of time. If set to 0, timeout is disabled.

60
requester_pays bool

Enable requester pays for S3 buckets

False
Source code in icechunk-python/python/icechunk/storage.py
def s3_storage(
    *,
    bucket: str,
    prefix: str | None,
    region: str | None = None,
    endpoint_url: str | None = None,
    allow_http: bool = False,
    access_key_id: str | None = None,
    secret_access_key: str | None = None,
    session_token: str | None = None,
    expires_after: datetime | None = None,
    anonymous: bool | None = None,
    from_env: bool | None = None,
    get_credentials: Callable[[], S3StaticCredentials] | None = None,
    scatter_initial_credentials: bool = False,
    force_path_style: bool = False,
    network_stream_timeout_seconds: int = 60,
    requester_pays: bool = False,
) -> Storage:
    """Create a Storage instance that saves data in S3 or S3 compatible object stores.

    Parameters
    ----------
    bucket: str
        The bucket where the repository will store its data
    prefix: str | None
        The prefix within the bucket that is the root directory of the repository
    region: str | None
        The region to use in the object store; if `None`, a default region will be used
    endpoint_url: str | None
        Optional endpoint where the object store serves data, example: http://localhost:9000
    allow_http: bool
        If the object store can be accessed using http protocol instead of https
    access_key_id: str | None
        S3 credential access key
    secret_access_key: str | None
        S3 credential secret access key
    session_token: str | None
        Optional S3 credential session token
    expires_after: datetime | None
        Optional expiration for the object store credentials
    anonymous: bool | None
        If set to True, requests to the object store will not be signed
    from_env: bool | None
        Fetch credentials from the operating system environment
    get_credentials: Callable[[], S3StaticCredentials] | None
        Use this function to get and refresh object store credentials
    scatter_initial_credentials: bool, optional
        Immediately call and store the value returned by get_credentials. This is useful if the
        repo or session will be pickled to generate many copies. Passing scatter_initial_credentials=True will
        ensure all those copies don't need to call get_credentials immediately. After the initial
        set of credentials has expired, the cached value is no longer used. Notice that credentials
        obtained are stored, and they can be sent over the network if you pickle the session/repo.
    force_path_style: bool
        Whether to force using path-style addressing for buckets
    network_stream_timeout_seconds: int
        Timeout requests if no bytes can be transmitted during this period of time.
        If set to 0, timeout is disabled.
    requester_pays: bool
        Enable requester pays for S3 buckets
    """

    credentials = s3_credentials(
        access_key_id=access_key_id,
        secret_access_key=secret_access_key,
        session_token=session_token,
        expires_after=expires_after,
        anonymous=anonymous,
        from_env=from_env,
        get_credentials=get_credentials,
        scatter_initial_credentials=scatter_initial_credentials,
    )
    options = S3Options(
        region=region,
        endpoint_url=endpoint_url,
        allow_http=allow_http,
        force_path_style=force_path_style,
        network_stream_timeout_seconds=network_stream_timeout_seconds,
        requester_pays=requester_pays,
        anonymous=anonymous or False,
    )
    return Storage.new_s3(
        config=options,
        bucket=bucket,
        prefix=prefix,
        credentials=credentials,
    )
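
Example (a minimal sketch pointing at a local MinIO server; the endpoint, bucket, and keys are placeholders):

import icechunk

storage = icechunk.s3_storage(
    bucket="my-bucket",
    prefix="my-repo",
    region="us-east-1",
    endpoint_url="http://localhost:9000",  # placeholder S3-compatible endpoint
    allow_http=True,                       # the endpoint above is plain HTTP
    force_path_style=True,                 # MinIO typically needs path-style addressing
    access_key_id="<ACCESS_KEY_ID>",
    secret_access_key="<SECRET_ACCESS_KEY>",
)
repo = icechunk.Repository.open_or_create(storage)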

s3_store #

s3_store(region=None, endpoint_url=None, allow_http=False, anonymous=False, s3_compatible=False, force_path_style=False, network_stream_timeout_seconds=60, requester_pays=False)

Build an ObjectStoreConfig instance for S3 or S3 compatible object stores.

Source code in icechunk-python/python/icechunk/storage.py
def s3_store(
    region: str | None = None,
    endpoint_url: str | None = None,
    allow_http: bool = False,
    anonymous: bool = False,
    s3_compatible: bool = False,
    force_path_style: bool = False,
    network_stream_timeout_seconds: int = 60,
    requester_pays: bool = False,
) -> ObjectStoreConfig.S3Compatible | ObjectStoreConfig.S3:
    """Build an ObjectStoreConfig instance for S3 or S3 compatible object stores."""

    options = S3Options(
        region=region,
        endpoint_url=endpoint_url,
        allow_http=allow_http,
        force_path_style=force_path_style,
        network_stream_timeout_seconds=network_stream_timeout_seconds,
        requester_pays=requester_pays,
        anonymous=anonymous,
    )
    return (
        ObjectStoreConfig.S3Compatible(options)
        if s3_compatible
        else ObjectStoreConfig.S3(options)
    )

set_logs_filter #

set_logs_filter(log_filter_directive)

Set filters and log levels for the different modules.

Examples:

  • set_logs_filter("trace")  # trace level for all modules
  • set_logs_filter("error")  # error level for all modules
  • set_logs_filter("icechunk=debug,info")  # debug level for icechunk, info for everything else

Full spec for the log_filter_directive syntax is documented in https://docs.rs/tracing-subscriber/latest/tracing_subscriber/filter/struct.EnvFilter.html#directives

Parameters:

Name Type Description Default
log_filter_directive str | None

The comma separated list of directives for modules and log levels. If None, the directive will be read from the environment variable ICECHUNK_LOG

required
Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
def set_logs_filter(log_filter_directive: str | None) -> None:
    """
    Set filters and log levels for the different modules.

    Examples:
      - set_logs_filter("trace")  # trace level for all modules
      - set_logs_filter("error")  # error level for all modules
      - set_logs_filter("icechunk=debug,info")  # debug level for icechunk, info for everything else

    Full spec for the log_filter_directive syntax is documented in
    https://docs.rs/tracing-subscriber/latest/tracing_subscriber/filter/struct.EnvFilter.html#directives

    Parameters
    ----------
    log_filter_directive: str | None
        The comma separated list of directives for modules and log levels.
        If None, the directive will be read from the environment variable
        ICECHUNK_LOG
    """
    ...

spec_version #

spec_version()

The version of the Icechunk specification that the library is compatible with.

Returns:

  int: The version of the Icechunk specification that the library is compatible with

Source code in icechunk-python/python/icechunk/_icechunk_python.pyi
def spec_version() -> int:
    """
    The version of the Icechunk specification that the library is compatible with.

    Returns:
        int: The version of the Icechunk specification that the library is compatible with
    """
    ...

tigris_storage #

tigris_storage(*, bucket, prefix, region=None, endpoint_url=None, use_weak_consistency=False, allow_http=False, access_key_id=None, secret_access_key=None, session_token=None, expires_after=None, anonymous=None, from_env=None, get_credentials=None, scatter_initial_credentials=False, network_stream_timeout_seconds=60)

Create a Storage instance that saves data in Tigris object store.

Parameters:

Name Type Description Default
bucket str

The bucket where the repository will store its data

required
prefix str | None

The prefix within the bucket that is the root directory of the repository

required
region str | None

The region to use in the object store; if None, a default region will be used

None
endpoint_url str | None

Optional endpoint where the object store serves data, example: http://localhost:9000

None
use_weak_consistency bool

If set to True, it will return a Storage instance that is read-only and can read from the closest Tigris region. Behavior is undefined if objects haven't propagated to the region yet. This option is for experts only.

False
allow_http bool

If the object store can be accessed using http protocol instead of https

False
access_key_id str | None

S3 credential access key

None
secret_access_key str | None

S3 credential secret access key

None
session_token str | None

Optional S3 credential session token

None
expires_after datetime | None

Optional expiration for the object store credentials

None
anonymous bool | None

If set to True, requests to the object store will not be signed

None
from_env bool | None

Fetch credentials from the operating system environment

None
get_credentials Callable[[], S3StaticCredentials] | None

Use this function to get and refresh object store credentials

None
scatter_initial_credentials bool

Immediately call and store the value returned by get_credentials. This is useful if the repo or session will be pickled to generate many copies. Passing scatter_initial_credentials=True will ensure all those copies don't need to call get_credentials immediately. After the initial set of credentials has expired, the cached value is no longer used. Notice that credentials obtained are stored, and they can be sent over the network if you pickle the session/repo.

False
network_stream_timeout_seconds int

Timeout requests if no bytes can be transmitted during this period of time. If set to 0, timeout is disabled.

60
Source code in icechunk-python/python/icechunk/storage.py
def tigris_storage(
    *,
    bucket: str,
    prefix: str | None,
    region: str | None = None,
    endpoint_url: str | None = None,
    use_weak_consistency: bool = False,
    allow_http: bool = False,
    access_key_id: str | None = None,
    secret_access_key: str | None = None,
    session_token: str | None = None,
    expires_after: datetime | None = None,
    anonymous: bool | None = None,
    from_env: bool | None = None,
    get_credentials: Callable[[], S3StaticCredentials] | None = None,
    scatter_initial_credentials: bool = False,
    network_stream_timeout_seconds: int = 60,
) -> Storage:
    """Create a Storage instance that saves data in Tigris object store.

    Parameters
    ----------
    bucket: str
        The bucket where the repository will store its data
    prefix: str | None
        The prefix within the bucket that is the root directory of the repository
    region: str | None
        The region to use in the object store; if `None`, a default region will be used
    endpoint_url: str | None
        Optional endpoint where the object store serves data, example: http://localhost:9000
    use_weak_consistency: bool
        If set to True, it will return a Storage instance that is read-only and can read from
        the closest Tigris region. Behavior is undefined if objects haven't propagated to the region yet.
        This option is for experts only.
    allow_http: bool
        If the object store can be accessed using http protocol instead of https
    access_key_id: str | None
        S3 credential access key
    secret_access_key: str | None
        S3 credential secret access key
    session_token: str | None
        Optional S3 credential session token
    expires_after: datetime | None
        Optional expiration for the object store credentials
    anonymous: bool | None
        If set to True, requests to the object store will not be signed
    from_env: bool | None
        Fetch credentials from the operating system environment
    get_credentials: Callable[[], S3StaticCredentials] | None
        Use this function to get and refresh object store credentials
    scatter_initial_credentials: bool, optional
        Immediately call and store the value returned by get_credentials. This is useful if the
        repo or session will be pickled to generate many copies. Passing scatter_initial_credentials=True will
        ensure all those copies don't need to call get_credentials immediately. After the initial
        set of credentials has expired, the cached value is no longer used. Notice that credentials
        obtained are stored, and they can be sent over the network if you pickle the session/repo.
    network_stream_timeout_seconds: int
        Timeout requests if no bytes can be transmitted during this period of time.
        If set to 0, timeout is disabled.
    """
    credentials = s3_credentials(
        access_key_id=access_key_id,
        secret_access_key=secret_access_key,
        session_token=session_token,
        expires_after=expires_after,
        anonymous=anonymous,
        from_env=from_env,
        get_credentials=get_credentials,
        scatter_initial_credentials=scatter_initial_credentials,
    )
    options = S3Options(
        region=region,
        endpoint_url=endpoint_url,
        allow_http=allow_http,
        network_stream_timeout_seconds=network_stream_timeout_seconds,
        anonymous=anonymous or False,
    )
    return Storage.new_tigris(
        config=options,
        bucket=bucket,
        prefix=prefix,
        use_weak_consistency=use_weak_consistency,
        credentials=credentials,
    )
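
Example (a minimal sketch; the bucket and keys are placeholders):

import icechunk

storage = icechunk.tigris_storage(
    bucket="my-bucket",
    prefix="my-repo",
    access_key_id="<ACCESS_KEY_ID>",
    secret_access_key="<SECRET_ACCESS_KEY>",
)
repo = icechunk.Repository.open_or_create(storage)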

icechunk.xarray #

Functions:

Name Description
to_icechunk

Write an Xarray object to a group of an Icechunk store.

to_icechunk #

to_icechunk(obj, session, *, group=None, mode=None, safe_chunks=True, align_chunks=False, append_dim=None, region=None, encoding=None, chunkmanager_store_kwargs=None, split_every=None)

Write an Xarray object to a group of an Icechunk store.

Parameters:

Name Type Description Default
obj DataArray | Dataset

Xarray object to write

required
session Session

Writable Icechunk Session

required
mode "w", "w-", "a", "a-", r+", None

Persistence mode: "w" means create (overwrite if exists); "w-" means create (fail if exists); "a" means override all existing variables including dimension coordinates (create if does not exist); "a-" means only append those variables that have append_dim. "r+" means modify existing array values only (raise an error if any metadata or shapes would change). The default mode is "a" if append_dim is set. Otherwise, it is "r+" if region is set and w- otherwise.

"w"
group str

Group path. (a.k.a. path in zarr terminology.)

None
encoding dict

Nested dictionary with variable names as keys and dictionaries of variable specific encodings as values, e.g., {"my_variable": {"dtype": "int16", "scale_factor": 0.1,}, ...}

None
append_dim hashable

If set, the dimension along which the data will be appended. All other dimensions on overridden variables must remain the same size.

None
region dict or auto

Optional mapping from dimension names to either a) "auto", or b) integer slices, indicating the region of existing zarr array(s) in which to write this dataset's data.

If "auto" is provided the existing store will be opened and the region inferred by matching indexes. "auto" can be used as a single string, which will automatically infer the region for all dimensions, or as dictionary values for specific dimensions mixed together with explicit slices for other dimensions.

Alternatively integer slices can be provided; for example, {'x': slice(0, 1000), 'y': slice(10000, 11000)} would indicate that values should be written to the region 0:1000 along x and 10000:11000 along y.

Users are expected to ensure that the specified region aligns with Zarr chunk boundaries, and that dask chunks are also aligned. Xarray makes limited checks that these multiple chunk boundaries line up. It is possible to write incomplete chunks and corrupt the data with this option if you are not careful.

None
safe_chunks bool

If True, only allow writes when there is a many-to-one relationship between Zarr chunks (specified in encoding) and Dask chunks. Set False to override this restriction; however, data may become corrupted if Zarr arrays are written in parallel. In addition to the many-to-one relationship validation, it also detects partial chunk writes when using the region parameter; these partial chunks are considered unsafe in mode "r+" but safe in mode "a". Note: Even with these validations it can still be unsafe to write two or more chunked arrays in the same location in parallel if they are not writing in independent regions.

True
align_chunks bool

If True, rechunks the Dask array to align with Zarr chunks before writing. This ensures each Dask chunk maps to one or more contiguous Zarr chunks, which avoids race conditions. Internally, the process sets safe_chunks=False and tries to preserve the original Dask chunking as much as possible. Note: While this alignment avoids write conflicts stemming from chunk boundary misalignment, it does not protect against race conditions if multiple uncoordinated processes write to the same Zarr array concurrently.

False
chunkmanager_store_kwargs dict

Additional keyword arguments passed on to the ChunkManager.store method used to store chunked arrays. For example, for a dask array, additional kwargs will eventually be passed to dask.array.store(). Experimental API that should not be relied upon.

None
split_every int | None

Number of tasks to merge at every level of the tree reduction.

None

Returns:

Type Description
None
Notes

Two restrictions apply to the use of region:

  • If region is set, all variables in a dataset must have at least one dimension in common with the region. Other variables should be written in a separate single call to to_icechunk().
  • Dimensions cannot be included in both region and append_dim at the same time. To create empty arrays to fill in with region, use the _XarrayDatasetWriter directly.
Source code in icechunk-python/python/icechunk/xarray.py
def to_icechunk(
    obj: DataArray | Dataset,
    session: Session,
    *,
    group: str | None = None,
    mode: ZarrWriteModes | None = None,
    safe_chunks: bool = True,
    align_chunks: bool = False,
    append_dim: Hashable | None = None,
    region: Region = None,
    encoding: Mapping[Any, Any] | None = None,
    chunkmanager_store_kwargs: MutableMapping[Any, Any] | None = None,
    split_every: int | None = None,
) -> None:
    """
    Write an Xarray object to a group of an Icechunk store.

    Parameters
    ----------
    obj: DataArray or Dataset
        Xarray object to write
    session : icechunk.Session
        Writable Icechunk Session
    mode : {"w", "w-", "a", "a-", r+", None}, optional
        Persistence mode: "w" means create (overwrite if exists);
        "w-" means create (fail if exists);
        "a" means override all existing variables including dimension coordinates (create if does not exist);
        "a-" means only append those variables that have ``append_dim``.
        "r+" means modify existing array *values* only (raise an error if
        any metadata or shapes would change).
        The default mode is "a" if ``append_dim`` is set. Otherwise, it is
        "r+" if ``region`` is set and ``w-`` otherwise.
    group : str, optional
        Group path. (a.k.a. `path` in zarr terminology.)
    encoding : dict, optional
        Nested dictionary with variable names as keys and dictionaries of
        variable specific encodings as values, e.g.,
        ``{"my_variable": {"dtype": "int16", "scale_factor": 0.1,}, ...}``
    append_dim : hashable, optional
        If set, the dimension along which the data will be appended. All
        other dimensions on overridden variables must remain the same size.
    region : dict or "auto", optional
        Optional mapping from dimension names to either a) ``"auto"``, or b) integer
        slices, indicating the region of existing zarr array(s) in which to write
        this dataset's data.

        If ``"auto"`` is provided the existing store will be opened and the region
        inferred by matching indexes. ``"auto"`` can be used as a single string,
        which will automatically infer the region for all dimensions, or as
        dictionary values for specific dimensions mixed together with explicit
        slices for other dimensions.

        Alternatively integer slices can be provided; for example, ``{'x': slice(0,
        1000), 'y': slice(10000, 11000)}`` would indicate that values should be
        written to the region ``0:1000`` along ``x`` and ``10000:11000`` along
        ``y``.

        Users are expected to ensure that the specified region aligns with
        Zarr chunk boundaries, and that dask chunks are also aligned.
        Xarray makes limited checks that these multiple chunk boundaries line up.
        It is possible to write incomplete chunks and corrupt the data with this
        option if you are not careful.
    safe_chunks : bool, default: True
        If True, only allow writes when there is a many-to-one relationship
        between Zarr chunks (specified in encoding) and Dask chunks.
        Set False to override this restriction; however, data may become corrupted
        if Zarr arrays are written in parallel.
        In addition to the many-to-one relationship validation, it also detects partial
        chunk writes when using the region parameter;
        these partial chunks are considered unsafe in the mode "r+" but safe in
        the mode "a".
        Note: Even with these validations it can still be unsafe to write
        two or more chunked arrays in the same location in parallel if they are
        not writing in independent regions.
    align_chunks: bool, default False
        If True, rechunks the Dask array to align with Zarr chunks before writing.
        This ensures each Dask chunk maps to one or more contiguous Zarr chunks,
        which avoids race conditions.
        Internally, the process sets safe_chunks=False and tries to preserve
        the original Dask chunking as much as possible.
        Note: While this alignment avoids write conflicts stemming from chunk
        boundary misalignment, it does not protect against race conditions
        if multiple uncoordinated processes write to the same
        Zarr array concurrently.
    chunkmanager_store_kwargs : dict, optional
        Additional keyword arguments passed on to the `ChunkManager.store` method used to store
        chunked arrays. For example, for a dask array, additional kwargs will eventually be passed to
        `dask.array.store()`. Experimental API that should not be relied upon.
    split_every: int, optional
        Number of tasks to merge at every level of the tree reduction.

    Returns
    -------
    None

    Notes
    -----
    Two restrictions apply to the use of ``region``:

      - If ``region`` is set, _all_ variables in a dataset must have at
        least one dimension in common with the region. Other variables
        should be written in a separate single call to ``to_icechunk()``.
      - Dimensions cannot be included in both ``region`` and
        ``append_dim`` at the same time. To create empty arrays to fill
        in with ``region``, use the `_XarrayDatasetWriter` directly.
    """

    as_dataset = _make_dataset(obj)

    # This ugliness is needed so that we allow users to call `to_icechunk` with a dirty Session
    # for _serial_ writes
    is_dask = is_dask_collection(obj)
    fork: Session | ForkSession
    if is_dask:
        if session.has_uncommitted_changes:
            raise ValueError(
                "Calling `to_icechunk` is not allowed on a Session with uncommitted changes. Please commit first."
            )
        fork = session.fork()
    else:
        fork = session

    writer = _XarrayDatasetWriter(
        as_dataset, store=fork.store, safe_chunks=safe_chunks, align_chunks=align_chunks
    )

    writer._open_group(group=group, mode=mode, append_dim=append_dim, region=region)

    # write metadata
    writer.write_metadata(encoding)
    # write in-memory arrays
    writer.write_eager()
    # eagerly write dask arrays
    maybe_fork_session = writer.write_lazy(
        chunkmanager_store_kwargs=chunkmanager_store_kwargs,
        split_every=split_every,
    )
    if is_dask:
        if maybe_fork_session is None:
            raise RuntimeError(
                "Logic bug! Please open at issue at https://github.com/earth-mover/icechunk"
            )
        session.merge(maybe_fork_session)
    else:
        if maybe_fork_session is not None:
            raise RuntimeError(
                "Unexpected write of dask arrays! Please open at issue at https://github.com/earth-mover/icechunk"
            )
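
Example (a minimal sketch writing a small in-memory dataset; the variable names and storage backend are placeholders):

import xarray as xr

import icechunk
from icechunk.xarray import to_icechunk

ds = xr.Dataset({"temperature": (("x",), [1.0, 2.0, 3.0])})
repo = icechunk.Repository.create(icechunk.in_memory_storage())
session = repo.writable_session("main")
to_icechunk(ds, session, mode="w")  # serial write; no dask arrays involved
session.commit("write temperature")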

icechunk.dask #

Functions:

Name Description
computing_meta

A decorator to handle the dask-specific computing_meta flag.

store_dask

A version of dask.array.store for Icechunk stores.

computing_meta #

computing_meta(func)

A decorator to handle the dask-specific computing_meta flag.

If computing_meta is True in the keyword arguments, the decorated function will return a placeholder meta object (np.array([object()], dtype=object)). Otherwise, it will execute the original function.

Source code in icechunk-python/python/icechunk/dask.py
def computing_meta(func: Callable[P, R]) -> Callable[P, Any]:
    """
    A decorator to handle the dask-specific `computing_meta` flag.

    If `computing_meta` is True in the keyword arguments, the decorated
    function will return a placeholder meta object (np.array([object()], dtype=object)).
    Otherwise, it will execute the original function.
    """

    @functools.wraps(func)
    def wrapper(*args: P.args, **kwargs: P.kwargs) -> Any:
        if kwargs.get("computing_meta", False):
            return np.array([object()], dtype=object)
        return func(*args, **kwargs)

    return wrapper

store_dask #

store_dask(*, sources, targets, regions=None, split_every=None, **store_kwargs)

A version of dask.array.store for Icechunk stores.

This method will eagerly execute writes to the Icechunk store, and will merge the changesets corresponding to each write task. The store object passed in will be updated in-place with the fully merged changeset.

For distributed or multi-processing writes, this method must be called within the Session.allow_pickling() context. All Zarr arrays in targets must also be created within this context since they contain a reference to the Session.

Parameters:

Name Type Description Default
sources list[Array]

List of dask arrays to write.

required
targets list of `zarr.Array`

Corresponding list of Zarr array objects to write to.

required
regions list[tuple[slice, ...]] | None

Corresponding region for each of targets to write to.

None
split_every int | None

Number of changesets to merge at a given time.

None
**store_kwargs Any

Arbitrary keyword arguments passed to dask.array.store. Notably compute, return_stored, load_stored, and lock are unsupported.

{}
Source code in icechunk-python/python/icechunk/dask.py
def store_dask(
    *,
    sources: list[Array],
    targets: "list[zarr.Array[ArrayV3Metadata]]",
    regions: list[tuple[slice, ...]] | None = None,
    split_every: int | None = None,
    **store_kwargs: Any,
) -> ForkSession:
    """
    A version of ``dask.array.store`` for Icechunk stores.

    This method will eagerly execute writes to the Icechunk store, and will
    merge the changesets corresponding to each write task. The `store` object
    passed in will be updated in-place with the fully merged changeset.

    For distributed or multi-processing writes, this method must be called within
    the `Session.allow_pickling()` context. All Zarr arrays in `targets` must also
    be created within this context since they contain a reference to the Session.

    Parameters
    ----------
    sources: list of `dask.array.Array`
        List of dask arrays to write.
    targets : list of `zarr.Array`
        Corresponding list of Zarr array objects to write to.
    regions: list of tuple of slice, optional
        Corresponding region for each of `targets` to write to.
    split_every: int, optional
        Number of changesets to merge at a given time.
    **store_kwargs:
        Arbitrary keyword arguments passed to `dask.array.store`. Notably `compute`,
        `return_stored`, `load_stored`, and `lock` are unsupported.
    """
    _assert_correct_dask_version()
    stored_arrays = dask.array.store(
        sources=sources,
        targets=targets,  # type: ignore[arg-type]
        regions=regions,
        compute=False,
        return_stored=True,
        load_stored=False,
        lock=False,
        **store_kwargs,
    )
    return session_merge_reduction(stored_arrays, split_every=split_every, **store_kwargs)
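
Example (a minimal sketch mirroring the fork/merge flow that to_icechunk uses above; the array shapes, chunk sizes, and zarr group layout are placeholders):

import dask.array
import zarr

import icechunk
from icechunk.dask import store_dask

repo = icechunk.Repository.create(icechunk.in_memory_storage())
session = repo.writable_session("main")
fork = session.fork()

# Create the target array through the fork's store so it can be pickled
# into the dask write tasks.
group = zarr.group(store=fork.store, overwrite=True)
target = group.create_array("data", shape=(100,), chunks=(10,), dtype="f8")
source = dask.array.random.random((100,), chunks=(10,))

fork_session = store_dask(sources=[source], targets=[target])
session.merge(fork_session)  # fold the write tasks' changesets back in
session.commit("write data")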