# trainlib
## Overview

Package summary goes here, ideally with a diagram

## Install

The `trainlib` package can be installed from PyPI:

```shell
pip install trainlib
```

## Development

- Initialize/synchronize the project with `uv sync`, which creates a virtual environment with the base package dependencies.
- Depending on your needs, install the development dependencies with `uv sync --extra dev`.

## Testing

- To run the unit tests, first install the test dependencies with `uv sync --extra test`, then run `make test`.
- For notebook testing, run `make install-kernel` to make the environment available as a Jupyter kernel (to be selected when running notebooks).

## Documentation

- Install the documentation dependencies with `uv sync --extra doc`.
- Run `make docs-build` (optionally preceded by `make docs-clean`), and serve locally with `make docs-serve`.

## Development remarks

Across `Trainer` / `Estimator` / `Dataset`, I've considered a `ParamSpec`-based typing scheme to better orchestrate alignment in the `Trainer.train()` loop, e.g., so we can statically check whether a dataset appears to fulfill the argument requirements of the estimator's `loss()` / `metrics()` methods. Something like

```python
from collections.abc import Generator

from torch import Tensor, nn


class Estimator[**P](nn.Module):
    def loss(
        self,
        input: Tensor,
        *args: P.args,
        **kwargs: P.kwargs,
    ) -> Generator:
        ...


class Trainer[**P]:
    def __init__(
        self,
        estimator: Estimator[P],
        ...
    ): ...
```

might be how we begin threading signatures, but ensuring that dataset items can match `P` is challenging. One option is a "packed" object that encapsulates the data passed through `P`-typed signatures:

```python
from abc import abstractmethod
from collections.abc import Callable, Iterator


class PackedItem[**P]:
    def __init__(self, *args: P.args, **kwargs: P.kwargs) -> None:
        self._args = args
        self._kwargs = kwargs

    def apply[R](self, func: Callable[P, R]) -> R:
        return func(*self._args, **self._kwargs)


class BatchedDataset[U, R, I, **P](Dataset):
    @abstractmethod
    def _process_item_data(
        self,
        item_data: I,
        item_index: int,
    ) -> PackedItem[P]:
        ...

    def __iter__(self) -> Iterator[PackedItem[P]]:
        ...
```

What remains is meaningfully shaping those signatures, and that runs into the limits of current type-expression flexibility. For instance, when trying to appropriately type my base `TupleDataset`:

```python
class SequenceDataset[I, **P](HomogenousDataset[int, I, I, P]):
    ...

class TupleDataset[I](SequenceDataset[tuple[I, ...], ??]):
    ...
```

Here there is no way to shape a `ParamSpec` to indicate arbitrarily many arguments of a fixed type (`I` in this case), which would let me unpack my item tuples into an appropriate `PackedItem`.

Until this (among other issues) becomes clearer, I'm setting up around a simpler `TypedDict` type variable. We won't get particularly strong static checks for item alignment inside `Trainer`, but this seems about as good as it gets with the current typing infrastructure.
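For illustration, the `TypedDict` direction might look like the following sketch, where `RegressionItem` and `loss_inputs` are hypothetical names, not trainlib's actual API:

```python
from typing import TypedDict


class RegressionItem(TypedDict):
    """Hypothetical item schema shared by a dataset and its estimator."""

    input: list[float]
    target: float


def loss_inputs(item: RegressionItem) -> tuple[list[float], float]:
    # A checker verifies key access against the TypedDict, but it cannot
    # thread the full call signature the way a ParamSpec-based scheme would.
    return item["input"], item["target"]


item: RegressionItem = {"input": [0.1, 0.2], "target": 1.0}
print(loss_inputs(item))  # ([0.1, 0.2], 1.0)
```

The trade-off: both sides agree on a named item schema rather than on a call signature, so mismatches surface as missing/extra keys instead of argument-arity errors.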