# Overview

Package summary goes here, ideally with a diagram.

# Install

Installation instructions:

```sh
pip install
```

or as a CLI tool:

```sh
uv tool install
```

# Development

- Initialize/synchronize the project with `uv sync`, which creates a virtual environment with the base package dependencies.
- Depending on your needs, install the development dependencies with `uv sync --extra dev`.

# Testing

- To run the unit tests, first install the test dependencies with `uv sync --extra test`, then run `make test`.
- For notebook testing, run `make install-kernel` to make the environment available as a Jupyter kernel (to be selected when running notebooks).

# Documentation

- Install the documentation dependencies with `uv sync --extra doc`.
- Run `make docs-build` (optionally preceded by `make docs-clean`), and serve locally with `make docs-serve`.

# Development remarks

- Across `Trainer` / `Estimator` / `Dataset`, I've considered a `ParamSpec`-based typing scheme to better orchestrate alignment in the `Trainer.train()` loop, e.g., so we can statically check whether a dataset appears to fulfill the argument requirements of the estimator's `loss()` / `metrics()` methods. Something like

  ```py
  class Estimator[**P](nn.Module):
      def loss(
          self,
          input: Tensor,
          *args: P.args,
          **kwargs: P.kwargs,
      ) -> Generator: ...


  class Trainer[**P]:
      def __init__(
          self,
          estimator: Estimator[P],
          ...
      ): ...
  ```

  might be how we begin threading signatures. But ensuring dataset items can match `P` is challenging. One option is a "packed" object that encapsulates passing data through `P`-shaped signatures:

  ```py
  class PackedItem[**P]:
      def __init__(self, *args: P.args, **kwargs: P.kwargs) -> None:
          self._args = args
          self._kwargs = kwargs

      def apply[R](self, func: Callable[P, R]) -> R:
          return func(*self._args, **self._kwargs)


  class BatchedDataset[U, R, I, **P](Dataset):
      @abstractmethod
      def _process_item_data(
          self,
          item_data: I,
          item_index: int,
      ) -> PackedItem[P]: ...
      def __iter__(self) -> Iterator[PackedItem[P]]: ...
  ```

  What remains is meaningfully shaping those signatures, and that isn't really achievable with the flexibility of current type expressions. For instance, when trying to appropriately type the base `TupleDataset`:

  ```py
  class SequenceDataset[I, **P](HomogenousDataset[int, I, I, P]): ...


  class TupleDataset[I](SequenceDataset[tuple[I, ...], ??]): ...
  ```

  There is no way to shape a `ParamSpec` that expresses arbitrarily many arguments of a fixed type (`I` in this case), which would let me unpack item tuples into an appropriate `PackedItem`. Until this (among other issues) becomes clearer, I'm setting up around a simpler `TypedDict` type variable. We won't have particularly strong static checks for item alignment inside `Trainer`, but this seems about as good as I can get with the current typing infrastructure.
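To make the packed-object idea concrete, here is a minimal runnable sketch of `PackedItem` using pre-3.12 `ParamSpec` syntax. The `weighted_error` function is a toy stand-in for an estimator's `loss()` signature; its name and parameters are illustrative, not part of this package:

```python
from typing import Callable, Generic, ParamSpec, TypeVar

P = ParamSpec("P")
R = TypeVar("R")


class PackedItem(Generic[P]):
    """Stores a call's arguments so they can be replayed onto any Callable[P, R]."""

    def __init__(self, *args: P.args, **kwargs: P.kwargs) -> None:
        self._args = args
        self._kwargs = kwargs

    def apply(self, func: Callable[P, R]) -> R:
        # Unpack the stored positional/keyword arguments into the target callable.
        return func(*self._args, **self._kwargs)


# Toy stand-in for an estimator's loss() (hypothetical name and signature).
def weighted_error(prediction: float, target: float, *, weight: float = 1.0) -> float:
    return weight * abs(prediction - target)


# A dataset would yield PackedItem[P] objects; the trainer applies them to loss().
item = PackedItem(8.0, 5.0, weight=2.0)
print(item.apply(weighted_error))  # 6.0
```

The appeal is that a checker can flag `item.apply(func)` when `func`'s signature doesn't match the `P` the item was built with, without the trainer knowing the signature itself.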
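For contrast, here is a minimal sketch of the simpler `TypedDict` direction: a `TypedDict` type variable documents which keys a dataset's items must carry. All names here (`LossItem`, `DictDataset`, `loss`) are hypothetical and not the package's actual API:

```python
from typing import Generic, Iterator, TypedDict, TypeVar


# Hypothetical item schema: keys mirror what an estimator's loss() expects.
class LossItem(TypedDict):
    input: list[float]
    target: float


ItemT = TypeVar("ItemT")  # in practice, bound per dataset/estimator pairing


class DictDataset(Generic[ItemT]):
    """Toy dataset yielding dict items; a TypedDict documents the expected keys."""

    def __init__(self, items: list[ItemT]) -> None:
        self._items = items

    def __iter__(self) -> Iterator[ItemT]:
        return iter(self._items)


# Hypothetical loss consuming one item; checkers validate keys/types where
# LossItem values are constructed.
def loss(item: LossItem) -> float:
    return sum(item["input"]) - item["target"]


ds: DictDataset[LossItem] = DictDataset([{"input": [1.0, 2.0], "target": 1.5}])
print([loss(item) for item in ds])  # [1.5]
```

The trade-off is that key/parameter alignment is checked only where items are constructed, rather than threaded generically through `Trainer` as a `ParamSpec` would allow.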