# trainlib
## Overview

Package summary goes here, ideally with a diagram

## Install

The `trainlib` package can be installed from PyPI:

```shell
pip install trainlib
```

## Development

- Initialize/synchronize the project with `uv sync`, which creates a virtual environment with the base package dependencies.
- Depending on your needs, install the development dependencies with `uv sync --extra dev`.

## Testing

- To run the unit tests, first install the test dependencies with `uv sync --extra test`, then run `make test`.
- For notebook testing, run `make install-kernel` to make the environment available as a Jupyter kernel (to be selected when running notebooks).

## Documentation

- Install the documentation dependencies with `uv sync --extra doc`.
- Run `make docs-build` (optionally preceded by `make docs-clean`), and serve locally with `make docs-serve`.

## Development remarks

Across `Trainer` / `Estimator` / `Dataset`, I've considered a `ParamSpec`-based typing scheme to better orchestrate alignment in the `Trainer.train()` loop, e.g., so we can statically check whether a dataset appears to fulfill the argument requirements of the estimator's `loss()` / `metrics()` methods. Something like

```python
from collections.abc import Generator

from torch import Tensor, nn


class Estimator[**P](nn.Module):
    def loss(
        self,
        input: Tensor,
        *args: P.args,
        **kwargs: P.kwargs,
    ) -> Generator:
        ...


class Trainer[**P]:
    def __init__(
        self,
        estimator: Estimator[P],
        ...
    ): ...
```

might be how we begin threading signatures, but ensuring that dataset items can match `P` is challenging. One option is a "packed" object that encapsulates the data passed through `P`-typed signatures:

```python
from abc import abstractmethod
from collections.abc import Callable, Iterator


class PackedItem[**P]:
    def __init__(self, *args: P.args, **kwargs: P.kwargs) -> None:
        self._args = args
        self._kwargs = kwargs

    def apply[R](self, func: Callable[P, R]) -> R:
        return func(*self._args, **self._kwargs)


class BatchedDataset[U, R, I, **P](Dataset):
    @abstractmethod
    def _process_item_data(
        self,
        item_data: I,
        item_index: int,
    ) -> PackedItem[P]:
        ...

    def __iter__(self) -> Iterator[PackedItem[P]]:
        ...
```

What remains is meaningfully shaping those signatures, and that runs into the limits of current type-expression flexibility. For instance, when trying to appropriately type my base `TupleDataset`:

```python
class SequenceDataset[I, **P](HomogenousDataset[int, I, I, P]):
    ...

class TupleDataset[I](SequenceDataset[tuple[I, ...], ??]):
    ...
```

Here there is no way to shape a `ParamSpec` to indicate arbitrarily many arguments of a fixed type (`I` in this case), which would let me unpack my item tuples into an appropriate `PackedItem`.

Until this (among other issues) becomes clearer, I'm setting up around a simpler `TypedDict` type variable. We won't get particularly strong static checks for item alignment inside `Trainer`, but this seems about as good as it gets with the current typing infrastructure.
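For illustration, the `TypedDict` direction might look like the following sketch, where `RegressionItem` and `loss_inputs` are hypothetical names, not trainlib's actual API:

```python
from typing import TypedDict


class RegressionItem(TypedDict):
    """Hypothetical item schema shared by a dataset and its estimator."""

    input: list[float]
    target: float


def loss_inputs(item: RegressionItem) -> tuple[list[float], float]:
    # A checker verifies key access against the TypedDict, but it cannot
    # thread the full call signature the way a ParamSpec-based scheme would.
    return item["input"], item["target"]


item: RegressionItem = {"input": [0.1, 0.2], "target": 1.0}
print(loss_inputs(item))  # ([0.1, 0.2], 1.0)
```

The trade-off: both sides agree on a named item schema rather than on a call signature, so mismatches surface as missing/extra keys instead of argument-arity errors.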