# Overview

Package summary goes here, ideally with a diagram

# Install

Installation instructions

```sh
pip install <package>
```

or as a CLI tool

```sh
uv tool install <package>
```

# Development

- Initialize/synchronize the project with `uv sync`, creating a virtual
  environment with the base package dependencies.
- Depending on your needs, install the development dependencies with
  `uv sync --extra dev`.

# Testing

- To run the unit tests, first install the test dependencies with
  `uv sync --extra test`, then run `make test`.
- For notebook testing, run `make install-kernel` to make the environment
  available as a Jupyter kernel (to be selected when running notebooks).

# Documentation

- Install the documentation dependencies with `uv sync --extra doc`.
- Run `make docs-build` (optionally preceded by `make docs-clean`), and serve
  locally with `make docs-serve`.

# Development remarks

Across `Trainer` / `Estimator` / `Dataset`, I've considered a
`ParamSpec`-based typing scheme to better orchestrate alignment in the
`Trainer.train()` loop, e.g., so we can statically check whether a dataset
appears to fulfill the argument requirements of the estimator's
`loss()` / `metrics()` methods. Something like

```py
class Estimator[**P](nn.Module):
    def loss(
        self,
        input: Tensor,
        *args: P.args,
        **kwargs: P.kwargs,
    ) -> Generator:
        ...


class Trainer[**P]:
    def __init__(
        self,
        estimator: Estimator[P],
        ...
    ): ...
```

might be how we begin threading signatures. But ensuring that dataset items
can match `P` is challenging. One option is a "packed" object that hides the
data being passed through `P`-shaped signatures:

```py
class PackedItem[**P]:
    def __init__(self, *args: P.args, **kwargs: P.kwargs) -> None:
        self._args = args
        self._kwargs = kwargs

    def apply[R](self, func: Callable[P, R]) -> R:
        return func(*self._args, **self._kwargs)


class BatchedDataset[U, R, I, **P](Dataset):
    @abstractmethod
    def _process_item_data(
        self,
        item_data: I,
        item_index: int,
    ) -> PackedItem[P]:
        ...

    def __iter__(self) -> Iterator[PackedItem[P]]:
        ...
```

Meaningfully shaping those signatures is what remains, but typical
type-expression flexibility doesn't really allow it. For instance, if I'm
trying to appropriately type my base `TupleDataset`:

```py
class SequenceDataset[I, **P](HomogenousDataset[int, I, I, P]):
    ...


class TupleDataset[I](SequenceDataset[tuple[I, ...], ??]):
    ...
```

Here there's no way for me to shape a `ParamSpec` to indicate arbitrarily
many arguments of a fixed type (`I` in this case) to allow me to unpack my
item tuples into an appropriate `PackedItem`.

Until this (among other issues) becomes clearer, I'm setting up around a
simpler `TypedDict` type variable. We won't have particularly strong static
checks for item alignment inside `Trainer`, but this seems about as good as I
can get with the current infrastructure.
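
As an illustration of the `TypedDict` direction (the names here are
hypothetical, not the package's actual API): items are plain dicts checked
against a `TypedDict`, and unpacking them into `loss()` is only loosely
verified:

```python
from typing import TypedDict


class RegressionItem(TypedDict):
    input: float
    target: float


def loss(input: float, target: float) -> float:
    return (input - target) ** 2


item: RegressionItem = {"input": 3.0, "target": 1.0}
# A type checker validates the dict against RegressionItem, but not that
# its keys line up with loss()'s parameters -- the weaker guarantee
# accepted here.
print(loss(**item))  # 4.0
```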