# trainlib/TODO.md
2026-03-22

## Long-term

- Implement a dataloader in-house, with a clear, lightweight mechanism for converting a collection of structures into a structure of collections. For multi-process handling (which currently happens both in torch's DataLoader and in BatchedDataset, for two different purposes), rely on execlib, which should hopefully be more stable.
- Domains may be externalized (into co3 or convlib).
- Up next: a CLI, and full JSON-ification of model selection + train.
- Consider a "multi-train" alternative (or arg support in train()) for training many "rollouts" from the same base estimator, essentially forks under different seeds. Useful above all for architecture benchmarking, to see average training behavior. Consider corresponding Plotter methods (error bars).
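
The collection-of-structures to structure-of-collections conversion mentioned in the dataloader item can be sketched as a small collate step. This is only an illustration, not trainlib code; the `Sample` dataclass and `collate` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """Hypothetical per-item record (a 'structure')."""
    x: float
    y: int


def collate(samples):
    """Turn a collection of structures into a structure of collections.

    Each dataclass field becomes one key mapping to the list of that
    field's values across all samples.
    """
    if not samples:
        return {}
    fields = vars(samples[0]).keys()
    return {f: [getattr(s, f) for s in samples] for f in fields}


batch = collate([Sample(0.1, 0), Sample(0.2, 1)])
# batch == {"x": [0.1, 0.2], "y": [0, 1]}
```

In a real dataloader, the per-field lists would then be stacked into tensors; the point here is just the transposition from row-wise records to column-wise batches.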
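
The "multi-train" idea could look like the following sketch: fork the same base configuration under different seeds, then aggregate the per-epoch curves into a mean and spread (the error-bar input for a Plotter method). Everything here is hypothetical; `train` stands in for trainlib's real training entry point, and `multi_train` is an assumed wrapper name.

```python
import random
import statistics


def train(seed, epochs=5):
    """Stand-in for a real train(): returns a per-epoch loss curve.

    A real implementation would build and fit an estimator; here we
    just simulate a noisy, decaying loss so the aggregation is testable.
    """
    rng = random.Random(seed)
    loss, curve = 1.0, []
    for _ in range(epochs):
        loss *= 0.8 + 0.05 * rng.random()
        curve.append(loss)
    return curve


def multi_train(base_seed, n_rollouts):
    """Run several 'rollouts' of the same base estimator under different
    seeds and return (mean curve, stdev curve) across rollouts."""
    curves = [train(base_seed + i) for i in range(n_rollouts)]
    mean = [statistics.mean(step) for step in zip(*curves)]
    stdev = [statistics.stdev(step) for step in zip(*curves)]
    return mean, stdev
```

The mean/stdev pair maps directly onto an error-bar plot, which is the aggregate view that makes architecture benchmarking across seeds readable.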