| Safe Haskell | None |
|---|---|
| Language | GHC2024 |
Napkin.Spec.Yaml.Preprocessors.DatasetHygiene
Synopsis
- data Mode
- preprocessor :: PreprocessorForYaml b
Documentation
Constructors
| ModeAutomatic | |
| ModeManual | |
preprocessor :: PreprocessorForYaml b
Validates the consistency of dataset management states (managed vs unmanaged) for tables in a BigQuery-like environment. It supports two modes of operation:
- Automatic: Infers managed/unmanaged state from the graph of tables and their references.
- Manual: Uses explicit lists of managed and unmanaged datasets, with an optional strictness flag.
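As a rough illustration of what inferring state from the table graph could look like, here is a minimal sketch. It is not Napkin's implementation: the `TableRef`, `State`, `classify` and `mixedDatasets` names are hypothetical, and the rule it uses (a table the spec produces is managed, a table that is only referenced is unmanaged) is an assumption made for the example.

```haskell
import Data.List (nub, sort)

-- Hypothetical table reference for this sketch: (dataset, table name).
type TableRef = (String, String)

-- Hypothetical management state of a single table.
data State = Managed | Unmanaged
  deriving (Eq, Show)

-- | Classify every table that appears in the graph.
-- Assumption: a table that the spec produces (a graph target) is managed,
-- and a table that is only referenced as a dependency is unmanaged.
classify :: [(TableRef, [TableRef])] -> [(TableRef, State)]
classify graph = [(t, stateOf t) | t <- allTables]
  where
    targets = map fst graph
    allTables = sort (nub (targets ++ concatMap snd graph))
    stateOf t
      | t `elem` targets = Managed
      | otherwise = Unmanaged

-- | Datasets that contain both managed and unmanaged tables,
-- which this sketch treats as inconsistent.
mixedDatasets :: [(TableRef, State)] -> [String]
mixedDatasets classified =
  sort (nub [d | ((d, _), Managed) <- classified, hasUnmanaged d])
  where
    hasUnmanaged d =
      any (\((d', _), s) -> d' == d && s == Unmanaged) classified
```

With a graph where `derived.a` is produced by the spec and `derived.ext` is only referenced, `mixedDatasets (classify graph)` would flag the `derived` dataset.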
The check ensures that:
- No dataset is marked as both managed and unmanaged.
- All tables in a dataset are consistently managed (optionally final, i.e. with no dependencies in it) or consistently unmanaged.
- In strict mode, every dataset used in the Napkin spec is explicitly listed as managed or unmanaged.
- Inconsistencies and configuration errors are reported with detailed messages.
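Two of these checks, the overlap between managed and unmanaged lists and the strict-mode listing requirement, can be sketched for manual mode as follows. This is a simplified illustration under stated assumptions, not the module's actual code: `ManualConfig`, `HygieneError` and `checkManual` are hypothetical names, and the table-level consistency and `final` checks are left out.

```haskell
import Data.List (intersect, nub, sort, (\\))

-- Hypothetical stand-in for the manual-mode settings; not Napkin's real type.
data ManualConfig = ManualConfig
  { managed   :: [String]  -- datasets listed as managed
  , unmanaged :: [String]  -- datasets listed as unmanaged
  , strict    :: Bool      -- require every used dataset to be listed
  }

-- Hypothetical error type for this sketch.
data HygieneError
  = BothManagedAndUnmanaged String
  | NotListed String
  deriving (Eq, Show)

-- | Check a manual configuration against the datasets used by the spec.
-- Overlap errors come first, then (in strict mode only) unlisted datasets.
checkManual :: ManualConfig -> [String] -> [HygieneError]
checkManual cfg usedDatasets = conflicts ++ missing
  where
    -- Datasets that appear in both lists.
    conflicts =
      map BothManagedAndUnmanaged
        (sort (nub (managed cfg `intersect` unmanaged cfg)))
    listed = managed cfg ++ unmanaged cfg
    -- Used datasets that appear in neither list; only reported when strict.
    missing
      | strict cfg = map NotListed (sort (nub usedDatasets \\ listed))
      | otherwise = []
```

An empty result means the configuration passed both checks. Returning a list rather than failing on the first problem lets every inconsistency be reported in one run.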
YAML
dataset_hygiene:
  mode: manual # default is automatic
  strict: false # optional, only in manual mode, default is true
  managed:
    - derived
    - training
  final:
    - training
  unmanaged:
    - inputs