API Reference

Usage

The CLI is set up according to Typer's Building a Package instructions, so usage conforms to aedg_metadata [OPTIONS] COMMAND [ARGS]...

 Usage: aedg_metadata generate [OPTIONS] CONFIG

╭─ Arguments ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ *    data-path            TEXT  Path to CSV or GeoJSON data (req). [default: None] [required]                                                                                                               │
│ *    source-dir-path      TEXT  Path to YML configs for upstream data sources listed in config.sources. In the context of AEDG, this is aedg-etl-2024 repo. (req). [default: None] [required]               │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Options ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --data-dictionary  -dd               TEXT                       Filename of the data dictionary stashed with the data file. If not specified, use the default fields registry file.                         │
│ --bbox             -b                [infer|calc|specify|none]  How the spatial bounding box should be determined. [default: specify]                                                                       │
│ --time             -t                [infer|calc|specify|none]  How the temporal description should be determined. [default: specify]                                                                       │
│ --save                  --no-save                               Write generated metadata to the file or else to the screen. [default: no-save]                                                              │
│ --help                                                          Show this message and exit.                                                                                                                 │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

Example

% aedg_metadata generate \
    data/public_fuel_prices/public_fuel_prices.csv \
    ~/repos/aedg-etl-2024/data-sources \
    -dd fields.csv \
    --bbox infer \
    -t specify \
    --save

In this example, the first argument is the path to the data file, data/public_fuel_prices/public_fuel_prices.csv. The tool assumes that the YML metadata config file sits in the same directory and shares the same file stem, in this case public_fuel_prices.yml. In keeping with the co-location of data, config, and outputs, the generated metadata is also written to this directory.
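The co-location rule above can be sketched with pathlib. This is a minimal illustration, not the tool's actual code: the function name sibling_config_path is hypothetical, and the only assumption taken from the text is that the config shares the data file's directory and stem with a .yml extension.

```python
from pathlib import PurePosixPath


def sibling_config_path(data_path: str) -> PurePosixPath:
    """Return the YML config path assumed to sit next to the data file.

    Hypothetical helper: aedg_metadata may derive this path differently.
    """
    # Same directory, same file stem, .yml extension.
    return PurePosixPath(data_path).with_suffix(".yml")


print(sibling_config_path("data/public_fuel_prices/public_fuel_prices.csv"))
# prints data/public_fuel_prices/public_fuel_prices.yml
```

The same rule applies to GeoJSON input, since only the final extension is swapped.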

The second argument is the path to the upstream YML config files used to get data from source APIs at the beginning of the ETL pipeline. In this case, the source config files live in a separate repository, aedg-etl-2024, located alongside this repository. This is currently AEDG-specific in format and not yet generalized. Roughly, each source listed under ‘sources’ in the metadata config must be present in the directory given here, conform to AEDG ETL config styling, and be named {organization.file}/source.yml. If your ETL processing diverges from AEDG techniques, these files could be built manually for each source and stored in your repository near the data.
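Before running the command, it can help to check that every listed source has a config where the tool expects one. The sketch below assumes the layout described above, one directory per source containing a source.yml. The function name find_source_configs and the example source name are hypothetical and not part of aedg_metadata.

```python
from pathlib import Path


def find_source_configs(source_dir, source_names):
    """Map each source name to <source_dir>/<name>/source.yml and report gaps.

    Hypothetical pre-flight check; assumes one directory per source.
    """
    found, missing = {}, []
    root = Path(source_dir).expanduser()  # allow paths such as ~/repos/...
    for name in source_names:
        candidate = root / name / "source.yml"
        if candidate.is_file():
            found[name] = candidate
        else:
            missing.append(name)
    return found, missing


# "example_org.example_file" is a made-up source name for illustration.
found, missing = find_source_configs(
    "~/repos/aedg-etl-2024/data-sources", ["example_org.example_file"]
)
```

Any names returned in missing would need a source.yml, either from the ETL repository or built by hand as described above.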

The optional arguments set the data dictionary used to populate field descriptions in the metadata (--data-dictionary), how the spatial bounding box is determined (--bbox), how the temporal description is determined (--time), and where the output goes (--save writes to file; --no-save, the default, prints to the terminal).
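The --bbox and --time options both accept one of infer, calc, specify, or none, defaulting to specify. A fixed set of choices like this is commonly modeled as a string-valued Enum, which Typer accepts for choice options. The sketch below is an assumption about how that could look, not the tool's source; the names Mode and parse_mode are hypothetical.

```python
from enum import Enum


class Mode(str, Enum):
    """Hypothetical model of the [infer|calc|specify|none] choices."""

    INFER = "infer"
    CALC = "calc"
    SPECIFY = "specify"
    NONE = "none"


def parse_mode(value: str = "specify") -> Mode:
    """Convert a command-line string to a Mode.

    Raises ValueError for anything outside the four documented choices.
    The default mirrors the documented default, specify.
    """
    return Mode(value.strip().lower())
```

Rejecting unknown values up front keeps a typo such as --bbox infr from silently falling back to a default.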