Publishing files to a SpatioTemporal Asset Catalog

The SpatioTemporal Asset Catalog (STAC) family of specifications aims to standardize the way geospatial asset metadata is structured and queried. A “spatiotemporal asset” is any file that represents information about the Earth at a certain place and time. The original focus was on scenes of satellite imagery, but the specifications now cover a broad variety of uses, including sources such as aircraft and drones and data such as hyperspectral optical, synthetic aperture radar (SAR), video, point clouds, lidar, digital elevation models (DEM), vector, machine learning labels, and composites like NDVI and mosaics. STAC is intentionally designed with a minimal core and flexible extension mechanism to support a broad set of use cases. This specification has matured over the past several years, and is used in numerous production deployments.

pygeoapi's built-in providers for browsing STAC catalogs are described below:

Static catalog

FileSystem Provider

The FileSystem Provider implements STAC as a geospatial file browser through the server’s file system, supporting any level of file/directory nesting/hierarchy.

Configuring STAC in pygeoapi is done by simply pointing the data provider property to the given directory and specifying allowed file types:

Connection examples

my-stac-resource:
    type: stac-collection
    ...
    providers:
        - type: stac
          name: FileSystem
          data: /Users/tomkralidis/Dev/data/gdps
          file_types:
              - .grib2

Note

rasterio and fiona are required for describing geospatial files.

pygeometa metadata control files

pygeoapi’s STAC filesystem functionality supports pygeometa MCF files residing in the same directory as data files. If an MCF file is found, it will be used as part of generating the STAC item metadata (e.g. a file named birds.csv having an associated birds.yml file). If no MCF file is found, then pygeometa will generate the STAC item metadata from configuration and by reading the data’s properties.
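The sibling-file convention described above can be sketched as follows. This is only an illustration of the naming rule, not pygeoapi's actual lookup code; the helper name find_mcf is hypothetical.

```python
from pathlib import Path
from typing import Optional


def find_mcf(data_file: str) -> Optional[Path]:
    """Return the .yml file sitting next to data_file, or None if absent.

    Hypothetical helper: it only mirrors the naming convention
    (birds.csv -> birds.yml) and does not parse the MCF itself.
    """
    candidate = Path(data_file).with_suffix('.yml')
    return candidate if candidate.is_file() else None
```

For a directory containing both birds.csv and birds.yml, find_mcf('birds.csv') returns the path to birds.yml; for a data file without a companion file it returns None.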

Publishing ESRI Shapefiles

Publishing ESRI Shapefiles requires listing all required component file extensions (.shp, .shx, .dbf) in the provider file_types option.
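As a quick sanity check before starting the server, a configured file_types list can be compared against the three required extensions. The sketch below is a standalone helper with a hypothetical name; it is not part of pygeoapi and assumes the extensions are written in lowercase with a leading dot, as in the configuration examples.

```python
REQUIRED_SHAPEFILE_EXTENSIONS = ('.shp', '.shx', '.dbf')


def missing_shapefile_extensions(file_types):
    """Return the required Shapefile extensions absent from file_types."""
    configured = set(file_types)
    return [ext for ext in REQUIRED_SHAPEFILE_EXTENSIONS
            if ext not in configured]
```

An empty result means all three component extensions are listed.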

Data access examples

From here, browse the filesystem accordingly.

Azure Blob Storage Provider

The AzureBlobStorage Provider implements STAC as a geospatial file browser through Azure Blob Storage, supporting any level of file/directory nesting/hierarchy.

Configuring STAC in pygeoapi is done by simply pointing the data provider property to the given container and specifying allowed file types:

Connection examples

my-stac-resource:
    type: stac-collection
    ...
    providers:
        - type: stac
          name: AzureBlobStorage
          data: my-container-name
          file_types:
              - .grib2

Note

The AZURE_STORAGE_CONNECTION_STRING environment variable is required and should be set accordingly.

Note

rasterio and fiona are required for describing geospatial files.

Hateoas Provider

HATEOAS (Hypermedia as the Engine of Application State) is a way of implementing a REST application that allows the client to dynamically navigate to the appropriate resources by browsing hypermedia links. This type of navigation is similar to web navigation and requires a very precise data structure that must be respected to allow the HATEOAS Provider to behave correctly.

There are three component specifications (Catalog, Collection, Item) that together make up the core SpatioTemporal Asset Catalog specification. An Item represents a single spatiotemporal asset as GeoJSON. The Catalog specification provides structural elements to group Items and Collections. Collections are catalogs that add more required metadata and describe a group of related Items.

The full catalog structure of links down to sub-catalogs and Items, and their links back to their parents and roots, must use relative URLs for the HATEOAS Provider to work correctly. The structural rel types include root, parent, child, item, and collection. Asset links must be absolute URLs. Other links can be absolute, especially if they describe a resource that makes less sense in the catalog, like derived_from or even license (it can be nice to include the license in the catalog, but some licenses live at a canonical online location which makes more sense to refer to directly). This enables the full catalog (excluding the assets) to be downloaded or copied to another location and to still be valid. This also implies no self link, as that link must be absolute.
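To see why relative structural links keep a catalog portable, it can help to resolve them by hand. The sketch below uses the standard library's urljoin; the example.com URLs and the function name resolve_href are placeholders for illustration, not part of the provider.

```python
from urllib.parse import urljoin


def resolve_href(document_url: str, href: str) -> str:
    """Resolve a link href against the URL of the document containing it.

    Relative hrefs are interpreted from the document's own location;
    absolute hrefs (such as asset links) are returned unchanged.
    """
    return urljoin(document_url, href)


# A root link inside a Collection two levels below the root catalog
print(resolve_href(
    'https://example.com/stac/dem/hrdsm/collection.json',
    '../../catalog.json'))
```

Moving the whole tree to another host changes only the document URL passed in; the relative hrefs stored in the files stay valid.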

So, the following rules must be respected:

  1. Root documents (Catalogs / Collections) must be at the root of a directory tree containing the static catalog.

  2. Catalogs must be named catalog.json and Collections must be named collection.json.

  3. Sub-Catalogs or sub-Collections must be stored in subdirectories of their parent (and only 1 subdirectory deeper than a document’s parent, e.g. …/sample/sub1/catalog.json).

  4. Limit the number of Items in a Catalog or Collection, grouping / partitioning as relevant to the dataset.

  5. Use structural elements (Catalog and Collection) consistently across each ‘level’ of your hierarchy. For example, if levels 2 and 4 of the hierarchy only contain Collections, don’t add a Catalog at levels 2 and 4.

  6. Items must be named <id>.json.

  7. Items must be stored in subdirectories (1 level deeper) of their parent Catalog or Collection. The subdirectory must have the same name (<id>) as the Item without the .json extension. This means that each Item is contained in a unique subdirectory.

  8. The links to the actual assets must be absolute URLs.
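Several of the rules above (1, 2, 6, 7 and 8) can be checked mechanically on a local directory tree. The following is a minimal sketch rather than a full validator: the function name is hypothetical, and it assumes that every .json file not named catalog.json or collection.json is an Item.

```python
import json
from pathlib import Path
from urllib.parse import urlparse

ROOT_NAMES = {'catalog.json', 'collection.json'}


def check_static_catalog(root: str) -> list:
    """Return a list of human-readable rule violations found under root."""
    root_path = Path(root)
    problems = []

    # Rule 1: a root Catalog or Collection sits at the top of the tree
    if not any((root_path / name).is_file() for name in ROOT_NAMES):
        problems.append(f'no catalog.json or collection.json in {root_path}')

    for path in sorted(root_path.rglob('*.json')):
        if path.name in ROOT_NAMES:
            continue  # Rule 2: Catalogs and Collections use fixed names

        # Rules 6 and 7: an Item is <id>.json inside a directory named <id>
        if path.parent.name != path.stem:
            problems.append(f'{path}: expected a directory named {path.stem}')

        # Rule 8: asset hrefs must be absolute URLs (i.e. carry a scheme)
        with path.open(encoding='utf-8') as fh:
            item = json.load(fh)
        for key, asset in item.get('assets', {}).items():
            if not urlparse(asset.get('href', '')).scheme:
                problems.append(f'{path}: asset {key} href is not absolute')

    return problems
```

An empty list means no violations of the checked rules were found; rules 3 to 5 concern catalog design and are not covered by this sketch.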


File examples

Structure of the catalog.json file:

{
    "id": "STAC-Catalog",
    "type": "Catalog",
    "stac_version": "1.0.0",
    "description": "A description of the STAC Catalog",
    "links": [
        {
            "rel": "root",
            "href": "./catalog.json",
            "type": "application/json"
        },
        {
            "rel": "child",
            "href": "./eo4ce/catalog.json",
            "type": "application/json"
        },
        {
            "rel": "child",
            "href": "./dem/catalog.json",
            "type": "application/json"
        }
    ],
    "stac_extensions": [],
    "title": "STAC Catalog"
}

The code above shows the root catalog. The sub-catalogs have an additional rel entry pointing to the parent.

{
    "id": "dem",
    "type": "Catalog",
    "stac_version": "1.0.0",
    "description": "Digital Elevation Data",
    "links": [
        {
            "rel": "root",
            "href": "../catalog.json",
            "type": "application/json"
        },
        {
            "rel": "child",
            "href": "./hrdsm/collection.json",
            "type": "application/json"
        },
        {
            "rel": "parent",
            "href": "../catalog.json",
            "type": "application/json"
        }
    ],
    "stac_extensions": [],
    "title": "DEM"
}

Structure of the collection.json file:

Collections are similar to Catalogs with extra fields.

{
    "id": "hrdsm",
    "type": "Collection",
    "stac_version": "1.0.0",
    "description": "High Resolution Digital Surface Model",
    "links": [
        {
            "rel": "root",
            "href": "../../catalog.json",
            "type": "application/json"
        },
        {
            "rel": "item",
            "href": "./arcticdem-frontiere-0/arcticdem-frontiere-0.json",
            "type": "application/json"
        },
        {
            "rel": "item",
            "href": "./arcticdem-frontiere-9/arcticdem-frontiere-9.json",
            "type": "application/json"
        },
        {
            "rel": "parent",
            "href": "../catalog.json",
            "type": "application/json"
        }
    ],
    "stac_extensions": [],
    "extent": {
        "spatial": {
            "bbox": [
                [
                    -142.76516601842533,
                    59.65274347822059,
                    -138.41658819177135,
                    69.81052152420365
                ]
            ]
        },
        "temporal": {
            "interval": [
                [
                    "2014-09-03T14:00:00Z",
                    "2020-09-28T15:49:00.559166Z"
                ]
            ]
        }
    },
    "license": "proprietary"
}

Structure of the Item <id>.json file:

The example below shows the content of a file named arcticdem-frontiere-0.json:

{
    "type": "Feature",
    "stac_version": "1.0.0",
    "id": "arcticdem-frontiere-0",
    "properties": {
        "layer:ids": [
            "dem-hrdsm"
        ],
        "collection": "hrdsm",
        "datetime": "2020-09-28T15:48:56.483794Z"
    },
    "geometry": {
        "type": "Polygon",
        "coordinates": [
            [
                [
                    -140.27389595735178,
                    59.65274347822059
                ],
                [
                    -138.41658819177135,
                    59.65274347822059
                ],
                [
                    -138.41658819177135,
                    60.579416456816496
                ],
                [
                    -140.27389595735178,
                    60.579416456816496
                ],
                [
                    -140.27389595735178,
                    59.65274347822059
                ]
            ]
        ]
    },
    "links": [
        {
            "rel": "root",
            "href": "../../../catalog.json",
            "type": "application/json"
        },
        {
            "rel": "collection",
            "href": "../collection.json",
            "type": "application/json"
        },
        {
            "rel": "parent",
            "href": "../collection.json",
            "type": "application/json"
        }
    ],
    "assets": {
        "image": {
            "href": "https://example.com/path/to/resource/arcticdem-frontiere-0.tif",
            "type": "image/tiff; application=geotiff; profile=cloud-optimized",
            "roles": []
        }
    },
    "bbox": [
        -140.27389595735178,
        59.65274347822059,
        -138.41658819177135,
        60.579416456816496
    ],
    "stac_extensions": [],
    "collection": "hrdsm"
}
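The way the child and item links tie the files above together can be illustrated by walking a local copy of the tree. This is a sketch of hypermedia-style traversal under the layout rules listed earlier, not the HATEOAS Provider's implementation; walk_catalog is a hypothetical name and only local paths are handled.

```python
import json
from pathlib import Path


def walk_catalog(path):
    """Yield the id of a STAC document and of everything linked below it.

    Only 'child' and 'item' links are followed, so the 'root' and
    'parent' links never send the traversal back up the tree.
    """
    doc_path = Path(path)
    with doc_path.open(encoding='utf-8') as fh:
        doc = json.load(fh)

    yield doc['id']

    for link in doc.get('links', []):
        if link['rel'] in ('child', 'item'):
            # Relative hrefs are resolved from the current document's folder
            yield from walk_catalog(doc_path.parent / link['href'])
```

Called on the root catalog.json, the generator yields ids in depth-first order: the root Catalog first, then each sub-catalog or Collection followed by its Items.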

HATEOAS configuration

Configuring the HATEOAS STAC Provider in pygeoapi is done by simply pointing the data provider property to the local directory or remote URL and specifying the root file name (catalog.json or collection.json) in the file_types property:

Connection examples

my-remote-stac-resource:
    type: stac-collection
    ...
    providers:
        - type: stac
          name: Hateoas
          data: https://datacube-dev-data-public.s3.ca-central-1.amazonaws.com/catalog/water
          file_types: catalog.json

my-local-stac-resource:
    type: stac-collection
    ...
    providers:
        - type: stac
          name: Hateoas
          data: tests/stac
          file_types: catalog.json

STAC API

STAC API support is provided as a wrapper on top of resources that have feature or record providers configured.

pygeoapi implements the following conformance classes:

To enable STAC API support, configure a resource with a feature or record provider, and set the resource type to stac-collection:

canada-metadata:
    type: stac-collection
    title:
        en: Open Canada sample data
        fr: Exemple de données Canada Ouvert
    description:
        en: Sample metadata records from open.canada.ca
        fr: Exemples d'enregistrements de métadonnées sur ouvert.canada.ca
    keywords:
        en:
            - canada
            - open data
        fr:
            - canada
            - données ouvertes
    links:
        - type: text/html
          rel: canonical
          title: information
          href: https://open.canada.ca/en/open-data
          hreflang: en-CA
        - type: text/html
          rel: alternate
          title: informations
          href: https://ouvert.canada.ca/fr/donnees-ouvertes
          hreflang: fr-CA
    extents:
        spatial:
            bbox: [-180,-90,180,90]
            crs: http://www.opengis.net/def/crs/OGC/1.3/CRS84
    providers:
        - type: record
          name: TinyDBCatalogue
          data: tests/data/open.canada.ca/sample-records.tinydb
          id_field: externalId
          time_field: created
          title_field: title

STAC API queries will search all feature or record based resources configured as stac-collection. Results are decorated with the required STAC elements (unless they already exist).
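The "decorate unless already present" behaviour can be pictured with dict.setdefault. The sketch below is not pygeoapi's code: the function name is hypothetical, and the choice of fields (stac_version, links, assets) is an assumption based on the fields the STAC Item specification requires beyond those of a plain GeoJSON feature.

```python
def decorate_as_stac_item(feature: dict, stac_version: str = '1.0.0') -> dict:
    """Return a copy of feature with missing STAC Item fields filled in.

    Fields that already exist on the feature are left untouched.
    """
    item = dict(feature)  # shallow copy; the input feature is not modified
    item.setdefault('stac_version', stac_version)
    item.setdefault('links', [])
    item.setdefault('assets', {})
    return item
```

A feature that already carries its own links or assets keeps them, while a plain GeoJSON feature gains empty defaults.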

Note

pygeoapi STAC API support is minimally designed to leverage the OGC API - Features and OGC API - Records implementations. A typical setup would be a features or records backend of STAC Items. pygeoapi does not add or implement any STAC Catalog/Item relationships beyond what is encoded in a STAC resource.

Data access examples

# query features (spatial)
curl -X POST http://localhost:5000/stac-api/search \
    -H "Content-Type: application/json" \
    -d "{\"bbox\": [-142, 52, -140, 55]}"

# paging
curl -X POST http://localhost:5000/stac-api/search \
    -H "Content-Type: application/json" \
    -d "{\"offset\": 10, \"limit\": 10}"

# query features (temporal)
curl -X POST http://localhost:5000/stac-api/search \
    -H "Content-Type: application/json" \
    -d "{\"datetime\": \"2019-11-11T11:11:11Z/..\"}"
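The same searches can be issued from Python using only the standard library. This is a minimal sketch that assumes the same local server URL as the curl examples; the helper names build_search_request and search are illustrative.

```python
import json
from urllib.request import Request, urlopen

# Assumption: same endpoint as in the curl examples above
SEARCH_URL = 'http://localhost:5000/stac-api/search'


def build_search_request(query: dict, url: str = SEARCH_URL) -> Request:
    """Build a POST request carrying the search query as a JSON body."""
    return Request(
        url,
        data=json.dumps(query).encode('utf-8'),
        headers={'Content-Type': 'application/json'},
        method='POST',
    )


def search(query: dict, url: str = SEARCH_URL) -> dict:
    """Send the query and return the decoded JSON response."""
    with urlopen(build_search_request(query, url)) as response:
        return json.load(response)
```

With a pygeoapi instance running locally, search({'bbox': [-142, 52, -140, 55]}) mirrors the spatial query, search({'offset': 10, 'limit': 10}) the paging query, and search({'datetime': '2019-11-11T11:11:11Z/..'}) the temporal one.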