Preserving QGIS Metadata in FlatGeobuf

To preserve QGIS metadata in FlatGeobuf, you must explicitly extract layer-level metadata from the source dataset, serialize it as valid JSON, and write it into the metadata field of the FlatGeobuf header through GDAL. FlatGeobuf does not parse .qgz project files, nor does it carry over QGIS symbology, relations, or custom XML blocks. The specification provides a single, layer-scoped JSON metadata field in the file header. For production pipelines, extract metadata via PyQGIS or direct SQLite queries, confirm that the payload is valid JSON, and write with pyogrio or ogr2ogr. Header metadata support depends on the GDAL release, so check the FlatGeobuf driver documentation for the version you deploy.

Architecture & Serialization Constraints

QGIS distributes metadata across three distinct storage layers:

  • Project-level settings (.qgz): Canvas state, relations, and global variables.
  • Layer-level properties: Stored in GeoPackage metadata tables (gpkg_metadata, gpkg_metadata_reference) or as layer description fields.
  • Custom feature attributes: Appended directly to the attribute table.

FlatGeobuf is a streaming, cloud-optimized vector format designed for low-latency web delivery and efficient HTTP range requests. Its binary header contains an optional metadata JSON field, but this field is layer-scoped and is not meant to carry project configurations or QML styling. When designing Data Conversion & Migration Pipelines for cloud-native geospatial platforms, treat metadata as an explicit serialization target rather than relying on implicit driver behavior. QGIS XML blocks and project-level context are not carried over during conversion. Only explicitly mapped key-value pairs survive the export.

Extraction & Serialization Workflow

Because QGIS does not embed metadata directly into GeoPackage feature tables, extraction requires querying the metadata registry or using PyQGIS APIs. The most reliable approach for headless pipelines is querying the gpkg_metadata table via SQLite, then mapping relevant fields to a flat dictionary.
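As a rough illustration of the mapping step, the sketch below pulls a few simple text elements out of an XML metadata document and returns a flat dictionary. The sample document, the element names (identifier, title, abstract), and the helper name xml_metadata_to_flat_dict are assumptions made for illustration. Inspect the actual rows in gpkg_metadata before relying on any particular structure.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterable


def xml_metadata_to_flat_dict(
    raw_xml: str,
    wanted: Iterable[str] = ("identifier", "title", "abstract"),
) -> Dict[str, str]:
    """Map selected top-level XML elements to a flat {tag: text} dictionary."""
    root = ET.fromstring(raw_xml)
    flat: Dict[str, str] = {}
    for tag in wanted:
        element = root.find(tag)
        # Skip elements that are missing or contain only whitespace
        if element is not None and element.text and element.text.strip():
            flat[tag] = element.text.strip()
    return flat


# Hypothetical sample resembling a metadata row; not taken from a real file
sample = """<qgis>
  <identifier>admin-2024</identifier>
  <title>Administrative boundaries</title>
  <abstract>  </abstract>
</qgis>"""

flat = xml_metadata_to_flat_dict(sample)
```

Restricting the mapping to a fixed list of wanted tags keeps the result flat and predictable; anything more deeply nested would need its own handling.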

Once extracted, the metadata must be:

  1. Flattened: Nested dictionaries or lists should be serialized to strings to maintain header readability.
  2. Validated: Strict JSON validation prevents header corruption during GDAL writes.
  3. Scoped: Only include layer-relevant keys. Project-wide settings belong in external sidecar files or cloud storage manifests.
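The three steps above can be sketched in a few lines. This is a minimal illustration, not part of any library: the helper name flatten_for_header and the example keys are hypothetical, and the choice to JSON-encode non-string values is one reasonable reading of "flattened".

```python
import json
from typing import Any, Dict, Iterable


def flatten_for_header(
    metadata: Dict[str, Any],
    layer_keys: Iterable[str],
) -> Dict[str, str]:
    """Keep only layer-scoped keys and serialize every value to a string."""
    allowed = set(layer_keys)
    flat: Dict[str, str] = {}
    for key, value in metadata.items():
        if key not in allowed:
            continue  # project-wide settings go to a sidecar file instead
        if isinstance(value, str):
            flat[key] = value
        else:
            # Nested dicts, lists, and numbers become JSON strings
            flat[key] = json.dumps(value, ensure_ascii=False, sort_keys=True)
    # Round-trip check: the flattened result must itself be valid JSON
    json.loads(json.dumps(flat, ensure_ascii=False))
    return flat


example = flatten_for_header(
    {"title": "Roads", "keywords": ["transport", "osm"], "canvas_scale": 25000},
    layer_keys=("title", "keywords"),
)
```

Here canvas_scale is dropped because it is not listed as a layer key, and the keywords list is stored as a JSON string.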

Complete Python Implementation

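The following workflow extracts QGIS layer metadata from a GeoPackage, validates it, and writes it into a FlatGeobuf header using pyogrio. It requires pyogrio ≥0.6.0 and a GDAL build whose FlatGeobuf driver can write header metadata (check the driver documentation for your release). The ogrinfo -json fallback additionally needs GDAL ≥3.7.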

python
import json
import sqlite3
import subprocess
from pathlib import Path
from typing import Dict, Any, Optional

import pyogrio
import geopandas as gpd

def extract_qgis_layer_metadata(gpkg_path: str, layer_name: str) -> Dict[str, Any]:
    """Extract QGIS layer metadata from GeoPackage metadata tables."""
    db_path = Path(gpkg_path)
    if not db_path.exists():
        raise FileNotFoundError(f"GeoPackage not found: {gpkg_path}")

    metadata = {}
    try:
        with sqlite3.connect(str(db_path)) as conn:
            cursor = conn.cursor()
            # QGIS stores metadata in gpkg_metadata; link to layer via gpkg_metadata_reference
            query = """
                SELECT m.metadata, m.id
                FROM gpkg_metadata m
                JOIN gpkg_metadata_reference r ON m.id = r.md_file_id
                WHERE r.table_name = ?
            """
            cursor.execute(query, (layer_name,))
            rows = cursor.fetchall()
            
            for row in rows:
                raw_meta = row[0]
                if not raw_meta:
                    continue
                try:
                    parsed = json.loads(raw_meta)
                except json.JSONDecodeError:
                    parsed = None
                if isinstance(parsed, dict):
                    metadata.update(parsed)
                else:
                    # Fallback: keep non-JSON content (e.g. XML) as plain text
                    metadata["qgis_raw_metadata"] = raw_meta
    except sqlite3.OperationalError:
        # Fallback to ogrinfo if metadata tables are missing.
        # Note: ogrinfo -json requires GDAL >= 3.7.
        cmd = ["ogrinfo", "-json", "-so", gpkg_path, layer_name]
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
        ogr_json = json.loads(result.stdout)
        for layer in ogr_json.get("layers", []):
            # Assumes layer metadata is grouped by domain, "" being the default
            default_domain = layer.get("metadata", {}).get("", {})
            metadata.update(default_domain)
            if "DESCRIPTION" in default_domain:
                metadata["qgis_description"] = default_domain["DESCRIPTION"]

    return metadata

def write_flatgeobuf_with_metadata(
    input_gpkg: str,
    output_fgb: str,
    layer_name: str,
    metadata: Optional[Dict[str, Any]] = None
) -> None:
    """Write GeoDataFrame to FlatGeobuf with injected header metadata."""
    if metadata is None:
        metadata = extract_qgis_layer_metadata(input_gpkg, layer_name)

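    # Validate JSON serializability (round trip) and size
    try:
        meta_json = json.dumps(metadata, ensure_ascii=False)
        json.loads(meta_json)
    except (TypeError, ValueError) as e:
        raise ValueError(f"Invalid metadata payload: {e}") from e
    if len(meta_json.encode("utf-8")) > 64 * 1024:
        raise ValueError("Metadata payload exceeds the 64 KB header budget")

    # Layer metadata values must be strings, so JSON-encode anything nested
    flat_meta = {
        str(k): v if isinstance(v, str) else json.dumps(v, ensure_ascii=False)
        for k, v in metadata.items()
    }

    gdf = gpd.read_file(input_gpkg, layer=layer_name)

    # pyogrio has no dataset_creation_options parameter; layer_metadata is
    # its documented way to pass key-value metadata. Assumption: the installed
    # GDAL FlatGeobuf driver persists layer metadata into the file header.
    pyogrio.write_dataframe(
        gdf,
        output_fgb,
        driver="FlatGeobuf",
        layer_metadata=flat_meta or None,
    )
    print(f"Successfully wrote FlatGeobuf with {len(metadata)} metadata keys.")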

# Usage example
if __name__ == "__main__":
    write_flatgeobuf_with_metadata(
        input_gpkg="data/source.gpkg",
        output_fgb="data/output.fgb",
        layer_name="administrative_boundaries"
    )

Validation & Pipeline Integration

A client reads the FlatGeobuf header first when it opens the stream, so corrupted or oversized JSON payloads can cause client-side parsing failures before any features load. Enforce these constraints before deployment:

  • Size limit: Keep header metadata under 64 KB. Large payloads should be offloaded to external catalogs (e.g., STAC or cloud object tags).
  • Type safety: Avoid binary blobs or unescaped control characters. Use ensure_ascii=False to preserve UTF-8 safely.
  • Schema alignment: Similar constraints apply when Preserving Metadata During GeoParquet Conversion, where column-level metadata must be mapped to Parquet key-value pairs instead of a single header field.

For automated validation, integrate jsonschema into your CI/CD pipeline to enforce required keys (title, description, source, last_updated) before triggering ogr2ogr or pyogrio exports.
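Where adding jsonschema to the pipeline is not practical, the same checks can be approximated with the standard library. The sketch below is a stand-in rather than a jsonschema example: the function name check_header_metadata is hypothetical, the required keys are the four listed above, and the 64 KB figure is the budget suggested in this article.

```python
import json
from typing import Any, Dict, List

REQUIRED_KEYS = ("title", "description", "source", "last_updated")
MAX_HEADER_BYTES = 64 * 1024  # budget suggested in this article


def check_header_metadata(metadata: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the payload passed."""
    problems: List[str] = []
    missing = [key for key in REQUIRED_KEYS if key not in metadata]
    if missing:
        problems.append(f"missing required keys: {', '.join(missing)}")
    try:
        payload = json.dumps(metadata, ensure_ascii=False)
    except (TypeError, ValueError) as exc:
        # Binary blobs and other non-JSON types end up here
        problems.append(f"not JSON-serializable: {exc}")
        return problems
    size = len(payload.encode("utf-8"))
    if size > MAX_HEADER_BYTES:
        problems.append(f"payload is {size} bytes, over {MAX_HEADER_BYTES}")
    return problems
```

A CI step could call check_header_metadata on the extracted dictionary and fail the job when the returned list is non-empty.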

Best Practices for Cloud-Native Workflows

  1. Decouple styling from metadata: QGIS .qml files should be stored alongside FlatGeobuf files in cloud storage. Reference them via external catalogs rather than embedding them in the binary header.
  2. Prefer a recent GDAL release: FlatGeobuf creation options and metadata handling vary between GDAL versions, so pin the version your pipeline is tested against. See the GDAL FlatGeobuf driver documentation for version-specific creation options.
  3. Idempotent pipelines: Always extract metadata from the source dataset immediately before conversion. Caching metadata across pipeline runs risks drift when QGIS layer properties are updated.
  4. Client-side hydration: Web clients (e.g., MapLibre or OpenLayers applications that use a FlatGeobuf reader) can fetch the header, including the metadata field, via HTTP range requests without downloading the whole file. Structure keys predictably so frontend applications can parse them without custom adapters.
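One way to combine the first and third practices is a small sidecar manifest that points at the style file and records a fingerprint of the header metadata, so a later run can detect drift. This is a sketch under assumptions: the function name build_sidecar_manifest, the manifest keys, and the file names are all hypothetical, and no particular catalog format is implied.

```python
import hashlib
import json
from typing import Any, Dict


def build_sidecar_manifest(
    fgb_name: str,
    qml_name: str,
    header_metadata: Dict[str, Any],
) -> Dict[str, str]:
    """Describe one FlatGeobuf file, its style file, and a metadata fingerprint."""
    # Canonical JSON (sorted keys, fixed separators) keeps the hash stable
    canonical = json.dumps(
        header_metadata, ensure_ascii=False, sort_keys=True, separators=(",", ":")
    )
    return {
        "data": fgb_name,
        "style": qml_name,
        "metadata_sha256": hashlib.sha256(canonical.encode("utf-8")).hexdigest(),
    }


manifest = build_sidecar_manifest(
    "output.fgb", "output.qml", {"title": "Roads", "source": "survey"}
)
```

Comparing metadata_sha256 between runs shows whether the layer properties changed since the last export, independent of key order in the source dictionary.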

By treating metadata as an explicit, validated payload rather than an implicit export artifact, you ensure consistent behavior across cloud storage, CDN delivery, and web GIS clients.