Data Inclusion Workflow

What the Data Producer Should Do

Prepare Data
- Convert data to a cloud-optimized format (COG, Zarr, or CF-compliant NetCDF4)
- Apply appropriate chunking and compression (especially for NetCDF4)
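
The effect of a chunking choice can be estimated before any data is rewritten. The sketch below is only back-of-the-envelope arithmetic: the array shape, chunk shape, and dtype size are hypothetical values, not VEDA requirements.

```python
from math import prod


def chunk_nbytes(chunk_shape, itemsize):
    """Uncompressed size in bytes of a single chunk."""
    return prod(chunk_shape) * itemsize


def chunks_per_dim(array_shape, chunk_shape):
    """Number of chunks along each dimension (ceiling division)."""
    return tuple(-(-n // c) for n, c in zip(array_shape, chunk_shape))


# Hypothetical daily grid with dimensions (time, lat, lon), stored as float32.
array_shape = (365, 1800, 3600)
chunk_shape = (1, 900, 900)
itemsize = 4  # bytes per float32 value

size_mib = chunk_nbytes(chunk_shape, itemsize) / 2**20
counts = chunks_per_dim(array_shape, chunk_shape)
print(f"chunk size: {size_mib:.2f} MiB, chunks per dimension: {counts}")
```

Very small chunks increase the number of requests against object storage, while very large chunks force clients to download more data than they need, so it may help to try a few chunk shapes against the expected access pattern (maps versus time series).
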
Validate Data
- Verify coordinate reference systems (CRS)
- Ensure metadata is complete, consistent, and CF-compliant (where applicable)
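
A completeness check of this kind can be scripted. The sketch below assumes the attributes have already been read into plain dictionaries; the attribute lists are an illustrative subset chosen for this example, not the full CF checklist, and `find_metadata_problems` is a hypothetical helper name.

```python
# Illustrative subset of attributes to check (an assumption, not the full CF list).
REQUIRED_GLOBAL = ("Conventions", "title")
REQUIRED_VARIABLE = ("units", "long_name")


def find_metadata_problems(global_attrs, variables):
    """Return human-readable messages for missing or empty attributes.

    global_attrs: dict of file-level attributes
    variables: dict mapping variable name -> dict of that variable's attributes
    """
    problems = []
    for name in REQUIRED_GLOBAL:
        if not str(global_attrs.get(name, "")).strip():
            problems.append(f"global attribute '{name}' is missing or empty")
    for var_name, attrs in variables.items():
        for name in REQUIRED_VARIABLE:
            if not str(attrs.get(name, "")).strip():
                problems.append(
                    f"variable '{var_name}': attribute '{name}' is missing or empty"
                )
    return problems


# Hypothetical attributes for a file with one data variable.
example_problems = find_metadata_problems(
    {"Conventions": "CF-1.8"},
    {"no2": {"units": "mol m-2"}},
)
for message in example_problems:
    print(message)
```
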
Upload to Cloud Storage
- Upload data to:
  - The S3 bucket/prefix provided by the VEDA team, or
  - A publicly accessible S3 bucket managed by the data provider
- Ensure correct access permissions (e.g., public-read if applicable)
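
For a publicly readable bucket, access can be spot-checked with an anonymous HTTPS request. The sketch below uses only the Python standard library; the bucket name, key, and region are hypothetical, and the check assumes the object is served through the usual virtual-hosted-style S3 URL.

```python
import urllib.error
import urllib.request
from urllib.parse import quote


def s3_uri(bucket, key):
    """s3:// URI for an object, the form usually shared with the ingesting team."""
    return f"s3://{bucket}/{key}"


def https_url(bucket, key, region):
    """Virtual-hosted-style HTTPS URL for the same object."""
    return f"https://{bucket}.s3.{region}.amazonaws.com/{quote(key)}"


def is_anonymously_readable(url, timeout=10):
    """Return True if an unauthenticated HEAD request succeeds (needs network access)."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except urllib.error.URLError:
        # Covers HTTP errors such as 403 (not public) and 404, plus network failures.
        return False


# Hypothetical object location.
bucket, key, region = "example-provider-bucket", "air/no2 2020.tif", "us-west-2"
print(s3_uri(bucket, key))
print(https_url(bucket, key, region))
```

Calling `is_anonymously_readable(https_url(bucket, key, region))` from a machine with no AWS credentials configured gives a quick indication of whether public-read access is in effect.
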
Provide Metadata
- Dataset description and purpose
- Variables and units
- Temporal and spatial coverage
- Preferred colormaps and rescaling parameters (for visualization)
- Citation information (DOI, authors, version, etc.)
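
One way to hand over these items is as a small machine-readable record alongside the data. The sketch below is an assumption about layout only: the field names, the `DatasetMetadata` class, and all values (including the DOI) are placeholders, not a schema defined by the VEDA team.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class DatasetMetadata:
    title: str
    description: str
    variables: dict          # variable name -> units
    temporal_coverage: tuple  # (start, end) as ISO 8601 dates
    bbox: tuple               # (west, south, east, north) in degrees
    colormap: str             # preferred colormap name for visualization
    rescale: tuple            # (min, max) data values mapped to the colormap ends
    citation: dict            # DOI, authors, version, etc.


def empty_fields(record):
    """Names of fields that were left empty."""
    return [name for name, value in asdict(record).items() if not value]


# Placeholder values for illustration only.
record = DatasetMetadata(
    title="Example surface NO2",
    description="Placeholder description of what the dataset measures and why.",
    variables={"no2": "mol m-2"},
    temporal_coverage=("2020-01-01", "2020-12-31"),
    bbox=(-125.0, 24.0, -66.0, 50.0),
    colormap="viridis",
    rescale=(0.0, 0.0002),
    citation={"doi": "10.0000/placeholder", "authors": ["A. Author"], "version": "1.0"},
)

print(json.dumps(asdict(record), indent=2))
```
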

Data Review
- Validate format, accessibility, and performance
- Ensure compatibility with VEDA infrastructure
Ingestion
- Ingest the dataset into the STAC catalog
- Create a virtual Zarr store (e.g., via Kerchunk) if needed
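
To show what a catalog entry contains, the sketch below builds a minimal STAC Item as a plain dictionary. It is not the actual ingestion pipeline, which may rely on dedicated tooling; the item ID, bounding box, datetime, and asset URL are hypothetical.

```python
import json


def bbox_to_polygon(west, south, east, north):
    """GeoJSON Polygon covering the bounding box (the ring is closed)."""
    return {
        "type": "Polygon",
        "coordinates": [[
            [west, south], [east, south], [east, north],
            [west, north], [west, south],
        ]],
    }


def make_item(item_id, bbox, datetime_utc, asset_href):
    """Minimal STAC 1.0.0 Item with a single COG data asset."""
    return {
        "type": "Feature",
        "stac_version": "1.0.0",
        "id": item_id,
        "geometry": bbox_to_polygon(*bbox),
        "bbox": list(bbox),
        "properties": {"datetime": datetime_utc},
        "links": [],
        "assets": {
            "data": {
                "href": asset_href,
                "type": "image/tiff; application=geotiff; profile=cloud-optimized",
                "roles": ["data"],
            }
        },
    }


# Hypothetical item for one daily file.
item = make_item(
    "example-no2-2020-01-01",
    (-125.0, 24.0, -66.0, 50.0),
    "2020-01-01T00:00:00Z",
    "s3://example-provider-bucket/air/no2_2020-01-01.tif",
)
print(json.dumps(item, indent=2))
```
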
Optimization (if needed)
- Rechunking
- Format conversion (e.g., NetCDF → Zarr)
Integration
- Integrate the dataset into the AIR4US platform
- Configure visualization layers and access endpoints
QA/QC
- Verify rendering and map performance
- Validate query and analytics workflows

Timelines vary based on:
- Dataset size
- Format readiness
- Required optimization steps
- Team capacity
For current estimates, please contact the ODSI/DSE team.