Spatialys proposes several projects to improve open source geospatial
software in which we have recognized expertise (GDAL, PROJ, etc.).
Your financial contribution is needed to make them happen.
The size of those projects is such that it is hard to find a single sponsor
to fund them.
PROJ 6 brings undeniable advances in the management
of coordinate transformations between datums, by relying on information available
in the PROJ database. To get accurate
results, most of the time a grid file is needed. The
proj-datumgrid project centralizes all the grids that are available under
an open data license, and bundles them in different archives split along big
geographical regions of the world. That approach assumes that the user of PROJ
has downloaded and installed those optional, but strongly recommended, packages,
otherwise they will get inaccurate results. However there are use cases, like
serverless compute solutions, where it is impossible to bundle all of them given
their size (currently more than 700 MB of uncompressed data, and growing) and
the restrictions set by the cloud provider.
The goal of this project is to provide additional capabilities to PROJ:
- Grids will be hosted on a Content Delivery Network (CDN) (we currently have
hosting proposals from two potential providers).
- Users no longer have to manually fetch grid files and place them in PROJ_LIB.
- Getting the full accuracy of the software will no longer require installing
hundreds of megabytes of grid shift files in advance when only a few of them
are needed for the transformations done by the user.
- Local caching of grid files, or even part of files, so that users end up
mirroring what they actually use.
Using locally available grids will of course still be possible, and will
be the default behaviour.
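The lookup order described above (local files first, network only as an opt-in fallback) could be sketched as follows. This is only an illustration in Python, not PROJ code: the environment variable name `EXAMPLE_GRID_NETWORK`, the CDN base URL and the function names are all hypothetical.

```python
import os

# Hypothetical values, chosen only for this illustration.
CDN_BASE_URL = "https://cdn.example.org/grids/"
NETWORK_ENV_VAR = "EXAMPLE_GRID_NETWORK"


def network_enabled(environ=os.environ):
    """Network access is opt-in: it stays off unless explicitly requested."""
    return environ.get(NETWORK_ENV_VAR, "").upper() in ("ON", "YES", "TRUE", "1")


def resolve_grid(name, search_dirs, environ=os.environ):
    """Return ("local", path), ("remote", url) or ("missing", None)."""
    # 1. A grid installed locally always wins.
    for directory in search_dirs:
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            return ("local", path)
    # 2. Otherwise fall back to the CDN, but only if the user opted in.
    if network_enabled(environ):
        return ("remote", CDN_BASE_URL + name)
    # 3. No local file and no network: the grid is unavailable.
    return ("missing", None)
```

With the variable unset, `resolve_grid` never returns a remote URL, which matches the intent that existing local-only setups keep behaving as before.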
3. Detailed tasks
We have structured this work around three work packages: one core package and
two optional, but strongly desirable, additions.
- Work Package 1 (core)
- curl will be an optional build dependency in
autoconf/cmake (used by default if available)
- Network access abstracted through an interface (with C callbacks through C
API) attached to the PROJ context, and curl used as the default implementation
when available.
- Download of grids will not be enabled by default, and will require the user
to set an environment variable or set an attribute on the PROJ context.
- When enabled, all grids known to PROJ in the database (that is, in the
grid_alternatives table) will be assumed to be available through the CDN, and
thus for the sorting and filtering logic in createOperations() they will be
treated as if they were local files.
- Deep refactoring of the PROJ code dealing with grid use, so as to avoid
ingesting everything in memory as done currently. This should also benefit local access to big grids.
- Network access will only be attempted if the file doesn't exist locally.
- The network layer will use an in-memory cache of chunks like GDAL /vsicurl/, both to limit the number of small GET requests and to provide a caching effect.
- In download mode, download failures will be propagated as PROJ errors for
coordinate computations.
- The currently supported grid formats (CTable2, NTv1, NTv2 .gsb, GTX), using a line-based
approach, will still be used.
- Upload of the content of the existing proj-datumgrid packages to the CDN storage.
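To make the network layer of Work Package 1 more concrete, here is a minimal Python sketch of a pluggable range-fetching callback combined with an in-memory cache of fixed-size chunks, with download failures surfaced as a dedicated error. It is not the PROJ implementation: the class and exception names, the 16 KB chunk size and the least-recently-used eviction policy are assumptions made for this illustration.

```python
from collections import OrderedDict

CHUNK_SIZE = 16 * 1024  # assumed chunk size; the proposal does not specify one


class GridDownloadError(Exception):
    """Raised when a chunk needed for a computation cannot be fetched."""


class ChunkedRemoteFile:
    """Serve arbitrary byte ranges from an LRU cache of fixed-size chunks."""

    def __init__(self, fetch_range, max_chunks=64):
        # fetch_range(offset, size) -> bytes plays the role of the network
        # callback; a real one would issue an HTTP range request.
        self._fetch_range = fetch_range
        self._max_chunks = max_chunks
        self._chunks = OrderedDict()  # chunk index -> bytes

    def _chunk(self, index):
        if index in self._chunks:
            self._chunks.move_to_end(index)  # mark as most recently used
            return self._chunks[index]
        try:
            data = self._fetch_range(index * CHUNK_SIZE, CHUNK_SIZE)
        except OSError as exc:
            # Propagate the failure instead of silently degrading accuracy.
            raise GridDownloadError(f"cannot fetch chunk {index}") from exc
        self._chunks[index] = data
        if len(self._chunks) > self._max_chunks:
            self._chunks.popitem(last=False)  # evict least recently used
        return data

    def read(self, offset, size):
        """Return `size` bytes starting at `offset`, fetching missing chunks."""
        if size <= 0:
            return b""
        first = offset // CHUNK_SIZE
        last = (offset + size - 1) // CHUNK_SIZE
        data = b"".join(self._chunk(i) for i in range(first, last + 1))
        start = offset - first * CHUNK_SIZE
        return data[start:start + size]
```

Two small reads that fall in the same chunk trigger a single call to `fetch_range`, which is the effect the chunk cache is meant to achieve for many small grid lookups in the same area.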
- Work Package 2 (local disk cache)
- A SQLite3 database, located in a user-writable directory, will hold
chunks of partially downloaded grid files.
- Access to it will be thread-safe and multiprocess-safe.
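A possible shape for such a disk cache is sketched below in Python with the standard `sqlite3` module. The table layout, column names and class name are assumptions for illustration, not the schema PROJ will use. Concurrency relies on SQLite's own file locking: each thread or process opens its own connection, and the `timeout` makes a writer wait for a competing lock instead of failing immediately.

```python
import sqlite3


class DiskChunkCache:
    """Persist downloaded chunks, keyed by (url, chunk_offset)."""

    def __init__(self, db_path):
        # One connection per thread/process; SQLite serializes the writers.
        self._conn = sqlite3.connect(db_path, timeout=10.0)
        with self._conn:
            self._conn.execute(
                "CREATE TABLE IF NOT EXISTS chunks ("
                " url TEXT NOT NULL,"
                " chunk_offset INTEGER NOT NULL,"
                " data BLOB NOT NULL,"
                " PRIMARY KEY (url, chunk_offset))"
            )

    def get(self, url, chunk_offset):
        """Return the cached bytes, or None if the chunk was never stored."""
        row = self._conn.execute(
            "SELECT data FROM chunks WHERE url = ? AND chunk_offset = ?",
            (url, chunk_offset),
        ).fetchone()
        return None if row is None else bytes(row[0])

    def put(self, url, chunk_offset, data):
        """Store or overwrite a chunk in a single transaction."""
        with self._conn:  # commits on success, rolls back on exception
            self._conn.execute(
                "INSERT OR REPLACE INTO chunks (url, chunk_offset, data)"
                " VALUES (?, ?, ?)",
                (url, chunk_offset, data),
            )

    def close(self):
        self._conn.close()
```

Storing chunks rather than whole files means the cache only grows with the parts of the grids that are actually used.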
- Work Package 3 (tiled GeoTIFF files for grids)
- Tiles are better suited for piece-wise download than scanline-oriented
formats (although it is difficult to anticipate how much benefit this
will bring in practice).
- GeoTIFF is a well-known format, supported by many tools.
A number of websites distribute grids in that format. TIFF has built-in capability to
add metadata, whereas dedicated grid formats often have few and limited provisions
for it.
- As we cannot use libgeotiff, since libgeotiff uses PROJ, a minimal parsing
of GeoTIFF information (extraction of the upper left pixel coordinate and
resolution from the GeoKeys) will be done in the PROJ codebase.
- libtiff will be added as an optional dependency, but it will be required to
use GeoTIFF grids, and therefore also to use the download capability.
Grids distributed on the CDN will only be in GeoTIFF format, while grids
distributed as proj-datumgrid will be available in two variants:
with .gsb/.gtx files or with GeoTIFF files (this might be subject to adjustment after
discussion with the larger PROJ community).
- The NTv2 format has an extra capability when compared to other formats, which is the
possibility to have sub-grids. This is for example used by the original Canadian NTv2 grid,
ntv2_0.gsb. However the way such subgrids are organized in the NTv2 format is not
cloud optimized (the descriptor of each subgrid is immediately before the data
of the subgrid, so they are spread all over the file, and
opening that NTv2 file thus may require a lot of seeks). In a TIFF storage, we
could implement this subgrid concept with the TIFF IFD concept, and use a
technique similar to Cloud Optimized GeoTIFF (COG) files where the descriptors
for all subgrids would be put at the beginning of the file, so they can be fetched
in a single GET request. This subtask therefore includes improving the GDAL GTiff
driver so that it can write such an optimized TIFF file from a source dataset with
subdatasets, and making PROJ able to use it.
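As an illustration of the minimal parsing and of the one-IFD-per-subgrid idea, the Python sketch below walks the IFD chain of a classic (non-BigTIFF) TIFF and extracts the upper-left coordinate and resolution of each IFD. It assumes the georeferencing is stored in the standard GeoTIFF ModelPixelScaleTag (33550) and ModelTiepointTag (33922); the function name is hypothetical, and the code works on a buffer already in memory, whereas a real implementation would fetch the header region through range requests.

```python
import struct

MODEL_PIXEL_SCALE = 33550  # ModelPixelScaleTag: (scale_x, scale_y, scale_z)
MODEL_TIEPOINT = 33922     # ModelTiepointTag: (i, j, k, x, y, z)
TIFF_DOUBLE = 12           # TIFF field type for 8-byte IEEE doubles


def read_georeferencing(buf):
    """Return one (origin_x, origin_y, res_x, res_y) tuple per georeferenced IFD."""
    endian = {b"II": "<", b"MM": ">"}[bytes(buf[:2])]
    magic, ifd_offset = struct.unpack_from(endian + "HI", buf, 2)
    if magic != 42:
        raise ValueError("not a classic TIFF file")
    results = []
    while ifd_offset != 0:
        (n_entries,) = struct.unpack_from(endian + "H", buf, ifd_offset)
        doubles = {}
        for k in range(n_entries):
            # Each directory entry is 12 bytes: tag, type, count, value/offset.
            tag, ftype, count, value = struct.unpack_from(
                endian + "HHII", buf, ifd_offset + 2 + 12 * k
            )
            if tag in (MODEL_PIXEL_SCALE, MODEL_TIEPOINT) and ftype == TIFF_DOUBLE:
                # Arrays of doubles never fit in 4 bytes, so `value` is an offset.
                doubles[tag] = struct.unpack_from(f"{endian}{count}d", buf, value)
        if MODEL_PIXEL_SCALE in doubles and MODEL_TIEPOINT in doubles:
            sx, sy = doubles[MODEL_PIXEL_SCALE][:2]
            i, j, _, x, y, _ = doubles[MODEL_TIEPOINT][:6]
            # Shift the tiepoint back to pixel (0, 0); rows grow southwards.
            results.append((x - i * sx, y + j * sy, sx, sy))
        # The 4 bytes after the last entry give the next IFD offset (0 = end).
        (ifd_offset,) = struct.unpack_from(
            endian + "I", buf, ifd_offset + 2 + 12 * n_entries
        )
    return results
```

If all IFDs and their tag values sit at the beginning of the file, as in the layout described above, this whole loop only touches the first bytes of the file, so the descriptors of all subgrids can be obtained without seeking through the grid data.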
4. Costs and state of funding
- WP 1 (core). Funding target: 8000 euros. Reached
- WP 2 (local disk cache). Funding target: 2000 euros. Reached
- WP 3 (GeoTIFF grids). Funding target: 4800 euros. Reached
We would like to thank the sponsors of this crowdfunding:
GDAL Coordinate System Barn Raising
The gdalbarn.com website is dedicated to this successful campaign, which led
to the release of PROJ 6.0, GDAL 3.0 and libgeotiff 1.5.