Initiating a LocalCluster leads to CERT_TRUST_IS_UNTRUSTED_ROOT error on data load

Hello,

I use dask as part of a satellite image data access and processing workflow. Essentially, I initiate a LocalCluster using dask.distributed, create a lazy xarray DataArray with stackstac, and load it into memory for further processing. I am doing this across thousands of spatial tiles (essentially a for loop where I create these data cubes and use dask to load them into memory one after the other).
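For context, each iteration of that loop looks roughly like this (a minimal sketch; all_tiles and tile_items are placeholders for my actual STAC query results):

import stackstac

for tile_items in all_tiles:  # hypothetical: one list of STAC items per spatial tile
    cube = stackstac.stack(tile_items)  # lazy, dask-backed xarray DataArray
    cube = cube.load()  # load into memory for further processing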

Since this workflow can be finished much more quickly by using multiple machines (i.e., giving a second machine half the tiles to process), I secured a second workstation and set up the same Python environment. Importantly, I am on a government network, which means I need to make sure my certificates are recognized. Currently, I do this by making sure cacert.pem in certifi contains the required certificates.

At first, this worked without issue, with both machines running the processing workflow. However, I recently updated a couple of Python modules on the second workstation. Since then, I have been unable to continue the processing on the second workstation. If I create a LocalCluster, i.e.:

from dask.distributed import LocalCluster
cluster = LocalCluster(n_workers=10, threads_per_worker=10)
client = cluster.get_client()

On data load (i.e., data.load()), I get this error: RasterioIOError('HTTP response code: 303 - schannel: CertGetCertificateChain trust error CERT_TRUST_IS_UNTRUSTED_ROOT')

which traces back to Dask:

File "C:\path\to\my\env\Lib\site-packages\dask\base.py", line 661, in compute
    results = schedule(dsk, keys, **kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If I comment out the LocalCluster cell (above), the processing completes, albeit more slowly, since without the LocalCluster I am not making full use of the machine.

I have tried a few different fixes, including setting Security for the LocalCluster by pointing it at the pem cert file (but then the cluster would not start up), and have gone as far as fully resetting the Python installation on the second machine by cleaning out all the environments and uninstalling Miniforge. What confuses me is that it worked initially (and still works fine on the first machine), and I have been following the same steps as in the initial setup while trying to fix the issue. It is also strange that the LocalCluster is the issue, since a bare compute call works just fine without it (and that still uses dask as far as I am aware?).
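For reference, the Security attempt looked roughly like this (reconstructed from memory; the cert path is a placeholder, and I may have also needed protocol='tls' on the cluster):

from dask.distributed import LocalCluster
from distributed import Security

# Point the cluster's TLS configuration at the certifi bundle
# that holds our government certificates
sec = Security(tls_ca_file=r'C:\path\to\my\env\Lib\site-packages\certifi\cacert.pem')
cluster = LocalCluster(n_workers=10, threads_per_worker=10, security=sec)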

Any help would be appreciated!

To follow up: I get the same error if I avoid the LocalCluster but use scheduler='processes' during data.load() (rather than the default threads).
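That is, with the cluster code commented out, something like this fails the same way:

data.load(scheduler='processes')  # same certificate error; the default threaded scheduler works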

Hi @ZZMitch, welcome to the Dask community!

So it looks like spawning new Python processes is the problem; they probably don't get the same system environment as the main process. I really don't know why, though.

Are you sure there have only been Python environment modifications? The problem occurring suddenly would suggest that the OS configuration has been modified in some way too…

Could you print the complete stack trace?

Hi @guillaumeeb, thanks for getting back to me!

I cannot think of any other changes that would have occurred. I added my Miniforge3\condabin folder to the Path environment variable, but that happened when I initially set up the PC for this workflow. My first thought was that the module updates I did triggered an update to one of certifi/openssl/ca-certificates (since conda updates those aggressively), but forcing the versions to match my original machine's setup did not help (the versions I was already using also appear to be the latest ones anyway). I also verified that my certificates still exist in cacert.pem.

Here is the complete trace:

---------------------------------------------------------------------------
CPLE_HttpResponseError                    Traceback (most recent call last)
File rasterio\\_base.pyx:310, in rasterio._base.DatasetBase.__init__()

File rasterio\\_base.pyx:221, in rasterio._base.open_dataset()

File rasterio\\_err.pyx:221, in rasterio._err.exc_wrap_pointer()

CPLE_HttpResponseError: HTTP response code: 303 - schannel: CertGetCertificateChain trust error CERT_TRUST_IS_UNTRUSTED_ROOT

During handling of the above exception, another exception occurred:

RasterioIOError                           Traceback (most recent call last)
File ~\Miniforge3\envs\fmask_processor\Lib\site-packages\stackstac\rio_reader.py:327, in _open()
    326 try:
--> 327     ds = SelfCleaningDatasetReader(self.url, sharing=False)
    328 except Exception as e:

File rasterio\\_base.pyx:312, in rasterio._base.DatasetBase.__init__()

RasterioIOError: HTTP response code: 303 - schannel: CertGetCertificateChain trust error CERT_TRUST_IS_UNTRUSTED_ROOT

The above exception was the direct cause of the following exception:

RuntimeError                              Traceback (most recent call last)
File <timed exec>:1

File C:\Users/mbonney/Documents/mtb_processing/UtilityCode_23May24/DataAccess/Utilities\PreProcess_Utils.py:179, in loadXR(cube)
    176     with ddiag.ProgressBar():
    177         with rio.Env(GDAL_HTTP_UNSAFESSL = 'YES') as env: 
    178             # compute (returns python object fit in mem), persist (returns dask object), load (like compute but inplace)
--> 179             cube = cube.load() 
    181 return cube

File ~\Miniforge3\envs\fmask_processor\Lib\site-packages\xarray\core\dataarray.py:1129, in DataArray.load(self, **kwargs)
   1111 def load(self, **kwargs) -> Self:
   1112     """Manually trigger loading of this array's data from disk or a
   1113     remote source into memory and return this array.
   1114 
   (...)
   1127     dask.compute
   1128     """
-> 1129     ds = self._to_temp_dataset().load(**kwargs)
   1130     new = self._from_temp_dataset(ds)
   1131     self._variable = new._variable

File ~\Miniforge3\envs\fmask_processor\Lib\site-packages\xarray\core\dataset.py:845, in Dataset.load(self, **kwargs)
    842 chunkmanager = get_chunked_array_type(*lazy_data.values())
    844 # evaluate all the chunked arrays simultaneously
--> 845 evaluated_data: tuple[np.ndarray[Any, Any], ...] = chunkmanager.compute(
    846     *lazy_data.values(), **kwargs
    847 )
    849 for k, data in zip(lazy_data, evaluated_data):
    850     self.variables[k].data = data

File ~\Miniforge3\envs\fmask_processor\Lib\site-packages\xarray\namedarray\daskmanager.py:86, in DaskManager.compute(self, *data, **kwargs)
     81 def compute(
     82     self, *data: Any, **kwargs: Any
     83 ) -> tuple[np.ndarray[Any, _DType_co], ...]:
     84     from dask.array import compute
---> 86     return compute(*data, **kwargs)

File ~\Miniforge3\envs\fmask_processor\Lib\site-packages\dask\base.py:663, in compute(traverse, optimize_graph, scheduler, get, *args, **kwargs)
    660     postcomputes.append(x.__dask_postcompute__())
    662 with shorten_traceback():
--> 663     results = schedule(dsk, keys, **kwargs)
    665 return repack([f(r, *a) for r, (f, a) in zip(results, postcomputes)])

File ~\Miniforge3\envs\fmask_processor\Lib\site-packages\stackstac\to_dask.py:189, in fetch_raster_window()
    182 # Only read if the window we're fetching actually overlaps with the asset
    183 if windows.intersect(current_window, asset_window):
    184     # NOTE: when there are multiple assets, we _could_ parallelize these reads with our own threadpool.
    185     # However, that would probably increase memory usage, since the internal, thread-local GDAL datasets
    186     # would end up copied to even more threads.
    187 
    188     # TODO when the Reader won't be rescaling, support passing `output` to avoid the copy?
--> 189     data = reader.read(current_window)
    191     if all_empty:
    192         # Turn `output` from a broadcast-trick array to a real array, so it's writeable
    193         if (
    194             np.isnan(data)
    195             if np.isnan(fill_value)
    196             else np.equal(data, fill_value)
    197         ).all():
    198             # Unless the data we just read is all empty anyway

File ~\Miniforge3\envs\fmask_processor\Lib\site-packages\stackstac\rio_reader.py:385, in read()
    384 def read(self, window: Window, **kwargs) -> np.ndarray:
--> 385     reader = self.dataset
    386     try:
    387         result = reader.read(
    388             window=window,
    389             out_dtype=self.dtype,
   (...)
    393             **kwargs,
    394         )

File ~\Miniforge3\envs\fmask_processor\Lib\site-packages\stackstac\rio_reader.py:381, in dataset()
    379 with self._dataset_lock:
    380     if self._dataset is None:
--> 381         self._dataset = self._open()
    382     return self._dataset

File ~\Miniforge3\envs\fmask_processor\Lib\site-packages\stackstac\rio_reader.py:336, in _open()
    331             warnings.warn(msg)
    332             return NodataReader(
    333                 dtype=self.dtype, fill_value=self.fill_value
    334             )
--> 336         raise RuntimeError(msg) from e
    337 if ds.count != 1:
    338     ds.close()

RuntimeError: Error opening 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/HLS.L30.T12UVE.2023209T182234.v2.0/HLS.L30.T12UVE.2023209T182234.v2.0.Fmask.tif': RasterioIOError('HTTP response code: 303 - schannel: CertGetCertificateChain trust error CERT_TRUST_IS_UNTRUSTED_ROOT')

This definitely looks like a setup problem at the operating system level. I would wager you'll encounter it without Dask if you use the multiprocessing package.
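For example, a quick check like this (just a sketch, reusing the URL from your traceback) would show whether a plain multiprocessing worker hits the same error:

import multiprocessing as mp

import rasterio

# URL taken from the traceback above
URL = (
    'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/'
    'HLSL30.020/HLS.L30.T12UVE.2023209T182234.v2.0/'
    'HLS.L30.T12UVE.2023209T182234.v2.0.Fmask.tif'
)

def try_open(url):
    # If the child process does not see the same certificate store,
    # this should raise the same CERT_TRUST_IS_UNTRUSTED_ROOT error
    with rasterio.open(url) as ds:
        return ds.profile

if __name__ == '__main__':
    with mp.Pool(1) as pool:
        print(pool.apply(try_open, (URL,)))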

What is strange is that you don’t get the error in your main Python process.

I would look for differences in system configuration files, like /etc/security or /etc/profile…

Excuse my ignorance, but where can I look for these? I am on Windows for reference.

Also, I do not have admin rights on the second machine, but I do on the first. Not sure if that is an issue (but again, it was not one originally!).

Oh sorry, I didn't realize this could be Windows. Well, you need to speak with the person who is admin on this machine, I think.

Fixed this issue!

It was due to an environment setting, GDAL_HTTP_UNSAFESSL='YES', not getting properly passed to the Dask workers in the LocalCluster.

Originally, I passed this setting on load() using:

with rio.Env(GDAL_HTTP_UNSAFESSL='YES') as env:
    cube = cube.load()

However, as discussed in stackstac #228, this does not pass the environment setting to the Dask workers. Funnily enough, I was the one who opened that issue in December of last year, but I had since forgotten about this advice from gjoseph92.

Something like this:

gdal_env = stackstac.DEFAULT_GDAL_ENV.updated(always=dict(GDAL_HTTP_UNSAFESSL='YES'))

applied during stackstac.stack(), e.g.:

stackstac.stack(..., gdal_env=gdal_env)

will pass the setting to the workers.
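Putting it together, the working pattern looks roughly like this (a sketch; items stands in for whatever my STAC search returns):

import stackstac

# Build a GDAL environment that stackstac applies on every dask worker,
# instead of relying on rio.Env() in the main process only
gdal_env = stackstac.DEFAULT_GDAL_ENV.updated(
    always=dict(GDAL_HTTP_UNSAFESSL='YES')
)

cube = stackstac.stack(items, gdal_env=gdal_env)
cube = cube.load()  # no rio.Env() wrapper needed anymore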

I am still not sure why this issue only came up now and not when I initially set up the machine (or why the old way still works fine on my original machine), but I am not complaining that I have a fix! I will update the initial machine to apply settings in this manner once the current processing completes.


Just a quick note: it is generally not recommended to use unsafe SSL. You definitely have a broken certificate setup on your host if you need that.