Running this distance function locally on its own is an order of magnitude faster when I set
parallel=True. However, I call it inside a function that is submitted to a cluster of workers (on this deployment), as shown below, and I wonder whether there is any way to take advantage of the nested function's parallelizability.
Timing the full embarrassingly parallel computation with the nested function's parallel option set to
False does not seem to make any difference to the wall time. So I am guessing that running a nested function in parallel does not make sense in this case, and that I should spend my effort on parallelizing the enclosing function instead.
Before concluding that, I wanted to ask whether anyone knows of an alternative I am not aware of that does take advantage of the nested function's parallelizability. Also, what do others do in a similar situation?
The example below is supposed to be reproducible:
```python
import math

import numpy as np
import numpy.ma as ma
from numba import njit, prange


@njit(fastmath=True, parallel=False)
def dist_loop_njit_fm_parallelFalse(lat0, lon0, lat, lon):
    # Haversine distance from (lat0, lon0) to every point in (lat, lon)
    distances = np.empty_like(lat)
    R_earth = 6.371e6
    phi1 = math.radians(lat0)
    lambda1 = math.radians(lon0)
    for i in prange(lat.size):
        phi2 = math.radians(lat[i])
        lambda2 = math.radians(lon[i])
        a = math.sin((phi2 - phi1) / 2.0) ** 2
        b = math.cos(phi1) * math.cos(phi2) * math.sin((lambda1 - lambda2) / 2.0) ** 2
        distances[i] = R_earth * 2 * math.asin(math.sqrt(a + b))
    return distances


def function(lat0, lon0, lat, lon, window_size):
    distance = dist_loop_njit_fm_parallelFalse(lat0, lon0, lat, lon)
    in_window = np.where(distance < window_size)
    lat, lon = lat[in_window], lon[in_window]
    # Rest of code...
    return some_results


def run_function(client, window_size, workers=None):
    # Load data
    n = 2_000_000
    hlat, llat = 90.0, -90.0
    hlon, llon = 180.0, -180.0
    lat = np.random.uniform(low=llat, high=hlat, size=(n,))
    lon = np.random.uniform(low=llon, high=hlon, size=(n,))

    # Scatter the data to the cluster
    lat_scattered = client.scatter(lat)
    lon_scattered = client.scatter(lon)

    # Make iterable of window centres
    latg = np.arange(-80, 80, 5)
    long = np.arange(-180, 180, 5)
    lonG, latG = np.meshgrid(long, latg)
    lons = ma.MaskedArray(lonG).compressed()
    lats = ma.MaskedArray(latG).compressed()

    # Start the embarrassingly parallel computation
    list_of_submitted_tasks = []
    for lat0, lon0 in zip(lats, lons):
        submitted = client.submit(
            function,
            lat0,
            lon0,
            lat_scattered,
            lon_scattered,
            window_size,
            pure=False,
            workers=workers,
        )
        list_of_submitted_tasks.append(submitted)
    return list_of_submitted_tasks


# Start client and cluster, and run
window_size = 100e3
results = client.gather(run_function(client, window_size, workers=None))
```
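For reference, here is a minimal NumPy-only sketch of the same haversine computation, which I use as a single-threaded baseline when timing (`haversine_np` is just a name I made up for this comparison, not part of the code above):

```python
import numpy as np


def haversine_np(lat0, lon0, lat, lon, R_earth=6.371e6):
    """Vectorized haversine distance (metres) from (lat0, lon0) to the
    arrays lat/lon, all given in degrees."""
    phi1 = np.radians(lat0)
    phi2 = np.radians(lat)
    dphi = phi2 - phi1
    dlam = np.radians(lon) - np.radians(lon0)
    a = np.sin(dphi / 2.0) ** 2 + np.cos(phi1) * np.cos(phi2) * np.sin(dlam / 2.0) ** 2
    return R_earth * 2.0 * np.arcsin(np.sqrt(a))


# Sanity check: equator to pole along a meridian is a quarter great circle,
# i.e. pi/2 * R_earth
d = haversine_np(0.0, 0.0, np.array([90.0]), np.array([0.0]))
```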