Very good questions @jmdelahanty! It seems you’re getting the hang of this!
I’m very sorry for the confusion: I indeed made some typos in my previous code-snippet — that’s what happens when one writes untested code! I’m glad you figured out most of them though!
Since you have corrected almost everything already, I’ll just concentrate on those remaining. I’ve also corrected my posts above. If you find any more such typos, do let me know.
- Instead of `for image in new_frames:`, it should have been `for i, image in enumerate(new_frames):`.
- Ellipsis (`...`) can be quite handy (see this link).
- Regarding `meta`, I forgot to say something else: `meta` need not be precisely of the same type/form as the chunk type returned. Its value is only nominal. In fact, it can be used to spoof `dask` into expecting something other than what the chunk-function actually returns. But the one essential thing is that `meta` should have a `.ndim` attribute.
- A tuple does not have a `.ndim` attribute, so it cannot be used as a `meta`. So better use an array for `meta`. In fact, the `meta` array does not even need to be of the right dimensionality. In your case, just use `meta=np.array([[[]]])`, a 3D numpy array.

- You can specify the true expected dimensionality of the output chunks using other parameters of `map_blocks()`, namely `new_axis` (and/or `drop_axis`) (see this link). But if the dimensionality of the output is the same as that of the input, then there is no need to do anything, as `map_blocks()` assumes that by default.

```python
def get_ith_tuple_element(tuple_, i=0):
    return tuple_[i]

meta = np.array([[[]]])
dtype = grey_frames.dtype

my_hogs = grey_frames.map_blocks(
    make_hogs,
    coordinates=coordinates,
    dtype=dtype,
    meta=meta
)
my_hogs = my_hogs.persist()

# At this point, `dask` thinks `my_hogs` has the same shape as `grey_frames`.
# It doesn't know that `my_hogs` has chunks of tuples of arrays. As long as
# you don't compute `my_hogs` directly, you can get away with this
# inconsistency.

hog_images = my_hogs.map_blocks(
    get_ith_tuple_element,
    i=1,
    dtype=dtype,
    meta=meta
)

# At this point, `hog_images` truly has the same shape as `grey_frames`.
# In contrast, a hog-descriptor has a different shape from a hog-image, so
# we need to let `map_blocks()` know what to expect. We will first tell
# `map_blocks()` to drop the image axes (1 and 2) that `dask` thinks
# `my_hogs` has and next include the descriptor axes as new ones:
image_axes = [1, 2]

hog_descriptor, hog_image = hog(first_frame)
descriptor_axes = list(range(1, hog_descriptor.ndim + 1))
# this gives [1, 2, ..., hog_descriptor.ndim]

# It is probably best not to chunk along any descriptor axes; only chunk
# along the first axis of the entire array of all descriptors:
descriptors_array_chunks = (grey_frames.chunks[0][0],) + hog_descriptor.shape

hog_descriptors = my_hogs.map_blocks(
    get_ith_tuple_element,
    i=0,
    drop_axis=image_axes,
    new_axis=descriptor_axes,
    chunks=descriptors_array_chunks,
    dtype=dtype,
    meta=meta,  # you can correct `meta` for consistency, if you want. But it really does not matter.
)
```
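
If it helps to experiment with these ideas away from your real data, here is a minimal, self-contained sketch. The chunk function `row_means` is a made-up stand-in for `make_hogs` (it is not part of your pipeline), and the array sizes are arbitrary. It loops with `enumerate`, indexes with `...`, swaps the two image axes for one feature axis via `drop_axis`/`new_axis`, and passes a small empty array as `meta`.

```python
import dask.array as da
import numpy as np


def row_means(frames):
    """Hypothetical stand-in for `make_hogs`: one 1D feature vector per frame."""
    features = np.empty(frames.shape[:2], dtype=frames.dtype)
    for i, image in enumerate(frames):
        # `features[i, ...]` selects everything that belongs to frame `i`.
        features[i, ...] = image.mean(axis=1)
    return features


# Six toy "frames" of 4 x 5 pixels, chunked two frames at a time.
data = np.arange(6 * 4 * 5, dtype=float).reshape(6, 4, 5)
frames = da.from_array(data, chunks=(2, 4, 5))

features = frames.map_blocks(
    row_means,
    drop_axis=[1, 2],     # the image axes disappear from the output ...
    new_axis=[1],         # ... and one feature axis takes their place
    chunks=(2, 4),        # shape of each output chunk
    dtype=frames.dtype,
    meta=np.array([[]]),  # any small array works; it only needs `.ndim`
)

result = features.compute()
```

Here `features.shape` and `features.chunks` describe the output before anything is computed, which is a cheap way to check that `dask` expects what your chunk function really returns.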
After this, you can save your arrays to disk in a file format of your choice. Recall my previous post at this link.
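
The format really is up to you. Purely as one possibility that needs nothing beyond `numpy`, the sketch below uses `da.to_npy_stack` and `da.from_npy_stack`, which write and read one `.npy` file per chunk along the chosen axis. The array contents and the temporary directory are placeholders of mine, not your data or paths.

```python
import tempfile

import dask.array as da
import numpy as np

# Placeholder array standing in for something like `hog_descriptors`.
data = np.arange(24, dtype=float).reshape(6, 4)
features = da.from_array(data, chunks=(2, 4))

with tempfile.TemporaryDirectory() as tmp:
    # Writes one .npy file per chunk along axis 0, plus a small info file.
    da.to_npy_stack(tmp, features, axis=0)

    # Lazily reopen the stack as a dask array and read it back.
    restored = da.from_npy_stack(tmp)
    roundtrip_ok = bool(np.array_equal(restored.compute(), data))
```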
I hope this helps.