Performing HOG Matrices on PIMS Chunks through ImageIO

ParticularMiner · April 22, 2022, 3:32am

That was unexpected!

But it turns out that for map_blocks(..., chunks=chunks) it is important to specify the sizes for each chunk along an axis if the chunk sizes along that axis are not all the same.

So wherever you see grey_frames.chunks[0][0] in your script (that’s in two places), change that to grey_frames.chunks[0] and that should hopefully fix the problem!
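
A minimal sketch of why the per-chunk sizes matter, assuming a stand-in array of 851 frames (the frame count that shows up in the printed output in this thread) split into chunks of 100, which leaves a final chunk of 51. The array sizes and the `halve` function are illustrative only and are not taken from the original script.

```python
import dask.array as da

# Stand-in for grey_frames: 851 frames in chunks of 100, so the last chunk has 51.
frames = da.zeros((851, 8, 12), chunks=(100, 8, 12))
print(frames.chunks[0])     # (100, 100, 100, 100, 100, 100, 100, 100, 51)
print(frames.chunks[0][0])  # 100


def halve(block):
    """Hypothetical chunk function that changes the trailing dimensions."""
    return block[:, ::2, ::2]


# Passing the full tuple of chunk sizes keeps the declared shape correct.
right = frames.map_blocks(
    halve, chunks=(frames.chunks[0], 4, 6), dtype=frames.dtype
)

# A single integer claims that every chunk holds 100 frames, including the last.
wrong = frames.map_blocks(
    halve, chunks=(frames.chunks[0][0], 4, 6), dtype=frames.dtype
)

print(right.shape)  # (851, 4, 6)
print(wrong.shape)  # (900, 4, 6)
```

The second array advertises 900 frames even though only 851 exist, so anything downstream that trusts the declared chunk sizes is working from a shape that does not match the data.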

jmdelahanty · April 25, 2022, 7:05pm

Was away for the weekend so I’m only just now trying this out. Also great to know this tidbit about how map_blocks works! I’ll update this post in a few minutes with what happens here.

EDIT:

Okay, it appears that the hog_images worked! Those got written to disk and it appears that the calculations were successful. Now I’m getting this for the descriptors:

DESCRIPTORS:  <class 'dask.array.core.Array'> (851, 1, 3360) dask.array<get_ith_tuple_element, shape=(851, 1, 3360), dtype=uint8, chunksize=(100, 1, 3360), chunktype=numpy.ndarray>
IMAGES:  <class 'dask.array.core.Array'> (851, 506, 908) dask.array<get_ith_tuple_element, shape=(851, 506, 908), dtype=uint8, chunksize=(100, 506, 908), chunktype=numpy.ndarray>
Traceback (most recent call last):
  File "dask_faces.py", line 210, in <module>
    da.to_zarr(hog_descriptors, "hog_descriptors/data.zarr", compressor=compressor)
  File "/home/jdelahanty/miniconda3/envs/facial_expression/lib/python3.8/site-packages/dask/array/core.py", line 3512, in to_zarr
    return arr.store(z, lock=False, compute=compute, return_stored=return_stored)
  File "/home/jdelahanty/miniconda3/envs/facial_expression/lib/python3.8/site-packages/dask/array/core.py", line 1689, in store
    r = store([self], [target], **kwargs)
  File "/home/jdelahanty/miniconda3/envs/facial_expression/lib/python3.8/site-packages/dask/array/core.py", line 1163, in store
    compute_as_if_collection(Array, store_dsk, map_keys, **kwargs)
  File "/home/jdelahanty/miniconda3/envs/facial_expression/lib/python3.8/site-packages/dask/base.py", line 317, in compute_as_if_collection
    return schedule(dsk2, keys, **kwargs)
  File "/home/jdelahanty/miniconda3/envs/facial_expression/lib/python3.8/site-packages/dask/threaded.py", line 81, in get
    results = get_async(
  File "/home/jdelahanty/miniconda3/envs/facial_expression/lib/python3.8/site-packages/dask/local.py", line 506, in get_async
    raise_exception(exc, tb)
  File "/home/jdelahanty/miniconda3/envs/facial_expression/lib/python3.8/site-packages/dask/local.py", line 314, in reraise
    raise exc
  File "/home/jdelahanty/miniconda3/envs/facial_expression/lib/python3.8/site-packages/dask/local.py", line 219, in execute_task
    result = _execute_task(task, data)
  File "/home/jdelahanty/miniconda3/envs/facial_expression/lib/python3.8/site-packages/dask/core.py", line 119, in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
  File "/home/jdelahanty/miniconda3/envs/facial_expression/lib/python3.8/site-packages/dask/core.py", line 119, in <genexpr>
    return func(*(_execute_task(a, cache) for a in args))
  File "/home/jdelahanty/miniconda3/envs/facial_expression/lib/python3.8/site-packages/dask/core.py", line 119, in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
  File "/home/jdelahanty/miniconda3/envs/facial_expression/lib/python3.8/site-packages/dask/core.py", line 119, in <genexpr>
    return func(*(_execute_task(a, cache) for a in args))
  File "/home/jdelahanty/miniconda3/envs/facial_expression/lib/python3.8/site-packages/dask/core.py", line 119, in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
  File "/home/jdelahanty/miniconda3/envs/facial_expression/lib/python3.8/site-packages/dask/array/core.py", line 5094, in concatenate_axes
    raise ValueError("Length of axes should equal depth of nested arrays")
ValueError: Length of axes should equal depth of nested arrays

ParticularMiner · April 25, 2022, 9:21pm

@jmdelahanty

Can you send me a link to the video file? I’d like to run the code myself.

jmdelahanty · April 25, 2022, 9:28pm

Sure! Here’s a drive link to the video I’ve been using.

By the way, thank you so much for all your help @ParticularMiner ! I don’t know how I can properly thank you for spending all this time teaching me!

ParticularMiner · April 25, 2022, 10:42pm

No problem @jmdelahanty ! I’m also obsessed with science, and therefore wish to help. As one scientist to another.

I see you’ve corrected most of my errors. That’s great.
But I see where I made another oversight (sorry) — you need to make the following replacement for the relevant line:

descriptor_axes = list(range(1, first_hog_descriptor.ndim))

(Quite the mean bug, eh?) I’ll correct my previous post too. And I’m sure you’ll figure out why this change is necessary. Feel free to ask about it if you need to.

And thanks for the link to the video file. I hope you have enough computing resources to process it and all your images. That’s how you get the most out of dask!

jmdelahanty · April 26, 2022, 12:00am

This community seems to be pretty responsive and helpful. I’m really glad I’ve found one with helpful people like you around! Hopefully I’ll be good enough at this kind of stuff to return the favor.

So now that I’ve implemented that change, I’m running into a different problem about the dimensions which makes me think that I need that +1 in there at some point…

Traceback (most recent call last):
  File "dask_faces.py", line 193, in <module>
    hog_descriptors = my_hogs.map_blocks(
  File "/home/jdelahanty/miniconda3/envs/facial_expression/lib/python3.8/site-packages/dask/array/core.py", line 2481, in map_blocks
    return map_blocks(func, self, *args, **kwargs)
  File "/home/jdelahanty/miniconda3/envs/facial_expression/lib/python3.8/site-packages/dask/array/core.py", line 813, in map_blocks
    raise ValueError(
ValueError: Provided chunks have 3 dims; expected 2 dims

To make it easier to share what we’re currently using, here’s the full code block. It’s currently running into this new error at the hog_descriptors call with map_blocks:

import numpy as np
from skimage.feature import hog
import cv2
import pims
import dask.array as da
import dask_image.imread
import matplotlib.pyplot as plt
from numcodecs import Blosc
import zarr

def as_grey(frame):
    """Convert a 2D image or array of 2D images to greyscale.

    This weights the color channels according to their typical
    response to white light.

    It does nothing if the input is already greyscale.
    (Copied and slightly modified from pims source code.)
    """
    if len(frame.shape) == 2 or frame.shape[-1] != 3:  # already greyscale
        return frame
    else:
        red = frame[..., 0]
        green = frame[..., 1]
        blue = frame[..., 2]
        return 0.2125 * red + 0.7154 * green + 0.0721 * blue


def crop_frame(video_object):

    coords = cv2.selectROI("ROI Selection", video_object)
    cv2.destroyAllWindows()

    return coords

def make_hogs(frames, coords, kwargs):

    # frames will be a chunk of elements from the dask array
    # coords are the cropping coordinates used for selecting subset of image
    # kwargs are the keyword arguments that hog() expects
    # Example:
    # kwargs = dict(
    # orientations=8,
    # pixels_per_cell=(32, 32),
    # cells_per_block=(1, 1),
    # transform_sqrt=True,
    # visualize=True
    # )

    # Perform cropping procedure upon every frame, the : slice,
    # crop the x coordinates in the second slice, and crop the y
    # coordinates in the third slice. Save this new array as 
    # new_frames
    new_frames = frames[
        :,
        coords[1]:coords[1] + coords[3],
        coords[0]:coords[0] + coords[2]
    ]

    # Get the number of frames and shape for making
    # np.arrays of hog descriptors and images later
    nframes = new_frames.shape[0]
    first_frame = new_frames[0]

    print("NEW IMAGE SHAPE: ", first_frame.shape)

    # Use first frame to generate hog descriptor np.array and
    # np.array of a hog image
    hog_descriptor, hog_image = hog(
        first_frame,
        **kwargs
        )

    print("FIRST FRAME HOG SHAPE: ", hog_image.shape)

    # Make empty numpy array that equals the number of frames passed into
    # the function, use the fed in datatype as the datatype of the images
    # and descriptors, and make the arrays shaped as each object's shape
    hog_images = np.empty((nframes,) + hog_image.shape, dtype=hog_image.dtype)
    hog_descriptors = np.empty((nframes,) + hog_descriptor.shape, dtype=hog_descriptor.dtype)

    print("EMPTY HOG IMAGES NUMPY ARRAY SHAPE: ", hog_images.shape)

    # Until I edit the hog code, perform the hog calculation upon each
    # frame in a loop and append them to their respective np arrays
    for index, image in enumerate(new_frames):
        hog_descriptor, hog_image = hog(image, **kwargs)
        hog_descriptors[index, ...] = hog_descriptor
        hog_images[index, ...] = hog_image

    print("COMPUTED HOG IMAGE NUMPY ARRAY: ", hog_images.shape)
    
    return hog_descriptors, hog_images


def get_ith_tuple_element(tuple_, i=0):
    return  tuple_[i]


video_path = "test_vid.mp4"

pims.ImageIOReader.class_priority = 100  # we set this very high in order to force dask's imread() to use this reader [via pims.open()]

original_video = dask_image.imread.imread(video_path)

# Turn pims frame into numpy array that opencv will take for cropping image
coords = crop_frame(np.array(original_video[0]))

# kwargs to use for generating both hog images and hog_descriptors
kwargs = dict(
    orientations=8,
    pixels_per_cell=(32, 32),
    cells_per_block=(1, 1),
    transform_sqrt=True,
    visualize=True
    )

grey_frames = original_video.map_blocks(as_grey, drop_axis=-1)

grey_frames = grey_frames.rechunk({0: 100})

meta = np.array([[[]]])

dtype = grey_frames.dtype

my_hogs = grey_frames.map_blocks(
    make_hogs,
    coords=coords,
    kwargs=kwargs,
    dtype=dtype,
    meta=meta
    )

my_hogs = my_hogs.persist()

hog_images = my_hogs.map_blocks(
    get_ith_tuple_element,
    i = 1,
    dtype=dtype,
    meta=meta
)

first_hog_descriptor, first_hog_image = make_hogs(
    grey_frames[:1, ...].compute(),
    coords=coords,
    kwargs=kwargs
)

hog_images = my_hogs.map_blocks(
    get_ith_tuple_element,
    i = 1,
    chunks=(grey_frames.chunks[0],) + first_hog_image.shape[1:],
    dtype=dtype,
    meta=meta
)

image_axes = [1, 2]
descriptor_axes = list(range(1, first_hog_descriptor.ndim))
descriptors_array_chunks = (grey_frames.chunks[0],) + first_hog_descriptor.shape



hog_descriptors = my_hogs.map_blocks(
    get_ith_tuple_element,
    i=0,
    drop_axis=image_axes,
    new_axis=descriptor_axes,
    chunks=descriptors_array_chunks,
    dtype=dtype,
    meta=meta
)

print("DESCRIPTORS: ", type(hog_descriptors), hog_descriptors.shape, hog_descriptors)
print("IMAGES: ", type(hog_images), hog_images.shape, hog_images)

compressor = Blosc(cname='zstd', clevel=1)

da.to_zarr(hog_images, "hog_images/data.zarr", compressor=compressor)

da.to_zarr(hog_descriptors, "hog_descriptors/data.zarr", compressor=compressor)

print("Data written to zarr! Hooray!")

ParticularMiner · April 26, 2022, 12:11am

Seems you missed the [1:] at the end of that line. That is,

descriptors_array_chunks = (grey_frames.chunks[0],) + first_hog_descriptor.shape[1:]
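
To see why the `[1:]` matters, the dimension count can be checked with plain tuples. This sketch assumes the chunk layout and descriptor length shown in the printed output earlier in the thread (851 frames in chunks of 100, descriptors of length 3360); the variable names are illustrative.

```python
# Chunk sizes along the frame axis: eight chunks of 100 and one of 51.
frame_chunks = (100,) * 8 + (51,)

# make_hogs() on a single frame returns descriptors with a leading frame axis.
first_hog_descriptor_shape = (1, 3360)

# Without the slice, the leading frame axis is counted twice: 3 dims.
without_slice = (frame_chunks,) + first_hog_descriptor_shape
# With the slice, only the per-frame descriptor shape is appended: 2 dims.
with_slice = (frame_chunks,) + first_hog_descriptor_shape[1:]

print(len(without_slice))  # 3
print(len(with_slice))     # 2
```

The three-entry version is what produces "Provided chunks have 3 dims; expected 2 dims".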

ParticularMiner · April 26, 2022, 1:03am

My script seems to be running. A few descriptors get saved to disk. Then it raises an error. I’ll take a closer look at this tomorrow. Does the same happen to you too?

jmdelahanty · April 26, 2022, 1:57am

I’m having a similar situation here it seems, here’s the error I’m getting:

Traceback (most recent call last):
  File "dask_faces.py", line 435, in <module>
    da.to_zarr(hog_descriptors, "hog_descriptors/data.zarr", compressor=compressor)
  File "/home/jdelahanty/miniconda3/envs/facial_expression/lib/python3.8/site-packages/dask/array/core.py", line 3512, in to_zarr
    return arr.store(z, lock=False, compute=compute, return_stored=return_stored)
  File "/home/jdelahanty/miniconda3/envs/facial_expression/lib/python3.8/site-packages/dask/array/core.py", line 1689, in store
    r = store([self], [target], **kwargs)
  File "/home/jdelahanty/miniconda3/envs/facial_expression/lib/python3.8/site-packages/dask/array/core.py", line 1163, in store
    compute_as_if_collection(Array, store_dsk, map_keys, **kwargs)
  File "/home/jdelahanty/miniconda3/envs/facial_expression/lib/python3.8/site-packages/dask/base.py", line 317, in compute_as_if_collection
    return schedule(dsk2, keys, **kwargs)
  File "/home/jdelahanty/miniconda3/envs/facial_expression/lib/python3.8/site-packages/dask/threaded.py", line 81, in get
    results = get_async(
  File "/home/jdelahanty/miniconda3/envs/facial_expression/lib/python3.8/site-packages/dask/local.py", line 506, in get_async
    raise_exception(exc, tb)
  File "/home/jdelahanty/miniconda3/envs/facial_expression/lib/python3.8/site-packages/dask/local.py", line 314, in reraise
    raise exc
  File "/home/jdelahanty/miniconda3/envs/facial_expression/lib/python3.8/site-packages/dask/local.py", line 219, in execute_task
    result = _execute_task(task, data)
  File "/home/jdelahanty/miniconda3/envs/facial_expression/lib/python3.8/site-packages/dask/core.py", line 119, in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
  File "/home/jdelahanty/miniconda3/envs/facial_expression/lib/python3.8/site-packages/dask/core.py", line 119, in <genexpr>
    return func(*(_execute_task(a, cache) for a in args))
  File "/home/jdelahanty/miniconda3/envs/facial_expression/lib/python3.8/site-packages/dask/core.py", line 119, in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
  File "/home/jdelahanty/miniconda3/envs/facial_expression/lib/python3.8/site-packages/dask/core.py", line 119, in <genexpr>
    return func(*(_execute_task(a, cache) for a in args))
  File "/home/jdelahanty/miniconda3/envs/facial_expression/lib/python3.8/site-packages/dask/core.py", line 119, in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
  File "/home/jdelahanty/miniconda3/envs/facial_expression/lib/python3.8/site-packages/dask/array/core.py", line 5094, in concatenate_axes
    raise ValueError("Length of axes should equal depth of nested arrays")
ValueError: Length of axes should equal depth of nested arrays

Which is the same as something I ran into above. Strange dimension problems!

ParticularMiner · April 26, 2022, 10:29am

Hey @jmdelahanty !

What a good sleep does to clear the mind!

It turns out the bug originates from using the parameter drop_axis in map_blocks().

The docs (follow this link) clearly warn that drop_axis will attempt to concatenate the input to map_blocks() before applying the chunk-function (get_ith_tuple_element() in our case).

It is ironic that this hadn’t occurred to me, since I proposed and wrote that warning in the docs myself (see link) just a few weeks ago! The phrase, “Doctor, heal thyself” comes to mind …

Obviously, we absolutely do not wish to concatenate chunks that are tuples. Recall that this is the very reason why we never call my_hogs.compute() in the first place.

Anyway, your problems should be over now as I’ve successfully run the now-updated script myself (see below), and I hope you can too. My line of reasoning was as follows:

The idea is to avoid using drop_axis in map_blocks() since the chunks of your input array are tuples.
Instead, another way of removing the axes of an array is by using squeeze() (follow this link) after calling map_blocks(). But the caveat here is, you can only use squeeze() to remove axes which have a length of 1.
Since the hog descriptors, compared to the hog images, have one less dimension, we will add one more dimension of length 1 to each of the descriptors. I do this in the newly defined chunk function normalize_hog_desc_dims(). This process amounts to mere dimension book-keeping and therefore should take no time at all.
We then have to inform hog_descriptors = map_blocks(...), using the chunks parameter, that the length of axis 2 of the chunks is now 1.
After this, we use squeeze() to remove axis 2 (now of length 1) from the hog descriptors.
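
The dimension bookkeeping in these steps can be tried out on plain NumPy arrays. This is only a sketch with small made-up shapes; `pad_to_ndim` is a hypothetical helper that mirrors what `normalize_hog_desc_dims()` does to a single chunk.

```python
import numpy as np


def pad_to_ndim(descriptor, target_ndim):
    """Append length-1 axes until `descriptor` has `target_ndim` dimensions."""
    if descriptor.ndim >= target_ndim:
        return descriptor
    extra_axes = tuple(range(descriptor.ndim, target_ndim))
    return np.expand_dims(descriptor, axis=extra_axes)


# One chunk of 5 frames: images are 3-D, descriptors are 2-D (made-up sizes).
images = np.zeros((5, 6, 8))
descriptors = np.arange(5 * 12, dtype=float).reshape(5, 12)

padded = pad_to_ndim(descriptors, images.ndim)
print(padded.shape)    # (5, 12, 1)

# squeeze() can only drop axes of length 1, which is exactly what was added.
restored = padded.squeeze(-1)
print(restored.shape)  # (5, 12)
```

Both steps only change the array's shape metadata; the descriptor values themselves are untouched.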

Updated script:

import numpy as np
from skimage.feature import hog
from numcodecs.blosc import Blosc
import cv2
import pims
from pims import FramesSequence
import dask.array as da
import dask_image.imread


def as_grey(frame):
    """Convert a 2D image or array of 2D images to greyscale.

    This weights the color channels according to their typical
    response to white light.

    It does nothing if the input is already greyscale.
    (Copied and slightly modified from pims source code.)
    """
    if len(frame.shape) == 2 or frame.shape[-1] != 3:  # already greyscale
        return frame
    else:
        red = frame[..., 0]
        green = frame[..., 1]
        blue = frame[..., 2]
        return 0.2125 * red + 0.7154 * green + 0.0721 * blue


def crop_frame(video_object):

    coords = cv2.selectROI("ROI Selection", video_object)
    cv2.destroyAllWindows()

    return coords

def make_hogs(frames, coords, **kwargs):

    # frames will be a chunk of elements from the dask array
    # coords are the cropping coordinates used for selecting subset of image
    # kwargs are the keyword arguments that hog() expects
    # Example:
    # kwargs = dict(
    # orientations=8,
    # pixels_per_cell=(32, 32),
    # cells_per_block=(1, 1),
    # transform_sqrt=True,
    # visualize=True
    # )

    # Perform cropping procedure upon every frame, the : slice,
    # crop the x coordinates in the second slice, and crop the y
    # coordinates in the third slice. Save this new array as 
    # new_frames
    new_frames = frames[
        :,
        coords[1]:coords[1] + coords[3],
        coords[0]:coords[0] + coords[2]
    ]

    # Get the number of frames and shape for making
    # np.arrays of hog descriptors and images later
    nframes = new_frames.shape[0]
    first_frame = new_frames[0]

    print("NEW IMAGE SHAPE: ", first_frame.shape)

    # Use first frame to generate hog descriptor np.array and
    # np.array of a hog image
    hog_descriptor, hog_image = hog(
        first_frame,
        **kwargs
        )

    print("FIRST FRAME HOG SHAPE: ", hog_image.shape)

    # Make empty numpy array that equals the number of frames passed into
    # the function, use the fed in datatype as the datatype of the images
    # and descriptors, and make the arrays shaped as each object's shape
    hog_images = np.empty((nframes,) + hog_image.shape, dtype=hog_image.dtype)
    hog_descriptors = np.empty((nframes,) + hog_descriptor.shape, dtype=hog_descriptor.dtype)

    print("EMPTY HOG IMAGES NUMPY ARRAY SHAPE: ", hog_images.shape)

    # Until I edit the hog code, perform the hog calculation upon each
    # frame in a loop and append them to their respective np arrays
    for index, image in enumerate(new_frames):
        hog_descriptor, hog_image = hog(image, **kwargs)
        hog_descriptors[index, ...] = hog_descriptor
        hog_images[index, ...] = hog_image

    print("COMPUTED HOG IMAGE NUMPY ARRAY: ", hog_images.shape)
    
    return hog_descriptors, hog_images


def get_ith_tuple_element(tuple_, i=0):
    return  tuple_[i]


def normalize_hog_desc_dims(tuple_):
    # add more dimensions (each of length 1) to the hog descriptor chunk in
    # order to match the number of dimensions of the hog_image
    descriptor = tuple_[0]
    image = tuple_[1]
    if descriptor.ndim >= image.ndim:
        return tuple_[0]
    else:
        return  np.expand_dims(
            tuple_[0], axis=list(range(descriptor.ndim, image.ndim))
        )


video_path = "test_vid.mp4"

pims.ImageIOReader.class_priority = 100  # we set this very high in order to force dask's imread() to use this reader [via pims.open()]

original_video = dask_image.imread.imread(video_path)

# Turn pims frame into numpy array that opencv will take for cropping image
coords = crop_frame(original_video[0].compute())

# kwargs to use for generating both hog images and hog_descriptors
kwargs = dict(
    orientations=8,
    pixels_per_cell=(32, 32),
    cells_per_block=(1, 1),
    transform_sqrt=True,
    visualize=True
)

grey_frames = original_video.map_blocks(as_grey, drop_axis=-1)

grey_frames = grey_frames.rechunk({0: 100})

meta = np.array([[[]]])

dtype = grey_frames.dtype

my_hogs = grey_frames.map_blocks(
    make_hogs,
    coords=coords,
    dtype=dtype,
    meta=meta,
    **kwargs,
)


my_hogs = my_hogs.persist()

# first determine the output hog shapes from the first grey-scaled image so that
# we can use them for all other images: 
first_hog_descr, first_hog_image = make_hogs(
    grey_frames[:1, ...].compute(), coords, **kwargs
)

hog_images = my_hogs.map_blocks(
    get_ith_tuple_element,
    i=1,
    chunks=(grey_frames.chunks[0],) + first_hog_image.shape[1:],
    dtype=dtype,
    meta=meta
)

descr_array_chunks = (grey_frames.chunks[0],) + first_hog_descr.shape[1:]
if first_hog_descr.ndim <= first_hog_image.ndim:
    # we will recreate the missing hog_descriptor axes but give them each a size of 1
    new_axes = []
    n_missing_dims = first_hog_image.ndim - first_hog_descr.ndim
    descr_array_chunks += (1,)*n_missing_dims
else:
    new_axes = list(range(first_hog_image.ndim, first_hog_descr.ndim))

# Do not use `drop_axis` here!  `drop_axis` will attempt to concatenate the
# tuples, which is undesirable.  Instead, use `squeeze()` later to drop the
# unwanted axes.
hog_descriptors = my_hogs.map_blocks(
    normalize_hog_desc_dims,
    new_axis=new_axes,
    chunks=descr_array_chunks,
    dtype=dtype,
    meta=meta,
)
hog_descriptors = hog_descriptors.squeeze(-1)  # here's where we drop the last dimension

print("DESCRIPTORS: ", type(hog_descriptors), hog_descriptors.shape, hog_descriptors)
print("IMAGES: ", type(hog_images), hog_images.shape, hog_images)

compressor = Blosc(cname='zstd', clevel=1)

da.to_zarr(hog_images, "hog_images/data.zarr", compressor=compressor)

da.to_zarr(hog_descriptors, "hog_descriptors/data.zarr", compressor=compressor)

print("Data written to zarr! Hooray!")

So it turns out this project was not as straightforward as I purported it would be! I believe the complication here arises mainly because the function hog() returns a tuple of arrays instead of a single array. I hadn’t anticipated this before. One needs to tread carefully when tuple-chunks are involved.
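
A small self-contained sketch of the tuple-chunk pattern, using a hypothetical `two_outputs()` function as a stand-in for `make_hogs()` and a tiny array of ones instead of video frames. It assumes, as in the script above, that `meta` is passed explicitly at every step, and it never computes the intermediate array whose chunks are tuples.

```python
import numpy as np
import dask.array as da


def two_outputs(block):
    # Stand-in for a chunk function that, like hog(..., visualize=True),
    # returns a tuple of arrays rather than a single array.
    row_means = block.mean(axis=2, keepdims=True)  # shape (n, h, 1)
    doubled = block * 2                            # shape (n, h, w)
    return row_means, doubled


def get_ith_tuple_element(tuple_, i=0):
    return tuple_[i]


frames = da.ones((10, 4, 6), chunks=(4, 4, 6))  # frame-axis chunks: (4, 4, 2)
meta = np.array([[[]]])

# Every chunk of `pairs` is a tuple, so `pairs` itself is never computed.
pairs = frames.map_blocks(two_outputs, dtype=frames.dtype, meta=meta)

# Element 1 has the same shape as the input chunk, so no `chunks` is needed.
doubled = pairs.map_blocks(
    get_ith_tuple_element, i=1, dtype=frames.dtype, meta=meta
)

# Element 0 has a last axis of length 1: declare it, then squeeze it away.
row_means = pairs.map_blocks(
    get_ith_tuple_element,
    i=0,
    chunks=(frames.chunks[0], 4, 1),
    dtype=frames.dtype,
    meta=meta,
).squeeze(-1)

print(doubled.compute().shape)    # (10, 4, 6)
print(row_means.compute().shape)  # (10, 4)
```

Only the arrays extracted from the tuples are ever concatenated, which is why neither call uses `drop_axis`.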

Anyway, this should be a learning experience for both of us!

Feel free to ask for clarification if you need it.

jmdelahanty · April 26, 2022, 7:36pm

This works @ParticularMiner ! Thank you so much for your help! The only thing that can be edited is in the final writing of data to zarrs.

From:

da.to_zarr(hog_descriptors, "hog_images/data.zarr", compressor=compressor)

To:

da.to_zarr(hog_descriptors, "hog_descriptors/data.zarr", compressor=compressor)

So the data is written to its own location. In the future, when I get this packaged up a bit cleaner in a repository, I’ll make a comment here that puts a link out that people can use to reference how we’re adapting things.

ParticularMiner · April 26, 2022, 8:06pm

@jmdelahanty
Good idea! I’ve updated the script with your correction.

I’ve so far assumed that you’re running the script on a (distributed memory) computing cluster. Is this correct? If so, what’s the performance/speed like compared with your previous serial implementation? I hope it is faster.

jmdelahanty · April 27, 2022, 3:56am

For now, this has all been running on a specific machine in our cluster named Cheetos that has 64 cores and 264GB of RAM. I haven’t tried running it serially (so reading in an individual frame, doing a HOG calculation, and writing that result out to disk) because the only way I had tried doing that previously was by reading with opencv. That both took a while to load everything into memory and, when it finally did it, it used up something like 110GB.

When I learned that PIMS could read individual frames one at a time I thought I would try that, but figured that since there’s so many frames to perform that calculation on that I should try throwing Dask at it. When I learned that Dask uses PIMS, I figured I’d just do it through Dask so I could take advantage of parallelism while also reducing the amount of hardware required for one video.

In the near future, we’ll be performing analyses upon many of these videos and we’ll produce many of them pretty frequently. Something like 20 or even 40 per day for several days per project. At the moment, there are 2 projects running that perform this task. Since we’ll be doing this at (at least for me!) high volume of data collection and analysis, I wanted to try running as many videos as possible per machine and eventually get into coordinating jobs across multiple machines in the cluster we have on campus (or even our super computing center in San Diego one day maybe!) so we can get through all this data efficiently. Getting Dask and PIMS to do this on one small video is a great first step in building infrastructure/code that can let us do it efficiently!

When running that test video that’s linked, it seems like it only uses about 40GB of memory at most. I don’t have more specific metrics for it yet, and I’m also not sure how best to record them. Some of the next tests I’d like to do are measuring performance like that! Things like CPU usage, limiting numbers of workers and performance, and RAM usage are all things I hope to characterize.

jmdelahanty · April 28, 2022, 11:34pm

Okay, here’s an update for running a full 25 minute video @ParticularMiner:

It took about 3 hours, used only a small amount of each CPU for a majority of the processing (I don’t have data for this since I’m not sure how to record resource usage over time) and eventually was holding onto about 160GB of data in RAM somehow! I have no idea how to profile this properly or reduce the amount being held in ram at a given moment. I’m guessing that reducing the chunk size from 100 to something smaller will be helpful, so I’m going to try doing that now. Any advice about this?

ParticularMiner · April 29, 2022, 10:28am

@jmdelahanty

I take it that you ran the script on Cheetos, the monster single machine with 64 cores and 264GB?
If so, then I suggest you start by letting dask automatically choose a chunksize:

grey_frames = grey_frames.rechunk({0: 'auto'})

After this you can compare its runtime to that of your own chosen chunksize setting and see if you can choose a better setting. I’m afraid the optimum chunksize can only be found by trial and error since other unknown machine (hardware) specs could be involved in determining the runtime.

Apart from the chunksize, you have yet another degree of freedom, namely, the total number of dask workers. I’m guessing that the bottleneck of your script is the hog() function, which operates on a single image at a time. It is possible (but I’m not certain) that hog() could be sped-up by making more threads available to it. You can experiment with this by including the following commands in your script:

... # other import statements from your script
from dask.distributed import Client


client = Client(threads_per_worker=4)  # experiment with this number
... # your script's code
client.shutdown()

Note that increasing threads_per_worker comes at the cost of proportionally decreasing n_workers, the total number of dask workers, from 64 (the default on your machine) to

n_workers = max(1, CPU_COUNT // threads_per_worker)
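
Applied to a 64-core machine, that formula gives the following worker counts. The helper name `workers_for` is made up for illustration.

```python
def workers_for(cpu_count, threads_per_worker):
    """Number of workers according to the formula quoted above."""
    return max(1, cpu_count // threads_per_worker)


for threads in (1, 4, 16, 128):
    print(threads, workers_for(64, threads))
# 1 64
# 4 16
# 16 4
# 128 1
```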

Regarding your question on profiling, I recommend you follow the examples laid out in the docs on dask.diagnostics.
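
As a rough starting point, here is a sketch of recording task timings and CPU/RAM samples with `dask.diagnostics`. The random-matrix workload is a stand-in for the real computation, `ResourceProfiler` needs `psutil` to be installed, and these profilers apply to dask's single-machine schedulers rather than to a `dask.distributed` `Client`.

```python
import dask.array as da
from dask.diagnostics import Profiler, ResourceProfiler

# Stand-in workload; in the real script this would be the to_zarr() calls.
x = da.random.random((1000, 1000), chunks=(250, 250))

with Profiler() as prof, ResourceProfiler(dt=0.25) as rprof:
    total = (x @ x.T).sum().compute()

# prof.results holds one record per executed task;
# rprof.results holds (time, mem, cpu) samples taken every `dt` seconds.
print(len(prof.results), "tasks profiled")
for sample in rprof.results[:5]:
    print(sample.time, sample.mem, sample.cpu)
```

If bokeh is installed, `dask.diagnostics.visualize([prof, rprof])` can turn both recordings into a plot, which makes it easier to see RAM growing over the course of a run.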

jmdelahanty · April 30, 2022, 4:22am

That’s right! I’ve been using Cheetos so far.

Until I learn how to use the diagnostics tools you linked, I’ve been gauging processing speed by simply logging how long the HOG calculations take on each chunk of images (currently chunks of 32; I have the auto setting trying to run now). Soon I’ll also vary the number of threads per worker to see how that goes.

Initially, things go pretty fast as it performs operations on the chunk of images! It takes about 10 seconds. Later on, however (about 20 minutes later), it starts taking 20 seconds and appears to be continuously increasing. As does the usage of RAM in the computer.

I’m wondering if maybe PIMS and Dask can’t really effectively chop up videos in the manner that I’m doing it. I mean I don’t think videos were meant to be accessed in the manner that I’m trying here. Perhaps I should try just iterating through a video with pims to create an intermediate zarr as grayscale video and then process the video that way? I tried tagging one of the lead PIMS developers on Twitter to see what he thought and hopefully have him hop onto the thread and teach us, but I’m not sure he’ll see things here…

Maybe @jakirkham has an idea of this since he showed me that dask_image uses PIMS

ParticularMiner · April 30, 2022, 6:07am

@jmdelahanty

I forgot to ask earlier, have you already tried removing the persist statement? That statement was included because I assumed you were running the script on a cluster. It could overwhelm a single machine.

ParticularMiner · April 30, 2022, 6:56am

@jmdelahanty

Also, have you already tried setting dask’s imread()'s keyword parameter nframes instead of calling rechunk()?

It turns out nframes is supposed to give the desired chunksize directly, but it has a default value of 1! (Sorry, I didn’t check this before.)

    nframes : int, optional
        Number of the frames to include in each chunk (default: 1).

For a video file, I expect nframes=1 to be bad for performance. For example, for a 25 minute video, it would mean pims is called to open the video file 25 x 60 x 24 = 36 000 times (assuming there are 24 frames per second) just to get one frame for each chunk! This must definitely be avoided.
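
The same arithmetic for a few values of nframes, assuming (as above) a 25-minute video at 24 frames per second and one read per chunk. The 1080 x 1920 frame size and 8-byte greyscale pixels used for the memory estimate are assumptions for illustration, not properties of the actual video.

```python
import math

total_frames = 25 * 60 * 24   # 36 000 frames
height, width = 1080, 1920    # assumed frame size
bytes_per_pixel = 8           # assumed float64 greyscale frames

for nframes in (1, 32, 100):
    # One chunk (and so one read of the video) per group of `nframes` frames.
    n_chunks = math.ceil(total_frames / nframes)
    chunk_mib = nframes * height * width * bytes_per_pixel / 2**20
    print(f"nframes={nframes:>3}: {n_chunks:>6} chunks, {chunk_mib:8.1f} MiB per chunk")
```

Fewer, larger chunks mean fewer reads but more memory held per chunk, so the choice is a trade-off rather than "bigger is better".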

ParticularMiner · April 30, 2022, 10:16am

@jmdelahanty

So, while experimenting with your script on my puny laptop (8-core CPU, 16GB RAM), I discovered that the processes-scheduler and not the threaded-scheduler (the default) yielded the better performance on a single machine. The former used all the cores at 100% while the latter used them at only 33%. Consequently, the former was almost 3 times faster than the latter. It turns out, the reason for this outcome lay in the guts of Python itself, namely, the so-called Python GIL (Global Interpreter Lock) which severely limits multi-threading.

And as expected, the memory footprint during runtime depended proportionally on the chunksize (that is, the nframes parameter to dask_image.imread.imread()). So nframes should not be set too high, otherwise your computer’s RAM will be overwhelmed.

... # import statements from your script
from dask.distributed import Client


... # leading script statements

client = Client(threads_per_worker=1)

video_path = "/path/to/video.mp4"

pims.ImageIOReader.class_priority = 100

# I'm sure Cheetos could manage a larger `nframes` value here:
original_video = dask_image.imread.imread(video_path, nframes=32)

... # rest of the script with `rechunk()` and `persist()` statements removed!!

client.shutdown()

Alternatively, if you don’t want to import Client() from dask.distributed as in the code-snippet above, then simply place your .to_zarr() calls within a dask.config.set(scheduler='processes') context manager:

import dask  # needed for dask.config.set() below

... # leading script statements

# I'm sure Cheetos could manage a much larger `nframes` value here:
original_video = dask_image.imread.imread(video_path, nframes=32)

... # rest of script with `rechunk()`, `persist()`, and `.to_zarr()` statements removed!!

with dask.config.set(scheduler='processes'):
    da.to_zarr(hog_images, "/path/to/hog_images/data.zarr", compressor=compressor)
    da.to_zarr(hog_descriptors, "/path/to/hog_descriptors/data.zarr", compressor=compressor)

... # etc.

And oh, when I ran the processes-scheduler, some PIMS warning messages were repeatedly printed, reporting an AttributeError exception. Simply ignore them. They do not appear to affect the final result.

I later discovered (see this link), that those warnings occur because, unlike in the threaded-scheduler, the workers in the processes-scheduler do not share the updated value of ImageIOReader.class_priority. So pims.open() attempts to use a couple of other readers with higher priority than ImageIOReader. Luckily, however, those other readers fail for some reason (on my laptop), allowing ImageIOReader to be eventually used by pims.open() anyway. I expect similar behavior to occur on Cheetos.
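One way around the warnings might be to set the priority inside each worker process rather than only in the parent. I have not tried this with PIMS itself; the sketch below only shows the general pattern with the standard library's ProcessPoolExecutor and its `initializer` argument, using a made-up `FakeReader` class in place of pims.ImageIOReader. With dask.distributed, `Client.run()` could probably play a similar role, since it runs a function once on every worker.

```python
import multiprocessing as mp
from concurrent.futures import ProcessPoolExecutor


class FakeReader:
    """Stand-in for pims.ImageIOReader (hypothetical, for illustration only)."""
    class_priority = 10


def set_priority(value: int) -> None:
    """Meant to run once in every worker process when it starts."""
    FakeReader.class_priority = value


def read_priority(_: int) -> int:
    """A task that reports the priority visible inside the worker."""
    return FakeReader.class_priority


def main() -> None:
    FakeReader.class_priority = 100  # only changes the parent process
    ctx = mp.get_context("spawn")    # fresh interpreters, no inherited state

    with ProcessPoolExecutor(max_workers=2, mp_context=ctx) as pool:
        print("without initializer:", list(pool.map(read_priority, range(2))))

    with ProcessPoolExecutor(max_workers=2, mp_context=ctx,
                             initializer=set_priority, initargs=(100,)) as pool:
        print("with initializer:   ", list(pool.map(read_priority, range(2))))


if __name__ == "__main__":
    main()
```

The first pool should report the class default, because the spawned workers never see the parent's assignment; the second should report the updated value.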

I hope this helps.

jmdelahanty · May 1, 2022, 11:04pm

Hey @ParticularMiner ! Just sitting down to check out all your work! I’ll be running it in the background while I respond to things.

EDIT:
First, I seem to be running into something odd with the client = Client(threads_per_worker=1) statement. I have zero idea what it all means, really.

Exception ignored in: <module 'threading' from '/home/jdelahanty/miniconda3/envs/facial_expression/lib/python3.8/threading.py'>
  File "/home/jdelahanty/miniconda3/envs/facial_expression/lib/python3.8/multiprocessing/spawn.py", line 125, in _main
Task exception was never retrieved
future: <Task finished name='Task-38' coro=<_wrap_awaitable() done, defined at /home/jdelahanty/miniconda3/envs/facial_expression/lib/python3.8/asyncio/tasks.py:688> exception=RuntimeError('\n        An attempt has been made to start a new process before the\n        current process has finished its bootstrapping phase.\n\n        This probably means that you are not using fork to start your\n        child processes and you have forgotten to use the proper idiom\n        in the main module:\n\n            if __name__ == \'__main__\':\n                freeze_support()\n                ...\n\n        The "freeze_support()" line can be omitted if the program\n        is not going to be frozen to produce an executable.')>
Traceback (most recent call last):
  File "/home/jdelahanty/miniconda3/envs/facial_expression/lib/python3.8/asyncio/tasks.py", line 695, in _wrap_awaitable
    return (yield from awaitable.__await__())
  File "/home/jdelahanty/miniconda3/envs/facial_expression/lib/python3.8/site-packages/distributed/core.py", line 297, in _
    await self.start()
  File "/home/jdelahanty/miniconda3/envs/facial_expression/lib/python3.8/site-packages/distributed/nanny.py", line 334, in start
    response = await self.instantiate()
  File "/home/jdelahanty/miniconda3/envs/facial_expression/lib/python3.8/site-packages/distributed/nanny.py", line 417, in instantiate
    result = await self.process.start()
  File "/home/jdelahanty/miniconda3/envs/facial_expression/lib/python3.8/site-packages/distributed/nanny.py", line 687, in start
    await self.process.start()
  File "/home/jdelahanty/miniconda3/envs/facial_expression/lib/python3.8/site-packages/distributed/process.py", line 32, in _call_and_set_future
    res = func(*args, **kwargs)
  File "/home/jdelahanty/miniconda3/envs/facial_expression/lib/python3.8/site-packages/distributed/process.py", line 186, in _start
    process.start()
  File "/home/jdelahanty/miniconda3/envs/facial_expression/lib/python3.8/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/home/jdelahanty/miniconda3/envs/facial_expression/lib/python3.8/multiprocessing/context.py", line 284, in _Popen
    return Popen(process_obj)
  File "/home/jdelahanty/miniconda3/envs/facial_expression/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/home/jdelahanty/miniconda3/envs/facial_expression/lib/python3.8/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/home/jdelahanty/miniconda3/envs/facial_expression/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 42, in _launch
    prep_data = spawn.get_preparation_data(process_obj._name)
  File "/home/jdelahanty/miniconda3/envs/facial_expression/lib/python3.8/multiprocessing/spawn.py", line 154, in get_preparation_data
    _check_not_importing_main()
  File "/home/jdelahanty/miniconda3/envs/facial_expression/lib/python3.8/multiprocessing/spawn.py", line 134, in _check_not_importing_main
    raise RuntimeError('''
RuntimeError:
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.
Task exception was never retrieved

Tons of these messages get printed immediately when I start running the code.
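For what it's worth, the RuntimeError text points at the likely cause. The traceback goes through popen_spawn_posix.py, so the workers are started with the spawn method, which re-imports the main script in every child. Any process-starting call sitting at module level (here, Client(...)) then gets hit again during that import. The usual remedy is to move the script body into a function and call it under `if __name__ == "__main__":`. A minimal sketch of that layout, using the standard library in place of dask.distributed; `hog_stub` and `main` are placeholder names, not the real script:

```python
import multiprocessing as mp
from concurrent.futures import ProcessPoolExecutor


def hog_stub(frame_index: int) -> int:
    """Placeholder for the per-chunk work (the real script computes HOGs)."""
    return frame_index * frame_index


def main() -> None:
    # Everything that opens windows, reads input, or starts processes lives here,
    # so it runs once in the parent and never during a child's re-import.
    ctx = mp.get_context("spawn")
    with ProcessPoolExecutor(max_workers=2, mp_context=ctx) as pool:
        results = list(pool.map(hog_stub, range(4)))
    print(results)


if __name__ == "__main__":
    # Spawned children import this file under a different module name,
    # so they skip this branch instead of starting workers of their own.
    main()
```

In the real script, the same shape would put Client(...), the ROI selection, and the to_zarr() calls inside main().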

When I try running it with the dask.config.set context manager instead, something else odd happens. It keeps trying to re-open the first frame for me to re-select the crop coordinates or something. Here’s what those errors look like:

/home/jdelahanty/miniconda3/envs/facial_expression/lib/python3.8/site-packages/dask_image/imread/__init__.py:61: RuntimeWarning: `nframes` does not nicely divide number of frames in file. Last chunk will contain the remainder.
  warnings.warn(
[... the same RuntimeWarning repeated 22 more times ...]
Select a ROI and then press SPACE or ENTER button!
Cancel the selection process by pressing c button!
/home/jdelahanty/miniconda3/envs/facial_expression/lib/python3.8/site-packages/dask_image/imread/__init__.py:61: RuntimeWarning: `nframes` does not nicely divide number of frames in file. Last chunk will contain the remainder.
  warnings.warn(
[... the same RuntimeWarning repeated 8 more times ...]
QStandardPaths: XDG_RUNTIME_DIR not set, defaulting to '/tmp/runtime-jdelahanty'
Select a ROI and then press SPACE or ENTER button!
Cancel the selection process by pressing c button!
/home/jdelahanty/miniconda3/envs/facial_expression/lib/python3.8/site-packages/dask_image/imread/__init__.py:61: RuntimeWarning: `nframes` does not nicely divide number of frames in file. Last chunk will contain the remainder.
  warnings.warn(
/home/jdelahanty/miniconda3/envs/facial_expression/lib/python3.8/site-packages/dask_image/imread/__init__.py:61: RuntimeWarning: `nframes` does not nicely divide number of frames in file. Last chunk will contain the remainder.
  warnings.warn(
/home/jdelahanty/miniconda3/envs/facial_expression/lib/python3.8/site-packages/dask_image/imread/__init__.py:61: RuntimeWarning: `nframes` does not nicely divide number of frames in file. Last chunk will contain the remainder.
  warnings.warn(
Select a ROI and then press SPACE or ENTER button!
Cancel the selection process by pressing c button!
Select a ROI and then press SPACE or ENTER button!
Cancel the selection process by pressing c button!
Select a ROI and then press SPACE or ENTER button!
Cancel the selection process by pressing c button!
/home/jdelahanty/miniconda3/envs/facial_expression/lib/python3.8/site-packages/dask_image/imread/__init__.py:61: RuntimeWarning: `nframes` does not nicely divide number of frames in file. Last chunk will contain the remainder.
  warnings.warn(
Select a ROI and then press SPACE or ENTER button!
Cancel the selection process by pressing c button!
Select a ROI and then press SPACE or ENTER button!
Cancel the selection process by pressing c button!
/home/jdelahanty/miniconda3/envs/facial_expression/lib/python3.8/site-packages/dask_image/imread/__init__.py:61: RuntimeWarning: `nframes` does not nicely divide number of frames in file. Last chunk will contain the remainder.
  warnings.warn(
QStandardPaths: XDG_RUNTIME_DIR not set, defaulting to '/tmp/runtime-jdelahanty'
Select a ROI and then press SPACE or ENTER button!
Cancel the selection process by pressing c button!
Select a ROI and then press SPACE or ENTER button!
Cancel the selection process by pressing c button!
ASSERT: "false" in file qasciikey.cpp, line 501
QStandardPaths: XDG_RUNTIME_DIR not set, defaulting to '/tmp/runtime-jdelahanty'
/home/jdelahanty/miniconda3/envs/facial_expression/lib/python3.8/site-packages/dask_image/imread/__init__.py:61: RuntimeWarning: `nframes` does not nicely divide number of frames in file. Last chunk will contain the remainder.
  warnings.warn(
Traceback (most recent call last):
  File "dask_faces.py", line 628, in <module>
    da.to_zarr(hog_images, "hog_images/data.zarr", compressor=compressor)
  File "/home/jdelahanty/miniconda3/envs/facial_expression/lib/python3.8/site-packages/dask/array/core.py", line 3512, in to_zarr
    return arr.store(z, lock=False, compute=compute, return_stored=return_stored)
  File "/home/jdelahanty/miniconda3/envs/facial_expression/lib/python3.8/site-packages/dask/array/core.py", line 1689, in store
    r = store([self], [target], **kwargs)
  File "/home/jdelahanty/miniconda3/envs/facial_expression/lib/python3.8/site-packages/dask/array/core.py", line 1163, in store
    compute_as_if_collection(Array, store_dsk, map_keys, **kwargs)
  File "/home/jdelahanty/miniconda3/envs/facial_expression/lib/python3.8/site-packages/dask/base.py", line 317, in compute_as_if_collection
    return schedule(dsk2, keys, **kwargs)
  File "/home/jdelahanty/miniconda3/envs/facial_expression/lib/python3.8/site-packages/dask/multiprocessing.py", line 220, in get
    result = get_async(
  File "/home/jdelahanty/miniconda3/envs/facial_expression/lib/python3.8/site-packages/dask/local.py", line 495, in get_async
    for key, res_info, failed in queue_get(queue).result():
  File "/home/jdelahanty/miniconda3/envs/facial_expression/lib/python3.8/concurrent/futures/_base.py", line 437, in result
    return self.__get_result()
  File "/home/jdelahanty/miniconda3/envs/facial_expression/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
    raise self._exception
concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.

A different error I get says that the destination for my zarr output already contains a zarr array. I’m not seeing how to tell the to_zarr call to append:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/jdelahanty/miniconda3/envs/facial_expression/lib/python3.8/multiprocessing/spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "/home/jdelahanty/miniconda3/envs/facial_expression/lib/python3.8/multiprocessing/spawn.py", line 125, in _main
    prepare(preparation_data)
  File "/home/jdelahanty/miniconda3/envs/facial_expression/lib/python3.8/multiprocessing/spawn.py", line 236, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "/home/jdelahanty/miniconda3/envs/facial_expression/lib/python3.8/multiprocessing/spawn.py", line 287, in _fixup_main_from_path
    main_content = runpy.run_path(main_path,
  File "/home/jdelahanty/miniconda3/envs/facial_expression/lib/python3.8/runpy.py", line 265, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "/home/jdelahanty/miniconda3/envs/facial_expression/lib/python3.8/runpy.py", line 97, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "/home/jdelahanty/miniconda3/envs/facial_expression/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/nadata/snlkt/data/facial_expression/specialk/dask_faces.py", line 633, in <module>
    da.to_zarr(hog_images, "/scratch/snlkt_facial_expression/hog_images/data.zarr", compressor=compressor)
  File "/home/jdelahanty/miniconda3/envs/facial_expression/lib/python3.8/site-packages/dask/array/core.py", line 3503, in to_zarr
    z = zarr.create(
  File "/home/jdelahanty/miniconda3/envs/facial_expression/lib/python3.8/site-packages/zarr/creation.py", line 149, in create
    init_array(store, shape=shape, chunks=chunks, dtype=dtype, compressor=compressor,
  File "/home/jdelahanty/miniconda3/envs/facial_expression/lib/python3.8/site-packages/zarr/storage.py", line 350, in init_array
    _init_array_metadata(store, shape=shape, chunks=chunks, dtype=dtype,
  File "/home/jdelahanty/miniconda3/envs/facial_expression/lib/python3.8/site-packages/zarr/storage.py", line 381, in _init_array_metadata
    raise ContainsArrayError(path)
zarr.errors.ContainsArrayError: path '' contains an array
Traceback (most recent call last):
  File "dask_faces.py", line 633, in <module>
    da.to_zarr(hog_images, "/scratch/snlkt_facial_expression/hog_images/data.zarr", compressor=compressor)
  File "/home/jdelahanty/miniconda3/envs/facial_expression/lib/python3.8/site-packages/dask/array/core.py", line 3512, in to_zarr
    return arr.store(z, lock=False, compute=compute, return_stored=return_stored)
  File "/home/jdelahanty/miniconda3/envs/facial_expression/lib/python3.8/site-packages/dask/array/core.py", line 1689, in store
    r = store([self], [target], **kwargs)
  File "/home/jdelahanty/miniconda3/envs/facial_expression/lib/python3.8/site-packages/dask/array/core.py", line 1163, in store
    compute_as_if_collection(Array, store_dsk, map_keys, **kwargs)
  File "/home/jdelahanty/miniconda3/envs/facial_expression/lib/python3.8/site-packages/dask/base.py", line 317, in compute_as_if_collection
    return schedule(dsk2, keys, **kwargs)
  File "/home/jdelahanty/miniconda3/envs/facial_expression/lib/python3.8/site-packages/dask/multiprocessing.py", line 220, in get
    result = get_async(
  File "/home/jdelahanty/miniconda3/envs/facial_expression/lib/python3.8/site-packages/dask/local.py", line 495, in get_async
    for key, res_info, failed in queue_get(queue).result():
  File "/home/jdelahanty/miniconda3/envs/facial_expression/lib/python3.8/concurrent/futures/_base.py", line 437, in result
    return self.__get_result()
  File "/home/jdelahanty/miniconda3/envs/facial_expression/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
    raise self._exception
concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.

I’m guessing the process pool fails because a different error has already taken place.
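On the ContainsArrayError: as far as I can tell, to_zarr() has no append mode, but it does accept overwrite=True, which replaces whatever is already at the destination. The other option is to clear the destination before writing. A small sketch of the latter using only the standard library; `clear_store` is a made-up helper, and it deletes the whole directory, so it should only be pointed at output that can safely be lost:

```python
import shutil
from pathlib import Path


def clear_store(path: str) -> bool:
    """Delete an existing on-disk store directory; return True if one was removed."""
    store = Path(path)
    if store.is_dir():
        shutil.rmtree(store)
        return True
    return False


# Hypothetical usage before writing:
#   clear_store("/path/to/hog_images/data.zarr")
#   da.to_zarr(hog_images, "/path/to/hog_images/data.zarr", compressor=compressor)
# or, letting dask replace the existing array itself:
#   da.to_zarr(hog_images, "/path/to/hog_images/data.zarr",
#              overwrite=True, compressor=compressor)
```

Note that the first traceback above comes from a spawned child re-running the script (spawn.py, then runpy.py), so the array may already exist simply because the parent process created it moments earlier.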

It looks like if I just supply some coordinates to the script for the cropping procedure instead of having the crop_frame function, it doesn’t try to re-open the video for me to crop frames again which is encouraging. So I’m guessing that Dask is going to execute everything in the script across processes, not just the parts that invoke Dask directly?

I also get many of the same warnings from dask_image stating that the number of frames doesn’t divide nicely into the selected chunk size from nframes. I’m a little confused that it’s trying to do that so many times since I only call the imread function once.

Another piece of good news here is that, when things run for a couple iterations, it’s iterating through the creation of 32 HOGs in 0.32 seconds! Wow!

Were these things happening to you also when you were running things?

END EDIT

I see you’ve removed persist here and I think I’m somewhat confused again about how it works.

Just to double check my understanding, not having persist() in the script therefore makes it so the chunks created by make_hogs() are written to disk as a dask.array? And a single machine can be overwhelmed because it’s holding each chunk in memory if persist() is used? And by precomputed, does it mean that Dask is going to allocate the appropriate memory according to the meta provided and then just fill it in as time goes on?

I am now! That’s my bad for not reading through the docs more carefully. It makes sense that it would be hard to read a big video file quickly if you had to call pims thousands of times like that! I’ll have to toy around with what cheetos is capable of for that parameter.

One thing I’m somewhat confused about here is that you’re specifying just one thread per worker. Is that effectively what a process is? I had thought that scheduling things as a process basically meant scheduling one core per job. Does it really mean just a single threaded thing instead? And then, therefore, does setting the scheduler='processes' basically mean the same thing?
