Core Dump Error When Calling Numba Functions with map_overlap()

Describe the issue:

Calling Numba-wrapped functions with map_overlap() causes a core dump; if run in a Jupyter notebook, the kernel dies. The issue appears to be associated with NumPy structured arrays; calling with a plain NumPy array seems fine (see the sketch after the error output below). The crash does not happen on every run, but re-running 3-5 times usually triggers it. The error output can be:

free(): invalid pointer
Aborted (core dumped)

or a segmentation fault.
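
For reference, the plain-array case I mean looks roughly like this (the names numba_func_plain and call_numba_func_plain are just illustrative; the structure mirrors the structured-array version in the MCVE below):

# Illustrative plain-array variant for comparison; reuses the same
# libraries as the MCVE below (numpy, pandas, numba.njit).
import numpy as np
import pandas as pd
from numba import njit

@njit(nogil=True)
def numba_func_plain(z_values, out):
    # Same access pattern as the structured-array version,
    # but on ordinary 1-D float64 arrays.
    out[0] = z_values[0]
    return out

def call_numba_func_plain(df: pd.DataFrame, **kwargs):
    out = np.zeros(df.shape[0], dtype=np.float64)
    out = numba_func_plain(df['Z'].to_numpy(dtype=np.float64), out)
    df['X'] = out.astype(np.float32)
    return df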

Interestingly, if map_overlap() is changed to map_partitions(), the error does not occur (the swapped call is shown after the example below).

Minimal Complete Verifiable Example:

from distributed import Client, LocalCluster
import numpy as np
import pandas as pd
from numba import njit
import dask.dataframe as dd

@njit(nogil=True)
def other_numba_func(array, idx, val):
    # Write a value into the 'Y' field of the structured array.
    array['Y'][idx] = val

@njit(nogil=True)
def numba_func(arr, params, array, other_numba_func):
    # The jitted helper is passed in as an argument and called on the structured array.
    other_numba_func(array, 0, arr['Z'][0])
    return array

def get_structured_array(num_rows):
    dtype = [
        ('X', np.float64),
        ('A', np.float64),
        ('B', np.float64),
        ('Y', np.float64)
    ]
    ret = np.zeros(num_rows, dtype=dtype)
    return ret

def call_numba_func(df: pd.DataFrame, **kwargs):
    # Convert kwargs and the partition to record/structured arrays for Numba.
    params = pd.DataFrame(kwargs, index=[0]).to_records()[0]
    array = get_structured_array(df.shape[0])
    array = numba_func(df.to_records(), params, array, other_numba_func)
    df['X'] = array['X'].astype(np.float32)
    return df

if __name__ == '__main__':
    client = Client(
        LocalCluster(n_workers=1, threads_per_worker=8, dashboard_address=18787),
        set_as_default=True
    )
    data = {
        'Z': [i for i in range(1000)],
    }
    
    df = pd.DataFrame(data)
    ddf = dd.from_pandas(df, npartitions=2)
    ddf.map_overlap(call_numba_func, 0, 0).head()
    
    client.shutdown()
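
For comparison, the variant that does not crash for me only swaps the final call; everything else stays the same:

# Same script, but using map_partitions instead of map_overlap;
# this version has not crashed for me.
ddf.map_partitions(call_numba_func).head()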

Environment:

This is my environment:

Python 3.11.10 
dask                      2025.1.0                 pypi_0    pypi
dask-cloudprovider        2024.9.1                 pypi_0    pypi
dask-expr                 1.1.20             pyhd8ed1ab_0    conda-forge
dask-labextension         7.0.0              pyhd8ed1ab_0    conda-forge
numba                     0.61.0                   pypi_0    pypi
numpy                     2.1.0                    pypi_0    pypi
pandas                    2.2.3           py311h7db5c69_1    conda-forge

Description:    Ubuntu 24.04.1 LTS
Release:        24.04
Codename:       noble

Hi @Somedaywilldo, welcome to the Dask community!

I was able to reproduce an error (inside Jupyter, but also on the command line, with the same output). I just had to run the dataframe computation in a loop, and I always get a crash when doing 1000 iterations.

Using map_partitions() I didn't get the error even with 10,000 iterations.
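
Roughly what I ran, as a sketch (reusing call_numba_func, the data and the LocalCluster setup from your example):

# Sketch of the reproduction loop: repeat the map_overlap computation
# many times; the crash reliably appears within 1000 iterations.
for i in range(1000):
    ddf = dd.from_pandas(pd.DataFrame({'Z': list(range(1000))}), npartitions=2)
    ddf.map_overlap(call_numba_func, 0, 0).head()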

I have no idea where it could be coming from.