When building a pipeline in dask.bag, I usually do this to pipe several arguments between functions:
from functools import wraps

def star(f):
    @wraps(f)
    def wrapper(*args, **kwargs):
        return f(*args[0], *args[1:], **kwargs)
    return wrapper
...
(bag
    .from_sequence(urls)
    .map(lambda url: (url, download(url)))
    .map(star(lambda url, data: (url, process_data(data))))
    .filter(star(lambda url, result: result is not None))
    .map(star(lambda url, result: save_for_url(url, result)))
)
This allows me to define functions with several arguments instead of a single tuple argument. How do you deal with this quirk? Is there a better way? I guess dask could provide a starmap method, similar to itertools.starmap in Python’s standard library.
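To make the comparison concrete, here is a minimal standalone sketch that runs without dask. It uses the built-in map and filter plus itertools.starmap. The `pairs` list and the `label` function are made up for illustration and stand in for the (url, result) tuples a real pipeline would produce.

```python
from functools import wraps
from itertools import starmap


def star(f):
    """Adapt f so its leading arguments can arrive packed in one tuple."""
    @wraps(f)
    def wrapper(*args, **kwargs):
        return f(*args[0], *args[1:], **kwargs)
    return wrapper


# Hypothetical stand-in data: (url, result) pairs from an earlier step.
pairs = [("a.example", 3), ("b.example", None), ("c.example", 5)]


def label(url, result):
    # Illustrative two-argument function, not part of any library.
    return f"{url}:{result}"


# filter has no "star" variant in the standard library, so the helper is used.
kept = list(filter(star(lambda url, result: result is not None), pairs))

# The same mapping written with the helper and with itertools.starmap.
via_star = list(map(star(label), kept))
via_starmap = list(starmap(label, kept))
```

Both mapping forms give the same result here. The difference is that the helper also works with filter, which is why I wrap the function rather than the mapping step.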
Ian.