I am trying to generate a large number of images with Selenium WebDriver across four Dask workers (processes).
Right now I start a new WebDriver instance for every task, which is highly inefficient with 10000 tasks.
Instead, I'm thinking that each worker could start its own WebDriver instance once, when it is initialized, and later tasks on that worker would reuse that instance. Is that possible?
I printed the worker ID for each task to confirm that the same four workers are reused across tasks:
Worker-b35541a2-7546-4909-bed9-477e28daa9f0
Worker-731ecc29-5eb8-43ba-9fba-710dc6ec8ca6
Worker-b35541a2-7546-4909-bed9-477e28daa9f0
Worker-20c44df1-6fb9-4f86-a296-a93baeb825f8
Worker-0f7788a7-4fdb-4abb-bb03-e1d629068fde
Worker-20c44df1-6fb9-4f86-a296-a93baeb825f8
Worker-731ecc29-5eb8-43ba-9fba-710dc6ec8ca6
Worker-0f7788a7-4fdb-4abb-bb03-e1d629068fde
Worker-b35541a2-7546-4909-bed9-477e28daa9f0
Worker-b35541a2-7546-4909-bed9-477e28daa9f0
Worker-0f7788a7-4fdb-4abb-bb03-e1d629068fde
Worker-731ecc29-5eb8-43ba-9fba-710dc6ec8ca6
Worker-20c44df1-6fb9-4f86-a296-a93baeb825f8
Worker-20c44df1-6fb9-4f86-a296-a93baeb825f8
Worker-731ecc29-5eb8-43ba-9fba-710dc6ec8ca6
Worker-0f7788a7-4fdb-4abb-bb03-e1d629068fde
Here's some pseudocode for what I envision:
import dask
from distributed import Client
from selenium.webdriver.chrome.webdriver import WebDriver

def task(i):
    # dask.worker.cache is the per-worker store I imagine; it is not a real API
    if "webdriver" in dask.worker.cache:
        webdriver = dask.worker.cache["webdriver"]
    else:
        webdriver = WebDriver()
        dask.worker.cache["webdriver"] = webdriver
    # use webdriver to do things...

client = Client()
# keep the futures so the tasks are not released before they run
futures = client.map(task, range(1000))
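For what it's worth, here is a minimal sketch of how the idea above might look with calls that do exist, assuming `distributed.get_worker()` is used to reach the current worker and the driver is stored as a plain attribute on it. The helper `get_or_create`, the attribute name `_webdriver`, and the functions `close_driver` and `run` are made up for illustration and are not part of Dask or Selenium. It also assumes one thread per worker, since a single driver should not be shared between threads.

```python
import threading

_create_lock = threading.Lock()


def get_or_create(holder, name, factory):
    """Return the object stored as holder.<name>, building it with factory() on first use."""
    with _create_lock:
        if not hasattr(holder, name):
            setattr(holder, name, factory())
        return getattr(holder, name)


def task(i):
    # Imported here so they are resolved inside the worker process.
    from distributed import get_worker
    from selenium.webdriver import Chrome

    worker = get_worker()  # the Worker object running this task
    # "_webdriver" is a made-up attribute name, not part of the Dask API.
    driver = get_or_create(worker, "_webdriver", Chrome)
    # use driver to do things...
    return i


def close_driver(dask_worker):
    # client.run fills in the dask_worker argument with each Worker object.
    driver = getattr(dask_worker, "_webdriver", None)
    if driver is not None:
        driver.quit()
        del dask_worker._webdriver


def run(n_tasks=1000):
    from distributed import Client

    # One thread per worker, so a driver is never shared between threads.
    with Client(n_workers=4, threads_per_worker=1) as client:
        futures = client.map(task, range(n_tasks))
        client.gather(futures)
        client.run(close_driver)  # shut the browsers down on every worker
```

To try it, call `run()` from under an `if __name__ == "__main__":` guard, which matters on platforms that spawn worker processes. The first task on each worker pays the browser start-up cost and later tasks on that worker reuse the stored driver.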