Hi folks,

For testing I am currently using a single virtual machine that runs everything in one place: the Dask scheduler, a Dask worker, the NFS client, and a Flask web app. I also configured a second machine as an external worker: I installed dask[complete], mounted the NFS share, and ran dask worker tcp://<scheduler-address>. So far everything is OK, but when I upload a file using my Flask application I get a "file not found" error from the worker. This cannot be right, because when I check the folder I can find the file that I have uploaded.

Do you have any idea why this is happening, or can anyone guide me on how to attach external workers using the same NFS share? I have already searched on the internet, but there is nothing specific to my case.
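Roughly, this is what the upload path looks like (a simplified sketch, not my exact code; the upload folder name here is just an example):

import os
from flask import Flask, request
import dask.dataframe as dd
from distributed import Client

UPLOAD_DIR = "/srv/uploads"  # folder on the NFS share (example path)

app = Flask(__name__)
client = Client("192.168.122.152:8786")  # scheduler on the all-in-one VM

@app.route("/upload", methods=["POST"])
def upload():
    f = request.files["file"]
    path = os.path.join(UPLOAD_DIR, f.filename)
    f.save(path)                       # the file does appear in the folder
    df = dd.read_csv(path, dtype=str)  # lazy read, executed on the workers
    return f"{len(df.compute())} rows"  # this is where the worker reports FileNotFoundError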
Hi @wh1t3rabit, welcome to Dask Discourse!
First of all, your post is a little hard to read; please consider editing it to add some punctuation and line breaks.
About your problem, it's a bit hard to tell. Did you check on both VMs that the file is there at the same absolute path? Do you use an absolute path when trying to access it from the Workers? What code do you use to read the file through the Dask workers?
Another possibility is that your NFS share is lagging a bit, and the file is not available yet when you try to access it.
But usually, if Python code running on a worker says that a file is not there, it means that it cannot see it.
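One quick check is to ask every worker directly whether it can see the file, with something like this (adjust the scheduler address and the path to yours):

import os
from distributed import Client

client = Client("tcp://<scheduler-address>:8786")  # your scheduler address

# Runs os.path.exists on every worker; returns a dict of worker address -> True/False
print(client.run(os.path.exists, "/absolute/path/to/your/file.csv"))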
Hello guillaumeeb and thank you for your reply.
I noticed that every time a worker starts, it creates a folder named /tmp/dask-worker-space/. Is that the folder I should share over NFS, or should it be the folder where I uploaded the CSV file?
Also, the error I get is very odd, because it seems that the worker has already read the file, and then I get:

Function: execute_task
args: ((subgraph_callable-d4d21d52-7b4a-4ccc-a9f4-c5cb195d9981, [(<function read_block_from_file at 0x7f8bbd8c1300>, <OpenFile '/nfs_dir/declaration_remuneration_only_A.csv'>, 0, 9026743, b'\n'), None, True, True]))
kwargs: {}
Exception: "FileNotFoundError(2, 'No such file or directory')"
In Python I am using the following sequence:
import dask.dataframe as dd
from distributed import Client
data = dd.read_csv('declaration_remuneration_only_A.csv', dtype=str)
client = Client("192.168.122.152:8786")
data.compute()
Nope, it's only a folder local to the Worker, which it uses, among other things, to spill data to disk.
You should share the folder on your first VM where you put your data. The path to this folder should be the same on the Scheduler host and on the Worker host.
Maybe the Worker on the first machine is able to read it, but not the worker on the second machine.
You should use an absolute path here. Otherwise the Worker on the second machine will look for the file in its working directory, which is /tmp/dask-worker-space/.
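For example, something like this (a minimal sketch, assuming the data folder is mounted at /nfs_dir on both machines, as in your traceback):

import dask.dataframe as dd
from distributed import Client

client = Client("192.168.122.152:8786")

# Absolute path on the NFS share, identical on the Scheduler and Worker hosts
data = dd.read_csv("/nfs_dir/declaration_remuneration_only_A.csv", dtype=str)

print(data.compute())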