Too many open files on SLURMCluster

My goal is to process 1000 files at once on a SLURM cluster. I can do this on 8 nodes with 32 cores each. But when I try with 16 nodes (each with 32 cores), near the end of the computation I receive an error saying that I have too many open files (Too many open files: ‘/proc/PID/stat’, and a very similar message for ‘/proc/PID/statm’).

I have already checked https://www.cyberciti.biz/faq/linux-increase-the-maximum-number-of-open-files/ , but I cannot apply the global changes it suggests since I do not have the rights. I currently have:

cat /proc/sys/fs/file-max → 1458549
ulimit -Hn → 4096
ulimit -Sn → 4096
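
For completeness, the same limit can also be checked from inside the Python process that runs the client (a minimal sketch using only the standard library; the values above came from the shell):

    import resource

    # Per-process limit on open file descriptors (RLIMIT_NOFILE).
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print(f"soft limit: {soft}, hard limit: {hard}")  # here: 4096 / 4096

    # Without root rights the soft limit can only be raised up to the hard
    # limit, which is already 4096 here, so this is effectively a no-op.
    resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))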

But can I even be sure that the problem comes from here?

Hi,

The problem comes from the settings of the server your Scheduler is started from. See Frequently Asked Questions — Dask.distributed 2022.8.1 documentation.

16 * 32 processes (assuming you still use one process per core) means 512 workers. So you might reach the limit if you multiply this by 10, as stated in the docs.
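
A back-of-the-envelope check (my own numbers, just multiplying the figures above):

    # 16 nodes * 32 cores, with one worker process per core:
    workers = 16 * 32                         # = 512 workers
    fds_per_worker = 10                       # order of magnitude from the Dask FAQ
    estimated_fds = workers * fds_per_worker  # ~5120, above ulimit -n = 4096
    print(estimated_fds)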

You need to ask the cluster admins if they can change this setting.

Other solutions are:

  • Try to run the scheduler on a compute node via a job, or check whether the limits are the same on compute nodes.
  • Try to launch fewer processes: maybe your computation is not GIL-limited and could benefit from multithreading (see the sketch below).
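
For the second point, a hedged sketch of what lowering the process count could look like with dask_jobqueue (the memory and walltime values are placeholders, not taken from this thread):

    from dask_jobqueue import SLURMCluster
    from dask.distributed import Client

    # 32 cores per SLURM job, but only 8 worker processes with 4 threads each.
    # 16 jobs * 8 processes = 128 workers instead of 512, so the scheduler
    # holds far fewer file descriptors.
    cluster = SLURMCluster(
        cores=32,
        processes=8,
        memory="128GB",       # placeholder, adapt to your nodes
        walltime="02:00:00",  # placeholder
    )
    cluster.scale(jobs=16)
    client = Client(cluster)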

Thanks for the response. I checked the documentation, but it referred me to the page that suggests increasing the limits as a superuser, which I do not have.

I am not sure if this is a bug, but while trying to solve the issue today, I figured out that:

  1. If I have an open performance report, i.e. I wrap the call in with performance_report(): ... f(x) (see the sketch after this list), I already get this error on 16 nodes. After removing the performance report I managed to run the script on 16 and 32 nodes (both with 1000 files).
  2. With the report disabled, I started a job with 100 files on 64 (full) nodes, and it again told me that I was reaching the limit of open files.
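
For context, the pattern I am using is roughly the following (a minimal sketch; f and files stand in for my actual processing code and file list):

    from dask.distributed import Client, performance_report

    client = Client(cluster)  # the SLURMCluster described above

    # With the report enabled, the run fails with "too many open files" on
    # 16 nodes; without the with-block it goes through.
    with performance_report(filename="dask-report.html"):
        futures = client.map(f, files)  # f and files are placeholders
        client.gather(futures)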

I am using a distributed dataframe which is not multithreaded but should be optimized for clusters. My colleagues previously tested it on a Spark cluster, and it certainly works on more than 64 nodes.

I will try the first suggestion tomorrow and provide my results/observations; I have not tried it yet.

I started the script from a compute node and it worked. Unfortunately, when I use a performance_report, I still get the error about too many file descriptors.

I have the issue that my jobs often start fine, but from time to time I hit the too many open files error, @guillaumeeb. I really cannot explain why.

I'm not sure why performance_report increases the number of open files on the Scheduler so much, but I guess it is not entirely unexpected: the Scheduler must be saving a lot of little things into files.

The Dask Scheduler does open multiple files per Worker. If you cannot increase the system limit (even through the admin team), I guess there is not much you can do except:

  • Trying to use fewer processes with the same number of cores: if you work with DataFrames, Python threads might work, at least for reading your data in parallel. So try to lower the process count to 1 per two cores, or even 1 per 4 cores.
  • Just launching Dask clusters with 32 workers max, but launching several of them with fewer files each (a rough sketch below).
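
For the second option, here is what running several smaller clusters over batches of files could look like (batch size, process_file and files are assumptions for illustration only):

    from dask_jobqueue import SLURMCluster
    from dask.distributed import Client

    def run_batch(batch):
        # One modest cluster (at most 32 workers) per batch of files.
        cluster = SLURMCluster(cores=32, processes=8, memory="128GB")  # placeholders
        cluster.scale(jobs=4)  # 4 jobs * 8 processes = 32 workers
        with Client(cluster) as client:
            futures = client.map(process_file, batch)  # process_file is hypothetical
            results = client.gather(futures)
        cluster.close()
        return results

    batch_size = 250  # e.g. 1000 files in 4 sequential batches
    results = []
    for i in range(0, len(files), batch_size):  # files: the full list of paths
        results.extend(run_batch(files[i:i + batch_size]))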

Another solution might be to open an issue on GitHub so that the distributed scheduler is improved to use fewer files, but this might have already been considered.

