Per-country calculations on the columns of a large CSV with Dask

Some background: I am a novice in Python, pandas and Dask, so I would appreciate your input.

I have a large CSV export containing data from various countries (column Source_Country).
I want to achieve two things:

  • get the percentage of filled (non-empty) values in each column, per country (I am focusing on this first)
  • get value counts for a selection of important columns in my CSV file.
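To pin down what the two goals compute, here is a small pandas sketch on a hypothetical in-memory sample. The columns Email, Phone and Status and their values are invented for illustration; only Source_Country comes from the export. The fill percentage per country is the mean of the notna() flags grouped by country.

```python
import pandas as pd

# Hypothetical miniature of the export; None marks an empty cell.
df = pd.DataFrame(
    {
        "Source_Country": ["DE", "DE", "FR", "FR"],
        "Email": ["a@x.de", None, "c@x.fr", "d@x.fr"],
        "Phone": [None, None, "123", None],
        "Status": ["New", "New", "Open", None],
    }
)

# Goal 1: notna() gives True/False per cell; as 0/1 integers, the mean per
# country is the fraction of filled cells, and *100 turns it into a percentage.
fill_pct = (
    df.drop(columns="Source_Country")
    .notna()
    .astype("int64")
    .groupby(df["Source_Country"])
    .mean()
    * 100
)

# Transpose: rows = column names, columns = countries.
fill_table = fill_pct.T

# Goal 2: value counts for a (hypothetical) selection of columns.
selected = ["Source_Country", "Status"]
value_counts = {col: df[col].value_counts() for col in selected}
```

For example, DE has an e-mail address in one of its two rows, so fill_table.loc["Email", "DE"] is 50.0.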

As mentioned, the file is very large, so plain pandas did not work.
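If the problem with pandas was memory, one possible alternative (plain pandas in chunks, not Dask) is to read the file piece by piece and keep only running counts per country. A minimal sketch, assuming ";" as separator and Source_Country as the country column; the function name fill_rates_chunked and the sample data are hypothetical.

```python
import io

import pandas as pd


def fill_rates_chunked(source, country_col="Source_Country", chunksize=100_000, sep=";"):
    """Per-country fill percentage, reading the CSV in chunks.

    Only running totals are kept in memory, so the whole file never has
    to fit in RAM at once.
    """
    filled = None  # non-null counts: rows = countries, columns = data columns
    rows = None    # number of rows seen per country
    reader = pd.read_csv(source, sep=sep, dtype=str, chunksize=chunksize, on_bad_lines="skip")
    for chunk in reader:
        flags = chunk.drop(columns=country_col).notna().astype("int64")
        part_filled = flags.groupby(chunk[country_col]).sum()
        part_rows = chunk.groupby(country_col).size()
        # Add this chunk's counts to the totals, aligning on country.
        filled = part_filled if filled is None else filled.add(part_filled, fill_value=0)
        rows = part_rows if rows is None else rows.add(part_rows, fill_value=0)
    # Divide each country's counts by its row count, then put countries in columns.
    return (filled.div(rows, axis=0) * 100).T


# Hypothetical sample standing in for the real file.
sample = io.StringIO(
    "Source_Country;Email;Phone\n"
    "DE;a@x.de;\n"
    "DE;;\n"
    "FR;c@x.fr;123\n"
    "FR;d@x.fr;\n"
)
table = fill_rates_chunked(sample, chunksize=2)
```

On the real export the first argument would simply be the file path; chunksize controls how many rows are held in memory at a time.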

The code I have so far is:

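import pandas as pd
import dask.dataframe as dd
from dask.distributed import Client

# Local cluster: 10 worker processes, 8 threads each, up to 10 GB of memory per worker
client = Client(n_workers=10, threads_per_worker=8, processes=True, memory_limit='10GB')

# Read the semicolon-separated export lazily in ~50 MB blocks, skipping malformed lines
ddf = dd.read_csv('eloqua_export (2).csv', blocksize="50MB", on_bad_lines='skip', engine='python', sep=';')

# Boolean mask: True for rows where Source_Country is 'DE'
DE_ = ddf['Source_Country'] == 'DE'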

At this point I run into all sorts of calculation errors related to dtype float64.

Ideally, I would like to get a table (exported as CSV) in this layout:

Columnname;Country1;Country2

How can I achieve this? Any help is appreciated.