Hello all, I’m having trouble doing a conditional replace on my data. The closest thing I’ve found is the dask.dataframe.Series.replace function. I want to replace a string in one column, but only in rows where another column meets a condition. Here’s a sample CSV to give you an idea of what I’m dealing with:
Type,Indicator,Attribution
Hash,abcdef0123456789abcdef0123456789,adversary1
Hash,abcdef0123456789abcdef0123456789,adversary2
Hash,abcdef0123456789abcdef0123456789,adversary3
Hash,abcdef0123456789abcdef0123456789abcdef01,adversary4
Hash,abcdef0123456789abcdef0123456789abcdef01,adversary5
Hash,abcdef0123456789abcdef0123456789abcdef01,adversary6
Hash,abcdef0123456789abcdef0123456789abcdef0123456789abcdef0123456789,adversary7
Hash,abcdef0123456789abcdef0123456789abcdef0123456789abcdef0123456789,adversary8
Hash,abcdef0123456789abcdef0123456789abcdef0123456789abcdef0123456789,adversary9
I’d like to get to this state:
Type,Indicator,Attribution
hash_md5,abcdef0123456789abcdef0123456789,adversary1
hash_md5,abcdef0123456789abcdef0123456789,adversary2
hash_md5,abcdef0123456789abcdef0123456789,adversary3
hash_sha1,abcdef0123456789abcdef0123456789abcdef01,adversary4
hash_sha1,abcdef0123456789abcdef0123456789abcdef01,adversary5
hash_sha1,abcdef0123456789abcdef0123456789abcdef01,adversary6
hash_sha256,abcdef0123456789abcdef0123456789abcdef0123456789abcdef0123456789,adversary7
hash_sha256,abcdef0123456789abcdef0123456789abcdef0123456789abcdef0123456789,adversary8
hash_sha256,abcdef0123456789abcdef0123456789abcdef0123456789abcdef0123456789,adversary9
I have a regex that matches MD5 hashes in the Indicator column, and I want to change that row’s Type from Hash to hash_md5, then repeat the same process for the SHA1 and SHA256 hashes. The problem is that I can make the replacement, but it replaces every instance of Hash in the series instead of only the rows where Indicator matches my regex. I hope this isn’t too confusing.
This is what has worked, but it replaces all instances of Hash with hash_md5 instead of only in the rows where there is a regex match:
ddf.replace({'Indicator': r'^[a-fA-F0-9]{32}$', 'Hash': 'hash_md5'}, regex=True)
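
To make the intended row-wise behaviour concrete, here is a minimal sketch in plain Python (standard library only, not Dask). The names HASH_TYPES and retype are hypothetical, and the sketch assumes the hash kind can be inferred from the length of the hex string (32, 40, or 64 characters), as in the sample data above.

```python
import csv
import io
import re

# Hypothetical mapping from hex-digest length to the desired Type value.
HASH_TYPES = {32: "hash_md5", 40: "hash_sha1", 64: "hash_sha256"}
HEX_RE = re.compile(r"[a-fA-F0-9]+")


def retype(row):
    """Return a copy of row whose Type is refined only if Indicator is a known hash."""
    new_row = dict(row)
    indicator = row["Indicator"]
    if row["Type"] == "Hash" and HEX_RE.fullmatch(indicator):
        # Only this row changes; unrecognised lengths keep the original Type.
        new_row["Type"] = HASH_TYPES.get(len(indicator), row["Type"])
    return new_row


sample = """Type,Indicator,Attribution
Hash,abcdef0123456789abcdef0123456789,adversary1
Hash,abcdef0123456789abcdef0123456789abcdef01,adversary4
Hash,abcdef0123456789abcdef0123456789abcdef0123456789abcdef0123456789,adversary7
"""

rows = [retype(r) for r in csv.DictReader(io.StringIO(sample))]
for r in rows:
    print(r["Type"], r["Attribution"])
```

The point is that the condition is evaluated per row on Indicator and the write goes to Type, rather than replacing the string Hash wherever it appears. In Dask the same idea would presumably be a boolean mask on the column, something like ddf['Type'] = ddf['Type'].mask(ddf['Indicator'].str.match(r'^[a-fA-F0-9]{32}$'), 'hash_md5'), though that line is an untested assumption here.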