Hello team.
I really appreciate the Coiled blog posts with step-by-step instructions on how to better perform common operations, such as setting a DataFrame's index or joining two DataFrames, like the articles below:
here and there
As I understand it, dict.fromkeys() is used there to remove duplicates from the divisions list. However, as far as I know, using set() is more explicit and thus more Pythonic.
Below is a brief check of the performance difference between the two:
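One difference between the two that may matter here: since Python 3.7, dicts are guaranteed to preserve insertion order, so `dict.fromkeys()` drops duplicates while keeping first-seen order. `set()` drops duplicates too, but makes no ordering guarantee (for strings the order can even change between interpreter runs because of hash randomization). Since Dask expects divisions to be sorted, that ordering behaviour may be why the articles chose `dict.fromkeys()`; this is my reading, not something the articles state. A minimal illustration:

```python
mylist = ["nowplaying", "PBS", "PBS", "nowplaying", "job",
          "debate", "thenandnow"]

# dict keys keep insertion order (guaranteed since Python 3.7),
# so duplicates are dropped and first-seen order is kept
ordered = list(dict.fromkeys(mylist))
print(ordered)  # ['nowplaying', 'PBS', 'job', 'debate', 'thenandnow']

# set() also drops duplicates, but its iteration order is arbitrary
unordered = list(set(mylist))
print(sorted(unordered) == sorted(ordered))  # True: same elements, order may differ
```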
```python
import timeit
from pprint import pprint

mylist = ["nowplaying", "PBS", "PBS", "nowplaying", "job",
          "debate", "thenandnow", "nowplaying", "PBS", "PBS",
          "nowplaying", "job", "debate", "thenandnow"]

# De-duplication as done in the Coiled articles
coiled_div = list(dict.fromkeys(mylist))
# De-duplication via set()
uniq_div_via_set = list(set(mylist))

# Build the timeit setup from the same list instead of retyping it
setup = f"mylist = {mylist!r}"

# timeit runs each statement 1,000,000 times by default
fromk = timeit.timeit("list(dict.fromkeys(mylist))", setup=setup)
un_set = timeit.timeit("list(set(mylist))", setup=setup)

# The f-string "=" specifier needs Python 3.8+
print(f"{fromk=:0.3f} sec,\n{un_set=:0.3f} sec",
      coiled_div,
      uniq_div_via_set,
      sep="\n")
pprint(dict.fromkeys(mylist))
```
This produced the following timings:
```
fromk=0.468 sec,
un_set=0.331 sec
```
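The benchmark above compares the bare expressions. If the unique divisions also have to end up sorted (Dask expects divisions in sorted order), a closer comparison might pit `dict.fromkeys()` on already sorted input, which keeps it sorted, against `sorted(set(...))`, which has to re-sort. A minimal sketch, assuming a hypothetical, already sorted `candidates` list; it only prints the timings, since which variant wins depends on the machine and the input size:

```python
import timeit

# Hypothetical sorted input with repeats, standing in for candidate divisions
candidates = sorted([i // 3 for i in range(30)])

# Assumption: the result must stay sorted, so the set() variant has to re-sort
fromkeys_result = list(dict.fromkeys(candidates))  # order of sorted input is kept
set_result = sorted(set(candidates))                # explicit sort needed
assert fromkeys_result == set_result  # same unique values, same order

setup = f"candidates = {candidates!r}"
t_fromkeys = timeit.timeit("list(dict.fromkeys(candidates))",
                           setup=setup, number=100_000)
t_set_sorted = timeit.timeit("sorted(set(candidates))",
                             setup=setup, number=100_000)
print(f"{t_fromkeys=:0.3f} sec\n{t_set_sorted=:0.3f} sec")
```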
So my question is: which option is considered better when extracting unique divisions for a stand-alone index setting?
Thanks a lot in advance for your time and attention.