Until about a week ago (as of 07/03/2022), I had various tests reading Parquet files from the public s3://nyc-tlc bucket. For example, the following code now prints zero as the length of the DataFrame, whereas a week ago it returned over 84 million rows:
```python
import dask.dataframe as dd

# The full path was truncated in the original post; the data lived under s3://nyc-tlc
df_nyctlc = dd.read_parquet("s3://nyc-tlc/...")
print(len(df_nyctlc))
```
Does anyone know where this data went?
We’re also trying to figure this out, ref: Access denied for NYC taxi dataset · Issue #1418 · awslabs/open-data-registry · GitHub
I’ll let you know if we have any updates!
Thanks so much. I thought I was losing my mind, or doing something really dumb.
I sent a message to opendata.cityofnewyork asking if they knew anything. Also, this page shows information about the s3://nyc-tlc data, and it has a link to nyc.gov:
However, this link redirects you to a page that says:
> Taxi & Limousine Commission has recently redesigned its website and this page has moved. Please update your bookmark to:
> TLC Trip Record Data - TLC
> You will be redirected in 5 seconds, or click on the link above.
Not sure if this helps, but it would not surprise me if that website transition caused this problem.