S3://nyc-tlc seems to have disappeared ...?

Until about a week ago (07/03/2022), I had various tests using parquet files on the s3://nyc-tlc public bucket. For example, the following code now prints zero as the length of the DataFrame, whereas a week ago the DataFrame had over 84 million rows:

import dask.dataframe as dd
df_nyctlc = dd.read_parquet(
    "s3://nyc-tlc/trip data/yellow_tripdata_2019-*.parquet",
    parse_dates=["tpep_pickup_datetime", "tpep_dropoff_datetime"],
    dtype={
        "payment_type": "UInt8",
        "VendorID": "UInt8",
        "passenger_count": "UInt8",
        "RatecodeID": "UInt8",
        "store_and_fwd_flag": "category",
        "PULocationID": "UInt16",
        "DOLocationID": "UInt16",
        "tolls_amount": "float64"
    },
    storage_options={"anon": True},
    blocksize="16 MiB",
).persist()

print(len(df_nyctlc))
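
A zero-length DataFrame could mean the glob no longer matches any objects, so one way to narrow this down is to list the matches directly before involving dask. The following is only a rough sketch: the helper names (list_matches, describe_matches, anonymous_s3) are made up for illustration, and it assumes s3fs is installed and that the bucket allows anonymous listing.

```python
def list_matches(fs, pattern):
    """Return the sorted paths that `pattern` expands to on `fs`.

    `fs` is any object with an fsspec-style glob(pattern) method,
    for example an s3fs.S3FileSystem instance.
    """
    return sorted(fs.glob(pattern))


def describe_matches(fs, pattern):
    """Summarise a glob so an empty match is obvious before calling dask."""
    paths = list_matches(fs, pattern)
    if not paths:
        return f"no objects match {pattern!r}"
    return f"{len(paths)} objects match {pattern!r}, first: {paths[0]}"


def anonymous_s3():
    """Open S3 without credentials (assumption: s3fs is installed)."""
    import s3fs  # imported lazily so the helpers above work without it

    return s3fs.S3FileSystem(anon=True)
```

It would be called as print(describe_matches(anonymous_s3(), "nyc-tlc/trip data/yellow_tripdata_2019-*.parquet")). If that reports no matches, the files are gone or renamed; if the listing itself raises an error, that would point more towards an access problem than missing data.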

Does anyone know where this data went?

@bgithub1 Welcome!

We’re also trying to figure this out, ref: Access denied for NYC taxi dataset · Issue #1418 · awslabs/open-data-registry · GitHub

I’ll let you know if we have any updates!


Thanks so much. I thought I was losing my mind, or doing something really dumb.

I sent a message to opendata.cityofnewyork asking if they knew anything. Also, this page
https://registry.opendata.aws/nyc-tlc-trip-records-pds/

shows information about the s3://nyc-tlc data. It has a link to nyc.gov:
http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml

However, this link redirects you to a page that says:

Taxi & Limousine Commission has recently redesigned its website and this page has moved. Please update your bookmark to:

TLC Trip Record Data - TLC

You will be redirected in 5 seconds, or click on the link above.

Not sure if this helps, but it would not surprise me if that website transition caused this problem.
