Datasets

city_bike_nyc

Citi Bike System Data

Dataset description

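The dataset contains Citi Bike station data for NYC: per-station status (bikes and docks available or disabled, whether the station is installed, renting, or returning) together with station information such as name, coordinates, region, and capacity.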

NAME                              REQUIRED  TYPE
station_id                        false     string
num_bikes_available               false     int
num_ebikes_available              false     string
num_bikes_disabled                false     string
num_docks_available               false     int
num_docks_disabled                false     string
is_installed                      false     string
is_renting                        false     string
is_returning                      false     string
station_status_last_reported      false     int
station_name                      false     string
lat                               false     string
lon                               false     string
region_id                         false     string
capacity                          false     string
has_kiosk                         false     string
station_information_last_updated  false     string
missing_station_information       false     boolean

NUMBER OF ROWS: 366,676,951
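
Several numeric-looking fields in this schema (for example lat, lon, and capacity) are typed as string, so they will usually need casting before any arithmetic. A minimal Python sketch of that step follows. It assumes the strings hold plain decimal text, which the schema does not confirm; the helper name `coerce_station_row` and the sample row are hypothetical and not taken from the dataset.

```python
from typing import Any, Dict

# Fields the schema types as string but that look numeric by name.
# Assumption: their values are plain decimal text such as "40.75" or "55".
FLOAT_FIELDS = ("lat", "lon")
INT_FIELDS = (
    "capacity",
    "num_ebikes_available",
    "num_bikes_disabled",
    "num_docks_disabled",
)


def coerce_station_row(row: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of row with string-typed numeric fields cast to numbers.

    Missing, empty, or unparsable values become None, since no column
    in the schema is marked as required.
    """
    out = dict(row)
    casts = [(name, float) for name in FLOAT_FIELDS]
    casts += [(name, int) for name in INT_FIELDS]
    for name, cast in casts:
        value = out.get(name)
        if value is None or value == "":
            out[name] = None
            continue
        try:
            out[name] = cast(value)
        except (TypeError, ValueError):
            out[name] = None
    return out


# Made-up row for illustration only.
sample = {"station_id": "example-1", "lat": "40.75", "lon": "-73.99", "capacity": "55"}
clean = coerce_station_row(sample)
```

Returning None for unparsable values keeps the row count intact, which makes it easier to measure how many values failed to parse before deciding how to handle them.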

taxi_fhvhv

NYC TLC Trip Record Data, available under the nyc.gov terms of use.

Dataset description

Dataset segment:

  • Trips from 2019/02 to 2023/07;
  • High Volume For-Hire Vehicle Trip Records only (fhvhv).

Please note that each row corresponds to one taxi trip.

Column name           Required  Type
access_a_ride_flag    FALSE     string
airport_fee           FALSE     int
base_passenger_fare   FALSE     double
bcf                   FALSE     double
congestion_surcharge  FALSE     double
dispatching_base_num  FALSE     string
DOLocationID          FALSE     long
driver_pay            FALSE     double
dropoff_datetime      FALSE     timestamptz
hvfhs_license_num     FALSE     string
on_scene_datetime     FALSE     timestamptz
originating_base_num  FALSE     string
pickup_datetime       FALSE     timestamptz
PULocationID          FALSE     long
request_datetime      FALSE     timestamptz
sales_tax             FALSE     double
shared_match_flag     FALSE     string
shared_request_flag   FALSE     string
tips                  FALSE     double
tolls                 FALSE     double
trip_miles            FALSE     double
trip_time             FALSE     long
wav_match_flag        FALSE     string
wav_request_flag      FALSE     string

NUMBER OF ROWS: 899,297,740

taxi_zones

NYC Taxi Zones, available under the nyc.gov terms of use.

Dataset description

NYC Taxi Zones, which correspond to the pickup and drop-off zones, or LocationIDs, included in the Yellow, Green, and FHV Trip Records published to Open Data.

NAME          REQUIRED  TYPE
LocationID    false     long
Borough       false     string
Zone          false     string
service_zone  false     string

NUMBER OF ROWS: 265
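
Because PULocationID and DOLocationID in taxi_fhvhv refer to LocationID in this table, zone names can be attached to trips with a simple lookup. The sketch below shows the idea in plain Python on in-memory rows. The rows use placeholder values invented for illustration, and `build_zone_lookup` and `attach_zones` are hypothetical helper names; on the full tables the same join would more realistically be done in a query engine.

```python
from typing import Dict, Iterable, List, Optional


def build_zone_lookup(zones: Iterable[dict]) -> Dict[int, dict]:
    """Index taxi_zones rows by LocationID for constant-time lookups."""
    return {zone["LocationID"]: zone for zone in zones}


def zone_name(lookup: Dict[int, dict], location_id: Optional[int]) -> Optional[str]:
    """Return the Zone for a LocationID, or None if the ID is missing or unknown."""
    zone = lookup.get(location_id)
    return zone["Zone"] if zone else None


def attach_zones(trips: Iterable[dict], lookup: Dict[int, dict]) -> List[dict]:
    """Return copies of the trips with pickup_zone and dropoff_zone added."""
    enriched = []
    for trip in trips:
        enriched.append({
            **trip,
            "pickup_zone": zone_name(lookup, trip.get("PULocationID")),
            "dropoff_zone": zone_name(lookup, trip.get("DOLocationID")),
        })
    return enriched


# Placeholder rows, not real dataset values.
zones = [
    {"LocationID": 1, "Borough": "Borough A", "Zone": "Zone A", "service_zone": "Service A"},
    {"LocationID": 2, "Borough": "Borough B", "Zone": "Zone B", "service_zone": "Service B"},
]
trips = [
    {"PULocationID": 1, "DOLocationID": 2, "trip_miles": 3.2},
    {"PULocationID": 2, "DOLocationID": 999, "trip_miles": 1.1},
]
result = attach_zones(trips, build_zone_lookup(zones))
```

Unknown or missing IDs map to None rather than raising, which mirrors a left join: every trip is kept even when its zone cannot be resolved.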

titanic

Titanic - Machine Learning from Disaster

Dataset description

The data includes only the train.csv part of the original dataset. The dataset contains the ground truth for each passenger of the Titanic.

Column name  Required  Type
PassengerId  false     long
Survived     false     long
Pclass       false     long
Name         false     string
Sex          false     string
Age          false     double
SibSp        false     long
Parch        false     long
Ticket       false     string
Fare         false     double
Cabin        false     string
Embarked     false     string

NUMBER OF ROWS: 891

wind_energy_sensor_data

Wind Power Generation

Dataset description

The dataset contains information from four German energy companies (50 Hertz, Amprion, TenneT TSO and TransnetBW). It contains non-normalized power generation data at 15-minute intervals, totaling 96 points per day. Generation is in TWh, with data collected between 23/08/2019 and 22/09/2020.

COLUMN NAME       REQUIRED  TYPE
hour_000000       false     double
hour_001500       false     double
hour_003000       false     double
observation_date  false     date
company           false     string

NUMBER OF ROWS: 1,588
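
The description mentions 96 readings per day at 15-minute intervals, while the schema listing above shows only the first three hour_ columns. The following sketch generates the column names implied by that pattern and flattens one wide row into one record per interval. It assumes the hour_HHMMSS naming continues through the end of the day, which the listing does not show; the function names and the sample row are hypothetical, and the sample uses a string for observation_date purely for brevity.

```python
from typing import List, Tuple


def interval_column_names(step_minutes: int = 15) -> List[str]:
    """Build hour_HHMMSS names for every interval start in a 24-hour day."""
    names = []
    for minute_of_day in range(0, 24 * 60, step_minutes):
        hours, minutes = divmod(minute_of_day, 60)
        names.append(f"hour_{hours:02d}{minutes:02d}00")
    return names


def to_long(row: dict) -> List[Tuple[str, str, str, float]]:
    """Flatten one wide row into (company, observation_date, 'HH:MM', value) tuples.

    Interval columns absent from the row are skipped.
    """
    records = []
    for name in interval_column_names():
        if name in row:
            # "hour_" is 5 characters, so HH is name[5:7] and MM is name[7:9].
            hhmm = f"{name[5:7]}:{name[7:9]}"
            records.append((row["company"], row["observation_date"], hhmm, row[name]))
    return records


columns = interval_column_names()

# Made-up row for illustration only.
sample_row = {
    "company": "Example TSO",
    "observation_date": "2020-01-01",
    "hour_000000": 1.0,
    "hour_001500": 2.0,
}
long_records = to_long(sample_row)
```

A long layout with one row per interval is usually easier to aggregate or plot than 96 separate columns, at the cost of a larger row count.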