Datasets

city_bike_nyc

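Citi Bike System Data

Dataset description

The dataset contains Citi Bike station data for NYC: per-station status (bike and dock availability, whether the station is renting or returning) together with station information such as name, location, and capacity.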

Column name                       Required  Type
station_id                        false     string
num_bikes_available               false     int
num_ebikes_available              false     string
num_bikes_disabled                false     string
num_docks_available               false     int
num_docks_disabled                false     string
is_installed                      false     string
is_renting                        false     string
is_returning                      false     string
station_status_last_reported      false     int
station_name                      false     string
lat                               false     string
lon                               false     string
region_id                         false     string
capacity                          false     string
has_kiosk                         false     string
station_information_last_updated  false     string
missing_station_information       false     boolean

NUMBER OF ROWS: 366,676,951
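Several columns whose names suggest numeric content (for example lat, lon, and capacity) are typed as string in this schema. The following is a minimal sketch of how a consumer might cast them after reading a row, assuming the strings hold plain decimal numbers; the helper names and the choice of columns to cast are hypothetical, not part of the dataset.

```python
from typing import Any, Callable, Optional

# Assumption: these string-typed columns hold numeric text.
# The schema itself only says "string".
FLOAT_COLUMNS = ("lat", "lon")
INT_COLUMNS = ("capacity", "num_ebikes_available",
               "num_bikes_disabled", "num_docks_disabled")


def _cast(value: Optional[Any], caster: Callable[[Any], Any]) -> Any:
    """Return caster(value), or None when the value is missing or unparsable."""
    if value is None:
        return None
    try:
        return caster(value)
    except ValueError:
        return None


def normalize_station_row(row: dict) -> dict:
    """Copy a row and cast the string-typed numeric columns.

    No column is required, so a missing column becomes None in the result.
    """
    out = dict(row)
    for name in FLOAT_COLUMNS:
        out[name] = _cast(row.get(name), float)
    for name in INT_COLUMNS:
        out[name] = _cast(row.get(name), int)
    return out
```

Columns that are already typed int or boolean, such as num_bikes_available, pass through unchanged.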

taxi_fhvhv

NYC TLC Trip Record Data, available under the nyc.gov terms of use.

Dataset description

Dataset segment:

  • Trips from 2019/02 to 2023/07;

  • High Volume For-Hire Vehicle Trip Records only (fhvhv).

Please note that each row corresponds to one taxi trip.

Column name           Required  Type
access_a_ride_flag    false     string
airport_fee           false     int
base_passenger_fare   false     double
bcf                   false     double
congestion_surcharge  false     double
dispatching_base_num  false     string
DOLocationID          false     long
driver_pay            false     double
dropoff_datetime      false     timestamptz
hvfhs_license_num     false     string
on_scene_datetime     false     timestamptz
originating_base_num  false     string
pickup_datetime       false     timestamptz
PULocationID          false     long
request_datetime      false     timestamptz
sales_tax             false     double
shared_match_flag     false     string
shared_request_flag   false     string
tips                  false     double
tolls                 false     double
trip_miles            false     double
trip_time             false     long
wav_match_flag        false     string
wav_request_flag      false     string

NUMBER OF ROWS: 899,297,740
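Because each row carries the request, pickup, and drop-off timestamps, wait and ride durations can be derived per trip. The sketch below assumes the timestamptz columns have already been read into timezone-aware datetime objects; the function names and the sample values are hypothetical, and missing timestamps yield None because no column is required.

```python
from datetime import datetime, timezone
from typing import Optional


def seconds_between(start: Optional[datetime], end: Optional[datetime]) -> Optional[float]:
    """Elapsed seconds from start to end, or None if either is missing."""
    if start is None or end is None:
        return None
    return (end - start).total_seconds()


def trip_intervals(trip: dict) -> dict:
    """Derive wait and ride durations from the timestamp columns of one row."""
    return {
        "wait_seconds": seconds_between(trip.get("request_datetime"),
                                        trip.get("pickup_datetime")),
        "ride_seconds": seconds_between(trip.get("pickup_datetime"),
                                        trip.get("dropoff_datetime")),
    }


# Illustrative row with made-up timestamps.
example = {
    "request_datetime": datetime(2023, 1, 1, 12, 0, tzinfo=timezone.utc),
    "pickup_datetime": datetime(2023, 1, 1, 12, 5, tzinfo=timezone.utc),
    "dropoff_datetime": datetime(2023, 1, 1, 12, 25, 30, tzinfo=timezone.utc),
}
print(trip_intervals(example))
```

The derived ride duration could be compared against trip_time as a sanity check, assuming that column is expressed in seconds.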

taxi_zones

NYC Taxi Zones, available under the nyc.gov terms of use.

Dataset description

NYC Taxi Zones, which correspond to the pickup and drop-off zones, or LocationIDs, included in the Yellow, Green, and FHV Trip Records published to Open Data.

Column name   Required  Type
LocationID    false     long
Borough       false     string
Zone          false     string
service_zone  false     string

NUMBER OF ROWS: 265
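Since LocationID values correspond to the pickup and drop-off zone identifiers in the trip records, this table can serve as a lookup for PULocationID and DOLocationID in taxi_fhvhv. With only 265 rows it fits comfortably in memory, so a plain dictionary is enough for a sketch. The function names below are hypothetical, and rows are assumed to arrive as dictionaries keyed by column name.

```python
from typing import Iterable, Optional


def build_zone_lookup(zones: Iterable[dict]) -> dict:
    """Index taxi_zones rows by LocationID, skipping rows without one."""
    return {z["LocationID"]: z for z in zones if z.get("LocationID") is not None}


def _field(zone: Optional[dict], name: str) -> Optional[str]:
    """Read one column from a zone row, tolerating an unmatched lookup."""
    return zone.get(name) if zone else None


def label_trip(trip: dict, lookup: dict) -> dict:
    """Attach zone and borough names to one taxi_fhvhv row.

    IDs that are missing or not present in the lookup produce None.
    """
    pickup = lookup.get(trip.get("PULocationID"))
    dropoff = lookup.get(trip.get("DOLocationID"))
    return {
        **trip,
        "pickup_zone": _field(pickup, "Zone"),
        "pickup_borough": _field(pickup, "Borough"),
        "dropoff_zone": _field(dropoff, "Zone"),
        "dropoff_borough": _field(dropoff, "Borough"),
    }
```

In a SQL engine the same idea would be two left joins from taxi_fhvhv to taxi_zones, one per location column.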

titanic

Titanic - Machine Learning from Disaster

Dataset description

Only the train.csv part of the original dataset is included. It contains the ground truth (the Survived column) for each passenger it lists.

Column name  Required  Type
PassengerId  false     long
Survived     false     long
Pclass       false     long
Name         false     string
Sex          false     string
Age          false     double
SibSp        false     long
Parch        false     long
Ticket       false     string
Fare         false     double
Cabin        false     string
Embarked     false     string

NUMBER OF ROWS: 891
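As an illustration of working with the ground-truth column, the sketch below computes the survival rate per passenger class. It assumes Survived is coded 0 or 1 and that rows are available as dictionaries keyed by column name; the function name is hypothetical.

```python
from collections import defaultdict
from typing import Iterable


def survival_rate_by_class(rows: Iterable[dict]) -> dict:
    """Return {Pclass: fraction of passengers with Survived == 1}.

    Rows missing either column are skipped, since no column is required.
    """
    totals = defaultdict(int)
    survived = defaultdict(int)
    for row in rows:
        pclass = row.get("Pclass")
        outcome = row.get("Survived")
        if pclass is None or outcome is None:
            continue
        totals[pclass] += 1
        survived[pclass] += outcome
    return {pclass: survived[pclass] / totals[pclass] for pclass in totals}
```

The same grouping works for any other categorical column, such as Sex or Embarked, by changing the key that is read from each row.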

wind_energy_sensor_data

Wind Power Generation

Dataset description

The dataset contains information from four German energy companies (50 Hertz, Amprion, TenneT TSO, and TransnetBW). It contains non-normalized power generation data at 15-minute intervals, for a total of 96 points per day. Generation is in TWh, with data collected between 23/08/2019 and 22/09/2020.

Column name       Required  Type
hour_000000       false     double
hour_001500       false     double
hour_003000       false     double
observation_date  false     date
company           false     string

NUMBER OF ROWS: 1,588
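The table is in wide form: one row per company and day, with one column per 15-minute interval. The sketch below reshapes a row into long form, one record per interval. It assumes the hour_HHMMSS naming pattern visible in the listed columns continues at 15-minute steps through the end of the day, which gives the 96 points mentioned in the description; the function names are hypothetical.

```python
from datetime import date, datetime, timedelta
from typing import List, Tuple


def interval_columns() -> List[str]:
    """Column names for every 15-minute step of a day.

    Assumption: the pattern hour_000000, hour_001500, hour_003000 continues
    up to hour_234500.
    """
    return [f"hour_{m // 60:02d}{m % 60:02d}00" for m in range(0, 24 * 60, 15)]


def to_long(row: dict) -> List[Tuple[str, datetime, float]]:
    """Turn one wide row into (company, timestamp, generation) records.

    Intervals with no value are skipped; a row without observation_date
    yields no records.
    """
    day = row.get("observation_date")
    if day is None:
        return []
    midnight = datetime.combine(day, datetime.min.time())
    records = []
    for index, name in enumerate(interval_columns()):
        value = row.get(name)
        if value is None:
            continue
        timestamp = midnight + timedelta(minutes=15 * index)
        records.append((row.get("company"), timestamp, value))
    return records
```

The long form is usually easier to plot or aggregate as a time series, at the cost of 96 times as many records.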