An illustrative field dataset that contains a variety of variables commonly collected in the automotive sector.

The dataset has complete information about failed and incomplete information about intact vehicles. See 'Format' and 'Details' for further insights.

field_data

Format

A tibble with 10,684 rows and 20 variables:

vin

Vehicle identification number.

dis

Days in service.

mileage

Distances covered, which are unknown for censored units.

status

1 for failed and 0 for censored units.

production_date

Date of production.

registration_date

Date of registration. Known for all failed units and for a few intact units.

repair_date

The date on which the failure was repaired. It is assumed that the repair date is equal to the date of failure occurrence.

report_date

The date on which lifetime information about the failure became available.

country

Delivering country.

region

The region within the country of delivery. Known for registered vehicles, NA for units with a missing registration_date.

climatic_zone

Climatic zone based on "Köppen-Geiger" climate classification. Known for registered vehicles, NA for units with a missing registration_date.

climatic_subzone

Climatic subzone based on "Köppen-Geiger" climate classification. Known for registered vehicles, NA for units with a missing registration_date.

brand

Brand of the vehicle.

vehicle_model

Model of the vehicle.

engine_type

Type of the engine.

engine_date

Date on which the engine was installed.

gear_type

Type of the gear.

gear_date

Date on which the gear was installed.

transmission

Transmission of the vehicle.

fuel

Vehicle fuel.

Details

All vehicles were produced in 2014 and an analysis of the field data was made at the end of 2015. At the date of analysis, there were 684 failed and 10,000 intact vehicles.

Censored vehicles:

For censored units the service time (dis) was computed as the difference between the date of analysis ("2015-12-31") and the registration_date.

For many units the registration date is unknown. For these, the difference between the analysis date and production_date was used as a rough estimate of the real service time. This uncertainty has to be considered in the subsequent analysis (see delay in registration in the section 'Details' of mcs_delay).

Furthermore, due to the delay in report, the computed service time could also be inaccurate. This uncertainty should be considered as well (see delay in report in the section 'Details' of mcs_delay).
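The service-time rule for censored units can be sketched as follows. This is a minimal illustration on a hypothetical three-row toy data frame (`censored`), not the code used to build field_data. The column names follow the 'Format' section; the helper column `dis_is_rough` is an assumption added here to flag rows that fall back to production_date.

```r
# Hypothetical toy data: three censored units, two without a registration date.
censored <- data.frame(
  vin = c("A1", "A2", "A3"),
  production_date = as.Date(c("2014-03-01", "2014-06-15", "2014-11-30")),
  registration_date = as.Date(c("2014-04-01", NA, NA)),
  stringsAsFactors = FALSE
)

# Date of analysis as stated in the 'Details' section.
analysis_date <- as.Date("2015-12-31")

# Use registration_date where it is known, otherwise fall back to production_date.
missing_reg <- is.na(censored$registration_date)
start_date <- censored$registration_date
start_date[missing_reg] <- censored$production_date[missing_reg]

# Subtracting two Date vectors yields a difference in days.
censored$dis <- as.numeric(analysis_date - start_date)

# Assumed helper flag: TRUE where dis is only a rough estimate.
censored$dis_is_rough <- missing_reg

print(censored)
```

Keeping a flag such as `dis_is_rough` makes it easy to restrict a later delay correction to the units whose service time rests on production_date.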

The lifetime characteristic mileage is unknown for all censored units. If an analysis is to be made for this lifetime characteristic, covered distances for these units have to be estimated (see mcs_mileage).

Failed vehicles:

For failed units the service time (dis) was computed as the difference between repair_date and registration_date, which are known for all of them.
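The corresponding computation for failed units can be sketched in the same way. The toy data frame `failed` and its values are hypothetical; only the column names and the rule (repair date minus registration date) come from this page.

```r
# Hypothetical toy data: two failed units with known registration and repair dates.
failed <- data.frame(
  vin = c("F1", "F2"),
  registration_date = as.Date(c("2014-05-10", "2014-09-01")),
  repair_date = as.Date(c("2015-02-14", "2015-10-20")),
  status = c(1, 1),
  stringsAsFactors = FALSE
)

# Days in service: repair date (assumed equal to the failure date)
# minus registration date, expressed in days.
failed$dis <- as.numeric(failed$repair_date - failed$registration_date)

# Both dates are known for all failed units, so no NA is expected here.
stopifnot(!anyNA(failed$dis))

print(failed)
```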

See also