Dataset metadata¶
This page describes the metadata schema for a dataset.
There are two file formats for the metadata: YAML and HDF5.
YAML metadata files should be in a metadata folder.
Each section of this page starts by describing where the relevant
metadata is stored in both file formats.
Dataset¶
This object describes properties of the entire dataset, for example the name of the dataset, the authors and the geographical location of the entire dataset.
- Location in YAML: dataset.yaml
- Location in HDF5: store.root._v_attrs.metadata
Metadata attributes (some of these attributes are adapted from the Dublin Core Metadata Initiative (DCMI)):
- name: (string) (required) Short name for the dataset, e.g. ‘REDD’ or ‘UK-DALE’. Equivalent DCMI element is ‘title’. If this dataset is the output of a disaggregation algorithm then name will be set to a short name for the algorithm, e.g. ‘CO’ or ‘FHMM’.
- long_name: (string) Full name of the dataset, e.g. ‘Reference Energy Disaggregation Data Set’.
- creators: (list of strings) in the format ‘<Lastname>, <Firstname>’. DCMI element.
- timezone: (string) Please use the standard TZ name from the IANA (aka Olson) Time Zone Database, e.g. ‘America/New_York’ or ‘Europe/London’.
- date: (string) ISO 8601 format, e.g. ‘2014-06-23’. Identical to the ‘date’ DCMI element.
- contact: (string) Email address.
- institution: (string)
- description: (string) DCMI element. Human-readable, brief description, e.g. describe sample rate, geo location etc.
- number_of_buildings: (int)
- identifier: (string) A digital object identifier (DOI) or URI for the dataset. DCMI element.
- subject: (string) For example, is this dataset about domestic or commercial buildings? Does it include disaggregated appliance-by-appliance data or just whole-building data? DCMI element. Human-readable free text.
- geospatial_coverage: (string) Spatial coverage, e.g. ‘Southern England’. Related to the ‘coverage’ DCMI element. Human-readable free text.
- timeframe: (TimeFrame, see below) Start and end dates for the entire dataset.
- funding: (list of strings) A list of all the sources of funding used to produce this dataset.
- publisher: (string) The entity responsible for making the resource available. Examples of a Publisher include a person, an organization, or a service. DCMI element.
- geo_location: (dict)
- rights_list:
- description_of_subjects: (string) A brief description of how subjects were recruited. Are they all PhD students, for example? Were they incentivised to reduce their energy consumption? How were they chosen?
- related_documents: (list of strings) References about this dataset (e.g. references to academic papers or web pages). Also briefly describe the contents of each reference (e.g. does it contain a description of the metering setup? Or an analysis of the data?) Related to the ‘relation’ DCMI element.
- schema: (string) The URL of the NILM_metadata version (tag) against which this metadata is validated, e.g. https://github.com/nilmtk/nilm_metadata/tree/v0.2
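As a rough illustration of the attributes above, the sketch below writes a Dataset object as a plain Python dict (mirroring what dataset.yaml might contain) and runs two simple checks: that the required name is present, and that date parses as an ISO 8601 date. The helper check_dataset and the example values are assumptions made for illustration; they are not part of NILM Metadata.

```python
from datetime import date

# Illustrative values only; the keys are the attribute names listed above.
dataset = {
    "name": "DEMO",
    "long_name": "Demonstration Energy Data Set",
    "creators": ["Lastname, Firstname"],
    "timezone": "Europe/London",
    "date": "2014-06-23",
    "number_of_buildings": 2,
    "schema": "https://github.com/nilmtk/nilm_metadata/tree/v0.2",
}


def check_dataset(metadata):
    """Return a list of problems found (hypothetical helper, not a NILM Metadata API)."""
    problems = []
    # 'name' is the only attribute marked as required in the list above.
    if not isinstance(metadata.get("name"), str) or not metadata["name"]:
        problems.append("name is required and must be a non-empty string")
    # 'date' is optional, but if present it should be an ISO 8601 date.
    if "date" in metadata:
        try:
            date.fromisoformat(metadata["date"])
        except (TypeError, ValueError):
            problems.append("date must be an ISO 8601 date string")
    return problems


print(check_dataset(dataset))  # []
```

An empty list means neither check failed. A fuller validator would presumably compare the metadata against the schema version named in the schema attribute.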
MeterDevice¶
Metadata describing every model of meter used in the dataset. Please
note that ElecMeter represents individual instances of meters in a
building, whilst MeterDevice represents information common to all
instances of a specific make and model of meter. Think of this section
as a catalogue of meter models used in the dataset.
- Location in YAML: meter_devices.yaml
- Location in HDF5: store.root._v_attrs.metadata in meter_devices
One big dict. Keys are device model names (e.g. ‘EnviR’). The purpose is to record information about specific models of meter. Values are dicts with these keys:
- model: (string) (required) The model name for this meter device.
- model_url: (string) The URL with more information about this meter model.
- manufacturer: (string)
- manufacturer_url: (string)
- sample_period: (number) (required) The meter’s nominal sample period (i.e. the length of time between consecutive samples) in seconds.
- max_sample_period: (number) (required) The maximum permissible length of time between consecutive samples. We assume the meter is switched off during any gap longer than this.
- measurements: (list) (required) The order is the order of the columns in the data table.
- description: (string)
- pre_pay: (boolean) Is this a pre-pay meter?
- wireless: (boolean)
- wireless_configuration: (dict) All strings are human-readable free text.
- data_logger: (string) Description of the data logger used.
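To illustrate how max_sample_period might be applied, here is a minimal sketch that scans a list of sample timestamps (in seconds) and reports every pair of consecutive samples that are further apart than max_sample_period, i.e. the stretches during which the meter would be assumed to be switched off. The function find_gaps, the numbers in the ‘EnviR’ entry and the timestamps are made up for illustration, and the required measurements list is omitted for brevity.

```python
# Hypothetical meter_devices entry; the numbers are illustrative, not real specs.
meter_devices = {
    "EnviR": {
        "model": "EnviR",
        "sample_period": 6,       # nominal seconds between samples
        "max_sample_period": 20,  # longer gaps are treated as "meter off"
    }
}


def find_gaps(timestamps, max_sample_period):
    """Return (start, end) pairs where consecutive samples are too far apart."""
    gaps = []
    # Pair each timestamp with the one that follows it.
    for previous, current in zip(timestamps, timestamps[1:]):
        if current - previous > max_sample_period:
            gaps.append((previous, current))
    return gaps


device = meter_devices["EnviR"]
timestamps = [0, 6, 12, 60, 66]  # seconds; one long gap between 12 and 60
print(find_gaps(timestamps, device["max_sample_period"]))  # [(12, 60)]
```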
Building¶
- Location in YAML: building<I>.yaml
- Location in HDF5: store.root.building<I>._v_attrs.metadata
- instance: (int) (required) The building instance in this dataset, starting from 1.
- original_name: (string) Original name of building from old (pre-NILM Metadata) metadata.
- elec_meters: (dict of dicts) (required) Each key is an integer (>= 1) representing the meter instance in this building. Each value is an ElecMeter dict (see the section on ElecMeter below).
- appliances: (list of dicts) (required) See section below on Appliance.
- water_meters: (dict of dicts) Same structure as
- gas_meters: (dict of dicts) Same structure as
- description: (string)
- rooms: (list of dicts)
- n_occupants: (int) Mode number of occupants.
- description_of_occupants: (string) Free text describing the occupants. Number of children, teenagers, adults, pensioners? Demographics? Were all occupants away from the house during all week days?
- timeframe: (TimeFrame, see below)
- periods_unoccupied: (list of TimeFrame objects, see below) Periods when this building was empty for more than a day (e.g. holidays).
- construction_year: (int) Four-digit calendar year of construction.
- energy_improvements: (list of strings) Any post-construction modifications? Some combination of {‘photovoltaics’, ‘solar thermal’, ‘cavity wall insulation’, ‘loft insulation’, ‘solid wall insulation’, ‘double glazing’, ‘secondary glazing’, ‘triple glazing’}
- heating: (ordered list of strings, with the most dominant fuel first) Some combination of {‘natural gas’, ‘electricity’, ‘coal’, ‘wood’, ‘biomass’, ‘oil’, ‘LPG’}
- communal_boiler: (boolean) Set to true if heating is provided by a shared boiler for the flats.
- ownership: (string) one of {‘rented’, ‘bought’}
- building_type: (string) one of {‘bungalow’, ‘cottage’, ‘detached’, ‘end of terrace’, ‘flat’, ‘semi-detached’, ‘mid-terrace’, ‘student halls’, ‘factory’, ‘office’, ‘university’}
Building metadata which is inherited from Dataset but can be
overridden by Building:
- geo_location
- timezone
- timeframe
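The inheritance rule above can be sketched as a small lookup: for each of the three inherited keys, use the Building value if one is given, otherwise fall back to the Dataset value. The function resolve_building_metadata and the example values are hypothetical. The sketch also assumes that an override replaces the whole value rather than merging nested dicts, which the text above does not specify.

```python
# The three keys listed above as inherited from Dataset.
INHERITED_KEYS = ("geo_location", "timezone", "timeframe")


def resolve_building_metadata(dataset, building):
    """Copy the building metadata, filling inherited keys from the dataset."""
    resolved = dict(building)
    for key in INHERITED_KEYS:
        # Only fall back to the dataset when the building is silent.
        if key not in resolved and key in dataset:
            resolved[key] = dataset[key]
    return resolved


dataset = {
    "name": "DEMO",
    "timezone": "Europe/London",
    "timeframe": {"start": "2014-01-01", "end": "2014-12-31"},
}
building = {
    "instance": 1,
    "timeframe": {"start": "2014-03-17"},  # overrides the dataset value
}

resolved = resolve_building_metadata(dataset, building)
print(resolved["timezone"])   # Europe/London (inherited)
print(resolved["timeframe"])  # {'start': '2014-03-17'} (overridden)
```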
ElecMeter¶
ElecMeters are the values of the elec_meters dict of each building
(see the section on Building metadata above).
- device_model: (string) (required) model which keys into meter_devices
- submeter_of: (int) (required) The meter instance of the upstream meter. Or set to 0 to mean “one of the site_meters”. In practice, 0 will be interpreted to mean “downstream of a ‘MeterGroup’ representing all the site meters summed together”.
- submeter_of_is_uncertain: (boolean) Set to true if the value for submeter_of is uncertain.
- upstream_meter_in_building: (int) If the upstream meter is in a different building then specify that building instance here. If left blank then we assume the upstream meter is in the same building as this meter.
- site_meter: (boolean) Required and set to True if this is a site meter (i.e. the furthest upstream meter), otherwise not required. If there are multiple mains phases (e.g. 3-phase mains) or multiple mains ‘splits’ (e.g. in North America where there are two 120 volt splits) then set site_meter=true in every site meter. All non-site-meters directly downstream of the site meters should set submeter_of=0. Optionally also use phase to describe which phase this meter measures.
  What happens if there are multiple site meters in parallel (i.e. there are redundant meters)? For example, perhaps there is a site meter installed by the utility company which provides infrequent readings, and there is also a fancy digital site meter which measures at the same point in the wiring tree. In a sense, the utility meter can be considered ‘redundant’ but is included in the dataset for comparison. In this situation, set site_meter=true in every site meter. Then set disabled=true in all but the ‘favoured’ site meter (which would usually be the site meter which provides the ‘best’ readings). It is important to set disabled=true so NILMTK does not sum together parallel site meters. The disabled site meters should also set submeter_of to the ID of the enabled site meter. All non-site-meters directly downstream of site meters should set submeter_of=0.
- utility_meter: (boolean) Required and set to True if this meter was installed by the utility company. Otherwise not required.
- timeframe: (TimeFrame object)
- name: (string) (optional) e.g. ‘first floor total’.
- phase: (int or string) (optional) Used in multiple-phase setups.
- room: (string)
- floor: (int) Not necessary if
- data_location: (string) (required) Path relative to root directory of dataset. e.g.
- disabled: (bool) Set to true if NILMTK should ignore this channel. This is useful if, for example, this channel is a redundant site_meter.
- preprocessing_applied: (dict) Each key is optional and is only present if that preprocessing function has been run.
- statistics: (list of dicts) Each dict describes statistics for one set of timeframes. Each dict has:
  Note that some of these statistics are cached by NILMTK at
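The site_meter, disabled and submeter_of rules above can be sketched as two small queries over an elec_meters dict. In this hypothetical building, meter 1 is a redundant utility meter (disabled, and pointing at the favoured site meter 2), meter 3 hangs directly off the mains and meter 4 is downstream of meter 3. The function names and device model names are invented for illustration, and other required attributes such as data_location are omitted for brevity.

```python
# Hypothetical elec_meters dict for one building; device names are made up.
elec_meters = {
    1: {"device_model": "utility_meter_x", "site_meter": True,
        "utility_meter": True, "disabled": True, "submeter_of": 2},
    2: {"device_model": "digital_meter_y", "site_meter": True},
    3: {"device_model": "plug_meter_z", "submeter_of": 0},
    4: {"device_model": "plug_meter_z", "submeter_of": 3},
}


def enabled_site_meters(meters):
    """Instances of site meters that are not disabled (safe to sum as mains)."""
    return sorted(
        instance for instance, meter in meters.items()
        if meter.get("site_meter") and not meter.get("disabled")
    )


def direct_submeters_of_mains(meters):
    """Instances of non-site meters directly downstream of the site meters."""
    return sorted(
        instance for instance, meter in meters.items()
        if not meter.get("site_meter") and meter.get("submeter_of") == 0
    )


print(enabled_site_meters(elec_meters))        # [2]
print(direct_submeters_of_mains(elec_meters))  # [3]
```

Skipping disabled meters in the first query is what keeps the redundant utility meter from being counted twice when the site meters are summed.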
Appliance¶
Each appliance dict has:
- type: (string) (required) Appliance type (e.g. ‘kettle’). Use NILM Metadata controlled vocabulary. See nilm_metadata/central_metadata/appliance_types/*.yaml. Each
- instance: (int starting from 1) (required) Instance of this appliance within the building.
- meters: (list of ints) (required) Meter instance(s) directly upstream of this appliance. This is a list to handle the case where some appliances draw power from both 120 volt legs in a North American house, or 3-phase appliances.
- dominant_appliance: (boolean) (required if multiple appliances attached to one meter) Is this appliance responsible for most of the power demand on this meter?
- on_power_threshold: (number) watts. Not required. Default is taken from the appliance type. The threshold (in watts) used to decide if the appliance is on or off.
- max_power: (number) watts. Not required.
- min_off_duration: (number) (seconds) Not required.
- min_on_duration: (number) (seconds) Not required.
- room: see ElecMeter-room
- multiple: (boolean) True if there is more than one of these appliances represented by this single
- count: (int) If there is more than one of these appliances represented by this
- control: (list of strings) Give a list of all control methods which apply. For example, a video recorder would be both ‘manual’ and ‘timer’. The vocabulary is: {‘timer’, ‘manual’, ‘motion’, ‘sunlight’, ‘thermostat’, ‘always on’}
- efficiency_rating: (dict)
- nominal_consumption: (dict) Specifications reported by the manufacturer.
- components: (list of dicts) Components within this appliance. Each dict is an Appliance dict.
- model: (string)
- manufacturer: (string)
- brand: (string)
- original_name: (string)
- model_url: (string) URL for this model of appliance.
- manufacturer_url: (string) URL for the manufacturer.
- dates_active: (list of TimeFrame objects, see below) Can be used to specify a change in appliance over time (for example if one appliance is replaced with another).
- year_of_purchase: (int) Four-digit year.
- year_of_manufacture: (int) Four-digit year.
- subtype: (string)
- part_number: (string)
- gtin:
- version: (string)
- portable: (boolean)
Additional properties are specified for some Appliance Types. Please
look up objects in nilm_metadata/central_metadata/appliances/*.yaml
for details.
When an Appliance object is used as a component for an ApplianceType,
then the Appliance object may have a distributions dict (see
ApplianceType:distributions in Central appliance metadata) specified
and may also use a property do_not_merge_categories: true which
prevents the system from merging categories from the component into
the container appliance.
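As an illustration of how the meters and dominant_appliance attributes of an Appliance relate, the sketch below groups a building's appliances by upstream meter and lists the meters that are shared by several appliances where at least one of them lacks a dominant_appliance value. The helper name and the example appliances are hypothetical, the appliance type names other than ‘kettle’ should be checked against the controlled vocabulary, and reading "required if multiple appliances attached to one meter" as "every appliance on a shared meter must set it" is an assumption.

```python
from collections import defaultdict

# Hypothetical appliances list for one building.
appliances = [
    {"type": "kettle", "instance": 1, "meters": [3]},
    {"type": "toaster", "instance": 1, "meters": [4], "dominant_appliance": True},
    {"type": "microwave", "instance": 1, "meters": [4]},
]


def meters_missing_dominant_flag(appliances):
    """Return shared meters where some appliance lacks dominant_appliance."""
    by_meter = defaultdict(list)
    for appliance in appliances:
        # An appliance may be fed by more than one meter (e.g. 3-phase).
        for meter in appliance["meters"]:
            by_meter[meter].append(appliance)
    return sorted(
        meter for meter, attached in by_meter.items()
        if len(attached) > 1
        and any("dominant_appliance" not in a for a in attached)
    )


print(meters_missing_dominant_flag(appliances))  # [4]
```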
TimeFrame¶
Represents an arbitrary time frame. If either start or end is absent then assume it equals the start or the end of the dataset, respectively. Please use ISO 8601 format for dates or date times (e.g. 2014-03-17 or 2014-03-17T21:00:52+00:00).
- start: (string)
- end: (string)
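The fallback rule for a missing start or end can be sketched with the standard library alone. The function resolve_timeframe and the example dates are hypothetical. The sketch assumes the dataset's own timeframe has both start and end, and that all strings are in one of the two forms shown above, which datetime.fromisoformat accepts.

```python
from datetime import datetime


def resolve_timeframe(timeframe, dataset_timeframe):
    """Return (start, end) as datetimes, falling back to the dataset's bounds."""
    # A missing start or end is taken from the dataset-level timeframe.
    start = timeframe.get("start", dataset_timeframe["start"])
    end = timeframe.get("end", dataset_timeframe["end"])
    return datetime.fromisoformat(start), datetime.fromisoformat(end)


dataset_timeframe = {"start": "2014-01-01", "end": "2014-12-31"}
start, end = resolve_timeframe({"start": "2014-03-17"}, dataset_timeframe)
print(start.isoformat())  # 2014-03-17T00:00:00
print(end.isoformat())    # 2014-12-31T00:00:00
```

Note that a date without a UTC offset parses to a naive datetime, while a value such as 2014-03-17T21:00:52+00:00 parses to a timezone-aware one. Python refuses to order a naive datetime against an aware one, so a real implementation would probably localise naive values using the dataset's timezone attribute first.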