BayesCalibJob
class provides a prototype for conducting Bayesian calibration
of EnergyPlus models.
The basic workflow is:
Setting input and output variables using
$input()
and
$output()
,
respectively.
Input variables should be variables listed in the RDD, while output variables
should be variables listed in the RDD or MDD.
Adding parameters to calibrate using
$param()
or
$apply_measure()
.
Check sampled parameter values and generated parametric models using
$samples()
and
$models()
,
respectively.
Run EnergyPlus simulations in parallel using
$eplus_run()
.
Gather simulated data of input and output parameters using
$data_sim()
.
Specify field measured data of input and output parameters using
$data_field()
.
Specify input data for Stan for Bayesian calibration using
$data_bc()
.
Run Bayesian calibration with Stan using
$stan_run()
.
Currently, when using the built-in Bayesian calibration algorithm, only
one prediction output variable is supported. An error will be issued
if multiple output variables are found in the data.
A. Chong and K. Menberg, "Guidelines for the Bayesian calibration of building energy models", Energy and Buildings, vol. 174, pp. 527–547. DOI: 10.1016/j.enbuild.2018.06.028
Hongyuan Jia, Adrian Chong
eplusr::EplusGroupJob
-> eplusr::ParametricJob
-> BayesCalibJob
Inherited methods
new()
Create a BayesCalibJob
object
BayesCalibJob$new(idf, epw)
idf
A path to a local EnergyPlus IDF file or an eplusr::Idf object.
epw
A path to a local EnergyPlus EPW file or an eplusr::Epw object.
During initialization, the objects of classes related to output variable
reporting in the original eplusr::Idf will be deleted, in order to
make sure all input and output variable specifications can be
achieved using Output:Variable
and Output:Meter
. Classes to be
deleted include:
Output:Variable
Output:Meter
Output:Meter:MeterFileOnly
Output:Meter:Cumulative
Output:Meter:Cumulative:MeterFileOnly
Meter:Custom
Meter:CustomDecrement
Output:EnvironmentalImpactFactors
A BayesCalibJob
object.
\dontrun{
if (eplusr::is_avail_eplus(8.8)) {
    idf_name <- "1ZoneUncontrolled.idf"
    epw_name <- "USA_CA_San.Francisco.Intl.AP.724940_TMY3.epw"

    idf_path <- file.path(eplusr::eplus_config(8.8)$dir, "ExampleFiles", idf_name)
    epw_path <- file.path(eplusr::eplus_config(8.8)$dir, "WeatherData", epw_name)

    # create from local files
    BayesCalibJob$new(idf_path, epw_path)

    # create from an Idf and an Epw object
    bc <- BayesCalibJob$new(eplusr::read_idf(idf_path), eplusr::read_epw(epw_path))
}
}
read_rdd()
Read EnergyPlus Report Data Dictionary (RDD) file
BayesCalibJob$read_rdd(update = FALSE)
update
Whether to run the design-day-only simulation and parse
.rdd
and .mdd
files again. Default: FALSE
.
$read_rdd()
silently runs EnergyPlus on the input seed model in
design-day-only mode to create the .rdd
file and returns the
corresponding RddFile object.
The RddFile
object is stored internally and will be directly
returned whenever you call $read_rdd()
again. You can force a
rerun of the design-day-only simulation to update the contents by
setting update
to TRUE
.
$read_rdd()
and
$read_mdd()
are useful when adding input and output parameters using
$input()
and
$output()
,
respectively.
An RddFile object.
\dontrun{
bc$read_rdd()

# force to rerun
bc$read_rdd(update = TRUE)
}
read_mdd()
Read EnergyPlus Meter Data Dictionary (MDD) file
BayesCalibJob$read_mdd(update = FALSE)
update
Whether to run the design-day-only simulation and parse
.rdd
and .mdd
files again. Default: FALSE
.
$read_mdd()
silently runs EnergyPlus on the input seed model in
design-day-only mode to create the .mdd
file and returns the
corresponding MddFile object.
The MddFile
object is stored internally and will be directly
returned whenever you call $read_mdd()
again. You can force a
rerun of the design-day-only simulation to update the contents by
setting update
to TRUE
.
$read_rdd()
and
$read_mdd()
are useful when adding input and output parameters using
$input()
and
$output()
,
respectively.
An MddFile object.
\dontrun{
bc$read_mdd()

# force to rerun
bc$read_mdd(update = TRUE)
}
input()
Set input parameters
BayesCalibJob$input( key_value = NULL, name = NULL, reporting_frequency = NULL, append = FALSE )
key_value
Key value name for variables. If not specified,
"*"
is used for all variables. key_value
can also be an
RddFile
, MddFile
or a data.frame()
. Please see the
description of this method for details.
name
Variable names listed in RDD or MDD.
reporting_frequency
Variable reporting frequency for all
variables. If NULL
, "Timestep"
is used for all
variables. All possible values: "Detailed"
, "Timestep"
,
"Hourly"
, "Daily"
, "Monthly"
, "RunPeriod"
,
"Environment"
, and "Annual"
. Default: NULL
.
append
Whether to append input variables at the end of
existing ones. A special value NULL
can be given to remove
all existing parameters. Default: FALSE
.
$input()
takes input parameter definitions in a pattern similar to how
you set output variables in the Output:Variable
and Output:Meter
classes, and returns a data.table::data.table()
containing the
information of input parameters. Only variables in the
RDD are
allowed. The returned data.table::data.table()
has 5 columns:
index
: Indices of input or output parameters.
class
: The class that parameters belong to. Will be either
Output:Variable
or Output:Meter
.
key_value
: Key value name for variables.
variable_name
: Variable names listed in RDD or MDD.
reporting_frequency
: Variable reporting frequency.
If called without any arguments, the existing input parameters are
directly returned, e.g. bc$input()
.
You can remove all existing input parameters by setting append
to
NULL
, e.g. bc$input(append = NULL)
.
key_value
accepts 3 different formats:
A character vector.
An RddFile object. It can be retrieved using
$read_rdd()
.
In this case, name
argument will be ignored, as its values are
directly taken from variable names in input
RddFile object. For example:
bc$input(bc$read_rdd()[1:5])
A data.frame()
with valid format for adding Output:Variable
and
Output:Meter
objects using eplusr::Idf$load(). In
this case, name
argument will be ignored. For example:
bc$input(eplusr::rdd_to_load(bc$read_rdd()[1:5]))
\dontrun{
# explicitly specify input variable name
bc$input(name = "fan air mass flow rate", reporting_frequency = "hourly")

# use an RddFile
bc$input(bc$read_rdd()[1:5])

# use a data.frame
bc$input(eplusr::rdd_to_load(bc$read_rdd()[1:5]))

# get existing input
bc$input()
}
output()
Set output parameters
BayesCalibJob$output( key_value = NULL, name = NULL, reporting_frequency = NULL, append = FALSE )
key_value
Key value name for variables. If not specified,
"*"
is used for all variables. key_value
can also be an
RddFile
, MddFile
or a data.frame()
. Please see the
description of this method for details.
name
Variable names listed in RDD or MDD.
reporting_frequency
Variable reporting frequency for all
variables. If NULL
, "Timestep"
is used for all
variables. All possible values: "Detailed"
, "Timestep"
,
"Hourly"
, "Daily"
, "Monthly"
, "RunPeriod"
,
"Environment"
, and "Annual"
. Default: NULL
.
append
Whether to append output variables at the end of
existing ones. A special value NULL
can be given to remove
all existing parameters. Default: FALSE
.
$output()
takes output parameter definitions in a
pattern similar to how you set output variables in the Output:Variable
and
Output:Meter
classes, and returns a data.table::data.table()
containing the information of output parameters. Unlike
$input()
, variables in both the RDD and the
MDD are allowed. The returned data.table has 5
columns:
index
: Indices of input or output parameters.
class
: The class that parameters belong to. Will be either
Output:Variable
or Output:Meter
.
key_value
: Key value name for variables.
variable_name
: Variable names listed in RDD or MDD.
reporting_frequency
: Variable reporting frequency.
If called without any arguments, the existing output parameters are
directly returned, e.g. bc$output()
.
You can remove all existing output parameters by setting append
to NULL
,
e.g. bc$output(append = NULL)
.
key_value
accepts 3 different formats:
A character vector.
An RddFile object or an
MddFile object. They can be retrieved using
$read_rdd()
and
$read_mdd()
,
respectively. In this case, name
argument will be ignored, as
its values are directly taken from variable names in input
RddFile object or
MddFile object. For example:
bc$output(bc$read_mdd()[1:5])
A data.frame()
with valid format for adding Output:Variable
and
Output:Meter
objects using Idf$load(). In this
case, name
argument will be ignored. For example:
bc$output(eplusr::mdd_to_load(bc$read_mdd()[1:5]))
\dontrun{
# explicitly specify output variable name
bc$output(name = "fan electric power", reporting_frequency = "hourly")

# use an RddFile or MddFile
bc$output(bc$read_rdd()[6:10])
bc$output(bc$read_mdd()[6:10])

# use a data.frame
bc$output(eplusr::mdd_to_load(bc$read_mdd()[6:10]))

# get existing output
bc$output()
}
param()
Set parameters for Bayesian calibration
BayesCalibJob$param(..., .names = NULL, .num_sim = 30L)
...
Lists of parameter definitions. Please see the method description for the syntax.
.names
A character vector of the parameter names. If NULL
,
the parameters will be named in the format t + number
, where
number
is the index of parameter. Default: NULL
.
.num_sim
A positive integer specifying the number of
simulations to run for each combination of calibration
parameter values. Default: 30L
.
$param()
takes parameter definitions in list format, which is
similar to $set()
in eplusr::Idf class except that each field is
not assigned with a single value, but a numeric vector of length 2,
indicating the minimum and maximum value of each
parameter.
Similar to the way object field values are modified in eplusr::Idf$set(), there are 3 different ways of defining a parameter in epluspar:
object = list(field = c(min, max))
: Where object
is a
valid object ID or name. Note that an object ID should be denoted with two
periods ..
, e.g. ..10
indicates the object with ID 10
. It
will set that specific field in that object as one parameter.
.(object, object) := list(field = c(min, max))
: Similar to
above, but note the use of .()
on the left-hand side. You can put
multiple object IDs or names in .()
. It will set the field of all
specified objects as one parameter.
class := list(field = c(min, max))
: Note the use of :=
instead of =
. The main difference is that, unlike =
, the left
hand side of :=
should be a valid class name in the current
eplusr::Idf. It will set that field of all objects in specified
class as one parameter.
For example, the code block below defines 4 calibration parameters:
Field Fan Total Efficiency
in object named Supply Fan 1
in
class Fan:VariableVolume
, with minimum and maximum being
0.1 and 1.0, respectively.
Field Thickness
in all objects in class Material
, with minimum
and maximum being 0.01 and 1.0, respectively.
Field Conductivity
in all objects in class Material
, with
minimum and maximum being 0.1 and 0.6, respectively.
Field Watts per Zone Floor Area
in objects Light1
and Light2
in class Lights
, with minimum and maximum being 10 and 30,
respectively.
bc$param(
    `Supply Fan 1` = list(Fan_Total_Efficiency = c(min = 0.1, max = 1.0)),
    Material := list(Thickness = c(0.01, 1), Conductivity = c(0.1, 0.6)),
    .("Light1", "Light2") := list(Watts_per_Zone_Floor_Area = c(10, 30))
)
All models created using $param()
will be named in the same
pattern, i.e. Case_ParameterName(ParameterValue)...
. Note that only
parameter names will be abbreviated using abbreviate()
with
minlength
being 5L
and use.classes
being TRUE
. If samples
contain duplications, make.unique()
will be called to make sure
every model has a unique name.
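The naming behaviour described above can be sketched in base R. This is only an illustration, not the package's internal code: the parameter names, the sample values, the helper `build_name()` and the `_` separator between `Name(Value)` segments are all assumptions made for the example. Only `abbreviate()` and `make.unique()` and their settings come from the text.

```r
# Hypothetical calibration parameter names (not taken from a real model)
param_names <- c("efficiency", "thickness", "conductivity", "lpd")

# Abbreviate the names with the settings quoted in the text
abbr <- abbreviate(param_names, minlength = 5L, use.classes = TRUE)

# Two hypothetical samples that happen to share exactly the same values
samples <- data.frame(
    efficiency = c(0.5, 0.5),
    thickness = c(0.2, 0.2),
    conductivity = c(0.3, 0.3),
    lpd = c(15, 15)
)

# Assumed layout: "Case_" followed by "Name(Value)" segments joined by "_"
build_name <- function(values) {
    paste0("Case_", paste0(abbr, "(", values, ")", collapse = "_"))
}
case_names <- unname(apply(samples, 1, build_name))

# Duplicated names receive a numeric suffix so every model name is unique
unique_names <- make.unique(case_names)
print(unique_names)
```

Because both rows are identical, the second name only differs from the first by the suffix that `make.unique()` appends.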
The modified BayesCalibJob
object itself.
\dontrun{
bc$param(
    `Supply Fan 1` = list(Fan_Total_Efficiency = c(min = 0.1, max = 1.0)),
    Material := list(Thickness = c(0.01, 1), Conductivity = c(0.1, 0.6)),
    .("Light1", "Light2") := list(Watts_per_Zone_Floor_Area = c(10, 30))
)
}
apply_measure()
Set parameters for Bayesian calibration using function
BayesCalibJob$apply_measure(measure, ..., .num_sim = 30L)
measure
A function that takes an eplusr::Idf and other arguments as input and returns an eplusr::Idf object as output.
...
Arguments other than the first Idf
argument that are passed
to measure
.
.num_sim
A positive integer specifying the number of
simulations to run, taking into account all parameter
combinations. Default: 30L
.
$apply_measure()
works in a similar way to $apply_measure()
in the
eplusr::ParametricJob class, with the only exception being that each argument
supplied in ...
should be a numeric vector of length 2, indicating
the minimum value and maximum value of each parameter.
Basically $apply_measure()
allows you to apply a measure to an
eplusr::Idf. A measure here is just a function that takes an
eplusr::Idf object and other arguments as input, and returns a
modified eplusr::Idf object as output.
The names of the function parameters will be used as the names of the
calibration parameters. For example, the equivalent version of
specifying parameters described in
$param()
using $apply_measure()
can be:
# set calibration parameters using $apply_measure()
# (a) first define a "measure"
measure <- function (idf, efficiency, thickness, conductivity, lpd) {
    idf$set(
        `Supply Fan 1` = list(Fan_Total_Efficiency = efficiency),
        Material := list(Thickness = thickness, Conductivity = conductivity),
        .("Light1", "Light2") := list(Watts_per_Zone_Floor_Area = lpd)
    )

    idf
}

# (b) then apply that measure with parameter space definitions as
# function arguments
bc$apply_measure(measure,
    efficiency = c(min = 0.1, max = 1.0),
    thickness = c(0.01, 1),
    conductivity = c(0.1, 0.6),
    lpd = c(10, 30)
)
All models created using $apply_measure()
will be named in the same
pattern, i.e. Case_ParameterName(ParameterValue)...
. Note that only
parameter names will be abbreviated using abbreviate()
with
minlength
being 5L
and use.classes
being TRUE
. If samples
contain duplications, make.unique()
will be called to make sure
every model has a unique name.
The modified BayesCalibJob
object itself.
\dontrun{
# set calibration parameters using $apply_measure()
# (a) first define a "measure"
measure <- function (idf, efficiency, thickness, conductivity, lpd) {
    idf$set(
        `Supply Fan 1` = list(Fan_Total_Efficiency = efficiency),
        Material := list(Thickness = thickness, Conductivity = conductivity),
        .("Light1", "Light2") := list(Watts_per_Zone_Floor_Area = lpd)
    )

    idf
}

# (b) then apply that measure with parameter space definitions as
# function arguments
bc$apply_measure(measure,
    efficiency = c(min = 0.1, max = 1.0),
    thickness = c(0.01, 1),
    conductivity = c(0.1, 0.6),
    lpd = c(10, 30)
)
}
samples()
Get sampled parameter values
BayesCalibJob$samples()
$samples()
returns a data.table::data.table()
which contains the
sampled values of each parameter, generated using the Random Latin Hypercube Sampling method. The returned
data.table::data.table()
has 1 + n
columns, where n
is the
number of parameters, and 1
indicates an extra column named case
giving the index of each sample.
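To make the sampling idea concrete, here is a minimal base-R sketch of random Latin hypercube sampling scaled to `c(min, max)` ranges. It is a stand-in written for illustration and is not the sampler the package itself calls; `random_lhs()`, the parameter names and the ranges are hypothetical, and a plain `data.frame` is used in place of a `data.table`.

```r
set.seed(42)

# Random Latin hypercube on the unit cube: every column places exactly one
# point in each of the n equal-width strata, with the strata in random order
random_lhs <- function(n, k) {
    vapply(
        seq_len(k),
        function(j) (sample.int(n) - stats::runif(n)) / n,
        numeric(n)
    )
}

# Hypothetical parameter ranges given as c(min, max)
ranges <- list(
    efficiency = c(0.1, 1.0),
    thickness = c(0.01, 1),
    conductivity = c(0.1, 0.6),
    lpd = c(10, 30)
)

n <- 30L
u <- random_lhs(n, length(ranges))

# One "case" index column plus one column per parameter (1 + n_param columns)
samples <- data.frame(case = seq_len(n))
for (j in seq_along(ranges)) {
    r <- ranges[[j]]
    # rescale the unit-interval sample to the [min, max] range
    samples[[names(ranges)[j]]] <- r[1] + u[, j] * (r[2] - r[1])
}
str(samples)
```

Each parameter column covers its whole range evenly, because every one of the 30 strata contributes exactly one value.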
Note that if $samples()
is called before input and output
parameters have been set using
$input()
and
$output()
,
only the sampling will be performed and no parametric models will be
created. This is because information about input and output parameters
is needed to make sure that the corresponding variables will be
reported during simulations. In this case, you can use
$models()
to create those models once the input and output parameters are set.
\dontrun{ bc$samples() }
models()
Get parametric models
BayesCalibJob$models()
$models()
returns a list of parametric eplusr::Idf objects
created using calibration parameter values generated using Random
Latin Hypercube Sampling. As stated above, parametric models can only
be created after input, output and calibration parameters have all been
set using
$input()
,
$output()
and
$param()
(or
$apply_measure()
), respectively.
All models will be named in the same pattern, i.e.
Case_ParameterName(ParameterValue)...
. Note that parameter names will
be abbreviated using abbreviate()
with minlength
being 5L
and
use.classes
being TRUE
.
A named list of eplusr::Idf objects.
\dontrun{ bc$models() }
data_sim()
Collect simulation data
BayesCalibJob$data_sim(resolution = NULL, exclude_ddy = TRUE, all = FALSE)
resolution
A character string specifying a time unit or a
multiple of a unit to change the time resolution of returned
simulation data. Valid base units are min
, hour
, day
,
week
, month
, and year
. Example: 10 mins
, 2 hours
,
1 day
. If NULL
, the variable reporting frequency is used.
Default: NULL
.
exclude_ddy
Whether to exclude design day data. Default:
TRUE
.
all
If TRUE
, extra columns are also included in the returned
data.table::data.table()
describing the simulation case and
datetime components. Default: FALSE
.
$data_sim()
returns a list of 2 data.table::data.table()
s, which
contain the simulated data of input and output parameters. These
data will be stored internally and used during Bayesian calibration
using Stan.
The resolution
parameter can be used to specify the time resolution
of returned data. Note that the specified time resolution cannot be finer
than the reporting frequency; otherwise an error will be issued.
Parameters are named in the same way as in the standard EnergyPlus CSV
output file, i.e. KeyValue:VariableName [Unit](Frequency)
.
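As an illustration of this naming pattern, the sketch below builds column headers from the pieces that make up a variable specification. The rows in `dict` are made-up values, and composing the header with `sprintf()` is an assumption about how such names could be assembled, not a description of the package internals.

```r
# Hypothetical variable specifications (values made up for illustration)
dict <- data.frame(
    key_value = c("SUPPLY FAN 1", "SUPPLY FAN 1"),
    name = c("Fan Air Mass Flow Rate", "Fan Electric Power"),
    units = c("kg/s", "W"),
    reporting_frequency = c("Hourly", "Hourly"),
    stringsAsFactors = FALSE
)

# Build headers following "KeyValue:VariableName [Unit](Frequency)"
col_names <- sprintf(
    "%s:%s [%s](%s)",
    dict$key_value, dict$name, dict$units, dict$reporting_frequency
)
print(col_names)
```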
By default, $data_sim()
returns minimal columns, i.e. the
Date/Time
column together with all input and output parameters.
You can retrieve extra columns by setting all
to TRUE
. Those
columns include:
case
: Integer type. Indices of parametric simulations.
environment_period_index
: Integer type. The index of the environment.
environment_name
: Character type. A text string identifying the
simulation environment.
simulation_days
: Integer type. Day of simulation.
datetime
: DateTime type. The date time of simulation result. Note
that the year values are automatically calculated to meet the
start day of week restriction for each simulation environment.
month
: Integer type. The month of reported date time.
day
: Integer type. The day of month of reported date time.
hour
: Integer type. The hour of reported date time.
minute
: Integer type. The minute of reported date time.
day_type
: Character type. The type of day, e.g. Monday
,
Tuesday
, etc. Note that day_type
will always be NA
if
resolution
is specified.
A list of 2 data.table::data.table()
.
\dontrun{ bc$data_sim() }
data_field()
Specify field measured data
BayesCalibJob$data_field(output, new_input = NULL, all = FALSE)
output
A data.frame()
containing measured values of output
parameters.
new_input
A data.frame()
containing newly measured values of
input parameters used for prediction. If NULL
, values of the
first case in
$data_sim()
will be used.
all
If TRUE
, extra columns are also included in the returned
data.table::data.table()
describing the simulation case and
datetime components. For details, please see
$data_sim()
.
Default: FALSE
.
$data_field()
takes a data.frame()
of measured values of output
parameters and returns a list of data.table::data.table()
s which
contain the measured values of input and output parameters, and newly
measured values of input parameters if applicable.
The specified output
data.frame()
is validated using the criteria
below:
The number of columns should be the same as the number of outputs
specified in
$output()
.
The number of rows should be the same as the number of simulated values
for each case extracted using
$data_sim()
.
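The two criteria can be sketched as a small stand-alone check. The function `check_field_output()` and its arguments are hypothetical names introduced for this example; the real validation happens inside `$data_field()` and may differ in detail and in its error messages.

```r
# Hypothetical helper mirroring the two documented criteria
check_field_output <- function(output, n_output, n_row_per_case) {
    stopifnot(is.data.frame(output))
    # criterion 1: one column per output variable
    if (ncol(output) != n_output) {
        stop("Expected ", n_output, " output column(s) but got ", ncol(output), ".")
    }
    # criterion 2: as many rows as simulated values per case
    if (nrow(output) != n_row_per_case) {
        stop("Expected ", n_row_per_case, " row(s) but got ", nrow(output), ".")
    }
    invisible(output)
}

# Made-up measured data: one output variable with three observations
field <- data.frame(power = c(1.2, 1.5, 1.1))

# passes: 1 output variable, 3 simulated values per case
check_field_output(field, n_output = 1L, n_row_per_case = 3L)

# fails: the simulation is assumed to have 24 values per case
bad <- tryCatch(
    check_field_output(field, n_output = 1L, n_row_per_case = 24L),
    error = function(e) conditionMessage(e)
)
print(bad)
```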
For input parameters, the values of simulation data for the first case are directly used as the measured values.
Parameter new_input
can be used to give a data.frame()
of newly
measured values of input parameters. The number of columns of the input
data.frame()
should be the same as the number of input parameters
specified in
$input()
.
If not specified, the measured values of
input parameters will be used for predictions.
All the data will be stored internally and used during Bayesian calibration using Stan.
Note that $data_field()
relies on the output of
$data_sim()
to
perform validation on the specified data. Therefore, $data_field()
cannot be
called before
$data_sim()
, and internally stored data will be
removed whenever
$data_sim()
is called. This aims to make sure that
simulated data and field data can be matched whenever the calibration
is performed.
A list of 3 elements:
input
: a data.table::data.table()
which is basically the input
variable values of the first case in
$data_sim()
.
output
: a data.table::data.table()
of output variable values.
new_input
: NULL
or a data.table::data.table()
of newly
measured input variable values.
For details on the meaning of each column, see
$data_sim()
.
data_bc()
Combine simulation data and field measured data
BayesCalibJob$data_bc(data_field = NULL, data_sim = NULL)
data_field
A data.frame()
specifying field measured data.
Should have the same structure as the output from
$data_field()
.
If NULL
, the output from
$data_field()
will be used. Default: NULL
.
data_sim
A data.frame()
specifying simulated data.
Should have the same structure as the output from
$data_sim()
.
If NULL
, the output from
$data_sim()
will be used. Default: NULL
.
$data_bc()
takes a list of field data and simulated data, and
returns a list that contains data input for Bayesian calibration
using the Stan model from Chong (2018):
n
: Number of measured parameter observations.
n_pred
: Number of new design points for predictions.
m
: Number of simulated observations.
p
: Number of input parameters.
q
: Number of calibration parameters.
yf
: Data of measured output after z-score standardization using data of
simulated output.
yc
: Data of simulated output after z-score standardization.
xf
: Data of measured input after min-max normalization.
xc
: Data of simulated input after min-max normalization.
x_pred
: Data of new design points for predictions after min-max
normalization.
tc
: Data of calibration parameters after min-max normalization.
Input data_field
and data_sim
should have the same structure as the
output from $data_field()
and $data_sim()
. If data_field
and
data_sim
are not specified, the output from $data_field()
and
$data_sim()
will be used.
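The two transformations named in the list above can be sketched in base R. The data are randomly generated and the helpers `zscore()` and `minmax()` are hypothetical. Standardizing measured output with the mean and standard deviation of the simulated output follows the description of `yf`; scaling measured and simulated inputs by their combined range is an assumption of this sketch, since the text does not say which range is used.

```r
set.seed(1)

# Made-up data: simulated (c) and measured (f) output and input
yc <- rnorm(50, mean = 20, sd = 5)
yf <- rnorm(10, mean = 21, sd = 5)
xc <- runif(50, min = 0, max = 40)
xf <- runif(10, min = 5, max = 35)

# z-score standardization relative to a reference vector
zscore <- function(x, ref = x) (x - mean(ref)) / stats::sd(ref)

# min-max normalization relative to a reference vector
minmax <- function(x, ref = x) (x - min(ref)) / (max(ref) - min(ref))

yc_std <- zscore(yc)            # simulated output, standardized by itself
yf_std <- zscore(yf, ref = yc)  # measured output, using simulated mean and sd

# Assumption: inputs are scaled by the combined range of both data sets
x_all <- c(xf, xc)
xf_norm <- minmax(xf, ref = x_all)
xc_norm <- minmax(xc, ref = x_all)

summary(yc_std)
range(c(xf_norm, xc_norm))
```

Using the simulated output as the reference for `yf` keeps measured and simulated values on the same scale.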
A list of 11 elements.
\dontrun{ bc$data_bc() }
eplus_run()
Run parametric simulations
BayesCalibJob$eplus_run( dir = NULL, run_period = NULL, wait = TRUE, force = FALSE, copy_external = FALSE, echo = wait )
dir
The parent output directory for specified simulations.
Outputs of each simulation are placed in a separate folder
under the parent directory. If NULL
, directory of seed
model will be used. Default: NULL
.
run_period
A list giving a new RunPeriod
object definition.
If not NULL
, only this new RunPeriod will take effect with
all existing RunPeriod objects in the seed model being
commented out. If NULL
, existing run period in the seed
model will be used. Default: NULL
.
wait
If TRUE
, R will wait until all EnergyPlus simulations
finish. If FALSE
, all EnergyPlus simulations are run in the
background. Default: TRUE
.
force
Only applicable when the last simulation was run with
wait
equal to FALSE
and is still running. If TRUE
,
the currently running job is forced to stop and a new one will
start. Default: FALSE
.
copy_external
If TRUE
, the external files that every Idf
object depends on will also be copied into the simulation
output directory. The values of file paths in the Idf will be
changed automatically. Currently, only Schedule:File
class
is supported. This ensures that the output directory will
have all files needed for the model to run. Default is
FALSE
.
echo
Only applicable when wait
is TRUE
. Whether to print
simulation status. Default: same as the value of wait
.
$eplus_run()
runs all parametric models in parallel. Parameter
run_period
can be given to insert a new RunPeriod
object. In this
case, all existing RunPeriod
objects in the seed model will be
commented out.
Note that when run_period
is given, value of field Run Simulation for Weather File Run Periods
in SimulationControl
class will be
reset to Yes
to make sure input run period can take effect.
The modified BayesCalibJob
object itself.
\dontrun{
# specify output directory and run period
bc$eplus_run(dir = tempdir(), run_period = list("example", 1, 1, 1, 31))

# run in the background
bc$eplus_run(wait = FALSE)
# see job status
bc$eplus_status()

# force to kill background job before running the new one
bc$eplus_run(force = TRUE)

# do not show anything in the console
bc$eplus_run(echo = FALSE)

# copy external files used in the model to simulation output directory
bc$eplus_run(copy_external = TRUE)
}
eplus_kill()
Kill current running EnergyPlus simulations
BayesCalibJob$eplus_kill()
$eplus_kill()
kills all background EnergyPlus processes that are
currently running, if possible. It only works when simulations run in
non-waiting mode.
A single logical value of TRUE
or FALSE
, invisibly.
\dontrun{ bc$eplus_kill() }
eplus_status()
Get the EnergyPlus simulation status
BayesCalibJob$eplus_status()
$eplus_status()
returns a named list of values indicating the status
of the job:
run_before
: TRUE
if the job has been run before. FALSE
otherwise.
alive
: TRUE
if the job is still running in the background. FALSE
otherwise.
terminated
: TRUE
if the job was terminated during last
simulation. FALSE
otherwise. NA
if the job has not been run yet.
successful
: TRUE
if all simulations ended successfully. FALSE
if
any simulation failed. NA
if the job has not been run yet.
changed_after
: TRUE
if the seed model has been modified since last
simulation. FALSE
otherwise.
job_status
: A data.table::data.table()
containing metadata
for each simulation job. For details, please see run_multi()
. If the
job has not been run before, a data.table::data.table()
with 4 columns is returned:
index
: The index of simulation
status
: The status of simulation. As the simulation has not been run,
status
will always be "idle".
idf
: The path of input IDF file.
epw
: The path of input EPW file. If not provided, NA
will be
assigned.
A named list of 6 elements.
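One way to use these flags is to decide whether results are ready to be collected. The sketch below works on mock lists that only mimic the documented element names; `is_ready_for_data()` is a hypothetical helper and not part of the package.

```r
# Hypothetical helper: results are usable once the job has been run,
# is no longer alive in the background, and all simulations succeeded
is_ready_for_data <- function(status) {
    isTRUE(status$run_before) &&
        !isTRUE(status$alive) &&
        isTRUE(status$successful)
}

# Mock status lists using the documented element names
mock_running <- list(
    run_before = TRUE, alive = TRUE, terminated = NA,
    successful = NA, changed_after = FALSE
)
mock_done <- list(
    run_before = TRUE, alive = FALSE, terminated = FALSE,
    successful = TRUE, changed_after = FALSE
)

is_ready_for_data(mock_running)
is_ready_for_data(mock_done)
```

`isTRUE()` is used so that `NA` values, which the text says can appear in some of these elements, are treated as "not ready" rather than causing an error.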
\dontrun{ bc$eplus_status() }
eplus_output_dir()
Get EnergyPlus simulation output directory
BayesCalibJob$eplus_output_dir(which = NULL)
which
An integer vector of the indexes or a character vector
of names of parametric simulations. If NULL
, results of all
parametric simulations are returned. Default: NULL
.
$eplus_output_dir()
returns the output directory of EnergyPlus
simulation results.
A character vector.
\dontrun{
# get output directories of all simulations
bc$eplus_output_dir()

# get output directories of specified simulations
bc$eplus_output_dir(c(1, 4))
}
eplus_locate_output()
Get paths of EnergyPlus output file
BayesCalibJob$eplus_locate_output(which = NULL, suffix = ".err", strict = TRUE)
which
An integer vector of the indexes or a character vector
of names of parametric simulations. If NULL
, results of all
parametric simulations are returned. Default: NULL
.
suffix
A string that indicates the file extension of
simulation output. Default: ".err"
.
strict
If TRUE
, it will check if the simulation was
terminated, is still running or the file exists or not.
Default: TRUE
.
$eplus_locate_output()
returns the path of a single output file of
specified simulations.
A character vector.
\dontrun{
# get the file path of the error file
bc$eplus_locate_output(c(1, 4), ".err", strict = FALSE)

# can use to detect if certain output file exists
bc$eplus_locate_output(c(1, 4), ".expidf", strict = TRUE)
}
eplus_errors()
Read EnergyPlus simulation errors
BayesCalibJob$eplus_errors(which = NULL, info = FALSE)
which
An integer vector of the indexes or a character vector
of names of parametric simulations. If NULL
, results of all
parametric simulations are returned. Default: NULL
.
info
If FALSE
, only warnings and errors are printed.
Default: FALSE
.
$eplus_errors() returns a list of ErrFile
objects which contain all contents of the simulation error files
(.err
). If info
is FALSE
, only warnings and errors are printed.
A list of ErrFile objects.
\dontrun{
bc$eplus_errors()

# show all information
bc$eplus_errors(info = TRUE)
}
eplus_report_data_dict()
Read report data dictionary from EnergyPlus SQL outputs
BayesCalibJob$eplus_report_data_dict(which = NULL)
which
An integer vector of the indexes or a character vector
of names of parametric simulations. If NULL
, results of all
parametric simulations are returned. Default: NULL
.
$eplus_report_data_dict()
returns a data.table::data.table()
which contains all information about report data.
For details on the meaning of each column, please see "2.20.2.1 ReportDataDictionary Table" in EnergyPlus "Output Details and Examples" documentation.
A data.table::data.table()
of 10 columns:
case
: The model name. This column can be used to distinguish
output from different simulations
report_data_dictionary_index
: The integer used to link the
dictionary data to the variable data. Mainly useful when joining
different tables
is_meter
: Whether the report data is meter data. Possible values:
0
and 1
timestep_type
: Type of data timestep. Possible values: Zone
and
HVAC System
key_value
: Key name of the data
name
: Actual report data name
reporting_frequency
: The reporting frequency of the data
schedule_name
: Name of the schedule that controls reporting
frequency.
units
: The data units
\dontrun{ bc$eplus_report_data_dict(c(1, 4)) }
eplus_report_data()
Read EnergyPlus report data
BayesCalibJob$eplus_report_data( which = NULL, key_value = NULL, name = NULL, year = NULL, tz = "UTC", all = FALSE, wide = FALSE, period = NULL, month = NULL, day = NULL, hour = NULL, minute = NULL, interval = NULL, simulation_days = NULL, day_type = NULL, environment_name = NULL )
which
An integer vector of the indexes or a character vector
of names of parametric simulations. If NULL
, results of all
parametric simulations are returned. Default: NULL
.
key_value
A character vector to identify key values of the
data. If NULL
, all keys of that variable will be returned.
key_value
can also be data.frame that contains key_value
and name
columns. In this case, name
argument in
$eplus_report_data()
is ignored. All available key_value
for
current simulation output can be obtained using
$eplus_report_data_dict()
.
Default: NULL
.
name
A character vector to identify names of the data. If
NULL
, all names of that variable will be returned. If
key_value
is a data.frame, name
is ignored. All available
name
for current simulation output can be obtained using
$eplus_report_data_dict()
.
Default: NULL
.
year
Year of the date time in column datetime
. If NULL
, it
will calculate a year value that meets the start day of week
restriction for each environment. Default: NULL
.
tz
Time zone of date time in column datetime
. Default:
"UTC"
.
all
If TRUE
, extra columns are also included in the returned
data.table::data.table()
.
wide
If TRUE
, the output is formatted in the same way as
standard EnergyPlus csv output file.
period
A Date or POSIXt vector used to specify which time
period to return. The year value does not matter and only
month, day, hour and minute value will be used when
subsetting. If NULL
, data for all time periods are returned.
Default: NULL
.
month, day, hour, minute
Each is an integer vector for month,
day, hour, minute subsetting of datetime
column when
querying on the SQL database. If NULL
, no subsetting is
performed on those components. All possible month
, day
,
hour
and minute
can be obtained using
$eplus_report_data_dict()
.
Default: NULL
.
interval
An integer vector used to specify which interval
length of report to extract. If NULL
, all intervals will be
used. Default: NULL
.
simulation_days
An integer vector to specify which simulation
day data to extract. Note that this number resets after warmup
and at the beginning of an environment period. All possible
simulation_days
can be obtained using
$eplus_report_data_dict()
.
If NULL
, all simulation days will be used. Default: NULL
.
day_type
A character vector to specify which day type of data
to extract. All possible day types are: Sunday
, Monday
,
Tuesday
, Wednesday
, Thursday
, Friday
, Saturday
,
Holiday
, SummerDesignDay
, WinterDesignDay
, CustomDay1
,
and CustomDay2
. All possible values for current simulation
output can be obtained using
$eplus_report_data_dict()
.
environment_name
A character vector to specify which
environment data to extract. If NULL
, all environment data
are returned. Default: NULL
. All possible
environment_name
for current simulation output can be
obtained using:
$read_table(NULL, "EnvironmentPeriods")
case
If not NULL
, a character column indicating the case of this
simulation will be added. If "auto"
, the name of the IDF
file without extension is used.
$eplus_report_data()
extracts the report data in a
data.table::data.table()
using key values, variable names and other
specifications.
$eplus_report_data()
can also directly take all or subset output from
$eplus_report_data_dict()
as input, and extract all data specified.
The number of returned columns depends on the all
argument.
When all
is FALSE
, the returned data.table::data.table()
has 6 columns:
case
: The model name. This column can be used to distinguish
output from different simulations
datetime
: The date time of simulation result
key_value
: Key name of the data
name
: Actual report data name
units
: The data units
value
: The data value
When all
is TRUE
, in addition to the columns described above, extra columns are also
included:
month
: The month of reported date time
day
: The day of month of reported date time
hour
: The hour of reported date time
minute
: The minute of reported date time
dst
: Daylight saving time indicator. Possible values: 0
and 1
interval
: Length of reporting interval
simulation_days
: Day of simulation
day_type
: The type of day, e.g. Monday
, Tuesday
, etc.
environment_period_index
: The index of the environment.
environment_name
: A text string identifying the environment.
is_meter
: Whether report data is a meter data. Possible values: 0
and
1
type
: Nature of data type with respect to state. Possible values: Sum
and Avg
index_group
: The report group, e.g. Zone
, System
timestep_type
: Type of data timestep. Possible values: Zone
and HVAC System
reporting_frequency
: The reporting frequency of the variable, e.g.
HVAC System Timestep
, Zone Timestep
.
schedule_name
: Name of the schedule that controls reporting
frequency.
With the datetime
column, it is quite straightforward to apply time-series
analysis on the simulation output. However, one complication is that
every simulation run period has its own Day of Week for Start Day
. Randomly
setting the year
may result in a date time series that does not have
the same start day of week as specified in the RunPeriod objects.
eplusr provides a simple solution for this. By setting year
to NULL
,
which is the default behavior, eplusr will calculate a year value (from
current year backwards) for each run period that complies with the start
day of week restriction.
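The backward year search described above can be sketched in base R as follows. This is only an illustration under stated assumptions: `find_year()` is a hypothetical helper, not eplusr's actual implementation, and weekdays are encoded as 1 (Monday) to 7 (Sunday).

```r
# Hypothetical helper (not part of eplusr): search backwards from `from`
# for the first year in which the given month/day falls on `weekday`
# (1 = Monday, ..., 7 = Sunday, as returned by format(x, "%u")).
find_year <- function(month, day, weekday,
                      from = as.integer(format(Sys.Date(), "%Y"))) {
    # between 1901 and 2099 the weekday pattern repeats every 28 years,
    # so 28 candidate years are enough
    for (year in seq(from, by = -1L, length.out = 28L)) {
        date <- as.Date(sprintf("%04d-%02d-%02d", year, month, day),
                        format = "%Y-%m-%d")
        # skip invalid dates such as February 29 in non-leap years
        if (is.na(date)) next
        if (as.integer(format(date, "%u")) == weekday) return(year)
    }
    NA_integer_
}

# a run period that starts on January 1 and must begin on a Friday
find_year(1, 1, weekday = 5, from = 2024)
```

Passing the resulting year explicitly via `year` should then give a `datetime` column whose weekdays agree with the run period definition.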
It is worth noting that EnergyPlus uses a 24-hour clock in which hour 24 is only used to denote midnight at the end of a calendar day. In EnergyPlus output with a 15-minute time interval, "24:00:00" represents the time period from "23:45:00" to "24:00:00", and similarly "00:15:00" represents the time period from "00:00:00" to "00:15:00" of the next day. This means that if the current day is Friday, the day-of-week rule applied to the schedule time period "23:45:00" to "24:00:00" (presented as "24:00:00" in the output) is also Friday, not Saturday. However, if you ask R for the day of week of "24:00:00", you will get Saturday, not Friday. This inconsistency may cause problems in any data analysis that depends on the day of week.
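The weekday mismatch at the end of a day can be reproduced with base R alone. The sketch below assumes a 15-minute reporting interval and uses 2021-01-01, which was a Friday; shifting the time stamp back by the interval length is one possible workaround, not something eplusr is documented to do.

```r
# EnergyPlus labels the last interval of Friday 2021-01-01 with hour 24.
# strptime() accepts hour 24 on input and rolls it over to the next day.
stamp <- as.POSIXct(strptime("2021-01-01 24:00:00",
                             format = "%Y-%m-%d %H:%M:%S", tz = "UTC"))
format(stamp, "%Y-%m-%d %H:%M")

# weekday of the label itself (1 = Monday, ..., 6 = Saturday)
format(stamp, "%u")

# shifting the label back by the interval length (15 minutes, an
# assumption of this example) gives the weekday the interval belongs to
interval_start <- stamp - 15 * 60
format(interval_start, "%u")
```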
When wide
is TRUE
, $eplus_report_data()
will format the
simulation output in the same way as standard EnergyPlus csv output
file. Sometimes this can be useful as there may be existing
tools/workflows that depend on this format. When both wide
and
all
are TRUE
, columns of run period environment names and date
time components are also returned, including:
environment_period_index
, environment_name
, simulation_days
,
datetime
, month
, day
, hour
, minute
, day_type
.
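To make the difference between the long and the wide layout concrete, here is a small base R sketch that pivots a toy long table into CSV-style columns. The data values, the `(Hourly)` frequency suffix and the helper column `label` are made up for illustration; the snippet does not call eplusr and is not how the package builds its wide output.

```r
# toy long-format report data (values are invented)
long <- data.frame(
    datetime = rep(c("01/01  01:00:00", "01/01  02:00:00"), times = 2),
    key_value = rep(c("Environment", "ZONE ONE"), each = 2),
    name = rep(c("Site Outdoor Air Drybulb Temperature",
                 "Zone Mean Air Temperature"), each = 2),
    units = "C",
    value = c(-3.2, -3.5, 21.0, 20.8),
    stringsAsFactors = FALSE
)

# EnergyPlus CSV headers look like "Key:Name [units](Frequency)"
long$label <- sprintf("%s:%s [%s](Hourly)",
                      long$key_value, long$name, long$units)

# one column per series; assumes rows are ordered by time within a series
series <- split(long$value, long$label)
wide <- data.frame(`Date/Time` = unique(long$datetime), series,
                   check.names = FALSE)
names(wide)
```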
For convenience, matching of character arguments in
$eplus_report_data()
are case-insensitive.
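The effect of subsetting arguments such as `hour` and `day_type`, including the case-insensitive matching, can be pictured with a plain data frame. This is a toy illustration with invented values; the real method performs the subsetting when querying the SQL database.

```r
# toy report data (values are invented)
dat <- data.frame(
    hour = c(7L, 9L, 12L, 19L, 10L),
    day_type = c("Monday", "Monday", "Monday", "Monday", "Tuesday"),
    value = c(18.2, 20.1, 22.4, 21.0, 19.5),
    stringsAsFactors = FALSE
)

keep_hours <- 8:18
keep_types <- "monday"   # lower case on purpose

# compare in lower case so that "monday" matches "Monday"
sub <- dat[dat$hour %in% keep_hours &
           tolower(dat$day_type) %in% tolower(keep_types), ]
sub$value
```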
\dontrun{
# read report data
bc$eplus_report_data(c(1, 4))

# specify output variables using report data dictionary
dict <- bc$eplus_report_data_dict(1)
bc$eplus_report_data(c(1, 4), dict[units == "C"])

# specify output variables using 'key_value' and 'name'
bc$eplus_report_data(c(1, 4), "environment", "site outdoor air drybulb temperature")

# explicitly specify year value and time zone
bc$eplus_report_data(c(1, 4), dict[1], year = 2020, tz = "Etc/GMT+8")

# get all possible columns
bc$eplus_report_data(c(1, 4), dict[1], all = TRUE)

# return in a format that is similar to EnergyPlus CSV output
bc$eplus_report_data(c(1, 4), dict[1], wide = TRUE)

# return in a format that is similar to EnergyPlus CSV output with
# extra columns
bc$eplus_report_data(c(1, 4), dict[1], wide = TRUE, all = TRUE)

# only get data at the working hours on the first Monday
bc$eplus_report_data(c(1, 4), dict[1], hour = 8:18, day_type = "monday", simulation_days = 1:7)
}
eplus_tabular_data()
Read EnergyPlus tabular data
BayesCalibJob$eplus_tabular_data(
  which = NULL,
  report_name = NULL,
  report_for = NULL,
  table_name = NULL,
  column_name = NULL,
  row_name = NULL
)
which
An integer vector of the indexes or a character vector
of names of parametric simulations. If NULL
, results of all
parametric simulations are returned. Default: NULL
.
report_name, report_for, table_name, column_name, row_name
Each is a character vector for subsetting when querying the SQL database. For the meaning of each argument, please see the description above.
$eplus_tabular_data()
extracts the tabular data in a
data.table::data.table()
using report, table, column and row name
specifications. The returned data.table::data.table()
has
9 columns:
case
: The model name. This column can be used to distinguish
output from different simulations
index
: Tabular data index
report_name
: The name of the report that the record belongs to
report_for
: The For
text that is associated with the record
table_name
: The name of the table that the record belongs to
column_name
: The name of the column that the record belongs to
row_name
: The name of the row that the record belongs to
units
: The units of the record
value
: The value of the record in string format
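Because the `value` column is stored as strings, numeric work usually needs a conversion step first. A minimal base R sketch, assuming that values may carry padding whitespace and that non-numeric entries should become `NA` (the example strings are invented):

```r
# toy 'value' strings as they might come back from a tabular report
raw <- c("      123.45", "7", "n/a", "")

# strip padding, then convert; non-numeric entries become NA
num <- suppressWarnings(as.numeric(trimws(raw)))
num
```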
For convenience, matching of character arguments in
$eplus_tabular_data()
are case-insensitive.
A data.table::data.table()
with 9 columns.
\dontrun{
# read all tabular data
bc$eplus_tabular_data(c(1, 4))

# explicitly specify data you want
str(bc$eplus_tabular_data(c(1, 4),
    report_name = "AnnualBuildingUtilityPerformanceSummary",
    table_name = "Site and Source Energy",
    column_name = "Total Energy",
    row_name = "Total Site Energy"
))
}
eplus_save()
Save EnergyPlus parametric models
BayesCalibJob$eplus_save(dir = NULL, separate = TRUE, copy_external = FALSE)
dir
The parent output directory for models to be saved. If
NULL
, the directory of the seed model will be used. Default:
NULL
.
separate
If TRUE
, each model is saved in a separate folder
named after the model under the specified directory. If FALSE
,
all models are saved in the specified directory. Default:
TRUE
.
copy_external
Only applicable when separate
is TRUE
. If
TRUE
, the external files that every Idf
object depends on
will also be copied into the saving directory. The values of
file paths in the Idf will be changed automatically.
Currently, only Schedule:File
class is supported. This
ensures that the output directory will have all files needed
for the model to run. Default: FALSE
.
$eplus_save()
saves all parametric models in the specified folder. An
error will be issued if no measure has been applied.
A data.table::data.table()
with two columns:
model: The path of saved parametric model files.
weather: The path of saved weather files.
\dontrun{
# save all parametric models with each model in a separate folder
bc$eplus_save(tempdir())

# save all parametric models with all models in the same folder
bc$eplus_save(tempdir(), separate = FALSE)
}
stan_run()
Run Bayesian calibration using Stan
BayesCalibJob$stan_run(
  file = NULL,
  data = NULL,
  iter = 2000L,
  chains = 4L,
  echo = TRUE,
  mc.cores = parallel::detectCores(),
  all = FALSE,
  merge = TRUE,
  ...
)
file
The path to the Stan program to use. If NULL
, the
pre-compiled Stan code from Chong and Menberg (2018) will be used.
Default: NULL
.
data
Only applicable when file
is not NULL
. The data to be
used for Bayesian calibration. If NULL
, the data that
$data_bc()
returns is used. Default: NULL
.
iter
A positive integer specifying the number of iterations
for each chain (including warmup). Default: 2000
.
chains
A positive integer specifying the number of Markov
chains. Default: 4
.
echo
Only applicable when file
is NULL. Whether to print the
summary of Informational Messages to the screen after a chain
is finished or a character string naming a path where the
summary is stored. Default: TRUE
.
mc.cores
An integer specifying how many cores to be used for
Stan. Default: parallel::detectCores()
.
all
If FALSE
, among the possible meta data columns, only index
,
type
and Date/Time
will be returned. Default: FALSE
.
merge
If TRUE
, y_pred
in the returned list merges the output of
$data_field()
and the predicted output into one data.table::data.table()
, with
all predicted values put in columns with a [prediction]
prefix. If FALSE
, the rows of field measured output and predicted output are
combined instead, and a new column type
is added, with field
indicating field
measured output and prediction
indicating predicted output.
Default: TRUE
.
...
Additional arguments to pass to rstan::sampling (when
file
is NULL
) or rstan::stan (when file
is not
NULL
).
$stan_run()
runs Bayesian calibration using Stan and
returns a list of 2 elements:
fit
: An object of S4 class rstan::stanfit.
y_pred
: The output of
$prediction()
A list of 2 elements.
\dontrun{ bc$stan_run() }
stan_file()
Extract Stan file for Bayesian calibration
BayesCalibJob$stan_file(path = NULL)
path
A path to save the Stan code. If NULL
, a character
vector of the Stan code is returned.
$stan_file()
saves the Stan file used internally for Bayesian
calibration. If no path is given, a character vector of the Stan
code is returned. If given, the code will be saved to the path and the
file path is returned.
\dontrun{ bc$stan_file() }
post_dist()
Extract posterior distributions of calibrated parameters
BayesCalibJob$post_dist()
$post_dist()
extracts the posterior distributions of calibrated parameters
based on the results of
$stan_run()
and returns a data.table::data.table()
with each parameter's values
filling one column. The parameter names are defined by the .names
argument in
$param()
.
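Once the posterior draws are available as one column per parameter, they can be summarised with ordinary R tools. The sketch below uses an invented data frame in place of the real output; the parameter names `infiltration` and `cop` and the presence of a `sample` index column are assumptions made for the example.

```r
# toy stand-in for posterior draws (values and names are invented)
post <- data.frame(
    sample = 1:5,
    infiltration = c(0.8, 1.0, 1.2, 1.1, 0.9),
    cop = c(3.1, 3.3, 2.9, 3.0, 3.2)
)

# every column except the sample index is treated as a parameter
params <- setdiff(names(post), "sample")

# posterior mean and a 95% interval per parameter
summ <- sapply(post[params], function(x) {
    c(mean = mean(x), quantile(x, probs = c(0.025, 0.5, 0.975)))
})
summ
```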
\dontrun{ bc$post_dist() }
prediction()
Extract predictions of output variables
BayesCalibJob$prediction(all = FALSE, merge = TRUE)
all
If FALSE
, among the possible meta data columns, only index
,
type
and Date/Time
will be returned. Default: FALSE
.
merge
If TRUE
, the output of
$data_field()
and the predicted output are merged into one data.table::data.table()
, with
all predicted values put in columns with a [prediction]
prefix. If FALSE
, the rows of field measured output and predicted output are
combined instead, and a new column type
is added, with field
indicating field
measured output and prediction
indicating predicted output.
Default: TRUE
.
$prediction()
calculates predicted output variable values based
on the results of
$stan_run()
and returns a data.table::data.table()
which combines the output of
$data_field()
and predicted output values.
Possible returned meta data columns:
index
: Integer type. Row indices of field input data in
$data_field()
sample
: Integer type. Sample indices of the MCMC.
type
: Character type. Only exists when merge
is FALSE
. The
type of output values. field
indicates field measured output
values while prediction
means predicted output values.
Date/Time
: Character type. The date time in EnergyPlus-format.
environment_period_index
: Integer type. The index of the environment.
environment_name
: Character type. A text string identifying the
simulation environment.
simulation_days
: Integer type. Day of simulation.
datetime
: DateTime type. The date time of simulation result. Note
that the year values are automatically calculated to meet the
start day of week restriction for each simulation environment.
month
: Integer type. The month of reported date time.
day
: Integer type. The day of month of reported date time.
hour
: Integer type. The hour of reported date time.
minute
: Integer type. The minute of reported date time.
day_type
: Character type. The type of day, e.g. Monday
,
Tuesday
, etc. Note that day_type
will always be NA
if
resolution
is specified.
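With the row-bound layout (`merge = FALSE`), the `type` and `index` columns are enough to compare field measurements against the mean prediction over all MCMC samples. The sketch below uses an invented data frame; the column name `value` is hypothetical, since the real output columns are named after the output variable.

```r
# toy stand-in for a row-bound prediction table (values are invented;
# 'value' is a hypothetical name for the output variable column)
pred <- data.frame(
    index = c(1L, 2L, 1L, 1L, 2L, 2L),
    type = c("field", "field",
             "prediction", "prediction", "prediction", "prediction"),
    value = c(10, 20, 9, 11, 19, 23),
    stringsAsFactors = FALSE
)

is_pred <- pred$type == "prediction"

# mean prediction over all MCMC samples for each field data row
pred_mean <- tapply(pred$value[is_pred], pred$index[is_pred], mean)

# field measurements in the same index order
field <- pred$value[!is_pred][order(pred$index[!is_pred])]

cmp <- data.frame(
    index = as.integer(names(pred_mean)),
    field = field,
    predicted = as.numeric(pred_mean)
)
cmp
```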
A data.table::data.table()
that combines the output of
$data_field()
and the predicted output values, including the meta data columns
described above.
\dontrun{ bc$prediction() }
evaluate()
Calculate statistical indicators of output variable predictions
BayesCalibJob$evaluate(funs = list(nmbe, cvrmse))
funs
A list of functions that takes the simulation results as
the first argument and the measured results as the second
argument. Default: list(nmbe, cvrmse)
.
$evaluate()
quantifies the uncertainty of output variable predictions
from each MCMC sample gathered from
$prediction()
by calculating the statistical indicators.
By default, it evaluates the two principal uncertainty indices used in ASHRAE Guideline 14: Normalized Mean Bias Error (NMBE) and Coefficient of Variation of the Root Mean Square Error (CVRMSE).
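For readers who want to supply their own indicators, the sketch below shows functions with the expected signature (simulated values first, measured values second). `nmbe_sketch()` and `cvrmse_sketch()` follow one common textbook form of the two indices; the sign convention and the degrees-of-freedom adjustment `p` are assumptions of this example and may differ from the package's own `nmbe` and `cvrmse`.

```r
# sim: simulated (predicted) values; obs: measured values
# p: number of parameters used in the degrees-of-freedom adjustment
nmbe_sketch <- function(sim, obs, p = 1) {
    sum(sim - obs) / ((length(obs) - p) * mean(obs)) * 100
}

cvrmse_sketch <- function(sim, obs, p = 1) {
    sqrt(sum((sim - obs)^2) / (length(obs) - p)) / mean(obs) * 100
}

# a custom indicator with the same (sim, obs) signature
mae <- function(sim, obs) mean(abs(sim - obs))

sim <- c(10, 12, 14)
obs <- c(11, 12, 13)
nmbe_sketch(sim, obs)
cvrmse_sketch(sim, obs)
mae(sim, obs)
```

Given the description of `funs`, a list such as `list(nmbe_sketch, cvrmse_sketch, mae)` should be usable in its place.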
A data.table::data.table()
with 1 column sample
giving
the sample indices from MCMC, plus the same number of columns as
given evaluation functions.
\dontrun{ bc$evaluate() }