A small, checkmate-first schema DSL for R data.

schemate provides a small, checkmate-first schema DSL for R data. It can infer schemas from example objects, edit schema documents, save them as JSON, read them back, and validate new inputs against the schema.

The package is meant for package authors and pipeline authors who want a compact R-native schema format without adopting the full JSON Schema vocabulary. A typical workflow is:

  1. infer a conservative schema with schema_infer();
  2. edit it with schema_*() authoring verbs;
  3. save it with schema_write();
  4. read it back with schema_read();
  5. validate inputs with schema_validate().

Installation

install.packages("schemate")

Development Version

To get a bug fix or a feature that has not been released yet, install the development version of schemate from GitHub.

# install.packages("pak")
pak::pak("hongyuanjia/schemate")

Quick Start

The public API uses a single schema_ prefix and works well in pipelines. Start from an example object, infer a conservative schema, then compact it into something easier to edit and review.

library(schemate)

payload <- list(
    items = list(
        list(id = 1L, name = "alpha", label = "Alpha", slug = "alpha"),
        list(id = 2L, name = "beta", label = "Beta", slug = "beta")
    )
)

schema <- payload |>
    schema_infer(keys = "named", arrays = "rest") |>
    schema_compact() |>
    schema_set_desc("$items", "Repository-like result items")

schema
## {
##   "check": {
##     "kind": "list"
##   },
##   "keys": {
##     "type": "named"
##   },
##   "fields": {
##     "items": {
##       "description": "Repository-like result items",
##       "check": {
##         "kind": "list"
##       },
##       "keys": {
##         "type": "unnamed"
##       },
##       "rest": {
##         "check": {
##           "kind": "list"
##         },
##         "keys": {
##           "type": "named"
##         },
##         "fields": {
##           "id": {
##             "check": {
##               "kind": "int"
##             }
##           }
##         },
##         "groups": [
##           {
##             "names": ["name", "label", "slug"],
##             "check": {
##               "kind": "string"
##             }
##           }
##         ]
##       }
##     }
##   }
## }
schema |>
    schema_validate(payload, mode = "test")
## [1] TRUE

schema_validate() defaults to assert mode: invalid input raises an error and valid input is returned invisibly. Other modes are available when you need a message or a boolean result.

bad_payload <- payload
bad_payload$items[[1L]]$id <- "bad"

schema |>
    schema_validate(bad_payload, mode = "check", name = "payload")
## [1] "payload$items[[1]]$id: Must be of type 'single integerish value', not 'character'"
schema |>
    schema_validate(bad_payload, mode = "test", name = "payload")
## [1] FALSE

When validating many payloads against the same schema, flatten once and reuse the flattened schema.

flat <- schema_flatten(schema)
schema_validate(flat, payload, mode = "test")
## [1] TRUE
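
As a minimal sketch of that reuse, the snippet below validates a small batch of inputs against one flattened schema. The payloads list and the ok vector are illustrative names, not part of schemate; the schemate calls are the same ones shown above.

```r
library(schemate)

# Same example payload and schema as in the Quick Start.
payload <- list(
    items = list(
        list(id = 1L, name = "alpha", label = "Alpha", slug = "alpha"),
        list(id = 2L, name = "beta", label = "Beta", slug = "beta")
    )
)
schema <- payload |>
    schema_infer(keys = "named", arrays = "rest") |>
    schema_compact()

bad_payload <- payload
bad_payload$items[[1L]]$id <- "bad"

# Hypothetical batch of inputs to validate.
payloads <- list(good = payload, bad = bad_payload)

# Flatten once, outside the loop, then reuse the result for every payload.
flat <- schema_flatten(schema)
ok <- vapply(
    payloads,
    function(x) schema_validate(flat, x, mode = "test"),
    logical(1L)
)
ok
```

Using mode = "test" keeps the loop going when one input is invalid, so ok ends up with one logical value per payload.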

For a data frame example, see the Get started article.

JSON Workflow

Schemas are stored as a compact JSON DSL. The DSL is not JSON Schema; it is a thin representation of checkmate checks, field schemas, local definitions, and combinators. See the Schema DSL article for the complete format reference. schema_read() and schema_write() require the suggested package jsonlite.

path <- tempfile(fileext = ".json")
schema_write(schema, path)

restored <- schema_read(path)
restored
## {
##   "check": {
##     "kind": "list"
##   },
##   "keys": {
##     "type": "named"
##   },
##   "fields": {
##     "items": {
##       "description": "Repository-like result items",
##       "check": {
##         "kind": "list"
##       },
##       "keys": {
##         "type": "unnamed"
##       },
##       "rest": {
##         "check": {
##           "kind": "list"
##         },
##         "keys": {
##           "type": "named"
##         },
##         "fields": {
##           "id": {
##             "check": {
##               "kind": "int"
##             }
##           }
##         },
##         "groups": [
##           {
##             "names": ["name", "label", "slug"],
##             "check": {
##               "kind": "string"
##             }
##           }
##         ]
##       }
##     }
##   }
## }
restored |>
    schema_validate(payload)

Example schema files are installed under inst/extdata:

system.file("extdata", "person-schema.json", package = "schemate")
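
A short sketch of loading that bundled file, assuming schemate and jsonlite are installed. The name person_schema is illustrative only.

```r
library(schemate)

path <- system.file("extdata", "person-schema.json", package = "schemate")

# system.file() returns "" when the file cannot be found, so guard before reading.
if (nzchar(path)) {
    person_schema <- schema_read(path)
    print(person_schema)
}
```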

Validation Modes

schema_validate() supports four modes:

Mode    Return value on success            Return value on failure
assert  invisibly returns the input        throws an error
check   TRUE                               diagnostic string
test    TRUE                               FALSE
expect  testthat-style expectation object  expectation failure object

Use assert inside application code, check when displaying diagnostics, test for control flow, and expect in tests.
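
The sketch below shows one way the assert and test modes might be used side by side. The names assert_msg and status are illustrative, and tryCatch() is plain base R error handling rather than anything schemate requires.

```r
library(schemate)

# Same example payload and schema as in the Quick Start.
payload <- list(
    items = list(
        list(id = 1L, name = "alpha", label = "Alpha", slug = "alpha"),
        list(id = 2L, name = "beta", label = "Beta", slug = "beta")
    )
)
schema <- payload |>
    schema_infer(keys = "named", arrays = "rest") |>
    schema_compact()

bad_payload <- payload
bad_payload$items[[1L]]$id <- "bad"

# assert mode (the default): invalid input raises an error,
# which is caught here and reduced to its message.
assert_msg <- tryCatch(
    {
        schema_validate(schema, bad_payload)
        NULL
    },
    error = function(e) conditionMessage(e)
)

# test mode: a plain TRUE/FALSE that can drive control flow.
status <- if (schema_validate(schema, payload, mode = "test")) {
    "accepted"
} else {
    "rejected"
}
```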

Standalone Use

schemate also publishes a generated standalone bundle for packages that want the schema features without depending on schemate at runtime.

usethis::use_standalone("hongyuanjia/schemate", "schema", ref = "standalone")

Relation to Other Tools

schemate is closest in spirit to checkmate: schemas ultimately validate R objects by calling checkmate checks. It adds a schema lifecycle around those checks: infer, edit, serialize, read, and validate.
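
For comparison, here is roughly what the Quick Start schema expresses when written by hand with checkmate alone. This is an illustrative sketch, not schemate's implementation: assert_payload is a hypothetical helper, and the exact checks schemate performs (for example, how it treats missing or extra fields) may differ.

```r
library(checkmate)

# Hypothetical hand-written equivalent of the Quick Start schema.
assert_payload <- function(x) {
    assert_list(x, names = "named")
    assert_list(x$items, names = "unnamed")
    for (item in x$items) {
        assert_list(item, names = "named")
        assert_int(item$id)
        # The three string fields share one check, like the "groups" entry.
        for (field in c("name", "label", "slug")) {
            assert_string(item[[field]], .var.name = field)
        }
    }
    invisible(x)
}

payload <- list(
    items = list(
        list(id = 1L, name = "alpha", label = "Alpha", slug = "alpha")
    )
)
assert_payload(payload)

bad_payload <- payload
bad_payload$items[[1L]]$id <- "bad"
failed <- inherits(try(assert_payload(bad_payload), silent = TRUE), "try-error")
```

The hand-written version cannot be saved, diffed, or inferred from an example, which is the part of the lifecycle that schemate adds on top of these checks.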

pointblank is a better fit for tabular data quality workflows, reporting, and column-oriented validation plans. schemate is deliberately narrower and more structural: it describes R values, R object names, nested lists, JSON-like payloads, and package-facing input contracts. It is not a replacement for JSON Schema or jsonvalidate, which are better choices when you need standards-compliant JSON document validation.

The R validation ecosystem is broad:

  • validate captures data validation rules that can be documented, stored, and applied to data sets.
  • assertr is designed for assertive data checks inside analysis pipelines.
  • data.validator focuses on dataset validation with reporting.
  • vetr provides template-based structural checks for R objects.
  • testthat is the right home for unit-test expectations; schema_validate(..., mode = "expect") is intended to fit into that style.
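
A minimal sketch of the expect mode inside a testthat test, assuming testthat is installed and that the expectation returned by schema_validate() registers with testthat in the usual way. The test description is illustrative.

```r
library(testthat)
library(schemate)

# Same example payload and schema as in the Quick Start.
payload <- list(
    items = list(
        list(id = 1L, name = "alpha", label = "Alpha", slug = "alpha"),
        list(id = 2L, name = "beta", label = "Beta", slug = "beta")
    )
)
schema <- payload |>
    schema_infer(keys = "named", arrays = "rest") |>
    schema_compact()

test_that("payload matches the inferred schema", {
    schema_validate(schema, payload, mode = "expect")
})
```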

License

The project is released under the terms of the MIT License.