JSON Schema
- free
- growth
- enterprise
JSON Schema is a vocabulary for annotating and validating JSON documents.
Follow this guide to standardize and define expectations for your events while upserting them into your Tracking Plan via the Data Catalog API.
Keywords
Keywords are properties appearing within a JSON schema object.
{
  "title": "Example Schema",
  "type": "object"
}
In the above snippet, title and type are keywords.
Type-specific keywords
The type keyword specifies the data type for a JSON schema. RudderStack supports the following data types and their type-specific keywords:
Strings
The string data type is used to represent strings of text and can contain Unicode characters.
RudderStack supports the following advanced keywords for strings:
- minLength
- maxLength
- pattern
- format
A sample schema definition for a string:
{
  "type": "string",
  "minLength": 2,
  "maxLength": 5,
  "pattern": "^(\\([0-9]{3}\\))?[0-9]{3}-[0-9]{4}$"
}
Some examples of data that are compliant and non-compliant to the above schema:
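As a rough illustration of how these keywords behave, the Python sketch below checks a few candidate values against the sample schema using standard JSON Schema semantics. It is not RudderStack's validator, and string_violations is a hypothetical helper name. Note that, read literally, the sample pattern needs at least eight characters while maxLength is 5, so every candidate trips at least one keyword; treat the sample as a demonstration of the keywords rather than a schema to copy.

```python
import re

# Sample string schema from this section.
STRING_SCHEMA = {
    "type": "string",
    "minLength": 2,
    "maxLength": 5,
    "pattern": r"^(\([0-9]{3}\))?[0-9]{3}-[0-9]{4}$",
}


def string_violations(value, schema):
    """Return the names of the schema keywords that `value` violates."""
    if not isinstance(value, str):
        return ["type"]
    violations = []
    if "minLength" in schema and len(value) < schema["minLength"]:
        violations.append("minLength")
    if "maxLength" in schema and len(value) > schema["maxLength"]:
        violations.append("maxLength")
    # JSON Schema patterns are unanchored searches, so use re.search.
    if "pattern" in schema and re.search(schema["pattern"], value) is None:
        violations.append("pattern")
    return violations


for candidate in ["555-1212", "(888)555-1212", "abcd", "a", 42]:
    print(repr(candidate), string_violations(candidate, STRING_SCHEMA))
```

Reporting the violated keyword names, rather than a bare true or false, makes it easier to see which rule rejected a value.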
Integers and numbers
The JSON schema defines two numeric data types:
- integer: Used for integral numbers.
- number: Used for any numeric type, such as integers or floating point numbers.
RudderStack supports the following advanced keywords for numbers:
- multipleOf
- minimum
- maximum
- exclusiveMinimum
- exclusiveMaximum
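To show how these numeric keywords interact, here is a minimal Python sketch. It assumes draft-07 semantics, where exclusiveMinimum and exclusiveMaximum are numbers rather than booleans. NUMBER_SCHEMA and number_violations are hypothetical names invented for this illustration, and the sketch is not RudderStack's validator.

```python
from fractions import Fraction

# Hypothetical schema, chosen only to exercise the keywords listed above.
NUMBER_SCHEMA = {
    "type": "number",
    "multipleOf": 0.5,
    "minimum": 0,
    "exclusiveMaximum": 10,
}


def is_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def number_violations(value, schema):
    """Return the names of the schema keywords that `value` violates."""
    if not is_number(value):
        return ["type"]
    if schema.get("type") == "integer" and not float(value).is_integer():
        return ["type"]
    violations = []
    if "multipleOf" in schema:
        # Exact fractions avoid floating point rounding errors in the division.
        quotient = Fraction(str(value)) / Fraction(str(schema["multipleOf"]))
        if quotient.denominator != 1:
            violations.append("multipleOf")
    if "minimum" in schema and value < schema["minimum"]:
        violations.append("minimum")
    if "maximum" in schema and value > schema["maximum"]:
        violations.append("maximum")
    if "exclusiveMinimum" in schema and value <= schema["exclusiveMinimum"]:
        violations.append("exclusiveMinimum")
    if "exclusiveMaximum" in schema and value >= schema["exclusiveMaximum"]:
        violations.append("exclusiveMaximum")
    return violations


for candidate in [2.5, 10, -1, 0.3, True]:
    print(repr(candidate), number_violations(candidate, NUMBER_SCHEMA))
```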
A sample schema definition for an integer and number:
Some examples of data that are compliant and non-compliant to the Integer schema:
Some examples of data that are compliant and non-compliant to the Number schema:
Objects
You can use JSON objects to map specific keys to values.
While using the Data Catalog API, you can use the rules object to specify the property mappings for the event to be upserted in the Tracking Plan. A sample rules object is shown:
{
  "identify": {
    "type": "object",
    "$schema": "http://json-schema.org/draft-07/schema#",
    "properties": {
      "anonymousId": {
        "type": "string"
      },
      "userId": {
        "type": "string"
      }
    }
  }
}
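If you assemble rules objects programmatically, a small helper can keep the structure consistent. The sketch below only builds the JSON shown above; build_rules is a hypothetical helper, and how the result is sent to the Data Catalog API is outside its scope.

```python
import json

DRAFT_07 = "http://json-schema.org/draft-07/schema#"


def build_rules(event_type, property_types):
    """Build a rules object mapping each property name to a JSON type.

    Hypothetical helper: it mirrors the sample structure above and nothing more.
    """
    properties = {
        name: {"type": json_type} for name, json_type in property_types.items()
    }
    return {
        event_type: {
            "type": "object",
            "$schema": DRAFT_07,
            "properties": properties,
        }
    }


rules = build_rules("identify", {"anonymousId": "string", "userId": "string"})
print(json.dumps(rules, indent=2))
```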
RudderStack supports the following keywords within an object:
A sample schema definition for an object:
{
  "type": "object",
  "properties": {
    "number": {
      "type": "number"
    },
    "street_name": {
      "type": "string"
    },
    "street_type": {
      "enum": ["Street", "Avenue", "Boulevard"]
    }
  },
  "additionalProperties": false
}
The following JSON document is compliant with the above schema:
{
  "number": 1600,
  "street_name": "Pennsylvania",
  "street_type": "Avenue"
}
The above schema rejects the following JSON because it contains an undeclared property, direction, and additionalProperties is set to false:
{
  "number": 1600,
  "street_name": "Pennsylvania",
  "street_type": "Avenue",
  "direction": "North-west"
}
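To make the additionalProperties behavior concrete, the following Python sketch runs both documents against the address schema. object_violations is a hypothetical helper that covers only the keywords used in this sample; it is not RudderStack's validator.

```python
# Sample address schema from this section.
ADDRESS_SCHEMA = {
    "type": "object",
    "properties": {
        "number": {"type": "number"},
        "street_name": {"type": "string"},
        "street_type": {"enum": ["Street", "Avenue", "Boulevard"]},
    },
    "additionalProperties": False,
}


def is_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def object_violations(document, schema):
    """Return human-readable problems found in `document`."""
    if not isinstance(document, dict):
        return ["document is not an object"]
    declared = schema.get("properties", {})
    problems = []
    for key, value in document.items():
        if key not in declared:
            # Undeclared keys are rejected only when additionalProperties is false.
            if schema.get("additionalProperties", True) is False:
                problems.append(f"{key}: additional property not allowed")
            continue
        rule = declared[key]
        if rule.get("type") == "number" and not is_number(value):
            problems.append(f"{key}: expected a number")
        if rule.get("type") == "string" and not isinstance(value, str):
            problems.append(f"{key}: expected a string")
        if "enum" in rule and value not in rule["enum"]:
            problems.append(f"{key}: value not in enum")
    return problems


compliant = {"number": 1600, "street_name": "Pennsylvania", "street_type": "Avenue"}
rejected = dict(compliant, direction="North-west")

print(object_violations(compliant, ADDRESS_SCHEMA))
print(object_violations(rejected, ADDRESS_SCHEMA))
```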
RudderStack restricts usage of the following object data type keywords:
Arrays
You can use arrays for ordered elements.
RudderStack expects only objects to be present within an array.
There are two ways of using arrays in JSON:
- List validation: Sequence of arbitrary length where each item matches the same schema.
- Tuple validation: Sequence of fixed length where each item can have a different schema.
RudderStack supports only list validation and the items keyword to validate the items in the array.
RudderStack supports the following advanced keywords for arrays:
- minItems
- maxItems
- uniqueItems
A sample schema definition for an array:
{
  "type": "array",
  "items": {
    "type": "number"
  },
  "minItems": 2,
  "maxItems": 3,
  "uniqueItems": true
}
Some examples of data that are compliant and non-compliant to the above schema:
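As an illustration, the Python sketch below applies the array keywords from the sample schema to a handful of candidate values. It follows standard JSON Schema list validation and is only a sketch; array_violations is a hypothetical helper, not part of RudderStack.

```python
# Sample array schema from this section.
ARRAY_SCHEMA = {
    "type": "array",
    "items": {"type": "number"},
    "minItems": 2,
    "maxItems": 3,
    "uniqueItems": True,
}


def is_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def same_json_value(a, b):
    # JSON never treats true/false as equal to 1/0, unlike Python.
    return isinstance(a, bool) == isinstance(b, bool) and a == b


def array_violations(value, schema):
    """Return the names of the schema keywords that `value` violates."""
    if not isinstance(value, list):
        return ["type"]
    violations = []
    if len(value) < schema.get("minItems", 0):
        violations.append("minItems")
    if "maxItems" in schema and len(value) > schema["maxItems"]:
        violations.append("maxItems")
    if schema.get("uniqueItems"):
        # Pairwise comparison; fine for the short arrays used here.
        has_duplicate = any(
            same_json_value(a, b)
            for i, a in enumerate(value)
            for b in value[i + 1:]
        )
        if has_duplicate:
            violations.append("uniqueItems")
    # List validation: every item must match the single items schema.
    if schema.get("items", {}).get("type") == "number":
        if not all(is_number(item) for item in value):
            violations.append("items")
    return violations


for candidate in [[1, 2], [1], [1, 2, 3, 4], [1, 1], [1, "a"]]:
    print(candidate, array_violations(candidate, ARRAY_SCHEMA))
```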
RudderStack restricts usage of the following keywords:
Nesting properties
When defining event properties, you can nest complex properties within an object or array.
A sample object highlighting the nested properties is shown:
{
  "type": "object",
  "properties": {
    "traits": {
      "type": "object",
      "properties": {
        "name": {
          "type": "string"
        },
        "industry": {
          "type": "string"
        },
        "plan": {
          "type": "string"
        }
      }
    }
  }
}
Note that:
- RudderStack supports up to three levels of nesting within an event property.
- You can nest properties only within an object or an array.
- Removing the parent object or array automatically removes all the nested properties.
- If not explicitly declared, RudderStack allows all data types for a property by default. However, it does not support nesting for that property.
- You cannot nest properties within a property having both array and object data types.
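One way to catch over-deep schemas before upserting them is to measure nesting locally. The sketch below is an assumption-heavy illustration: it counts one level for each layer of properties beneath an event property and treats an array's items schema as transparent. RudderStack may count levels differently, and nesting_depth and nested are hypothetical helpers.

```python
MAX_NESTING_LEVELS = 3  # limit stated in this section


def nesting_depth(schema):
    """Count the layers of nested properties below `schema`.

    Assumption: each `properties` layer is one level; `items` adds none.
    """
    props = schema.get("properties")
    if isinstance(props, dict) and props:
        return 1 + max(nesting_depth(child) for child in props.values())
    items = schema.get("items")
    if isinstance(items, dict):
        return nesting_depth(items)
    return 0


def nested(levels):
    """Build a test schema with `levels` layers of nested objects."""
    schema = {"type": "string"}
    for _ in range(levels):
        schema = {"type": "object", "properties": {"child": schema}}
    return schema


# The traits property from the sample above.
TRAITS = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "industry": {"type": "string"},
        "plan": {"type": "string"},
    },
}

print(nesting_depth(TRAITS))
print(nesting_depth(nested(4)) <= MAX_NESTING_LEVELS)
```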
Enum
You can use the enum keyword to restrict a field to a fixed set of values.
Note that an enum must be an array satisfying the following conditions:
- It contains at least one element.
- Each element must be unique.
A sample schema definition for enum:
{
  "enum": ["Red", "Green", "Amber", null, 100]
}
Some examples of data that are compliant and non-compliant to the above schema:
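For illustration, the Python sketch below checks both the enum conditions listed above (non-empty, unique elements) and whether a value is one of the allowed entries. It is a sketch under standard JSON Schema semantics, not RudderStack's implementation; json_equal, is_valid_enum, and matches_enum are hypothetical names.

```python
# Sample enum schema from this section (JSON null maps to Python None).
ENUM_SCHEMA = {"enum": ["Red", "Green", "Amber", None, 100]}


def json_equal(a, b):
    # JSON booleans never equal numbers, but in Python True == 1.
    if isinstance(a, bool) or isinstance(b, bool):
        return isinstance(a, bool) and isinstance(b, bool) and a == b
    return a == b


def is_valid_enum(values):
    """Check that an enum is a non-empty array with unique elements."""
    if not isinstance(values, list) or not values:
        return False
    return not any(
        json_equal(a, b) for i, a in enumerate(values) for b in values[i + 1:]
    )


def matches_enum(value, schema):
    """Return True if `value` equals one of the allowed enum entries."""
    return any(json_equal(value, allowed) for allowed in schema["enum"])


print(is_valid_enum(ENUM_SCHEMA["enum"]))
for candidate in ["Red", None, 100, "Blue", 0, "red"]:
    print(repr(candidate), matches_enum(candidate, ENUM_SCHEMA))
```

Matching is case-sensitive and type-sensitive, so "red" and 0 are rejected even though "Red" and 100 are allowed.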
Boolean
The boolean data type supports only two values: true and false. RudderStack does not accept values that merely evaluate to true or false, such as 1 and 0.
A sample schema definition for a boolean:
{
  "type": "boolean"
}
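The distinction between real booleans and truthy values is easy to miss in client code. A minimal Python sketch of the strict check, with is_json_boolean as a hypothetical helper name:

```python
import json


def is_json_boolean(value):
    """Accept only real booleans, not truthy or falsy stand-ins."""
    return isinstance(value, bool)


# Parse raw JSON fragments so the check runs on decoded JSON values.
for raw in ["true", "false", "1", "0", '"true"']:
    print(raw, is_json_boolean(json.loads(raw)))
```

Only the JSON literals true and false pass; the numbers 1 and 0 and the string "true" do not.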
Null
The null data type accepts only one value: null.
A sample schema definition for null data type:
{
  "type": "null"
}
Some examples of data that are compliant and non-compliant to the Null schema shown above:
Multi data types
In addition to the single data types above, RudderStack supports specifying multiple data types for an event property.
A sample schema definition for multi data types:
{
  "type": ["string", "integer", "boolean", "null"]
}
Some examples of data that are compliant and non-compliant to the above schema:
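As a sketch of how a type array is evaluated, the Python snippet below accepts a value if it matches any one of the listed types. It follows standard JSON Schema semantics (draft-07 treats 1.0 as an integer) and is not RudderStack's validator; json_type_matches and matches_type are hypothetical names.

```python
# Sample multi-type schema from this section.
MULTI_TYPE_SCHEMA = {"type": ["string", "integer", "boolean", "null"]}


def json_type_matches(value, type_name):
    """Return True if `value` is an instance of the named JSON type."""
    if type_name == "string":
        return isinstance(value, str)
    if type_name == "boolean":
        return isinstance(value, bool)
    if type_name == "null":
        return value is None
    if type_name in ("integer", "number"):
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            return False
        return type_name == "number" or float(value).is_integer()
    if type_name == "array":
        return isinstance(value, list)
    if type_name == "object":
        return isinstance(value, dict)
    return False


def matches_type(value, schema):
    """A value is valid if it matches any one of the declared types."""
    declared = schema.get("type")
    if declared is None:
        return True  # no type keyword: every type is allowed
    names = [declared] if isinstance(declared, str) else declared
    return any(json_type_matches(value, name) for name in names)


for candidate in ["abc", 42, False, None, 3.14, [1], {}]:
    print(repr(candidate), matches_type(candidate, MULTI_TYPE_SCHEMA))
```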
Metadata
You can use the metadata parameter to supply generic keywords, such as annotations and comments, that add context and meaning to your JSON schema.
RudderStack supports the following generic keywords:
Restricted keyword structures
Apart from the data type-specific keywords, RudderStack also restricts usage of the following keyword structures:
