Search Data File
A data file is a collection of the catalog items available on your online store. This data must be converted into a format that the search engine can parse, so that it returns accurate results when a query is entered.
Data sent to the search engine is loaded into the collection that you specify in the primary configuration file, and is also used to generate the Search As You Type (SAYT) keywords and navigations.
This article outlines the mandatory knowledge and the best practices for structuring your data so that your search experience is the best it can be.
You can curate your catalog item data in the following ways:
- Use the GroupBy guidelines to create a good data file
- Leverage our data enrichment services, which lead to improved conversion rates
Supported Data Format: JSON
Line-delimited JSON is the recommended and preferred format for data upload. The engine returns its responses in JSON, and all data input and output examples use JSON.
Please note that the JSON file must match the following format:
- Each record must fit on exactly one line; newline and carriage return characters must be escaped.
- Each record must not exceed 32 MB in size.
- Each record must be a single JSON object, not an array.
- Every value must be sent as a quoted string, including numbers.
- The keys for each item must start with an alphabetic character and then include only alphanumeric characters and underscores.
The following example record is spread across multiple lines for readability; in the data file it must occupy a single line:
{
"id": "1001",
"title": "Ultra Comfort Flip flops",
"gender": "Women",
"categoryId": [
"230"
],
"brand": "Nike",
"variants": [
{
"sku": "1001G-7-B",
"shoeSize": "7",
"width": "B",
"color": "Green",
"image": "1001G",
"onSale": "true",
"originalPrice": "149.95",
"finalPrice": "99.95",
"stores": [
{
"storeID": "1",
"inventory": "15",
"price": "99.95"
},
{
"storeID": "2",
"inventory": "5",
"price": "99.95"
},
{
"storeID": "3",
"inventory": "1",
"price": "99.95"
}
]
},
{
"sku": "1001P-7-B",
"shoeSize": "7",
"width": "B",
"color": "Pink",
"image": "1001P",
"onSale": "true",
"originalPrice": "149.95",
"finalPrice": "99.95",
"stores": [
{
"storeID": "1",
"inventory": "15",
"price": "99.95"
},
{
"storeID": "2",
"inventory": "5",
"price": "99.95"
},
{
"storeID": "3",
"inventory": "1",
"price": "99.95"
}
]
},
{
"sku": "1001G-6.5-B",
"shoeSize": "6.5",
"width": "B",
"color": "Green",
"image": "1001G",
"onSale": "true",
"originalPrice": "149.95",
"finalPrice": "99.95",
"stores": [
{
"storeID": "1",
"inventory": "5",
"price": "99.95"
},
{
"storeID": "2",
"inventory": "2",
"price": "99.95"
},
{
"storeID": "3",
"inventory": "2",
"price": "99.95"
}
]
},
{
"sku": "1001G-9-B",
"shoeSize": "9",
"width": "B",
"color": "Green",
"image": "1001G",
"onSale": "true",
"originalPrice": "149.99",
"finalPrice": "99.95",
"stores": [
{
"storeID": "1",
"inventory": "9",
"price": "99.95"
},
{
"storeID": "2",
"inventory": "4",
"price": "99.95"
},
{
"storeID": "3",
"inventory": "5",
"price": "99.95"
}
]
}
]
}
You can see that it contains:
- The mandatory fields of id and title
- Metadata, like gender and brand
- A categoryId field which will allow us to map categories onto the record; this can be an array of values
- An array of color and size based variants, each of which holds another array of store-level availability
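The format rules above can be applied when generating the data file. The following is a minimal Python sketch, not an official GroupBy tool: the helper names stringify_values and to_ldjson_line are hypothetical, 32 MB is assumed to mean 32 * 1024 * 1024 bytes, and booleans are assumed to be written as "true"/"false" as in the example record.

```python
import json

# Assumption: "32 MB" is interpreted as 32 * 1024 * 1024 bytes.
MAX_RECORD_BYTES = 32 * 1024 * 1024


def stringify_values(value):
    """Recursively convert every scalar to a string, keeping objects and arrays."""
    if isinstance(value, dict):
        return {key: stringify_values(child) for key, child in value.items()}
    if isinstance(value, list):
        return [stringify_values(item) for item in value]
    if value is None:
        # The rules do not describe a string form for null, so refuse to guess.
        raise ValueError("null value found; drop the field or choose a string value")
    if isinstance(value, bool):
        # Assumption: booleans use the "true"/"false" form seen in the example.
        return "true" if value else "false"
    return str(value)


def to_ldjson_line(record):
    """Serialize one record as a single line of JSON with quoted values."""
    if not isinstance(record, dict):
        raise ValueError("each record must be a single JSON object, not an array")
    # json.dumps without indent emits one line and escapes \n and \r in strings.
    line = json.dumps(stringify_values(record), ensure_ascii=False)
    if len(line.encode("utf-8")) > MAX_RECORD_BYTES:
        raise ValueError("record exceeds the 32 MB limit")
    return line
```

Writing each returned line followed by a newline character produces a line-delimited JSON file; key naming is not checked here.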
Upload Limits
To protect the system from misuse, there are a number of limits in place on uploads. If any limit is exceeded, the upload will fail. If you need to evaluate whether these limits could impact you, or if you think you need to expand beyond them, please reach out to your Customer Success Director to discuss the next steps. The upload limits and restrictions are as follows:
- 50,000,000 records in a collection
- 1,000 fields (key names) in a record
- Field names are restricted to characters in [a-zA-Z0-9-_ ] and must be 80 characters or fewer
- The maximum depth of nesting for a given record cannot exceed 8 levels. In the following example, level1.level2.level3.nestedField represents the field nestedField at 3 levels deep, while level1.level2.level3.nestedObject.metadata represents another field, metadata, at 4 levels deep.
{
"level1": {
"level2": {
"level3": {
"nestedField": "value1",
"nestedObject": [
{
"metadata": "value2"
}
]
}
}
}
}
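The per-record limits above can be checked locally before uploading. The sketch below is illustrative only: walk_fields and check_limits are hypothetical names, the field limit is assumed to count distinct key names, and depth is counted as in the example above (a top-level field is at depth 0, and an array does not add a level by itself).

```python
import re

MAX_FIELDS = 1000        # fields (key names) in a record
MAX_NAME_LENGTH = 80     # characters in a field name
MAX_DEPTH = 8            # levels of nesting
FIELD_NAME = re.compile(r"[a-zA-Z0-9\-_ ]+")


def walk_fields(value, depth=0):
    """Yield (field_name, depth) for every key found in a record."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield key, depth
            yield from walk_fields(child, depth + 1)
    elif isinstance(value, list):
        # Assumption: arrays are transparent, so their items keep the same depth.
        for item in value:
            yield from walk_fields(item, depth)


def check_limits(record):
    """Return a list of human-readable problems; an empty list means no limit was hit."""
    problems = []
    fields = list(walk_fields(record))
    names = {name for name, _ in fields}
    # Assumption: the 1,000-field limit applies to distinct key names.
    if len(names) > MAX_FIELDS:
        problems.append(f"{len(names)} field names exceed the limit of {MAX_FIELDS}")
    for name in sorted(names):
        if len(name) > MAX_NAME_LENGTH or not FIELD_NAME.fullmatch(name):
            problems.append(f"invalid field name: {name!r}")
    deepest = max((depth for _, depth in fields), default=0)
    if deepest > MAX_DEPTH:
        problems.append(f"nesting depth {deepest} exceeds the limit of {MAX_DEPTH}")
    return problems
```

For the nested example above, walk_fields reports nestedField at depth 3 and metadata at depth 4, and check_limits returns an empty list. The collection-wide limit of 50,000,000 records is not covered, since it cannot be judged from a single record.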
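A record that is not valid JSON is reported as an error. In the following example, the first record is missing its closing brace:
{ "id":"1", "title":"Pet Rock", "price":"0.99", "priceRange":"Under 1 dollar", "tag":"toy", "categoryId":"4"
{ "id":"2", "title":"Barbie", "price":"122.99", "priceRange":"Under 150 dollars", "tag":"toy", "categoryId":"5B"}
The resulting error message identifies the offending line: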
Error on line 1: Unexpected end-of-input: expected close marker for OBJECT (from
[Source: { "id":"1", "title":"Pet Rock", "price":"0.99", "priceRange":"Under 1 dollar", "tag":"toy", "categoryId":"4"; line: 1, column: 0])
at [Source: { "id":"1", "title":"Pet Rock", "price":"0.99", "priceRange":"Under 1 dollar", "tag":"toy", "categoryId":"4"; line: 1, column: 227]
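Errors like this one can be caught before uploading by parsing the file one line at a time. The following Python sketch assumes nothing about the upload service itself; find_invalid_lines is a hypothetical helper, and its messages come from Python's json module, so they will not match the wording of the engine's own errors.

```python
import json


def find_invalid_lines(lines):
    """Return (line_number, message) for each line that is not a single JSON object."""
    errors = []
    for number, line in enumerate(lines, start=1):
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            # exc.msg is the parser's short description of what went wrong.
            errors.append((number, exc.msg))
            continue
        if not isinstance(record, dict):
            errors.append((number, "record is not a JSON object"))
    return errors
```

Passing an open text file (or any iterable of lines) to find_invalid_lines returns an empty list when every line parses as a JSON object. For the two records shown above, only line 1 is reported.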
We support CSV and XML formats, but they may not be compatible with all of our features. We advise you to check with your project lead before using either format to understand the roadmap for our support.
Data can be gzipped before uploading to improve upload speed and reduce the size of the data file. The Linux gzip utility should be used to compress data. As shown in the example below, gzipped data must have a supported file type extension, followed by the .gz extension.
curl -X POST \
--form "config=@uploadConfig.txt" \
--form "data=@fullData.json.gz" \
"https://{$$.env.customerid}.groupbycloud.com/data/v1/upload/stream"