Web pivot tables by ElasticSearch Data Source Setup


Cube schema is specified in the appsettings.json file PivotDataService:Cubes section.

Cube config for ElasticSearch should have "SourceType": "elasticsearch" and an additional "SourceElasticSearch" section:

{
  "Id": "elastic-bank-accounts",
  "Name": "Bank Accounts from sample ElasticSearch dataset",
  "SourceType": "elasticsearch",
  "SourceElasticSearch": {
    "ConnectionUrl": "https://ReadOnly:nxxr8qt7n26c7jwijhwg1ipnarf402yn@thorin-us-east-1.searchly.com",
    "Index": "bank",
	// optional: define filtering conditions to limit number of documents to scan
	//"FilterRelex": " timestamp >= "2019-01-01":datetime "
  },
  "InferSchema": true,  /* dimensions and measures are determined automatically by first N documents */
  "Dimensions": [],
  "Measures": [],
  "Parameters": []
}

"ConnectionURL" specifies base URL of your ElasticSearch API.
"Index" specifies index name to query; it can contain wildcards if you want to search in multiple indexes, for example: logs-myservice-*.
"FilterRelex" allows to specify filtering condition with syntax used in NReco.Data relex.

NOTE: "InferSchema" makes additional query to determine columns and their types. In production use it is better to specify list of dimensions and measures explicitely to avoid some overhead (startup delay) caused by probe query.

Dimensions

In case of ElasticSearch data source dimension name refers to the document or sub-document field name:

  "Dimensions": [
    {
      "Name": "name.keyword",  /* don't forget to specify '.keyword' suffix for text fields */
      "LabelText": "Name"
    },
    {
      "Name": "address.street.keyword",  /* use dot to specify sub-document field */
      "LabelText": "Street"
    }
  ]

It is possible to specify script field by providing "Params" array:

    {
      "Name" : "registered_year",
      "Params": ["doc[\"registered\"].empty ? null : doc[\"registered\"].date.year", "number"]
    }

First parameter is an inline script expression (ElasticSearch "painless" syntax), and second one determines result type ("number" or "string").

Instead of script you can also specify custom ES bucket aggregation JSON:

    {
      "Name" : "age_ranges",
      "Params": ["{\"histogram\":{\"field\":\"age\",\"interval\":5}}"]
    }

Measures

First measure parameter for types Sum, Average, Min, Max should refer to the document field:

  "Measures": [
     {
       "Type": "Average",
       "Params": [ "age" ],
       "LabelText": "Average Age"
     }
]

Also it is possible to define custom ES metric aggregation type with FirstValue type:

     {
       "Type": "FirstValue",
       "Params": ["{\"extended_stats\":{\"field\":\"balance\"}}", "std_deviation"],
	   "Name": "StdDevOfBalance",
       "LabelText": "StdDev of Balance"
     }

where in "Params" first value is an aggregation type JSON, and second value is a name of metric value in ES result (used in case of multi-value metric aggregation).

Parameters

Report parameter is used when you need to specify some filtering condition by user-defined vari able and use it in the "FilterRelex" condition:

  "Parameters": [
    {
      "Name": "filterByName",
      "LabelText": "Name",
      "DataType": "string"
    }
  ]

When parameter is defined it can be used in "FilterRelex" as following:

"1"="1" @filterByName[ and name.keyword="{0}":var ]

@filterByName[ ] defines parameter placeholder: everything between [ ] is added to the condition expression only if parameter is defined. "{0}":var refers to use parameter value.