Data Cube Basics PivotData C# examples


PivotData library implements in-memory multidimensional data structure (OLAP cube) for fast data aggregation and grouping in C# code. You may thought about it as lightweight embedded OLAP engine that doesn't need OLAP server or MDX queries.
SDK is divided into two parts:

OLAP library
NReco.PivotData.dll
Generic data cube implementation (PivotData class), aggregate functions, in-memory OLAP operations (SliceQuery class).
Toolkit components
NReco.PivotData.Extensions.dll
Advanced components: connectors to SQL/CSV/JSON data sources, pivot table HTML rendering, exports, formatting and calculations (percentage, difference, running total, heatmap), factory components needed for web pivot table builder functionality and many others.

What is a data cube

Formally data cube is a simple relation between complex key (represented by N dimension keys) and the aggregate function result:

(D1, …, DN) → M

This relation can be built as result of grouping tabular data by N columns and applying aggregate function to the rows inside groups:

Fact table (input dataset)
Columns
Product Country Year Amount
Product1 USA 2015 500
Product1 Canada 2015 250
Product1 Canada 2014 310
Data cube
Dimensions (N=2) Measures (Aggregators)
Product Country Rows Count Sum of Amount
Product1 USA 1 500
Product1 Canada 2 560

Data cube is configured by the list of dimensions (that correspond to the input dataset columns) and aggregate function(s).

Cubes can answer on the analytical queries like "how many" / "how much"; for example, sample cube from above can answer on questions:

  • How many orders in Canada? (2)
  • How much orders' amount for Product1? (1060)

OLAP cubes can calculate values for pivot tables (cross-tabs), pivot charts: financial/logistics/sales/marketing reports, cohort analysis reports, KPI BI dashboards etc.

OLAP Library API Overview

Class/Interface Purpose
PivotData generic IPivotData implementation of in-memory data cube. Can aggregate data from any IEnumberable source or IDataReader.
IAggregatorFactory represents summary function. Used as parameter for PivotData.
SliceQuery Performs OLAP operations as transformation of input IPivotData to new PivotData.
PivotTable Pivot table data model (rows x columns) based on underlying IPivotData data cube.
PivotDataState Represents IPivotData instance serializable data, implements fast binary serialization/deserialization.

How to build a data cube

PivotData is configured with the list of dimensions and aggregate function:

var cube = new PivotData(new[] {"Product", "Country" }, new CountAggregatorFactory() );

Several measures may be collected in one cube with help of CompositeAggregatorFactory:

var cube = new PivotData(
  new[] {"Product", "Country" },         // list of input columns/fields for dimensions  
  new CompositeAggregatorFactory(
    new CountAggregatorFactory(),        // count is a measure #0
    new SumAggregatorFactory("Amount")   // sum of amount is a measure #1 
  ) );

It is possible to perform data aggregation with ProcessData method:

DataTable ordersTable; // this is our sample facts table
cube.ProcessData( new DataTableReader(ordersTable) );  // accepts any IDataReader implementation

It processed input data as a stream, and suitable for handling datasets with millions of records. ProcessData has an overload that accepts any IEnumerable facts source; in this case in addition to the collection it needs special values accessor delegate. You can call ProcessData several times if needed (this is useful for updating the cube with new portion of the facts).

NOTE: it is possible to perform data aggregation outside PivotData library (say, with database query in ROLAP way) and use aggregation results to initialize PivotData instance: how to load pre-aggregated data.

Accessing cube values

Cube values are represented by IAggregator objects and accessed with indexer by dimension keys:

IAggregator m = cube["Product1", "USA"]; // number of keys should match cube dimensions
object mValues = m.Value;                // type of m.Value depends on the aggregate function
var countValue = m.AsComposite().Aggregators[0].Value;  // access composite aggregator value #0
var sumValue = m.AsComposite().Aggregators[1].Value;	// access composite aggregator value #1

IAggregator.Value property returns a value of the aggregator and IAggregator.Count returns a number of aggregated objects.

Key.Empty is used to roll-up by some dimension(s) and get the sub-total value:

var product1Aggr = cube["Product1", Key.Empty];
var grandTotalAggr = cube[Key.Empty,Key.Empty];

PivotData implements IEnumerable and all (D,V) pairs may be iterated with a simple foreach:

foreach (var entry in cube) {
	// entry.Key holds array of dimension keys
	var measure = entry.Value; // implements IAggregator interface
	Console.WriteLine("D1={0} D2={1} V={2}", entry.Key[0], entry.Key[1], measure.Value); 
}

It is possible to get all dimension keys collected by PivotData with GetDimensionKeys method:

var allKeys = cube.GetDimensionKeys(); // returns array of keys for each dimension
var productKeys = allKeys[0]; // dimension at index=0 is "Product"
var countryKeys = allKeys[1]; // dimension at index=1 is "Country"

Next section: List of aggregate functions