Data Cube Basics PivotData C# examples

PivotData library implements in-memory multidimensional data structure (OLAP cube) for fast data aggregation and grouping in C# code. You may thought about it as lightweight embedded OLAP engine that doesn't need OLAP server or MDX queries.
SDK is divided into two parts:

OLAP library
NReco.PivotData.dll: Generic data cube implementation (PivotData class), aggregate functions, in-memory OLAP operations (SliceQuery class).
Toolkit components
NReco.PivotData.Extensions.dll: Advanced components: connectors to SQL/CSV/JSON data sources, pivot table HTML rendering, exports, formatting and calculations (percentage, difference, running total, heatmap), factory components needed for web pivot table builder functionality and many others.

What is a data cube

Formally data cube is a simple relation between complex key (represented by N dimension keys) and the aggregate function result:

(D₁, …, D_N) → M

This relation can be built as result of grouping tabular data by N columns and applying aggregate function to the rows inside groups:

Fact table (input dataset)

Columns
Product	Country	Year	Amount
Product1	USA	2015	500
Product1	Canada	2015	250
Product1	Canada	2014	310

→

Data cube

Dimensions (N=2)		Measures (Aggregators)
Product	Country	Rows Count	Sum of Amount
Product1	USA	1	500
Product1	Canada	2	560

Data cube is configured by the list of dimensions (that correspond to the input dataset columns) and aggregate function(s).

Cubes can answer on the analytical queries like "how many" / "how much"; for example, sample cube from above can answer on questions:

How many orders in Canada? (2)
How much orders' amount for Product1? (1060)

OLAP cubes can calculate values for pivot tables (cross-tabs), pivot charts: financial/logistics/sales/marketing reports, cohort analysis reports, KPI BI dashboards etc.

OLAP Library API Overview

Class/Interface	Purpose
`PivotData`	generic `IPivotData` implementation of in-memory data cube. Can aggregate data from any `IEnumberable` source or `IDataReader`.
`IAggregatorFactory`	represents summary function. Used as parameter for `PivotData`.
`SliceQuery`	Performs OLAP operations as transformation of input `IPivotData` to new `PivotData`.
`PivotTable`	Pivot table data model (rows x columns) based on underlying `IPivotData` data cube.
`PivotDataState`	Represents `IPivotData` instance serializable data, implements fast binary serialization/deserialization.

How to build a data cube

PivotData is configured with the list of dimensions and aggregate function:

var cube = new PivotData(new[] {"Product", "Country" }, new CountAggregatorFactory() );

Several measures may be collected in one cube with help of CompositeAggregatorFactory:

var cube = new PivotData(
  new[] {"Product", "Country" },         // list of input columns/fields for dimensions  
  new CompositeAggregatorFactory(
    new CountAggregatorFactory(),        // count is a measure #0
    new SumAggregatorFactory("Amount")   // sum of amount is a measure #1 
  ) );

It is possible to perform data aggregation with ProcessData method:

DataTable ordersTable; // this is our sample facts table
cube.ProcessData( new DataTableReader(ordersTable) );  // accepts any IDataReader implementation

It processed input data as a stream, and suitable for handling datasets with millions of records. ProcessData has an overload that accepts any IEnumerable facts source; in this case in addition to the collection it needs special values accessor delegate. You can call ProcessData several times if needed (this is useful for updating the cube with new portion of the facts).

NOTE: it is possible to perform data aggregation outside PivotData library (say, with database query in ROLAP way) and use aggregation results to initialize PivotData instance: how to load pre-aggregated data.

Accessing cube values

Cube values are represented by IAggregator objects and accessed with indexer by dimension keys:

IAggregator m = cube["Product1", "USA"]; // number of keys should match cube dimensions
object mValues = m.Value;                // type of m.Value depends on the aggregate function
var countValue = m.AsComposite().Aggregators[0].Value;  // access composite aggregator value #0
var sumValue = m.AsComposite().Aggregators[1].Value;	// access composite aggregator value #1

IAggregator.Value property returns a value of the aggregator and IAggregator.Count returns a number of aggregated objects.

Key.Empty is used to roll-up by some dimension(s) and get the sub-total value:

var product1Aggr = cube["Product1", Key.Empty];
var grandTotalAggr = cube[Key.Empty,Key.Empty];

PivotData implements IEnumerable and all (D,V) pairs may be iterated with a simple foreach:

foreach (var entry in cube) {
	// entry.Key holds array of dimension keys
	var measure = entry.Value; // implements IAggregator interface
	Console.WriteLine("D1={0} D2={1} V={2}", entry.Key[0], entry.Key[1], measure.Value); 
}

It is possible to get all dimension keys collected by PivotData with GetDimensionKeys method:

var allKeys = cube.GetDimensionKeys(); // returns array of keys for each dimension
var productKeys = allKeys[0]; // dimension at index=0 is "Product"
var countryKeys = allKeys[1]; // dimension at index=1 is "Country"

Next section: List of aggregate functions