Data Cube Basics PivotData C# examples


PivotData library implements in-memory multidimensional data structure (OLAP cube) for fast (real-time) data aggregation and grouping in C# code. You may thought about it as lightweight embedded OLAP engine that doesn't need OLAP server or MDX queries.

What is a data cube

Formally data cube is a simple relation between complex key (represented by N dimension keys) and the aggregate function result:
(D1, …, DN) → M

This relation can be built as result of grouping tabular data by N columns and applying aggregate function to the rows inside groups:

Fact table (orders)
Columns
Product Country Year Amount
Product1 USA 2015 500
Product1 Canada 2015 250
Product1 Canada 2014 310
Data cube
Dimensions (N=2) Measures (Aggregators)
Product Country Rows Count Sum of Amount
Product1 USA 1 500
Product1 Canada 2 560

Data cube is configured by the list of dimensions (that correspond to the base table columns) and aggregate function. Aggregate function may be complex and collect several measures (for example, rows count and sum of the some column).

Dimensions might be organized in the hierarchies to reflect parent-child relationships (for example, "Country" / "City" or "Year" / "Quarter" / "Month" / "Day"). Hierarchies are mainly used for vizualization purposes; since these relationships are not important for implementing cube operations PivotData doesn't need them.

Cubes can answer on the analytical queries like "how many" / "how much" without processing facts table on every request; for example, sample cube from above can answer on questions:

  • How many orders in Canada? (2)
  • How much orders' amount for Product1? (1060)

OLAP cubes are very useful for building pivot tables (crosstabs), pivot charts, calculating subtotals in financial reports and KPI metrics in dashboards.

How to create a cube with PivotData

Generic implementation of the data cube is provided by PivotData class; dimensions are configured by the list of base table column names and aggregate function is defined with factory component that implements IAggregatorFactory:

var cube = new PivotData(new[] {"Product", "Country" }, new CountAggregatorFactory() );

Several measures may be collected with help of CompositeAggregatorFactory:

var cube = new PivotData(
  new[] {"Product", "Country" },         // list of input columns/fields for dimensions  
  new CompositeAggregatorFactory(
    new CountAggregatorFactory(),        // count is a measure #0
    new SumAggregatorFactory("Amount")   // sum of amount is a measure #1 
  ) );

Once PivotData instance is created it can compute measures by facts data (collection, table, data reader etc) with ProcessData method:

DataTable ordersTable; // this is our sample facts table
cube.ProcessData( new DataTableReader(ordersTable) );  // accepts any IDataReader implementation

ProcessData has an overload that accepts any IEnumerable facts source; in this case in addition to the collection it needs special values accessor delegate. You can call ProcessData several times if needed (this is useful for updating the cube in the real-time with new portion of the facts).

Accessing cube values

Cube values are represented by IAggregator objects and accessed with indexer by dimension keys:

IAggregator m = cube["Product1", "USA"]; // number of keys should match cube dimensions
object mValues = m.Value;                // type of m.Value depends on the aggregate function
var countValue = m.AsComposite().Aggregators[0].Value;  // access composite aggregator value #0
var sumValue = m.AsComposite().Aggregators[1].Value;	// access composite aggregator value #1

IAggregator.Value property returns a value of the aggregator and IAggregator.Count returns a number of aggregated source objects.

Key.Empty is used to roll-up by some dimension(s) and get the sub-total value:

var product1Aggr = cube["Product1", Key.Empty];
var grandTotalAggr = cube[Key.Empty,Key.Empty];

PivotData implements IEnumerable and all (D, V) pairs may be iterated with a simple foreach:

foreach (var entry in cube) {
	// entry.Key holds array of dimension keys
	var measure = entry.Value; // implements IAggregator interface
	Console.WriteLine("D1={0} D2={1} V={2}", entry.Key[0], entry.Key[1], measure.Value); 
}

It is possible to get all dimension keys collected by PivotData with GetDimensionKeys method:

var allKeys = cube.GetDimensionKeys(); // returns array of keys for each dimension
var productKeys = allKeys[0]; // dimension at index=0 is "Product"
var countryKeys = allKeys[1]; // dimension at index=1 is "Country"

Aggregate functions

PivotData library implements the following factory components for basic aggregate functions:

Factory component Type of Value Description Example
CompositeAggregatorFactory object[] Used for collecting several measures; accepts list of aggregator factories.
new CompositeAggregatorFactory(
  new CountAggregatorFactory(),
  new SumAggregatorFactory("amount")
)
CountAggregatorFactory uint Counts number of rows from facts table
new CountAggregatorFactory()
CountUniqueAggregatorFactory int Counts number of unique values from specified facts table column
new CountUniqueAggregatorFactory(
  "company_title")
SumAggregatorFactory decimal Computes the sum of the specified column values from facts table
new SumAggregatorFactory("amount")
MinAggregatorFactory (type of specified field) Computes the minimum of the specified column values from facts table
new MinAggregatorFactory("age")
MaxAggregatorFactory (type of specified field) Computes the maximum of the specified column values from facts table
new MaxAggregatorFactory("age")
AverageAggregatorFactory decimal Computes the average of the specified column values from facts table
new AverageAggregatorFactory("age")
ListUniqueAggregatorFactory ICollection Collects only unique values from specified facts table column
new ListUniqueAggregatorFactory("age")
ListAggregatorFactory ICollection Collects all references (or values if field is specified) to the source fact objects (like LINQ GroupBy).
new ListAggregatorFactory()
new ListAggregatorFactory("name")
VarianceAggregatorFactory double Computes mean/variance/sample variance/standard deviation/sample std-dev
new VarianceAggregatorFactory(
  "exchange_rate", 
  VarianceAggregatorValueType.SampleStdDevValue) )
FormulaAggregatorFactory (depends on formula result) Compute custom formula value by arguments of specified aggregators
new FormulaAggregatorFactory(
 "Actual/Estimated", 
 (argValues) => {
  var actual = Convert.ToDecimal(argValues[0]);
  var estimated = Convert.ToDecimal(argValues[1]);
  return actual/estimated;
 },
 new [] {
  new SumAggregatorFactory("actual"),
  new SumAggregatorFactory("estimated")
 } )

It is possible to use custom aggregator (single-pass summary function) by implementing IAggregatorFactory/IAggregator interfaces.


Next section: OLAP operations (query and filter)