PivotData Tool: a utility for generating pivot tables from CSV/TSV files or a SQL database


FREE Download

Need to create pivot table reports from C# code? Check out PivotData Toolkit for .NET.

PivotData is a command line utility that aggregates data from a CSV/TSV file or SQL database and generates pretty pivot table reports (HTML/Excel/CSV/PDF) without an OLAP server or Excel. It is well suited for automated and scheduled crosstab report generation.
Using PivotData Tool does not require any programming skills.

Installation

  • Windows: no special installation procedure is needed; pivotdata.exe is a .NET console utility that requires Microsoft .NET Framework 4.0 (or higher). In most cases it is already installed if you have Win7SP1/Win8.
  • Linux/MacOS: please contact us if you are interested in a PivotData Tool version for these platforms

Features

PivotData Tool can:
  • aggregate large datasets that Excel cannot handle (CSV files >1 GB are processed in several minutes)
  • load data directly from MS SQL (or any other DB that has an ODBC driver)
  • load pre-aggregated data (for example result of SQL GROUP BY query)
  • parse input values (dates, numbers, custom regex rules)
  • calculate derived values and formulas
  • perform data cube slicing and filtering
  • save data cube for fast multiple reports generation
  • generate 2-dimensional pivot table reports (HTML, CSV, Excel; PDF can be produced from the HTML output with wkhtmltopdf)
  • upload data cube to URL (useful for refreshing data in SemanticTool)

Synopsis command line options

pivotdata -i inputHandler inputOptions [dataOptions] -o outputHandler outputOptions

Any pivotdata task performs two basic steps:
  • build an in-memory data cube structure (the result of data aggregation, or of loading previously aggregated results)
  • do something with the data cube and produce valuable output (such as a pivot table report)
The process can be described by the following diagram:
 ______       ________       _________       ________       ______
|      |     |applying|     |PivotData|     |applying|     |      |
|input | --> |mapping | --> |in-memory| --> |slicing | --> |output|
|source|     |handlers|     |  cube   |     |filters |     |result|
|______|     |________|     |_________|     |________|     |______|
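For example, a complete run that covers all stages of the diagram (parse a CSV file, derive a year column, aggregate, render an HTML report) could look like this; the file and column names are hypothetical:

```shell
# Hypothetical end-to-end run: input source -> mapping -> in-memory cube -> output result.
pivotdata -i csv orders.csv -m "{'OrderDate':{Type:'datetime'},'Total':{Type:'double'}}" -m "{'OrderYear':{Type:'getyear',Params:['OrderDate']}}" -p "{Dimensions:['OrderYear','Country'],Aggregators:[{Name:'Sum',Params:['Total']}]}" -t "{Rows:['OrderYear'],Columns:['Country']}" -o pivottablehtml report.html
```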

Input Handlers input sources for building in-memory data cube

List of supported input handlers (-i option):
-i cubefile <cube_file>
load a data cube from a serialized data file (a pair of .cfg and .dat files). The input file name may have the extension (.cfg) or no extension at all (in that case .cfg is appended automatically). For example:
pivotdata -i cubefile myCube.cfg
This handler doesn't need the cube configuration option (-p) because it is already present in the .cfg file; mapping handlers (-m) are also not applicable with the "cubefile" input handler.
-i csv {csv_file} [{csv_options_json}] -p {pivot_data_cube_config_json}
aggregate data from a CSV file
{csv_file} is a path to a local file or "-" (read CSV data from stdin)
{csv_options_json} (optional) is a JSON object with CSV reader settings:
Delimiter
string used as separator in CSV file (optional, ',' by default)
HasHeaderRecord
true|false (optional, true by default)
Headers
array of CSV column headers (optional, used when CSV doesn't have header row)
IgnoreBlankLines
true|false (optional, true by default)
TrimHeaders
true|false (optional, false by default)
RowsLimit
integer: max number of rows to read (optional). Useful for testing purposes with large input files.
Encoding
encoding name (optional, UTF-8 by default). Possible values are returned by the Encoding.GetEncodings method.
The pivot data cube definition -p {pivot_data_cube_config_json} is required for the 'csv' input handler.
Example:
pivotdata -i csv myData.csv "{Delimiter:';'}" -p "{Dimensions:['Contact Name', 'Amount', 'Pay Date'],Aggregators:[{Name:'Sum',Params:['Amount']}]}"
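If the CSV file has no header row, column names can be supplied via Headers, and RowsLimit is handy for a quick test run. A sketch with a hypothetical file name:

```shell
# Headerless CSV: supply column names explicitly and read only the first 1000 rows.
pivotdata -i csv payments.csv "{HasHeaderRecord:false,Headers:['Contact Name','Amount','Pay Date'],RowsLimit:1000}" -p "{Dimensions:['Contact Name'],Aggregators:[{Name:'Sum',Params:['Amount']}]}" -o pivottablecsv -
```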
-i <mssql|odbc> {db_source_options_json} -p {pivot_data_cube_config_json}
aggregate data returned by a SELECT statement using the MSSQL/AzureSQL (mssql) or ODBC (odbc) driver.
The cube configuration -p {pivot_data_cube_config_json} is required for this input handler. {db_source_options_json} is a JSON object with DB reader settings:
Connection
DB connection string
SelectSql
SELECT statement that returns data to aggregate
Example:
pivotdata -i mssql "{Connection:'Data Source=localhost\\SQLEXPRESS;Database=AdventureWorks2014;User=usr;password=pwd;',SelectSql:'select ProductID,OrderQty,UnitPrice from Sales.SalesOrderDetail'}" -p "{Dimensions:['ProductID','OrderQty'],Aggregators:[{Name:'Sum',Params:['UnitPrice']}]}" -o cubefile AdvWorksSalesCube
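An ODBC source works the same way, only the connection string changes. A sketch with a hypothetical driver name, database, and credentials:

```shell
# Hypothetical ODBC source (driver name, schema and credentials are placeholders).
pivotdata -i odbc "{Connection:'Driver={MySQL ODBC 8.0 Driver};Server=localhost;Database=sales;Uid=usr;Pwd=pwd;',SelectSql:'SELECT product_id, qty, price FROM order_details'}" -p "{Dimensions:['product_id','qty'],Aggregators:[{Name:'Sum',Params:['price']}]}" -o cubefile SalesCube
```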

Data Options data mapping rules, slicing and filtering

-p {pivot_data_cube_config_json}
Define the multidimensional dataset configuration for the input data source. Described by a JSON object:
Dimensions
array of column names from the input source, e.g.: ['Contact Name', 'Amount', 'Pay Date']
Aggregators
array of data cube value aggregators. Supported aggregators:
{Name:'Count'} Number of matched rows
{Name:'Sum',Params:['Salary']} Sum of 'Salary' column values
{Name:'Average',Params:['Salary']} Average of 'Salary' column values
{Name:'Min',Params:['Salary']} Minimum of 'Salary' column values
{Name:'Max',Params:['Salary']} Maximum of 'Salary' column values
{Name:'CountUnique',Params:['Contact Name']} Number of unique 'Contact Name' column values
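A cube may define several aggregators at once; their zero-based positions in the Aggregators array are what the AggregatorIndexes (-s) and AggregatorIndex (-t) options refer to. A sketch with hypothetical file and column names:

```shell
# A cube with two aggregators: index 0 is Count, index 1 is Sum of 'Amount'.
pivotdata -i csv payments.csv -p "{Dimensions:['Contact Name','Pay Date'],Aggregators:[{Name:'Count'},{Name:'Sum',Params:['Amount']}]}" -o cubefile PaymentsCube
```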
-m {mapping_config_json}
Apply value mapping rules to each row of the input data source. This handler can be used for parsing string values (dates, numbers) and for defining derived values (year/month/quarter/day date parts, formatted values). Mapping is defined by a JSON object of the form "column_name" => {mapping handler descriptor}, for example:
{ 'create_date' : {Type:'datetime'} }
where 'create_date' is a column that should be parsed as DateTime value.
List of supported mapping handlers:
{Type:'datetime'} Parses input string as DateTime value
{Type:'integer'} Parses input string as integer value
{Type:'double'} Parses input string as floating-point value
{Type:'decimal'} Parses input string as decimal value
{Type:'format',Params:['Year: {0}', 'year_column_name' ]} Formats a string value; the first parameter is a format specifier (acceptable by .NET String.Format) and the remaining parameters are column names used for getting format arguments. It is possible to use more than one argument.
{Type:'getyear',Params:['date_column']} Returns year component of the specified column's date
{Type:'getmonthnumber',Params:['date_column']} Returns month component of the specified column's date
{Type:'getquarter',Params:['date_column']} Returns quarter (1-4) component of the specified column's date
{Type:'getdaynumber',Params:['date_column']} Returns day component of the specified column's date
{Type:'getdateonly',Params:['datetime_column']} Returns date component (ignore time part) of the specified column's datetime
{Type:'regexismatch',Params:['regular_expression','input_value_column']} Returns test result of .NET regular expression match (true or false)
{Type:'regexmatch',Params:['regular_expression','input_value_column']} Returns the matched value of a .NET regular expression. To return a group value use the name "value", for example: (?<value>.*)
{Type:'regexreplace',Params:['regular_expression','replacement','input_value_column']} Returns result of .NET regular expression replace (string)
{Type:'formula',Params:['expression']} Returns the evaluation result of a lambda expression. Row values can be accessed as Data["column"] or Row["column"]
{Type:'lowercase',Params:['input_value_column']} Returns the lowercase value.
Example of mapping configuration for parsing input values:
pivotdata -i {input_source} -m "{'OrderDate':{Type:'datetime'},'Total':{Type:'double'}}"
Note that it is possible to apply several mapping rules at once:
pivotdata -i {input_source} -m "{'OrderDate':{Type:'datetime'},'Total':{Type:'double'}}" -m "{'OrderYear':{Type:'getyear',Params:['OrderDate']}}"
(in this case mapping handlers are applied in the same order as they are specified)
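The 'formula' handler can combine parsed columns into a derived value; later mappings see the results of earlier ones. A sketch with hypothetical file and column names:

```shell
# Derived 'LineTotal' column computed with a lambda expression over two parsed columns.
pivotdata -i csv orders.csv -m "{'Qty':{Type:'integer'},'Price':{Type:'double'}}" -m "{'LineTotal':{Type:'formula',Params:['Data[""Qty""]*Data[""Price""]']}}" -p "{Dimensions:['Product'],Aggregators:[{Name:'Sum',Params:['LineTotal']}]}" -o pivottablecsv -
```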
-s {slice_query_json}
Transform the data cube according to the specified query.

{slice_query_json} is a JSON object:
Dimensions
array of dimensions to select
AggregatorIndexes
array of aggregator indexes to select
Filter
formula expression (string) for filtering datapoints
DerivedDimensions
dimension name → formula expression map (supports merging dimension keys by a regex rule or by a Damerau-Levenshtein distance criterion -- useful for handling data with typos)
Examples:
-s "{Dimensions:['Date Year'],AggregatorIndexes:[1], Filter:'Dimensions[""Date Year""]==2015'}"
-s "{DerivedDimensions: {'Model Family': 'Merge.ByRegex(Dimensions[""model""], ""(Model_Z1|Model_Z2)"", ""Model Z"")'}}"
-s "{DerivedDimensions: {'Status': 'Merge.ByDamerauLevenshteinDistance(Dimensions[""status""], new[] {""open"", ""closed"", ""in progress""})'}}"
-g <count field name>
Load input data as pre-grouped cube datapoints. This option is not applicable with '-i cubefile'.
<count field name> identifies the column with the count aggregate value (required).
Each input row should uniquely define a value key and aggregated values. This option is useful for aggregating cube data on the DB side with GROUP BY:
pivotdata -i mssql "{Connection:'Data Source=localhost\\SQLEXPRESS;Database=AdventureWorks2014;User=usr;password=pwd;',SelectSql:'select ProductID,OrderQty,SUM(UnitPrice) as UnitPrice,COUNT(UnitPrice) as cnt from Sales.SalesOrderDetail group by ProductID,OrderQty'}" -p "{Dimensions:['ProductID','OrderQty'],Aggregators:[{Name:'Sum',Params:['UnitPrice']}]}" -g cnt -o cubefile AdvWorksAggrOnDbSide
-t {pivot_table_report_config_json}
Define pivot table report configuration.
{pivot_table_report_config_json} is a JSON object:
Rows
array of dimensions that should be used for building rows of the pivot table
Columns
array of dimensions that should be used for building columns of the pivot table
AggregatorIndex
(optional) use aggregator at specified index for pivot table values. Applicable only if data cube has more than one aggregator.
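When a cube defines more than one aggregator, AggregatorIndex selects which one the report shows. A sketch assuming a previously saved cube file with a Count aggregator at index 0 and a Sum aggregator at index 1 (the cube file and dimension names are hypothetical):

```shell
# Report the Sum aggregator (index 1) of a two-aggregator cube.
pivotdata -i cubefile PaymentsCube -t "{Rows:['Pay Date'],Columns:['Contact Name'],AggregatorIndex:1}" -o pivottablehtml -
```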

Output Handlers results produced from in-memory data cube

List of supported output handlers (-o option):
-o cubefile <output_file>
write aggregated data to <output_file>. Two files are actually created: .cfg with the data cube configuration (JSON) and .dat with binary data. For example:
pivotdata -i {input_source_options} -o cubefile Payments_Aggregated
-o uploadpackage {config_json}
compose a compressed cube package (.zip file) and upload it to the specified HTTP endpoint. {config_json} is a JSON object with the following settings:
Name
Unique cube name (used as the file name for the zip package)
UploadUrl
HTTP or HTTPS endpoint URL
Username
HTTP Basic Authentication username (optional)
Password
HTTP Basic Authentication password (optional)
This output handler can be used for adding/updating SemanticTool data via its REST API:
-o uploadpackage "{Name:'Road Accidents',UploadUrl:'https://app.semantictool.net/api/v1/cubes/import/package?overwrite=true', Username:'<username>',Password:'<access_code>'}"
-o getvalue {value_key_json}
write the aggregated value for the specified dimension keys to stdout. {value_key_json} is a JSON array of dimension keys, for example:
pivotdata -i {input_source_options} -o getvalue "[2007,5]"
A totals value can be obtained by specifying a 'null' key for some or all dimensions. Example of getting the grand total value:
pivotdata -i {input_source_options} -o getvalue "[null,null]"
-o pivottablehtml <output_file>
generate a pivot table report and render it as an HTML table. The pivot table configuration (-t option) is required for this output handler.
<output_file> is a path to a local file or "-" (write to stdout)
Example:
pivotdata -i {input_source_options} -t "{Rows:['Year', 'Quarter'],Columns:['Ethnicity']}" -o pivottablehtml -
pivotdata -i {input_source_options} -t "{Rows:['Year'],Columns:[]}"  -o pivottablehtml MyPivotTableReport.html
-o pivottablecsv <output_file> [{csv_opts_json}]
generate a pivot table report in CSV format. The pivot table configuration (-t option) is required for this output handler.
<output_file> is a path to a local file or "-" (write to stdout)
Optional CSV settings {csv_opts_json} (JSON object):
CsvDelimiter
string used as CSV values separator
TotalsRow
(true|false) if true the totals row is rendered (true by default)
TotalsColumn
(true|false) if true the totals column is rendered (true by default)
GrandTotal
(true|false) if true the grand total is rendered (true by default)
RenderRowHeaderColumn
(true|false) if true row keys are rendered as CSV values (true by default)
Example:
pivotdata -i {input_source_options} -t "{Rows:['Year'],Columns:[]}" -o pivottablecsv report.csv "{CsvDelimiter:';'}"
-o pivottableexcel <output_file> [{excel_opts_json}]
generate a pivot table report in Excel format. The pivot table configuration (-t option) is required for this output handler.
<output_file> is a path to a local file or "-" (write to stdout)
Optional Excel settings {excel_opts_json} (JSON object):
WorkSheetName
title of the Excel worksheet with the pivot report data
Example:
pivotdata -i {input_source_options} -t "{Rows:['Year'],Columns:[]}" -o pivottableexcel report.xlsx "{WorkSheetName:'Sales 2015'}"

Global Options

-h
Display help
--silent
Be less verbose: do not show info messages
--debug
When an error occurs, show the full exception stacktrace (useful for bug reports)
--setculture <cultureName>
Explicitly set the culture used for parsing input values (dates, numbers). For example:
en-GB can parse dates in format: dd/MM/yyyy
en-US can parse dates in format: MM/dd/yyyy
de-DE can parse dates in format: dd.MM.yyyy
More details about .NET cultures and formatting rules: MSDN CultureInfo
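For instance, German-formatted input can be parsed by setting the de-DE culture; the file and column names below are hypothetical:

```shell
# Parse German-formatted dates and decimals (e.g. 31.12.2015 and 1.234,56).
pivotdata --setculture de-DE -i csv umsatz.csv -m "{'Datum':{Type:'datetime'},'Betrag':{Type:'double'}}" -p "{Dimensions:['Datum'],Aggregators:[{Name:'Sum',Params:['Betrag']}]}" -o pivottablecsv -
```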

Licensing Conditions

PivotData Tool can be used for FREE by end-users and inside companies for internal business purposes (like server-side automation). If you want to redistribute it externally as part of your own solution/product (ISV), you need to purchase a commercial license.