This tool has been designed to do balancing tests between two datasets efficiently and accurately, and with little hassle. In most cases a source data extract will be used to determine if the data structure and content of your QlikView data model is balancing back to the source system and that your analysis of the data is correct.
In this example we will be using 2 small files which will be easy to cross check manually by eye but will be very difficult to do on actual large datasets as indicated in below screenshots.
Data Set 1(This represents a data extract from the source):
Data Set 2(This represents the bi end result data needed to be balance checked). Note some differences has been highlighted which has been spotted manually by eye:
All the logic of how the tool works can be found in the load script of the ‘Data Balancing Tool.qvw’ model.
The only configuration you will require to do in the load script is to load the source(Dataset1) and compared(Dataset2) datasets into tables Dataset1 and Dataset2 as indicated in below examples. Please take note that you need to add the single column or concatenate the range of columns you need to compare (please note that if you are comparing a range of columns they need to be in the same sequence in both loads.) into column LINK. Below is an example of cross checking only the ID:
Dataset1:
NoConcatenate
//Replace below load statement with your source file
LOAD ID,
Country,
Branch,
EmployeeID,
ProductID,
Revenue,
[No Of Units],
Discount,
//Concatenate columns to cross balance as LINK field(The concatenation order of columns need to be the same for source and compared files.)
ID AS LINK
FROM
[C:\Work\Data Balancing Template\Data Set 1.xlsx]
(ooxml, embedded labels, table is Sheet1);
Dataset2:
NoConcatenate
//Replace below load statement with your file you want to have compared to your source extract.
LOAD ID,
Country,
Branch,
EmployeeID,
ProductID,
Revenue,
[No Of Units],
Discount,
//Concatenate columns to cross balance as LINK field(The concatenation order of columns need to be the same for source and compared files.)
ID AS LINK
FROM
[C:\Work\Data Balancing Template\Data Set 2.xlsx]
(ooxml, embedded labels, table is Sheet1);
Once Reload is completed you will be presented with a single table in the table viewer called Comparison:
In the frontend you will be presented with the following output (please note the comparison table needs to be updated with the column names according to the datasets you are comparing in the comparison Table.):
Selecting the Mismatch value in the Comparison Mismatches filter will return only the records that do not have matching LINK values
To return values in dataset 2 missing in dataset1 select the ‘Dataset1 Value Missing’ in the ‘Dataset1 has value’ filter
To return values in dataset 1 missing in dataset2 select the ‘Dataset2 Value Missing’ in the ‘Dataset1 has value’ filter
The Summary chart displays the following information
No Of Possible LINKS: Indicates total number of possible values in both files combined = 35
Links in Dataset1: Indicates number of links contributed from Dataset 1 = 30
Links in Dataset2: Indicates number of links contributed from Dataset 2 = 34
No Of Comparison Mismatches: Total number of values not matching = 6
Dataset1 values not in Dataset 2 = 1
Dataset2 values not in Dataset 1 = 5
You are also able to compare multiple columns or the whole record of the 2 datasets by concatenating the columns you wish to compare with each other into the LINK column
For this example you can just comment out all of the statements in the Single ID Column Compare tab of the load script and navigate to the Multiple Column Compare tab in the load script and uncomment the script and reload the example:
Dataset1:
NoConcatenate
//Replace below load statement with your source file
LOAD ID,
Country,
Branch,
EmployeeID,
ProductID,
Revenue,
[No Of Units],
Discount,
//Concatenate columns to cross balance as LINK field(The concatenation order of columns need to be the same for source and compared files.)
ID &
Country &
Branch &
EmployeeID &
ProductID &
Revenue &
[No Of Units] &
Discount AS LINK
FROM
[C:\Work\Data Balancing Template\Data Set 1.xlsx]
(ooxml, embedded labels, table is Sheet1);
Dataset2:
NoConcatenate
//Replace below load statement with your file you want to have compared to your source extract.
LOAD ID,
Country,
Branch,
EmployeeID,
ProductID,
Revenue,
[No Of Units],
Discount,
//Concatenate columns to cross balance as LINK field(The concatenation order of columns need to be the same for source and compared files.)
ID &
Country &
Branch &
EmployeeID &
ProductID &
Revenue &
[No Of Units] &
Discount AS LINK
FROM
[C:\Work\Data Balancing Template\Data Set 2.xlsx]
(ooxml, embedded labels, table is Sheet1);
For the Output of the above example please navigate to the Multiple Column Compare Tab
You will see that the comparison results now span over the whole dataset for a column to column comparison.
Take Note of how the LINK column changed in this example
Simply by selecting the ‘Mismatch’ value from the ‘Comparison Mismatches’ list box you will instantly have a list of all the records that have mismatch in either direction of the comparison between the 2 files in any of the columns. Please note that the Comparison chart needs to be manually updated with all the columns of the datasets you are comparing which I have done for the explanation of the tool. This can be done very simply without colour coding unless you will be comparing the 2 datasets on a regular basis in which case the tool can be reused once setup for the specific datasets. The summaries and filters will automatically populate without any intervention.