Testing data transformations is a bit complex, as it cannot be achieved by writing a single SQL query and then comparing the output with the target. For ETL data transformation testing, you may have to write multiple SQL queries for each row to verify the transformation rules.
To start with, make sure the source data is sufficient to test all the transformation rules. The key to performing successful ETL testing of data transformations is to pick correct and sufficient sample data from the source system to apply the transformation rules.
The key steps for ETL data transformation testing are listed below −
- The first step is to create a list of input data scenarios and their expected results and validate these with the business customer. This is a good approach for requirements gathering during design and can also be used as a part of testing.
- The next step is to create test data that contains all the scenarios. Have an ETL developer automate the entire process of populating the datasets from the scenario spreadsheet, so the test data can be regenerated easily, because the scenarios are likely to change.
- Next, use data profiling results to compare the range and distribution of values in each field between the source and target data.
- Validate the accurate processing of ETL-generated fields, e.g., surrogate keys.
- Validate that the data types within the warehouse are the same as those specified in the data model or design.
- Create data scenarios between tables that test referential integrity.
- Validate the parent-to-child relationships in the data.
- The final step is to validate the lookup transformation. The lookup query should be straightforward, without any aggregation, and should return only one value per row of the source table. You can directly join the lookup table in the source qualifier, as in the previous test. If this is not possible, write a query joining the lookup table with the main table in the source and compare the data in the corresponding columns in the target, as in the sketch after this list.
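A minimal sketch of the referential integrity and lookup checks described above, assuming hypothetical table and column names (src_orders, lkp_country, tgt_orders, tgt_customers) −

```sql
-- Referential integrity: child rows in the target whose parent key
-- does not exist in the parent table (orphan rows)
SELECT o.order_id, o.customer_id
FROM   tgt_orders o
LEFT JOIN tgt_customers c ON c.customer_id = o.customer_id
WHERE  c.customer_id IS NULL;

-- Lookup validation: join the lookup table with the main source table
-- and compare the derived value against what was loaded in the target
SELECT s.order_id,
       l.country_name AS expected_value,
       t.country_name AS loaded_value
FROM   src_orders s
JOIN   lkp_country l ON l.country_code = s.country_code
JOIN   tgt_orders  t ON t.order_id     = s.order_id
WHERE  t.country_name <> l.country_name;
```

Any rows returned by either query point to a transformation defect that needs investigation.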
ETL Testing – Data Quality
Checking data quality during ETL testing involves performing quality checks on data that is loaded in the target system. It includes the following tests −
Number Check
The number format should be the same across the target system. For example, if the source system formats a column number as x.30 but the target expects only 30, the value has to be loaded without the x. prefix in the target column.
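A minimal sketch of such a check, assuming a hypothetical target table tgt_accounts with a text column column_number that should not carry the source-style prefix −

```sql
-- Flag target rows where the column number was loaded with the
-- source-style "x." prefix instead of the plain numeric format
-- (table and column names are hypothetical)
SELECT account_id, column_number
FROM   tgt_accounts
WHERE  column_number LIKE 'x.%';
```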
Date Check
The date format should be consistent in both the source and the target systems. For example, it should be the same across all the records. The standard format is yyyy-mm-dd.
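If dates are held as text, a simple pattern check can flag records that break the standard format. This sketch assumes a hypothetical tgt_employees table with a hire_date column stored as text −

```sql
-- Flag records whose hire_date does not follow the yyyy-mm-dd pattern
-- (a stricter check would use the database's regular-expression support)
SELECT employee_id, hire_date
FROM   tgt_employees
WHERE  hire_date NOT LIKE '____-__-__';
```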
Precision Check
The precision value should display as expected in the target table. For example, in the source table, the value is 15.2323422, but in the target table, it should display as 15.23 or be rounded to 15.
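A minimal sketch of a precision check, assuming hypothetical src_transactions and tgt_transactions tables where the target is expected to hold two decimal places −

```sql
-- Compare the source value rounded to two decimals against the loaded value
SELECT s.txn_id,
       s.amount AS source_amount,
       t.amount AS target_amount
FROM   src_transactions s
JOIN   tgt_transactions t ON t.txn_id = s.txn_id
WHERE  ROUND(s.amount, 2) <> t.amount;
```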
Data Check
It involves checking the data as per the business requirement. The records that don’t meet certain criteria should be filtered out.
Example − Only those records whose date_id >= 2015 and Account_Id != ‘001’ should be loaded into the target table.
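A minimal sketch of this check, assuming hypothetical src_accounts and tgt_accounts tables with date_id and account_id columns −

```sql
-- Records in the target that violate the business filter
SELECT *
FROM   tgt_accounts
WHERE  date_id < 2015
   OR  account_id = '001';

-- Qualifying source records that were not loaded into the target
SELECT s.*
FROM   src_accounts s
LEFT JOIN tgt_accounts t
       ON  t.account_id = s.account_id
       AND t.date_id    = s.date_id
WHERE  s.date_id >= 2015
  AND  s.account_id <> '001'
  AND  t.account_id IS NULL;
```

Both queries should return zero rows if the filter rule was applied correctly.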
Null Check
Some columns should have a Null value, as per the requirement and the possible values for that field.
Example − The Termination Date column should display Null unless and until its Active Status column is “T” or “Deceased”.
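A minimal sketch of this Null check, assuming a hypothetical tgt_employees table with active_status and termination_date columns −

```sql
-- Termination_Date must be Null unless Active_Status is 'T' or 'Deceased'
SELECT employee_id, active_status, termination_date
FROM   tgt_employees
WHERE  termination_date IS NOT NULL
  AND  active_status NOT IN ('T', 'Deceased');
```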
Other Checks
Common checks, such as verifying that From_Date is not greater than To_Date, can also be done.
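As a minimal sketch, assuming a hypothetical tgt_contracts table with from_date and to_date columns −

```sql
-- From_Date must never be greater than To_Date
SELECT record_id, from_date, to_date
FROM   tgt_contracts
WHERE  from_date > to_date;
```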