ETL testing covers all the steps involved in an ETL lifecycle. It starts with understanding the business requirements till the generation of a summary report.
The common steps under ETL Testing lifecycle are listed below −
- Understanding the business requirement.
- Validation of the business requirement.
- Test Estimation is used to provide the estimated time to run test-cases and to complete the summary report.
- Test Planning involves finding the Testing technique based on the inputs as per business requirement.
- Creating test scenarios and test cases.
- Once the test-cases are ready and approved, the next step is to perform pre-execution check.
- Execute all the test-cases.
- The last step is to generate a complete summary report and file a closure process.
ETL Testing – Scenarios
ETL Test Scenarios are used to validate an ETL Testing Process. The following table explains some of the most common scenarios and test-cases that are used by ETL testers.
Test Scenarios | Test-Cases |
---|---|
Structure Validation | It involves validating the source and the target table structure as per the mapping document.Data type should be validated in the source and the target systems.The length of data types in the source and the target system should be same.Data field types and their format should be same in the source and the target system.Validating the column names in the target system. |
Validating Mapping document | It involves validating the mapping document to ensure all the information has been provided. The mapping document should have change log, maintain data types, length, transformation rules, etc. |
Validate Constraints | It involves validating the constraints and ensuring that they are applied on the expected tables. |
Data Consistency check | It involves checking the misuse of integrity constraints like Foreign Key.The length and data type of an attribute may vary in different tables, though their definition remains same at the semantic layer. |
Data Completeness Validation | It involves checking if all the data is loaded to the target system from the source system.Counting the number of records in the source and the target systems.Boundary value analysis.Validating the unique values of primary keys. |
Data Correctness Validation | It involves validating the values of data in the target system.Misspelled or inaccurate data is found in table.Null, Not Unique data is stored when you disable integrity constraint at the time of import. |
Data Transform validation | It involves creating a spreadsheet of scenarios for input values and expected results and then validating with end-users.Validating parent-child relationship in the data by creating scenarios.Using data profiling to compare the range of values in each field.Validating if the data types in the warehouse are same as mentioned in the data model. |
Data Quality Validation | It involves performing number check, date check, precision check, data check, Null check, etc.Example − Date format should be same for all the values. |
Null Validation | It involves checking the Null values where Not Null is mentioned for that field. |
Duplicate Validation | It involves validating duplicate values in the target system when data is coming from multiple columns from the source system.Validating primary keys and other columns if there is any duplicate values as per the business requirement. |
Date Validation check | Validating date field for various actions performed in ETL process.Common test-cases to perform Date validation −From_Date should not greater than To_DateFormat of date values should be proper.Date values should not have any junk values or null values |
Full Data Validation Minus Query | It involves validating full data set in the source and the target tables by using minus query.You need to perform both source minus target and target minus source.If the minus query returns a value, that should be considered as mismatching rows.You need to match the rows in source and target using the Intersect statement.The count returned by Intersect should match with the individual counts of source and target tables.If the minus query returns no rows and the count intersect is less than the source count or the target table count, then the table holds duplicate rows. |
Other Test Scenarios | Other Test scenarios can be to verify that the extraction process did not extract duplicate data from the source system.The testing team will maintain a list of SQL statements that are run to validate that no duplicate data have been extracted from the source systems. |
Data Cleaning | Unwanted data should be removed before loading the data to the staging area. |