I’m excited to announce that this last week I have released a new project (source code), which is hosted here. Its a free utility that generates SQL Server scripts to compare the results of any two queries. You then run that comparison script from your client (SQL Server Management Studio, Azure Data Studio…) and are presented with a detailed report of all identical or discrepant records. Notice, that’s queries we are comparing and not just tables or views. This allows for some interesting capabilities, which differ from how the major commercial data comparison programs work. A few use cases for this tool are outlined below, along with a basic usage example.
- Table or View Comparison - This is the bread and butter for any data comparison tool. Just plug in a select statement for any two tables or views, on the same or different databases or servers, and you can see matched, extra, missing and discrepant rows. You have probably made comparisons scripts like this by hand many times, and don’t have to anymore. An example of this type of comparison is shown below.
- Validating a Refactored Query - Safely refactoring complex queries for performance or clarity can be daunting. Once you introduce temp tables, CTEs and nested subqueries the code can become difficult to reason about. In fact, working on a complex refactoring was what originally motivated this project. So long as the output columns are the same after refactoring, then the queries will be comparable, regardless of the complexity of the syntax. The queries being compared can even be parameterized. You may find that doing this kind of refactoring aided by this tool, can save quite a bit of time, as you can iterate over changes to the test query and more easily find and resolve errors than by doing any visual comparison of results.
- Validating Schema Changes - I recently working on a change to a set of tables which were not fully normalized. As part of the change they needed to become normalized. As you may know de-normalized data can have discrepancies, which is why they are often avoided. For example, if a table of books stores author names instead of a foreign key to an authors table, then you can easily end up with inconsistencies if the author’s name was updated on one but not all of the book rows. So, when I was working on normalizing those tables I needed to confirm if the script to normalize the tables was not making unexpected changes. This turns out to be simple, by comparing a query that selected all the columns from all the affected tables in a production database, and another query selecting the now rearranged columns in a dev database.
A Simple Data Comparison
As a simple example, lets create two tables to compare. As you can see these have the same structure, but have some deliberately discrepant data.
To generate a comparison script:
- Create these tables and data in a test database.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 -- Sample Data Source Create Table Books ( Id int not null ,Author varchar(50) not null ,Title varchar(50) not null) Create Table SciFiBooks ( Id int not null ,Author varchar(50) not null ,Title varchar(50) not null) Insert into Books (Id, Author, Title) Values (1, 'Martha Wells', 'All Systems Red') ,(2, 'Terry Mancour', 'Spellmonger') ,(3, 'Yahtzee Croshaw', 'Will Save the Galaxy for Food') ,(4, 'Andrew Rowe', 'Sufficiently Advanced Magic') ,(5, 'Brandon Sanderson', 'The Emperor''s Soul') ,(7, 'Lois McMaster Bujold', 'The Vor Game') Insert into SciFiBooks (Id, Author, Title) Values (1, 'Martha Wells', 'All Systems Red') ,(3, 'Yahtzee Croshaw', 'Will Save the Galaxy for Food') ,(4, 'Andrew Rowe', 'Sufficiently Advanced Magic') ,(5, 'Lois McMaster Bujold', 'The Warrior''s Apprentice') ,(6, 'Larry Niven', 'Ringworld') ,(7, 'Lois McMaster Bujold', 'Ceteganda')
- Open the web app.
- Then paste the
TestSQL into the app.
1 2 3 4 5 6 -- Assert Select Id, Author, Title From Books -- Test Select Id, Author, Title From SciFiBooks
- Select the
Idcolumn as a key.
- Copy the comparison script.
- Run the query in the test database. The output below will be produced.
As we can see, the query has correctly found the following:
- 3 rows are identical.
- Spellmonger is in
- Ringworld is in
- Two rows have discrepancies. The column specific discrepancy tables
Discrepant__Titleshow one and two discrepancies respectively.
I will go into examples of more complex comparisons in later posts. For now, you can see examples in the usage guide.
If you have any feedback, issues, or requests, please visit the project on GitHub.