Open Testware Reviews

Data Comparator Survey

Copyright 2003 by Tejas Software Consulting - All rights reserved.

Reviewed: 2003-August-31
Testingfaqs.org category: Test Implementation Tools

Data comparators perform a simple but crucial task for testers. Working with an automated test execution tool, they can compare the actual program output against an expected results file, serving as the "oracle" that determines whether the test passes or fails. They can also be useful when doing manual testing, checking the changes to the system that a particular operation causes. Comparators are also used frequently in the context of configuration management tools, though that usage is outside the scope of what I'm reporting on here.
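To make the oracle idea concrete, here is a minimal Python sketch. It assumes the expected results have already been read into a string (in practice they would come from an expected results file), and the function name compare_to_expected is hypothetical. The test passes only on an exact match. On failure, the function returns a unified diff from Python's standard difflib module as the report.

```python
import difflib


def compare_to_expected(actual: str, expected: str) -> tuple[bool, str]:
    """Test oracle: pass only when actual output matches expected output exactly.

    Returns (passed, report); report is a unified diff when the test fails.
    """
    if actual == expected:
        return True, ""
    report = "".join(
        difflib.unified_diff(
            expected.splitlines(keepends=True),
            actual.splitlines(keepends=True),
            fromfile="expected",
            tofile="actual",
        )
    )
    return False, report


if __name__ == "__main__":
    passed, report = compare_to_expected("total: 4\nstatus: ok\n", "total: 3\nstatus: ok\n")
    print("PASS" if passed else "FAIL")
    print(report, end="")
```

An exact-match oracle is the strictest choice. Output that legitimately varies from run to run, such as timestamps, needs a more tolerant comparator.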

About data comparator tools

The simplest type of data comparator is the cmp tool, which reports the position (byte offset and line number) where two files first differ. Because it compares the files byte by byte, it works on binary files as well as ASCII text.
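The following Python sketch shows the kind of answer cmp gives. first_difference is a hypothetical helper, not part of any tool. It returns the 1-based byte offset and line number of the first difference, or None when the inputs are identical. Unlike cmp, it reads both inputs fully into memory. Where cmp would report EOF on the shorter file, this sketch returns the position just past the end of the shorter input.

```python
def first_difference(a: bytes, b: bytes):
    """Return (byte, line) of the first differing byte, both 1-based, or None.

    If one input is a prefix of the other, the position just past the end
    of the shorter input is returned (cmp itself reports EOF in that case).
    """
    line = 1
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i + 1, line
        if x == 0x0A:  # newline byte: the following bytes are on the next line
            line += 1
    if len(a) != len(b):
        return min(len(a), len(b)) + 1, line
    return None
```

To compare files, pass the contents read in binary mode, for example open(path, "rb").read().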

You can step up to the diff tool, which sets the standard for comparators. It is aimed at ASCII text (though it can report whether binary files differ on a yes/no basis), and it produces ASCII output showing the differing lines from each file. Diff works out which lines have been added and deleted, so that it can resynchronize on later blocks of text that still match. This minimizes the number of lines reported as different. Some implementation of diff comes preinstalled on every Unix platform and most Unix-like platforms, and I've heard rumors that you can find WinDiff somewhere on the Windows installation disk.
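A simplified Python sketch of the idea behind diff's minimized output: find a longest common subsequence (LCS) of lines, keep those lines as unchanged, and report everything else as a deletion or an insertion. This is a textbook dynamic-programming approach that uses O(n*m) memory. It is not GNU diff's actual algorithm, which is based on Myers' O(ND) difference algorithm plus additional heuristics. The name lcs_diff and its output prefixes are illustrative choices.

```python
def lcs_diff(old: list[str], new: list[str]) -> list[str]:
    """Line diff built on a longest common subsequence (LCS) table.

    Unchanged lines are prefixed with "  ", deletions with "- ", and
    insertions with "+ ". Keeping the LCS intact minimizes reported changes.
    """
    n, m = len(old), len(new)
    # lcs[i][j] = length of the LCS of old[i:] and new[j:]
    lcs = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if old[i] == new[j]:
                lcs[i][j] = lcs[i + 1][j + 1] + 1
            else:
                lcs[i][j] = max(lcs[i + 1][j], lcs[i][j + 1])

    out: list[str] = []
    i = j = 0
    while i < n and j < m:
        if old[i] == new[j]:
            out.append("  " + old[i])
            i += 1
            j += 1
        elif lcs[i + 1][j] >= lcs[i][j + 1]:
            out.append("- " + old[i])  # dropping old[i] keeps the LCS longest
            i += 1
        else:
            out.append("+ " + new[j])
            j += 1
    out.extend("- " + line for line in old[i:])
    out.extend("+ " + line for line in new[j:])
    return out
```

For example, comparing the lines a, b, c, d against a, c, d, e reports only "- b" and "+ e"; the other three lines are kept because they form the longest common subsequence.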

Diff3 is a variation of diff that compares three files: two versions that were copied from a common base and may have been independently modified, plus the base itself. There are also a handful of tools built on top of diff that add features, such as colordiff and diffstat.
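To show what a three-way comparison makes possible, here is a deliberately naive Python sketch of a three-way merge. It assumes that all three versions have the same number of lines, so that line k of each file corresponds. Diff3 itself first aligns the files with diff, and this sketch skips that step. merge3 is hypothetical, and its conflict markers are only loosely modeled on the ones diff3 -m writes.

```python
def merge3(base: list[str], mine: list[str], theirs: list[str]):
    """Naive line-by-line three-way merge.

    Simplifying assumption: all inputs have equal line counts (no insertions
    or deletions). Returns (merged_lines, conflicting_line_numbers), 1-based.
    """
    if not (len(base) == len(mine) == len(theirs)):
        raise ValueError("this sketch only handles inputs with equal line counts")
    merged: list[str] = []
    conflicts: list[int] = []
    for k, (b, m, t) in enumerate(zip(base, mine, theirs), start=1):
        if m == t:      # unchanged, or both sides made the same change
            merged.append(m)
        elif m == b:    # only "theirs" changed this line
            merged.append(t)
        elif t == b:    # only "mine" changed this line
            merged.append(m)
        else:           # both sides changed it differently: a conflict
            conflicts.append(k)
            merged += ["<<<<<<< mine", m, "||||||| base", b,
                       "=======", t, ">>>>>>> theirs"]
    return merged, conflicts
```

The base version is what lets the merge tell "only one side changed this" apart from "both sides changed this differently". A plain two-file diff cannot make that distinction.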

There are a number of comparators that focus on data other than ASCII text. Several are designed to compare database schemas or contents: there are comparators for MySQL, Oracle, PostgreSQL, and MS Access, plus one generic SQL tool. There are also two tools that compare XML based on the structure of the data rather than with a line-by-line text file approach.
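A minimal sketch of structural XML comparison using Python's standard xml.etree.ElementTree module. The comparison rules here are assumptions chosen for illustration, not the rules used by diffxml or XMLComparator. Tags and attributes must match (attribute order is ignored), text is compared with surrounding whitespace stripped, and child elements are compared in order. xml_equal is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def xml_equal(a: ET.Element, b: ET.Element) -> bool:
    """Compare two XML elements by structure rather than as lines of text."""
    if a.tag != b.tag or a.attrib != b.attrib:  # attrib is a dict: order-insensitive
        return False
    # Assumed policy: leading/trailing whitespace in text is not significant.
    if (a.text or "").strip() != (b.text or "").strip():
        return False
    if (a.tail or "").strip() != (b.tail or "").strip():
        return False
    if len(a) != len(b):
        return False
    return all(xml_equal(x, y) for x, y in zip(a, b))
```

For example, with these rules <order id="7" status="new"> and <order status="new" id="7"> compare as equal, even though a line-based diff would report them as different.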

Also on the list are CSDiff, which among other things can launch Microsoft Word's built-in comparison feature, and JojoDiff, which compares files of any format (typically binary files) without any knowledge of the semantics of the data. There are libraries for doing simple diff-like comparisons within Java, Perl, Python, and Ruby programs. If you have a range of files that need different tools for comparison, try meta-diff.
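For the library route, Python's standard difflib module handles simple in-program comparisons. This sketch uses difflib.SequenceMatcher and its get_opcodes() method, which describe how to turn one sequence into another as "replace", "delete", "insert", and "equal" ranges. describe_changes is a hypothetical helper that keeps only the ranges that changed.

```python
import difflib


def describe_changes(old: list[str], new: list[str]) -> list[str]:
    """Summarize line-level changes using difflib.SequenceMatcher opcodes."""
    matcher = difflib.SequenceMatcher(a=old, b=new)
    summary = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # skip unchanged ranges
        # tag is "replace", "delete", or "insert"
        summary.append(f"{tag} old[{i1}:{i2}] -> new[{j1}:{j2}]: {old[i1:i2]} -> {new[j1:j2]}")
    return summary
```

Because SequenceMatcher works on any sequence of hashable items, the same call compares lists of lines, strings of characters, or records pulled from a program's own data structures.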

The venerable spiff tool implements a feature that I don't see often enough: heuristic matching. It can accept floating point numbers that differ within a range that you specify, rather than looking only for an exact match. I once expanded this concept with a tool that used regular expressions to allow a great deal of flexibility in what it would accept as a matching line. I called it rediff, and it was a wrapper on top of diff. Unfortunately, the company that owns it hasn't made it open source. Let me know if you find a comparator that uses regular expressions.
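Here is a rough Python sketch of tolerant line matching in the spirit of spiff, with the regular expression idea added. It is neither spiff's algorithm nor rediff, which was never released. The number pattern, the absolute tolerance parameter abs_tol, and the convention that an expected line beginning with "re:" is a regular expression are all hypothetical choices made for illustration.

```python
import re

# Hypothetical number pattern (integers, decimals, optional exponent).
NUMBER = re.compile(r"[-+]?(?:\d+\.\d*|\.\d+|\d+)(?:[eE][-+]?\d+)?")


def lines_match(expected: str, actual: str, abs_tol: float = 1e-6) -> bool:
    """Tolerant line comparison (illustrative rules, not spiff's own).

    - An expected line starting with "re:" is a regular expression that
      must match the whole actual line (hypothetical convention).
    - Otherwise the text around the numbers must be identical, and each
      pair of numbers must agree within abs_tol.
    """
    if expected.startswith("re:"):
        return re.fullmatch(expected[3:], actual) is not None
    # NUMBER has no capturing groups, so split() returns only the non-numeric text.
    if NUMBER.split(expected) != NUMBER.split(actual):
        return False
    exp_nums = NUMBER.findall(expected)
    act_nums = NUMBER.findall(actual)
    return len(exp_nums) == len(act_nums) and all(
        abs(float(e) - float(a)) <= abs_tol for e, a in zip(exp_nums, act_nums)
    )
```

For example, an expected line of "elapsed 0.1 s" accepts "elapsed 0.1000001 s" with the default tolerance, and an expected line of "re:started at \d{2}:\d{2}" accepts any time of day in that format.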

If you're looking for tools beyond these, you'll find a huge list of commercial and freeware comparators at http://www.foldermatch.com/fmcompetitors.htm.

About the matrix

It wasn't hard to track down a long list of data comparators when I went looking for them. This survey represents the most promising ones of the many that I found. At least for ASCII text comparators, I only report tools that have a command line interface, because these are the ones that are most likely to help with automated testing.
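The command line interface matters because a test harness can run the comparator and branch on its exit status. This sketch assumes a diff executable is on the PATH (GNU diffutils, for example). Diff exits with 0 when the files are the same, 1 when they differ, and 2 when there is trouble, such as a missing file. run_diff is a hypothetical wrapper.

```python
import subprocess


def run_diff(expected_path: str, actual_path: str) -> tuple[bool, str]:
    """Run the command-line diff tool and interpret its exit status.

    Returns (True, "") when the files match and (False, unified_diff) when
    they differ; raises RuntimeError when diff reports trouble.
    """
    result = subprocess.run(
        ["diff", "-u", expected_path, actual_path],
        capture_output=True,
        text=True,
    )
    if result.returncode == 0:
        return True, ""
    if result.returncode == 1:
        return False, result.stdout
    raise RuntimeError(f"diff failed: {result.stderr.strip()}")
```

Treating exit status 2 as an error rather than a failed test keeps a broken test setup, such as a missing expected results file, from being reported as a product bug.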

A few of the tools listed here also have a GUI. Many other GUI comparators, and GUI wrappers on top of diff, are available but aren't listed here.

Tool | Platforms | Notes
cmp | Unix, Cygwin | Simple tool to indicate the first byte where two files start to differ. GNU tool.
ColorDiff | Unix, Linux, Cygwin | Diff wrapper that colorizes the output.
CSDiff | Windows (except XP) | GUI and command line interfaces. Can launch MS Word's file compare feature.
DataDiff | platforms supported by Perl | Finds differing rows between two MySQL 3.23.x or 4.0.x databases.
diff | Unix, Linux, Cygwin | The GNU implementation of the classic workhorse Unix diff tool, which shows a minimized set of differences between two files.
diff3 | Unix, Linux, Cygwin | Shows differences among three files; useful when two people or programs change the same original file in different ways. GNU tool.
diffstat | Unix, Linux, Cygwin | Reads the output of diff and displays a histogram of the insertions, deletions, and modifications per file.
diffxml | platforms supported by Java | Tools for comparing and patching XML files.
dirdiff | platforms supported by Perl | A Perl command-line utility for recursively comparing the date/time stamps of files contained in two directory trees.
ExamDiff | Windows | Primarily a GUI file comparison tool; also has a command line interface. The commercial ExamDiff Pro has additional features.
JLibDiff | platforms supported by Java | Diff library for use in Java code.
JojoDiff | Unix, Windows | Diff utility for binary files.
MDBDiff | Windows | Locates structural differences between two Microsoft Access 97 or 2000 databases (*.mdb files).
meta-diff | Linux, Windows | Can launch other diff programs as appropriate for each file it encounters.
mysqldiff | platforms supported by Perl | Compares the table definitions of two MySQL databases.
Oracle SchemaDiff | platforms supported by Perl | Compares schemas between two Oracle databases (Oracle 7.3.4 and above).
pardiff | Linux, Unix | Diff alternative that reimplements diff's --side-by-side output.
Perl Algorithm::Diff | platforms supported by Perl | Diff implemented as a Perl module.
Perl String::DiffLine | platforms supported by Perl | Perl module for simple comparisons similar to the cmp utility.
pgdiff | Unix, MacOS, Windows | Compares the table definitions of two PostgreSQL databases. Generates commands to convert the structure of one database to match the other.
Python difflib | platforms supported by Python | Python standard library module for computing deltas between sequences.
ruby-diff | platforms supported by Ruby | Ruby port of Perl's Algorithm::Diff.
spiff | Unix | An old and quirky diff alternative with extra features such as setting allowable tolerances for comparing floating point numbers.
SQLDiff | platforms supported by PHP | Shows the differences between two SQL tables; works with a variety of databases. Web-based interface.
XMLComparator | platforms supported by Java | Compares XML documents.
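As an illustration of what a companion tool like diffstat (listed above) does with diff output, here is a rough Python sketch that counts inserted and deleted lines per file in unified diff text. It is not diffstat's implementation. It draws no histogram, does not report modifications separately, and parses naively, so a changed line whose own text begins with "-- " or "++ " would be mistaken for a file header. The function name diffstat here is hypothetical.

```python
def diffstat(unified_diff: str) -> dict[str, tuple[int, int]]:
    """Count (insertions, deletions) per file in unified diff text.

    Naive parsing: "+++ " lines start a new file, "--- " header lines are
    skipped, and other lines beginning with "+" or "-" are counted.
    """
    stats: dict[str, list[int]] = {}
    current = None
    for line in unified_diff.splitlines():
        if line.startswith("+++ "):
            # Drop any timestamp after a tab, as diff -u appends one.
            current = line[4:].split("\t")[0]
            stats.setdefault(current, [0, 0])
        elif line.startswith("--- "):
            continue  # old-file header line, not a change
        elif current is not None and line.startswith("+"):
            stats[current][0] += 1
        elif current is not None and line.startswith("-"):
            stats[current][1] += 1
    return {name: (ins, dels) for name, (ins, dels) in stats.items()}
```

A real implementation would track the line counts in each "@@" hunk header to decide which lines are changes and which are headers, which avoids the ambiguity noted above.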