Open Testware Reviews

Sclc Metrics Tool

Copyright 2003 by Tejas Software Consulting - All rights reserved.

Contents

Overview -- Maturity -- Project activity -- Platforms -- Support -- Documentation -- Installation -- Implementation -- Performance -- Similar tools -- Limitations -- Observations

Overview

Reviewed: 2003-April-18
Version reviewed: 1.23, 2003-April-15
Maintainer: Brad Appleton
URL: http://www.bradapp.net/clearperl/sclc-cdiff.html
Testingfaqs.org category: Static Analysis Tools
License: Perl license - Artistic or GNU GPL
User interface: command line

Sclc is a code size measurement tool. It reports total lines, blank lines, comment lines, assembly-equivalent source lines (AESL), and, most usefully, non-comment source lines (NCSL). Sclc is an abbreviation for "source code line counter." It can parse 13 different languages: Ada, Assembly, Awk, C, C++, Eiffel, Java, makefiles, Lisp, Pascal, Perl, shell scripts, and Tcl.

Sclc, and its cohort, cdiff, are housed as an afterthought on Brad Appleton's "ClearPerl: Perl 5 modules for ClearCase" web page. I haven't used ClearCase in years, but I've found that sclc is useful in many ways that have nothing to do with ClearCase.

Maturity

4 - Beta (on a scale of 1-5)

With more feedback to the author from users, more testing (preferably with test data added to the distribution package), and a bit of further refinement, the tool could step up to production quality.

Project activity

3 - Stable (on a scale of 1-5)

It's hard to know how often the tool has been updated because until recently there was no change log or version tracking. The user community is not very active. Recent updates saved the tool from "Inactive" status.

Platforms

Sclc is portable to a wide range of platforms. The full range is not documented. I used it successfully on Windows 2000 with ActiveState Perl 5.8.0, Cygwin/Windows 2000 with Cygwin Perl 5.8.0, RedHat Linux 7.3 with Perl 5.6.1, MacOS X 10.1 with Perl 5.6.0, and HP-UX 9.05 with Perl 5.5.2. It will very likely work on any Unix platform supported by Perl. I tested sclc on the DejaGnu 1.4.3 source distribution, and verified that I got the same results on each of these platforms. Verifying this wasn't easy, because the order of the language summary at the bottom of the output was different on each platform, and the order of the files printed out in the detailed output was different between Windows and the other platforms. There was a lot of output to stderr because sclc looked at all files in the distribution, many of which were not programs. I had to take care to separate the error output from the report because when I captured stdout and stderr in the same file, the errors occasionally showed up in the middle of one of the report lines.
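
The comparison described above is easier if the report lines are sorted before they are compared, so that differences in output order drop out. The following Python sketch is not part of sclc. The function name `normalize_report` and the two sample captures are hypothetical, and it assumes that stdout was captured separately from stderr, so the input contains only report lines.

```python
def normalize_report(text):
    """Return the non-empty report lines, whitespace-collapsed and sorted."""
    rows = []
    for raw in text.splitlines():
        fields = raw.split()
        if fields:  # skip blank lines
            rows.append(" ".join(fields))
    return sorted(rows)


# Hypothetical captures of the same two report rows in different orders.
report_a = """\
23 4 11 8 ./baseboards/a29k-udi.exp (Tcl)
37 8 15 14 ./baseboards/arc-sim.exp (Tcl)
"""
report_b = """\
   37    8   15   14 ./baseboards/arc-sim.exp (Tcl)
   23    4   11    8 ./baseboards/a29k-udi.exp (Tcl)
"""

same = normalize_report(report_a) == normalize_report(report_b)
print(same)
```

Sorting whole lines is enough here because only the order differed between platforms. If the counts themselves differed, the sorted lists would no longer compare equal.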

My compatibility test illustrates the fact that static analysis tools like sclc don't necessarily have to run on your target platform or your development platform. You can run the tool on any available system that can access your code.

I did encounter a fatal error on Windows using an earlier version of sclc, but that's fixed in the latest sclc version.

The modules sclc uses are all provided with the standard Perl distribution. You'll need Perl version 5.4 or later.

Support

There is little in the way of support for sclc. There is no public bug tracking database, and no public version control system. Only the most recent version of the tool is available on the web page. The ClearPerl web page (the parent page containing the sclc page) mentions a ClearPerl mailing list, but the list is defunct. Sclc is mentioned infrequently on the ClearCase International Users Group (CCIUG) mailing list; the mentions include a recent announcement from the tool's author about a new version of sclc. The web page for the mailing list says that only Rational customers may join, though the list archives are publicly available.

The author, Brad Appleton, was easily reachable via email during the course of my review, and he was eager to address problems that I reported. So it seems best to contact Brad directly with any questions you have.

Documentation

Sclc is sufficiently documented. There is some background information on the web page. The script includes documentation in the Perl-standard POD format, which can be accessed using the script's "-help" option, by feeding the script directly to the perldoc program, or by accessing the html version of the documentation. I had trouble with the formatting of the documentation using the "-help" option on Cygwin and on one of two Linux platforms. I suspect this is due to flakiness in perldoc rather than a problem with sclc.

There is little documentation of the "AESL" metric, which stands for "assembly-equivalent source lines." The comments in the script refer to the "Programming Language Table" from Capers Jones' company Software Productivity Research, Inc. The URL given is defunct - the correct URL is http://www.spr.com/products/programming.shtm. This page gives dire warnings about using the data from this table, though it refers to function point estimates rather than using AESL to compare the size of programs written in different languages. The report now costs US$75.00, which is a barrier for anyone wanting to enhance sclc to calculate AESL metrics for additional languages.

Installation

Sclc is available in either a zip file or a gzipped tar file. The package includes a short README file, a copy of the main web page and the html-formatted man page, plus the cdiff script. Cdiff is an add-on for the ClearCase configuration management system, and sclc has hooks for integrating with cdiff. There are slight differences between the zip and tar packages, to appease Windows and Unix users, respectively--the sclc script has a .pl extension in the Windows package, the README file is slightly different on both, and each package is adjusted to have platform-appropriate line endings. Note that the html pages have links that only work if you're viewing the pages live on the web site.

To execute the sclc script directly, you have to edit the first line of the script to replace "#!/usr/misc/bin/perl5" with the path where Perl is installed on your system, which is often "#!/usr/bin/perl". The default path is reminiscent of the days when Perl users were transitioning from Perl 4 to Perl 5, and is very unlikely to be right for systems installed within the last several years.
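
The first-line edit described above can also be scripted. This is a minimal Python sketch, not anything shipped with sclc: the function name `replace_shebang` is hypothetical, the default of `/usr/bin/perl` is only the common location mentioned above, and the sample script text is invented for illustration.

```python
def replace_shebang(script_text, perl_path="/usr/bin/perl"):
    """Replace the first line with a new shebang, if it is a shebang line."""
    first, sep, rest = script_text.partition("\n")
    if not first.startswith("#!"):
        return script_text  # no shebang line: leave the script untouched
    return "#!" + perl_path + sep + rest


# Invented two-line stand-in for the real script.
original = "#!/usr/misc/bin/perl5\nuse strict;\n"
updated = replace_shebang(original)
print(updated)
```

Only the first line is touched, and a script without a shebang line is returned unchanged.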

There is no automatic installer - you just extract sclc to a location in your filesystem where you normally keep executable files.

Implementation

Sclc is implemented in a single Perl script. Run against itself, it counts 1008 non-comment source lines. There are 680 comment lines. Function headers vary from a single line to a long standardized template. There is a sprinkling of comments throughout the code that should help experienced Perl programmers understand the script.

I like the modular design of the definitions for each language that sclc parses. I tried once to add code to support the Limbo programming language, but after several minutes of studying the code and the comments I wasn't quite able to get the gist of it. I ended up using the -language option to tell it that Limbo programs were shell code, because they both use the same comment syntax, and I think the line count results ended up being accurate except for the overall summary for shell code. If I had wanted to spend more time teaching sclc to understand Limbo, I'm pretty sure I could have done it with some experimentation and/or help from the author.

There are no test cases that ship with the tool. The script does use the "use strict" mechanism to enforce good programming practices, but it does not use "use warnings" to catch errors at run-time.

The author states that sclc is "ancient Perl 4 code," with only minimal porting to use common Perl 5 coding standards. This should only be a concern to Perl purists who both want to work with a newer style implementation and aren't willing to help update the sclc code.

Performance

Parsing source code tends to be cpu-intensive. I tested sclc by running it recursively on all files in the DejaGnu 1.4.3 source distribution with 66,000 total lines of code. It took an average of 2 minutes on a 266 MHz Windows machine, 48 seconds on a 600 MHz Mac, and 27 seconds on an i686 Linux machine of unknown origin.

Similar tools

Clc is a predecessor to sclc, also written by Brad Appleton. It works for C, C++, and Perl, and unlike sclc, it counts source statements as well as lines of code. When I used clc to measure sclc, it agreed with sclc's assessment of total lines of code, but disagreed significantly on the number of non-comment source statements: 1008 according to sclc, and 1656 according to clc. I don't trust the numbers from clc - a simple grep shows that there are only 1480 lines that don't start with a comment character. I think the two tools tend to agree more on C code. The moral is - it doesn't hurt to use two different tools as a sanity check.
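
The grep sanity check mentioned above can be approximated in a few lines of Python. The exact grep command is not given in this review, so the sketch assumes the simplest reading: count every line whose first character is not the comment character. The function name and the sample text are hypothetical.

```python
def count_lines_not_starting_with(text, comment_char="#"):
    """Count lines whose first character is not the comment character."""
    return sum(
        1 for line in text.splitlines() if not line.startswith(comment_char)
    )


# Invented sample: two lines start with '#', three do not (one is blank).
sample = (
    "#!/usr/bin/perl\n"
    "# a full-line comment\n"
    "\n"
    "my $x = 1;  # inline comment\n"
    "print $x;\n"
)
count = count_lines_not_starting_with(sample)
print(count)  # prints 3
```

Blank lines and lines with inline comments are still included in this count, so under that reading it is a generous figure, which is why a tool reporting more non-comment lines than this looks suspicious.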

Clc is dated February 14, 1995, a few months before sclc originally appeared. It's on Chris Lott's "Metrics collection tools for C and C++ Source Code" page, which includes several other tools that are either abandoned or are outdated copies. With some detective work you can find more recent copies of some of them.

One such tool listed on Lott's page is sloccount. You'll find the latest version at http://www.dwheeler.com/sloccount/. Sloccount can parse 27 languages, more than twice as many as sclc. It probably has better heuristics for automatically determining the language used within a file, and it seems to have a larger user community than sclc. I compared the NCSL numbers from both sloccount and sclc after analyzing a directory of Linux kernel source files, and the numbers were identical.

So why do I stick with sclc? I like sclc's user interface better. It's more difficult to specify the files you want to process with sloccount, especially if you just want to check one file. Sloccount's output is more cluttered. It probably wouldn't take much effort to make sloccount easier to use, but at first glance, I like sclc better.

Limitations

  1. Sclc counts lines of code, not source statements. Counting source statements is probably more accurate when you're dealing with code developed with a variety of coding styles, because raw line counts are sensitive to coding style. Using the -delim-ignore option may make sclc's NCSL numbers roughly similar to a source statement count.
  2. The tool often makes wrong guesses about language - for example, it thought a Cascading Style Sheet (.css) file was C code, it counted a makefile.in file as shell code rather than a makefile, it parsed an RTF file as Pascal, and it counted Expect and incr Tcl files as shell code. The workaround is to use the -language option to make it smarter about recognizing file extensions, and exclude documentation and other files you don't want it to count.
  3. The numbers are likely to be different if you're using the "-diff" option and you use a context diff rather than the default diff format. The sum of the inserted and deleted lines tends to stay the same, but the individual counts vary.
  4. With the "-diff" option, sclc doesn't try to count changed lines. All modifications are reported only in terms of additions and deletions.
  5. The tool gives no output if you give it the "-diff" option and its input stream is empty. I'd prefer to get positive confirmation that sclc ran.
  6. I had to go to the source code to find out that the regular expressions given as arguments to "-name" and "-except" are sandwiched between ^ and $ anchors automatically. The "filename must completely match" comment in the man page didn't get this point across to me.
  7. Sclc doesn't run preprocessors on files. After macro expansion, conditional compilation, etc., the number of lines of code that actually get compiled can grow or shrink. That's probably okay, and probably the way most metrics tools work, but it's something to be aware of.
  8. The AESL value changes when using the -delim-ignore option, because this option changes the NCSL count and the AESL is based on NCSL. It seems strange that the assembly equivalent for a file would change if you change the way you measure the source code.
  9. The "Totals" and "LangTotals" arguments to the "-sections" option are ignored if you're only analyzing one source file. This might make it more difficult to parse the output with a script that doesn't know ahead of time how many files will be analyzed. There is also no mention in the documentation that these two sections are omitted by default if there's only one file.
  10. It would help if the column headings were repeated before the summary, to help the user remember what each column represents. I once read the AESL column thinking it was NCSL, which resulted in an order-of-magnitude error in the numbers I reported, and I didn't catch the error for several days.
  11. If you put the same filename on the command line twice, there is no warning and the file is counted twice in the totals.
  12. If you ask sclc to process a file that doesn't exist, it gives a misleading error such as: "sclc: Can't determine programming language for nosuchfile."
  13. The man page is garbled with escape characters on my Cygwin/Windows 2000 configuration using TERM=cygwin or vt100. Probably not an sclc bug. I also sometimes saw an error on Linux that resulted in the man page coming out as raw POD markup.
  14. If there are 100,000 or more lines in a source file, the columns in the output don't line up.
These issues were found and addressed during the course of the review. They don't affect version 1.23.
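
Item 6 in the list above, the implicit ^ and $ anchors, is easy to demonstrate with Python's `re` module. This sketch is not sclc code. Wrapping the pattern in a non-capturing group before adding the anchors is an assumption made here so that alternations behave sensibly; the review does not say how sclc builds its expression or whether it matches against the full path. The function name and the filenames are hypothetical.

```python
import re


def matches_whole_name(pattern, filename):
    """Mimic a pattern that is implicitly wrapped in ^ and $ anchors."""
    anchored = "^(?:" + pattern + ")$"  # grouping is an assumption
    return re.search(anchored, filename) is not None


# A bare extension pattern no longer matches once it is anchored...
print(matches_whole_name(r"\.exp", "arc-sim.exp"))
# ...so the pattern has to account for the whole name.
print(matches_whole_name(r".*\.exp", "arc-sim.exp"))
```

This is consistent with the leading `.*` in the `-except '.*\.(rtf|in|am)'` argument used in the Observations section.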

Observations

Sclc is a tool that can do simple static analysis on 13 different programming languages (if you count makefiles as a language). It uses a command-line interface that looks like this when you ask it to analyze itself:

$ sclc sclc
Lines  Blank  Cmnts   NCSL        AESL
=====  =====  =====  =====  ==========  =======================================
 1824    167    680   1008     15120.0  sclc (Perl)

It shows a total line count, the number of blank lines, the number of lines containing either full-line or inline comments, and the number of non-comment source lines (NCSL). Note that if your code has lines with inline comments, those lines will be counted in both the comment and NCSL total. The AESL metric refers to "assembly-equivalent source lines," discussed further in the Documentation section earlier. I don't use the AESL metric, and I could suppress it from the output using the "-counts" option.
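
The double counting of inline comments can be checked with simple arithmetic on the row above. The Python sketch below assumes that every line is blank, a comment, or NCSL, and that the only overlap comes from lines carrying both code and a comment. The AESL-to-NCSL ratio it prints is inferred from this single row, not taken from sclc's documentation.

```python
# Counts from the sample row for sclc analyzing itself.
lines, blank, cmnts, ncsl, aesl = 1824, 167, 680, 1008, 15120.0

# Lines with both code and a comment appear under Cmnts and under NCSL,
# so the three categories overshoot the total by that many lines.
counted_twice = blank + cmnts + ncsl - lines

# Apparent multiplier applied to NCSL to produce AESL for Perl.
ratio = aesl / ncsl

print(counted_twice, ratio)  # prints 31 15.0
```

Under those assumptions, 31 lines of the script carry an inline comment, and AESL for Perl appears to be NCSL times 15.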

These kinds of metrics, especially the raw line count (often called the LOC or KLOC metric, for "lines of code" or "thousands of lines of code") that you could generate with a simple tool like the Unix "wc" utility, have been the cause of much academic hand-wringing because there are so many ways to misuse the metrics. I'm going to presume that if you're considering using a tool like sclc, you've done some background reading so that you understand the limitations of these simplistic metrics. Here are a few references to get you started--Software Metrics: Successes, Failures and New Directions by Norman E. Fenton, and for an interesting but hard to implement alternative, Managing (the Size of) Your Projects: A Project Management Look at Function Points by Carol Dekkers.

Despite the shortcomings, LOC and NCSL metrics are very common software metrics. In fact, for both of my previous reviews, I have reported NCSL metrics for the tools I reviewed, using data generated by sclc. These metrics give us a very rough idea of how complex the programs we're dealing with are. I've also used sclc during a consulting engagement to demonstrate the magnitude of the hundreds of thousands of lines of code the client was trying to wrangle.

If you're analyzing more than one file, you get a column total at the bottom of the output, plus a file count. And if you're dealing with more than one programming language, you'll get a breakdown by language. Here's an example of a further refinement of the DejaGnu analysis I mentioned earlier, to help sclc figure out the programming language and to tell it to ignore certain files.

$ sclc.pl -language .exp=tcl -language .itcl=tcl -except '.*\.(rtf|in|am)' \
    -recurse -ignore -counts Lines+Blank+Cmnts+NCSL .
Lines  Blank  Cmnts   NCSL
=====  =====  =====  =====  ===================================================
   36      1      1     34  ./.clean (shell)
  901    134    314    453  ./aclocal.m4 (shell)
   23      4     11      8  ./baseboards/a29k-udi.exp (Tcl)
   37      8     15     14  ./baseboards/arc-sim.exp (Tcl)
...
34277   3985   6952  23504  ----- Tcl ----- (256 files)
  409     63     38    310  ----- C++ ----- (2 files)
  345     44     16    289  ----- C ----- (6 files)
24652   2588   2882  19291  ----- shell ----- (51 files)
 2223    214    606   1416  ----- Lisp ----- (1 file)
61906   6894  10494  44810  ***** TOTAL ***** (316 files)

This still needs further tuning, because there are still documentation files like ".clean" above that shouldn't be counted, m4 files that perhaps should be counted but not as shell scripts, and all the ".in" and ".am" files that I haven't figured out. So what counts as code that needs to be counted? Tools aren't going to help much with that conundrum.
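
A script that consumes this kind of report can cross-check the per-language rows against the TOTAL row. The Python sketch below is an illustration only: the regular expression is reverse-engineered from the summary lines shown above, assumes the four-column `-counts Lines+Blank+Cmnts+NCSL` layout, and is not based on any documented output format. The names `SUMMARY` and `parse_summary` are hypothetical.

```python
import re

# Matches "N N N N ----- Lang ----- (K files)" and the TOTAL row.
SUMMARY = re.compile(
    r"^\s*(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+"
    r"(?:-----\s+(?P<lang>\S+)\s+-----|\*{5}\s+TOTAL\s+\*{5})"
    r"\s+\((?P<files>\d+) files?\)\s*$"
)


def parse_summary(report):
    """Return (per_language, total); values are [Lines, Blank, Cmnts, NCSL, files]."""
    per_language, total = {}, None
    for line in report.splitlines():
        m = SUMMARY.match(line)
        if not m:
            continue  # headings and per-file rows are skipped
        counts = [int(m.group(i)) for i in range(1, 5)]
        counts.append(int(m.group("files")))
        if m.group("lang"):
            per_language[m.group("lang")] = counts
        else:
            total = counts
    return per_language, total


report = """\
34277 3985 6952 23504 ----- Tcl ----- (256 files)
409 63 38 310 ----- C++ ----- (2 files)
345 44 16 289 ----- C ----- (6 files)
24652 2588 2882 19291 ----- shell ----- (51 files)
2223 214 606 1416 ----- Lisp ----- (1 file)
61906 6894 10494 44810 ***** TOTAL ***** (316 files)
"""
per_language, total = parse_summary(report)
sums = [sum(column) for column in zip(*per_language.values())]
print(sums == total)  # prints True
```

When only one file is analyzed, the summary rows are omitted, so `total` comes back as `None` and a caller has to handle that case.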

Sclc is packaged with the cdiff tool, which is a wrapper on top of the cleardiff command in the ClearCase configuration management system. Sclc has a few command line options for interfacing with cdiff. It probably wouldn't take much effort to enhance sclc to talk to other configuration management systems as well. But sclc works just fine if you don't use ClearCase.

Note that the URL for sclc is a redirect to a different site. Be sure to take note of the bradapp.net URL, because the underlying page has changed recently, and it may change again.

There are many freeware metrics tools lurking around the Internet, especially for C code, though most of them are orphaned. Sclc is slightly rough around the edges, but it's my favorite among the tools I've tried.