SAFE Banner

APRIL 2009


Software Scan

The President's Column

Suppose you find that the source code from two programs is similar, or even identical. Have you proven copying? Not necessarily. In this issue's Scanning IP section, I discuss the steps for determining whether copying occurred once you've found correlation.

Do you have a very large code comparison job but a short deadline? In the Scanning Tools section, I discuss CodeGrid, the SAFE supercomputer grid that can finish large jobs in a very short time.

Send me your comments and critiques. I'm always interested in hearing from you.


Bob Zeidman
President, SAFE Corporation

Scanning IP

From Correlation to Copying
You have the source code from two different programs. You run them through CodeMatch and find high correlation numbers. Have you proven copying? Not yet. There are still a few steps to go through first. Finding a correlation between the source code files for two different programs doesn't necessarily mean that illicit behavior occurred. At SAFE we've determined that there are exactly six reasons for correlation between two different programs. These reasons can be summarized as follows.

  • Third-Party Source Code. Both programs use open source code or purchased libraries.
  • Code Generation Tools. Automatic code generation tools, such as Microsoft Visual Basic or Adobe Dreamweaver, generate source code that looks very similar from one program to the next.
  • Common Identifier Names. Certain identifier names are commonly taught in schools or commonly used by programmers in certain industries.
  • Common Algorithms. There may be an easy or well-understood way of writing a particular algorithm that most programmers use, or one that was taught in school or in textbooks.
  • Common Author. One programmer, or “author,” may create two programs that are correlated simply because that programmer tends to write code in a certain way.
  • Copied Code. Code was copied from one program to another. If the copying was not authorized by the original owner, then it constitutes plagiarism.

When using CodeMatch, especially in litigation, it's important to understand these six reasons. Before there can be proof of copyright infringement, the other five reasons for correlation must be eliminated. CodeSuite offers sophisticated filtering functions that let you filter out aspects of the code that are correlated due to those five reasons. What's left after filtering is correlation due to copying.
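
As an illustration of the filtering idea only, here is a minimal Python sketch that sorts identifiers found in both programs into buckets, one per explanation, and keeps whatever no filter explains. The function `classify_matches`, the filter names, and the example identifiers are all hypothetical. This is not CodeSuite's filtering implementation, and it only covers reasons that can be expressed as lists of known names.

```python
from typing import Dict, Iterable, Set


def classify_matches(matches: Iterable[str], filters: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Assign each matched identifier to the first filter that explains it.

    Identifiers that no filter explains go into the "unexplained" bucket,
    which is what remains for examination as possible copying.
    (Hypothetical helper, not part of CodeSuite.)
    """
    buckets: Dict[str, Set[str]] = {reason: set() for reason in filters}
    buckets["unexplained"] = set()
    for ident in matches:
        for reason, known_names in filters.items():
            if ident in known_names:
                buckets[reason].add(ident)
                break
        else:  # no filter matched this identifier
            buckets["unexplained"].add(ident)
    return buckets


# Hypothetical example data: identifiers found in both programs.
shared = {"i", "count", "png_read_info", "Form1_Load", "calcRoyaltyRate"}
filters = {
    "third_party": {"png_read_info"},   # from an open source library both programs use
    "code_generation": {"Form1_Load"},  # a name a GUI builder generates automatically
    "common_names": {"i", "count"},     # names nearly every programmer uses
}
print(classify_matches(shared, filters)["unexplained"])  # {'calcRoyaltyRate'}
```

Each filter removes one explanation; only `calcRoyaltyRate` survives all of them in this made-up example.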

You can read more about this in the IP Today article entitled "What, Exactly, Is Software Plagiarism?"

Advanced Tools to Detect Software Plagiarism and IP Theft

A sophisticated set of tools for analyzing software source code and object code including:

Check binary object code for plagiarism.

Cross check source code for plagiarism.

Compare source code to find differences and measure changes.

The premier tool for pinpointing copying.

Scour the Internet for plagiarized code.

Turbocharge your analysis on a supercomputer grid.

Get Smart

SAFE offers training at our facility or yours. Contact us to make arrangements:

MCLE credit in software IP

CodeSuite certification

Free Utilities

Did you know that CodeSuite incorporates some free utilities?

FileCount™ counts the number of lines of code, number of bytes, and number of files in a directory tree.
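
To make the three counts concrete, here is a minimal Python sketch that counts files, bytes, and lines in a directory tree. It is not FileCount's implementation. The name `count_tree` is hypothetical, and the rule of counting every line in every file (blank lines, comments, and non-source files included) is an assumption for illustration; a dedicated line-of-code counter might skip some of these.

```python
import os
from dataclasses import dataclass


@dataclass
class TreeCount:
    files: int = 0
    size_bytes: int = 0
    lines: int = 0


def count_tree(root: str) -> TreeCount:
    """Count files, bytes, and lines in every file under root, recursively.

    Hypothetical sketch; reads each file fully into memory for simplicity.
    """
    total = TreeCount()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            with open(os.path.join(dirpath, name), "rb") as f:
                data = f.read()
            total.files += 1
            total.size_bytes += len(data)
            # One line per newline, plus a final line that lacks a trailing newline.
            total.lines += data.count(b"\n")
            if data and not data.endswith(b"\n"):
                total.lines += 1
    return total
```

Calling `print(count_tree("path/to/source"))` prints a `TreeCount(files=..., size_bytes=..., lines=...)` summary for that tree.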

FileIsolate™ allows you to selectively delete or copy files within an entire directory tree. This is very useful when you need to manually transfer or examine specific files while keeping other files confidential, or when you simply want to save disk space. Look for even more cool file manipulation features in the next version of FileIsolate.
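
A minimal Python sketch of the copy side of this idea, assuming files are selected by glob patterns on their names. The `isolate_files` function and its pattern-based selection are hypothetical and do not describe FileIsolate's actual options. It copies only the matching files into a separate tree, preserving their relative paths, so everything else stays behind.

```python
import fnmatch
import os
import shutil
from typing import List, Sequence


def isolate_files(src_root: str, dst_root: str, patterns: Sequence[str]) -> List[str]:
    """Copy files whose names match any glob pattern from src_root to dst_root.

    Relative directory structure is preserved. Returns the sorted relative
    paths that were copied. (Hypothetical helper, not FileIsolate itself.)
    """
    copied = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        for name in filenames:
            if any(fnmatch.fnmatch(name, p) for p in patterns):
                src = os.path.join(dirpath, name)
                rel = os.path.relpath(src, src_root)
                dst = os.path.join(dst_root, rel)
                os.makedirs(os.path.dirname(dst), exist_ok=True)
                shutil.copy2(src, dst)  # copies contents and timestamps
                copied.append(rel)
    return sorted(copied)
```

Keep `dst_root` outside `src_root`; otherwise the walk could revisit files it has just copied. A delete mode would walk the tree the same way and call `os.remove` on each match.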

Scanning Tools

CodeGrid: The SAFE Corporation Supercomputer
Some of our clients have huge sets of files to compare. Before CodeMatch, comparing thousands of files against thousands of files was a daunting task requiring lots of time and manpower. CodeMatch reduced that comparison to a matter of hours. Of course, the result was that customers wanted more and more code compared, and programs keep getting bigger. Some clients need to compare 40, 50, 100, or more MB of code. Our customers sometimes need a long time to get approval from their clients, which can leave only days or weeks before a court date. That's why we developed CodeGrid.

CodeGrid is a state-of-the-art grid of networked computers working in parallel. It automatically divides the analysis work of CodeSuite among the computers on the grid, greatly increasing the speed of the analysis. Our current grid consists of four dedicated computers and can be expanded to any number of computers. SAFE customers can license time on the grid for large CodeSuite jobs. Our customers have already used CodeGrid to complete analysis jobs in two days that would otherwise have taken more than a week using CodeMatch on a single computer.
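
Comparing every file of one program against every file of another grows multiplicatively: 2,000 files against 2,000 files is 2,000 × 2,000 = 4,000,000 file pairs, and dividing them among four computers leaves about 1,000,000 pairs per machine. The Python sketch below shows that division of work under stated assumptions: the function names are hypothetical, the `similarity` metric (Jaccard overlap of line sets) is a stand-in and not CodeMatch's algorithm, and a local thread pool stands in for networked grid nodes. It is not how CodeGrid is implemented.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[str, str]  # (file from program A, file from program B)


def make_batches(files_a: Sequence[str], files_b: Sequence[str], num_workers: int) -> List[List[Pair]]:
    """Split every (a, b) file pair into num_workers nearly equal batches."""
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    pairs = list(product(files_a, files_b))
    # Round-robin slicing keeps batch sizes within one pair of each other.
    return [pairs[i::num_workers] for i in range(num_workers)]


def similarity(text_a: str, text_b: str) -> float:
    """Hypothetical stand-in metric: Jaccard overlap of the two files' sets of lines."""
    lines_a, lines_b = set(text_a.splitlines()), set(text_b.splitlines())
    union = lines_a | lines_b
    return len(lines_a & lines_b) / len(union) if union else 0.0


def run_batch(batch: List[Pair], sources_a: Dict[str, str], sources_b: Dict[str, str]) -> Dict[Pair, float]:
    """Work done by one node: score every pair in its batch."""
    return {(a, b): similarity(sources_a[a], sources_b[b]) for a, b in batch}


def compare_on_grid(sources_a: Dict[str, str], sources_b: Dict[str, str], num_workers: int = 4) -> Dict[Pair, float]:
    """Divide all pairs among workers, run the batches concurrently, and merge the scores."""
    batches = make_batches(list(sources_a), list(sources_b), num_workers)
    results: Dict[Pair, float] = {}
    # Each thread stands in for one networked grid computer.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        for partial in pool.map(lambda batch: run_batch(batch, sources_a, sources_b), batches):
            results.update(partial)
    return results
```

In standard CPython, threads will not make this CPU-bound scoring faster because of the global interpreter lock; the point of the sketch is the split-and-merge structure, which carries over to separate machines or to a process pool.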

As you can tell, we're very proud of CodeGrid and the technology we developed for it. CodeGrid makes otherwise impossible tasks possible. You can read more about the technology in "Grid-Enabling Resource-Intensive Applications," an article I wrote for Dr. Dobb's Journal with Tim Hoehn, the lead developer of CodeGrid.

This newsletter is not legal advice. Views expressed herein should be checked for accuracy and current applicability.
Copyright 2009 Software Analysis & Forensic Engineering Corporation