The President's Column
Suppose you find that the source code from two programs
is similar, or even identical. Have you proven copying? Not necessarily.
In this issue's Scanning IP section I
discuss the steps for determining copying once you've found correlation.
Do you have a really large code comparison job but
a short deadline? In the Scanning Tools section I
discuss CodeGrid, the SAFE supercomputer grid that can run really
large jobs in a very short time.
Send me your comments and critiques. I'm always interested
in hearing from you.
President, SAFE Corporation
From Correlation to Copying
You have the source code from two different programs. You run them
through CodeMatch and find high correlation numbers. Have you proven
copying? Not yet. There are still a few steps to go through first.
Finding a correlation between the source code files for two different
programs doesn't necessarily mean that illicit behavior occurred.
At SAFE we've determined that there are exactly six reasons for
correlation between two different programs. These reasons can be
summarized as follows.
- Third-Party Source Code. Both programs use open source
code or purchased libraries.
- Code Generation Tools. Automatic code generation tools,
such as Microsoft Visual Basic or Adobe Dreamweaver, generate
software source code that looks very similar.
- Common Identifier Names. Certain identifier names are
commonly taught in schools or commonly used by programmers in
- Common Algorithms. There may be an easy or well-understood
way of writing a particular algorithm that most programmers use,
or one that was taught in school or in textbooks.
- Common Author. One programmer, or author,
will create two programs that have correlation simply because
that programmer tends to write code in a certain way.
- Copied Code. Code was copied from one program to another.
If the copying was not authorized by the original owner, then
it constitutes plagiarism.
It's important when using CodeMatch to understand these reasons,
especially in litigation. Before there can be proof of copyright
infringement, all of the other five reasons for correlation need to
be eliminated. CodeSuite offers some sophisticated filtering functions
that allow you to filter out aspects of the code that are correlated
due to the other five reasons. What's left, after filtering, is correlation
due to copying.
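To make the idea of filtering concrete, here is a minimal sketch in Python. It is not how CodeSuite works internally; the function names, the sample code strings, and the exclusion list are all hypothetical. It shows only the general principle: collect what two programs share, then remove whatever is already explained by one of the other five reasons.

```python
import re

# Matches identifier-like tokens (a simplification; real tokenizers are language-aware).
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def identifiers(source):
    """Return the set of identifier-like tokens in a piece of source text."""
    return set(IDENTIFIER.findall(source))


def residual_matches(program_a, program_b, exclusions):
    """Identifiers shared by both programs, minus those explained by other reasons."""
    shared = identifiers(program_a) & identifiers(program_b)
    return shared - exclusions


# Hypothetical inputs, for illustration only.
program_a = "int i; int acctFlushQ = zlib_inflate(buf);"
program_b = "int i; long acctFlushQ = zlib_inflate(data);"

# Hypothetical exclusion list: language keywords, a common loop variable,
# and a name that comes from a third-party library.
exclusions = {"int", "long", "i", "zlib_inflate"}

print(sorted(residual_matches(program_a, program_b, exclusions)))  # ['acctFlushQ']
```

In this toy example the unusual name acctFlushQ is the only shared identifier that survives filtering, so it is the one that deserves a closer look.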
You can read more about this in the IP Today article
"What, Exactly, Is Software Plagiarism?"
Advanced Tools to Detect Software Plagiarism and IP Theft
A sophisticated set of tools for analyzing software source code
and object code including:
Check binary object code for plagiarism.
Cross check source code for plagiarism.
Compare source code to find differences and measure changes.
The premier tool for pinpointing copying.
Scour the Internet for plagiarized code.
Turbocharge your analysis on a supercomputer grid.
SAFE offers training at our facility or yours. Contact
us to make arrangements:
in software IP
Did you know that CodeSuite incorporates some free utilities?
FileCount counts the number of lines of code, number
of bytes, and number of files in a directory tree.
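For readers curious about what such a count involves, the following Python sketch walks a directory tree and tallies files, lines, and bytes. It is an illustration only, not FileCount's implementation; the function name count_tree is hypothetical, and counting a line as a newline-delimited run of bytes is an assumption that may differ from how FileCount defines a line of code.

```python
import os


def count_tree(root):
    """Count files, lines, and bytes under root, including all subdirectories."""
    files = lines = size = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            files += 1
            size += os.path.getsize(path)
            # Read in binary mode so files in any encoding can be counted.
            with open(path, "rb") as handle:
                lines += sum(1 for _ in handle)
    return {"files": files, "lines": lines, "bytes": size}
```

Calling count_tree on the top-level directory of a source tree returns a dictionary with the three totals.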
FileIsolate allows you to selectively delete or copy
files within an entire directory tree. Very useful when you need
to manually transfer or examine specific files but need to keep
other files confidential, or simply to save disk space. Look for
even more cool file manipulation features in the next version of
CodeGrid: The SAFE Corporation Supercomputer
Some of our clients have really huge sets of files to compare. Before
CodeMatch, comparing thousands of files against thousands of files
was a daunting task requiring lots of time and manpower. Then CodeMatch
was developed and the comparison was reduced to a matter of hours.
Of course, the result was that customers wanted more and more code
compared. And programs are just getting bigger all the time. Some
clients need a comparison of 40, 50, 100, or more megabytes of code.
Sometimes our customers take a while to get approval from their
clients, often leaving only days or weeks before a court date. That's
why we developed CodeGrid.
CodeGrid is a state-of-the-art grid of computers working in parallel
that automatically divides up the analysis work of CodeSuite among
these multiple computers on a network, greatly increasing the speed
of the analysis. Our current grid consists of four dedicated computers
and is expandable to any number of computers. SAFE customers can
license time on the grid for large CodeSuite jobs. Our customers
have already used CodeGrid to complete analysis jobs in two days
that would otherwise take more than a week using CodeMatch on a
As you can tell, we're very proud of CodeGrid and the technology
we developed for it. You can read an article about the technology
that I wrote with Tim Hoehn, the lead developer of CodeGrid, for
Dr. Dobb's Journal, entitled Grid-Enabling
Resource-Intensive Applications. CodeGrid provides a solution
to otherwise impossible tasks.
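The general idea of dividing a comparison job among several machines can be sketched in a few lines of Python. This is not CodeGrid's design. In this sketch, threads on one machine stand in for the networked computers, the compare function is a placeholder that only checks for identical text, and all names are hypothetical. The default of four workers simply mirrors the four computers mentioned above.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def partition(pairs, workers):
    """Deal the file pairs round-robin into one chunk per worker."""
    return [pairs[i::workers] for i in range(workers)]


def compare(pair):
    """Placeholder comparison: 1 if the two texts are identical, otherwise 0."""
    left, right = pair
    return 1 if left == right else 0


def run_chunk(chunk):
    """Work done by a single worker: compare every pair in its chunk."""
    return [compare(pair) for pair in chunk]


def grid_compare(files_a, files_b, workers=4):
    """Compare every file in files_a against every file in files_b, in parallel."""
    pairs = list(product(files_a, files_b))
    chunks = partition(pairs, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(run_chunk, chunks))
    return sum(sum(chunk) for chunk in results)


# Hypothetical file contents, for illustration only.
files_a = ["x = 1", "y = 2"]
files_b = ["x = 1", "z = 3", "y = 2"]
print(grid_compare(files_a, files_b))  # 2
```

Comparing every file against every file means the number of pairs grows as the product of the two file counts, which is why splitting the pairs among workers pays off on large jobs.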