From correlation to copying

Suppose you have the source code for two different programs. You run them through CodeMatch and find high correlation scores. Have you proven copying? Not yet; a few steps remain. Correlation between the source code files of two different programs doesn’t necessarily mean that illicit behavior occurred. At SAFE we’ve determined that there are exactly six reasons for correlation between two different programs. They can be summarized as follows.

  • Third-Party Source Code. Both programs use open source
    code or purchased libraries.
  • Code Generation Tools. Automatic code generation tools,
    such as Microsoft Visual Basic or Adobe Dreamweaver, generate
    software source code that looks very similar.
  • Common Identifier Names. Certain identifier names are
    commonly taught in schools or commonly used by programmers in
    certain industries.
  • Common Algorithms. There may be an easy or well-understood
    way of writing a particular algorithm that most programmers use,
    or one that was taught in school or in textbooks.
  • Common Author. One programmer, or “author,”
    will create two programs that have correlation simply because
    that programmer tends to write code in a certain way.
  • Copied Code. Code was copied from one program to another.
    If the copying was not authorized by the original owner, then
    it constitutes plagiarism.

It’s important to understand these six reasons when using CodeMatch, especially in litigation. Before there can be proof of copyright infringement, the other five reasons for correlation need to be eliminated. CodeSuite offers filtering functions that let you filter out the parts of the code that are correlated for those five reasons. What’s left, after filtering, is correlation due to copying.
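The process of elimination described above can be illustrated with a short sketch. This is not CodeSuite's actual filtering code or API; the `Match` type, the `filter_matches` function, the reason labels, and the sample data are all hypothetical, and real filtering works on far richer information than exact text lookups. The sketch only shows the idea: each correlated element is checked against the five innocent explanations, and whatever none of them accounts for is set aside for closer review.

```python
from dataclasses import dataclass

# Hypothetical labels for the five innocent reasons, in the order they are checked.
INNOCENT_REASONS = (
    "third_party",
    "code_generation",
    "common_identifier",
    "common_algorithm",
    "common_author",
)


@dataclass(frozen=True)
class Match:
    """One element found in both programs (hypothetical structure)."""
    text: str  # the matching identifier or statement
    kind: str  # e.g. "identifier" or "statement"


def filter_matches(matches, known):
    """Split matches into those explained by an innocent reason and the rest.

    `known` maps a reason label to a set of texts already attributed to that
    reason (for example, identifiers from an open source library).
    """
    explained = {reason: [] for reason in INNOCENT_REASONS}
    unexplained = []
    for match in matches:
        for reason in INNOCENT_REASONS:
            if match.text in known.get(reason, set()):
                explained[reason].append(match)
                break
        else:
            # No innocent reason accounts for this match.
            unexplained.append(match)
    return explained, unexplained


# Illustrative data only; none of it comes from a real comparison.
matches = [
    Match("png_read_row", "identifier"),
    Match("i", "identifier"),
    Match("calcFrobnicateDelta", "identifier"),
]
known = {
    "third_party": {"png_read_row"},
    "common_identifier": {"i", "count"},
}

explained, unexplained = filter_matches(matches, known)
for match in unexplained:
    print("left after filtering:", match.text)
```

In this made-up example only `calcFrobnicateDelta` survives the filters, so it is the element that would be examined as possible copying.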

You can read more about this in the IP Today article “What, Exactly, Is Software Plagiarism?”