Tag Archives: forensic engineering

ADFSL 2011 Conference on Digital Forensics, Security and Law

Last year my consulting company presented a paper entitled Measuring Whitespace Patterns As An Indication of Plagiarism that examined and tested the concept that patterns of whitespace in two source code files can be used to determine whether one program was copied from the other. The conference was an enjoyable three days in St. Paul, Minnesota. We even got a tour of the Forensic Science Laboratory of the Bureau of Criminal Apprehension where we learned the real forensic science used to catch criminals (the CSI TV shows are a “little bit” exaggerated, but the reality is just as interesting).

This year the conference will be at Longwood University in Richmond, Virginia from May 25 through 27. I’m serving on the conference committee. We’re looking for paper, presentation, and panel submissions in the following areas:

Curriculum

1. Digital Forensics Curriculum
2. Cyber Law Curriculum
3. Information Assurance Curriculum
4. Accounting Digital Forensics Curriculum

Teaching Methods

5. Digital Forensics Teaching Methods
6. Cyber Law Teaching Methods
7. Information Assurance Teaching Methods
8. Accounting Digital Forensics Teaching Methods

Cases

9. Digital Forensics Case Studies
10. Cyber Law Case Studies
11. Information Assurance Case Studies
12. Accounting Digital Forensics Case Studies

Information Technology

13. Digital Forensics And Information Technology
14. Cyber Law And Information Technology
15. Information Assurance And Information Technology
16. Accounting Digital Forensics Information Technology

Networks And The Internet

17. Digital Forensics And The Internet
18. Cyber Law And The Internet
19. Information Assurance And Internet
20. Digital Forensics Accounting And The Internet

Anti-Forensics And Counter Anti-Forensics

21. Steganography
22. Stylometrics And Author Attribution
23. Anonymity And Proxies
24. Encryption And Decryption

International Issues

25. International Issues In Digital Forensics
26. International Issues In Cyber Law
27. International Issues In Information Assurance
28. International Issues In Accounting Digital Forensics

Theory

29. Theory Development In Digital Forensics
30. Theory Development In Information Assurance
31. Methodologies For Digital Forensic Research
32. Analysis Techniques For Digital Forensic And Information Assurance Research

Digital Rights Management (DRM)

33. DRM Issues In Digital Forensics
34. DRM Issues In Information Technology
35. DRM Issues In Information Assurance
36. DRM Issues In Cyber Law

Privacy Issues

37. Privacy Issues In Digital Forensics
38. Privacy Issues In Information Assurance
39. Privacy Issues In Cyber Law
40. Privacy Issues In Digital Rights Management

Software Forensics

41. Software Piracy Investigation
42. Software Quality Forensics

Other Topics

43. Cyber Culture And Cyber Terrorism

The deadline for submissions is February 19. The website for the conference is at http://www.digitalforensics-conference.org where you’ll find more information about the conference, the venue, and submission guidelines.

SAFE introduces CodeSuite-LT

CodeSuite-LT® is a less expensive, limited version of the full CodeSuite tool. Each tool in the suite produces a readable report that can be used to find copying. CodeSuite-LT includes BitMatch, CodeCross, CodeDiff, CodeMatch, FileCount, and FileIsolate. It also includes the ability to filter results using SourceDetective. CodeSuite-LT does not produce a database and does not allow post-process filtering of results. Instead, it generates an easy-to-read report that can be used to pinpoint copying.

Which is Right For You?

Which product is right for you, CodeSuite or CodeSuite-LT? Click here for a table that compares the features of both programs so you can choose the right solution.

Multiprocessing CodeSuite-MP

Until now there were two ways of running really big jobs of CodeSuite. One was to simply run it and wait for as long as it took. Really large jobs can take as much as a week or two. The other option was to run the job on CodeGrid, our framework that distributes the job over a grid of networked computers. CodeGrid shows an almost linear speedup for each computer on the grid, but it requires someone to maintain the computers and the network and that can be a daunting job. Now there’s a third option;, CodeSuite-MP allows you to run multiple jobs on a single multicore computer. We’re seeing a near-linear speedup for the number of cores, and there’s no special maintenance required. We’re even seeing a near-linear speedup using virtual cores. If you want to get a license for CodeSuite-MP, contact our sales department.

Can whitespace patterns provide clues to plagiarism?

Over the years I’ve run into expert witnesses and attorneys who have told me about software copyright infringement cases where the only clues that copying occurred were patterns of spaces and tabs (“whitespace”). The idea is that if a truly ambitious thief wanted to cover his tracks, he would modify the stolen code so much that there was no longer a visible trace of copying. However, the clever software sleuth could find patterns of whitespace that the thief had missed; although virtually nothing remained, the invisible tabs and spaces could produce a conviction.

This always sounded intriguing, but I wondered whether anyone had ever tested this theory. We could find no articles or papers on the subject, except for one inconclusive paper, and I dreaded to think that some programmer was convicted based on an untested theory. I decided to have my consulting company, Zeidman Consulting, do some carefully controlled research. If the results turned out well, SAFE Corporation would add whitespace pattern algorithms to CodeSuite to further enhance its ability to detect copying.

Our results were published in a paper entitled Measuring Whitespace Patterns as an Indication of Plagiarism that was recently presented at the ADFSL Conference on Digital Forensics, Security and Law. Our results are summarized in the final paragraph:

This whitespace pattern matching method can be used to focus a search for evidence of similarity or copying, but this method cannot stand by itself.

What we discovered is that even very different files have often have similar whitespace patterns. At Zeidman Consulting we’ve used whitespace patterns to confirm copying that was already detected through the use of CodeMatch to find correlated programming elements. In those cases, the whitespace patterns offered further confidence in our findings and in some cases showed which program had been developed first. For a copy of the paper, email us at info@SAFE-corp.biz.

Our next research project is to look at sequences of whitespace within files. Maybe there we’ll find some clues to copying. But for now our results show that whitespace patterns without any other evidence should not be used to determine that copying occurred.

DUPE: Depository of Universal Plagiarism Examples

In 2003 I created the CodeMatch program that very quickly became a de facto standard in software IP litigation. I created a test bench of purposely plagiarized code that could be used to independently and objectively compare the results produced by different plagiarism detection programs. Some in the academic community claimed that my tests were biased toward the algorithms used by CodeMatch, which explained why CodeMatch fared so well compared to the other programs. However, these same critics, despite my requests, never produced their own set of standard tests.

Although I believe that the standard tests I have used are not biased, it occurred to me that there could be a better way to eliminate even unintentional bias. The solution would be to take the source code for certain open source programs and announce a new open source project that would involve purposely plagiarizing the code. Programmers from around the world would be invited, perhaps in a competition, to change the source code while retaining the functionality. The original programs and the plagiarized versions submitted from others would be stored in a database known as the Depository of Universal Plagiarism Examples or DUPE. Plagiarism detection programs would then be run on DUPE and comparisons of the results could be made to determine which programs best detected copying. Also, important statistics about plagiarized code could be determined, as well as patterns identified in order to improve the plagiarism detection programs.

SAFE Corporation has begun looking into creating this database. However, we realize that we would like to work with partners in academia and industry. We believe that there are several key issues that need to be resolved in creating DUPE. These are:

  1. Choosing appropriate open source projects.
  2. Creating a minimum definition of software plagiarism.
  3. Creating the database.
  4. Determining policies including who can access it, how it will be used, and who will maintain it.
  5. Determining how to run the tests, how to generate the results, and how to distribute the results.

Please contact me if you’re interested in working on this important and groundbreaking project.

Interesting software IP cases of 2009

Here is my list of the most interesting software IP cases of 2009,
in chronological order:

SAFE Corporation is looking for great ideas

There are a lot of unanswered questions about source code, and we want to work with you to figure them out. We realize that currently accepted algorithms for analyzing, comparing, and measuring source code leave a lot to be desired in many cases. Also, there are a lot of techniques that have never been studied on large bodies of modern code. For example, measurement techniques developed in the 1970s were probably tested on assembly languages and older programming languages like BASIC, FORTRAN, and COBOL. Do they still hold on modern object oriented languages like Java and C#?

If you have a research idea relating to code analysis, and you can use the SAFE tools, let us know. Email Larry Melling, VP of Sales and Marketing with your ideas. If they pass our review process you’ll get free licenses to our tools, free support, and help getting your results published. This could be the beginning of a beautiful friendship.

From correlation to copying

You have the source code from two different programs. You run them through CodeMatch and find high correlation numbers. Have you proven copying? Not yet. There are still a few steps to go through first. Finding a correlation between the source code files for two different programs doesn’t necessarily mean that illicit behavior occurred. At SAFE we’ve determined that there are exactly six reasons for correlation between two different programs. These reasons can be summarized as follows.

  • Third-Party Source Code. Both programs use open source
    code or purchased libraries.
  • Code Generation Tools. Automatic code generation tools,
    such as Microsoft Visual Basic or Adobe Dreamweaver, generate
    software source code that looks very similar.
  • Common Identifier Names. Certain identifier names are
    commonly taught in schools or commonly used by programmers in
    certain industries.
  • Common Algorithms. There may be an easy or well-understood
    way of writing a particular algorithm that most programmers use,
    or one that was taught in school or in textbooks.
  • Common Author. One programmer, or “author,”
    will create two programs that have correlation simply because
    that programmer tends to write code in a certain way.
  • Copied Code. Code was copied from one program to another.
    If the copying was not authorized by the original owner, then
    it comprises plagiarism.

It’s important when using CodeMatch to understand these rules. Especially in litigation. Before there can be proof of copyright infringement, all of the other 5 reasons for correlation need to be eliminated. CodeSuite offers some sophisticated filtering functions that allow you to filter out aspects of the code that are correlated due to the other 5 reasons. What’s left, after filtering, is correlation due to copying.

You can read more about this in the article in IP Today entitled, What, Exactly, Is Software Plagiarism?

CodeCross tool just released for detecting software IP theft and infringement

SAFE has just released a new tool for comparing computer code to detect copyright infringement and trade secret theft. CodeCross™ finds traces of nonfunctional source code that have been copied from one program to another.

According to Nikolaus Baer at Zeidman Consulting, a SAFE Corporation customer, “I suggested the concept of CodeCross after working on cases where stolen code had nonfunctional remnants in another party’s code. SAFE developed the tool quickly and it works great. In one case I found traces of copied code that had previously gone undetected.”

CodeCross is available with CodeSuite 3.2.0 and can be downloaded for free from the SAFE Corporation website.