Can whitespace patterns provide clues to plagiarism?

Over the years I’ve run into expert witnesses and attorneys who have told me about software copyright infringement cases where the only clues that copying occurred were patterns of spaces and tabs (“whitespace”). The idea is that if a truly ambitious thief wanted to cover his tracks, he would modify the stolen code so much that there was no longer a visible trace of copying. However, a clever software sleuth could find whitespace patterns that the thief had missed; even though virtually no visible trace of the original code remained, the invisible tabs and spaces could produce a conviction.

This always sounded intriguing, but I wondered whether anyone had ever tested this theory. We could find no articles or papers on the subject except for one inconclusive paper, and I dreaded to think that some programmer had been convicted based on an untested theory. I decided to have my consulting company, Zeidman Consulting, do some carefully controlled research. If the results turned out well, SAFE Corporation would add whitespace pattern algorithms to CodeSuite to further enhance its ability to detect copying.

Our results were published in a paper entitled “Measuring Whitespace Patterns as an Indication of Plagiarism,” which was recently presented at the ADFSL Conference on Digital Forensics, Security and Law. Our results are summarized in the paper’s final paragraph:

This whitespace pattern matching method can be used to focus a search for evidence of similarity or copying, but this method cannot stand by itself.

What we discovered is that even very different files often have similar whitespace patterns. At Zeidman Consulting we’ve used whitespace patterns to confirm copying that had already been detected by using CodeMatch to find correlated programming elements. In those cases, the whitespace patterns gave us further confidence in our findings, and in some cases they showed which program had been developed first. For a copy of the paper, email us at info@SAFE-corp.biz.
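This post doesn’t describe the algorithm used in the paper, so the following is only a rough illustration, under our own assumptions, of why unrelated files can look alike when you compare whitespace alone. It reduces each line to its leading and trailing whitespace (tabs as “T”, spaces as “S”) and compares two files with a weighted Jaccard similarity over those per-line signatures. The function names and the encoding are hypothetical; this is not the method from the paper or from CodeSuite.

```python
from collections import Counter


def _encode(ws: str) -> str:
    # Hypothetical encoding: tab -> 'T', space -> 'S'.
    return ws.replace("\t", "T").replace(" ", "S")


def whitespace_signature(line: str) -> str:
    """Reduce a line to its leading and trailing whitespace only.

    The leading and trailing runs are separated by '|'; all visible
    characters are discarded.
    """
    body = line.rstrip("\n")
    no_leading = body.lstrip(" \t")
    leading = body[: len(body) - len(no_leading)]
    core = no_leading.rstrip(" \t")
    trailing = no_leading[len(core):]
    return _encode(leading) + "|" + _encode(trailing)


def whitespace_similarity(text_a: str, text_b: str) -> float:
    """Weighted Jaccard similarity of the two files' signature counts (0.0 to 1.0)."""
    a = Counter(whitespace_signature(l) for l in text_a.splitlines())
    b = Counter(whitespace_signature(l) for l in text_b.splitlines())
    keys = set(a) | set(b)
    if not keys:
        return 0.0
    shared = sum(min(a[k], b[k]) for k in keys)
    total = sum(max(a[k], b[k]) for k in keys)
    return shared / total


# Two unrelated functions that share a common 4-space indentation style.
file_a = "def area(r):\n    return 3.14159 * r * r\n"
file_b = "def greet(name):\n    print('hello', name)\n"
# A tab-indented C fragment.
file_c = "int main() {\n\treturn 0;\n}\n"

print(whitespace_similarity(file_a, file_b))  # 1.0
print(whitespace_similarity(file_a, file_c))  # 0.25
```

Even in this toy measure, two files with no code in common score a perfect match simply because they follow the same indentation convention. That is consistent with the conclusion above that whitespace similarity alone cannot establish copying.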

Our next research project is to look at sequences of whitespace within files. Maybe there we’ll find some clues to copying. But for now our results show that whitespace patterns without any other evidence should not be used to determine that copying occurred.
