The history of the computer industry is filled with fascinating tales of sudden riches and lost opportunities. Take that of Ronald Wayne, who cofounded Apple Computer with Steve Wozniak and Steve Jobs but sold his shares for just US $2,300. And John Atanasoff, who proudly showed his digital computer design to John Mauchly who later codesigned the Eniac, typically recognized as the first electronic computer, without credit to Atanasoff. Perhaps the most famous story of missed fame and fortune is that of Gary Kildall. A pioneer in computer operating systems, Kildall started the company Digital Research and wrote Control Program for Microcomputers (CP/M), the operating system used on many of the early hobbyist personal computers, such as the MITS Altair 8800, the IMSAI 8080, and the Osborne 1, before IBM introduced its own PC. Kildall could have been the king of personal computer software, but instead that title went to his small-time rival Bill Gates. For years, rumors have circulated that the code for the original DOS operating system sold by Microsoft is actually copied from the CP/M operating system developed by Digital Research.
A couple years ago we took it upon ourselves to search out the original code and use CodeSuite to determine the truth once and for all. Our research was summarized in a popular (and not-so-popular) article in IEEE Spectrum entitled Did Bill Gates Steal the Heart of DOS? If you haven’t read it, you should. It’s a fun read but it only summarizes our exhaustive results using our tools and procedures for finding copied code. The article generated a lot of controversy and we always intended to publish the full technical details of our analysis, but it’s surprising how many people don’t like our conclusion and wouldn’t publish my paper. But now the full academic paper entitled A Code Correlation Comparison of the DOS and CP/M Operating Systems is available online in the Journal of Software Engineering and Applications. If you want to know the details, and you want to know the truth, it’s in the article and the details are in the paper.
I hope you’re all aware of my book The Software IP Detective’s Handbook: Measurement, Comparison, and Infringement Detection. It’s the first book on Software Forensics, a field that I pioneered at Software Analysis and Forensic Engineering and Zeidman Consulting. Whereas Digital Forensics deals with bits and files, without any detailed knowledge of the meaning of the data, Software Forensics deals with analysis of software using detailed knowledge of its syntax and functionality to perform analysis to find stolen code and stolen trade secrets. The algorithms described in the book have been used in many court cases. The book also describes algorithms for measuring software evolution, particularly as it relates to IP changes.
If you are a teacher, this is a great time to incorporate the materials in the book into your courses on software development, intellectual property law, business management, and computer science. There’s something for everyone in the various chapters of the book. Your students and you will be at the forefront of an important and very new field of study.
If you’re interested, please contact me.
S.A.F.E. recently released version 4.4 of CodeSuite and version 1.1 of CodeSuite-LT. The most important new feature of this version is that these programs now recognizes many different text encoding formats including ASCII, UTF-8, UTF-16, and UTF-32. Characters in alphabets other than the Latin alphabet used for English are now supported. For example, code with comments or strings in Japanese, Korean, Chinese, or Russian can be compared correctly.
The most significant change is to BitMatch. When examining binary object code to find text strings, you can now specify the encoding format of the file. If you’re not sure about the encoding, you can choose multiple formats.
As demand for our products increase outside the United States, we realized a need to support languages in those countries also.
My book on software intellectual property, a labor of love (and hate) for the last two years, has just been published by Prentice-Hall. The book is intended for several different audiences including computer scientists, computer programmers, business managers, lawyers, engineering consultants, expert witnesses, and high-tech entrepreneurs. Some chapters give easy-to-understand explanations of intellectual property concepts including copyrights, patents, and trade secrets. Other chapters are highly mathematical treatments describing quantitative ways of comparing and measuring software and software IP. The first chapter of the book outlines which chapters are most important for the different audiences.
Overall the book covers the following topics:
- Key concepts of software intellectual property
- Comparing and correlating source code for signs of theft or infringement
- Uncovering signs of copying in object code when source code is inaccessible
- Tracking malware and third-party code in applications
- Using software clean rooms to avoid IP infringement
- Understanding IP issues associated with patents, open source, and DMCA
You can purchase your copy from Amazon.com here.
Software Analysis & Forensic Engineering Corporation today released a case study of Online IP Screening between Zynga’s FarmVille game and CrowdStar’s Happy Aquarium game. The study shows some interesting correlation between the source code for the two games. SAFE Corporation is officially announcing its SAFE Online IP Screening service that is targeted at social games and other online applications. The screening service is a subscription service to regularly examine online applications for signs of copying. In this first case study, we already found surprising results. Even after the normal process of eliminating correlation due to third party code, commonly used identifier names, automatically generated code, common algorithms, and common authors, correlation remained. Was this intentional? Illegal? Acceptable? Coincidence? Decide for yourself: see summaries of this and other case studies here and register to download the full case studies here.
One unique feature of online applications is that often the full source code is downloaded to the user’s machine. This makes it easier for your competitors to copy your code. It also makes it easier for us to detect that copying. Learn more about SAFE Online IP Screening here or email us for details about how we can protect you from unauthorized copying and dissemination of your code.
CodeSuite-LT® is a less expensive, limited version of the full CodeSuite tool. Each tool in the suite produces a readable report that can be used to find copying. CodeSuite-LT includes BitMatch, CodeCross, CodeDiff, CodeMatch, FileCount, and FileIsolate. It also includes the ability to filter results using SourceDetective. CodeSuite-LT does not produce a database and does not allow post-process filtering of results. Instead, it generates an easy-to-read report that can be used to pinpoint copying.
Which is Right For You?
Which product is right for you, CodeSuite or CodeSuite-LT? Click here for a table that compares the features of both programs so you can choose the right solution.
So the government is finding ways to fix the patent system. One of those fixes is the Peer-to-Patent program. It seems like a good idea. In order to speed up the granting of good patents and quickly eliminate the bad ones, allow people from everywhere and anywhere to submit prior art. If that’s actually the way it worked, I’d celebrate; it would be a great resource for finding prior art and making the patent office more efficient. Unfortunately my experience is that the program creates more problems than it fixes. The patent office invited me to participate in the program. Two people posted “invalidating prior art” for my patent application entitled “Detecting Plagiarism in Computer Source Code.” This art was related to my invention, but definitely was not invalidating. Here is the first independent claim of my original patent application:
- A computer-implemented method comprising:
- creating, by a computer system, a first array of lines of functional program code from a first program source code file, the first program source code file including the lines of functional program code of a first program and lines of nonfunctional comments of the first program;
- creating, by the computer system, a second array of lines of nonfunctional comments from a second program source code file, the second program source code file including lines of functional program code of a second program and the lines of nonfunctional comments of the second program;
- comparing, by the computer system, the lines of functional program code from the first array with the lines of nonfunctional comments from the second array to find similar lines;
- calculating, by the computer system, a similarity number based on the similar lines; and presenting to a user an indication of copying of the first program source code file wherein said indication of copying is defined by the similarity number.
Here is the only dependent claim of the prior art patent US 7,568,109:
- A system for comparing at least a first corpus to a second corpus, comprising:
- an analyzer identifying concepts in the corpuses, said analyzer determining a frequency rating of each of said concepts in each corpus;
- for each corpus, replacing each instance of each of said concepts with its respective determined frequency rating to create a frequency file;
- and a comparator comparing the frequency file for the first corpus to the frequency file for the second corpus, wherein said comparing the frequency file for the first corpus to the frequency file for the second corpus further comprises comparing portions of one corpus against the other corpus.
The second prior art submission was simply a reference to the UNIX diff command. While the diff command is relevant, it is a simple line-by line comparison of text files without any understanding or parsing of programming source code. It doesn’t separate functional lines of code (statements) from nonfunctional lines (comments).
Judging by their remarks, the posters to the Peer-to-Patent site didn’t understand patents, and didn’t read the patent claims. They should be allowed to post references, but the ultimate decision must be in the hands of those trained in examining patents. However, the patent examiner told me that her supervisor didn’t want to issue a patent that had been publicly noted to be invalid, and so after months of arguments I had to arbitrarily narrow the claims to get allowance, resulting in patent US 7,823,127. So now, anyone from anywhere with any ulterior motive (particularly those who believe no software should be patentable) can bring about the quick rejection of an otherwise useful and valid patent.
CodeScreener: Online Plagiarism Detection for Software
SAFE Corporation has developed an online plagiarism detection service for software. The CodeScreener™ service is built on SAFE Corporation’s court-tested CodeSuite® forensic software and patented source code correlation technology. CodeScreener is designed to streamline the plagiarism detection process, giving you a thorough analysis of each file and a consistent set of correlation metrics. It’s online, it’s interactive, and it’s much less expensive than standalone CodeSuite. Contact our Sales Department to get a free evaluation license.
Until now there were two ways of running really big jobs of CodeSuite. One was to simply run it and wait for as long as it took. Really large jobs can take as much as a week or two. The other option was to run the job on CodeGrid, our framework that distributes the job over a grid of networked computers. CodeGrid shows an almost linear speedup for each computer on the grid, but it requires someone to maintain the computers and the network and that can be a daunting job. Now there’s a third option;, CodeSuite-MP allows you to run multiple jobs on a single multicore computer. We’re seeing a near-linear speedup for the number of cores, and there’s no special maintenance required. We’re even seeing a near-linear speedup using virtual cores. If you want to get a license for CodeSuite-MP, contact our sales department.
Last month we announced CodeMeasure, our new standalone tool for measuring software growth. This month we announced the release of CodeSuite 4.0 that includes CodeCLOC for measuring how software evolves across versions of code. CodeCLOC uses the same algorithms that were implemented in CodeMeasure and that were developed for the landmark software transfer pricing case Symantec v. Commissioner of Internal Revenue.
You’re probably wondering what is the difference between CodeMeasure and CodeCLOC. CodeMeasure is a simple, inexpensive program for generating the CLOC measurement statistics for multiple versions of a program. CodeCLOC, intended for litigation, compares only two versions of code but produces a detailed database of results that can be further filtered and analyzed using CodeSuite or your own custom tools. The results from CodeCLOC can be presented in court and the CodeCLOC database can be presented to the opposing party for verification.
CodeSuite 4.0 also has a few other nice features including a revamped user interface. There’s also a new function to generate statistics from any CodeSuite database and the command line interface has been enhanced for integrating with other programs. CodeSuite 4.0 is available for download here and can be purchased on a term license or project basis. CodeCLOC is priced at $20 per megabyte. A one year term license for CodeSuite is $100,000.