In the past few years I’ve been interviewing students for job openings at my companies. Some students came from large, well-known universities while others came from small colleges. Some students had bachelor’s degrees in computer science while others had master’s degrees. One thing that many of these recent graduates had in common was that they couldn’t program competently.
I found that these graduating students were adept at finding code on the Internet. When I gave assignments to code a particular algorithm, I was seriously impressed with how quickly they were able to find the code online. When I asked them to modify the algorithm, they struggled. Also, testing and debugging code often seemed beyond their abilities. Many of them were unaware of debugging techniques that allow them to focus in on the problem, such as using breakpoints to isolate chunks of code or forcing conditions that cause certain code paths to be executed.
The art of commenting also seems to have been ignored in most computer science education programs as well as in many companies. In my companies, our coding standard requires that every routine, no matter how small, have a header comment that describes the functionality of the routine, all input parameters, the output of the routine, and any other information that someone using the routine would need. Yet most programmers out of school, and many working in the industry, produce uncommented code that is difficult to understand, difficult to debug, and very difficult to maintain.
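As a rough illustration, a header comment covering those items might look like the following Python sketch. The routine `clamp` and the layout of the comment are assumptions made up for this example, not an excerpt from any actual coding standard.

```python
def clamp(value, low, high):
    """Restrict a number to the closed range [low, high].

    Inputs:
        value: the number to restrict.
        low:   the smallest allowed result.
        high:  the largest allowed result; must be >= low.

    Output:
        value if it already lies within the range, otherwise
        whichever of low or high is nearer to it.

    Other information:
        Raises ValueError if low > high.
    """
    if low > high:
        raise ValueError("low must not exceed high")
    return max(low, min(value, high))
```

Even for a three-line routine, the comment tells a caller what goes in, what comes out, and what can go wrong without reading the body.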
Can you imagine a medical program that didn’t teach how to stitch up a patient after surgery or use the latest CT scanner? University computer science departments need to take a serious look at the skills they’re teaching. At my companies, I now require prospective employees to sit down at a computer and write a program that works correctly according to a written specification, is fully commented, and is completely their own code. I hope that the percentage of graduates passing this test increases in future years.
You can now run CodeMeasure to graph the growth of your software project development effort over multiple versions of the software. CodeMeasure uses the Changing Lines of Code (CLOC) method to calculate the growth. The graph that CodeMeasure produces illustrates various CLOC measurements. An example is shown below.
Now there is a caveat (we do need to make a profit after all). You can examine the graph and take a screen shot of it, but you can’t save the results to a spreadsheet without a paid license. The good news is that a license is only $500 for a 1-year unlimited license. You can download CodeMeasure here and purchase a license here. This way you get to try out CodeMeasure and see how the results can help you measure your software development effort.
Software Analysis & Forensic Engineering Corporation today released a case study of Online IP Screening between Zynga’s FarmVille game and CrowdStar’s Happy Aquarium game. The study shows some interesting correlation between the source code for the two games. SAFE Corporation is officially announcing its SAFE Online IP Screening service that is targeted at social games and other online applications. The screening service is a subscription service to regularly examine online applications for signs of copying. In this first case study, we already found surprising results. Even after the normal process of eliminating correlation due to third-party code, commonly used identifier names, automatically generated code, common algorithms, and common authors, correlation remained. Was this intentional? Illegal? Acceptable? Coincidence? Decide for yourself: see summaries of this and other case studies here and register to download the full case studies here.
One unique feature of online applications is that often the full source code is downloaded to the user’s machine. This makes it easier for your competitors to copy your code. It also makes it easier for us to detect that copying. Learn more about SAFE Online IP Screening here or email us for details about how we can protect you from unauthorized copying and dissemination of your code.
CodeSuite-LT® is a less expensive, limited version of the full CodeSuite tool. Each tool in the suite produces a readable report that can be used to find copying. CodeSuite-LT includes BitMatch, CodeCross, CodeDiff, CodeMatch, FileCount, and FileIsolate. It also includes the ability to filter results using SourceDetective. CodeSuite-LT does not produce a database and does not allow post-process filtering of results. Instead, it generates an easy-to-read report that can be used to pinpoint copying.
Which is Right For You?
Which product is right for you, CodeSuite or CodeSuite-LT? Click here for a table that compares the features of both programs so you can choose the right solution.
So the government is finding ways to fix the patent system. One of those fixes is the Peer-to-Patent program. It seems like a good idea. In order to speed up the granting of good patents and quickly eliminate the bad ones, allow people from everywhere and anywhere to submit prior art. If that’s actually the way it worked, I’d celebrate; it would be a great resource for finding prior art and making the patent office more efficient. Unfortunately my experience is that the program creates more problems than it fixes. The patent office invited me to participate in the program. Two people posted “invalidating prior art” for my patent application entitled “Detecting Plagiarism in Computer Source Code.” This art was related to my invention, but definitely was not invalidating. Here is the first independent claim of my original patent application:
- A computer-implemented method comprising:
- creating, by a computer system, a first array of lines of functional program code from a first program source code file, the first program source code file including the lines of functional program code of a first program and lines of nonfunctional comments of the first program;
- creating, by the computer system, a second array of lines of nonfunctional comments from a second program source code file, the second program source code file including lines of functional program code of a second program and the lines of nonfunctional comments of the second program;
- comparing, by the computer system, the lines of functional program code from the first array with the lines of nonfunctional comments from the second array to find similar lines;
- calculating, by the computer system, a similarity number based on the similar lines; and presenting to a user an indication of copying of the first program source code file wherein said indication of copying is defined by the similarity number.
Here is the only independent claim of the prior art patent US 7,568,109:
- A system for comparing at least a first corpus to a second corpus, comprising:
- an analyzer identifying concepts in the corpuses, said analyzer determining a frequency rating of each of said concepts in each corpus;
- for each corpus, replacing each instance of each of said concepts with its respective determined frequency rating to create a frequency file;
- and a comparator comparing the frequency file for the first corpus to the frequency file for the second corpus, wherein said comparing the frequency file for the first corpus to the frequency file for the second corpus further comprises comparing portions of one corpus against the other corpus.
The second prior art submission was simply a reference to the UNIX diff command. While the diff command is relevant, it is a simple line-by-line comparison of text files without any understanding or parsing of programming source code. It doesn’t separate functional lines of code (statements) from nonfunctional lines (comments).
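To make the distinction concrete, here is a minimal Python sketch that loosely follows the steps of the claim quoted earlier: it separates statements from comments, compares the statements of one file with the comments of another, and reduces the matches to a single number. It is not the patented implementation or the CodeSuite algorithm. The function names, the `#`-only comment handling, the 0.8 threshold, and the similarity formula are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def split_lines(source):
    """Split source text into (code_lines, comment_lines).

    Simplification (assumption): a '#' anywhere on a line starts a
    comment, so '#' characters inside string literals are misclassified.
    """
    code, comments = [], []
    for line in source.splitlines():
        head, sep, tail = line.partition("#")
        if head.strip():
            code.append(head.strip())
        if sep and tail.strip():
            comments.append(tail.strip())
    return code, comments


def similar_lines(code_lines, comment_lines, threshold=0.8):
    """Return (code, comment) pairs whose text similarity meets the threshold."""
    pairs = []
    for code in code_lines:
        for comment in comment_lines:
            if SequenceMatcher(None, code, comment).ratio() >= threshold:
                pairs.append((code, comment))
    return pairs


def similarity_number(code_lines, pairs):
    """Fraction of distinct code lines matching some comment line (assumed metric)."""
    distinct = set(code_lines)
    if not distinct:
        return 0.0
    matched = {code for code, _ in pairs}
    return len(matched) / len(distinct)


# Hypothetical inputs: the second file contains a statement from the
# first file that has been commented out.
first_file = "total = price * qty\nprint(total)  # show result\n"
second_file = "# total = price * qty\ntotal = cost * count\n"

first_code, _ = split_lines(first_file)
_, second_comments = split_lines(second_file)
pairs = similar_lines(first_code, second_comments)
score = similarity_number(first_code, pairs)
```

A plain line-by-line diff of these two files would report every line as different, because it never treats the commented-out statement as a candidate match for live code.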
Judging by their remarks, the posters to the Peer-to-Patent site didn’t understand patents, and didn’t read the patent claims. They should be allowed to post references, but the ultimate decision must be in the hands of those trained in examining patents. However, the patent examiner told me that her supervisor didn’t want to issue a patent that had been publicly noted to be invalid, and so after months of arguments I had to arbitrarily narrow the claims to get allowance, resulting in patent US 7,823,127. So now, anyone from anywhere with any ulterior motive (particularly those who believe no software should be patentable) can bring about the quick rejection of an otherwise useful and valid patent.
CodeScreener: Online Plagiarism Detection for Software
SAFE Corporation has developed an online plagiarism detection service for software. The CodeScreener™ service is built on SAFE Corporation’s court-tested CodeSuite® forensic software and patented source code correlation technology. CodeScreener is designed to streamline the plagiarism detection process, giving you a thorough analysis of each file and a consistent set of correlation metrics. It’s online, it’s interactive, and it’s much less expensive than standalone CodeSuite. Contact our Sales Department to get a free evaluation license.
Until now there were two ways to run really big CodeSuite jobs. One was to simply run the job and wait for as long as it took. Really large jobs can take as much as a week or two. The other option was to run the job on CodeGrid, our framework that distributes the job over a grid of networked computers. CodeGrid shows an almost linear speedup for each computer on the grid, but it requires someone to maintain the computers and the network, and that can be a daunting job. Now there’s a third option: CodeSuite-MP allows you to run multiple jobs on a single multicore computer. We’re seeing a near-linear speedup for the number of cores, and there’s no special maintenance required. We’re even seeing a near-linear speedup using virtual cores. If you want to get a license for CodeSuite-MP, contact our sales department.
Many in the intellectual property business have been holding their breath waiting for this case to be decided. Many countries don’t allow software patents at all and most countries don’t allow business method patents. The United States allows both, but the lines, limits, and legality have been changing over the past years. The Court of Appeals for the Federal Circuit (CAFC) decided that Bilski’s patent on a method for hedging risk in energy commodities trading was not patentable because patents must be tied to a particular machine or transform an article from one thing or state to another. This “machine-or-transformation test” is probably as confusing to you as it is to the thousands of inventors and attorneys who had to understand it. Bilski appealed to the Supreme Court and on Monday the Supreme Court decided. Bilski loses his patent, but not because of the machine-or-transformation test. Abstract ideas have never been patentable and that’s what Bilski’s patent is, according to the Supreme Court. They also ruled that the machine-or-transformation test is only one test for patentability, not the only test as the CAFC had stated. They also ruled that business methods are patentable, as long as they are not abstract ideas.
Still confused? So are many others, except for Bilski, who now knows for sure that he doesn’t have a patent. Looking at it as an inventor, I see that the court has broadened the scope of patentable materials, which is good, but has made the test for patentability muddier, which means I will spend even more time and more money arguing with patent examiners. Looking at it as an expert witness for patent litigation, this ruling is sure to cause a lot more disagreements, which means a lot more litigation, which means a lot more business for me.
An excellent discussion of the Bilski ruling can be found at Patently-O, written by Dennis Crouch, Associate Professor at the University of Missouri School of Law. His regular columns on patents are the best ones available anywhere.
SAFE has just introduced its latest product called CodeMeasure™ that can measure the growth of software. Unlike our other products, this one is intended for software developers (look for a litigation version coming soon to CodeSuite). The tool is based on the technique that Zeidman Consulting developed for the case Symantec v. IRS that we call the Changing Lines of Code (CLOC) method of measuring software changes. It worked pretty well in the Symantec case to help calculate software transfer pricing, and saved Symantec over $500 million in taxes.
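For readers who want a feel for what measuring changing lines between versions involves, here is a small Python sketch. This post does not spell out how CLOC is computed, so the code below is not the CLOC method and not what CodeMeasure does. It is a generic line count built on Python's standard `difflib`, and the function names and the three categories (unchanged, added, removed) are assumptions made for illustration.

```python
from difflib import SequenceMatcher


def line_changes(old_text, new_text):
    """Count lines unchanged, added, and removed between two versions."""
    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    counts = {"unchanged": 0, "added": 0, "removed": 0}
    matcher = SequenceMatcher(None, old_lines, new_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            counts["unchanged"] += i2 - i1
        else:
            # A 'replace' is counted as removing the old lines and
            # adding the new ones; 'delete' and 'insert' fall out of
            # the same arithmetic because one of the ranges is empty.
            counts["removed"] += i2 - i1
            counts["added"] += j2 - j1
    return counts


def growth_series(versions):
    """Compare each version with the one before it, in release order."""
    return [line_changes(a, b) for a, b in zip(versions, versions[1:])]
```

Feeding `growth_series` the text of successive releases yields one set of counts per release step, which is the kind of data a growth graph could be drawn from.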
We have a whole new website about the product, designed for software developers, at CodeMeasure.com. Check it out and let me know what you think of the product and the website.
In 2003 I created the CodeMatch program that very quickly became a de facto standard in software IP litigation. I created a test bench of purposely plagiarized code that could be used to independently and objectively compare the results produced by different plagiarism detection programs. Some in the academic community claimed that my tests were biased toward the algorithms used by CodeMatch, which explained why CodeMatch fared so well compared to the other programs. However, these same critics, despite my requests, never produced their own set of standard tests.
Although I believe that the standard tests I have used are not biased, it occurred to me that there could be a better way to eliminate even unintentional bias. The solution would be to take the source code for certain open source programs and announce a new open source project that would involve purposely plagiarizing the code. Programmers from around the world would be invited, perhaps in a competition, to change the source code while retaining the functionality. The original programs and the plagiarized versions submitted from others would be stored in a database known as the Depository of Universal Plagiarism Examples or DUPE. Plagiarism detection programs would then be run on DUPE and comparisons of the results could be made to determine which programs best detected copying. Also, important statistics about plagiarized code could be determined, as well as patterns identified in order to improve the plagiarism detection programs.
SAFE Corporation has begun looking into creating this database. However, we realize that we would like to work with partners in academia and industry. We believe that there are several key issues that need to be resolved in creating DUPE. These are:
- Choosing appropriate open source projects.
- Creating a minimum definition of software plagiarism.
- Creating the database.
- Determining policies including who can access it, how it will be used, and who will maintain it.
- Determining how to run the tests, how to generate the results, and how to distribute the results.
Please contact me if you’re interested in working on this important and groundbreaking project.