Tag Archives: plagiarism detection

SAFE Corporation Awarded Patent Number Six for its CodeSuite Software Forensics Tool

The CodeCross function of CodeSuite compares functional source code to commented-out source code

CUPERTINO, CA (February 9, 2015) – Software Analysis & Forensic Engineering Corporation, the leading provider of forensic tools for software copyright and trade secret analysis, had its sixth patent allowed covering its CodeSuite® tool for comparing software code to help detect copyright infringement.

This latest patent is entitled “Detecting Plagiarism in Computer Source Code” and covers the CodeCross functionality that compares functional code to non-functional code. CodeSuite is the only commercially successful tool for comparing computer source code and object code to find infringement that has been accepted by the courts. It has been used successfully in over 70 intellectual property litigations worldwide. CodeSuite has been recognized by the USPTO as a unique invention. Our customers agree.

“Other programs that compare software don’t provide any understanding about the comparison or the results,” according to Gary Stringham of Gary Stringham & Associates, who has used CodeSuite in his expert witness cases. “Things match or they don’t. Only CodeSuite allows me to delve into the reasons for the matches, search the Internet for comparable third-party code, and then systematically filter out false positives. This means I can focus on possible infringement very quickly. Or, if nothing is left after filtering, I have a very strong argument against infringement.”

“CodeSuite has survived every challenge in court that it’s ever faced,” says Bob Zeidman, president of SAFE Corporation and inventor of CodeSuite. “Judges and juries like the quantitative, objective measurements produced by CodeSuite when they’re produced by a qualified expert trained in the tool. We provide online certification courses that give lawyers confidence that the expert knows how to use the tool and produce rock solid results that will stand up to scrutiny in court.”

CodeSuite 4.7 is available now and can be purchased on a term license or project basis. Project pricing is based on the size of code analyzed and the specific function used for the analysis. Pricing varies from $10 per megabyte for CodeCross® to $400 per megabyte for CodeMatch®. A six-month unlimited use license for CodeSuite is $50,000. A limited feature version of the program, CodeSuite-LT, is available for a six-month unlimited license for $3,000. Free trial licenses can be requested by contacting sales@SAFE-corp.biz.

Was the Microsoft Empire Built on Stolen Goods?

The history of the computer industry is filled with fascinating tales of sudden riches and lost opportunities. Take that of Ronald Wayne, who cofounded Apple Computer with Steve Wozniak and Steve Jobs but sold his shares for just US $2,300. And John Atanasoff, who proudly showed his digital computer design to John Mauchly who later codesigned the Eniac, typically recognized as the first electronic computer, without credit to Atanasoff. Perhaps the most famous story of missed fame and fortune is that of Gary Kildall. A pioneer in computer operating systems, Kildall started the company Digital Research and wrote Control Program for Microcomputers (CP/M), the operating system used on many of the early hobbyist personal computers, such as the MITS Altair 8800, the IMSAI 8080, and the Osborne 1, before IBM introduced its own PC. Kildall could have been the king of personal computer software, but instead that title went to his small-time rival Bill Gates. For years, rumors have circulated that the code for the original DOS operating system sold by Microsoft is actually copied from the CP/M operating system developed by Digital Research.

A couple years ago we took it upon ourselves to search out the original code and use CodeSuite to determine the truth once and for all. Our research was summarized in a popular (and not-so-popular) article in IEEE Spectrum entitled Did Bill Gates Steal the Heart of DOS? If you haven’t read it, you should. It’s a fun read but it only summarizes our exhaustive results using our tools and procedures for finding copied code. The article generated a lot of controversy and we always intended to publish the full technical details of our analysis, but it’s surprising how many people don’t like our conclusion and wouldn’t publish my paper. But now the full academic paper entitled A Code Correlation Comparison of the DOS and CP/M Operating Systems is available online in the Journal of Software Engineering and Applications. If you want to know the details, and you want to know the truth, it’s in the article and the details are in the paper.

S.A.F.E. Releases CodeSuite 4.7

Software Analysis and Forensic Engineering has just released a new version  of CodeSuite that has some really great new features.

 

PID  spreadsheets

What’s  a PID? It’s a partial identifier. Or more specifically, a partially matching identifier. That’s where two identifiers in code almost match. So for example, the identifiers identifier1 and confident_boy share the partial identifier (or “PID”) ident. CodeMatch has always been able to correlate PIDs and use that in calculating the identifier correlation score as a component of the entire correlation score between two source code files. But there can be so many PIDs that users got blurry-eyed trying to view them all and find suspicious ones in a CodeMatch HTML report. So we came up with a solution. You can now export the PIDs from a CodeSuite database into a spreadsheet. You can see not only the PIDs, but the original identifiers that share the PIDs. Now you can sort and select, cut and paste, and generally look for clues to copying in a simple spreadsheet.

 

FileIdentify™

Part of our process for finding copying has been to first find all the source code files in a directory of files so that you know what to examine. However, there are lots of source code files, and some can be missed. Some programming languages are a bit uncommon and you may not recognize the source code files. Well, we found a solution to that too. The new FileIdentify function of CodeSuite allows you to point at a folder and generate a spreadsheet containing all of the file extensions in that folder and all subfolders. If CodeSuite recognizes the (potential) programming language, it will put that information in the spreadsheet too.

 

XML

From the beginning of CodeSuite, when there was only CodeMatch, the database has always been a fully documented text file that anyone can view. This allows our customers to make their own tools to extract data and statistics from a CodeSuite comparison, and some customers have created some very interesting utilities. Our database format was simple, but grew more complex over the years. Now we have a function in CodeSuite that converts any CodeSuite database into XML so that you can use off-the-shelf tools to examine it, translate it, or write utilities to extract data and statistics.

Be a Pioneer in the Field of Software Forensics

I hope you’re all aware of my book The Software IP Detective’s Handbook: Measurement, Comparison, and Infringement Detection. It’s the first book on Software Forensics, a field that I pioneered at Software Analysis and Forensic Engineering and Zeidman Consulting. Whereas Digital Forensics deals with bits and files, without any detailed knowledge of the meaning of the data, Software Forensics deals with analysis of software using detailed knowledge of its syntax and functionality to perform analysis to find stolen code and stolen trade secrets. The algorithms described in the book have been used in many court cases. The book also describes algorithms for measuring software evolution, particularly as it relates to IP changes.

If you are a teacher, this is a great time to incorporate the materials in the book into your courses on software development, intellectual property law, business management, and computer science. There’s something for everyone in the various chapters of the book. Your students and you will be at the forefront of an important and very new field of study.

If you’re interested, please contact me.

HTML Preprocessor Released

S.A.F.E. recently released the HTML Preprocessor. The HTML Preprocessor is designed to transform web pages into files that are amenable to analysis by CodeSuite, DocMate, and other source code analysis tools. The HTML Preprocessor examines HTML files and other markup language files and extracts all embedded code into separate files. These files each contain only one kind of code that can be easily analyzed and compared using CodeSuite and DocMate. The code contained in these generated files are:

  • Scripts such as JavaScript and VBScript
  • Cascading style sheets (CSS)
  • Comment text containing HTML comments
  • Message text containing HTML user messages
  • HTML tags
  • Pure HTML
  • Pseudocode representation of the HTML

CodeSuite 4.4 and CodeSuite-LT 1.2 Released

S.A.F.E. recently released version 4.4 of CodeSuite and version 1.1 of CodeSuite-LT. The most important new feature of this version is that these programs now recognizes many different text encoding formats including ASCII, UTF-8, UTF-16, and UTF-32. Characters in alphabets other than the Latin alphabet used for English are now supported. For example, code with comments or strings in Japanese, Korean, Chinese, or Russian can be compared correctly.

The most significant change is to BitMatch. When examining binary object code to find text strings, you can now specify the encoding format of the file. If you’re not sure about the encoding, you can choose multiple formats.

As demand for our products increase outside the United States, we realized a need to support languages in those countries also.

The Software IP Detective’s Handbook

My book on software intellectual property, a labor of love (and hate) for the last two years, has just been published by Prentice-Hall. The book is intended for several different audiences including computer scientists, computer programmers, business managers, lawyers, engineering consultants, expert witnesses, and high-tech entrepreneurs. Some chapters give easy-to-understand explanations of intellectual property concepts including copyrights, patents, and trade secrets. Other chapters are highly mathematical treatments describing quantitative ways of comparing and measuring software and software IP. The first chapter of the book outlines which chapters are most important for the different audiences.

Overall the book covers the following topics:

  • Key concepts of software intellectual property
  • Comparing and correlating source code for signs of theft or infringement
  • Uncovering signs of copying in object code when source code is inaccessible
  • Tracking malware and third-party code in applications
  • Using software clean rooms to avoid IP infringement
  • Understanding IP issues associated with patents, open source, and DMCA

You can purchase your copy from Amazon.com here.

DocMatch detects plagiarism

S.A.F.E. has recently announced the release of DocMatch, a new tool for comparing all kinds of documents to find plagiarism. Our unique, patented technology has proved very useful for finding copied computer code in court. We decided to apply our technology to general documents like articles, papers, and novels. There have been a few cases where we built custom applications to compare written engineering specifications. The results were very useful. In one case, finding copied but modified software specifications gave clues that showed how one company copied another’s software.

DocMatch can be licensed as the full version or the LT version. The full version is the professional tool. It creates a database containing matching elements between two sets of documents. The full version can automatically search the Internet for all references to commonly used words and filter them from the database. Also, sophisticated statistics can be extracted from the database. The full version costs $150 for a one-year license. The LT version produces an easy-to-read HTML report showing words, sentences, and paragraphs that are identical or similar in every pair of documents. The LT version costs $30 for a one-year license. Register to download your copy here.