
Copying and pasting of source code is a common activity in software engineering. Often, the code is not copied as it is; it may be modified for various purposes, e.g. refactoring, bug fixing, or even software plagiarism. These code modifications can affect the performance of code similarity analysers, including code clone and plagiarism detectors, to some degree. We are interested in two types of code modification in this study: pervasive modifications, i.e. transformations that may have a global effect, and local modifications, i.e. code changes that are contained in a single method or code block.

We evaluate 30 code similarity detection techniques and tools using five experimental scenarios for Java source code. These are (1) pervasively modified code, created with tools for source code and bytecode obfuscation, and boiler-plate code, (2) source code normalisation through compilation and decompilation using different decompilers, (3) reuse of optimal configurations over different data sets, (4) tool evaluation using ranking-based measures, and (5) local + global code modifications. Our experimental results show that in the presence of pervasive modifications, some of the general textual similarity measures can offer similar performance to specialised code similarity tools, whilst in the presence of boiler-plate code, highly specialised source code similarity detection techniques and tools outperform textual similarity measures. Our study strongly validates the use of compilation/decompilation as a normalisation technique. Its use reduced false classifications to zero for three of the tools. Moreover, we demonstrate that optimal configurations are very sensitive to a specific data set. After directly applying optimal configurations derived from one data set to another, the tools perform poorly on the new data set. The code similarity analysers are thoroughly evaluated not only based on several well-known pair-based and query-based error measures but also on each specific type of pervasive code modification. This broad, thorough study is the largest in existence and potentially an invaluable guide for future users of similarity detection in source code.

Assessing source code similarity is a fundamental activity in software engineering and it has many applications. These include clone detection, the problem of locating duplicated code fragments; plagiarism detection; software copyright infringement; and code search, in which developers search for similar implementations. Whilst that list covers the more common applications, similarity assessment is used in many other areas, too. Examples include finding similar bug fixes (Hartmann et al., 2010), identifying cross-cutting concerns (Bruntink et al., 2005), program comprehension (Maletic and Marcus, 2001), code recommendation (Holmes and Murphy, 2005), and example extraction (Moreno et al., 2015). The assessment of source code similarity has a co-evolutionary relationship with the modifications made to the code at the point of its creation.
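To make the effect of a pervasive modification concrete, the following is a minimal sketch, not one of the tools evaluated in the study. It compares two Java fragments with a simple textual measure (Jaccard similarity over token bigrams), first on the raw tokens and then after a crude normalisation that replaces every identifier with a placeholder. The normalisation step is only a stand-in chosen for illustration; the study itself normalises through compilation and decompilation. The names `tokenize`, `normalise`, `similarity`, the keyword list, and the two fragments are all assumptions made for this example.

```python
import re

# Illustrative subset of Java keywords (assumption: enough for the fragments below).
JAVA_KEYWORDS = {"int", "for", "return", "if", "else", "while", "void"}

# Identifiers/keywords, integer literals, or any single non-space character.
TOKEN_PATTERN = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def tokenize(code):
    """Split source text into a flat list of lexical tokens."""
    return TOKEN_PATTERN.findall(code)


def normalise(tokens):
    """Replace every non-keyword identifier with the placeholder 'ID'."""
    result = []
    for token in tokens:
        is_word = token[0].isalpha() or token[0] == "_"
        if is_word and token not in JAVA_KEYWORDS:
            result.append("ID")
        else:
            result.append(token)
    return result


def bigrams(tokens):
    """Return the set of adjacent token pairs."""
    return set(zip(tokens, tokens[1:]))


def similarity(code_a, code_b, normalised=False):
    """Jaccard similarity over token bigrams, optionally after normalisation."""
    tokens_a, tokens_b = tokenize(code_a), tokenize(code_b)
    if normalised:
        tokens_a, tokens_b = normalise(tokens_a), normalise(tokens_b)
    set_a, set_b = bigrams(tokens_a), bigrams(tokens_b)
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


# Hypothetical fragments: RENAMED is ORIGINAL with every identifier renamed,
# a simple example of a modification that touches the whole fragment.
ORIGINAL = ("int sum(int[] values) { int total = 0; "
            "for (int v : values) { total += v; } return total; }")
RENAMED = ("int f(int[] a) { int b = 0; "
           "for (int c : a) { b += c; } return b; }")

if __name__ == "__main__":
    print("raw:       ", similarity(ORIGINAL, RENAMED))
    print("normalised:", similarity(ORIGINAL, RENAMED, normalised=True))
```

On these two fragments the raw score falls below 1 because the renamed identifiers change many bigrams, while the normalised score is exactly 1 because both fragments reduce to the same token sequence. Identifier renaming is only one kind of pervasive modification, and this sketch says nothing about how real obfuscators or the evaluated tools behave.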
