Java API for JPlag #89

Merged: 93 commits, merged on May 27, 2021

Commits
f377da4
Remove webservice-related files and folders
Oct 13, 2020
ac03879
Add *.iml files to .gitignore
Oct 13, 2020
0b98211
Merge branch 'develop' into remove-files-not-required
Oct 13, 2020
226c67a
Merge pull request #1 from ls1intum/remove-files-not-required
Oct 13, 2020
255d636
Delete unnecessary resources
Oct 13, 2020
2218aa8
Pass JPlagOptions to Program constructor
Oct 13, 2020
9765a5d
Remove unused methods from Options
Oct 13, 2020
d2f8367
Add Verbosity and Language enum
Oct 13, 2020
c263e30
Remove unused OptionContainer
Oct 13, 2020
bf911c1
Format code
Oct 13, 2020
9a704a6
WIP Replace CommandLindOptions with new JPlagOptions; remove unnecess…
Oct 18, 2020
5544507
Remove BufferedCounter and Colors
Oct 18, 2020
2da099b
Rename Program and JPlag classes
Oct 18, 2020
104616c
Create JPlagResult class; rename some files
Oct 18, 2020
4eb8b4d
Format jplag/pom.xml
Oct 18, 2020
d79fc50
Update version in jplag/pom.xml
Oct 18, 2020
6b4e27e
Update version of maven-assemly-plugin in jplag/pom.xml
Oct 18, 2020
44774f5
Update README
Oct 19, 2020
74ce5bf
Update README
Oct 19, 2020
df625c8
Format jplag.parent/pom.xml
Oct 20, 2020
b9cde04
Move language initialization to JPlag class
Oct 20, 2020
87e90c4
Move similarity matrix attribute to JPlag class
Oct 20, 2020
6fd209e
Add unified method for submission comparison
Oct 20, 2020
28ba312
Add public setters to JPlagOptions
Oct 20, 2020
a6e3b77
Make compare() return an actual JPlagResult
Oct 20, 2020
4f89fdf
Refactor checkBaseCodeOption()
Oct 20, 2020
39c616d
Use strategy pattern for comparison modes
Oct 21, 2020
054845b
Format AllMatches
Oct 21, 2020
96b9784
Use CLI for testing purposes
Oct 21, 2020
f813b3c
Simplify how submissions get parsed
Oct 21, 2020
a9e1aaa
Refactor findSubmissions()
Oct 21, 2020
6abdcfb
Update JPlagOptions
Oct 21, 2020
5e1ee1d
Refactor submission scanning
Oct 22, 2020
abd2db0
Simplify submission scanning
Oct 22, 2020
3648f9a
Reformat some code
Oct 22, 2020
fb7a7b8
Reformat some code of the Submission class
Oct 22, 2020
766d94f
Rename jplag.options.Language to LanguageOption to avoid naming conflict
Oct 22, 2020
a7e11d8
Remove experimental comparison strategy
Oct 22, 2020
2902ea1
Remove unused files
Oct 22, 2020
231c10a
Remove 'exp' attribute from JPlagOptions
Oct 22, 2020
78ecb99
Tidy up JPlagOptions
Oct 22, 2020
85ad266
Reformat normal and revision strategies
Oct 25, 2020
af397d6
Remove print statement from Submission class
Oct 26, 2020
b4085a7
Refactor JPlagOptions
Oct 26, 2020
c3efba0
Remove experimental Filter class
Oct 26, 2020
f2de667
Refactor JPlagOptions
Oct 26, 2020
faf9fc7
Reformat Matches class
Oct 26, 2020
8d8e988
Reformat Match class
Oct 26, 2020
f13574c
Reformat GSTiling class
Oct 26, 2020
d2a6d11
Rename AllMatches to JPlagComparison
Oct 26, 2020
b87e175
Reformat JPlagComparison class
Oct 26, 2020
c5330b6
Remove JPlagComparison inheritance from Matches
Oct 26, 2020
3a7c778
Rename AllBaseCodeMatches to JPlagBaseCodeComparison
Oct 26, 2020
a82fce9
Remove test attribute from JPlagResult
Oct 26, 2020
238e909
Follow up to a82fce97
Oct 26, 2020
a38cc12
Calculate the similarity distribution in JPlagResult
Oct 26, 2020
a211620
Use new JPlagResult API in NormalComparisonStrategy
Oct 26, 2020
8cebeea
Add new accessors to JPlagResult
Oct 26, 2020
3df924d
Simplify JPlagComparison comparators
Oct 26, 2020
fc2ae06
Add similarityThreshold option
Oct 26, 2020
d2c6d6a
Add option to specify a metric the similarity threshold refers to
Oct 26, 2020
a3ffd72
Remove unused code in NormalComparisonStrategy
Oct 26, 2020
17f4b23
Refactor RevisionComparisonStrategy
Oct 26, 2020
86581c9
Move registerMatch to ExternalComparisonStrategy
Oct 26, 2020
cc2d515
Deprecate storeMatches and storePercent options
Oct 26, 2020
ea7f8cc
Remove unused code from JPlagComparison
Oct 26, 2020
4dbf8d6
Update README
Oct 26, 2020
87b8af1
Update README
Oct 27, 2020
c2d958f
WIP Update README
Oct 27, 2020
390369a
Update README.md
Nov 1, 2020
a6eeb3d
Update README.md
Nov 1, 2020
68ce582
Update README.md
Nov 1, 2020
48d1327
Fix NullPointerException during base code scan
Nov 8, 2020
ed7e486
Add possibility to generate a report of the jplag result
Dec 1, 2020
3361299
Update readme
Dec 10, 2020
89649e7
Add class diagram png
Dec 10, 2020
b0b1711
Update README.md
Dec 10, 2020
f6c0cab
Fix ArithmeticDistribution when generating the similarity distributio…
Dec 19, 2020
92c3d1e
Decrease size of generated report
Dec 19, 2020
85164a6
Add argparse4j library
philipphundertmark Apr 13, 2021
3f3b266
Add basic arguments to CLI
philipphundertmark Apr 18, 2021
cc7c7e1
Map language option to LanguageOption enum
philipphundertmark Apr 18, 2021
0e638e5
Finish CLI implementation
philipphundertmark May 9, 2021
69b13d7
Update README.md
May 9, 2021
394b1e4
Update README.md
May 16, 2021
43198b3
Merge branch 'master' into develop
May 17, 2021
ee1ce17
Fixed formatting and made minor changes to comments in the strategy p…
tsaglam May 18, 2021
822a172
Renamed the GSTiling class to make it clear what it actually does.
tsaglam May 18, 2021
86277ef
Added some comments for to important interfaces, minor alterations to…
tsaglam May 19, 2021
a6a1b14
Fixed some minor warnings along the way.
tsaglam May 19, 2021
ca16286
Organized imports and made overrides explicit.
tsaglam May 19, 2021
60e49c0
Applied auto-formatting.
tsaglam May 19, 2021
b862730
Restored the CI configuration file.
tsaglam May 20, 2021
3 changes: 3 additions & 0 deletions .gitignore
@@ -5,6 +5,8 @@ target/

*.class

result

# Mobile Tools for Java (J2ME)
.mtj.tmp/

@@ -18,3 +20,4 @@ hs_err_pid*

# intelliJ
.idea/
*.iml
15 changes: 0 additions & 15 deletions .travis.yml

This file was deleted.

246 changes: 180 additions & 66 deletions README.md
@@ -1,92 +1,206 @@
# JPlag - Detecting Software Plagiarism

[![Build Status](https://travis-ci.org/jplag/jplag.svg?branch=master)](https://travis-ci.org/jplag/)
[![Latest Release](https://img.shields.io/github/release/jplag/jplag.svg)](https://github.com/jplag/jplag/releases/latest)
[![License](https://img.shields.io/github/license/jplag/jplag.svg)](https://github.com/jplag/jplag/blob/master/LICENSE)

This fork focuses on the development of a new Java API for JPlag.

## Download and Installation

### Building from sources

1. Download or clone the code from this repository.
2. Run `mvn clean install` from the root of the repository to install all submodules. You will find the JARs in the respective `target` directories.
3. Inside the `jplag` directory run `mvn clean generate-sources package assembly:single`.

You'll find the generated JAR with all dependencies in `jplag/target`.

## (Breaking) Changes

> Note: The following list of changes doesn't claim to be complete and is intended to give a rough overview of the changes introduced with this fork.

This fork made the following (breaking) changes to the JPlag repository:

* Removed folders related to (deprecated) web services: `adminTool`, `atujplag`, `homepage`, `maven-libs`, `webService`, and `wsClient`
* Deleted unnecessary resources from `jplag/src/main/resources`
* All cluster-related code fragments are commented out and marked as `TODO`
* The new `JPlagOptions` class replaces all previous Options-related classes and manages all available program options
* Added the `argparse4j` package to properly parse CLI arguments
* Renamed the `Program` class to `JPlag`. It contains the main logic to parse submissions and delegates the comparison of files to one of the new `ComparisonStrategy` implementations.
* Deleted the **experimental** comparison mode. The **external** and **special** comparison strategies are commented out and marked as `TODO`. The **normal** and **revision** strategies should work as intended.
* The new `JPlagResult` class bundles all results of a plagiarism detection run. An instance of this class can optionally be passed to the new `Report` class to generate web pages of the results.
* While re-implementing the CLI, we renamed/removed some arguments to adapt the CLI to the new code structure. A detailed description of all available options of the new CLI can be found below.
* The new `JPlagComparison` class replaces the old `AllMatches` class.
* We removed `AvgComparator`, `AvgReversedComparator`, `MaxComparator`, `MaxReversedComparator`, `MinComparator`, and `MinReversedComparator` from the `JPlagComparison` (previously `AllMatches`) class. All comparisons are now sorted by average similarity. The `JPlagResult` and `JPlagComparison` classes should make adding custom sorting logic straightforward if required.
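
As an illustration of the last point, the following sketch sorts pairwise comparisons by average similarity, highest first. The `Comparison` record is a hypothetical stand-in, not the actual `JPlagComparison` class, and the similarity formula (matched tokens relative to the average token count of both submissions, in percent) is an assumption made for this example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SortBySimilarity {

    // Hypothetical stand-in for a pairwise comparison result (not JPlag's JPlagComparison).
    record Comparison(String first, String second, int matchedTokens, int tokensFirst, int tokensSecond) {

        // Assumed definition: matched tokens relative to the average size of both submissions, in percent.
        double averageSimilarity() {
            int total = tokensFirst + tokensSecond;
            return total == 0 ? 0.0 : 200.0 * matchedTokens / total;
        }
    }

    // Returns a sorted copy of the comparisons, highest average similarity first.
    static List<Comparison> sortByAverageSimilarity(List<Comparison> comparisons) {
        List<Comparison> sorted = new ArrayList<>(comparisons);
        sorted.sort(Comparator.comparingDouble(Comparison::averageSimilarity).reversed());
        return sorted;
    }

    public static void main(String[] args) {
        List<Comparison> comparisons = List.of(
                new Comparison("A", "B", 30, 100, 100),  // 30.0 %
                new Comparison("A", "C", 90, 100, 100),  // 90.0 %
                new Comparison("B", "C", 50, 100, 150)); // 40.0 %
        for (Comparison c : sortByAverageSimilarity(comparisons)) {
            System.out.printf("%s-%s: %.1f%%%n", c.first(), c.second(), c.averageSimilarity());
        }
    }
}
```

Swapping in a different ordering (for example, by maximum similarity) would only require a different `Comparator`.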

In addition, this fork includes the following changes, which are not listed individually:
* Tons of code formatting & restructuring
* Renaming of classes, files, and functions
* Deletion of unused code

## Usage

### CLI

```
usage: jplag [-h]
[-l {java_1_1,java_1_2,java_1_5,java_1_5_dm,java_1_7,java_1_9,python_3,c_cpp,c_sharp,char,text,scheme}]
[-bc BC] [-v {parser,quiet,long,details}] [-d] [-S S] [-p P]
[-x X] [-t T] [-s S] [-r R] rootDir

JPlag - Detecting Software Plagiarism

positional arguments:
rootDir The root-directory that contains all submissions

named arguments:
-h, --help show this help message and exit
-l {java_1_1,java_1_2,java_1_5,java_1_5_dm,java_1_7,java_1_9,python_3,c_cpp,c_sharp,char,text,scheme}
Select the language to parse the submissions
(default: java_1_9)
-bc BC Name of the directory which contains the base
code (common framework)
-v {parser,quiet,long,details}
Verbosity (default: quiet)
-d (Debug) parser. Non-parsable files will be stored
(default: false)
-S S Look in directories <root-dir>/*/<dir> for
programs
-p P comma-separated list of all filename suffixes
that are included
-x X All files named in <file> will be ignored
-t T Tune the sensitivity of the comparison. A smaller
<n> increases the sensitivity
-s S Similarity Threshold: all matches above this
threshold will be saved
-r R Name of directory in which the web pages will be
stored (default: result)
```
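
JPlag compares token sequences with Greedy String Tiling (see the `GSTiling` class in the commit list). Assuming `-t` sets the minimum match length, the naive sketch below shows why a smaller value increases sensitivity: shorter common token runs start to count as matches. It works on plain strings instead of JPlag's tokens, omits the optimizations of the real implementation, and its class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class GreedyStringTilingSketch {

    // Returns the number of tokens covered by non-overlapping common runs ("tiles")
    // of at least minMatchLength tokens, using a naive Greedy String Tiling.
    static int tiledTokens(List<String> a, List<String> b, int minMatchLength) {
        if (minMatchLength < 1) {
            throw new IllegalArgumentException("minMatchLength must be at least 1");
        }
        boolean[] markedA = new boolean[a.size()];
        boolean[] markedB = new boolean[b.size()];
        int tiled = 0;
        int maxMatch;
        do {
            maxMatch = minMatchLength;
            List<int[]> matches = new ArrayList<>(); // entries: {startA, startB, length}
            for (int i = 0; i < a.size(); i++) {
                for (int j = 0; j < b.size(); j++) {
                    // Extend the common run starting at (i, j) over unmarked tokens only.
                    int k = 0;
                    while (i + k < a.size() && j + k < b.size()
                            && !markedA[i + k] && !markedB[j + k]
                            && a.get(i + k).equals(b.get(j + k))) {
                        k++;
                    }
                    if (k > maxMatch) { // a longer run was found: restart the candidate list
                        matches.clear();
                        maxMatch = k;
                    }
                    if (k == maxMatch) {
                        matches.add(new int[] {i, j, k});
                    }
                }
            }
            // Turn candidates into tiles unless a tile from this pass already covers them.
            for (int[] m : matches) {
                if (isFree(markedA, m[0], m[2]) && isFree(markedB, m[1], m[2])) {
                    for (int k = 0; k < m[2]; k++) {
                        markedA[m[0] + k] = true;
                        markedB[m[1] + k] = true;
                    }
                    tiled += m[2];
                }
            }
        } while (maxMatch > minMatchLength);
        return tiled;
    }

    private static boolean isFree(boolean[] marked, int start, int length) {
        for (int k = start; k < start + length; k++) {
            if (marked[k]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> a = List.of("A", "B", "C", "D", "E", "F");
        List<String> b = List.of("X", "A", "B", "C", "Y", "E", "F");
        System.out.println(tiledTokens(a, b, 3)); // prints 3: only "A B C" is long enough
        System.out.println(tiledTokens(a, b, 2)); // prints 5: "E F" now counts as well
    }
}
```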

### Java API

The new API makes it easy to integrate JPlag's plagiarism detection into external Java projects.

#### Example

```java
JPlagOptions options = new JPlagOptions("/path/to/rootDir", LanguageOption.JAVA_1_9);
options.setBaseCodeSubmissionName("template");

JPlag jplag = new JPlag(options);
JPlagResult result = jplag.run();

List<JPlagComparison> comparisons = result.getComparisons();

// Optional: generate web pages of the results
File outputDir = new File("/path/to/output");
Report report = new Report(outputDir);

report.writeResult(result);
```

#### Class Diagram

![UMLClassDiagram.png](UMLClassDiagram.png)

## Concepts

This section explains some fundamental concepts about JPlag that make it easier to understand and use.

### Root directory

This is the directory in which JPlag will scan for submissions.

### Submissions

Submissions contain the source code that JPlag will parse and compare. They have to be direct children of the root directory and can either be single files or directories.

#### Example: Single-file submissions

```
/path/to/root-directory
├── Submission-1.java
├── ...
└── Submission-n.java
```

#### Example: Directory submissions

JPlag will read submission directories recursively, so they can contain multiple (nested) source code files.

```
/path/to/root-directory
├── Submission-1
│ ├── Main.java
│ └── util
│ └── Utils.java
├── ...
└── Submission-n
├── Main.java
└── util
└── Utils.java
```

If you want JPlag to scan only one specific subdirectory of a submission for source code files (e.g. `src`), you can pass the `-S` option:

```
With option -S src

/path/to/root-directory
├── Submission-1
│ ├── src
│ │ ├── Main.java # Included
│ │ └── util
│ │ └── Utils.java # Included
│ ├── lib
│ │ └── Library.java # Ignored
│ └── Other.java # Ignored
└── ...
```
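
The submission rules above (direct children of the root directory, recursive reading of directory submissions, optional restriction to one subdirectory) can be sketched with `java.nio.file`. This is an illustration under assumptions, not JPlag's scanning code: the class and method names are hypothetical, the `suffix` parameter loosely mirrors the `-p` option, and single-file submissions are simply left untouched by the subdirectory restriction in this sketch.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class SubmissionScanner {

    // Maps each submission (a direct child of rootDir) to the source files that would be parsed.
    // subDir may be null; otherwise only <rootDir>/<submission>/<subDir> is searched.
    static Map<String, List<Path>> findSubmissions(Path rootDir, String subDir, String suffix)
            throws IOException {
        Map<String, List<Path>> submissions = new TreeMap<>();
        List<Path> children;
        try (Stream<Path> stream = Files.list(rootDir)) {
            children = stream.collect(Collectors.toList());
        }
        for (Path child : children) {
            String name = child.getFileName().toString();
            if (Files.isRegularFile(child)) {
                // Single-file submission.
                if (name.endsWith(suffix)) {
                    submissions.put(name, List.of(child));
                }
                continue;
            }
            Path searchRoot = (subDir == null) ? child : child.resolve(subDir);
            List<Path> files = new ArrayList<>();
            if (Files.isDirectory(searchRoot)) {
                // Directory submission: collect matching files recursively.
                try (Stream<Path> walk = Files.walk(searchRoot)) {
                    walk.filter(Files::isRegularFile)
                            .filter(p -> p.getFileName().toString().endsWith(suffix))
                            .sorted()
                            .forEach(files::add);
                }
            }
            submissions.put(name, files);
        }
        return submissions;
    }

    public static void main(String[] args) throws IOException {
        // Rebuild the example layout from above in a temporary directory.
        Path root = Files.createTempDirectory("root-directory");
        Path submission = root.resolve("Submission-1");
        Files.createDirectories(submission.resolve("src/util"));
        Files.createDirectories(submission.resolve("lib"));
        Files.writeString(submission.resolve("src/Main.java"), "class Main {}");
        Files.writeString(submission.resolve("src/util/Utils.java"), "class Utils {}");
        Files.writeString(submission.resolve("lib/Library.java"), "class Library {}");
        Files.writeString(submission.resolve("Other.java"), "class Other {}");

        System.out.println(findSubmissions(root, null, ".java").get("Submission-1").size());  // prints 4
        System.out.println(findSubmissions(root, "src", ".java").get("Submission-1").size()); // prints 2
    }
}
```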

### Base Code

The base code is a special kind of submission. It is the template that all other submissions are based on. JPlag will ignore any match between two submissions that is also part of the base code.

Like any other submission, the base code has to be a single file or directory in the root directory.

```
/path/to/root-directory
├── BaseCode
│ └── Solution.java
├── Submission-1
│ └── Solution.java
├── ...
└── Submission-n
└── Solution.java
```

#### Example

In this example, students have to solve a given problem by implementing the `run` method in the template below. Because they are not supposed to modify the `main` function, it will be identical for each student.

```java
// BaseCode/Solution.java
public class Solution {

    // DO NOT MODIFY
    public static void main(String[] args) {
        Solution solution = new Solution();
        solution.run();
    }

    public void run() {
        // TODO: Implement your solution here.
    }
}
```

To prevent JPlag from detecting similarities in the `main` function (and other parts of the template), we can instruct JPlag to ignore matches with the given base code by providing the `-bc <base-code-name>` option (or, with the Java API, by calling `setBaseCodeSubmissionName`).

The `<base-code-name>` in the example above is `BaseCode`.
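
The idea of ignoring base code matches can be sketched as a filter: every match between two submissions that touches a token which also matches the base code is dropped. Representing matches as token ranges and dropping a match on any overlap are simplifying assumptions; a real implementation might instead exclude base code tokens before the comparison runs. The `Match` record and the method names are hypothetical.

```java
import java.util.BitSet;
import java.util.List;
import java.util.stream.Collectors;

public class BaseCodeFilter {

    // Hypothetical match: a run of 'length' tokens starting at 'startA' in submission A
    // and at 'startB' in submission B.
    record Match(int startA, int startB, int length) {}

    // Keeps only matches that touch no token already matched by the base code in A or B.
    static List<Match> withoutBaseCode(List<Match> matches, BitSet baseCodeInA, BitSet baseCodeInB) {
        return matches.stream()
                .filter(m -> baseCodeInA.get(m.startA(), m.startA() + m.length()).isEmpty()
                        && baseCodeInB.get(m.startB(), m.startB() + m.length()).isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Assume tokens 0..9 of both submissions stem from the template's main method.
        BitSet baseCodeInA = new BitSet();
        baseCodeInA.set(0, 10);
        BitSet baseCodeInB = new BitSet();
        baseCodeInB.set(0, 10);
        List<Match> matches = List.of(new Match(0, 0, 10), new Match(25, 40, 12));
        // prints [Match[startA=25, startB=40, length=12]]
        System.out.println(withoutBaseCode(matches, baseCodeInA, baseCodeInB));
    }
}
```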

## Contributing
We're happy to incorporate all improvements to JPlag into this code base. Feel free to fork the project and send pull requests.

### Adding new languages
Binary file added UMLClassDiagram.png