Class NGramPostProcessor

    • Field Detail

      • threshold

        @AdjustableParameter(name="Common Threshold",
                             defaultValue=0.3f,
                             minimumBound=0.0f,
                             maxumumBound=1.0f,
                             step=0.001f,
                             description="If a section of code appears in more than this % of the files it will be ignored. Used to remove skeleton and common code.")
        public float threshold
        Threshold determining when to ignore large sets of matches.

        If a block of code is common to a large set of files, it is less likely to be plagiarism and more likely a common code pattern or code given to the students (e.g. skeleton files). This threshold is the fraction of files (between 0.0 and 1.0) above which matches are ignored, to avoid false detections. A match is kept when the fraction of files containing it is less than or equal to the threshold, so setting it to 1 deactivates the filter.
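        A minimal sketch of the threshold check described above, assuming each match can report how many files contain it. CommonCodeFilter, Match, keep, and filter are hypothetical names used only for illustration; the actual NGramPostProcessor internals are not shown in this documentation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the common-code threshold check. CommonCodeFilter and
// Match are illustrative names, not part of the NGramPostProcessor API.
public class CommonCodeFilter {

    // A hypothetical match: one duplicated block and how many files contain it.
    static final class Match {
        final String content;
        final int fileCount;

        Match(String content, int fileCount) {
            this.content = content;
            this.fileCount = fileCount;
        }
    }

    // Fraction of files in [0.0, 1.0], like the documented threshold field.
    private final float threshold;

    CommonCodeFilter(float threshold) {
        this.threshold = threshold;
    }

    // Keep a match only if the fraction of files containing it is <= threshold,
    // so a threshold of 1.0f never discards anything.
    boolean keep(Match m, int totalFiles) {
        if (totalFiles <= 0) {
            throw new IllegalArgumentException("totalFiles must be positive");
        }
        float fraction = (float) m.fileCount / totalFiles;
        return fraction <= threshold;
    }

    // Return only the matches that pass the threshold check.
    List<Match> filter(List<Match> matches, int totalFiles) {
        List<Match> kept = new ArrayList<>();
        for (Match m : matches) {
            if (keep(m, totalFiles)) {
                kept.add(m);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        CommonCodeFilter filter = new CommonCodeFilter(0.3f);
        List<Match> matches = List.of(
            new Match("skeleton main()", 9), // 9 of 10 files: 0.9 > 0.3, ignored
            new Match("shared helper", 2));  // 2 of 10 files: 0.2 <= 0.3, kept
        for (Match m : filter.filter(matches, 10)) {
            System.out.println(m.content); // prints "shared helper"
        }
    }
}
```

        With the default of 0.3, a block found in 9 of 10 submissions is treated as skeleton code and dropped, while a block shared by only 2 submissions is kept for review.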

    • Constructor Detail

      • NGramPostProcessor

        public NGramPostProcessor()

    • Method Detail

      • processResults

        public ModelTaskProcessedResults processResults(java.util.List<ISourceFile> files,
                                                        java.util.List<NGramRawResult> rawResults)
        The main entry method for the postprocessor.

        This method takes the outputs produced by detection and organizes them into groups by duplicated content rather than by file pair.
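        The regrouping can be pictured with the following sketch, which assumes each raw result boils down to a pair of file names plus the matched content. PairMatch and groupByContent are hypothetical stand-ins; they are not the actual NGramRawResult or ModelTaskProcessedResults types.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch: PairMatch and groupByContent are illustrative names,
// not NGramRawResult / ModelTaskProcessedResults from the documented API.
public class GroupByContentSketch {

    // One pairwise detection: the same content was found in fileA and fileB.
    static final class PairMatch {
        final String fileA;
        final String fileB;
        final String content;

        PairMatch(String fileA, String fileB, String content) {
            this.fileA = fileA;
            this.fileB = fileB;
            this.content = content;
        }
    }

    // Regroup pairwise matches so each duplicated block maps to every file containing it.
    // TreeMap/TreeSet keep the output order deterministic.
    static Map<String, Set<String>> groupByContent(List<PairMatch> pairs) {
        Map<String, Set<String>> groups = new TreeMap<>();
        for (PairMatch p : pairs) {
            Set<String> files = groups.computeIfAbsent(p.content, k -> new TreeSet<>());
            files.add(p.fileA);
            files.add(p.fileB);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<PairMatch> pairs = List.of(
            new PairMatch("a.java", "b.java", "loopX"),
            new PairMatch("b.java", "c.java", "loopX"),
            new PairMatch("a.java", "c.java", "sortY"));
        System.out.println(groupByContent(pairs));
        // prints {loopX=[a.java, b.java, c.java], sortY=[a.java, c.java]}
    }
}
```

        Grouping by content means a block shared by three files appears once with all three files listed, rather than as several separate file pairs.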

        Specified by:
        processResults in interface IPostProcessor<NGramRawResult>
        Parameters:
        files - The list of files covered by the rawResults passed.
        rawResults - The set of rawResults produced by the IDetector.
        Returns: