Skip to content

Latest commit

 

History

History
546 lines (473 loc) · 17.9 KB

README.adoc

File metadata and controls

546 lines (473 loc) · 17.9 KB

SWAMP SARIF IO Libraries

Module Version: 1.0.0

Last updated on 12-05-2019

Sarif version currently tracking: 2.1.0

Name

swamp-sarif-io - Library to write SARIF files from Perl

Description

The Software Assurance Marketplace (SWAMP) runs software assurance tools, and converts the results of each tool into a common format called SCARF (SWAMP Common Assessment Result Format). There is now a new format being developed by OASIS called SARIF. This repository contains a set of libraries that allows a program to write SARIF data from programs written in Perl.

Documentation for SCARF is available here. Documentation for SARIF is available here.

Synopsis

An example of a normal usage where the assessment ran successfully:

use SarifJsonWriter;

my $output = "/path/to/file.sarif"
my $writer = new SarifJsonWriter($output, "utf-8");

my %optionsData = (
  pretty => 1,
  error_level => 2,
  preferOnlyArtifactIndex => 1,
  addProvenance => 1,
  external => {
    results => {
      name => "results.sarif-external-properties",
      maxItems => 1000,
    },
  },
)
$writer->SetOptions(\%optionsData);

my @errors = ();
push @errors, CheckInitialData($initialData);
push @errors, CheckInvocations($invocations);
push @errors, CheckResultData($resultData);
push @errors, CheckRuleData($ruleData);

$writer->BeginFile();
$writer->BeginRun($initialData);
$writer->AddToolData($toolData);
$writer->AddOriginalUriBaseIds($baseIds);
$writer->AddInvocations($invocations);
$writer->BeginResults();
$writer->AddResult($resultData);
$writer->EndResults();
$writer->AddResources($ruleData);
$writer->EndRun($endData);
$writer->EndFile();

Output

Significant SARIF properties included in the generated SARIF files:

{
  version: "...",
  $schema: "...",
  runs: [
    {
      automationDetails: {
        guid: "...",
      },
      tool: {
        driver: {
          name: "...",
          version: "...",
        }
      },
      properties: { ... },
      originalUriBaseIds: { ... },
      invocations: [
        commandLine: "...",
        arguments: [ ... ],
        startTimeUtc: "...",
        endTimeUtc: "...",
        workingDirectory: { ... },
        environmentVariables: { ... },
        exitCode: "..."
      ],
      results: [
        {
          ruleId: "...",
          level: "...",
          rank: "...",
          message: { ... },
          locations: [ ... ], # including line, column, snippets
          codeFlows: [ ... ],
          provenance: { ... },
          properties: { tags: [], ... },
        }, ...
      ]
      artifacts: [ ... ],
      conversion: { ... },
    }, ...
  ]
}

Requirements

The following Perl libraries should be installed for the program to work correctly:

  • JSON::Streaming::Writer

Subroutines

new($output_file, $encoding)

This is the subroutine used to instantiate the writer. This subroutine expects two parameters, which are the output file name and the encoding. Only the "utf-8" encoding is supported right now.

SetOptions($optionsData)

Sets options for the writer. Refer to the Data Structures section to learn more about what options are available.

GetPretty()

Returns a boolean that shows whether the writer currently pretty prints.

GetErrorLevel()

Returns the error level currently set.

GetNumBugs()

Returns the total number of result objects added.

GetNumMetrics()

Returns the total number of metrics.

GetWriterAttrs($hash)

Adds filenames created by SarifJsonWriter to $hash.

BeginFile()

This subroutine writes the version and schema properties and starts the runs array.

BeginRun($initialData)

This subroutine starts writing initial data to the run object and saves some data for later use.

AddToolData($toolData)

Adds information about the tool and/or extensions to the SARIF file.

AddOriginalUriBaseIds($baseIds)

Adds the originalUriBaseIds property to the SARIF file.

Note: This method adds only the originalUriBaseIds object. Paths in the SARIF file don’t technically adjust to paths passed here. Instead, they adjust to paths passed to the BeginRun($initialData) method.

AddSpecialLocations($displayBase)

Adds the specialLocations property.

AddInvocations($invocations)

This subroutine adds the invocation property.

BeginResults()

Starts the results property and array. Called once before AddResult($resultData) calls.

AddResult($resultData)

Every BugInstance in a SCARF file maps to a result object in SARIF. This subroutine writes the data for a result object, and stores some data that will only be written out after all result objects are written.

EndResults()

Ends the results property and array. Called once after all AddResult($resultData) calls.

AddResources($ruleData)

Adds the resources object.

EndRun($endData)

Data saved previously will be written out here. Also ends the run object and closes all external files.

EndFile()

Ends arrays and properties and closes the main sarif file.

CheckInitialData($initialData)

Checks whether the required fields in the data structure are set. Program either does nothing, just print errors or dies depending on the error level set.

CheckInvocations($invocations)

Checks whether the required fields in the data structure are set. Program either does nothing, just print errors or dies depending on the error level set.

CheckResultData($resultData)

Checks whether the required fields in the data structure are set. Program either does nothing, just print errors or dies depending on the error level set.

CheckRuleData($ruleData)

Checks whether the required fields in the data structure are set. Program either does nothing, just print errors or dies depending on the error level set.

Data Structures

The following are the data structures used in the callbacks listed above:

$optionsData

optionsData contains information that is supposed to be passed to the writer at the beginning for the purpose of configuring the writer.

Option explanations:

pretty - Whether the sarif file will be indented. (Default is FALSE).

error_level - What the writer will do if an error is detected. 0 means do nothing; 1 means to print the error out; 2 means to print the error and die immediately. Note that if a critical error occurs, the program will still die even if the error_level is set to 0 or 1. (Default is 2).

addArtifacts - Whether the run.artifacts object is added to the sarif file

preferOnlyArtifactIndex - If run.artifacts is present then result.locations.physicalLocation.artifactLocation will contain only artifactIndex (no uri & uriBaseId).

For the options addArtifacts and preferOnlyArtifactIndex, the default behavior is to have the artifacts object and have the complete artifactLocation object in all results. (i.e. addArtifacts is TRUE and preferOnlyArtifactIndex is FALSE).

addProvenance - Whether the result.provenance object is added to the sarif file (Default is TRUE).

artifactHashes - Whether the writer will attempt to compute or read the file hash for the artifacts.artifact object. (Default is TRUE).

sortKeys - Whether the writer will sort hash objects before printing them out. Useful for debugging purposes. (Default is FALSE).

addSnippets - Whether the writer will attempt to open the artifact, navigate to the target lines and read the lines in order to add the snippet property. (Default is TRUE).

extraSnippets - Whether the writer will attempt to get more lines that the specified Start/End line. Value must be a positive integer. (Default is 0).

external - An object specifying whether a property will be externalized in an external file. Each object has 2 properties - 'name' and 'maxItems'. 'name' specifies the name of the external file and must be named like so: "PROPERTYNAME.sarif-external-properties". 'maxItems' specifies the maximum number of objects of that property type that will be present in each file. Only provide 'maxItems' if the property is an array type. If 'maxItems' is specified, the external file(s) will be named like so: "PROPERTYNAME-1.sarif-external-properties", "PROPERTYNAME-2.sarif-external-properties" and so on. Multiple properties can be externalized in the same file. However, if an external property is an array type, the external property file that contains this external property cannot contain another external property.

{
  pretty                  => PRETTY_VALUE,
  error_level             => ERROR_LEVEL_VALUE (Default is 2),
  addArtifacts            => TRUE/FALSE,
  preferOnlyArtifactIndex => TRUE/FALSE,
  addProvenance           => TRUE/FALSE,
  artifactHashes          => TRUE/FALSE,
  sortKeys                => TRUE/FALSE,
  addSnippets             => TRUE/FALSE,
  extraSnippets           => EXTRA_SNIPPETS,
  external => {
    $PROPERTY_NAME => {
      name                => NAME_VALUE,
      maxItems            => MAX_ITEMS_VALUE,    # Only for properties that contain arrays
    },
  },
}

$initialData

initialData contains information regarding the assessment.

Property explanations:

buildDir - Specifies where the build directory is located at. Use this option if the assessment was performed on a different machine. Currently used only to add the 'snippet' property.

{
  build_root_dir     => PACKAGE_DIRECTORY,                        # REQUIRED
  package_root_dir   => DIRECTORY_CONTAINING_PACKAGE,             # REQUIRED
  results_root_dir   => DIRECTORY_CONTAINING_RESULTS,             # REQUIRED
  uuid               => UUIDVALUE,                                # REQUIRED
  tool_name          => TOOL_NAME,                                # REQUIRED
  tool_version       => TOOL_VERSION,                             # REQUIRED
  package_name       => PACKAGE_NAME,                             # REQUIRED
  package_version    => PACKAGE_VERSION,                          # REQUIRED
  buildDir           => BUILD_DIR_PATH
}

$toolData

toolData contains information regarding the tool, such as the driver and extensions

{
  driver => $toolComponent,
  extensions => [
    $toolComponent, $toolComponent...
  ]
}

$toolComponent

{
  name                           => DRIVER_NAME,              # REQUIRED
  fullName                       => DRIVER_FULLNAME,
  guid                           => GUID_VALUE,
  version                        => DRIVER_VERSION,           # REQUIRED
  semanticVersion                => SEMANTIC_VERSION,
  dottedQuadFileVersion          => DOTTED_QUAD_FILE_VERSION,
  releaseDateUtc                 => RELEASE_DATE_UTC,
  downloadUri                    => DOWNLOAD_URI,
  informationUri                 => INFORMATION_URI,
  organization                   => ORGANIZATION,
  product                        => PRODUCT
  productSuite                   => PRODUCT_SUITE
  shortDescription => {
    text                         => TEXT_VALUE,
    markdown                     => MARKDOWN_VALUE,
  },
  fullDescription => {
    text                         => TEXT_VALUE,
    markdown                     => MARKDOWN_VALUE,
  }
  language                       => LANGUAGE,
  globalMessageStrings => {
    $PROPERTY_NAME => {
      text                       => TEXT_VALUE,
      markdown                   => MARKDOWN_VALUE,
    },
  },
  rules => [
    $reportingDescriptor, $reportingDescriptor...
  ],
  notifications => [
    $reportingDescriptor, $reportingDescriptor...
  ]
  taxa => [
    $reportingDescriptor, $reportingDescriptor...
  ]
  supportedTaxanomies => {
    name                         => NAME_VALUE,
    index                        => INDEX_VALUE,
    guid                         => GUID_VALUE
  }
  translationMetadata => {
    name                         => NAME_VALUE,
    fullName                     => FULL_NAME
    shortDescription => {
      text                       => TEXT_VALUE,
      markdown                   => MARKDOWN_VALUE
    },
    fullDescription => {
      text                       => TEXT_VALUE,
      markdown                   => MARKDOWN_VALUE
    },
    dowloadUri                   => DOWNLOAD_URI,
    informationUri               => INFORMATION_URI
  }
  contents => [
    "STRING", "STRING"...
  ]
  isComprehensive                => TRUE/FALSE
  localizedDataSemanticVersion   => VERSION_NUM
  minimumRequiredLocalizedDataSemanticVersion => VERSION_NUM
  associatedComponent => {
    name                         => NAME_VALUE,
    index                        => INDEX_VALUE,
    guid                         => GUID_VALUE
  }
}

$reportingDescriptor

{
  id                         => ID_VALUE,
  deprecatedIds => [
    $ID_1, $ID_2...          => DEPRECATED_IDS,
  ],
  guid                       => GUID_VALUE,
  deprecatedGuids => [
    $GUID_1, $GUID_2...      => DEPRECATED_GUIDS,
  ]
  name                       => NAME,
  deprecatedNames => [
    $NAME_1, $NAME_2...      => DEPRECATED_NAMES,
  ],
  shortDescription => {
    text                     => TEXT_VALUE,
    markdown                 => MARKDOWN_VALUE,
  },
  fullDescription => {
    text                     => TEXT_VALUE,
    markdown                 => MARKDOWN_VALUE,
  },
  messageStrings => {
    $PROPERTY_NAME => {
      text                   => TEXT_VALUE,
      markdown               => MARKDOWN_VALUE,
    }, ...
  },
  helpUri                    => HELP_URI,
  help => {
    text                     => TEXT_VALUE,
    markdown                 => MARKDOWN_VALUE
  }
}

$baseIds

{
  BUILDROOT => {
    uri => URI_VALUE,                   # REQUIRED
    uriBaseId => URIBASEID_VALUE,
    description => $message_object
  }
  PACKAGEROOT => (same object as above),
  RESULTSROOT => (same object as above)
}

$displayBase

{
  uri                     => URI_VALUE
  uriBaseId               => URIBASEID_VALUE
}

$invocations

This hash contains the information related to the invocation(s) of the tool

{
  assessments => [
    {
      commandLine           => COMMAND_LINE_VALUE,
      startTime             => INVOCATION_START_TIME,
      endTime               => INVOCATION_END_TIME,
      workingDirectory      => WORKING_DIRECTORY,
      exitCode              => EXIT_CODE_VALUE,
      args => [
        'ARG1', 'ARG2', ...  # ARGUMENTS
      ],
      env => {               # ENVIRONMENT_VARIABLES
        'key1' => 'value1',
        'key2' => 'value2',
        ...
      },
      executionSuccessful   => TRUE/FALSE
    },
    ...
  ]
}

$resultData

Each resultData hash contains information for one result object. Fields marked as required must be present. If both BugGroup and BugCode are not present, the ruleId for the corresponding result object in sarif will be set to "__UNKNOWN__".

{
  BugGroup              => GROUP_VALUE,
  BugCode               => CODE_VALUE,
  BugRank               => RANK_VALUE,
  BugMessage            => BUG_MESSAGE_VALUE,           # REQUIRED
  BugLocations  => [
    {
      SourceFile        => SOURCE_FILE_NAME,            # REQUIRED
      StartLine         => START_LINE,
      EndLine           => END_LINE,
      StartColumn       => START_COLUMN,
      EndColumn         => END_COLUMN,
      primary           => PRIMARY_VALUE,
      Explanation       => EXPLANATION_VALUE
    },
    ...
  ],
  AssessmentReportFile  => ASSESSMENT_REPORT_FILE_NAME,
  BuildId               => BUILD_ID_VALUE,
  InstanceLocation => {
    Xpath               => XPATH_VALUE,
    LineNum => {
      Start             => START_VALUE,
      End               => END_VALUE
    }
  }
  BugSeverity           => SEVERITY_VALUE,
  CweIds => [
    CWEIDVALUE, CWEIDVALUE, ...
  ],
}

$ruleData

This hash contains information required to write the rules object in the run.resources property.

[
  {
    id                  => ID_VALUE,            # REQUIRED
    defaultLevel        => DEFAULT_LEVEL,
    defaultRank         => DEFAULT_RANK,
    shortDescription    => SHORT_DESCRIPTION,
    fullDescription     => FULL_DESCRIPTION,    # REQUIRED
  },
  ...
]

$endData

This hash contains information required to write out the final objects in the sarif file

{
  sha256hashes          => (SEE_BELOW),
  conversion            => (SEE_BELOW),
}

$sha256hashes

This hash contains the sha256 hashes for all files used in the assessment.

{
  /path/to/file1 => SHA256 VALUE FOR FILE1,
  /path/to/file2 => SHA256 VALUE FOR FILE2,
  ...
}

$conversion

This hash contains information required to write the conversion object in SARIF.

Note: The toolExecutionNotifications property is useful for adding information in cases where an assessment failed.

{
  tool => {
    driver => {
      name              => NAME_VALUE,
      version           => VERSION_VALUE
    }
  },
  commandLine           => COMMAND_LINE,
  args => [
    'ARG1', 'ARG2'...      # ARGUMENTS
  ],
  workingDirectory      => WORKING_DIRECTORY
  env => {               # ENVIRONMENT_VARIABLES
    'key1' => 'value1',
    'key2' => 'value2',
    ...
  }
  executionSuccessful   => TRUE/FALSE
  toolExecutionNotifications => [
    {
      level             => LEVEL_VALUE,
      message => {
        text            => MESSAGE_TEXT
      }
    },
    ...
  ]
  startTime             => PROGRAM_START_TIME
}