Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace analyze with run-queries and interpret-results #548

Merged
merged 1 commit into from
Jun 8, 2021

Conversation

edoardopirovano
Copy link
Contributor

Currently, we call database analyze on the database several times, resulting in a number of SARIF files being created that we have to merge. This PR changes this to instead call database run-queries and then perform a call to database interpret-results to create a single SARIF file (per language).

Merge / deployment checklist

  • Confirm this change is backwards compatible with existing workflows.
  • Confirm the readme has been updated if necessary.
  • Confirm the changelog has been updated if necessary.

);
analysisSummaryBuiltIn = stdout;
await injectLinesOfCode(sarifFile, language, locPromise);

statusReport[`analyze_builtin_queries_${language}_duration_ms`] =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note this means the status report no longer includes result interpretation time. I think that's ok.
@robertbrignull should/could we have a separate telemetry category for interpretation, now that is a separate step?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added it in since it seems potentially useful to have.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if this will break the telemetry consumer. Check with Robert and team.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I missed this mention of me. What's done here should essentially be fine since the status report endpoint now ignores unknown fields, so it's ok to update the codeql-action first.

One thing though, could you please update the type definition at

export interface QueriesStatusReport {

so that it includes the new fields. As you can see in this case that type is really only there for documentation purposes, but if it's there it would be nice if it's correct.

src/codeql.ts Show resolved Hide resolved
...getExtraOptionsFromEnv(["database", "interpret-results"]),
];
if (automationDetailsId !== undefined) {
args.push("--sarif-category", automationDetailsId);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aeisenberg does the automationDetailsId passed in here ever end with a run-id component that we want to remove before treating it as the category?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A / is always appended to the --sarif-category if it doesn't already end in one. It's handled in exactly the same way for database analyze as it is for interpret results. So, I think it's fine using the same logic for both.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm worried about the opposite direction: is the Action passing an automationDetailsId of the form category/id where we need to strip off the id before passing it to the CLI? If not, there's nothing to see here.

Copy link
Contributor Author

@edoardopirovano edoardopirovano Jun 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not entirely clear on what automationDetailsId is actually doing, but it is coming from user input:

actionsUtil.getOptionalInput("category"),
so I'm not sure we have any guarantees about what's contained in it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm worried about the opposite direction: is the Action passing an automationDetailsId of the form category/id where we need to strip off the id before passing it to the CLI? If not, there's nothing to see here.

There's nothing we need to do. The category is passed to the call to analyze if it is defined. And that category will have a / appended to it if one isn't on it already.

If no category is passed to analyze, then in the upload-results phase, the action will inject a calculated category. This category will always have a / on it. See:

const automationID = getAutomationID(category, analysis_key, environment);


// Print the LoC baseline and the summary results from database analyze for the standard
// query suite and (if appropriate) each custom query suite.
logger.startGroup(`Analysis summary for ${language}`);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like you have lost this log group and summary output. You still need to capture the stdout from interpret-results and log it here. The difference from the earlier code is that we only need to log one summary, rather than having separate summaries for builtin and custom queries.

printLinesOfCodeSummary only includes the baseline produced by the Action, not the summary produced by the CLI.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it still needed? It looks like the stdout from the interpret-results call is showing up just fine in the Actions logs: /~https://github.com/github/codeql-action/runs/2748181217?check_suite_focus=true#step:5:2435

My understanding of the previous code was that it was there because otherwise the summary might have been further back in the logs (in particular if any custom queries were run) and we wanted to duplicate it at the end for convenience - since now interpret-results is always the last thing we do I think it isn't needed any more.

@aeisenberg Am I right in my understanding or is the logger output also used for something other than the Actions logs that needs to see that summary?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm...I wasn't aware of the duplication, but that makes sense. Though, now there are a lot of lines that are ungrouped (not exactly sure how it was before). Could you enclose the call to interpret-results in a logger group?

In fact, it would be really nice if we could put run queries in a group as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that I think about it, though, if you put interpret-results in a group, then the summary table would be hidden, and I think that should be prominently displayed.

What do you think of both grouping interpret-results and duplicating the table outside of the group? I think that would make the logs more readable.

Copy link
Contributor Author

@edoardopirovano edoardopirovano Jun 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like a good idea! Just pushed a commit with that, let's see what the logs in the PR checks look like but I agree it should be neater 🙂

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that @AlonaHlobina has blog posts going out that include screenshots of the existing log groupings, so I was trying not to change it. This does make the summary more visible though, which is arguably even better than pointing users to the right group.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is ok. I can include a new screenshot of it. The changelog is not published yet. Where can I see the new logs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've got another PR open that will further change the logs slightly, so if you need a screenshot now I would look at the run logs from that PR's checks, e.g. /~https://github.com/github/codeql-action/runs/2782645280?check_suite_focus=true#step:7:309

When that's merged to main (probably later today) the CodeQL Analysis workflows on semmle-code will also start producing the latest logs here: /~https://github.com/github/semmle-code/actions/workflows/codeql-analysis.yml

src/codeql.ts Outdated
Comment on lines 797 to 805
let output = "";
await new toolrunner.ToolRunner(cmd, args, {
listeners: {
stdout: (data: Buffer) => {
output += data.toString("utf8");
},
},
}).exec();
return output;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're not using stdout from run-queries, so no need to capture or return it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, done.

searchPath: string | undefined
) => {
searchPathsUsed.push(searchPath!);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you change searchPathsUsed to have type (string | undefined}[] I think you could avoid using the ! type assertion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, done.

src/analyze.ts Outdated
@@ -253,13 +225,29 @@ export async function runQueries(

return statusReport;

async function runInterpretResults(
language: Language,
querySuites: string[],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: these are paths, right? Could you change the name to reflect that?

Suggested change
querySuites: string[],
querySuitePaths: string[],

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, changed here and elsewhere.

...getExtraOptionsFromEnv(["database", "interpret-results"]),
];
if (automationDetailsId !== undefined) {
args.push("--sarif-category", automationDetailsId);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A / is always appended to the --sarif-category if it doesn't already end in one. It's handled in exactly the same way for database analyze as it is for interpret results. So, I think it's fine using the same logic for both.

@@ -36,10 +36,17 @@ test("status report fields and search path setting", async (t) => {

for (const language of Object.values(Language)) {
setCodeQL({
databaseAnalyze: async (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we completely remove databaseAnalyze now? The only place it was used before has been removed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, removed.

@edoardopirovano edoardopirovano force-pushed the use-interpret-results branch 2 times, most recently from 45ef1cb to aac2736 Compare June 7, 2021 07:13
Copy link
Contributor

@aeisenberg aeisenberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great! I think it would be slightly better to flip the output (as in the suggestion). This is how it was before.

Also, I think the the baseline comment is a bit awkward. We should probably change the text, but that could happen in another PR.

src/analyze.ts Outdated Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants