-
Notifications
You must be signed in to change notification settings - Fork 49
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Specify optional property file.sourceLanguage to guide syntax-driven colorization of snippets #286
Comments
Per offline discussion, there's a question around whether GFM allows for colorization of snippets. Note that the markdown embedded in this note uses HTML to provide colorization. This displays in VS code but not here.
Here's a CSS snippet using a markdown language hint. Note that it is colorized. #button { border: none; } |
Two thoughts:
But if you allow a "formatted snippet" containing Markdown, it will be difficult for the consumer to locate the actual character position within the marked up source code where the problem occurred:
But!
I suggest this feature might not be worth the trouble. |
I independently reached some of your points but not your final conclusion. :) First, great point, plaintext snippets are inherently valuable. I'd independently reached the same conclusion as you: what we need here is simply a language designation. I do think this feature is worth the trouble, mostly due to observing our web development team struggle with the problem of how to handle code snippets in our browser-based results explorer. For this reason, I'm willing to invest time in discussion. :) We should favor the approach that @fishoak suggested in TC, an open-ended enum populated with some preliminary values. When defining new language values, we can suggest that producers stick with certain conventions:
With the guidance above, if a tool producer happens to decide to author a static analysis capability for the 'Racket' or 'R++' languages, the conventions we provide allow for predictable enum values of 'racket' and 'rplusplus' respectively. Here's a list of languages/formats that are extremely current in 'most utilized' data. Note that not all of these have significant static analysis tooling eco-systems. I've explicitly avoided adding languages that are not currently highly utilized for which a language value can easily be predicted ('fortran', 'basic', 'lisp', 'vbscript', etc.) javascript |
I suggest "spelling out" the language names. That will allow SDKs that wish to do so to use them as identifiers (for example, enum values). I'm surprised not to see "html" on the language list. There's lots of static analysis around it. We should add it. ( With those nits out of the way, here what I think we agree on:
That's enough for me to write the change draft. |
|
btw - can't resist, isn't it true that the plain text snippet example you provided could be constructed entirely from other SARIF constructs, suggesting that you shouldn't be doing this? for the same reasons that you shouldn't be injecting line numbers? 'divide by zero' here would presumably derive from the rule id. the spces and ^ location would derive from region data. also, note that a viewer might usefully provide syntax coloring for this snippet, while still finding value in using the ^ notation to call out the specific location. so, with our new proposal what you'd do is specify the language, provide only the snippet of source, and then reconstruct the ^ + other formatting at runtime. basically, all snippets are just that, snippets of a file. don't inject anything else into them. what this means is that if a tool itself has some formatted notion of a result, SARIF doesn't provide a place to flow this formatted content along.
|
Sorry, I didn't explain my example clearly enough. In my example, this is the snippet:
Because it's plain text, and because the consumer has the region location, snippet location, and result message properties available, the consumer can easily render the snippet like this:
|
Yes, completely agree, sorry for my confusion. |
|
we also support the following languages: ABAP Also, would you consider something like JSP a separate "language" for the purposes of snippet colorization? |
@katrinaoneil Yes. Similarly, the ASP.NET MVC "Razor" syntax. |
Attached is a spreadsheet of some research I did on programming languages and identifiers in use for them. I took 12 sources of programming languages lists, including 3 that rank the top 50 or 100, editors, javascript syntax highlighters, lines of code counters, and giant lists of languages. The spreadsheet is sorted by the 3 ranking sites and then the number of references to the language in other sites. It also includes the id's that the references use for the language (if present) and the display name. I think that we would use commonly used id in SARIF to avoid an impedence mismatch with existing usage. the id, langName, and v_globs columns might be worth including in the appendix. The list is quite long, but can be reasonably cut off when there are only two non-ranking site listings. The column descriptions are below:
|
Here is new revision of the .xlsx and the .csv file (named .csv.txt to make GitHub happy). The .xlsx file is now correctly displays the couple of non-ascii characters (Excel's default is not uft-8). It also contains minor corrections. The column descriptions are below:
|
Our snippets are file contents. Some tools produce formatted snippets. For example, they create a gutter in the left margin that shows line numbers. They provide syntax coloring for the snippet.
SARIF has no way for a tools producer to create these formatted snippets. Instead, the consumer is required to examine the textual snippet and do something sensible with it. We have done this internally in our pipeline with some success.
Currently, we are looking at converting a tool's output that produces a formatted snippet. It occurred to me that we could define another property on fileContent that would be designed to store a formatted version of its contents. I suppose this would be called fileContent.richText in order to follow other places in the format where we allow for markdown-rendered text.
The text was updated successfully, but these errors were encountered: