Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

C++ header files are not highlighted correctly #932

Closed
eliaskosunen opened this issue Jun 12, 2018 · 6 comments · Fixed by #1269
Closed

C++ header files are not highlighted correctly #932

eliaskosunen opened this issue Jun 12, 2018 · 6 comments · Fixed by #1269

Comments

@eliaskosunen
Copy link

This issue originally occurred on GitLab, which uses rouge for its syntax highlighting.

The problem is, that C++ header files with the file extension '.h' are highlighted as C files, even though that extension is very commonly used for C++ headers as well.

These following two snippets are from GitLab and highlighted by rouge.

Snippet 1, example.h: https://gitlab.com/snippets/1723020
Highlighted as C code. Evident by the lack of highlighting of keywords such as namespace, template, using and class.

Snippet 2, example.hpp: https://gitlab.com/snippets/1723021
Highlighted correctly as C++ code.

In comparison, an identical GitHub gist with name example.h: https://gist.github.com/eliaskosunen/4f508cdfb59aad0599c964ffa7ce9a87 is highlighted correctly as C++ code.

Possible solutions:

  • Because for the most part, C++ is a superset of C, so header files with extension .h should probably default to C++ rather than C. The only keyword I can think of that exists in C but not in C++ is restrict, but there are way more examples the other way around
  • Use some kind of heuristic to determine the language of header files
@aldanor
Copy link
Contributor

aldanor commented Jun 16, 2018

Defaulting to C++ for .h sounds like a good idea.

@robin850
Copy link
Contributor

robin850 commented Jul 8, 2018

Hi there,

It looks pretty safe to switch from C to C++ as the lexer for .h files. The following script has been run against several projects :

require 'rouge'
require 'minitest/autorun'

class HighlightTest < Minitest::Test
  Dir.glob('postgres/**/*.h').each_with_index do |file, index|
    define_method(:"test_#{file}") do
      file = File.read(file).freeze
      skip if file.include?("__cplusplus")

      with_c   = highlight(file, Rouge::Lexers::C)
      with_cpp = highlight(file, Rouge::Lexers::Cpp)

      assert_equal with_c, with_cpp
    end
  end

  private
    def highlight(file, lexer)
      Rouge::Formatters::Null.new.format(lexer.lex(file))
    end
end

Here are the results :

  • With Redcarpet : 100% OK.

  • With Git : 6/198 failures. This is due to :

    • Struct fields named with C++ reserved keywords (e.g. final, explicit)
    • 0.77 highlighted as Literal.Number.Float in C++ but with a combination
      of Literal.Number.Integer and Punctuation in C.
  • With PostgreSQL : as far as I can tell, all the failures are due to the fact that Postgres uses .h for C++ headers as well so keywords are highlighted as Name rather than Keyword. Everything is green with the following patch applied to the aforementioned script :

    class Rouge::Lexers::C
      class << self
        alias :original_keywords :keywords
    
        def keywords
          cpp_keywords = [
            "bool", "using", "new", "typeid", "public",
            "final", "override", "namespace"
          ]
    
          original_keywords.merge(cpp_keywords)
        end
      end
    end
  • With Ruby : 100% OK apart from the aforementionned problem with float literals.

@stale
Copy link

stale bot commented Jul 8, 2019

This issue has been automatically marked as stale because it has not had any activity for more than a year. It will be closed if no additional activity occurs within the next 14 days.

If you would like this issue to remain open, please reply and let us know if the issue is still reproducible.

@stale stale bot added the stale-issue There has been no activity for a year. label Jul 8, 2019
@eliaskosunen
Copy link
Author

Issue still reproducible on GitLab

@stale stale bot removed the stale-issue There has been no activity for a year. label Jul 21, 2019
@pyrmont
Copy link
Contributor

pyrmont commented Jul 21, 2019

Thanks @robin850—I'm sorry to be attending to this so late but that's been very helpful.

Based on those tests, it does look like switching to C++ lexer for *.h globs would be the simplest solution. It also looks like a few of those mismatches (eg. floating point literals) are really errors in the lexer that should be corrected.

I'll try to get a PR together sometime this week to address this.

@pyrmont
Copy link
Contributor

pyrmont commented Jul 22, 2019

@eliaskosunen I had a bit more of a think about the problem and it didn't feel right to treat all files as C++ when the header files may not be C++.

My alternative solution is to provide a disambiguation that detects files as C++ based on the presence of certain keywords. This is the approach that Rouge already takes with Objective-C (which has a similar problem since its header files also use the .h extension).

I've submitted the fix as PR #1269. If you (or @robin850) has any comments on it, please let me know in that thread. I'm not very familiar with C++ and so the words I'm using to test for may not be complete (the test is only if the keyword is the first word on a line so as to avoid situations like @robin850 encountered when highlighting the Git codebase).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants