Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow to configure parsers in packwerk #243

Open
wants to merge 4 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
24 changes: 23 additions & 1 deletion USAGE.md
Original file line number Diff line number Diff line change
Expand Up @@ -80,6 +80,28 @@ Packwerk reads from the `packwerk.yml` configuration file in the root directory.
| cache | false | when true, caches the results of parsing files |
| cache_directory | tmp/cache/packwerk | the directory that will hold the packwerk cache |

### Using custom parsers

You can specify a custom parser to parse different file formats (e.g. slim or haml)

```ruby
class SlimParser
include ParserInterface
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dnamsons In some of the other merged PRs, we ended up just going with the plain Checker and Validator as names for the interfaces to be included to extend packwerk. Can we just call this Parser, i.e. include Packwerk::Parser

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that makes sense 👍
the Ruby parser also defines the Parser namespace, to avoid confusion, I added Packerk:: to every usage of this interface.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should rename to something like FileParser?


REGEX = /\.slim\Z/

def call
# Your parsing logic here
end

def match?(path)
REGEX.match?(path)
end
end

Packwerk::Parsers::Factory.instance.parsers.append(SlimParser)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of asking the user to explicitly specify these parsers, can we have them included automatically?

Here's what I had in mind:

# typed: strict
# frozen_string_literal: true

module Packwerk
  module Parsers
    module ParserInterface
      extend T::Helpers
      extend T::Sig

      requires_ancestor { Kernel }

      interface!

      sig { params(base: Class).void }
      def self.included(base)
        @parsers ||= T.let(@parsers, T.nilable(T::Array[Class]))
        @parsers ||= []
        @parsers << base
      end

      sig { returns(T::Array[ParserInterface]) }
      def self.all
        T.unsafe(@parsers).map(&:new)
      end

      sig { abstract.params(io: File, file_path: String).returns(T.untyped) }
      def call(io:, file_path:)
      end
    end
  end
end

Then, the user just needs to define this in a file that is "required" within packwerk.yml via this new API in this upcoming PR: #245

Elsewhere, we can just check Packwerk::Parsers::ParserInterface.all to get the list of all of the parsers.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is certainly a better approach than what I had created, as it makes adding new parsers easier 👍

The only concern i have is that the method Packwerk::Parsers::ParserInterface.all seems like it's not cohesive to the idea of a parser interface, to me it would seem more appropriate if we were to call Packwerk::Parsers.all and be able to get all of the registered parsers. What do you think about something like this?

# lib/packwerk/parsers/parser_interface.rb
module Packwerk
  module Parsers
    module ParserInterface
      extend T::Helpers
      extend T::Sig

      requires_ancestor { Kernel }

      interface!

      sig { params(base: Class).void }
      def self.included(base)
        Packwerk::Parsers.register(base)
      end

      sig { abstract.params(io: File, file_path: String).returns(T.untyped) }
      def call(io:, file_path:)
      end

      sig { abstract.params(path: String).returns(T.nilable(ParserInterface)) }
      def match?(path)
      end
    end
  end
end
# lib/packwerk/parsers.rb
module Packwerk
  module Parsers
    autoload :Erb, "packwerk/parsers/erb"
    autoload :Factory, "packwerk/parsers/factory"
    autoload :ParserInterface, "packwerk/parsers/parser_interface"
    autoload :Ruby, "packwerk/parsers/ruby"

    sig { params(parser: ParserInterface).void }
    def self.register(parser)
      @parsers ||= T.let([], T.nilable(T::Array[ParserInterface]))
      
      @parsers << parser
    end

    sig { returns(T::Array[ParserInterface]) }
    def self.all
      T.unsafe(@parsers).map(&:new)
    end
  end
end

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't imagine the ability to get all parsers to be a public API. I do think that's a cleaner internal API for packwerk though! However I'm not sure we'll want to do this for some of the other things we are able to or will be able to extend (offenses formatters, checkers) since there isn't an existing module for them (i.e. no standalone Formatters or Checkers modules), and I'm not sure if we'll want to add one just to hold the registered interfaces.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My intention wasn't to make the listing of parsers as a public API, just to make it a bit cleaner for internal use, but I can see that adding modules for Formatters and Checkers just to hold extensions might be a bit excessive.

I'll change my Pull request to use the solution that You outlined, so we don't have differing APIs for how Parsers get stored vs. Formatters & Checkers.

```

### Using a custom ERB parser

You can specify a custom ERB parser if needed. For example, if you're using `<%graphql>` tags from /~https://github.com/github/graphql-client in your ERBs, you can use a custom parser subclass to comment them out so that Packwerk can parse the rest of the file:
Expand All @@ -99,7 +121,7 @@ class CustomParser < Packwerk::Parsers::Erb
end
end

Packwerk::Parsers::Factory.instance.erb_parser_class = CustomParser
Packwerk::Parsers::Factory.instance.parsers = [Packwerk::Parsers::Ruby, CustomParser]
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alexevanczuk After these changes, I don't see an easy way to override the erb parser.

IMO the best approach would be to move the erb parser to a seperate gem, but if we want to keep it in packwerk and have a way of configuring it, then I will add a configuration for this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you share more about why you want to override the ERB parser? I know we want to add new parsers, but why override the other one?

Separately, I think if we want to extract ERB, /~https://github.com/rubyatscale/packwerk-extensions would be a great home for it.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alexevanczuk I don't have a particular use case, but the API for overriding the erb parser was there before my changes(/~https://github.com/Shopify/packwerk/blob/main/USAGE.md#using-a-custom-erb-parser) so I was thinking about how to keep feature parity.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh interesting... I see a couple of options:

  1. We could keep the existing API exactly as is and implement it by having an override when getting the parser to use (the override would check if the parser to fetch is ERB and use the overridden value if provided). The advantages here are we don't have to think about changing the API at all.
  2. We could eliminate this API entirely and when someone wants to custom parse an ERB, they could just pass in their own custom ERB parser that parsers out things the default parser cannot. Advantage here is that it's one less "extension" API to support.

What do you think @dnamsons and @gmcgibbon?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dnamsons I took another look at this. What if we changed this line:
/~https://github.com/Shopify/packwerk/pull/243/files#diff-79f5d7b9ceb194cd2ea7989b1a3839f0c8f8d14cdcd2ff7dfe02299498a82648R14

To select instead of find the parsers, which will allow multiple parsers to parse the same file? That way, someone can "extend" how any type of file is being parsed (ruby, erb, etc.)?

Then we can remove this API and its behavior can be ported to an ERB parser extension.

```

## Using the cache
Expand Down
1 change: 1 addition & 0 deletions lib/packwerk.rb
Original file line number Diff line number Diff line change
Expand Up @@ -35,6 +35,7 @@ module Packwerk
autoload :Package
autoload :PackageSet
autoload :ParsedConstantDefinitions
autoload :Parser
autoload :Parsers
autoload :ParseRun
autoload :UnresolvedReference
Expand Down
6 changes: 3 additions & 3 deletions lib/packwerk/file_processor.rb
Original file line number Diff line number Diff line change
Expand Up @@ -56,7 +56,7 @@ def call(absolute_file)
private

sig do
params(node: Parser::AST::Node, absolute_file: String).returns(T::Array[UnresolvedReference])
params(node: ::Parser::AST::Node, absolute_file: String).returns(T::Array[UnresolvedReference])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seeing the prefixed :: definitely makes me feel inclined to prefer Packwerk::FileParser over Parser so that we don't fight with the parser library. I suppose it's a "temporary" problem if we eventually swap out parser for syntax_tree or something else, but can't go wrong with FileParser I think.

end
def references_from_ast(node, absolute_file)
references = []
Expand All @@ -68,14 +68,14 @@ def references_from_ast(node, absolute_file)
references
end

sig { params(absolute_file: String, parser: Parsers::ParserInterface).returns(T.untyped) }
sig { params(absolute_file: String, parser: Packwerk::Parser).returns(T.untyped) }
def parse_into_ast(absolute_file, parser)
File.open(absolute_file, "r", nil, external_encoding: Encoding::UTF_8) do |file|
parser.call(io: file, file_path: absolute_file)
end
end

sig { params(file_path: String).returns(T.nilable(Parsers::ParserInterface)) }
sig { params(file_path: String).returns(T.nilable(Packwerk::Parser)) }
def parser_for(file_path)
@parser_factory.for_path(file_path)
end
Expand Down
2 changes: 1 addition & 1 deletion lib/packwerk/node.rb
Original file line number Diff line number Diff line change
Expand Up @@ -60,7 +60,7 @@ def constant_name(constant_node)
def each_child(node)
if block_given?
node.children.each do |child|
yield child if child.is_a?(Parser::AST::Node)
yield child if child.is_a?(::Parser::AST::Node)
end
else
enum_for(:each_child, node)
Expand Down
4 changes: 2 additions & 2 deletions lib/packwerk/node_processor.rb
Original file line number Diff line number Diff line change
Expand Up @@ -19,8 +19,8 @@ def initialize(reference_extractor:, absolute_file:)

sig do
params(
node: Parser::AST::Node,
ancestors: T::Array[Parser::AST::Node]
node: ::Parser::AST::Node,
ancestors: T::Array[::Parser::AST::Node]
).returns(T.nilable(UnresolvedReference))
end
def call(node, ancestors)
Expand Down
33 changes: 33 additions & 0 deletions lib/packwerk/parser.rb
Original file line number Diff line number Diff line change
@@ -0,0 +1,33 @@
# typed: strict
# frozen_string_literal: true

module Packwerk
module Parser
extend T::Helpers
extend T::Sig

requires_ancestor { Kernel }

interface!

@parsers = T.let([], T::Array[Class])

sig { params(base: Class).void }
def self.included(base)
@parsers << base
end

sig { returns(T::Array[Parser]) }
def self.all
T.unsafe(@parsers).map(&:new)
end

sig { abstract.params(io: File, file_path: String).returns(T.untyped) }
def call(io:, file_path:)
end

sig { abstract.params(path: String).returns(T::Boolean) }
def match?(path:)
end
end
end
13 changes: 10 additions & 3 deletions lib/packwerk/parsers/erb.rb
Original file line number Diff line number Diff line change
Expand Up @@ -9,15 +9,18 @@
module Packwerk
module Parsers
class Erb
include ParserInterface
include Packwerk::Parser

ERB_REGEX = /\.erb\Z/
private_constant :ERB_REGEX

def initialize(parser_class: BetterHtml::Parser, ruby_parser: Ruby.new)
@parser_class = parser_class
@ruby_parser = ruby_parser
end

def call(io:, file_path: "<unknown>")
buffer = Parser::Source::Buffer.new(file_path)
buffer = ::Parser::Source::Buffer.new(file_path)
buffer.source = io.read
parse_buffer(buffer, file_path: file_path)
end
Expand All @@ -28,11 +31,15 @@ def parse_buffer(buffer, file_path:)
rescue EncodingError => e
result = ParseResult.new(file: file_path, message: e.message)
raise Parsers::ParseError, result
rescue Parser::SyntaxError => e
rescue ::Parser::SyntaxError => e
result = ParseResult.new(file: file_path, message: "Syntax error: #{e}")
raise Parsers::ParseError, result
end

def match?(path:)
ERB_REGEX.match?(path)
end

private

def to_ruby_ast(erb_ast, file_path)
Expand Down
29 changes: 2 additions & 27 deletions lib/packwerk/parsers/factory.rb
Original file line number Diff line number Diff line change
Expand Up @@ -9,34 +9,9 @@ class Factory
extend T::Sig
include Singleton

RUBY_REGEX = %r{
# Although not important for regex, these are ordered from most likely to match to least likely.
\.(rb|rake|builder|gemspec|ru)\Z
|
(Gemfile|Rakefile)\Z
}x
private_constant :RUBY_REGEX

ERB_REGEX = /\.erb\Z/
private_constant :ERB_REGEX

sig { params(path: String).returns(T.nilable(ParserInterface)) }
sig { params(path: String).returns(T.nilable(Packwerk::Parser)) }
def for_path(path)
case path
when RUBY_REGEX
@ruby_parser ||= Ruby.new
when ERB_REGEX
@erb_parser ||= erb_parser_class.new
end
end

def erb_parser_class
@erb_parser_class ||= Erb
end

def erb_parser_class=(klass)
@erb_parser_class = klass
@erb_parser = nil
Packwerk::Parser.all.find { |parser| parser.match?(path: path) }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if someday we'll want to allow multiple parsers to parse the same file if they do different things (e.g. a ruby file parser that takes a special DSL and turns it into a constant reference, although, like relations, I think that is currently done in the references_from_ast step).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose we probably don't need to change this now since the current behavior only allows one parser for a file!

end
end
end
Expand Down
17 changes: 0 additions & 17 deletions lib/packwerk/parsers/parser_interface.rb

This file was deleted.

22 changes: 17 additions & 5 deletions lib/packwerk/parsers/ruby.rb
Original file line number Diff line number Diff line change
Expand Up @@ -7,16 +7,24 @@
module Packwerk
module Parsers
class Ruby
include ParserInterface
include Packwerk::Parser

class RaiseExceptionsParser < Parser::CurrentRuby
RUBY_REGEX = %r{
# Although not important for regex, these are ordered from most likely to match to least likely.
\.(rb|rake|builder|gemspec|ru)\Z
|
(Gemfile|Rakefile)\Z
}x
private_constant :RUBY_REGEX

class RaiseExceptionsParser < ::Parser::CurrentRuby
def initialize(builder)
super(builder)
super.diagnostics.all_errors_are_fatal = true
end
end

class TolerateInvalidUtf8Builder < Parser::Builders::Default
class TolerateInvalidUtf8Builder < ::Parser::Builders::Default
def string_value(token)
value(token)
end
Expand All @@ -28,17 +36,21 @@ def initialize(parser_class: RaiseExceptionsParser)
end

def call(io:, file_path: "<unknown>")
buffer = Parser::Source::Buffer.new(file_path)
buffer = ::Parser::Source::Buffer.new(file_path)
buffer.source = io.read
parser = @parser_class.new(@builder)
parser.parse(buffer)
rescue EncodingError => e
result = ParseResult.new(file: file_path, message: e.message)
raise Parsers::ParseError, result
rescue Parser::SyntaxError => e
rescue ::Parser::SyntaxError => e
result = ParseResult.new(file: file_path, message: "Syntax error: #{e}")
raise Parsers::ParseError, result
end

def match?(path:)
RUBY_REGEX.match?(path)
end
end
end
end
8 changes: 4 additions & 4 deletions lib/packwerk/reference_extractor.rb
Original file line number Diff line number Diff line change
Expand Up @@ -25,8 +25,8 @@ def initialize(

sig do
params(
node: Parser::AST::Node,
ancestors: T::Array[Parser::AST::Node],
node: ::Parser::AST::Node,
ancestors: T::Array[::Parser::AST::Node],
absolute_file: String
).returns(T.nilable(UnresolvedReference))
end
Expand Down Expand Up @@ -99,8 +99,8 @@ def self.get_fully_qualified_references_and_offenses_from(unresolved_references_
sig do
params(
constant_name: String,
node: Parser::AST::Node,
ancestors: T::Array[Parser::AST::Node],
node: ::Parser::AST::Node,
ancestors: T::Array[::Parser::AST::Node],
absolute_file: String
).returns(T.nilable(UnresolvedReference))
end
Expand Down
2 changes: 1 addition & 1 deletion test/unit/parsers/erb_test.rb
Original file line number Diff line number Diff line change
Expand Up @@ -19,7 +19,7 @@ class ErbTest < Minitest::Test

test "#call writes parse error to stdout" do
error_message = "stub error"
err = Parser::SyntaxError.new(stub(message: error_message))
err = ::Parser::SyntaxError.new(stub(message: error_message))
parser = stub
parser.stubs(:ast).raises(err)

Expand Down
21 changes: 10 additions & 11 deletions test/unit/parsers/factory_test.rb
Original file line number Diff line number Diff line change
Expand Up @@ -23,14 +23,20 @@ class FactoryTest < Minitest::Test
assert_instance_of(Parsers::Erb, factory.for_path("foo.html.erb"))
assert_instance_of(Parsers::Erb, factory.for_path("foo.md.erb"))
assert_instance_of(Parsers::Erb, factory.for_path("/sub/directory/foo.erb"))
end

test "#for_path gives custom parser for matching paths" do
fake_class = Class.new do
T.unsafe(self).include(ParserInterface)
end
T.unsafe(self).include(Packwerk::Parser)

with_erb_parser_class(fake_class) do
assert_instance_of(fake_class, factory.for_path("foo.html.erb"))
def match?(path:)
/\.slim\Z/.match?(path)
end
end

assert_instance_of(fake_class, factory.for_path("foo.html.slim"))
assert_instance_of(fake_class, factory.for_path("foo.md.slim"))
assert_instance_of(fake_class, factory.for_path("/sub/directory/foo.slim"))
end

test "#for_path gives nil for unknown path" do
Expand All @@ -41,13 +47,6 @@ class FactoryTest < Minitest::Test

private

def with_erb_parser_class(klass)
factory.erb_parser_class = klass
yield
ensure
factory.erb_parser_class = nil
end

def factory
Parsers::Factory.instance
end
Expand Down
2 changes: 1 addition & 1 deletion test/unit/parsers/ruby_test.rb
Original file line number Diff line number Diff line change
Expand Up @@ -16,7 +16,7 @@ class RubyTest < Minitest::Test

test "#call writes parse error to stdout" do
error_message = "stub error"
err = Parser::SyntaxError.new(stub(message: error_message))
err = ::Parser::SyntaxError.new(stub(message: error_message))
parser = stub
parser.stubs(:parse).raises(err)

Expand Down