Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ensure exclusions win over inclusions for scanning #172

Conversation

rmannibucau
Copy link
Contributor

If any inclusion is "**anything" then DirectoryScanner will browse excluded folders. It completely defeats most of exclusions and typically let the plugin go through target and node_modules which can be very time consuming.
With this patch, my license check (only) went from 25s to 2.5s.

@McFoggy
Copy link
Contributor

McFoggy commented Jun 9, 2020

@rmannibucau shouldn't this be suggested as default implementation of DirectoryScanner in maven-shared-utils

@rmannibucau
Copy link
Contributor Author

@McFoggy I don't know, can clearly discuss it but I fear some plugins will use that as a feature (binary scanners out of my head which scan target more than src) so it will likely end up with a flag which is more or less the exact same impact for this plugin (+ it is way faster to include and release like that).
Will start the discussion on maven side in between but if it is possible to get this fix soon I would be very welcomed.

Copy link
Owner

@mathieucarbou mathieucarbou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rmannibucau : I completely agree with the change, and it would be nice to see if we can have it for 4.0.rc1, but this PR cannot be merged as-is because:

  • there is no unit test
  • there is a behavioral change compared to before, so to activate this feature, we would need another boolean flag in the plug-in such as exclusionPriority set to false by default
  • the README should be updated to warn users about this new feature
  • we need to keep consistency with how scanning works across maven plugins. Have you check what other maven modules are doing ? So they prioritize exclusion too ? How to they wire the directory scanner ?
    Thanks 😊

@McFoggy
Copy link
Contributor

McFoggy commented Jun 9, 2020

@mathieucarbou the PR from Romain should not change the behavior, ie the final result, unless DirectoryScanner itself is wrong or broken. It will "just" allow to exclude some files faster than before.

From DirectoryScanner javadoc

Only files/directories which match at least one pattern of the include pattern list or other file selector, and don't match any pattern of the exclude pattern list or fail to match against a required selector will be placed in the list of files/directories found.

@rmannibucau I was not saying that we should not integrate it here as a workaround of the bad performance behavior the standard DirectoryScanner.

Let's be pragmatic by opening an issue/PR at maven-shared-utils and integrate this PR as a performance workaround until the issue is resolved in the right place.

@rmannibucau
Copy link
Contributor Author

@mathieucarbou @McFoggy updated the PR, migrated to maven-shared-utils (plugin was using plexus) and added a flag to be able to implement this behavior, MSU scanner has an API for that but it is not the default (guess for compatibility too).

Side note: I'm not sure the meaning of the defaults of the plugin without this flag on though since it will visit the ignored folders but ignore the found files but while we can deactivate this very slow behavior I'm happy.

@mathieucarbou
Copy link
Owner

@McFoggy : how sure you are ? We are safe in your opinion to have that as a default behavior and safely remove the boolean flag ?

@rmannibucau : thank you for the PR update. Could you plese rebase with the latest changes ? There has been a big PR merged.

@mathieucarbou mathieucarbou added this to the 4.0.rc1 milestone Jun 9, 2020
@rmannibucau rmannibucau force-pushed the rmannibucau/enforce-exclusions branch 2 times, most recently from 65877c4 to 33a3046 Compare June 9, 2020 11:18
@rmannibucau
Copy link
Contributor Author

@mathieucarbou rebased (but it broke config no?).

BTW the change is safe cause today excluded files end up in getExcludedFiles and this is not used by the plugin so this collection is useless in this context. In other words, getExcludedFiles will no more be populated with my change but since it is not queried anyway it does not change anything functionally for the plugin.

@mathieucarbou
Copy link
Owner

mathieucarbou commented Jun 9, 2020

@mathieucarbou rebased (but it broke config no?).

BTW the change is safe cause today excluded files end up in getExcludedFiles and this is not used by the plugin so this collection is useless in this context. In other words, getExcludedFiles will no more be populated with my change but since it is not queried anyway it does not change anything functionally for the plugin.

Ok i see. Understand more. I thought your changes could potentially change the outcome of the scanner. But it only makes it faster. I was confused.

So yes, let's make that the default behavior and remove this Boolean flag that I proposed to add since the current behavior (current outcome of the scanner) is preserved. This is just an optimization that you are adding and I thought that the scanner outcome could potentially change.
For me, the behavior is preserved as long as the scanner output is identical :-) Now, how the scan is done, of course I agree we can optimize that - no worry :-)

@rmannibucau
Copy link
Contributor Author

done

mathieucarbou
mathieucarbou previously approved these changes Jun 9, 2020
@mathieucarbou
Copy link
Owner

@rmannibucau : build is failing:

Failed tests: 

  SelectionTest.test_exclusions_respect_with_fastScan:73->assertIncludedFilesInFakeProject:80 expected:<[included.txt, module/src/main/java/not-ignored.txt, module/sub/subsub/src/main/java/not-ignored.txt]> but was:<[included.txt]>

@rmannibucau
Copy link
Contributor Author

@mathieucarbou windows I assumed? pushed a commit about it but didn't test it

@mathieucarbou
Copy link
Owner

@mathieucarbou windows I assumed? pushed a commit about it but didn't test it

no, it's building on Ubuntu...

@rmannibucau
Copy link
Contributor Author

@mathieucarbou hmm, i can't reproduce either in intellij nor in CLI. Any possibility the temp folder matches the exclusion and is not /tmp as the default?

@mathieucarbou
Copy link
Owner

@mathieucarbou hmm, i can't reproduce either in intellij nor in CLI. Any possibility the temp folder matches the exclusion and is not /tmp as the default?

I ran into some issues one day about the tmp folder in docker... You could try by creating the tmp directories in the target folder, by passing the "target" folder as a root of the junit rule perhaps.

@rmannibucau
Copy link
Contributor Author

@mathieucarbou can you give a try setting this exclude array in the test please:

new String[] {"target" + File.separator + "**", "module/**/target" + File.separator + "**"}

?

@mathieucarbou
Copy link
Owner

@mathieucarbou can you give a try setting this exclude array in the test please:

new String[] {"target" + File.separator + "**", "module/**/target" + File.separator + "**"}

?

Would be far easier for you to continue handling your PR.
You can even directly edit your branch from GitHub website.

@rmannibucau
Copy link
Contributor Author

@mathieucarbou agree but since I can't reproduce it is mainly random guess tries. Any way to reproduce it, you mentionned docker, is the setup documented anywhere?

@McFoggy
Copy link
Contributor

McFoggy commented Jun 9, 2020

the travis build is using openjdk7

docker run --rm -it openjdk:7 /bin/bash

cd ~
mkdir dev && cd dev
wget -o http://apache.mirrors.ovh.net/ftp.apache.org/dist/maven/maven-3/3.6.3/binaries/apache-maven-3.6.3-bin.zip
unzip apache-maven-3.6.3-bin.zip
export PATH=$PWD/apache-maven-3.6.3/bin:$PATH
git clone /~https://github.com/rmannibucau/license-maven-plugin.git
cd license-maven-plugin
git checkout rmannibucau/enforce-exclusions
mvn test

@mathieucarbou mathieucarbou requested a review from a team June 9, 2020 14:29
@mathieucarbou
Copy link
Owner

mathieucarbou commented Jun 9, 2020

@rmannibucau : reason I am asking you to handle the PR until it is complete is because on my side I do not have time to help now (2y old kid at home + newborn). I can only review / merge / release with the time I have.

@McFoggy and @adamretter are on the MLP team and can also help you and review / merge once the problem is found, but they also have limited time.

This is the only last PR before I trigger a 4.0.rc1 release. It would be nice to have this improvement. But if you think that solving this issue will take time, I can still trigger the 4.0.rc1 release without this PR and trigger an rc2 after 😄

Thanks everybody for your help 👍

@rmannibucau
Copy link
Contributor Author

Think I spotted the issue, will fix it now

using maven shared utils and adding a flag to have a fast scan
@rmannibucau rmannibucau force-pushed the rmannibucau/enforce-exclusions branch from 0f1b162 to 20faa4d Compare June 9, 2020 16:20
@rmannibucau
Copy link
Contributor Author

Should be good now, perf improvement is not that impressive but it /2 and impl is correct now ;)
Thanks for the patience.

@mathieucarbou
Copy link
Owner

Should be good now, perf improvement is not that impressive but it /2 and impl is correct now ;)
Thanks for the patience.

You're welcome! Let's get a release now :-)

@mathieucarbou mathieucarbou merged commit 683b10b into mathieucarbou:master Jun 9, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants