-
-
Notifications
You must be signed in to change notification settings - Fork 5.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace linkRegex with xurls library #6261
Conversation
Rather than maintaining a complicated regex to match URLs for autolinking, gitea can use this existing go library that takes care of the matching with very little code change to gitea itself. After spending a while trying to find the perfect regex for all cases this library still works better as it is more flexible than a single regex ever will be. This will also fix the following issues: go-gitea#5844 go-gitea#3095 go-gitea#3381 This passes all our current tests and I've added new ones mentioned in those issues as well.
Codecov Report
@@ Coverage Diff @@
## master #6261 +/- ##
=========================================
Coverage ? 38.81%
=========================================
Files ? 355
Lines ? 50253
Branches ? 0
=========================================
Hits ? 19504
Misses ? 27920
Partials ? 2829
Continue to review full report at Codecov.
|
modules/markup/html.go
Outdated
@@ -645,7 +642,7 @@ func emailAddressProcessor(ctx *postProcessCtx, node *html.Node) { | |||
// linkProcessor creates links for any HTTP or HTTPS URL not captured by | |||
// markdown. | |||
func linkProcessor(ctx *postProcessCtx, node *html.Node) { | |||
m := linkRegex.FindStringIndex(node.Data) | |||
m := xurls.Strict().FindStringIndex(node.Data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The result of xurls.Strict()
should be cached - it compiles several regexps.
Tagging this as a bugfix as it solves a bug, so that we can get it in the 1.8.0 release. |
This is much faster and we only care about https? links to preserve existing behavior.
Thanks much for the feedback! That is exactly right thanks for catching. Also here is a tiny test program to compare how long this takes vs the current implementation: https://gist.github.com/mrsdizzie/edfcbf36a5355d1db5f5d7218543a7a4 The results I got were:
modifiedLinkRegex being a new regex that would try and match all of the URLs mentioned in the bugs above. So this is pretty on par with anything we could have added and doesn't introduce noticeable slowness in real world situations of large content |
Rather than maintaining a complicated regex to match URLs for autolinking, gitea can use this existing go library that takes care of the matching with very little code change to gitea itself:
/~https://github.com/mvdan/xurls
After spending a while trying to find the perfect regex for all cases this library still works better as it is more flexible than a single regex ever will be.
This will also fix the following issues: #5844 #3095 #3381
This passes all current tests and I've added new ones based on URLs mentioned in those issues above.