Validate shortened doc links #2511

Open
debadair opened this issue Aug 31, 2022 · 3 comments
Labels
docs-build Relates to the build tooling and scripts link-checking Link Checking & Redirects

Comments

@debadair
Contributor

The Elastic Health API returns information that includes links to Troubleshooting topics in the documentation. These links go through the link shortening service so that they are easier to manage. We also embed shortened links to the docs in other places, such as deprecation warnings. We need to test all of these routinely to ensure that the links we embed in output don't 404.

@debadair debadair added link-checking Link Checking & Redirects docs-build Relates to the build tooling and scripts labels Aug 31, 2022
@gtback
Member

gtback commented Sep 12, 2022

Conceptually, I think this comes in three steps:

  1. Get a list of short-links we need to test (technically, we could skip this step and go straight to step 3 if they're already links to the documentation).
  2. Look up the original link for each shortened link (we can do this by just making the web request, or we could directly query the Elastic cluster that stores this data).
  3. Ensure all "docs-looking" links are valid.

Step 3 can use the existing patterns for Kibana links.
Step 2 is probably up to me (I'll need to think about whether making the requests would slow down the builds too much, or skew metrics, or anything like that; @JoshMock, do you have any thoughts here?)
I'll need help with Step 1: more information about where the links are saved in code, and how we can extract them.
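As a rough illustration of Step 1, here is a minimal sketch that pulls ela.st links out of arbitrary text (source files, API output, and so on). The function name `find_short_links` and the slug pattern are assumptions made for illustration; the real slug rules of the shortener, and where the links actually live in code, are still open questions in this thread.

```python
import re

# Assumption: short links look like https://ela.st/<slug>, where the slug is
# made of letters, digits, hyphens, or underscores. The real rules may differ.
SHORT_LINK_RE = re.compile(r"https?://ela\.st/[A-Za-z0-9_-]+")


def find_short_links(text: str) -> list[str]:
    """Return the unique ela.st links found in text, in first-seen order."""
    # dict.fromkeys removes duplicates while preserving insertion order.
    return list(dict.fromkeys(SHORT_LINK_RE.findall(text)))
```

Running this over each file that embeds links would give the list that Steps 2 and 3 need as input.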

@JoshMock
Member

As far as hitting the shortener service, HEAD requests work: curl --head https://ela.st/infra-aon

If you get a 2xx or 3xx response and a valid Location header (which you could then follow, if you so choose, to ensure the final destination actually exists), you know the link is good.
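For reference, the same check could be sketched in Python with the standard library. This is only an illustration under assumptions: the function names, the timeout, and the `User-Agent` value are hypothetical, and "valid Location header" is reduced here to "present and non-empty". `http.client` does not follow redirects on its own, so the Location header of the first response is returned as-is.

```python
import http.client
from typing import Optional
from urllib.parse import urlsplit


def is_good_redirect(status: int, location: Optional[str]) -> bool:
    """True for a 2xx or 3xx status with a non-empty Location header."""
    return 200 <= status < 400 and bool(location and location.strip())


def head_short_link(url: str, timeout: float = 10.0) -> tuple[int, Optional[str]]:
    """Send one HEAD request and return (status, Location) without following it."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    try:
        # Hypothetical User-Agent value, chosen only for this sketch.
        conn.request("HEAD", path, headers={"User-Agent": "docs-link-checker-sketch"})
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()
```

A caller would combine the two, for example `is_good_redirect(*head_short_link("https://ela.st/infra-aon"))`.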

The only other thing I'll say is that recursive link validation tools sound easy to build, but there be dragons there. I wrote one at my previous job and saw a surprising amount of false negatives because some servers deny HEAD requests, or behave in other non-standard ways. Sometimes this is intentional bot-blocking, and sometimes it's just accidental bugs, but it's always annoying. 😄

@gtback
Member

gtback commented Sep 12, 2022

Thanks, @JoshMock !

In this case, I was imagining just finding all ela.st links, making a request for each, and getting the Location header for the redirect. Then, if the URL starts with https://www.elastic.co/guide/, take the part after that and ensure it exists under https://github.com/elastic/built-docs/tree/master/html (this is how all of the existing link checking works). My plan was to not make any requests for anything under www.elastic.co (or indeed, any domains other than ela.st).
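A minimal sketch of that prefix-and-lookup step, assuming a local checkout of the built-docs `html` directory is available as `html_root`. The function names are hypothetical, and dropping the query string and `#fragment` before the file check is an assumption, not something the existing link checker is confirmed to do.

```python
from pathlib import Path
from typing import Optional
from urllib.parse import urlsplit

GUIDE_PREFIX = "https://www.elastic.co/guide/"


def guide_relative_path(location: str) -> Optional[str]:
    """Return the part of a docs URL after the guide prefix, or None."""
    if not location.startswith(GUIDE_PREFIX):
        return None
    # Assumption: query strings and #fragments are ignored for the file check.
    path = urlsplit(location).path
    return path[len("/guide/"):]


def exists_in_built_docs(location: str, html_root: Path) -> Optional[bool]:
    """None for non-docs links; otherwise whether the path exists under html_root."""
    relative = guide_relative_path(location)
    if relative is None:
        return None
    # No network request is made here: only the local checkout is consulted.
    return (html_root / relative).exists()
```

Returning `None` for anything outside `https://www.elastic.co/guide/` keeps "not a docs link" distinct from "docs link that is missing".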

I should have thought to use a HEAD request, which means you can filter out a "click" like this based on http.request.method, as well as possibly using a custom User Agent.

@gtback gtback removed their assignment Mar 6, 2024