Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bad email validator #349

Open
iUnstable0 opened this issue Aug 17, 2024 · 13 comments
Open

Bad email validator #349

iUnstable0 opened this issue Aug 17, 2024 · 13 comments

Comments

@iUnstable0
Copy link

iUnstable0 commented Aug 17, 2024

This email validator incorrectly accepts test@ok as a valid input. While it follows the HTML specification for email address validation, the regex provided on that website also matches emails without a top-level domain (TLD), such as test@gmailcom. This is problematic for several reasons:

  1. Public-Facing Applications: For public-facing applications, accepting emails without a TLD can lead to invalid user inputs, causing potential issues with email delivery and user verification processes.

  2. Rare Usage in Development: Even in development environments, it is rare to see local email addresses without a TLD. Most development setups use valid TLDs to simulate real-world scenarios. (if im being honest they js use their personal or burner mail lol)

To address these concerns, I suggest enforcing a stricter validation that requires a TLD. This can be achieved by modifying the regex to ensure the presence of a TLD. Here is an example of a stricter regex:

/^[-a-zA-Z0-9_.]+@[-a-zA-Z0-9]+\.[a-zA-Z]{2,4}$/
@Keats
Copy link
Owner

Keats commented Aug 18, 2024

Honestly since this is allowed by both the mail spec (test@gmailcom is a valid email in theory) and the HTML5 spec I would prefer to leave it. However maybe we could have a strict mode that disable some esoteric addresses?

@iUnstable0
Copy link
Author

itd be nice to have it built in! since right now i'm using custom function and the crate mailchecker

@szabgab
Copy link

szabgab commented Aug 26, 2024

I just encountered this problem as one of my users mistakenly used blabla@gmail forgetting the .com. I'd also like to see a way to make this stricter.

@GrantGryczan
Copy link

GrantGryczan commented Aug 30, 2024

Something like #[validate(email_strict)] would be nice. But with the way things typically are in Rust, I'd expect strict to be the default in order to avoid footguns, and I don't think most people expect the default to be allowing TLDs.

But regardless of what the default is, differing from the client-side validation for HTML5 email inputs arguably introduces a small problem: if the server is stricter than the client, then the user might see raw errors from the server rather than a more UX-friendly error (such as client-side validation). But I don't know that there's any way to mitigate that footgun here (other than by making a note of it in the documentation--perhaps via naming?).

@Keats
Copy link
Owner

Keats commented Aug 30, 2024

strict might not be a good name. More something like allow_esoteric_mails?

@szabgab
Copy link

szabgab commented Aug 30, 2024

What other cases are there besides "domain name without any dot in it"?

As for name "require_one_dot_after_at" or a more ambitious one: "valid_hostname" or "valid_tld" that would check against the list of all the TLDs.

@GrantGryczan
Copy link

GrantGryczan commented Sep 1, 2024

More something like allow_esoteric_mails?

The reason one would want to use it should be communicated by its name. I believe the purpose of allowing esoteric emails here is to act equivalently to how HTML5 validates emails, so I suspect something with html5 in the name would be best, if you are taking the strict-by-default route (which would be a breaking change, though not necessarily bad). Then in the docs, you'd explain what the exact differences in behavior are.

For example #[validate(email(like_html5))], but I know nothing about attribute syntax conventions, so that might not be exactly right.

@Keats
Copy link
Owner

Keats commented Sep 2, 2024

The default doesn't need a name, it's the other one that is "simpler" that requires one

@GrantGryczan
Copy link

GrantGryczan commented Sep 3, 2024

Oh, I thought you were suggesting a name for opting into the current default functionality, because this issue's suggested functionality is to disallow certain emails from the default, not to allow certain additional emails (which is what allow_esoteric_emails led me to believe).

In that case, perhaps something like #[validate(email(with_tld))]? Again I think it's important for the name to communicate the purpose (so, for example, I don't have to check the docs periodically due to forgetting why it's needed).

Actually, that might not be a great name because it should still allow IP addresses, right? So it's ambiguous exactly how that would behave with IP addresses. But I'd still go in the direction of making it clear from the syntax that domains in emails require a TLD.

@Keats
Copy link
Owner

Keats commented Sep 3, 2024

My bad, I meant disallow_esoteric_mails, not allow_esoteric_mails 🤦

Actually, that might not be a great name because it should still allow IP addresses, right? So it's ambiguous exactly how that would behave with IP addresses. But I'd still go in the direction of making it clear from the syntax that domains in emails require a TLD.

exactly, ip addresses still work. The idea with "esoteric" is stuff that no regular user is going to encounter in reality

@gianalarcon
Copy link

Hi team, any updates on this issue? We are also expecting to use a validate(email_strict) or something similar. Thanks!

@Keats
Copy link
Owner

Keats commented Sep 20, 2024

I would take a PR for it

@GrantGryczan
Copy link

GrantGryczan commented Sep 22, 2024

Haha, I just found out that some people actually do use real, working emails with just a TLD, no dot. See this Reddit thread for examples. In light of this, I'm not sure making this a built-in validator check is the right solution. If this is implemented, it should be documented that this invalidates some real-world email addresses from top-level domain mail servers. But I think checking if the TLD is real would be a more ideal solution (albeit with the drawback that new TLDs are often created), and that would cover the user@gmailcom case.

(Though I personally ended up going with the more "Parse, don't validate" solution of validating using newtypes, so I don't have any personal investment in this anymore anyway.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants