DNS and Let's encrypt certificates for OCaml #27

Open
hannesm opened this issue Jan 16, 2023 · 38 comments

Comments

@hannesm
Member

hannesm commented Jan 16, 2023

Dear Madam or Sir,

with huge interest I read through some of the issues in this repository. Thanks for being open and transparent about what you would like to achieve.

In every other issue that involves migrations, I see problems related to let's encrypt certificates and the migration of services. The underlying reason, as far as I can tell, stems from the methodology of retrieving let's encrypt certificates: run a "certbot" locally, which requires (a) a web server on port 80, (b) some ad-hoc configuration to serve static files, and (c) DNS changes being propagated for the desired hostname(s). This means a service can only retrieve its certificate once it is actually deployed live. This also makes moving services hard (without downtime).

Over the years, I worked on (fully open source, fully developed in OCaml as MirageOS unikernels) automation to push the whole let's encrypt interaction into DNS (a secondary server on steroids), and thus decoupling the actual service deployment from the certificate provisioning.

The idea is pretty simple: both certificates and signing requests are public data anyways (they're stored in the certificate transparency log, ...). DNS is a fault-tolerant key-value store. Each CSR and certificate is embedded as TLSA (https://www.rfc-editor.org/rfc/rfc6698.html) record (in DER encoding, i.e. no base64/pem, just the bare minimal stuff). Thanks to DNS TSIG we also have authentication (so not everyone may upload CSR) ;).
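
To make the "DER, no base64/pem" point concrete, here is a minimal sketch of how a PEM file could be turned into the hex string that a TLSA record carries. The function name `pem_to_hex` is made up for illustration, and it assumes a single PEM object per file and a `base64` that accepts `-d`.

```shell
#!/bin/sh
# Hypothetical helper: print the DER content of a single PEM object ($1)
# as uppercase hex, the presentation form of TLSA record data.
pem_to_hex () {
    # drop the BEGIN/END armor lines, decode base64 to raw DER,
    # then dump every byte as two hex digits without separators
    grep -v '^-----' "$1" | base64 -d | od -An -v -tx1 | tr -d ' \n' | tr 'a-f' 'A-F'
}
```

For example, `pem_to_hex www.example.org.csr` would print one long hex string for a PEM-encoded signing request.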

The mechanism is as follows: the primary DNS sends out DNS NOTIFY whenever the zone changes. The dns-letsencrypt-secondary observes zone(s), and whenever a fresh CSR is detected (or a soon expiring certificate, or a CSR without matching certificate (i.e. key rollover)), the let's encrypt DNS challenge is used to provision a new certificate.

The services behind just download (dig tlsa _letsencrypt._tcp.robur.coop) the certificate, and only need to have their private key distributed.

The operator can use nsupdate to upload a new certificate signing request.

If you're interested in using such a system (and run your own DNS servers - of course you can keep gandi's as advertised ones / public ones), don't hesitate to reach out. I'm happy to help figuring out how to work in that area. :)

@avsm
Member

avsm commented Jan 18, 2023

Thank you for suggesting this @hannesm, and of course of all the hard work you've done that has led to this being possible at all. I'm entirely supportive of the suggestion as the HTTP process is a real pain to manage, but would like to see the end-to-end process deployed somewhere other than ocaml.org first to ensure it's suitably mature.

@mtelvers, would you like to have a go at this on some other domain such as realworldocaml.org, and document it to your and @hannesm's satisfaction on infra.ocaml.org? Once it's documented and demonstrable elsewhere (particularly with respect to how to edit DNS zone files and so on, presumably via git), we will then need to present this to Xavier Leroy to get his permission to make the (big) change for the ocaml.org domain. I'm also tagging @RyanGibb who is interested in matters of OCaml and DNS and may want to assist.

@hannesm
Member Author

hannesm commented Jan 18, 2023

What I failed to mention in the initial issue is that such a setup with OCaml-DNS has been used for various domains for more than 5 years; mirage.io also works this way.

Also, regarding the let's encrypt challenge: getting a signed certificate does not require the private key of the certificate signing request :D This is why the setup works pretty nicely.

I can for sure help with setting up a primary name server and using DNS zones in a git repository. I can also provide secondary name servers if desired.

@hannesm
Member Author

hannesm commented Jan 18, 2023

@avsm and thanks for your support and interest.

@RyanGibb

RyanGibb commented Jan 25, 2023

Hi @hannesm, I would be very interested in assisting with this if you could use my help.

I have a question if that's okay. I'm wondering whether there are possible vulnerabilities with the spoofing of TLSA records. Reading RFC 6698:

This document defines a secure method to associate the certificate
that is obtained from the TLS server with a domain name using DNS;
the DNS information needs to be protected by DNSSEC.

As far as I understand, the OCaml-DNS resolver supports DNSSEC, but the authoritative server doesn't. Do you think this is an issue?

@hannesm
Member Author

hannesm commented Jan 25, 2023

@RyanGibb thanks for your offer.

From my observation, the current DNS deployment for ocaml.org / realworldocaml.org does not use DNSSec. Also, DNSSec integration into the authoritative servers (for OCaml-DNS) is on the agenda, and will be done this year.

As another point, so you can spoof TLSA records - but what is the attack vector? The service that uploads the CSR authenticates itself to the authoritative servers. The service that downloads the certificate checks that the public key in the certificate matches the private key it has. The certificate chain is checked to be valid (against the system trust anchors).
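
The key check mentioned above can be sketched with openssl as follows. This is an illustration and not the code that the services actually run; the function name and the file arguments are placeholders.

```shell
#!/bin/sh
# Hypothetical check: succeed only if the certificate in $1 carries the
# public key that belongs to the private key in $2 (both PEM files).
key_matches_cert () {
    cert_pub=$(openssl x509 -in "$1" -noout -pubkey) || return 1
    key_pub=$(openssl pkey -in "$2" -pubout) || return 1
    # both commands print the public key in the same PEM form
    [ "$cert_pub" = "$key_pub" ]
}
```

A spoofed TLSA answer that carries a certificate for some other key would fail this comparison, and a certificate that is not signed by a trusted CA would fail the separate chain verification.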

I don't quite understand where DNSSec would be necessary, but maybe I'm failing to see the attack vector (@RyanGibb would you mind elaborating a bit more on what you mean with "Do you think this is an issue?"). The service can btw also ask the authoritative server directly for TLSA records (certificates).

For my part, I wonder whether @mtelvers has an opinion on and/or time for diving into such a thing ("running authoritative DNS services") or not. There's some IETF document suggesting the use of anycast IP addresses for this. I'm not doing this myself since I don't have sufficiently many machines and BGP speakers for such a setup -- but tbh it works fine with "just normal" IPv4 addresses ;)

@RyanGibb

Hi @hannesm, thanks for your reply.

After reading https://hannes.nqsb.io/Posts/DnsServer I think I understand that the TLSA records are only used for distributing the CSRs and certificates between DNS servers and services in your solution, not for replacing a CA with the DNSSEC trust anchor as rfc6698 describes. Hence the TLSA RRs' location at _letsencrypt._tcp.robur.coop differs from https://www.rfc-editor.org/rfc/rfc6698#section-3. Is my understanding correct?

If so, I understand the reason why DNSSEC isn't required. The CA (letsencrypt) is the root of trust. The TLSA records are just a convenient way of distributing the provided certificate. Apologies for my misunderstanding!

Also, DNSSec integration into the authoritative servers (for OCaml-DNS) is on the agenda, and will be done this year.

Aside from the letsencrypt DNS-01 challenge, that's great to hear :-)

@hannesm
Member Author

hannesm commented Jan 26, 2023

TLSA records are only used for distributing the CSRs and certificates between DNS servers and services in your solution, not for replacing a CA with the DNSSEC trust anchor as rfc6698 describes. Hence the TLSA RRs' location at _letsencrypt._tcp.robur.coop differs from https://www.rfc-editor.org/rfc/rfc6698#section-3. Is my understanding correct?

@RyanGibb yes, your understanding is correct.

Aside from the letsencrypt DNS-01 challenge, that's great to hear :-)

I'm not sure I understand what your comment means, would you mind to explain?

@RyanGibb

@RyanGibb yes, your understanding is correct.

Great, thank you for confirming.

I'm not sure I understand what your comment means, would you mind to explain?

I just mean to say, despite DNSSEC not being required for the letsencrypt DNS challenge, it's good to know that it's on the agenda.

@RyanGibb

RyanGibb commented Feb 27, 2023

Hi all, just to give a small update on this: I've created a nameserver, primarily targeting Unix, using the new effects-based IO library and the mirage OCaml-DNS library, that is able to perform dynamic UPDATEs authenticated using TSIG: https://github.com/RyanGibb/aeon/. I'm hoping to add support for the letsencrypt challenge to this nameserver directly (as opposed to running in a separate process), simplifying the communication required.

@hannesm
Member Author

hannesm commented Feb 27, 2023

Dear @RyanGibb, thanks for your effort. But I'd really like to hear from @mtelvers what would be worth for OCaml infrastructure. And I opened this issue to explicitly understand whether using MirageOS unikernels would be possible/interesting for that infrastructure.

I feel very torpedoed, and that the issue is being stolen, by your "hey, look, I developed something new that supports some parts (certainly no notify, hasn't been tested for years on real domains, etc.) in this shiny new IO framework" -- especially since I've been doing the underlying DNS development since 2017. Your "add support for the letsencrypt challenge to this nameserver directly" is as well something that can be done in a MirageOS unikernel.

@RyanGibb

My sincere apologies @hannesm. I in no way meant to torpedo this issue. You've been working on this for far longer than myself, and my contribution is a small layer using a different IO library. I just wanted to express my continued interest in this topic and share some work that I've been doing for a different project that relates to this issue. I should have been more clear on that.

@avsm
Member

avsm commented Feb 27, 2023

@hannesm @RyanGibb please do take a positive interpretation of each other's efforts. The world of managing OCaml infrastructure is small enough already without us driving each other away.

In my view, Ryan has been learning and reproducing Hannes' efforts, and that's appreciated. If you could perhaps split up your experiences with "reproducing the Mirage DNS stack" vs your own reimplementations on eio, that would be most useful for the knowledge sharing in this issue.

But let's wait for @mtelvers to comment on his plans first, and if he's not available, then make a wider call to the community for more assistance with reproducing the Mirage DNS stack on other domains.

@mtelvers
Collaborator

@hannesm I think your post may have been inspired by my convoluted implementation of Let's Encrypt certificates which I used for #19. This is not my preferred approach.

I prefer to use automatic provisioning, which is included in Caddy. In this case, this option was not available to me as the requirement was for round-robin DNS. With round-robin, I could not guarantee the response would arrive at the requesting server. The natural resolution is to use the DNS challenge, but that was not available as DNS updates are administered manually by @avsm. Therefore, I switched to NGINX, which gave me the granular configuration required to redirect HTTP challenges to the originating server.

We would also need to agree on a hosting strategy for the unikernel to ensure a redundant deployment.

Reading through https://hannes.nqsb.io/Posts/DnsServer, under the Let's encrypt! section, how would we configure a reverse proxy such as (Caddy/NGINX) to request a certificate with an hmac-secret?

@avsm What are the success criteria? Or perhaps more importantly, what administrative controls need to be kept in place by the new solution?

If we can use DNS-01 Challenge rather than HTTP-01, then this would be worth implementing for the round-robin DNS for opam.ocaml.org. The alternative solution would be a Gandi API key which would delegate more access than just creating TXT records.

@hannesm
Member Author

hannesm commented Feb 28, 2023

how would we configure a reverse proxy such as (Caddy/NGINX) to request a certificate with an hmac-secret?

The initial CSR can be uploaded with nsupdate -y hmac-sha256:client._update:<b64-encoded-shared-secret> (where nsupdate is part of bind) -- my assumption is that "spawn new host names / services" requires human intervention anyways, and thus can be done by a human with the shared secret in their hands (also with the private/public key pair, which is used to produce the CSR).
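
As a rough sketch of what such an upload could look like, the function below only prints the commands that would be piped into nsupdate; it does not send anything. The record name follows the `_letsencrypt._tcp.<hostname>` convention used in this thread. The TTL and the three TLSA parameter fields that mark a record as a CSR are not stated here, so they are passed in as arguments rather than guessed, and the function name is made up.

```shell
#!/bin/sh
# Hypothetical helper: print nsupdate input that adds a hex-encoded DER CSR
# as a TLSA record.
# Arguments: server, zone, hostname, TTL, TLSA parameters (three numbers,
# to be taken from the dns-letsencrypt-secondary documentation), hex data.
csr_update () {
    printf 'server %s\n' "$1"
    printf 'zone %s\n' "$2"
    printf 'update add _letsencrypt._tcp.%s %s TLSA %s %s\n' "$3" "$4" "$5" "$6"
    printf 'send\n'
}
```

The output would then be piped into the `nsupdate -y ...` invocation shown above.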

The certificates can be downloaded by the service with the following shell script - e.g. via a cron job (since it is ensured that the certificate is updated 2 weeks before expiry):

#!/bin/sh

set -e

hostname=$1

dig_opts=" +noquestion +nocomments +noauthority +noadditional +nostats"
if [ $# = 2 ]; then
    dig_opts="$dig_opts @$2"
fi
data=$(dig tlsa _letsencrypt._tcp.$hostname $dig_opts | awk '{if (NR>3){print}}' | cut -f 5- -d ' ' | sort | grep '^[03] 0 0' | cut -d ' ' -f 4- | sed -e 's/ //g')
file=

hex_to_bin () {
    data=$(echo $@ | sed -e 's/\([0-9A-F][0-9A-F]\)/0x\1 /g')
    for hex in $data; do
        oct=$(printf "%o" $hex)
        if [ $oct = "0" ]; then
            printf "\0" >> $file
        else
            printf "%1b" $(echo '\0'$oct) >> $file
        fi
    done
}

cert_file=$(mktemp)
i=0
for cert in $data; do
    i=$(echo $i + 1 | bc)
    file=$(mktemp)
    hex_to_bin $cert
    openssl x509 -inform der -outform pem -in $file -out $cert_file.$i
    rm $file
done

# now mix and match, the final $i should be the leaf certificate
inter=$(mktemp)
last_inter=$(echo $i - 1 | bc)
for j in $(seq 1 $last_inter); do
    cat $cert_file.$j >> $inter
done

openssl verify -show_chain -verify_hostname $hostname -untrusted $inter $cert_file.$i

out=$hostname.pem

if [ -f $out ]; then
    out=$(mktemp)
fi

cat $cert_file.$i >> $out
cat $inter >> $out

rm -f $cert_file* $inter

echo "PEM bundle in $out"
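
For the cron job, one possible wrapper is to fetch only when the local bundle is missing or close to expiry. This is a minimal sketch that assumes the script above is saved under the hypothetical name fetch-cert.sh; the 13-day threshold is an arbitrary choice just below the two-week renewal window mentioned above.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if the PEM file $1 is missing or its
# first certificate expires within $2 days, i.e. a fresh download is due.
needs_refresh () {
    [ -f "$1" ] || return 0
    # -checkend N exits 0 if the certificate is still valid N seconds from now
    if openssl x509 -in "$1" -noout -checkend $(( $2 * 86400 )) >/dev/null; then
        return 1
    fi
    return 0
}

# Example use from cron (paths and names are placeholders):
# needs_refresh www.example.org.pem 13 && ./fetch-cert.sh www.example.org
```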

@avsm
Member

avsm commented Feb 28, 2023

@mtelvers wrote:

@avsm What are the success criteria? Or perhaps more importantly, what administrative controls need to be kept in place by the new solution?

Good question. There's one important missing piece in our current infrastructure: secrets management. We already have a bunch of keys lying around, and with the DNS infrastructure will have even more with the various nsupdate pieces. So I think we need to come up with some way to store and securely share the various private material (and ensure that there's robust administrative controls there). Once we have that, I'm satisfied that we can manage the DNSKEYs and other pieces in this ticket well. Perhaps let's split this issue into a sub ticket on secrets management, if you think that's useful? (and @hannesm, do you have anything in your box of Mirage deployment tricks that might help with that?)

@hannesm
Member Author

hannesm commented Feb 28, 2023

secrets management

Would you mind to expand your requirements here?

As far as I can see, there is (when we consider self-hosted authoritative DNS)

  1. gandi.net passwords for access to domains (this is likely a private account, or something required by actual humans for changing things such as the authoritative NS) -- in a setup where DNS is self-hosted, maybe use a password manager that allows sharing via an encrypted file?
  2. secrets for e.g. uploading a CSR (when a new service is deployed, i.e. a new DNS record needs to be enrolled as well)
  3. secrets between machines (i.e. secondary NS requires a shared secret for requesting a zone transfer from the primary)

Are there more types of secrets needed?

Certainly, the password manager (1) looks out of scope for this discussion. The secrets required by humans (2) to modify the zone file can be (a) access to a (private) git repository hosted on GitHub (as done by the mirage organization) (b) a shared secret in the password manager.

For the secrets between machines (3), the current setup (for e.g. mirage.io) is: the git repository with the zone files contains the shared secrets (to communicate between primary and secondary servers). The primary DNS server has access to it (via a ssh key that is provided as boot parameter (command line argument), the public part is registered with GitHub); the secondary DNS servers receive the shared secrets as boot parameter.

Now, lifting the boot parameters (none of the below is implemented yet)

  • To avoid passing shared secrets around, there could be a LDAP service (+clients), but then the primary server (and secondary servers) would need to authenticate themselves to the LDAP - which again means they'd need some command line parameters.
  • Another option would be a DHCP server that provides (from a configuration file) the matching secrets to the unikernel(s). Now, only the DHCP configuration file would contain the secrets, no more boot parameters (also, it allows to pass logging etc. via DHCP).
  • A third option is to have a web service that acts as configuration manager (i.e. has the secrets and communicates with albatross). This would as well allow (in contrast to DHCP) managing unikernels with secrets on different hosts (certainly this configuration manager now should be secured, but it could very well be a unikernel that requires two-factor authentication (webauthn)).

Let me know what you think, and/or let's have a discussion (maybe a video meeting?) about other approaches (and about the concrete goals).

@avsm
Member

avsm commented Feb 28, 2023

I'd only add to the secrets list for ocaml.org:
4) SSH keys for the hosts themselves
5) Capnproto capability files for various services (only necessary if we hook this into the RPC infrastructure)
6) service database passwords for (e.g.) watch.ocaml.org's postgres

Ahead of any discussion, it would be good to have the current status of the secrets in the ocaml.org cluster written down @mtelvers, and we can converge on what missing gaps there are in terms of rolling out any change in DNS infrastructure.

@mtelvers
Collaborator

A typical infrastructure deployment uses OCaml services running internally (usually under Docker) with HTTPS offloaded to a reverse proxy. Since we need a reverse proxy, the most straightforward approach is to use Caddy, which is a reverse proxy and manages the certificates automatically. Here is the entire Caddy configuration file needed for a typical service:

www.ocaml.org {
	reverse_proxy www:8080
}

In this setup, Caddy resolves the challenges automatically via HTTP challenge using the DNS entries that @avsm creates.

For a more complex setup, such as where www.ocaml.org resolves to multiple addresses, the ideal setup would be to use the DNS-01 challenge. This is achieved like this (complete configuration file given):

{
	acme_dns gandi {env.GANDI_API_TOKEN}
}

www.ocaml.org {
	reverse_proxy www:8080
}

A typical invocation would be like this:

docker run -it --rm -e GANDI_API_TOKEN=__key__ -v ./Caddyfile:/etc/caddy/Caddyfile -v config:/config -v data:/data -p 80:80 -p 443:443 tuneitme/caddy

There is an outstanding issue to move deploy.ocamllabs.io to deploy.mirage.io. Currently this is deployed using Caddy exactly as described above. Perhaps we can use this as a test case to integrate your hmac script into Caddy? I also see that there is a Caddy module for hmac which may do what we need?

@reynir

reynir commented Feb 28, 2023

I have not used caddy before, but I searched around a bit and I found this: https://github.com/caddy-dns/rfc2136

I'm not sure exactly how it works, and it may not be able to take advantage of the tricks in the letsencrypt secondary dns server, but I think it should work.

@mtelvers
Collaborator

mtelvers commented Mar 1, 2023

@hannesm, I am working through your blog post, and some links may have been renamed/moved since it was reviewed in 2019. Can you help me locate these? Perhaps this is now released?

# git via ssh is not yet released, but this opam repository contains the branch information
$ opam repo add git-ssh git+https://github.com/roburio/git-ssh-dns-mirage3-repo.git

There is no branch future, but there is future-git? Can I substitute?

git clone -b future https://github.com/roburio/unikernels.git

@hannesm
Member Author

hannesm commented Mar 1, 2023

Dear @mtelvers, thanks a lot for your comment(s). Indeed, that changed a bit since the packages are now released. I'll work on revising that blog post.

The sources are now:

All these unikernels are as well available as reproducible binaries (hvt -- kvm) from our infrastructure:

@hannesm
Member Author

hannesm commented Mar 2, 2023

I have not used caddy before, but I searched around a bit and I found this: https://github.com/caddy-dns/rfc2136

I'm not sure exactly how it works, and it may not be able to take advantage of the tricks in the letsencrypt secondary dns server, but I think it should work.

Indeed, RFC2136 is the "dynamic updates for DNS" RFC, which is implemented by OCaml-DNS. And the configuration snippet from the link:

{
    "module": "acme",
    "challenges": {
        "dns": {
            "provider": {
                "name": "rfc2136",
                "key": "cWnu6Ju9zOki4f7Q+da2KKGo0KOXbCf6Pej6hW3geC4=",
                "key_name": "test",
                "key_alg": "hmac-sha256",
                "server": "1.2.3.4:53"
            }
        }
    }
}

is supposed to work directly. This would mean: (a) no need for dns-letsencrypt-secondary, and (b) enrolling your hmac secret, with key and key_name, so that it is available to both caddy and dns-primary-git. I've not tested the interaction with caddy (but since it is RFC-specified, and works with bind, it should be fine).

@avsm
Member

avsm commented Mar 2, 2023

Relying on RFC2136 and using another interoperable bit of software like Caddy seems ideal here; well spotted @reynir. I'm hopeful that we'll eventually have a Caddy replacement in OCaml (I'm working on one on the side), but it'll obviously take some time to mature before being suitable for OCaml.org deployment.

@mtelvers
Collaborator

mtelvers commented Mar 2, 2023

@hannesm I have made some progress, but it doesn't seem to work, and I am unsure where I am going wrong. I have a DNS server up and running with this command:

sudo ./_build/default/primary-git --remote=https://github.com/mtelvers/tunbury-uk-dns --ipv4=a.b.c.d/24 -l debug

However, it doesn't work when I try to test it with this (per your example).

$ host ns1.tunbury a.b.c.d
Using domain server:
Name: a.b.c.d
Address: a.b.c.d#53
Aliases: 

Host ns1.tunbury not found: 9(NOTAUTH)

The console output is

2023-03-02 15:30:01 +00:00: DBG [dns_server] from w.x.y.z received:header 8712 (query) operation Query rcode 
                                            no error flags: recursion desired
                                            question ns1.tunbury A?
                                            data query additional 
                                            EDNS no TSIG no
2023-03-02 15:30:01 +00:00: DBG [dns_mirage] udp: sending 29 bytes from 53 to w.x.y.z:16906

I get rate limited by Git pretty quickly; therefore, I tried to use a local git server. I generated a key with awa_gen_key and put the public key in the git user's ~/.ssh/authorized_keys. I scanned the host with ssh-keygen, etc. On the command line, I added --authenticator=SHA256:xxx and --remote=ssh://git@127.0.0.1/tunbury-uk-dns.git. The --seed parameter now seems to be --ssh-key=rsa:seed. However, I get the error below. I tried various options, such as using the public IP rather than 127.0.0.1

2023-03-02 15:41:12 +00:00: ERR [git-fetch] The Git peer is not reachable.
2023-03-02 15:41:12 +00:00: ERR [application] couldn't initialize git repository ssh://git@127.0.0.1/tunbury-uk-dns.git: error fetching: No connection found

My apologies; I am probably making some basic error.

@hannesm
Member Author

hannesm commented Mar 2, 2023

Thanks for your report, @mtelvers. I just pushed an update to the blog post.

To answer your trouble:

--remote=https://github.com/mtelvers/tunbury-uk-dns

should be --remote=https://github.com/mtelvers/tunbury-uk-dns.git I think

--authenticator=SHA256:xxx and --remote=ssh://git@127.0.0.1/tunbury-uk-dns.git. The --seed parameter now seems to be --ssh-key=rsa:seed. However, I get the error below. I tried various options, such as using the public IP rather than 127.0.0.1

Indeed that argument changed: awa_gen_key has a --keytype argument (with ed25519 or rsa) now;
ssh-key= takes rsa:seed or ed25519:key;
and --remote should be git@IPorHOST:path.git (no more ssh://, and a : instead of / after the IPorHOST)

Hope that helps.

@mtelvers
Collaborator

mtelvers commented Mar 2, 2023

@hannesm Thank you. The local git repository is now working. However, it is still not resolving names. I'll have another look in the morning. Thanks again.

@hannesm
Member Author

hannesm commented Mar 2, 2023

@mtelvers if you pass -l \*:debug, you'll see quite some log messages. But testing the zone file may be worth it first (opam install dns-cli and ozone /path/to/zonefile).

@mtelvers
Collaborator

mtelvers commented Mar 3, 2023

Success! The remote must include the branch even when there is only one branch called master. Thus this works:

sudo ./_build/default/primary-git --remote=git@a.b.c.d:tunbury-uk-dns.git'#master'

@reynir

reynir commented Mar 3, 2023

Happy to hear it works! 🥳

It defaults to trying branch main if not specified :-)
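
Putting the two findings together (an scp-style remote instead of ssh://, and an explicit branch when it is not main), a small helper could derive the --remote value. This is only a string-rewriting sketch with a made-up function name; it does not contact any server.

```shell
#!/bin/sh
# Hypothetical helper: turn ssh://user@host/path.git into user@host:path.git
# and append '#branch' unless the branch is main (the default noted above).
# Arguments: remote URL, optional branch name.
mirage_remote () {
    remote=$(printf '%s\n' "$1" | sed -e 's|^ssh://\([^/]*\)/\(.*\)$|\1:\2|')
    if [ "${2:-main}" != "main" ]; then
        remote="$remote#$2"
    fi
    printf '%s\n' "$remote"
}
```

For example, `mirage_remote ssh://git@127.0.0.1/tunbury-uk-dns.git master` prints `git@127.0.0.1:tunbury-uk-dns.git#master`.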

@mtelvers
Collaborator

mtelvers commented Mar 3, 2023

Domain tunbury.uk is now using dns-primary-git as the name server, and the certificate for https://www.tunbury.uk was deployed directly from Caddy using RFC2136 and a DNS-01 challenge. I'll document the steps.

@mtelvers
Collaborator

mtelvers commented Mar 6, 2023

Deploying an authoritative DNS server as a MirageOS unikernel

Git Server

These steps create a Git Server on the local machine to host the zone repository. A remote machine could be used instead with suitable changes to the commands. The machine acting as the Git server should have git installed!

sudo apt update -y
sudo apt install git -y

Create a user called git

sudo adduser --disabled-password --gecos '@git' --home /home/git git

Create an SSH key to secure access to the repository.

ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa

Copy the new key to the git user.

sudo mkdir -p -m 0700 /home/git/.ssh
sudo cp ~/.ssh/id_rsa.pub /home/git/.ssh/authorized_keys
sudo chown git:git /home/git/.ssh /home/git/.ssh/authorized_keys
sudo chmod 0600 /home/git/.ssh/authorized_keys

Verify that ssh git@localhost now works without prompt.

OCaml and Opam

Install the necessary prerequisites using apt.

sudo apt update -y
sudo apt install build-essential curl unzip bubblewrap git -y

Install Opam

sudo bash -c "sh <(curl -fsSL https://raw.githubusercontent.com/ocaml/opam/master/shell/install.sh)"

Setup Opam with a 4.14 switch.

opam init -y --shell-setup -c 4.14.1
eval $(opam env)

DNS Zone

Create the remote repository.

ssh git@localhost "mkdir tunbury-uk-dns.git && cd tunbury-uk-dns.git && git init --bare"

Locally, create the folder and zone file.

mkdir -p ~/tunbury-uk-dns
cd ~/tunbury-uk-dns
cat << 'EOF' > tunbury.uk
$ORIGIN tunbury.uk.
$TTL 3600
@       SOA     ns1     hostmaster      1       86400   7200    1048576 3600
@       NS      ns1
ns1     A       128.232.124.215
www     A       128.232.124.215
EOF
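
When the zone is edited later, the SOA serial (the 1 after hostmaster above) should generally increase so that secondaries notice the change. Below is a sketch of a bump for the single-line SOA layout used here. The function name is made up, it does not handle multi-line SOA records, and it normalizes the whitespace on that one line.

```shell
#!/bin/sh
# Hypothetical helper: print the zone file $1 with the SOA serial incremented.
# Assumes the one-line layout
#   @ SOA mname rname serial refresh retry expire minimum
bump_serial () {
    awk '$2 == "SOA" { $5 = $5 + 1 } { print }' "$1"
}
```

A possible use is `bump_serial tunbury.uk > tunbury.uk.new && mv tunbury.uk.new tunbury.uk`, followed by a check of the result with ozone from dns-cli, as suggested earlier in this thread.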

Now create a HMAC secret using random data.

hmac=$(openssl rand -base64 32)
echo "personal._update.tunbury.uk. DNSKEY 0 3 163 $hmac" > tunbury.uk._keys
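
Before committing the key, it may be worth checking that the secret decodes to the expected number of bytes, since a truncated base64 string is an easy mistake when the same value is copied into the Caddyfile later. A small sketch with a made-up function name, assuming `base64 -d` is available:

```shell
#!/bin/sh
# Hypothetical helper: print how many bytes a base64-encoded secret decodes to.
secret_len () {
    printf '%s' "$1" | base64 -d | wc -c | tr -d ' '
}

# A key produced by `openssl rand -base64 32` should report 32:
# secret_len "$hmac"
```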

Setup a git repository and push the zone file.

cd ~/tunbury-uk-dns
git init
git config --global user.email "Email address"
git config --global user.name "Name"
git add .
git commit -m "initial commit"
git remote add origin git@localhost:tunbury-uk-dns.git
git push origin master

Git SSH access authentication

Get the fingerprint of the Git server's SSH key in the format SHA256:xxxx.

ssh-keyscan -t rsa localhost > /tmp/public-key
fingerprint=$(ssh-keygen -l -E sha256 -f /tmp/public-key | cut -f 2 -d " ")

In order for the DNS server to authenticate against the Git server we need to generate a key pair using the awa tool. Install the tool using opam.

opam install -y awa

Then generate the key by running awa_gen_key. Example output is below.

$ awa_gen_key
private key seed: rsa:xV+Z1XIeHvQ1FhRNuXcF424YOd6QvGe9idH1Y58o
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACAQDCJEIp1ofFkE2Kt3vd7DgvReuTFg7PpWjePOO0Kn/Mv6t1TPdVzKSiJ0r0jcIWt2df9cqAd+GVr+tSOGflHZeXJerYMzIrZOFjzxVV7P4hJnSEgIosVcmEgnr2WFcHdlINvr4mkkCwOvu5ncE+ix+vRMHLORlA1ND4Fe98dANn4fJba9EQsXgFuxu3QLTCpcjDoHtNM4vOuY0B4ptLS2RWLYkCDsyswGFDqFGKLQI/RbkbsmarPh4qlXD/cy9I2DnoBetGvmBFUpjQkRlpcHS95zN6/0YC0oq6X2lXXiDXdrkPLOq5t1O4rjejiOnv4cXIma9V4GxQ73NASih8D3JU/KsF9FnggneUP8zu//i3UxiZrOemokfjgQ3ancPh6GnqnhEMeypnNx5w8x52tF8RnR7zolLqEvQeRMddetA/bZcmoJBJQ0N3cWZrC1MSEAFUeeH+mqSlW7nzONdYZ3JZ8OMjL/vV/lfm2e50+jCh9+GwxdpK+2+yQzNqWX0E8JdERAw6H3ThT1MSCszUhApZWGsPmSBXLbZI6ror/qGlPWE4POsRdZSGnWVhZpR5xT089Xp1lR4ZdhqDd69TcS/WQ8I+eGNtqqjZdm9M0sz4u/NRhO9VwfipE6mYMqbDOlL0vVulsXTqgNRWA0NEFXfzc74dv8wG4u0a/bX+y+5TcQ== awa@awa.local

The public key portion needs to be added to the authorized_keys on the Git server and the private key seed is needed for the invocation of dns-primary-git.

out=$(awa_gen_key)
public=$(echo $out | cut -d " " -f 5-6)
sudo bash -c "echo $public >> ~git/.ssh/authorized_keys"
seed=$(echo $out | cut -d " " -f 4)

Seed is in the format rsa:xxxxx
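
The cut-based extraction above relies on the shell collapsing the two output lines into one. An alternative sketch that parses the lines separately, assuming awa_gen_key keeps the two-line output format shown in the example (the function names are made up):

```shell
#!/bin/sh
# Hypothetical helpers operating on the captured output of awa_gen_key ($1).
awa_seed () {
    # the text after "private key seed: ", e.g. rsa:xxxxx
    printf '%s\n' "$1" | sed -n 's/^private key seed: //p'
}
awa_public () {
    # key type and base64 blob of the public key line, without the comment
    printf '%s\n' "$1" | grep '^ssh-' | cut -d ' ' -f 1-2
}
```

With `out=$(awa_gen_key)`, the values would be obtained as `seed=$(awa_seed "$out")` and `public=$(awa_public "$out")`.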

Primary DNS Server

Clone the git repository for dns-primary-git and build it.

opam install -y mirage dns-cli
git clone https://github.com/roburio/dns-primary-git.git
cd dns-primary-git
mirage configure
make depend
mirage build

To run the DNS server, we need to specify the host SSH key fingerprint (determined above), the seed of the private key to authenticate against the Git server, the IP address to listen on, and finally, the location of the remote repository. The default branch is main, so any other branch needs to be specified.

sudo ./_build/default/primary-git --authenticator=$fingerprint --ssh-key=$seed --remote=git@localhost:tunbury-uk-dns.git'#master' --ipv4=128.232.124.215/24

Verify that your name server is operating with dig or host or nslookup:

nslookup ns1.tunbury.uk 128.232.124.215

Docker

We need to install Docker to perform the Caddy build.

curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh ./get-docker.sh

Add your user to the docker group. My user name is vagrant.

sudo usermod -aG docker vagrant

Caddy

The default Docker image for Caddy does not include support for RFC2136; therefore, we need a custom Docker build.

Create a Dockerfile for the build:

mkdir ~/caddy
cd ~/caddy
cat << EOF > Dockerfile
FROM caddy:builder AS builder

RUN xcaddy build \
    --with github.com/caddy-dns/rfc2136@master

FROM caddy:alpine

COPY --from=builder /usr/bin/caddy /usr/bin/caddy
EOF

Run the build with docker buildx build . -t caddy-rfc2136

Test Usage

To test the automatic creation of certificates, we will implement a typical service with Caddy as the reverse proxy. Create a docker-compose.yml

cat << EOF > docker-compose.yml
version: "3.7"
services:
  caddy:
    image: caddy-rfc2136
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - /etc/caddy:/etc/caddy:ro
      - caddy_data:/data
      - caddy_config:/config
  www:
    image: ocurrent/v3.ocaml.org-server:live
    sysctls:
      - 'net.ipv4.tcp_keepalive_time=60'
volumes:
  caddy_data:
  caddy_config:
EOF

And create /etc/caddy/Caddyfile. We insert our HMAC secret from the DNS Zone step as the key with the matching key_name.

cat << EOF > /etc/caddy/Caddyfile
{
        acme_dns rfc2136 {
                key_name "personal._update.tunbury.uk"
                key_alg "hmac-sha256"
                key "$hmac"
                server "128.232.124.215:53"
        }
}

www.tunbury.uk {
        reverse_proxy www:8080
}
EOF

Finally, run docker compose up.

Thanks

Thanks to @hannes and @reynir

Reference: https://hannes.nqsb.io/Posts/DnsServer

@hannesm
Member Author

hannesm commented Mar 24, 2023

Cool, great to hear about your success @mtelvers. I've been re-reading http://infra.ocaml.org/opam-ocaml-org and am wondering whether a next step would be to use DNS for storing certificates and certificate signing requests (as done by mirage.io and other domains).

The advantages would be:

  • no more hostname dancing and proxying requests to another DNS round-robin'd host
  • no need for "intermediate" opam-4.ocaml.org / opam-5.ocaml.org certificates
  • independence of opam-5 and opam-4: new hosts can be added to the mix without reconfiguring nginx on existing hosts, and certificate renewal works even when one host is offline (which, from the article I read, does not work at the moment)

Instead, the process would be:

  • generate a private key, distribute to opam-4 and opam-5
  • generate a certificate signing request using that key, for the domain name opam.ocaml.org
  • add that signing request as TLSA record to DNS
  • run a dns-letsencrypt-secondary, which solves the dns-01 challenge to obtain certificates for the signing requests and stores these certificates in DNS as well (as TLSA records); it also checks daily for certificates that expire soon (where soon is 14 days) and requests fresh ones
  • on opam-4 and opam-5 have a cronjob that fetches on a daily basis the certificates from DNS (and thus keep getting non-expired ones)

If there's demand, I can provide shell scripts for (a) uploading CSR (based on nsupdate) and (b) downloading CERT (based on dig) (I may already have posted links above to these scripts).
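
As a rough illustration of step (a), the sketch below prints the nsupdate commands that would publish a DER-encoded CSR as a TLSA record. The function names, the `_letsencrypt._tcp` name prefix, the TTL and the TLSA parameters `3 255 0` are assumptions made for this example and should be checked against what dns-letsencrypt-secondary actually expects.

```shell
#!/bin/sh
# Sketch only. The "_letsencrypt._tcp" prefix, the TTL (3600) and the TLSA
# parameters "3 255 0" are assumptions, not confirmed by this thread.

# Hex-encode a file (for example a DER-encoded CSR) without separators.
hex_of_file() {
    od -An -v -tx1 "$1" | tr -d ' \n'
}

# Print the nsupdate commands that publish a DER-encoded CSR as a TLSA record.
# Arguments: DNS server address, hostname, path to the DER file.
csr_update_commands() {
    printf 'server %s\n' "$1"
    printf 'update add _letsencrypt._tcp.%s 3600 TLSA 3 255 0 %s\n' \
        "$2" "$(hex_of_file "$3")"
    printf 'send\n'
}

# Possible usage (not run here; file names are placeholders):
#   openssl genrsa -out opam.ocaml.org.key 4096
#   openssl req -new -key opam.ocaml.org.key -subj "/CN=opam.ocaml.org" \
#       -outform DER -out opam.ocaml.org.csr.der
#   csr_update_commands 128.232.124.215 opam.ocaml.org opam.ocaml.org.csr.der |
#       nsupdate -y "hmac-sha256:$key_name:$hmac"
```

Keeping the command generation separate from the call to nsupdate makes it possible to inspect the update before it is sent.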

Please let me know what you think.

@mtelvers
Collaborator

In summary, we would like to provision SSL certificates for a reverse HTTPS proxy, specifically when we have round-robin DNS. Any proxy server is suitable; Caddy would be my preferred choice as it provides automatic certificate provisioning and renewal.

The current setup is that DNS is managed manually by @avsm via Gandi's web GUI, which restricts us to HTTP-01 challenges.

We have a working solution to this problem using NGINX. This solution redirects missed HTTP-01 challenges to the alternate server.

Disadvantages of the current solution

Looking at the disadvantages identified with this solution:

  1. Dancing between hosts: This is somewhat manual during the initial deployment. However, we have a working solution with minimal scripting for ongoing renewals.

  2. Intermediate certificates: We would deploy these anyway because we would want to be able to target specific hosts for testing.

  3. Adding new hosts: We are unlikely to add a third or fourth host, so this is less of an issue, but we would need to reconfigure it to make this work. We would create a chain of hosts, eventually hitting the HTTP redirect limit.

  4. Renewal when a host is offline: This is problematic but unlikely to fail completely. The host would need to be offline for > 30 days, and during that time, the daily attempts to renew the certificate would need to always go to the offline host. In practice, there is a 50/50 chance of getting the right host and, therefore, not needing to redirect.
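
To put a number on item 4 under a simplified model: assuming each daily attempt independently reaches either host with probability 1/2 (the 50/50 figure above), the chance that 30 consecutive attempts all reach the offline host is 0.5^30. The helper name `all_miss_probability` is hypothetical.

```shell
#!/bin/sh
# Simplified model: every daily attempt independently hits the offline host
# with probability 0.5. Real validation behaviour may differ.

# Print the probability that n consecutive attempts all hit the offline host.
all_miss_probability() {
    awk -v n="$1" 'BEGIN { printf "%.3g\n", 0.5 ^ n }'
}

all_miss_probability 30   # prints 9.31e-10
```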

DNS-01 Challenges

A typical solution to this issue is to use DNS-01 challenges instead of HTTP-01. This solution requires the DNS records needed by Let’s Encrypt to be provisioned by the client using a shared secret token.
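
For illustration, the record a DNS-01 client has to publish is a TXT record under `_acme-challenge.<domain>`. The sketch below assumes RFC2136 dynamic updates via nsupdate as the transport; the function name, server address, TTL and token are placeholders for this example.

```shell
#!/bin/sh
# Sketch only: the server address, the TTL (60) and the token are placeholders.

# Print nsupdate commands that publish a dns-01 challenge value.
# Arguments: DNS server address, domain name, challenge value.
challenge_update_commands() {
    printf 'server %s\n' "$1"
    printf 'update add _acme-challenge.%s 60 TXT "%s"\n' "$2" "$3"
    printf 'send\n'
}

# Possible usage (not run here), authenticated with the shared HMAC key:
#   challenge_update_commands 128.232.124.215 www.tunbury.uk "$token" |
#       nsupdate -y "hmac-sha256:personal._update.tunbury.uk:$hmac"
```

In practice an ACME client such as Caddy performs this step itself; the sketch only shows what ends up in the zone.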

We would require approval from Xavier via @avsm to allow automatic updates to the ocaml.org domain. With that approval, we then have multiple approaches available to us on how to implement the solution.

Gandi API

We could use the Gandi API for DNS-01 challenges. This solution would not require scripts or cron jobs and could be handled entirely with Caddy. See https://github.com/caddy-dns/gandi

MirageOS DNS with nsupdate

We could use MirageOS to provide a DNS server which could be updated using HMAC keys to publish the necessary records using nsupdate. Once the certificates are provisioned and downloaded, NGINX can be signalled to read the new certificates.

MirageOS DNS with Caddy

We could complete the deployment with a MirageOS DNS server and use the RFC2136 support integrated into Caddy; see https://github.com/caddy-dns/rfc2136. This was implemented last week on ci.mirage.io and deploy.mirage.io. The implementation was very straightforward.

Summary

As we have a solution, is a change needed?

Are Xavier/Anil happy to allow automated updates to ocaml.org?

With automated updates, should we use Gandi API or RFC2136?

Is further testing required, in which case, what is needed?

It is good public relations (PR) to use an OCaml/MirageOS DNS server for ocaml.org. However, the counterargument is that we are not a DNS provider, and therefore running a DNS service is a distraction from our core purpose.

@hannesm
Member Author

hannesm commented Mar 28, 2023

Thanks for the extensive explanation @mtelvers. So we could have taken a shortcut back on January 16th if you had replied with "no, there's no interest in running our own DNS services".

@hannesm hannesm closed this as completed Mar 28, 2023
@avsm avsm reopened this Mar 29, 2023
@avsm
Member

avsm commented Mar 29, 2023

I haven't even had a chance to digest all this excellent analysis yet, and the issue got closed?! I certainly didn't interpret Mark's analysis above as "no there's no interest", Hannes. However, it'll take some time to digest all the implications rather than running quickly into any such switch for a domain as large and as important to the ecosystem as ocaml.org.

@tmcgilchrist
Collaborator

Excellent summary, thanks @mtelvers.

Additionally, if we run our own DNS services, we need a plan for providing support for the MirageOS-based stack. We (Tarides people working on this) have limited time to cover the work already on our roadmap (#26, #25, docs-ci, and supporting ocaml.org website development).

@rikusilvola
Contributor

Taking a step back, we can see that there's been fruitful discussion on how we could start dogfooding OCaml-DNS. It is clear that everyone's intent here is to ensure we do it right.

I must second @avsm that this process must be deliberate. We're all very grateful for your patience, @hannesm, as this will take time, and your continued collaboration and contributions are very much appreciated by all.

@avsm
Member

avsm commented May 15, 2023

A point in favour of switching authoritative DNS server to one we control (while debugging #42) is that Gandi offers secondary DNS hosting, so we would be protected against our primary DNS temporarily going down.
