Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Poetry: use locked environment in Docker images #12385

Merged
merged 14 commits into from
Apr 7, 2022
4 changes: 2 additions & 2 deletions .dockerignore
Original file line number Diff line number Diff line change
Expand Up @@ -4,8 +4,8 @@
# things to include
!docker
!synapse
!MANIFEST.in
!README.rst
!setup.py
!pyproject.toml
!poetry.lock

**/__pycache__
63 changes: 40 additions & 23 deletions docker/Dockerfile
Original file line number Diff line number Diff line change
Expand Up @@ -16,18 +16,31 @@

ARG PYTHON_VERSION=3.9

FROM docker.io/python:${PYTHON_VERSION}-slim as base
DMRobertson marked this conversation as resolved.
Show resolved Hide resolved

###
### Stage 0: builder
###
FROM docker.io/python:${PYTHON_VERSION}-slim as builder

# install the OS build deps
#
# Irritatingly, there is no blessed guide on how to distribute an application with its
# poetry-managed environment in a docker image. For a while,
# `poetry export | pip install -r /dev/stdin` seemed plausible but is limited by bugs
# in `poetry export` whose fixes (scheduled for poetry 1.2) have yet to be released.
# This is inspired from:
# /~https://github.com/python-poetry/poetry/discussions/1879#discussioncomment-216865
# https://stackoverflow.com/questions/53835198/integrating-python-poetry-with-docker?answertab=scoredesc
FROM base as builder

# RUN --mount is specific to buildkit and is documented at
# /~https://github.com/moby/buildkit/blob/master/frontend/dockerfile/docs/syntax.md#build-mounts-run---mount.
# Here we use it to set up a cache for apt, to improve rebuild speeds on
# slow connections.
#
# Here we use it to set up a cache for pip (below, for apt and poetry), to improve
# rebuild speeds on slow connections.
# We install poetry as --user so that it doesn't end up in the system-wide python
# installation. That gets copied later into the runtime image.
RUN --mount=type=cache,target=/root/.cache/pip \
pip install --user poetry==1.1.12

# install the OS build deps
RUN \
--mount=type=cache,target=/var/cache/apt,sharing=locked \
--mount=type=cache,target=/var/lib/apt,sharing=locked \
Expand All @@ -45,33 +58,37 @@ RUN \
zlib1g-dev \
&& rm -rf /var/lib/apt/lists/*

# Copy just what we need to pip install
COPY MANIFEST.in README.rst setup.py /synapse/
COPY synapse/__init__.py /synapse/synapse/__init__.py
COPY synapse/python_dependencies.py /synapse/synapse/python_dependencies.py
WORKDIR /synapse

# Copy just what we need to run `poetry install`
COPY pyproject.toml poetry.lock README.rst /synapse/

# Install to the Python installation which hosts `pip`. In this case, it's the system
# Python.
ENV POETRY_VIRTUALENVS_IN_PROJECT=true \
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@richvdh asked:

I think using ENV here will mean we end up with these env vars exported in the final image? Are you sure that's what we want?

My understanding was that these just apply to the builder stage because we haven't yet begun to define the final stage.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right. I've not investigated in practice how this works for initial stages of Dockerfiles, but what I do know is that if you set an ENV in the final stage, you'll also get that env var in the final container. See https://docs.docker.com/engine/reference/builder/#env:

The environment variables set using ENV will persist when a container is run from the resulting image.

and:

If an environment variable is only needed during build, and not in the final image, consider setting a value for a single command instead

My instinct here, irrespective of whether it does the right thing in practice, is to avoid the ambiguity by avoiding ENV.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I was allowing myself to be a little lazy since I was invoking poetry twice.

Now that we're able to export requirements.txt (92c9a60) this isn't needed.

For context: originally poetry export produced a requirements.txt file which omitted hashes for certain dependencies. That's a poetry bug. We were able to workaround this by adding a missing metadata dependency to Synapse (#12384) and tightening a signedjson dependency (matrix-org/python-signedjson#9). We only found the workaround last week, and I hadn't gone back to make use of requirements.txt here (if it ain't broke...).

POETRY_VIRTUALENVS_CREATE=true \
POETRY_HOME=/opt/poetry

# To speed up rebuilds, install all of the dependencies before we copy over
# the whole synapse project so that we this layer in the Docker cache can be
# the whole synapse project, so that this layer in the Docker cache can be
# used while you develop on the source
#
# This is aiming at installing the `install_requires` and `extras_require` from `setup.py`
RUN --mount=type=cache,target=/root/.cache/pip \
pip install --prefix="/install" --no-warn-script-location \
/synapse[all]
RUN --mount=type=cache,target=/opt/poetry/artifacts \
--mount=type=cache,target=/opt/poetry/.cache/pypoetry/cache \
/root/.local/bin/poetry install --no-dev --no-root --no-interaction --no-ansi --extras all

# Copy over the rest of the project
# Copy over the synapse source code.
DMRobertson marked this conversation as resolved.
Show resolved Hide resolved
COPY synapse /synapse/synapse/

# Install the synapse package itself and all of its children packages.
#
# This is aiming at installing only the `packages=find_packages(...)` from `setup.py
RUN pip install --prefix="/install" --no-deps --no-warn-script-location /synapse
# Install the synapse package itself, by omitting the --no-root argument
RUN --mount=type=cache,target=/opt/poetry/artifacts \
--mount=type=cache,target=/opt/poetry/cache \
/root/.local/bin/poetry install --no-dev --no-interaction --no-ansi --extras all

###
### Stage 1: runtime
###

FROM docker.io/python:${PYTHON_VERSION}-slim
FROM base

LABEL org.opencontainers.image.url='https://matrix.org/docs/projects/server/synapse'
LABEL org.opencontainers.image.documentation='/~https://github.com/matrix-org/synapse/blob/master/docker/README.md'
Expand All @@ -93,7 +110,7 @@ RUN \
openssl \
&& rm -rf /var/lib/apt/lists/*

COPY --from=builder /install /usr/local
COPY --from=builder /synapse/ /synapse
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@richvdh wondered:

/me wonders how many people rely on things being installed in /usr/local.

Certainly this is going to break anyone who builds docker images that install extension modules.

It might be fine to copy the venv to /usr/local, but I wasn't certain; this change is me erring towards caution. See this comment thread.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hadn't considered that people were building on top of our docker images. A quick search fins a few.

I couldn't find an equivalent to pip install --prefix in poetry last time I looked. But this comment python-poetry/poetry#234 (comment) might be useful.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we could dump everything back in /usr/local/ then we wouldn't need matrix-org/complement#337.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hadn't considered that people were building on top of our docker images. A quick search fins a few.

and I'll bet there are a whole bunch more that aren't on (public) github.

Copying a virtualenv wholesale into /usr/local doesn't sound like a thing that will work to me. If nothing else there will be a bunch of hardcoded paths which would need updating.

Ultimately, I think our best approach here is just going to be to warn people loudly that this is a change we are making. Which sounds to me like we should also get our story straight about how people are meant to install extensions (#9944) to avoid breaking things twice for people.

To decouple this... could we do the same thing as the debian build, and hack around with poetry export and pip install -r ?

COPY ./docker/start.py /start.py
COPY ./docker/conf /conf

Expand Down
12 changes: 6 additions & 6 deletions docker/start.py
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
#!/usr/local/bin/python
#!/synapse/.venv/bin/python

import codecs
import glob
Expand Down Expand Up @@ -108,7 +108,7 @@ def generate_config_from_template(config_dir, config_path, environ, ownership):

# Hopefully we already have a signing key, but generate one if not.
args = [
"python",
sys.executable,
"-m",
"synapse.app.homeserver",
"--config-path",
Expand Down Expand Up @@ -158,7 +158,7 @@ def run_generate_config(environ, ownership):

# generate the main config file, and a signing key.
args = [
"python",
sys.executable,
"-m",
"synapse.app.homeserver",
"--server-name",
Expand All @@ -175,7 +175,7 @@ def run_generate_config(environ, ownership):
"--open-private-ports",
]
# log("running %s" % (args, ))
os.execv("/usr/local/bin/python", args)
os.execv(sys.executable, args)


def main(args, environ):
Expand Down Expand Up @@ -254,12 +254,12 @@ def main(args, environ):

log("Starting synapse with args " + " ".join(args))

args = ["python"] + args
args = [sys.executable] + args
if ownership is not None:
args = ["gosu", ownership] + args
os.execve("/usr/sbin/gosu", args, environ)
else:
os.execve("/usr/local/bin/python", args, environ)
os.execve(sys.executable, args, environ)


if __name__ == "__main__":
Expand Down