This repository has been archived by the owner on Sep 12, 2018. It is now read-only.

S3 Region #400

Closed
shreyaskarnik opened this issue May 30, 2014 · 83 comments

@shreyaskarnik
Contributor

The following is a problem I noticed with 0.7.0:
when I specify s3_region as us-west-2 in my config, the registry comes up but I cannot pull or push.

If I comment out s3_region then everything is fine. I have confirmed on S3 that the region of my bucket is us-west-2

Has anyone else experienced this? Logs show that registry is connecting to BUCKET_NAME.s3-us-west-2.amazonaws.com but I cannot pull or push.

I would appreciate any help/pointers on this issue.

@dmp42
Contributor

dmp42 commented May 30, 2014

Hi @shreyu86

Any error message in the logs when you try to push / pull?
Any error message at startup?

Thanks,

  • Olivier

@dmp42 dmp42 modified the milestone: Next May 30, 2014
@shreyaskarnik
Contributor Author

I get a 502 when trying to pull or push; also, if I ping the registry I get a 404. The logs
don't show anything from which I can deduce that something is broken, even
though logging is at debug level.

@dmp42
Contributor

dmp42 commented May 30, 2014

also if I ping registry I get a 404

What do you mean by that?

Logs don't say anything special from which I can deduce something is broken even though logging is at debug level

Still, can you gist your registry logs / output?

Thanks a lot.

@shreyaskarnik
Contributor Author

I'll send the gists soon (out for lunch). Thanks for the fast reply.

@shreyaskarnik
Contributor Author

This is the gist of the logs: https://gist.github.com/shreyu86/cc4ae0f6b7f4e329438a (I have redacted some of the contents).

Also, when I ping the registry I get an error, not a 404, and I can't see the page that shows the registry version and settings flavor.

when I do a ping using curl here is the response I get:

curl -X GET HOST:5000/v1/_ping
curl: (56) Recv failure: Connection reset by peer

@dmp42
Contributor

dmp42 commented May 30, 2014

So, /ping is not 404, right? Instead, the server appears not to be started / bound.

Let me try tomorrow with a s3 region and see if that works for me.

Keep me posted if anything new meanwhile.

Best.

@shreyaskarnik
Contributor Author

Thanks! No, the ping is not a 404; that was my confusion. For now I have commented out the S3 region, and the registry seems to run fine without s3_region.

Also, in similar situations I have used the following method to connect to S3:

import boto.s3.connection

def connect_to_s3(settings):
    # settings is a dot-dictionary object; settings.s3.host is the regional endpoint
    return boto.s3.connection.S3Connection(
        aws_access_key_id=settings.aws.auth.access_key_id,
        aws_secret_access_key=settings.aws.auth.secret_access_key,
        host=settings.s3.host
    )

In this method, assume settings is a dot-dictionary object.
Here host is s3-us-west-2.amazonaws.com for us-west-2. I use the connection object returned above for all S3 operations and have not faced any issues with this method.
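
For readers unfamiliar with the term, here is a minimal sketch of what a "dot dictionary" settings object could look like. The DotDict class and the placeholder credentials are hypothetical and only illustrate the attribute-style access used by connect_to_s3 above; they are not taken from the original poster's code.

```python
class DotDict(dict):
    """Hypothetical helper: a dict whose keys can also be read as attributes."""

    def __getattr__(self, name):
        try:
            value = self[name]
        except KeyError:
            raise AttributeError(name)
        # Wrap nested dicts so chained access such as settings.s3.host works.
        return DotDict(value) if isinstance(value, dict) else value


# Placeholder values, assumed for illustration only.
settings = DotDict({
    "aws": {"auth": {"access_key_id": "AKIA-EXAMPLE",
                     "secret_access_key": "example-secret"}},
    "s3": {"host": "s3-us-west-2.amazonaws.com"},
})

print(settings.s3.host)
```

With an object like this, connect_to_s3(settings) would receive the regional endpoint through settings.s3.host rather than through a region name.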

Also I am out of sync with the latest changes (couple of them major) so opened an issue instead of PR.

Will surely dig into this more.

@dmp42 dmp42 added this to the 0.8 milestone May 30, 2014
@dmp42 dmp42 added the bug label May 30, 2014
@shreyaskarnik
Contributor Author

Just curious @dmp42 were you able to repro the issue?

@dmp42
Contributor

dmp42 commented Jun 2, 2014

I haven't had time yet to get to it, sorry for that - will sure do later today and keep you posted though!

@shreyaskarnik
Contributor Author

Thanks!

skarnik-rmn added a commit to skarnik-rmn/docker-registry that referenced this issue Jun 2, 2014
@dmp42 dmp42 mentioned this issue Jun 2, 2014
@shreyaskarnik
Contributor Author

I've opened #405 as a solution to this issue. I have patched our private
registry and can confirm that all operations (push, pull, search)
are working normally after this change.

-Shreyas


@dmp42
Contributor

dmp42 commented Jun 2, 2014

@shreyu86

I just created a new bucket, in the us-west-2 region - I'm using us-west-2 as a region in the configuration and it works for me...

So, we need to dig deeper.

Can you provide some additional information?

  • the logs you copied (the "with region" scenario) seem to be cut too early - what happens exactly? it hangs there and does nothing more? it crashes? or is there more after that?
  • what version of the registry are you running? (I assume 0.7.0) - did you try and/or have the problem with 0.6.9?
  • are you running it from the docker container? or from a pip install?

Thanks a lot!

@shreyaskarnik
Contributor Author

@dmp42 here is the information you requested.

  • the logs you copied (the "with region" scenario) seem to be cut too early - what happens exactly? it hangs there and does nothing more? it crashes? or is there more after that?
    • It just hangs at those logs, nothing further happens.
  • what version of the registry are you running? (I assume 0.7.0) - did you try and/or have the problem with 0.6.9?
    • I am running 0.7.0; I did not try 0.6.9.
  • are you running it from the docker container? or from a pip install?
    • I am running it inside docker container pulled from index.docker.io

Thanks for looking into this.

@dmp42
Contributor

dmp42 commented Jun 2, 2014

Can you get this gist: https://gist.github.com/dmp42/8436a9be2c569bd75965

And try running it from where you are? (Obviously, replace bucketname = 'XXXX', awsid = 'XXX' and awssec = 'XXX' with appropriate values.)

Also, can you put debug = 2 inside your boto.cfg file (in your home)?

Thanks a lot!
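
For reference, boto reads its debug level from a debug option in the [Boto] section of its config file. The sketch below only renders such a fragment with Python's configparser; the function name render_boto_cfg is made up for illustration, and where the file should live (for example in the home directory, as suggested above) is left to the reader.

```python
import configparser
import io


def render_boto_cfg(debug_level=2):
    """Return the text of a boto config fragment that sets the debug level."""
    parser = configparser.ConfigParser()
    # boto reads "debug" from the [Boto] section; 2 is the most verbose level.
    parser["Boto"] = {"debug": str(debug_level)}
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


print(render_boto_cfg())
```

The printed text can be pasted into the boto config file by hand; nothing here writes to disk.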

@shreyaskarnik
Contributor Author

It worked. Here is the output:

Gonna connect without region
0.257856845856
Gonna connect with region
0.0846657752991
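
The gist itself is not reproduced in this thread. As a rough idea of what a timing script producing output of this shape might look like, here is a sketch; every function name is hypothetical, and it assumes boto 2.x is installed when the two connect helpers are actually called.

```python
import time


def time_call(label, connect):
    """Call connect() once, print the label and elapsed seconds, return the elapsed time."""
    print("Gonna connect " + label)
    start = time.time()
    connect()
    elapsed = time.time() - start
    print(elapsed)
    return elapsed


def connect_without_region(bucket, key, secret):
    # Assumption: boto 2.x; uses the default S3 endpoint.
    import boto
    return boto.connect_s3(key, secret).get_bucket(bucket)


def connect_with_region(bucket, key, secret, region="us-west-2"):
    # Assumption: boto 2.x; connects to the regional endpoint directly.
    import boto.s3
    conn = boto.s3.connect_to_region(
        region, aws_access_key_id=key, aws_secret_access_key=secret)
    return conn.get_bucket(bucket)


# Example usage (needs real credentials and a real bucket):
# time_call("without region", lambda: connect_without_region("XXXX", "XXX", "XXX"))
# time_call("with region", lambda: connect_with_region("XXXX", "XXX", "XXX"))
```

Note that a script like this runs without gevent, which may be why it succeeds where the registry hangs.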

@dmp42
Contributor

dmp42 commented Jun 2, 2014

This is getting really weird.

@shreyu86:

  • do you use the configuration variable storage_redirect?
  • can you copy the exact command lines you are using to launch the registry, both with region and without region
  • same thing for your configuration files, with and without region

Sorry for the long walk, but I see no other way to get to the bottom of this...

@shreyaskarnik
Contributor Author

  • do you use the configuration variable storage_redirect?
    • no it is set to false
  • can you copy the exact command lines you are using to launch the registry, both with region and without region
  • command lines are:
# Commands to invoke registry
 docker run -d -p 5000:5000 -v /etc/docker-registry/config.yml:/opt/docker-registry/config.yml -e SETTINGS_FLAVOR=prod -e DOCKER_REGISTRY_CONFIG=/opt/docker-registry/config/config.yml -e GUNICORN_WORKERS=10 registry:latest
  • same thing for your configuration files, with and without region:
    • for this I edit the s3_region variable in the configuration file: to set a region I set it to us-west-2, and to connect without a region I comment it out.

Here are my redacted settings just in case you need to take a look at those:

https://gist.github.com/shreyu86/52652677440596e669b4

@dmp42
Contributor

dmp42 commented Jun 2, 2014

# Commands to invoke registry
 docker run -d -p 5000:5000 -v /etc/docker-registry/config.yml:/opt/docker-registry/config.yml -e SETTINGS_FLAVOR=prod -e DOCKER_REGISTRY_CONFIG=/opt/docker-registry/config/config.yml

Unless I'm mistaken, this can't work: you are telling the registry to use /opt/docker-registry/config/config.yml while you mount /opt/docker-registry/config.yml

Is this a typo or did you move some stuff and were launching with this?

Thanks.
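
The mismatch described here can also be checked mechanically. The following is a small sketch (the function name is hypothetical) that parses a docker run command line and reports when DOCKER_REGISTRY_CONFIG points at a path that is not the target of any -v file mount. It only compares exact paths, so a config file provided through a mounted directory would be reported as a mismatch.

```python
import shlex


def config_mount_mismatch(command):
    """Return (configured_path, mounted_paths) on mismatch, otherwise None."""
    args = shlex.split(command)
    mounted = []
    configured = None
    for flag, value in zip(args, args[1:]):
        if flag == "-v":
            # Volume syntax is host_path:container_path[:options].
            parts = value.split(":")
            if len(parts) >= 2:
                mounted.append(parts[1])
        elif flag == "-e" and value.startswith("DOCKER_REGISTRY_CONFIG="):
            configured = value.split("=", 1)[1]
    if configured is not None and configured not in mounted:
        return configured, mounted
    return None


posted = ("docker run -d -p 5000:5000 "
          "-v /etc/docker-registry/config.yml:/opt/docker-registry/config.yml "
          "-e SETTINGS_FLAVOR=prod "
          "-e DOCKER_REGISTRY_CONFIG=/opt/docker-registry/config/config.yml")
print(config_mount_mismatch(posted))
```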

@shreyaskarnik
Contributor Author

Err, that was a typo from cleaning and redacting.

The correct command is:

docker run -d -p 5000:5000 -v /etc/docker-registry/config.yml:/opt/docker-registry/config/config.yml -e SETTINGS_FLAVOR=prod -e DOCKER_REGISTRY_CONFIG=/opt/docker-registry/config/config.yml -e GUNICORN_WORKERS=10 registry:latest

@dmp42
Contributor

dmp42 commented Jun 2, 2014

Ok... this doesn't make any sense...

If you successfully connected using the test script I provided, there is absolutely no reason why the registry wouldn't connect. This is almost exactly the same code...

One last shot: did you run the test script on the same machine that you use to run the registry?

Otherwise, I see no resolution to this :-(

As I can't reproduce on my own setup, there are only a few possibilities:

  • we find how to reproduce
  • you manage to debug deep inside boto to see what happens
  • you trust me enough to grant me a temporary access to your bucket

Sorry for not being able to be more helpful...

@shreyaskarnik
Contributor Author

One last shot: did you run the test script on the same machine that you use to run the registry?
Yes, both on my dev machine and on the registry machine inside EC2. Here is the output for the registry machine:

Gonna connect without region
0.0115849971771
Gonna connect with region
0.0101661682129

I guess inside EC2 there is only a minimal difference with and without a region, but I am not sure about its impact.

I will try my hand at debugging boto.

Regarding the bucket, I would have to jump through a lot of permission hoops, so for now I will try to debug this at my end and post findings here.

Thanks for the help. I will also try a fresh S3 bucket: whenever I update the registry I reuse the same bucket, and I have been running a private registry for a long time, so I will repeat the test with a fresh bucket and see whether I face the same issue.

I will surely post any findings here. Thanks @dmp42 I really appreciate your help.

@shreyaskarnik
Contributor Author

Actually, I think I am OK with not declaring the region explicitly in the registry config, because while debugging this issue I found that boto redirects the bucket to the underlying region. Some logs (from the registry logs, not necessarily in this order):

2014-06-02 17:00:27,119 DEBUG: Host: BUCKET.s3.amazonaws.com
2014-06-02 17:00:27,210 DEBUG: Redirecting: http://BUCKET-us-west-2.amazonaws.com/

I think right now this is one short term optimization I can make, but will visit this during off hours to get to the bottom.

@dmp42
Contributor

dmp42 commented Jun 2, 2014

@shreyu86 Ok - please keep me updated on any progress on this.

@ddeaguiar

I'm also running into this issue. I'm using registry:latest on docker.io. Here's my startup command:

docker run \
       --rm \
       -it \
       -e SETTINGS_FLAVOR=s3 \
       -e AWS_REGION=us-east-1 \
       -e AWS_BUCKET=my-bucket \
       -e AWS_ENCRYPT=true \
       -e AWS_SECURE=true \
       -e STORAGE_PATH=/registry \
       -e AWS_SECRET=REDACTED \
       -e AWS_KEY=SECRET \
       -e SEARCH_BACKEND=sqlalchemy \
       -P \
       registry

It seems that setting the region doesn't solve the broken_pipe error I'm getting on push.

@mattheworiordan

I tried it outside the container and had the same issue

@dmp42
Contributor

dmp42 commented Aug 27, 2014

@mattheworiordan read my comment please

@mattheworiordan

@dmp42 apologies, misread @stongo's comment. I too am on Ubuntu 14.04

@dmp42
Contributor

dmp42 commented Aug 27, 2014

Same python libraries, different system deps. Pointing at libevent-dev ?

Can anyone do tests with different ubuntu versions?

@stongo

stongo commented Aug 27, 2014

using Ubuntu 12.04

@stongo

stongo commented Aug 27, 2014

Actually just compared my error more closely and it seems to be a slightly different error I'm experiencing.

@shreyaskarnik
Contributor Author

@dmp42 any official update/workaround on this?

@dmp42
Contributor

dmp42 commented Sep 5, 2014

@shreyu86 for now, I would support specifying the region inside the boto.cfg file.

@shreyaskarnik
Contributor Author

Thanks!

Shreyas


@jvimr

jvimr commented Sep 17, 2014

I've created a set of small dockerfiles that do the workaround (boto.conf && remove aws region from sampe_config.yml ) - https://github.com/jvimr/docker.registry

This was referenced Oct 7, 2014
@adamlc

adamlc commented Oct 17, 2014

@jvimr you are a legend! works perfectly :)

@chuegle

chuegle commented Nov 1, 2014

This looks like a gevent bug dealing with unicode.

It's triggered by specifying the region in boto, which pulls the URL from the boto region file which has unicode strings. If you don't specify the region, it uses the default which is not unicode, so therefore it works.

It can be reproduced (in both docker/ubuntu and Mac OS X) by doing the following:

in a.py:
 import b
in b.py:
 import gevent.socket
 print gevent.socket.getaddrinfo(u"s3.amazonaws.com", 443)
python a.py # hangs

If you change the host to a non-unicode, it works.

Oddly, if you put the gevent.socket call in a.py instead of b.py, it works.

Also, if you have it in both a.py and b.py, it works.

If you put a unicode string in b.py and a normal string in a.py, it will hang.

This would be an ugly workaround:

        if self._config.s3_region is not None:
            return boto.s3.connection.S3Connection(
                host=str(boto.regioninfo.load_regions()['s3'][self._config.s3_region]),
                aws_access_key_id=self._config.s3_access_key,
                aws_secret_access_key=self._config.s3_secret_key,
                **kwargs)

or some way of specifying host=str(host) in the kwargs without a region.

Side note discovered while investigating:
The gevent.monkey.patch_all() ends up being called after most everything in requirements files are imported. It seems that python entry_point code is nice enough to import the requirements for you before importing the code to patch_all(). It should be less important in gevent 1.0+ that can handle threading, but something to be aware of.

ADDENDUM
A bit more research found a couple other fixes:

  • u'fix gevent'.encode('idna') - Prevent the deadlock by initializing outside of the getaddrinfo call.
  • install simplejson - Boto will use this if it exists, and json if not. simplejson returns its results as str() while json returns them as unicode()

I went into a bit more detail on how it works in boto in:
boto/boto#2179 (comment)
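
The first fix in the addendum can be sketched as a tiny warm-up helper. The idea, as described above, is to load the idna codec once at startup, so that a later getaddrinfo call with a unicode host does not have to initialize it. The function name is hypothetical, and the sketch is written so that it also runs on Python 3, where the str/unicode distinction from this thread no longer applies.

```python
import sys


def warm_up_idna_codec():
    """Load the 'idna' codec now, before any resolver call needs it."""
    # Encoding any text with the idna codec makes Python import encodings.idna.
    u"fix gevent".encode("idna")
    return "encodings.idna" in sys.modules


print(warm_up_idna_codec())
```

In the registry this would presumably be called early at startup, before the first S3 connection is made; where exactly to place it is an assumption, not something the thread specifies.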

@dmp42
Contributor

dmp42 commented Nov 3, 2014

@chuegle !!!!

Awesome research.

Is there an upstream (gevent) ticket for this? Otherwise, do you have an ETA for this to hit a boto release?

@chuegle

chuegle commented Nov 3, 2014

The gevent ticket (although there may be more than one):
gevent/gevent#349

As far as a boto eta, I haven't submitted a patch and haven't heard back on my comments there, so I can't really say. I'll see if I have time this week to implement one of the fixes for boto and do a pull request, but would still have no idea how long it is until the next rev is pushed or if this change would make it in.

@dmp42
Contributor

dmp42 commented Nov 3, 2014

Fixed thanks to @chuegle incredible work!

@dmp42 dmp42 closed this as completed Nov 3, 2014
@hartym

hartym commented Nov 18, 2014

Any idea when / version in which this will be made available ?

@dmp42
Contributor

dmp42 commented Nov 18, 2014

@hartym was released with 0.9 IIRC.
Are you still experiencing this issue?

@hartym

hartym commented Nov 20, 2014

@dmp42 I could not start a registry container using 0.9/latest tag, backed on S3. The container is working fine with local filesystem backend, but S3 connection never worked. I could connect to the same bucket with the same aws keys using raw boto though. At best, the container was hanging after AWS ...somecrypticstring... line.

For now, I have put the setup of my own registry aside; I will dive back into it early next year. I'm not sure my problem is related to this ticket, but all the pointers I found led me here.

@dmp42
Contributor

dmp42 commented Nov 20, 2014

@hartym if you delay your deployment to next year, fair enough - if you still have time to test, can you copy your configuration, launch command, and log?

Thanks a lot.

@hartym

hartym commented Nov 21, 2014

@dmp42 it seems the "latest" image was not the same as 0.9.0; after running brand new tests, I managed to run the registry (with S3 storage in a Europe region and the sqlalchemy search backend). Thanks for your help and sorry for bothering you.

@dmp42
Contributor

dmp42 commented Nov 21, 2014

@hartym happy you got it working!
