Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More PY3 support for FormatPY, the xfel legacy image pickle format #205

Merged
merged 3 commits into from
Aug 24, 2020

Conversation

phyy-nx
Copy link
Contributor

@phyy-nx phyy-nx commented Aug 3, 2020

Current dxtbx testing for FormatPY uses pickles that were created using py2. While reprocessing old LCLS data I regenerated some image pickles on a py3 build and found I couldn't read them because of encoding problems. This PR fixes that problem using a new function in iotbx/npy.py, image_dict_to_unicode.

Additionally, this makes the understand method of FormatPY more robust. Adds pickle protocol 4 magic bytes check but also tries to de-pickle the file, as the pickle magic bytes aren't guaranteed to fully specify whether a file is a pickle.

Note, the only place image pickles are generated in xfel land is averaging XTC streams with Rayonix data. Everyting else, including xtc_process using Rayonix data, uses cbfs.

Current dxtbx testing for FormatPY uses pickles that were created using py2.  While reprocessing old LCLS data I regenerated some image pickles on a py3 build and found I couldn't read them because of encoding problems.  This PR fixes that problem using a new function in iotbx/npy.py, image_dict_to_unicode.

Additionally, this makes the understand method of FormatPY more robust.  Adds pickle protocol 4 magic bytes check but also tries to de-pickle the file, as the pickle magic bytes aren't guaranteed to fully specify whether a file is a pickle.

Note, the only place image pickles are generated in xfel land is averaging XTC streams with Rayonix data.  Everyting else, including xtc_process using Rayonix data, uses cbfs.
@phyy-nx
Copy link
Contributor Author

phyy-nx commented Aug 3, 2020

This PR is not urgent :) Anyone want to self-assign as a reviewer? Thanks!

format/FormatPY.py Outdated Show resolved Hide resolved
Looks like the 4th byte in protocol 4 isn't guaranteed to be \xea. On other systems it varies. Looks like the 3rd byte is the same on the systems I tested though.
@phyy-nx
Copy link
Contributor Author

phyy-nx commented Aug 3, 2020

Thanks @Anthchirp. I wrote a snippet to look at pickling results on the 6 protocols and then tried it on a couple systems and got slightly different answers. The new version should cover all the bases.

import pickle

with open('dials_regression/image_examples/LCLS_CXI/shot-s00-2011-12-02T21_07Z29.723_00569.pickle', 'rb') as f:
  d = pickle.load(f, encoding='bytes')

def showit(protocol):
  with open('demo_pickle_protocol.pickle', 'wb') as f:
    pickle.dump(d, f, protocol=protocol)

  with open('demo_pickle_protocol.pickle', 'rb') as f:
    print ("protocol", protocol, f.read()[0:4])

for i in range(6):
  showit(i)

Output

02T21_07Z29.723_00569.pickle'>
protocol 0 b'(dp0'
protocol 1 b'}q\x00('
protocol 2 b'\x80\x02}q'
protocol 3 b'\x80\x03}q'
protocol 4 b'\x80\x04\x95\xce'
protocol 5 b'\x80\x05\x95\xce'

@ndevenish
Copy link
Collaborator

This fixes python3 reading all pickle images, or pickle images written from python3? Just trying to work out an accurate phrasing for a newsfragments/205.bugfix.

Otherwise don't see why this can't be squashed in.

@phyy-nx
Copy link
Contributor Author

phyy-nx commented Aug 20, 2020

It fixes pickle images written in py3.

@ndevenish ndevenish merged commit bc7c35f into master Aug 24, 2020
@ndevenish ndevenish deleted the py3_imagepickle branch August 24, 2020 17:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants