Technically invalid test images #6878

Closed
Yay295 opened this issue Jan 10, 2023 · 8 comments

@Yay295
Contributor

Yay295 commented Jan 10, 2023

"illu10_no_preview.eps", "illu10_preview.eps", "illuCS6_no_preview.eps", and "illuCS6_preview.eps" all contain one line that is longer than 255 characters, which is the maximum limit according to the specification.

Page 13:

EPS files must not have lines of ASCII text that exceed 255 characters, excluding line-termination characters.

Page 25:

The hexadecimal lines must never exceed 255 bytes in length. In cases where the preview is very wide, the lines must be broken. The line breaks can be made at any even number of hex digits, because the dimensions of the finished preview are established by the width, height, and depth values.
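
One way to locate such lines is to split the file on line terminators and measure each line in bytes. This is only a minimal sketch and not part of Pillow: `find_long_lines` is a hypothetical helper, and it assumes `\r\n`, `\r`, and `\n` are all treated as line terminators.

```python
import re

MAX_LINE_LENGTH = 255  # limit quoted from the EPSF specification, excluding terminators


def find_long_lines(data: bytes, limit: int = MAX_LINE_LENGTH):
    """Return (line_number, length) for every line longer than `limit` bytes."""
    # "\r\n" is listed first so a CRLF pair is not counted as two terminators.
    lines = re.split(rb"\r\n|\r|\n", data)
    return [
        (number, len(line))
        for number, line in enumerate(lines, start=1)
        if len(line) > limit
    ]


# Made-up sample data: the second line is 300 bytes long.
sample = b"%!PS-Adobe-3.0 EPSF-3.0\n" + b"A" * 300 + b"\r\n%%EOF\n"
print(find_long_lines(sample))  # [(2, 300)]
```

The sketch does not distinguish ASCII text from binary data, so it will also report long binary lines.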

The four image files appear to be nearly the same, and have the same long line:

<xmpGImg:image>/9j/4AAQSkZJRgABAgEASABIAAD/7QAsUGhvdG9zaG9wIDMuMAA4QklNA+0AAAAAABAASAAAAAEA&#xA;AQBIAAAAAQAB/+4ADkFkb2JlAGTAAAAAAf/bAIQABgQEBAUEBgUFBgkGBQYJCwgGBggLDAoKCwoK&#xA;DBAMDAwMDAwQDA4PEA8ODBMTFBQTExwbGxscHx8fHx8fHx8fHwEHBwcNDA0YEBAYGhURFRofHx8f&#xA;Hx8fHx8fHx8fHx8fHx8fHx8fHx8fHx8fHx8fHx8fHx8fHx8fHx8fHx8fHx8f/8AAEQgBAAAsAwER&#xA;AAIRAQMRAf/EAaIAAAAHAQEBAQEAAAAAAAAAAAQFAwIGAQAHCAkKCwEAAgIDAQEBAQEAAAAAAAAA&#xA;AQACAwQFBgcICQoLEAACAQMDAgQCBgcDBAIGAnMBAgMRBAAFIRIxQVEGE2EicYEUMpGhBxWxQiPB&#xA;UtHhMxZi8CRygvElQzRTkqKyY3PCNUQnk6OzNhdUZHTD0uIIJoMJChgZhJRFRqS0VtNVKBry4/PE&#xA;1OT0ZXWFlaW1xdXl9WZ2hpamtsbW5vY3R1dnd4eXp7fH1+f3OEhYaHiImKi4yNjo+Ck5SVlpeYmZ&#xA;qbnJ2en5KjpKWmp6ipqqusra6voRAAICAQIDBQUEBQYECAMDbQEAAhEDBCESMUEFURNhIgZxgZEy&#xA;obHwFMHR4SNCFVJicvEzJDRDghaSUyWiY7LCB3PSNeJEgxdUkwgJChgZJjZFGidkdFU38qOzwygp&#xA;0+PzhJSktMTU5PRldYWVpbXF1eX1RlZmdoaWprbG1ub2R1dnd4eXp7fH1+f3OEhYaHiImKi4yNjo&#xA;+DlJWWl5iZmpucnZ6fkqOkpaanqKmqq6ytrq+v/aAAwDAQACEQMRAD8A9FeU/KUXl6K5VbmS6e6k&#xA;Ls0jSMAC7yUHqPK32pWr8W/U/FyZgAkm0/wodirsVQmracmo6fLZtI0Qk4ssiMyMGRg6kMjI4+JR&#xA;9lgfAg74qEk/wSv+CP8ACv12T0vS9D61WXnwr4erX/Y8uHbjx+HBSb3tk2FDsVdirsVdirsVdirs&#xA;VdirsVdirsVdirsVdirsVdirsVdirsVdirsVdirsVdirsVdirsVdirsVdirsVdirsVdirsVdirsV&#xA;dirsVdirsVdirsVdirsVdirsVdirsVdirsVdirsVdirsVdirsVdirsVdirsVdirsVdirsVdirsVd&#xA;irsVdirsVdirsVdirsVdirsVdirsVdirsVdirsVdirsVdirsVdirsVdirsVdirsVdirsVdirsVdi&#xA;rsVdirsVdirsVdirsVdirsVdirsVdirsVdirsVdirsVdirsVdirsVdirsVdirsVdirsVdirsVdir&#xA;sVdirsVdirsVdirsVdirsVdirsVdirsVdirsVdirsVdirsVdirsVdirsVdirsVdirsVfI/8Azkz5&#xA;q80aZ+Z8lrpusXtlbfUrdvQt7mWJOR5VPFGUVOBXlH+PvPf/AFMeqf8ASbcf814q7/H3nv8A6mPV&#xA;P+k24/5rxV3+PvPf/Ux6p/0m3H/NeKpn5X89edpfM2kRS+YdSkjkvbdXRrycqymVQQQX3BxV9+YV&#xA;fGX/ADlV/wCTXk/5gbb/AI2wK8fxV2KuxVNfKf8AylWjf8x1t/yeXFX6MYVfGX/OVX/k15P+YG2/&#xA;42wK8fxV2KuxVNfKf/KVaN/zHW3/ACeXFX6MYVfGX/OVX/k15P8AmBtv+NsCvH8VdirsVTXyn/yl&#xA;Wjf8x1t/yeXFX6MYVfLf5l/lr5l/MvzXdeZbeew0G0hjjszHq8tzbMzxPKhK87YVr6RPt06ggBJF&#xA;Mcsf+cWPOuoGQWGv+X7sxU9X0LueXjWtOXC3NK8Ti
hF/9ChfmV/1ctG/5H3X/ZNirv8AoUL8yv8A&#xA;q5aN/wAj7r/smxVdB/zi/wDmHoF3aazPeaXcQ2NzbzPBbyXbyuBMmyKLapPsMUh9J/40m/wP/ij9&#xA;GS19L1/0fSX1OFetfSrTj8XLjxpvWm+NrW9P/9k=</xmpGImg:image>

It looks like &#xA;, the HTML/XML character reference for a line feed, was written into these files instead of an actual line feed. Replacing &#xA; with \n allowed the files to load in my branch (I'm testing some changes to the EPS plugin).
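
The replacement described above can be sketched as a plain byte substitution. `decode_line_feed_entities` is a hypothetical name, and the sketch assumes the literal text `&#xA;` never needs to be preserved anywhere in the file.

```python
def decode_line_feed_entities(data: bytes) -> bytes:
    """Replace the literal five-byte text '&#xA;' with a real line feed."""
    return data.replace(b"&#xA;", b"\n")


# Shortened, made-up stand-in for the long <xmpGImg:image> line.
sample = b"<xmpGImg:image>/9j/4AAQ&#xA;AQBI&#xA;</xmpGImg:image>"
fixed = decode_line_feed_entities(sample)
print(fixed.split(b"\n"))
```

Working on bytes rather than decoded text avoids having to guess the file's encoding.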

The two "*_preview.eps" files also have a line that is longer than 255 characters at the end, but it looks like binary data so I think that's valid.

Currently the code only checks the line length for lines in the header, and lines after the header that start with a %. Since these test files came from the wild, we probably want to allow files like this, but I thought I'd at least document it.

@Yay295
Contributor Author

Yay295 commented Jan 10, 2023

"timeout-d675703545fee17acab56e5fec644c19979175de.eps" is also invalid. It has a valid binary header, but it doesn't contain either of the required header comments: %!PS-Adobe and %%BoundingBox. This is fine, since the test it's used in isn't supposed to pass, but I think it should actually be raising an OSError instead of an Image.UnidentifiedImageError.

@radarhere
Member

It is raising a SyntaxError because the first line is longer than 255 characters. That error is caught, leading to the UnidentifiedImageError.

https://www.loc.gov/preservation/digital/formats/fdd/fdd000246.shtml

EPS files use lines of 255 or fewer ASCII characters.
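
The flow described in this comment (a format plugin raises SyntaxError, the caller catches it and ends with an "unidentified" error) can be modelled in a few lines. This is a simplified toy model, not Pillow's actual code: `UnidentifiedError`, `open_eps`, and `open_image` are all hypothetical names.

```python
class UnidentifiedError(OSError):
    """Stand-in for the error raised when no opener accepts the data."""


def open_eps(data: bytes) -> str:
    """Toy EPS opener: rejects input whose first line exceeds 255 bytes."""
    first_line = data.split(b"\n", 1)[0]
    if len(first_line) > 255:
        raise SyntaxError("not an EPS file")
    return "EPS"


def open_image(data: bytes, openers=(open_eps,)) -> str:
    """Try each opener in turn; SyntaxError means 'not my format'."""
    for opener in openers:
        try:
            return opener(data)
        except SyntaxError:
            continue  # the specific reason is discarded here
    raise UnidentifiedError("cannot identify image file")


try:
    # Made-up input: a binary-looking first "line" of 304 bytes.
    open_image(b"\xc5\xd0\xd3\xc6" + b"\x00" * 300)
except UnidentifiedError as exc:
    print(type(exc).__name__, exc)
```

In this model the caller only sees the generic error, because the opener's more specific message is dropped when the SyntaxError is caught.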

@Yay295
Contributor Author

Yay295 commented Jan 10, 2023

> It is raising a SyntaxError because the first line is longer than 255 characters.

I'm not sure it should be doing that, though. The file has a valid binary header with an offset of 32,820. That offset puts the start at column 594 of line 41 (using ANSI encoding), which appears to be binary data. So it could either raise an error because there aren't any header comments at that position, or it could assume there is no header and raise an error because the required comments are missing.

Perhaps it's different with a different encoding?

@radarhere
Member

Are you saying that an UnidentifiedImageError shouldn't be raised because we actually do know what the image format is? You may want to have a look at #1687

@Yay295
Contributor Author

Yay295 commented Jan 10, 2023

I'm saying the error should be due to the missing header comments, not the line length. The line appears to be binary data rather than ASCII, so the length limit doesn't apply to it.
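
To illustrate the distinction, a length check could be applied only to lines that look like ASCII text. This is a rough sketch under a stated assumption: "ASCII text" is taken to mean printable ASCII bytes plus tab, which is a heuristic and not a definition from the specification. `is_ascii_text` and `check_line` are hypothetical helpers.

```python
def is_ascii_text(line: bytes) -> bool:
    """Heuristic: every byte is printable ASCII (0x20-0x7E) or a tab."""
    return all(byte == 0x09 or 0x20 <= byte <= 0x7E for byte in line)


def check_line(line: bytes, limit: int = 255) -> None:
    """Raise SyntaxError only for over-long lines that look like ASCII text."""
    if is_ascii_text(line) and len(line) > limit:
        raise SyntaxError("ASCII line exceeds the length limit")


check_line(b"\x89" * 300)  # binary data: no error, the limit is not applied

try:
    check_line(b"A" * 300)  # ASCII text: too long
except SyntaxError as exc:
    print(exc)
```

Under this heuristic the long base64 line in the Illustrator files would still be rejected, since base64 is plain ASCII.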

@radarhere
Member

radarhere commented Jan 11, 2023

Looking at the actual specification, I see

https://web.archive.org/web/20170818010030/http://wwwimages.adobe.com/content/dam/Adobe/en/devnet/postscript/pdfs/5002.EPSF_Spec.pdf

EPS files must not have lines of ASCII text that exceed 255 characters,
excluding line-termination characters.

binary data is allowed

So perhaps that leads to your conclusion that binary data can be longer than 255 characters.

In trying to find out what should come after the offset though, I'm having a hard time finding more information about this.

elif i32(s, 0) == 0xC6D3D0C5:
    # FIX for: Some EPS file not handled correctly / issue #302
    # EPS can contain binary data
    # or start directly with latin coding
    # more info see:
    # https://web.archive.org/web/20160528181353/http://partners.adobe.com/public/developer/en/ps/5002.EPSF_Spec.pdf
    offset = i32(s, 4)
    length = i32(s, 8)

Do you happen to know where any documentation about that is?
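
For reference, the three fields read by the snippet above can be parsed on their own with `struct`. This is a minimal sketch, assuming the values are little-endian unsigned 32-bit integers, which is an assumption inferred from the magic value and not confirmed in this thread. `read_binary_header` is a hypothetical helper, and the length of 1000 in the sample is made up.

```python
import struct

EPS_BINARY_MAGIC = 0xC6D3D0C5  # value compared against in the snippet above


def read_binary_header(data: bytes):
    """Return (offset, length) of the PostScript section, or None.

    None means the data is too short or does not start with the magic value.
    """
    if len(data) < 12:
        return None
    # Assumption: three little-endian unsigned 32-bit fields at bytes 0-11.
    magic, offset, length = struct.unpack_from("<III", data, 0)
    if magic != EPS_BINARY_MAGIC:
        return None
    return offset, length


# Sample header using the offset mentioned in this thread (32,820).
header = struct.pack("<III", EPS_BINARY_MAGIC, 32820, 1000)
print(read_binary_header(header))  # (32820, 1000)
```

A reader would then seek to `offset` and look for the header comments there, instead of treating the bytes at the start of the file as text lines.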

@Yay295
Contributor Author

Yay295 commented Jan 11, 2023

No, I didn't find anything other than that PDF.

@radarhere
Member

The comments regarding line length and required header comments in timeout-d675703545fee17acab56e5fec644c19979175de.eps have been resolved by #6879. The file now raises 'SyntaxError: EPS header missing "%!PS-Adobe" comment'.
