From 308bc64d91faadf01442001d0cfbea4787e089e3 Mon Sep 17 00:00:00 2001 From: oleibman Date: Sun, 31 Jan 2021 08:13:14 -0800 Subject: [PATCH] Inherited Scrutinizer Recommendations - 3 of 3 (#1808) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit * Make DefinedNames Samples Consistent With Other Samples (#1707) All other Samples write to temporary directory. DefinedNames samples write to main directory, which (a) means they aren't stored with others, and (b) they aren't ignored by git so look like changed files. The tests are also simplified by requiring Header rather than Bootstrap, making use of Helper. * Resolve XSS Vulnerability in the HTML Writer (#1719) Resolve XSS Vulnerability in the HTML Writer * Drop Travis * Automatic GitHub releases from git tags * Improve Coverage in src/PhpSpreadsheet There are no changes to code. Additional tests are added, so that the following 6 items now have 100% test coverage: - Comment - DefinedName - DocumentGenerator - IOFactory - NamedFormula - NamedRange * Changes for Scrutinizer Two changes to fix minor problems reported by Scrutinizer. * Spelling: Tou -> You * Fix for 1735 (Incorrect activeSheetIndex after RemoveSheetByIndex) (#1743) This is a fix for issue #1735. It adds tests for this situation, and similar situations involving adding new sheets and accessing existing ones. Coverage for Spreadsheet.php increases from 69% to 75% as a result. * Update change log * Fix for 3 Issues Involving ReadXlsx and NamedRange (#1742) * Fix for 3 Issues Involving ReadXlsx and NamedRange Issues #1686 and #1723, which provide sample spreadsheets, are probably solved by this ticket. Issue #1730 is also probably solved, but I have no way to verify. There are two problems with how PhpSpreadsheet is handling things now. Although the first problem is much less severe, and isn't really a factor in the issues named above, it is helpful to get it out of the way first. If you define a named range in Excel, and then delete the sheet where the range exists, Excel saves the range as #REF!. If there is a cell which references the range, it will similarly have the value #REF! when you open the Excel file. Currently, PhpSpreadsheet discards the #REF! definition, so a cell which references the range will appear as #NAME? rather than #REF!. This PR changes the behavior so that PhpSpreadsheet retains the #REF! definition, and cells which reference it will appear as #REF!. The second problem is the more severe, and is, I believe, responsible for the 3 issues identified above. If you define a named range and the sheet on which the range is defined does not exist at the time, Excel will save the range as something like: '[1]Unknown Sheet'!$A$1 If a cell references such a range, Excel will again display #REF!. PhpSpreadsheet currently throws an Exception when it encounters such a definition while reading the file. This PR changes the behavior so that PhpSpreadsheet saves the definition as #REF!, and cells which reference it will behave similarly. For the record, I will note that Excel does not magically recalculate when a missing sheet is subsequently added, despite the fact that the reference might now become resolvable. PhpSpreadsheet behaves likewise. * Remove Dead Code in Test Identified it after push but before merge. * Update change log * Apply Column and Row Styles to Existing Cells (#1721) * Apply Column and Row Styles to Existing Cells This is a fix for issue #1712. When a style is applied to an entire row or column, it is currently only effective for cells which don't already contain a value. The code needs to iterate through existing cells in the row/column in order to apply the style to them. This could be considered a breaking change, however, I believe that the change makes things operate as users would expect, and that the existing implementation is incomplete. The change also removes protected element conditionalStyles from the Style class. That element is an unused remnant, and can no longer be set or retrieved - methods getConditionalStyles and setConditionalStyles actually act on an element in the Worksheet class. Finally, additional tests are added so that Style, and in fact the entire Style directory, now has 100% test coverage. * Scrutinizer Changes Scrutinizer flagged 6 statements. 5 can be easily corrected. One is absolutely wrong (it thinks iterating through cells in column can return null). Let's see if we can satisfy it. * Remove Exception For CellIterator on Empty Row/Column For my first attempt at this change, which corrects a bug by updating styles for non-empty cells when a style is set on a row or column, I wished to make things more efficient by using setIterateOnlyExistingCells, something which the existing documentation recommends. This caused an exception to be generated when the row or column is empty. So I removed that part of the change while I researched what was going on. I have completed that research. The existing code does throw an exception when the row/column is empty and iterateOnlyExistingCells is true. However, that does not seem like a reasonable action. This situation is analagous to iterating over an empty array, and that action is legal and does not throw. The same should apply here. There were no tests for this situation, and now there are. I have added additional tests, and coverage for all of RowCellIterator, ColumnCellIterator, and CellIterator are all now 100%. Some of my new tests were added in new members, because the existing tests all relied on mocking, which was not the best choice for the new tests. One of the existing tests for RowCellIteratorTest (testSeekOutOfRange) was wrong; it issued the expected exception, but for the wrong reason. I have added an additional test to ensure that it fails "correctly". The existing documentation says that the default value for IterateOnlyExistingCells is true. In fact, the default value is false. I have corrected the documentation. * More Scrutinizer I believe its analysis is incorrect, but this should silence it. * DocBlock Correction ColumnCellIterator DocBlock for current indicated it could return null or Cell, but it can really return only Cell. This had caused Scrutinizer to complain earlier. * PHP8 Environment Appears to be Fixed Cosmetic change to Doc member. I suspect there is a way to rerun all the tests without another push, but I have been unable to figure out how. * Update change log * TextData Coverage and Minor Bug Fixes (#1744) This had been intended to get 100% coverage for TextData functions, and it does that. However, some minor bugs requiring source changes arose during testing. - the Excel CHAR function restricts its argument to 1-255. PhpSpreadsheet CHARACTER had been allowing 0+. Also, there is no need to test if iconv exists, since it is part of Composer requirements. - The DOLLAR function had been returning NUM for invalid arguments. Excel returns VALUE. Also, negative amounts were not being handled correctly. - The FIXEDFORMAT function had been returning NUM for invalid arguments. Excel FIXED returns VALUE. * Replace anti-xss with html purifier (#1751) * Replace voku/anti-xss with ezyang/htmlpurifier. Despite anti-xss being a smaller footprint dependency, an a better license fit with our MIT license, there are issues with it's automatic it sanitisation of global variables causing side effects * Additional unit tests for xss in html writer cell comments * Fix bug #1626 where values of 0 were "rounded" up/down as if they were not 0 (#1627) * Fix bug where values of 0 were "rounded" up/down as if they were not 0 * Update change log * Fix for #1612 - SLK Long File Name (#1706) Issue has been marked stale, but ... Sylk read sets worksheet title to filename (minus .slk). If that is >31 characters, PhpSpreadsheet throws Exception. This change truncates sheet title, as Excel does, to 31 characters. * Update change log * worksheet: fix if cellValue does not exist (#1727) The condition is FALSE if the cell does not exist in the flipped table, but anyway, it is sent in to a method requiring 'string' type, causing it to fail. * fixes #1655 issue (#1656) Resolve problem with incorrectly defined hyperlinks * Add 'ps' suffix to printer settings resources IDs (#1690) * Add 'ps' suffix to printer settings resources IDs * Update change log * Fix pixelsToPoints conversion (for HTML col width) (#1733) * DocBlock Change in Styles/Conditional (#1697) Scrutinizer reported a minor error in a test involving a module which I was not changing. Styles/Conditional function setConditions can take a scalar or an array as a parameter, but DocBlock says it only expects array. I did not wish to add the extra module to my PR, but made a note to self to fix that after PR was installed. That has now happened, and it makes for a good case for me to see all the PHP8/Composer2/etc. changes that have happened recently. * Merge pull request #1698 * Merge pull request #4 from PHPOffice/master * Restore Omitted Read XML Test * Fix for bug #1592 (UPDATED) (#1623) * Fix for Xls when BIFF8 SST (FCh) has bad Shared string length * Update change log * Add nightly PHP 8.1 dev to github actions (#1763) * Fix compatibility with ext-gd on php 8 (#1762) * CSV - Guess Encoding, Handle Null-string Escape (#1717) * CSV - Guess Encoding, Handle Null-string Escape This is in response to issue #1647 (detect CSV character encoding). First, my tests with mb_detect_encoding indicate that it doesn't work well enough; regardless, users can always do that on their own if they deem it useful. Rolling my own is also troublesome, but I can at least: a. Check for BOM (UTF-8, UTF-16BE, UTF-16LE, UTF-32BE, UTF-32LE). b. Do some heuristic tests for each of the above encodings. c. Fallback to a user-specified encoding (default CP1252) if a and b don't yield result. I think this is probably useful enough to include, and relatively easy to expand if other potential encodings should be considered. Starting with PHP7.4, fgetcsv allows specification of null string as escape character in fgetcsv. This is a much better choice than the PHP (and PhpSpreadsheet) default of backslash in that it handles the file in the same manner as Excel does. There is one statement in Reader/CSV which would be adversely affected if the caller so specified (building a regular expression under the assumption that escape character is a single character). Fix that statement appropriately and add tests. * Update changelog * Update Units of Measure supported by the CONVERT() function (#1768) Now supports all current UoM in all categories, with both 1- and 2-character multiplier prefixes, and binary multiplier prefixes, including the new Temperature scales * Changelog for 1.16.0 release * Fix date tests withut specified year for current year 2021 (#1774) * Mrand of zero to any multiple should return 0 (#1773) * Inherited Scrutinizer Recommendations - 3 of 3 I tried to sync my fork with the main project, as I have done several times before. However, the GitHub interface to do this had changed, and it appears that I did not make the optimal selection when I had a choice. Consequently, all the merges that happened to base between the last time I synchronized and this time appear to be part of any PR that I push. "Files changed" remains correct for my new PRs, but there appear to be many more commits involved than is actually the case. I will, at some point, delete and re-create my fork, and pay much closer attention in future when I want to sync my fork with the main project. Because of this set-up, Scrutinizer reports flaws in code that I haven't actually changed in my PRs #1799 and #1800. It still passes them, but, as long as I'm aware of the problems, I may as well attempt to correct them. The following are not part of those PRs: - 5 problems spread over 4 different members - 12 problems in Calculation/Engineering - 15 problems in Reader/XML I shall attempt to resolve these via 3 separate PRs, of which this is the third. * The Usual Fixed some Scrutinizer problems, some new ones popped up. * More Scrutinizer I think it's wrong in a lot of these cases. Although I am working around all of them, I intend to file a bug report with them. Co-authored-by: Mark Baker Co-authored-by: Adrien Crivelli Co-authored-by: Ryan McAllen Co-authored-by: Flinsch <220455+Flinsch@users.noreply.github.com> Co-authored-by: Jan Sverre Riksfjord Co-authored-by: Max Kalyabin Co-authored-by: Sébastien Despont Co-authored-by: Guilliam Xavier Co-authored-by: Gianluca Giovinazzo Co-authored-by: Alexander M. Turek Co-authored-by: Martins Sipenko --- src/PhpSpreadsheet/Reader/Xml.php | 51 ++++++++++++++++++------------- 1 file changed, 30 insertions(+), 21 deletions(-) diff --git a/src/PhpSpreadsheet/Reader/Xml.php b/src/PhpSpreadsheet/Reader/Xml.php index 11aa1df3bd..9de882337f 100644 --- a/src/PhpSpreadsheet/Reader/Xml.php +++ b/src/PhpSpreadsheet/Reader/Xml.php @@ -191,7 +191,7 @@ public function listWorksheetNames($pFilename) $xml_ss = $xml->children($namespaces['ss']); foreach ($xml_ss->Worksheet as $worksheet) { - $worksheet_ss = $worksheet->attributes($namespaces['ss']); + $worksheet_ss = self::getAttributes($worksheet, $namespaces['ss']); $worksheetNames[] = (string) $worksheet_ss['Name']; } @@ -221,7 +221,7 @@ public function listWorksheetInfo($pFilename) $worksheetID = 1; $xml_ss = $xml->children($namespaces['ss']); foreach ($xml_ss->Worksheet as $worksheet) { - $worksheet_ss = $worksheet->attributes($namespaces['ss']); + $worksheet_ss = self::getAttributes($worksheet, $namespaces['ss']); $tmpInfo = []; $tmpInfo['worksheetName'] = ''; @@ -381,13 +381,13 @@ public function loadIntoExisting($pFilename, Spreadsheet $spreadsheet) } if (isset($xml->CustomDocumentProperties)) { foreach ($xml->CustomDocumentProperties[0] as $propertyName => $propertyValue) { - $propertyAttributes = $propertyValue->attributes($namespaces['dt']); + $propertyAttributes = self::getAttributes($propertyValue, $namespaces['dt']); $propertyName = preg_replace_callback('/_x([0-9a-f]{4})_/i', ['self', 'hex2str'], $propertyName); $propertyType = Properties::PROPERTY_TYPE_UNKNOWN; switch ((string) $propertyAttributes) { case 'string': $propertyType = Properties::PROPERTY_TYPE_STRING; - $propertyValue = trim($propertyValue); + $propertyValue = trim((string) $propertyValue); break; case 'boolean': @@ -407,7 +407,7 @@ public function loadIntoExisting($pFilename, Spreadsheet $spreadsheet) break; case 'dateTime.tz': $propertyType = Properties::PROPERTY_TYPE_DATE; - $propertyValue = strtotime(trim($propertyValue)); + $propertyValue = strtotime(trim((string) $propertyValue)); break; } @@ -420,8 +420,9 @@ public function loadIntoExisting($pFilename, Spreadsheet $spreadsheet) $worksheetID = 0; $xml_ss = $xml->children($namespaces['ss']); - foreach ($xml_ss->Worksheet as $worksheet) { - $worksheet_ss = $worksheet->attributes($namespaces['ss']); + foreach ($xml_ss->Worksheet as $worksheetx) { + $worksheet = ($worksheetx === null) ? new SimpleXMLElement('') : $worksheetx; + $worksheet_ss = self::getAttributes($worksheet, $namespaces['ss']); if ( (isset($this->loadSheetsOnly)) && (isset($worksheet_ss['Name'])) && @@ -444,7 +445,7 @@ public function loadIntoExisting($pFilename, Spreadsheet $spreadsheet) // locally scoped defined names if (isset($worksheet->Names[0])) { foreach ($worksheet->Names[0] as $definedName) { - $definedName_ss = $definedName->attributes($namespaces['ss']); + $definedName_ss = self::getAttributes($definedName, $namespaces['ss']); $name = (string) $definedName_ss['Name']; $definedValue = (string) $definedName_ss['RefersTo']; $convertedValue = AddressHelper::convertFormulaToA1($definedValue); @@ -458,7 +459,7 @@ public function loadIntoExisting($pFilename, Spreadsheet $spreadsheet) $columnID = 'A'; if (isset($worksheet->Table->Column)) { foreach ($worksheet->Table->Column as $columnData) { - $columnData_ss = $columnData->attributes($namespaces['ss']); + $columnData_ss = self::getAttributes($columnData, $namespaces['ss']); if (isset($columnData_ss['Index'])) { $columnID = Coordinate::stringFromColumnIndex((int) $columnData_ss['Index']); } @@ -475,14 +476,14 @@ public function loadIntoExisting($pFilename, Spreadsheet $spreadsheet) $additionalMergedCells = 0; foreach ($worksheet->Table->Row as $rowData) { $rowHasData = false; - $row_ss = $rowData->attributes($namespaces['ss']); + $row_ss = self::getAttributes($rowData, $namespaces['ss']); if (isset($row_ss['Index'])) { $rowID = (int) $row_ss['Index']; } $columnID = 'A'; foreach ($rowData->Cell as $cell) { - $cell_ss = $cell->attributes($namespaces['ss']); + $cell_ss = self::getAttributes($cell, $namespaces['ss']); if (isset($cell_ss['Index'])) { $columnID = Coordinate::stringFromColumnIndex((int) $cell_ss['Index']); } @@ -524,7 +525,7 @@ public function loadIntoExisting($pFilename, Spreadsheet $spreadsheet) $cellData = $cell->Data; $cellValue = (string) $cellData; $type = DataType::TYPE_NULL; - $cellData_ss = $cellData->attributes($namespaces['ss']); + $cellData_ss = self::getAttributes($cellData, $namespaces['ss']); if (isset($cellData_ss['Type'])) { $cellDataType = $cellData_ss['Type']; switch ($cellDataType) { @@ -587,7 +588,7 @@ public function loadIntoExisting($pFilename, Spreadsheet $spreadsheet) $author = (string) $commentAttributes->Author; } $node = $cell->Comment->Data->asXML(); - $annotation = strip_tags($node); + $annotation = strip_tags((string) $node); $spreadsheet->getActiveSheet()->getComment($columnID . $rowID)->setAuthor($author)->setText($this->parseRichText($annotation)); } @@ -610,7 +611,7 @@ public function loadIntoExisting($pFilename, Spreadsheet $spreadsheet) if ($rowHasData) { if (isset($row_ss['Height'])) { $rowHeight = $row_ss['Height']; - $spreadsheet->getActiveSheet()->getRowDimension($rowID)->setRowHeight($rowHeight); + $spreadsheet->getActiveSheet()->getRowDimension($rowID)->setRowHeight((float) $rowHeight); } } @@ -631,7 +632,7 @@ public function loadIntoExisting($pFilename, Spreadsheet $spreadsheet) $activeWorksheet = $spreadsheet->setActiveSheetIndex(0); if (isset($xml->Names[0])) { foreach ($xml->Names[0] as $definedName) { - $definedName_ss = $definedName->attributes($namespaces['ss']); + $definedName_ss = self::getAttributes($definedName, $namespaces['ss']); $name = (string) $definedName_ss['Name']; $definedValue = (string) $definedName_ss['RefersTo']; $convertedValue = AddressHelper::convertFormulaToA1($definedValue); @@ -662,10 +663,11 @@ private function parseStyles(SimpleXMLElement $xml, array $namespaces): void } foreach ($xml->Styles[0] as $style) { - $style_ss = $style->attributes($namespaces['ss']); + $style_ss = self::getAttributes($style, $namespaces['ss']); $styleID = (string) $style_ss['ID']; $this->styles[$styleID] = (isset($this->styles['Default'])) ? $this->styles['Default'] : []; - foreach ($style as $styleType => $styleData) { + foreach ($style as $styleType => $styleDatax) { + $styleData = $styleDatax ?? new SimpleXMLElement(''); $styleAttributes = $styleData->attributes($namespaces['ss']); switch ($styleType) { case 'Alignment': @@ -750,15 +752,16 @@ private function parseStyleBorders($styleID, SimpleXMLElement $styleData, array $diagonalDirection = ''; $borderPosition = ''; foreach ($styleData->Border as $borderStyle) { - $borderAttributes = $borderStyle->attributes($namespaces['ss']); + $borderAttributes = self::getAttributes($borderStyle, $namespaces['ss']); $thisBorder = []; $style = (string) $borderAttributes->Weight; $style .= strtolower((string) $borderAttributes->LineStyle); $thisBorder['borderStyle'] = self::$mappings['borderStyle'][$style] ?? Border::BORDER_NONE; - foreach ($borderAttributes as $borderStyleKey => $borderStyleValue) { + foreach ($borderAttributes as $borderStyleKey => $borderStyleValuex) { + $borderStyleValue = (string) $borderStyleValuex; switch ($borderStyleKey) { case 'Position': - $borderStyleValue = strtolower((string) $borderStyleValue); + $borderStyleValue = strtolower($borderStyleValue); if (in_array($borderStyleValue, self::$borderPositions)) { $borderPosition = $borderStyleValue; } elseif ($borderStyleValue == 'diagonalleft') { @@ -784,6 +787,11 @@ private function parseStyleBorders($styleID, SimpleXMLElement $styleData, array } } + private static function getAttributes(?SimpleXMLElement $simple, string $node): SimpleXMLElement + { + return ($simple === null) ? new SimpleXMLElement('') : ($simple->attributes($node) ?? new SimpleXMLElement('')); + } + private static $underlineStyles = [ Font::UNDERLINE_NONE, Font::UNDERLINE_DOUBLE, @@ -854,7 +862,8 @@ private function parseStyleFont(string $styleID, SimpleXMLElement $styleAttribut */ private function parseStyleInterior($styleID, SimpleXMLElement $styleAttributes): void { - foreach ($styleAttributes as $styleAttributeKey => $styleAttributeValue) { + foreach ($styleAttributes as $styleAttributeKey => $styleAttributeValuex) { + $styleAttributeValue = (string) $styleAttributeValuex; switch ($styleAttributeKey) { case 'Color': $this->styles[$styleID]['fill']['endColor']['rgb'] = substr($styleAttributeValue, 1);