-
Notifications
You must be signed in to change notification settings - Fork 3.5k
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
CSV - Guess Encoding, Handle Null-string Escape (#1717)
* CSV - Guess Encoding, Handle Null-string Escape This is in response to issue #1647 (detect CSV character encoding). First, my tests with mb_detect_encoding indicate that it doesn't work well enough; regardless, users can always do that on their own if they deem it useful. Rolling my own is also troublesome, but I can at least: a. Check for BOM (UTF-8, UTF-16BE, UTF-16LE, UTF-32BE, UTF-32LE). b. Do some heuristic tests for each of the above encodings. c. Fallback to a user-specified encoding (default CP1252) if a and b don't yield result. I think this is probably useful enough to include, and relatively easy to expand if other potential encodings should be considered. Starting with PHP7.4, fgetcsv allows specification of null string as escape character in fgetcsv. This is a much better choice than the PHP (and PhpSpreadsheet) default of backslash in that it handles the file in the same manner as Excel does. There is one statement in Reader/CSV which would be adversely affected if the caller so specified (building a regular expression under the assumption that escape character is a single character). Fix that statement appropriately and add tests.
- Loading branch information
Showing
15 changed files
with
170 additions
and
8 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
a\"hello;hello;hello;\",b\"hello;hello;hello;\",c\"\hello;hello;hello;\" | ||
a\"hello;hello;hello;\",b\"hello;hello;hello;\",c\"\hello;hello;hello;\",d | ||
a\"hello;hello;hello;\",b\"hello;hello;hello;\",c\"\hello;hello;hello;\" | ||
a\"hello;hello;hello;\",b\"hello;hello;hello;\",c\"\hello;hello;hello;\" |
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
première,second,troisième | ||
Quatrième,cinquième,sixième |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
première,second,troisième | ||
Quatrième,cinquième,sixième |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
premi�re,second,troisi�me | ||
Quatri�me,cinqui�me,sixi�me |