fix: strip the UTF8 BOM by pgerlach · Pull Request #85 · gavinr/github-csv-tools · GitHub
fix: strip the UTF8 BOM #85

Merged
gavinr merged 1 commit into gavinr:master from suricats:fix-UTF8-BOM
Mar 30, 2023

Conversation

@pgerlach

Contributor

The input file is read as UTF-8, and the csv-parse documentation states: "It is recommended to always activate this option when working with UTF-8 files." (https://csv.js.org/parse/options/bom/)

This fixes the case where the file starts with a BOM. Previously the first column was not detected, because the BOM character was included as the first character of the first column name.

If the file has no BOM, then the option does nothing.
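To illustrate why the first column goes missing, here is a minimal sketch in plain Node.js. It does not use csv-parse and is not the code from this PR; `headerColumns` and `stripBom` are hypothetical helpers, and the naive comma split only stands in for a real CSV parser.

```javascript
// Hypothetical illustration only; not the github-csv-tools implementation.
// U+FEFF is how the UTF-8 BOM appears once the file is decoded to a string.
const BOM = "\uFEFF";

// Naive header parsing: take the first line and split it on commas.
function headerColumns(text) {
  const firstLine = text.split(/\r?\n/)[0];
  return firstLine.split(",");
}

// Remove a leading BOM if there is one; otherwise return the text unchanged.
function stripBom(text) {
  return text.startsWith(BOM) ? text.slice(1) : text;
}

const withBom = BOM + "title,body\r\nUTF-8 BOM,handle UTF-8 files with BOM";

const rawCols = headerColumns(withBom);
const cleanCols = headerColumns(stripBom(withBom));

// The first raw column is "\uFEFFtitle", so a lookup for "title" fails.
console.log(rawCols.includes("title"));   // false
console.log(cleanCols.includes("title")); // true
```

With csv-parse, the `bom` option referenced above does this stripping inside the parser, so no manual helper like `stripBom` should be needed.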

We read the input file as UTF8, and in csv-parse documentation is
written "It is recommended to always activate this option when working
with UTF-8 files."
@gavinr

gavinr commented Mar 24, 2023

Owner

@pgerlach

Contributor Author

Sure! This is an export from Excel, saved with the "CSV UTF-8" format.

csv-file-with-utf8-bom.csv

hexdump shows that it begins with the UTF-8 BOM 0xefbbbf.

$ hexdump -C csv-file-with-utf8-bom.csv
00000000  ef bb bf 74 69 74 6c 65  2c 62 6f 64 79 0d 0a 55  |...title,body..U|
00000010  54 46 2d 38 20 42 4f 4d  2c 68 61 6e 64 6c 65 20  |TF-8 BOM,handle |
00000020  55 54 46 2d 38 20 66 69  6c 65 73 20 77 69 74 68  |UTF-8 files with|
00000030  20 42 4f 4d                                       | BOM|
00000034

githubCsvTools can't parse it, but it can parse the same file with the BOM removed.

csv-file-without-utf8-bom.csv
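For anyone who wants to check a file without hexdump, the following is a rough Node.js sketch of the same byte-level test. The byte values are copied from the start of the hexdump above; `hasUtf8Bom` is a hypothetical helper name, not part of githubCsvTools or csv-parse.

```javascript
// First 13 bytes of the attached file, as shown in the hexdump:
// ef bb bf (UTF-8 BOM) followed by "title,body".
const bytes = Buffer.from([
  0xef, 0xbb, 0xbf,
  0x74, 0x69, 0x74, 0x6c, 0x65, 0x2c, 0x62, 0x6f, 0x64, 0x79,
]);

// Hypothetical helper: true when the buffer starts with the UTF-8 BOM.
function hasUtf8Bom(buf) {
  return buf.length >= 3 && buf[0] === 0xef && buf[1] === 0xbb && buf[2] === 0xbf;
}

// Buffer#toString("utf8") keeps the BOM, decoded as the character U+FEFF.
const decoded = bytes.toString("utf8");

// Dropping the first three bytes gives the header without the BOM.
const withoutBom = hasUtf8Bom(bytes) ? bytes.subarray(3) : bytes;

console.log(hasUtf8Bom(bytes));                  // true
console.log(decoded.charCodeAt(0) === 0xfeff);   // true
console.log(withoutBom.toString("utf8"));        // title,body
```

To run it against a real file, the hard-coded `bytes` could be replaced with the result of `fs.readFileSync(path)` called without an encoding argument, which returns a Buffer.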

@gavinr gavinr merged commit 1ee65e9 into gavinr:master Mar 30, 2023
@gavinr

gavinr commented Mar 30, 2023

Owner

thanks!

github-actions Bot pushed a commit that referenced this pull request Mar 30, 2023
## [3.1.7](v3.1.6...v3.1.7) (2023-03-30)

### Bug Fixes

* strip the UTF8 BOM ([#85](#85)) ([1ee65e9](1ee65e9))
