-
Notifications
You must be signed in to change notification settings - Fork 348
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Column names cannot contain spaces when saving back to Redshift #84
Comments
This patch modifies `JDBCWrapper.schemaString` to wrap column names in quotes, which is necessary in order to allow us to create tables with columns whose names are reserved words or which contain spaces. This fixes #80. Note that, by itself, this patch does not enable full support for creating Redshift tables with column names that contain spaces; we are currently constrained by Avro's schema validation rules (see #84). Author: Josh Rosen <joshrosen@databricks.com> Closes #85 from JoshRosen/column-name-escaping.
This patch adds new options to allow CSV to be used as the intermediate data format when writing data to Redshift. This can offer large performance benefits because Redshift's Avro reader can be very slow. This patch is based on #165 by emlyn and incorporates changes from me in order to add documentation, make the new option case-insensitive, improve some error messages, and add tests. Using CSV for writes also allows us to write to tables whose column names are unsupported by Avro, so #84 is partially addressed by this patch. As a hedge, I've marked this feature as "Experimental" and I'll remove that label after it's been tested in the wild a bit more. Fixes #73. Author: Josh Rosen <joshrosen@databricks.com> Author: Josh Rosen <rosenville@gmail.com> Author: Emlyn Corrin <Emlyn.Corrin@microsoft.com> Author: Emlyn Corrin <emlyn@swiftkey.com> Closes #288 from JoshRosen/use-csv-for-writes.
Hey, colons and dashes too, I have fields like this
because neither Spec https://avro.apache.org/docs/current/spec.html#names doesn't say anything about this. Wdyt ? |
The Avro spec actually does say something about this (emphasis mine to make things clearer):
My PR description also notes this:
Starting in version spark-redshift/src/main/scala/com/databricks/spark/redshift/RedshiftWriter.scala Line 370 in 6cc49da
I'll mark this issue as "documentation" to remind me to add a section to the README documenting this workaround. |
Is this fixed yet? (white space in column names) |
Because
spark-redshift
uses Avro in its write path, it inherits the limitations of Avro's schema validation: we cannot create tables with columns that contain non-letter-or-_
characters, such as spaces or quotes.In order to work around these limitations, I think that we'd have to use a different set of column names when writing to Avro, then use a JSONPaths file to map those columns back to the original column names.
The text was updated successfully, but these errors were encountered: