
conversion:charset

timrdf edited this page Jul 6, 2012 · 8 revisions
csv2rdf4lod-automation is licensed under the [Apache License, Version 2.0](https://github.com/timrdf/csv2rdf4lod-automation/wiki/License)

Structural conversion:Enhancements:


csv2rdf4lod assumes that the input CSV is encoded in UTF-8. If it is not, set conversion:charset to the input's actual character encoding. The values that you are most likely to need are:

  • US-ASCII: Seven-bit ASCII, a.k.a. ISO646-US, a.k.a. the Basic Latin block of the Unicode character set
  • ISO-8859-1: ISO Latin Alphabet No. 1, a.k.a. ISO-LATIN-1
  • UTF-8: Eight-bit UCS Transformation Format
  • UTF-16BE: Sixteen-bit UCS Transformation Format, big-endian byte order
  • UTF-16LE: Sixteen-bit UCS Transformation Format, little-endian byte order
  • UTF-16: Sixteen-bit UCS Transformation Format, byte order identified by an optional byte-order mark
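
Before choosing a value, it can help to check whether the input really is not UTF-8. The following is a minimal Python sketch and is not part of csv2rdf4lod; the names `decodes_as` and `clean_charsets` are hypothetical helpers introduced here for illustration. It assumes that Python's codec registry accepts the same charset spellings as the list above, and it only tests whether the bytes decode without error, which is a weaker claim than "this is the correct encoding".

```python
# Charset names copied from the list above.
CHARSETS = ["US-ASCII", "ISO-8859-1", "UTF-8", "UTF-16BE", "UTF-16LE", "UTF-16"]


def decodes_as(data: bytes, charset: str) -> bool:
    """Return True if `data` decodes under `charset` without raising an error."""
    try:
        data.decode(charset)
        return True
    except UnicodeDecodeError:
        return False


def clean_charsets(data: bytes) -> list:
    """Return the listed charsets under which `data` decodes without error.

    Caveat: ISO-8859-1 maps every possible byte to a character, so it always
    appears in the result. A clean decode does not prove the charset is right.
    """
    return [c for c in CHARSETS if decodes_as(data, c)]
```

For example, `clean_charsets(open("input.csv", "rb").read())` (with `input.csv` standing in for your own file) lists the candidates. If "UTF-8" is missing from the result, the default assumption does not hold and conversion:charset is needed; which of the remaining candidates is correct still has to be confirmed by inspecting the decoded text.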

<http://logd.tw.rpi.edu/source/lebot/dataset/replacement-characters-from-csv-api/version/2012-Jul-05/conversion/enhancement/1>
   a conversion:LayerDataset, void:Dataset;

   conversion:base_uri           "http://logd.tw.rpi.edu"^^xsd:anyURI;
   conversion:source_identifier  "lebot";
   conversion:dataset_identifier "replacement-characters-from-csv-api";
   conversion:version_identifier "2012-Jul-05";
   conversion:enhancement_identifier "1";

   conversion:conversion_process [
      a conversion:EnhancementConversionProcess;
      conversion:enhancement_identifier "1";

      dcterms:creator <http://purl.org/twc/id/machine/lebot/MacBookPro6_2#lebot>;
      dcterms:created "2012-07-06T09:31:27-04:00"^^xsd:dateTime;

      conversion:charset "ISO-8859-1";  # <- Add this to specify character encoding of the input CSV (default is UTF-8)

      conversion:delimits_cell ",";

      conversion:enhance [
         ov:csvCol          1;
         ov:csvHeader       "Title";