Fix character encoding issues when loading JSON configuration files #574

vicenciomf2 · 2025-02-22T04:33:34Z

Problem Description

When loading configuration files containing non-ASCII characters (such as Spanish accented letters or other special characters), the text is decoded incorrectly, resulting in mojibake (e.g., "ó" appearing as "Ã³"). This affects all text loaded from JSON configuration files, including prompts for LLM services.

Example of the issue:

{
    "LLMImageDescriptionProcessor_image_description_prompt": "Eres un experto en análisis de documentos especializado en crear descripciones de texto para imágenes. (...)"
}

It was being loaded as:

Eres un experto en anÃ¡lisis de documentos especializado en crear descripciones de texto para imÃ¡genes. (...)

And the LLM's response was, for example:

Esta imagen muestra el escudo de la Pontificia Universidad CatÃ³lica de Chile.

Root Cause

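The JSON configuration files were being opened without explicitly specifying UTF-8 encoding, so the text was read with the system's default encoding. In Python, open() without an encoding argument falls back to the locale's preferred encoding, which is not UTF-8 on every platform, so the UTF-8 bytes in the file were decoded with the wrong character set.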
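The symptom can be reproduced without any files. The sketch below assumes the default encoding was Windows code page 1252 (cp1252); the PR does not say which encoding the affected system used, but cp1252 produces exactly the garbled text shown above.

```python
# Assumption: the system default encoding was cp1252 (not stated in the PR).
original = "análisis"

# The config file stores the text as UTF-8 bytes ("á" becomes 0xC3 0xA1).
utf8_bytes = original.encode("utf-8")

# Decoding those bytes with a single-byte code page turns each byte
# into its own character, which is the mojibake seen in the prompt.
garbled = utf8_bytes.decode("cp1252")
assert garbled == "anÃ¡lisis"

# No bytes were lost here, so reversing the wrong decode recovers the text.
recovered = garbled.encode("cp1252").decode("utf-8")
assert recovered == original
```

The same mechanism turns "ó" into "Ã³", which matches the LLM response quoted in the example.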

Solution

Modified the configuration file loading in config/parser.py to explicitly use UTF-8 encoding when opening JSON files. This is particularly important for non-English LLM prompts and responses.
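A minimal sketch of this kind of change, not the actual code in config/parser.py: the helper name `load_json_config`, the temporary file, and the sample prompt are hypothetical and exist only to make the example self-contained.

```python
import json
import os
import tempfile


def load_json_config(path):
    """Hypothetical helper: read a JSON config file as UTF-8."""
    # Passing encoding="utf-8" makes the result independent of the
    # platform's default encoding.
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


# Round trip: write a config with accented characters, then read it back.
config = {"prompt": "Eres un experto en análisis de imágenes."}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "config.json")
    with open(path, "w", encoding="utf-8") as f:
        # ensure_ascii=False keeps "á" as a literal character in the file
        # instead of a \u00e1 escape, so the read path is really exercised.
        json.dump(config, f, ensure_ascii=False)
    loaded = load_json_config(path)

assert loaded == config
```

Writing the file with the same explicit encoding keeps the round trip symmetric; only the read side is described in this PR.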

Dev

Fix llm layout missing text

Fix character encoding issues when loading configuration files with non-ASCII characters.

github-actions · 2025-02-22T04:33:47Z

CLA Assistant Lite bot: All contributors have signed the CLA ✍️ ✅

vicenciomf2 · 2025-02-22T04:34:27Z

I have read the CLA Document and I hereby sign the CLA

github-actions bot and others added 5 commits February 16, 2025 23:02

@dantetemplar has signed the CLA in VikParuchuri#555

d0a0455

Merge pull request VikParuchuri#560 from VikParuchuri/dev

27d2b9e

Dev

Merge pull request VikParuchuri#564 from VikParuchuri/dev

434c0ce

Dev

Merge pull request VikParuchuri#565 from VikParuchuri/dev

141da8c

Fix llm layout missing text

Fix utf-8 encoding for JSON config files

72863e2

Fix character encoding issues when loading configuration files with non-ASCII characters.

github-actions bot added a commit that referenced this pull request Feb 22, 2025

@vicenciomf2 has signed the CLA in #574

cdd681a

VikParuchuri changed the base branch from master to dev February 22, 2025 12:55

VikParuchuri merged commit bfc960e into VikParuchuri:dev Feb 22, 2025
1 check passed

github-actions bot locked and limited conversation to collaborators Feb 22, 2025
