Fix character encoding issues when loading JSON configuration files #574

Merged: 5 commits merged into dev on Feb 22, 2025

Conversation

vicenciomf2
Contributor

Problem Description

When configuration files contain non-ASCII characters (such as Spanish accented letters or other special characters), the text is decoded incorrectly on load, resulting in mojibake (e.g., "ó" appearing as "Ã³"). This affects all text loaded from JSON configuration files, including prompts for LLM services.

Example of the issue:

{
    "LLMImageDescriptionProcessor_image_description_prompt": "Eres un experto en análisis de documentos especializado en crear descripciones de texto para imágenes. (...)"
}

This was being loaded as:

Eres un experto en anÃ¡lisis de documentos especializado en crear descripciones de texto para imÃ¡genes. (...)

And the LLM's response was, for example:

Esta imagen muestra el escudo de la Pontificia Universidad Católica de Chile.

Root Cause

The JSON configuration files were being opened without explicitly specifying UTF-8 encoding, causing the text to be read with the system's default encoding.
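The mechanism can be reproduced in a few lines. This is a minimal sketch, assuming the system default was a single-byte Windows code page such as cp1252; the actual default depends on the platform and locale, and the sample string is only illustrative.

```python
# Text as it is stored on disk in a UTF-8 encoded JSON file.
original = "análisis, imágenes, Católica"
utf8_bytes = original.encode("utf-8")

# Assumption: cp1252 stands in for a non-UTF-8 system default encoding.
# Each accented letter is two bytes in UTF-8, so it decodes as two characters.
garbled = utf8_bytes.decode("cp1252")
print(garbled)

# Decoding the same bytes as UTF-8 recovers the intended text.
restored = utf8_bytes.decode("utf-8")
print(restored)
```

In Python, `open()` without an `encoding` argument falls back to a locale-dependent default, which is why the same file can load correctly on one machine and not on another.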

Solution

Modified the configuration file loading in config/parser.py to explicitly use UTF-8 encoding when opening JSON files. This is particularly important for non-English LLM prompts and responses.
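The change amounts to passing `encoding="utf-8"` when opening the file. The sketch below illustrates the idea only; `load_json_config` is a hypothetical helper name and is not the actual code in config/parser.py.

```python
import json
import os
import tempfile


def load_json_config(path):
    """Hypothetical helper: load a JSON config file, always decoding as UTF-8."""
    # An explicit encoding makes the result independent of the system locale.
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


# Round trip through a temporary file containing non-ASCII text.
with tempfile.TemporaryDirectory() as tmp:
    config_path = os.path.join(tmp, "config.json")
    with open(config_path, "w", encoding="utf-8") as f:
        # ensure_ascii=False writes the accented characters as raw UTF-8
        # instead of \uXXXX escapes, matching a hand-edited config file.
        json.dump({"prompt": "análisis de imágenes"}, f, ensure_ascii=False)

    config = load_json_config(config_path)

print(config["prompt"])
```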

Contributor

github-actions bot commented Feb 22, 2025

CLA Assistant Lite bot: All contributors have signed the CLA ✍️ ✅

@vicenciomf2
Contributor Author

I have read the CLA Document and I hereby sign the CLA

github-actions bot added a commit that referenced this pull request Feb 22, 2025
VikParuchuri changed the base branch from master to dev on February 22, 2025, 12:55
VikParuchuri merged commit bfc960e into VikParuchuri:dev on Feb 22, 2025
1 check passed
github-actions bot locked and limited conversation to collaborators on Feb 22, 2025

2 participants