
Fix NoneType Error in flatten Function for Text-Only Tasks in LLAVA Models #501

Merged

2 commits merged into EvolvingLMMs-Lab:main on Jan 17, 2025

Conversation

bibisbar (Contributor)

PR Description

This PR resolves a NoneType iteration error (TypeError: 'NoneType' object is not iterable) raised by the flatten function when handling text-only tasks (e.g., MMLU or HellaSwag) in LLaVA-family models.

Problem

The flatten function flattens batched visual tokens, but in text-only tasks it receives None or empty inputs; the original implementation then tries to iterate over None and fails.
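For reference, the unguarded helper follows this common pattern (a paraphrase of what the PR describes, not the verbatim lmms_eval source), which fails as soon as the outer loop touches None:

```python
def flatten(nested):
    # Unguarded pattern: assumes `nested` is always a list of lists.
    flat = []
    for sub in nested:    # TypeError: 'NoneType' object is not iterable
        for item in sub:  # ...when `nested` (or any `sub`) is None
            flat.append(item)
    return flat

flatten(None)  # what a text-only task effectively triggers
```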

Solution

  1. Added input validation to flatten:
    • Checks if the input is None or empty.
    • Returns an empty list if validation fails.
  2. Ensured sub-elements are validated before iteration.

This update ensures compatibility with text-only tasks while maintaining functionality for visual tasks.
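
A minimal sketch of the guarded version described above (illustrative; the actual method lives on the model classes in lmms_eval):

```python
def flatten(nested):
    # 1. Input validation: None or an empty batch yields an empty list.
    if not nested:
        return []
    flat = []
    for sub in nested:
        # 2. Validate each sub-element before iterating over it.
        if not sub:
            continue
        for item in sub:
            flat.append(item)
    return flat

assert flatten(None) == []                        # text-only task, no visuals
assert flatten([[1, 2], None, [3]]) == [1, 2, 3]  # mixed batch still flattens
```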

bibisbar (Contributor, Author)

New PR: Fix and Refactor Text-Only and Multimodal Input Handling

Description

This PR refactors the code to properly handle both text-only and multimodal inputs during evaluation. The following changes were made:

  1. Text-Only Task Handling:
    • For tasks without image input, the model now processes input_ids and labels without passing any image data.
  2. Multimodal Task Handling:
    • When image inputs are provided, image and image_sizes are passed to the model along with input_ids and labels.
    • Added conditional checks to differentiate between text-only and multimodal evaluation.
  3. Shape Adjustments for Input and Labels:
    • Ensured that the input_ids and labels tensors have the correct shape, (batch_size, seq_len).
    • The context part of the labels is set to -100 to exclude it from the loss calculation (see the sketch after this list).
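
A rough sketch of that branching logic; build_forward_kwargs, context_len, and the keyword names are illustrative stand-ins, not the exact lmms_eval code:

```python
import torch

IGNORE_INDEX = -100  # labels with this value are excluded from the loss

def build_forward_kwargs(input_ids, labels, context_len, image=None, image_sizes=None):
    # Shape adjustment: ensure both tensors are (batch_size, seq_len).
    if input_ids.dim() == 1:
        input_ids = input_ids.unsqueeze(0)
        labels = labels.unsqueeze(0)

    # Mask the context prefix so it does not contribute to the loss.
    labels = labels.clone()
    labels[:, :context_len] = IGNORE_INDEX

    kwargs = {"input_ids": input_ids, "labels": labels}
    if image is not None:
        # Multimodal task: pass the image tensors through as well.
        kwargs["images"] = image
        kwargs["image_sizes"] = image_sizes
    return kwargs  # text-only tasks get no image keys at all
```

Text-only calls simply omit image, so the model never sees image arguments; multimodal calls carry both images and image_sizes alongside input_ids and labels.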

Changes

  • Refactored the input preprocessing and tensor handling logic to support both text-only and multimodal inputs.
  • Added checks for image and handled each case appropriately.
  • Ensured tensor shapes are consistent across both text and multimodal tasks.

How to Test

  1. Run the evaluation script with text-only tasks and check that the model processes them correctly (no NoneType error).
  2. Run the evaluation script with multimodal tasks (i.e., tasks that include both text and images) and verify that the image input is correctly handled.
  3. Check the shape of input_ids and labels tensors to ensure they are in the correct format for both text-only and multimodal tasks.
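
For step 3, a quick sanity check along these lines can be dropped into the eval loop (assert_eval_shapes is a hypothetical helper for illustration):

```python
import torch

def assert_eval_shapes(input_ids: torch.Tensor, labels: torch.Tensor, context_len: int) -> None:
    # Both tensors should be 2-D and aligned: (batch_size, seq_len).
    assert input_ids.dim() == 2, f"expected (batch_size, seq_len), got {tuple(input_ids.shape)}"
    assert labels.shape == input_ids.shape, "labels must match input_ids"
    # The context prefix should be masked out of the loss calculation.
    assert (labels[:, :context_len] == -100).all(), "context tokens should be -100"
```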

Motivation and Context

The goal of this PR is to ensure that both text-only and multimodal inputs are handled correctly during model evaluation. This change improves the flexibility of the evaluation pipeline and prepares it for different task types.

Checklist

  • Code compiles and runs as expected.
  • All tests pass (if applicable).
  • Code style and formatting are consistent.
  • The PR description is clear and includes sufficient details about the changes made.

pufanyi self-requested a review on January 17, 2025, 02:27
pufanyi (Collaborator) commented on Jan 17, 2025:

Hi!!! Thank you so much for your contribution!!! Can you try to run pre-commit run --all-files so that it can format the code to pass the lint check?

bibisbar (Contributor, Author)

It should work now. Looks like it was accidentally deleted during one of the commits, but I’ve restored it.

pufanyi (Collaborator) commented on Jan 17, 2025:

Looks good! Thank you so much!

Review thread on lmms_eval/models/tinyllava.py (outdated, resolved)
pufanyi merged commit b1fbf55 into EvolvingLMMs-Lab:main on Jan 17, 2025. 1 check passed.