ChangeFeed Spark Bug Processing All Partitions #42553

Merged
tvaron3 merged 36 commits into Azure:main on Nov 26, 2024

Conversation

tvaron3
Member

@tvaron3 tvaron3 commented Oct 24, 2024

Description

In our Spark connector, all Cosmos partitions should, in most cases, be processed up to a precalculated LSN when using streaming mode. If a split happens while a Spark partition is being processed, a child partition may not be fully drained: Spark keeps reading changes up to an endLSN but has no knowledge of the split.

Implementation

Spark will set a request option that continues the change feed until an endLSN, and the Java SDK will keep returning changes up to that endLSN. If a 304 is returned, reading continues, because the endLSN is always calculated from a known LSN. A 304 can happen if there is a delay in the change feed.
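To illustrate the idea, here is a minimal, self-contained sketch of a read loop that keeps going until an endLSN is reached and does not treat a 304 as the end of the feed. It is a simulation only: `EndLsnDrainSketch`, `Page`, and `drainUntil` are hypothetical names and are not part of azure-cosmos or the Spark connector, and dropping changes whose LSN is past the endLSN is an assumption about the intended behavior rather than something taken from the PR diff.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch: none of these types or methods belong to azure-cosmos.
final class EndLsnDrainSketch {

    /** A simulated change feed response. */
    static final class Page {
        final int status;            // 200 = changes returned, 304 = no new changes yet
        final List<Long> lsns;       // LSN of each change carried by this page
        final long continuationLsn;  // LSN the feed has reached after this page

        Page(int status, List<Long> lsns, long continuationLsn) {
            this.status = status;
            this.lsns = lsns;
            this.continuationLsn = continuationLsn;
        }
    }

    /**
     * Polls simulated responses until the feed position reaches endLsn.
     * A 304 is skipped instead of ending the read, because endLsn was
     * computed from an LSN that is known to exist.
     */
    static List<Long> drainUntil(Deque<Page> responses, long endLsn) {
        List<Long> drained = new ArrayList<>();
        long currentLsn = 0L;
        while (currentLsn < endLsn) {
            Page page = responses.poll();
            if (page == null) {
                throw new IllegalStateException("simulated feed ended before endLsn");
            }
            if (page.status == 304) {
                continue; // change feed is lagging; keep reading
            }
            for (long lsn : page.lsns) {
                if (lsn <= endLsn) { // assumption: changes past endLsn are left for the next batch
                    drained.add(lsn);
                }
            }
            currentLsn = page.continuationLsn;
        }
        return drained;
    }

    public static void main(String[] args) {
        Deque<Page> feed = new ArrayDeque<>();
        feed.add(new Page(200, List.of(1L, 2L), 2L));
        feed.add(new Page(304, List.of(), 2L)); // delayed change feed
        feed.add(new Page(200, List.of(3L, 4L, 5L), 5L));
        System.out.println(drainUntil(feed, 4L));
    }
}
```

In this simulation the 304 in the middle of the feed does not stop the read, so the changes at LSN 3 and 4 are still returned even though they arrive after an empty response.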

Member Author

tvaron3 commented Oct 24, 2024

/azp run java - cosmos - tests

Azure Pipelines successfully started running 1 pipeline(s).

@azure-sdk
Collaborator

API change check

APIView has identified API level changes in this PR and created following API reviews.

com.azure:azure-cosmos

Member Author

tvaron3 commented Oct 24, 2024

/azp run java - cosmos - tests

Azure Pipelines successfully started running 1 pipeline(s).

@tvaron3 tvaron3 marked this pull request as ready for review October 24, 2024 20:13
@tvaron3 tvaron3 requested review from kirankumarkolli and a team as code owners October 24, 2024 20:13
Member Author

tvaron3 commented Oct 30, 2024

/azp run java - cosmos - tests

Azure Pipelines successfully started running 1 pipeline(s).

Member Author

tvaron3 commented Oct 31, 2024

/azp run java - cosmos - spark

Azure Pipelines successfully started running 1 pipeline(s).

Member Author

tvaron3 commented Nov 7, 2024

/azp run java - cosmos - spark

Azure Pipelines successfully started running 1 pipeline(s).

Member Author

tvaron3 commented Nov 7, 2024

/azp run java - cosmos - spark

Azure Pipelines successfully started running 1 pipeline(s).

Member Author

tvaron3 commented Nov 8, 2024

/azp run java - cosmos - spark

Azure Pipelines successfully started running 1 pipeline(s).

Member Author

tvaron3 commented Nov 18, 2024

/azp run java - cosmos - spark

Azure Pipelines successfully started running 1 pipeline(s).

Member Author

tvaron3 commented Nov 19, 2024

/azp run java - cosmos - tests

Azure Pipelines successfully started running 1 pipeline(s).

Member Author

tvaron3 commented Nov 19, 2024

/azp run java - cosmos - tests

Azure Pipelines successfully started running 1 pipeline(s).

Member

@FabianMeiswinkel FabianMeiswinkel left a comment

LGTM - Thanks

@tvaron3 tvaron3 merged commit 05ed0e4 into Azure:main Nov 26, 2024
38 checks passed
4 participants