ChangeFeed Spark Bug Processing All Partitions #42553
/azp run java - cosmos - tests
Azure Pipelines successfully started running 1 pipeline(s).
API change check: APIView has identified API level changes in this PR and created the following API reviews.
/azp run java - cosmos - tests
Azure Pipelines successfully started running 1 pipeline(s).
...re-cosmos-spark_3_2-12/src/main/scala/com/azure/cosmos/spark/ChangeFeedPartitionReader.scala
...re-cosmos-spark_3_2-12/src/main/scala/com/azure/cosmos/spark/ChangeFeedPartitionReader.scala
sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/BridgeInternal.java
...re-cosmos-spark_3_2-12/src/main/scala/com/azure/cosmos/spark/ChangeFeedPartitionReader.scala
/azp run java - cosmos - tests
Azure Pipelines successfully started running 1 pipeline(s).
/azp run java - cosmos - spark
Azure Pipelines successfully started running 1 pipeline(s).
…to tvaron3/changeFeedSparkBug
/azp run java - cosmos - tests
Azure Pipelines successfully started running 1 pipeline(s).
/azp run java - cosmos - tests
Azure Pipelines successfully started running 1 pipeline(s).
...smos-spark_3_2-12/src/test/scala/com/azure/cosmos/spark/ChangeFeedPartitionReaderITest.scala
…to tvaron3/changeFeedSparkBug
LGTM - Thanks
Description
In our Spark connector, streaming mode should in most cases process every Cosmos partition up to a precalculated LSN. If a split happens while a Spark partition is being processed, a child partition may not be fully drained, because Spark keeps reading changes up to an endLSN but has no knowledge of the split.
Implementation
Spark sets a request option that tells the change feed to continue until an endLSN, and the Java SDK keeps returning changes up to that endLSN. If the SDK receives a 304 (Not Modified) before reaching the endLSN, it continues reading, because the endLSN is always calculated from a known LSN. A 304 in that situation would happen if there is a delay in the change feed.
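To make the intended read loop concrete, here is a minimal, self-contained sketch of "keep reading until endLSN, and do not stop on a 304". It is not the SDK code from this PR: `EndLsnDrainSketch`, `Page`, `drainUntil`, and the `maxEmptyPolls` safety cap are all hypothetical names invented for illustration, and each page is assumed to carry a single LSN.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class EndLsnDrainSketch {

    /** Hypothetical stand-in for one change feed response page (not an SDK type). */
    public static final class Page {
        final int status;          // 200 = changes returned, 304 = no new changes yet
        final long lsn;            // LSN reached by this page (unused for 304 pages)
        final List<String> items;  // changes carried by this page

        public Page(int status, long lsn, List<String> items) {
            this.status = status;
            this.lsn = lsn;
            this.items = items;
        }
    }

    /**
     * Collects changes until a page reaches endLsn.
     * A 304 does not end the read: endLsn was computed from a known LSN,
     * so the loop keeps polling. maxEmptyPolls is an assumption of this
     * sketch that bounds consecutive 304s so the loop cannot spin forever.
     */
    public static List<String> drainUntil(Iterator<Page> pages, long endLsn, int maxEmptyPolls) {
        List<String> out = new ArrayList<>();
        int emptyPolls = 0;
        while (pages.hasNext()) {
            Page page = pages.next();
            if (page.status == 304) {
                emptyPolls++;
                if (emptyPolls > maxEmptyPolls) {
                    throw new IllegalStateException("endLsn " + endLsn + " not reached");
                }
                continue; // delayed change feed: try again instead of finishing
            }
            emptyPolls = 0;
            out.addAll(page.items);
            if (page.lsn >= endLsn) {
                break; // everything up to endLsn has been returned
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Page> pages = Arrays.asList(
            new Page(200, 5, Arrays.asList("a", "b")),
            new Page(304, 5, new ArrayList<String>()),
            new Page(200, 10, Arrays.asList("c")),
            new Page(200, 12, Arrays.asList("d")));
        System.out.println(drainUntil(pages.iterator(), 10, 3));
    }
}
```

With endLSN 10, the sketch reads past the 304 and stops after the page at LSN 10, so the later page at LSN 12 is never consumed. A reader that treated the 304 as "done" would have returned only the first page, which is the under-drained case described above.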