[Bug]: All Operation search not functioning under Azure CosmosDB-for-Cassandra storage #5185

Wraith2 · 2024-02-07T15:37:51Z

What happened?

I have installed Jaeger as a proof of concept and, to minimize costs, am using Azure CosmosDB for Cassandra as the database backend, which I believe has been made to work based on previously closed issues #1667 and #2467. The Jaeger parts of the tooling work successfully, but CosmosDB's Cassandra API does not appear to be complete, which means it is not possible to use the "all" operation selection to see recent traces ordered by time.

Steps to reproduce

Set up a tracing database using Azure CosmosDB for Cassandra. Customize the install cqlsh commands for a single node and run them; they will succeed. Then set up something to send data through the collector to the backend; everything should work normally.
Run the query UI pointing at the Azure DB instance, select the "all" operation and click the "Find Traces" button. An error is returned:

ORDER BY requires creating a custom index: CosmosClusteringIndex. Please create a custom index and re-issue this query

The query that causes this is:

SELECT trace_id FROM service_name_index WHERE bucket IN (0,1,2,3,4,5,6,7,8,9) AND service_name = ? AND start_time > ?  AND start_time < ? ORDER BY start_time DESC LIMIT ?;

The definition of service_name_index already declares start_time as a clustering column with descending order (WITH CLUSTERING ORDER BY (start_time DESC)), so I think this is CosmosDB not conforming to the Cassandra API correctly. This is backed up by a question on the Microsoft Learn site, https://learn.microsoft.com/en-us/answers/questions/1181520/cassandra-api-unable-to-run-query?page=1&orderby=Helpful&comment=answer-1177286#newest-answer-comment, where the user is directed to enable a preview Cassandra feature that I cannot find.

Expected behavior

It would be good if Jaeger could be made to work around this issue, or if some Azure-specific schema change could be identified that lets it work despite the missing feature in CosmosDB.

I understand that this is not likely to be a problem in Jaeger. However, when researching whether this backend would work, all the information I could find suggested that it would. If Azure CosmosDB for Cassandra is not a viable backend because it lacks a feature that real Cassandra provides, it may be useful for others to be able to find this issue in searches.

Relevant log output

No response

Screenshot

No response

Additional context

No response

Jaeger backend version

1.53

SDK

OpenTelemetry Dotnet package 1.7.0

Pipeline

azure appservice -> jaeger-collector -> azurecosmosdb-for-cassandra

Storage backend

azurecosmosdb-for-cassandra

Operating system

Windows

Deployment model

No response

Deployment configs

create-schema-clean.txt


Wraith2 · 2024-02-12T09:23:57Z

Apologies for the random ping, @TheovanKraay, but you may be well placed to help with this.

TheovanKraay · 2024-02-12T12:16:03Z

The Cosmos DB API for Apache Cassandra does have some compatibility gaps. I would recommend running Jaeger with Azure Managed Instance for Apache Cassandra. This is an offering under Azure Cosmos DB, but it is a fully managed service running pure open-source Apache Cassandra with 100% compatibility. You should not have any problems with any of the Jaeger commands if you use this service instead.

Wraith2 · 2024-02-23T11:16:14Z

Ok, thanks.

No action is needed here from Jaeger, then. For anyone who finds this in a search: you will need to move to a full Cassandra or Elasticsearch storage backend.

jravnik · 2024-06-18T17:04:24Z

According to Microsoft Q&A, the former "Preview" feature will no longer be activated and will not become generally available (GA): https://learn.microsoft.com/en-us/answers/questions/1338536/is-cosmosclusterindex-still-a-preview-feature (expand the comments on the accepted answer).

This means that it is not, and will not be, possible to create the custom index in Azure Cosmos DB for Apache Cassandra. However, running a managed Apache Cassandra instance instead is much more costly, just to compensate for this one missing feature.

Is it still possible for Jaeger to work around this issue so that a query with operation = "All" works? Maybe a config option to sort the results after they are fetched from Cassandra (with the obvious disadvantage of being somewhat slower and/or using more memory and CPU)?

yurishkuro · 2024-06-18T18:14:53Z

Jaeger uses this table for service-only searches:

CREATE TABLE IF NOT EXISTS ${keyspace}.service_name_index (
    service_name      text,
    bucket            int,
    start_time        bigint, -- microseconds since epoch
    trace_id          blob,
    PRIMARY KEY ((service_name, bucket), start_time)
) WITH CLUSTERING ORDER BY (start_time DESC)

Clustering columns in the primary key have been a Cassandra feature since v2.x (about 10 years ago). If Cosmos DB does not support them, I don't know how it can claim to be Cassandra-compatible. Perhaps it offers other workarounds for defining secondary indices.
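For context on what the ORDER BY relies on: with PRIMARY KEY ((service_name, bucket), start_time), every (service_name, bucket) pair is a separate partition whose rows are stored in start_time DESC order, so a query over bucket IN (0,...,9) has to merge ten already-sorted streams into one. The Go sketch below does that merge client-side with a heap and stops once the limit is reached. It is a hypothetical illustration, not Jaeger's code: IndexRow and mergeBuckets are invented names, and it assumes each per-bucket slice comes from a single-partition query that returns rows in clustering order, as Apache Cassandra does; whether Cosmos DB preserves that order without ORDER BY is not established in this thread.

```go
package main

import (
	"container/heap"
	"fmt"
)

// IndexRow is a hypothetical in-memory form of one service_name_index row.
type IndexRow struct {
	Bucket    int
	StartTime int64 // microseconds since epoch
	TraceID   string
}

// cursor tracks the next unread row of one bucket's result,
// which is ASSUMED to be sorted by StartTime descending already.
type cursor struct {
	rows []IndexRow
	pos  int
}

// maxHeap orders cursors by the StartTime of their next row, newest first.
type maxHeap []*cursor

func (h maxHeap) Len() int { return len(h) }
func (h maxHeap) Less(i, j int) bool {
	return h[i].rows[h[i].pos].StartTime > h[j].rows[h[j].pos].StartTime
}
func (h maxHeap) Swap(i, j int) { h[i], h[j] = h[j], h[i] }
func (h *maxHeap) Push(x any)   { *h = append(*h, x.(*cursor)) }
func (h *maxHeap) Pop() any {
	old := *h
	c := old[len(old)-1]
	*h = old[:len(old)-1]
	return c
}

// mergeBuckets merges per-bucket slices (each sorted newest first)
// into a single newest-first slice of at most limit rows.
func mergeBuckets(perBucket [][]IndexRow, limit int) []IndexRow {
	h := &maxHeap{}
	for _, rows := range perBucket {
		if len(rows) > 0 {
			*h = append(*h, &cursor{rows: rows})
		}
	}
	heap.Init(h)
	var out []IndexRow
	for h.Len() > 0 && len(out) < limit {
		c := (*h)[0] // bucket holding the newest unread row
		out = append(out, c.rows[c.pos])
		c.pos++
		if c.pos == len(c.rows) {
			heap.Pop(h) // bucket exhausted
		} else {
			heap.Fix(h, 0) // its next row is older; restore heap order
		}
	}
	return out
}

func main() {
	perBucket := [][]IndexRow{
		{{1, 30, "a"}, {1, 10, "b"}},
		{{2, 25, "c"}, {2, 20, "d"}},
	}
	for _, r := range mergeBuckets(perBucket, 3) {
		fmt.Println(r.StartTime, r.TraceID)
	}
}
```

With the sample input, main prints the rows with start times 30, 25 and 20. Because each bucket is already sorted, the merge only reads limit rows from the heads of the slices, so under the ordering assumption above each per-bucket query could carry its own LIMIT. If that order cannot be trusted, a full client-side sort is the safer fallback.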

jravnik · 2024-06-19T15:19:03Z

@yurishkuro Strangely enough, the tables themselves can be created with a clustering key without any problems. However, as soon as a query uses ORDER BY, an error reports that the custom index is missing, and that index simply cannot be created because it is not a supported feature:

SELECT * FROM jaegertracing.service_name_index WHERE bucket in (0,1,2,3,4,6,7,8,9) AND service_name = 'my-service' AND start_time > 1718728568730810 AND start_time < 1718728668730810 ORDER BY start_time DESC;

InvalidRequest: Error from server: code=2200 [Invalid query] message="ORDER BY requires creating a custom index: CosmosClusteringIndex. Please create a custom index and re-issue this query"

If the query is submitted without ORDER BY, the data can be read. A potential workaround would therefore be to run the SELECT without ORDER BY and sort the results in the application logic:

SELECT * FROM jaegertracing.service_name_index WHERE bucket in (0,1,2,3,4,6,7,8,9) AND service_name = 'my-service' AND start_time > 1718728568730810 AND start_time < 1718728668730810;

 service_name | bucket | start_time       | trace_id
--------------+--------+------------------+------------------------------------
 my-service   |      1 | 1718728573990844 | 0x0d1b531a4024d9dc848c0793120673ad
 my-service   |      1 | 1718728572649478 | 0xcd2e6645a71ba85a1b12fce2e82b30a8
 my-service   |      1 | 1718728572637999 | 0xcd2e6645a71ba85a1b12fce2e82b30a8
 my-service   |      1 | 1718728571031771 | 0xdcfc1ac661729d6ca1aafe38bd146add

---MORE---

I don't think this can be solved with secondary indexes since, as I understand it, they only speed up filtering, not sorting.
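The workaround described above (query without ORDER BY, then sort in the application) can be sketched in Go as follows. Row and newestTraceIDs are hypothetical names, not part of Jaeger. The sample input is the four rows from the cqlsh output above, deliberately shuffled because without ORDER BY the row order should not be relied on. Since the same trace can appear more than once (0xcd2e6645a71ba85a1b12fce2e82b30a8 is listed twice), the sketch keeps only the newest row per trace ID before applying the limit, on the assumption that the limit counts traces rather than index rows.

```go
package main

import (
	"fmt"
	"sort"
)

// Row is a hypothetical in-memory form of one service_name_index row.
type Row struct {
	StartTime int64  // microseconds since epoch
	TraceID   string // hex-encoded trace_id blob
}

// newestTraceIDs sorts rows by StartTime descending, keeps the first
// (newest) occurrence of each trace ID and returns at most limit IDs.
func newestTraceIDs(rows []Row, limit int) []string {
	sorted := append([]Row(nil), rows...) // copy so the caller's slice is untouched
	sort.Slice(sorted, func(i, j int) bool {
		return sorted[i].StartTime > sorted[j].StartTime
	})
	seen := make(map[string]bool)
	var ids []string
	for _, r := range sorted {
		if len(ids) == limit {
			break
		}
		if !seen[r.TraceID] {
			seen[r.TraceID] = true
			ids = append(ids, r.TraceID)
		}
	}
	return ids
}

func main() {
	// The sample rows from the output above, shuffled on purpose.
	rows := []Row{
		{1718728572637999, "0xcd2e6645a71ba85a1b12fce2e82b30a8"},
		{1718728573990844, "0x0d1b531a4024d9dc848c0793120673ad"},
		{1718728571031771, "0xdcfc1ac661729d6ca1aafe38bd146add"},
		{1718728572649478, "0xcd2e6645a71ba85a1b12fce2e82b30a8"},
	}
	fmt.Println(newestTraceIDs(rows, 20))
}
```

Here main prints the three distinct trace IDs, newest first. The trade-off is the one anticipated earlier in the thread: the CQL LIMIT cannot simply be pushed to the server, because without ORDER BY it would keep whichever rows the server returns first rather than the newest, so the whole time window has to be fetched and sorted in memory.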

Wraith2 added the bug label Feb 7, 2024

Wraith2 changed the title to "[Bug]: All Operation search not functioning under Azure CosmosDB-for-Cassandra storage" Feb 7, 2024

Wraith2 closed this as completed Feb 23, 2024
