Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[storage] Remove distinction between primary and archive storage intefaces #6567

Open
wants to merge 48 commits into
base: main
Choose a base branch
from

Conversation

mahadzaryab1
Copy link
Collaborator

@mahadzaryab1 mahadzaryab1 commented Jan 18, 2025

Which problem is this PR solving?

Description of the changes

  • This PR completely removes the interface storage.ArchiveFactory by refactoring all the storage implementations to remove the distinction between a primary and archive interface. Note that the concept of archive storage remains the same within Jaeger, we just now use the same interface to handle both primary and archive storages.

How was this change tested?

Checklist

Signed-off-by: Mahad Zaryab <mahadzaryab1@gmail.com>
Signed-off-by: Mahad Zaryab <mahadzaryab1@gmail.com>
Signed-off-by: Mahad Zaryab <mahadzaryab1@gmail.com>
Signed-off-by: Mahad Zaryab <mahadzaryab1@gmail.com>
Signed-off-by: Mahad Zaryab <mahadzaryab1@gmail.com>
Signed-off-by: Mahad Zaryab <mahadzaryab1@gmail.com>
Signed-off-by: Mahad Zaryab <mahadzaryab1@gmail.com>
Signed-off-by: Mahad Zaryab <mahadzaryab1@gmail.com>
Signed-off-by: Mahad Zaryab <mahadzaryab1@gmail.com>
Signed-off-by: Mahad Zaryab <mahadzaryab1@gmail.com>
Signed-off-by: Mahad Zaryab <mahadzaryab1@gmail.com>
Signed-off-by: Mahad Zaryab <mahadzaryab1@gmail.com>
Signed-off-by: Mahad Zaryab <mahadzaryab1@gmail.com>
Signed-off-by: Mahad Zaryab <mahadzaryab1@gmail.com>
Signed-off-by: Mahad Zaryab <mahadzaryab1@gmail.com>
Signed-off-by: Mahad Zaryab <mahadzaryab1@gmail.com>
Copy link

codecov bot commented Jan 18, 2025

Codecov Report

Attention: Patch coverage is 98.02956% with 4 lines in your changes missing coverage. Please review.

Project coverage is 96.16%. Comparing base (b04e0ba) to head (263315b).
Report is 3 commits behind head on main.

Files with missing lines Patch % Lines
plugin/storage/factory.go 96.10% 2 Missing and 1 partial ⚠️
plugin/storage/es/factory.go 98.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6567      +/-   ##
==========================================
- Coverage   96.24%   96.16%   -0.08%     
==========================================
  Files         373      372       -1     
  Lines       21406    21198     -208     
==========================================
- Hits        20602    20385     -217     
- Misses        612      618       +6     
- Partials      192      195       +3     
Flag Coverage Δ
badger_v1 10.94% <0.00%> (+0.30%) ⬆️
badger_v2 2.85% <0.00%> (+0.07%) ⬆️
cassandra-4.x-v1-manual 16.35% <20.83%> (-0.23%) ⬇️
cassandra-4.x-v2-auto 2.79% <0.00%> (+0.07%) ⬆️
cassandra-4.x-v2-manual 2.79% <0.00%> (+0.07%) ⬆️
cassandra-5.x-v1-manual 16.35% <20.83%> (-0.23%) ⬇️
cassandra-5.x-v2-auto 2.79% <0.00%> (+0.07%) ⬆️
cassandra-5.x-v2-manual 2.79% <0.00%> (+0.07%) ⬆️
elasticsearch-6.x-v1 20.19% <19.27%> (-0.18%) ⬇️
elasticsearch-7.x-v1 20.27% <19.27%> (-0.18%) ⬇️
elasticsearch-8.x-v1 20.44% <19.27%> (-0.16%) ⬇️
elasticsearch-8.x-v2 2.84% <0.00%> (-0.04%) ⬇️
grpc_v1 12.05% <26.56%> (-0.11%) ⬇️
grpc_v2 9.27% <23.83%> (+0.24%) ⬆️
kafka-3.x-v1 10.61% <0.00%> (+0.29%) ⬆️
kafka-3.x-v2 2.85% <0.00%> (+0.07%) ⬆️
memory_v2 2.85% <0.00%> (+0.08%) ⬆️
opensearch-1.x-v1 20.31% <19.27%> (-0.17%) ⬇️
opensearch-2.x-v1 20.32% <19.27%> (-0.17%) ⬇️
opensearch-2.x-v2 2.84% <0.00%> (+0.06%) ⬆️
tailsampling-processor 0.53% <0.00%> (+0.01%) ⬆️
unittests 95.04% <97.53%> (-0.06%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.


func (f *Factory) getArchiveClient() es.Client {
if c := f.archiveClient.Load(); c != nil {
if c := f.client.Load(); c != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(not for this PR) I am curious if we still need this indirection. The primary reason it was added was to support TLS cert reload. But we don't do file-watcher-based reloads anymore, so nothing will ever trigger resetting of the client, and the cert rotation should be supported via OTEL's configtls indirection (as confirmed by the unit tests where we simulate rotation of certs). One this PR is merged let's open a good-first-issue ticket.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, maybe we still use file-watcher for password file, then we'd keep this.

Signed-off-by: Mahad Zaryab <mahadzaryab1@gmail.com>
Signed-off-by: Mahad Zaryab <mahadzaryab1@gmail.com>
Signed-off-by: Mahad Zaryab <mahadzaryab1@gmail.com>
Signed-off-by: Mahad Zaryab <mahadzaryab1@gmail.com>
Signed-off-by: Mahad Zaryab <mahadzaryab1@gmail.com>
Signed-off-by: Mahad Zaryab <mahadzaryab1@gmail.com>
Signed-off-by: Mahad Zaryab <mahadzaryab1@gmail.com>
@mahadzaryab1 mahadzaryab1 changed the title [WIP] Remove distinction between primary and archive storage in jaeger-v1 [storage] Remove distinction between primary and archive storage intefaces Jan 20, 2025
Signed-off-by: Mahad Zaryab <mahadzaryab1@gmail.com>
Signed-off-by: Mahad Zaryab <mahadzaryab1@gmail.com>
@mahadzaryab1 mahadzaryab1 marked this pull request as ready for review January 20, 2025 17:21
@mahadzaryab1 mahadzaryab1 requested a review from a team as a code owner January 20, 2025 17:21
@@ -138,10 +138,18 @@ func (qOpts *QueryOptions) InitFromViper(v *viper.Viper, logger *zap.Logger) (*Q
return qOpts, nil
}

type archiveInitializer interface {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: since this is used as part of (potentially) public API BuildQueryServiceOptions, and since you only need one function, a better approach would be accepting a parameter of public type

type InitArchiveStorageFn func(logger *zap.Logger) (spanstore.Reader, spanstore.Writer)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's also easier to test with a function

@@ -114,7 +114,9 @@ type Configuration struct {
// UseReadWriteAliases, if set to true, will use read and write aliases for indices.
// Use this option with Elasticsearch rollover API. It requires an external component
// to create aliases before startup and then performing its management.
UseReadWriteAliases bool `mapstructure:"use_aliases"`
UseReadWriteAliases bool `mapstructure:"use_aliases"`
ReadAliasSuffix string `mapstructure:"-"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should add comment why these are not exposed via mapstructure:


// DefaultConfigurable is an interface that can be implement by some storage implementations
// to provide a way to inherit configuration settings from another factory.
type DefaultConfigurable interface {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inheritable?

)

var ( // interface comformance checks
_ storage.Factory = (*Factory)(nil)
_ storage.Purger = (*Factory)(nil)
_ storage.ArchiveFactory = (*Factory)(nil)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add Inheritable?


primaryClient atomic.Pointer[es.Client]
archiveClient atomic.Pointer[es.Client]
client atomic.Pointer[es.Client]

watchers []*fswatcher.FSWatcher
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this still need to be a list, or just exactly one watcher? If one, and it's for a password, I would rename it to pwdFileWatcher

errs = append(errs, err)
}
}
}
for _, storageType := range f.SpanWriterTypes {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason why this shouldn't simply be looping through f.factories and f.archiveFactories and closing them all? Seems weird that it depends on external list of types, considering that some types are added implicitly. Unless the same storage can be added more than once.

if role == "archive" {
if primaryFactory, ok := f.factories[kind]; ok {
if dc, ok := factory.(plugin.DefaultConfigurable); ok {
dc.InheritSettingsFrom(primaryFactory)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is the order here correct, i.e. inheriting settings after calling Initialize?

if dc, ok := factory.(plugin.DefaultConfigurable); ok {
dc.InheritSettingsFrom(primaryFactory)
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is the coverage gap here real or stale?
image

@@ -72,14 +67,6 @@ func (c *GRPCClient) StreamingSpanWriter() spanstore.Writer {
return newStreamingSpanWriter(c.streamWriterClient)
}

func (c *GRPCClient) ArchiveSpanReader() spanstore.Reader {
return &archiveReader{client: c.archiveReaderClient}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are we deleting these internal archiveReader/archiveWriter types?

@@ -52,7 +52,7 @@ func NewGRPCHandler(impl *GRPCHandlerStorageImpl) *GRPCHandler {
// NewGRPCHandler creates a handler given implementations grouped by plugin services.
func NewGRPCHandlerWithPlugins(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need this? I don't see anything using this, I was expecting the remote-storage to utilize this framework, but if it doesn't then we can delete it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(maybe in a prequel PR)

Comment on lines 73 to 76
"--cassandra-archive.servers=127.0.0.1",
"--cassandra-archive.basic.allowed-authenticators=org.apache.cassandra.auth.PasswordAuthenticator",
"--cassandra-archive.password=" + password,
"--cassandra-archive.username=" + username,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we drop these? The inheritance should take care of them.

af := s.initializeESFactory(t, []string{
"--es-archive.enabled=true",
fmt.Sprintf("--es-archive.tags-as-fields.all=%v", allTagsAsFields),
fmt.Sprintf("--es-archive.index-prefix=%v", indexPrefix),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this can be dropped, for inheritance?

Comment on lines +36 to +37
err := command.ParseFlags(s.flags)
require.NoError(t, err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
err := command.ParseFlags(s.flags)
require.NoError(t, err)
require.NoError(t, command.ParseFlags(s.flags))

Comment on lines +42 to +43
f := initFactory()
af := initFactory()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so this creates a gRPC proxy pointing to the same remote storage port? I.e. not really testing two separate storage namespaces? Maybe we can fix in the follow-up.

impl.ArchiveSpanWriter = func() spanstore.Writer { return qOpts.ArchiveSpanWriter }
ar, aw := f.InitArchiveStorage(logger)
impl.ArchiveSpanReader = func() spanstore.Reader { return ar }
impl.ArchiveSpanWriter = func() spanstore.Writer { return aw }

handler := shared.NewGRPCHandler(impl)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The existing Remote Storage API v1 has dedicated methods for WriteSpan and ArchiveSpan, exposed on the same gRPC service / port. Do we actually want that? Because the grpc storage client no longer knows about primary/archive, so it has only one SpanWriter that always calls WriteSpan gRPC method.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants