[feat] ppl #2094

Open
wants to merge 35 commits into
base: main

Conversation

@wu-hui wu-hui commented Jul 30, 2024

No description provided.

@wu-hui wu-hui requested review from a team as code owners July 30, 2024 16:26
@product-auto-label product-auto-label bot added size: xl Pull request size is extra large. api: firestore Issues related to the googleapis/nodejs-firestore API. labels Jul 30, 2024
@wu-hui wu-hui requested a review from MarkDuckworth July 31, 2024 17:56
it('arrayContainsAll works', async () => {
const results = await randomCol
.pipeline()
.where(Field.of('tags').arrayContainsAll('adventure', 'magic'))
Contributor

I wish this read .where(field('tags').arrayContains... or even .where(Field.named('tags').arrayContains.... As I read through the code, I continue to read it as "where field of something" but the meaning is "where field something".

Contributor

I think for Java, Andy added function overloads that cover the "common" use case, where the first parameter is a string specifying the name of the field. Is that possible to do for TS?

Contributor

Also, it adds confusion when reading one line
.where(arrayContains('tags', 'comedy')). The affected field is the argument.

And then reading another line
Field.of('tags').arrayContainsAll('adventure', 'magic'). The affected field is this.

I'm assuming you're only switching between the two for test completeness, but I also want to confirm: do we need to support both patterns?

Contributor Author

Field.of is a Java convention; I am not opposed to making it Field.named() or something else. I think we need more input.

@MarkDuckworth MarkDuckworth left a comment

Just starting review. Will resume next week, but providing some thoughts now.

const results = await randomCol
.pipeline()
.select(
Field.of('tags').arrayConcat('newTag1', 'newTag2').as('modifiedTags')
Contributor

Wondering if arrayConcat(['newTag1', 'newTag2']) would be better since it is the concatenation of multiple arrays.

Contributor Author

Will do.

.select(
arrayFilter(Field.of('tags'), arrayElement().eq('comedy')).as(
'filteredTags'
)
Contributor

Wondering if we can have the array filter syntax mirror the query filter syntax.

Field.of('tags').arrayFilter().where(eq(arrayElement(), 'comedy'))

Contributor Author

This can be written as:
Field.of('tags')
.arrayFilter(arrayElement().eq('comedy'))
.as('filteredTags')

which is not bad?

arrayTransform(
Field.of('tags'),
arrayElement().strConcat('transformed')
).as('transformedTags')
Contributor

Same, except mirror query select syntax.

Field.of('tags').arrayTransform().select(arrayElement().strConcat('transformed'))

Contributor Author

Same, it looks decent after I rewrite it with the fluent builder.

it('testRegexMatches', async () => {
const results = await randomCol
.pipeline()
.where(regexMatch('title', '.*(?i)(the|of).*'))
Contributor

I think I understand the difference between regexContains and regexMatch. Perhaps we should drop one of these. Most authors using regex should be able to write an expression that matches the beginning and end of a line.
regexMatch('title', '(?i)(the|of)') vs regexMatch('title', '^.*(?i)(the|of).*$').

Or perhaps drop regex as a prefix.
where(contains('title', 'the'))
where(contains('title', Regex.of('the|of')))
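
For readers following along, the distinction can be sketched in plain TypeScript. This assumes (the thread does not confirm it) that regexContains matches anywhere in the string while regexMatch requires the whole string to match. The two helpers are hypothetical stand-ins built on JavaScript's RegExp, not the pipeline functions. JavaScript's RegExp does not accept the inline `(?i)` form used in the test, so case-insensitivity is passed as a flag here.

```typescript
// Hypothetical stand-ins, not the pipeline API.
function regexContains(value: string, pattern: string, flags = ""): boolean {
  // Unanchored: succeeds if the pattern matches any substring.
  return new RegExp(pattern, flags).test(value);
}

function regexMatch(value: string, pattern: string, flags = ""): boolean {
  // Anchored: the non-capturing group keeps alternation inside ^...$.
  return new RegExp(`^(?:${pattern})$`, flags).test(value);
}

const bookTitle = "The Lord of the Rings";
const partial = regexContains(bookTitle, "(the|of)", "i"); // true
const whole = regexMatch(bookTitle, "(the|of)", "i"); // false
const wholeWithWildcards = regexMatch(bookTitle, ".*(the|of).*", "i"); // true
```

Under that assumption, either function can express the other by adding or removing the leading and trailing `.*`, which is the argument for keeping only one.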

const results = await randomCol
.pipeline()
.select(
Field.of('awards').mapGet('hugo').as('hugoAward'),
Contributor

I'd also consider Field.of('awards.hugo').as('hugoAward'), or is there some special behavior to mapGet?

Does mapGet() support dot notation?
Field.of('awards').mapGet('hugo.year').as('hugoAwardYear')
vs
Field.of('awards').mapGet('hugo').mapGet('year').as('hugoAwardYear')

Contributor Author

Good question. It looks like mapGet does not support nested fields (always returns null).

One possible benefit is that mapGet handles field names with special characters (., /, etc.) better.
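
A rough model of the two lookup styles, using plain objects rather than the pipeline API. Both `getByPath` and this `mapGet` are hypothetical helpers written only to illustrate the point about special characters; they make no claim about how the backend resolves fields.

```typescript
type Doc = { [key: string]: unknown };

// Hypothetical: resolve a dotted path by splitting on ".".
function getByPath(doc: Doc, path: string): unknown {
  let current: unknown = doc;
  for (const segment of path.split(".")) {
    if (typeof current !== "object" || current === null) return null;
    current = (current as Doc)[segment];
  }
  return current === undefined ? null : current;
}

// Hypothetical: look up exactly one key, with no path parsing.
function mapGet(map: unknown, key: string): unknown {
  if (typeof map !== "object" || map === null) return null;
  const value = (map as Doc)[key];
  return value === undefined ? null : value;
}

const book: Doc = {
  awards: { hugo: true, "nebula.short": false },
};

const viaPath = getByPath(book, "awards.hugo"); // true
const viaMapGet = mapGet(book["awards"], "hugo"); // true
const dottedKey = mapGet(book["awards"], "nebula.short"); // false
const dottedPath = getByPath(book, "awards.nebula.short"); // null
```

In this model a single-key lookup can reach a key that contains a dot, while the dotted path cannot, and a key such as 'hugo.year' passed to the single-key lookup finds nothing.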

const results = await randomCol
.pipeline()
.select(
cosineDistance(Constant.ofVector(sourceVector), targetVector).as(
Contributor

Consider just Constant.vector(sourceVector)

Contributor Author

Done.

),
euclideanDistance(Constant.ofVector(sourceVector), targetVector).as(
'euclideanDistance'
)
Contributor

This pattern will support KNN. If eventually there is ANN support we may need a different pattern. Do we need to consider that now?

Contributor Author

Not sure I understand.

it('testNestedFields', async () => {
const results = await randomCol
.pipeline()
.where(eq('awards.hugo', true))
Contributor

Cool. Does this make mapGet redundant?

Contributor Author

See above.

} else if (selectable instanceof ExprWithAlias) {
const expr = selectable as ExprWithAlias<Expr>;
result.set(expr.alias, expr.expr);
}
Contributor

If Selectable is an interface, should we make the interface contain the common methods you need here:

interface Selectable {
   // @internal
   toExprMap(): Map<string, Expr>;
}
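
A minimal sketch of how that interface could remove the instanceof chain. The classes below are simplified, hypothetical stand-ins for the PR's Field and ExprWithAlias, and `selectablesToMap` is an invented helper name.

```typescript
// Simplified, hypothetical stand-ins for the PR's classes.
interface Expr {
  readonly kind: string;
}

interface Selectable {
  // Each implementation reports the alias -> expression pairs it contributes.
  toExprMap(): Map<string, Expr>;
}

class Field implements Expr, Selectable {
  readonly kind = "field";
  constructor(readonly fieldName: string) {}
  toExprMap(): Map<string, Expr> {
    return new Map<string, Expr>([[this.fieldName, this]]);
  }
}

class ExprWithAlias implements Selectable {
  constructor(readonly expr: Expr, readonly alias: string) {}
  toExprMap(): Map<string, Expr> {
    return new Map<string, Expr>([[this.alias, this.expr]]);
  }
}

// The caller merges the maps without checking concrete types.
function selectablesToMap(selectables: Selectable[]): Map<string, Expr> {
  const result = new Map<string, Expr>();
  for (const selectable of selectables) {
    selectable.toExprMap().forEach((expr, alias) => result.set(alias, expr));
  }
  return result;
}

const titleField = new Field("title");
const selected = selectablesToMap([
  titleField,
  new ExprWithAlias(new Field("rating"), "score"),
]);
```

Adding a new Selectable type then only requires implementing toExprMap(), with no change to the caller.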

return new Pipeline(this.db, copy);
}

distinct(...groups: (string | Selectable)[]): Pipeline {
Contributor

Consider making distinct and select (and possibly other functions) consistent in their API surface.

Either make the API

select(...fields: (Selectable | string)[]): Pipeline
distinct(...groups: (Selectable | string)[]): Pipeline

Or:

select(...fields: string[]): Pipeline
select(...fields: Selectable[]): Pipeline
distinct(...groups: string[]): Pipeline
distinct(...groups: Selectable[]): Pipeline

Contributor Author

Done.
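
One way the first option could look, sketched with simplified, hypothetical classes (the real Pipeline and Selectable carry more state). The helper name `toSelectables` is an assumption made for illustration.

```typescript
// Hypothetical, simplified types; the PR's real classes carry more state.
class Field {
  constructor(readonly fieldName: string) {}
}
type Selectable = Field; // stand-in for the PR's Selectable

// One shared normalizer keeps select() and distinct() consistent.
function toSelectables(args: (Selectable | string)[]): Selectable[] {
  return args.map((arg) => (typeof arg === "string" ? new Field(arg) : arg));
}

type StageRecord = { stage: string; fields: string[] };

class Pipeline {
  constructor(readonly stages: StageRecord[] = []) {}

  select(...fields: (Selectable | string)[]): Pipeline {
    return this.addStage("select", fields);
  }

  distinct(...groups: (Selectable | string)[]): Pipeline {
    return this.addStage("distinct", groups);
  }

  private addStage(stage: string, args: (Selectable | string)[]): Pipeline {
    const fields = toSelectables(args).map((f) => f.fieldName);
    // Return a copy so each call yields a new pipeline.
    return new Pipeline(this.stages.concat([{ stage, fields }]));
  }
}

const pipeline = new Pipeline()
  .select("title", new Field("author"))
  .distinct(new Field("genre"), "published");
```

With the union signature, callers can mix strings and Selectable values in one call, which the separate-overload option would not allow.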

aggregate(...accumulators: AccumulatorTarget[]): Pipeline;
aggregate(options: {
accumulators: AccumulatorTarget[];
groups?: (string | Selectable)[];
Contributor

Same comment. Maybe this is pushing us to change select to select(...fields: (Selectable | string)[]): Pipeline

Contributor Author

Yeah.

field: string | Field,
vector: FirebaseFirestore.VectorValue | number[],
options: FindNearestOptions
): Pipeline;
Contributor

Note to update this to the single options param design as used in the Query API.

Contributor Author

Yes, this is on my todo.

sort(options: {
orderings: Ordering[];
density?: 'unspecified' | 'required';
truncation?: 'unspecified' | 'disabled';
Contributor

What's the difference between 'unspecified' and undefined (or not passing a value)?

Contributor Author

They are the same. It does not matter anymore though, the backend decided to remove these options.

dev/src/stage.ts Outdated
}
}

export class GenerateStage implements Stage {
Contributor

Should this be GenericStage to match the API method Pipeline.genericStage()?

Contributor Author

Fixed.


arrayConcat(...values: Expr[]): ArrayConcat;
arrayConcat(...values: any[]): ArrayConcat;
arrayConcat(...values: any[]): ArrayConcat {
Contributor

Consider making values a standard array parameter, not rest parameters.

dotProductDistance(other: Expr): DotProductDistance;
dotProductDistance(other: VectorValue): DotProductDistance;
dotProductDistance(other: number[]): DotProductDistance;
dotProductDistance(other: Expr | VectorValue | number[]): DotProductDistance {
Contributor

Might be more accurate to just call this method and the class dotProduct. Drop "Distance".

Contributor Author

Will do.
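
The reasoning behind the rename can be checked numerically: a distance is 0 for identical vectors, while a dot product is largest when vectors align. The functions below are plain numeric helpers written for illustration; they share names with the pipeline functions but are not the PR's implementations.

```typescript
function assertSameLength(a: number[], b: number[]): void {
  if (a.length !== b.length) throw new Error("Vectors must have equal length");
}

function dotProduct(a: number[], b: number[]): number {
  assertSameLength(a, b);
  return a.reduce((sum, value, i) => sum + value * b[i], 0);
}

function euclideanDistance(a: number[], b: number[]): number {
  assertSameLength(a, b);
  const squared = a.reduce((sum, value, i) => {
    const diff = value - b[i];
    return sum + diff * diff;
  }, 0);
  return Math.sqrt(squared);
}

function cosineDistance(a: number[], b: number[]): number {
  const norm = (v: number[]): number => Math.sqrt(dotProduct(v, v));
  return 1 - dotProduct(a, b) / (norm(a) * norm(b));
}

const sourceVector = [1, 0];
const sameVector = [1, 0];
const orthogonalVector = [0, 1];

// Distances are 0 for identical vectors and grow as vectors diverge.
const euclidSame = euclideanDistance(sourceVector, sameVector); // 0
const cosineSame = cosineDistance(sourceVector, sameVector); // 0
const cosineOrthogonal = cosineDistance(sourceVector, orthogonalVector); // 1

// The dot product moves the other way: 1 when aligned, 0 when orthogonal.
const dotSame = dotProduct(sourceVector, sameVector); // 1
const dotOrthogonal = dotProduct(sourceVector, orthogonalVector); // 0
```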

static of(value: Array<any>): Constant;
static of(value: Map<string, any>): Constant;
static of(value: VectorValue): Constant;
static of(value: api.IValue): Constant;
Contributor

If you want to make this more extensible, you could also add an interface

interface FirestoreProtoSerializable {
  toProto(): api.IValue;
}

and have a static of(value: FirestoreProtoSerializable): Constant
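
A sketch of how that could work, assuming a much-reduced `ProtoValue` in place of api.IValue. The `Vector` class and its array encoding are invented for illustration and do not reflect how VectorValue is actually serialized.

```typescript
// Simplified, hypothetical stand-in for api.IValue.
interface ProtoValue {
  stringValue?: string;
  doubleValue?: number;
  arrayValue?: { values: ProtoValue[] };
}

interface FirestoreProtoSerializable {
  toProto(): ProtoValue;
}

class Constant {
  private constructor(readonly proto: ProtoValue) {}

  // A single entry point for any type that knows how to serialize itself.
  static of(value: FirestoreProtoSerializable): Constant {
    return new Constant(value.toProto());
  }
}

// A hypothetical type opts in by implementing toProto().
class Vector implements FirestoreProtoSerializable {
  constructor(readonly components: number[]) {}
  toProto(): ProtoValue {
    return {
      arrayValue: { values: this.components.map((c) => ({ doubleValue: c })) },
    };
  }
}

const vectorConstant = Constant.of(new Vector([0.1, 0.2]));
```

New value types could then be supported without adding another `of` overload for each one.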

}
}

class DotProductDistance extends Function {
Contributor

Same. Just "DotProduct" is probably more accurate.

@MarkDuckworth MarkDuckworth left a comment

Done with reviews for now

return new Fields([Field.of(name), ...others.map(Field.of)]);
}

static ofAll(): Fields {
Contributor

Is this used?

Contributor Author

Will remove.

filterable = true as const;
}

class ArrayContainsAny extends Function implements FilterCondition {
Contributor

Is FilterCondition always a function that returns a boolean? If so, would it be useful to make a more generic BooleanFunction interface in case there is a future use case for boolean functions outside of filtering?

export function add(left: string, right: Expr): Add;
export function add(left: string, right: any): Add;
export function add(left: Expr | string, right: Expr | any): Add {
const normalizedLeft = typeof left === 'string' ? Field.of(left) : left;
Contributor

Great candidate for a helper Field.fromArgument(left). Similar to FieldPath.fromArgument(stringOrFieldPath)

Contributor Author

Will do.

export function add(left: string, right: any): Add;
export function add(left: Expr | string, right: Expr | any): Add {
const normalizedLeft = typeof left === 'string' ? Field.of(left) : left;
const normalizedRight = right instanceof Expr ? right : Constant.of(right);
Contributor

Expr.fromExpressionOrConstant(right)

Contributor Author

Will do.
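
Taken together, the two helper suggestions might reduce `add` to a single line. This is a sketch with simplified, hypothetical classes; only the helper names Field.fromArgument and Expr.fromExpressionOrConstant come from the review comments, and their signatures here are assumptions.

```typescript
// Hypothetical, simplified classes; helper names follow the review suggestions.
abstract class Expr {
  abstract readonly kind: string;

  // Wrap plain values in a Constant; pass expressions through unchanged.
  static fromExpressionOrConstant(value: unknown): Expr {
    return value instanceof Expr ? value : new Constant(value);
  }
}

class Field extends Expr {
  readonly kind = "field";
  constructor(readonly fieldName: string) {
    super();
  }

  // Treat a string as a field name; pass expressions through unchanged.
  static fromArgument(arg: string | Expr): Expr {
    return typeof arg === "string" ? new Field(arg) : arg;
  }
}

class Constant extends Expr {
  readonly kind = "constant";
  constructor(readonly value: unknown) {
    super();
  }
}

class Add extends Expr {
  readonly kind = "add";
  constructor(readonly left: Expr, readonly right: Expr) {
    super();
  }
}

function add(left: Expr | string, right: unknown): Add {
  return new Add(Field.fromArgument(left), Expr.fromExpressionOrConstant(right));
}

const sum = add("rating", 1);
const nested = add(sum, new Field("bonus"));
```

The same two helpers could then be reused by every other function that accepts a field name or a constant.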

export function inAny(element: Expr, others: Expr[]): In;
export function inAny(element: Expr, others: any[]): In;
export function inAny(element: string, others: Expr[]): In; // Added overload
export function inAny(element: string, others: any[]): In; // Added overload
Contributor

If this behavior is different from the classic in() supporting an array, should we also rename Expr.in(...) to Expr.inAny(...)? Or if this only supports "in" on arrays, would it make more sense to name it inArray(...)?

Contributor Author

Will rename Expr.in to inAny.

export function notInAny(element: Expr, others: Expr[]): Not;
export function notInAny(element: Expr, others: any[]): Not;
export function notInAny(element: string, others: Expr[]): Not; // Added overload
export function notInAny(element: string, others: any[]): Not; // Added overload
Contributor

Same

@wu-hui wu-hui left a comment

Thanks for the feedback! I addressed some of those, and maybe we should discuss for the rest.

orderings: Ordering[];
density?: 'unspecified' | 'required';
truncation?: 'unspecified' | 'disabled';
},
Contributor Author

With density and truncation removed, the options object is probably overkill now.

*/
static of(name: string): Field;
static of(path: firestore.FieldPath): Field;
static of(nameOrPath: string | firestore.FieldPath): Field;
Contributor

This overload is redundant. static of(nameOrPath: string | firestore.FieldPath): Field; Either remove this or keep it and remove the two above.
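
A sketch of the second option (keep only the union signature), using hypothetical stand-ins for firestore.FieldPath and the PR's Field class. The `wrap` function is invented to show a caller whose argument is typed as the union.

```typescript
// Hypothetical stand-ins for firestore.FieldPath and the PR's Field class.
class FieldPath {
  constructor(readonly segments: string[]) {}
}

class Field {
  private constructor(readonly path: string) {}

  // A single union signature accepts a string, a FieldPath, or a value typed
  // as the union of both, so separate per-type overloads add nothing here.
  static of(nameOrPath: string | FieldPath): Field {
    return typeof nameOrPath === "string"
      ? new Field(nameOrPath)
      : new Field(nameOrPath.segments.join("."));
  }
}

// A caller that only knows its argument is one of the two types.
function wrap(nameOrPath: string | FieldPath): Field {
  return Field.of(nameOrPath);
}

const byName = Field.of("title");
const byPath = Field.of(new FieldPath(["awards", "hugo"]));
const byUnion = wrap("tags");
```

Keeping only the two narrower overloads instead would reject the call inside `wrap`, since TypeScript does not resolve a union-typed argument across separate overloads.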

* @param value The Firestore proto value.
* @return A new `Constant` instance.
*/
static of(value: api.IValue): Constant;
Contributor

This appears to be redeclared below on line 2083

return this.value;
}

return serializer.encodeValue(this.value)!;
@MarkDuckworth MarkDuckworth Oct 8, 2024

serializer.encodeValue(undefined) will return null, which is a permitted but unhandled case. Do we want to treat undefined as null?
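
The two obvious answers can be sketched as follows. `encodeValue` here is a hypothetical stand-in that only reproduces the behavior described in the comment (null for undefined), and `ProtoValue` is a reduced placeholder for api.IValue.

```typescript
// Hypothetical stand-ins: encodeValue returns null for undefined, as described.
interface ProtoValue {
  nullValue?: "NULL_VALUE";
  stringValue?: string;
}

function encodeValue(value: unknown): ProtoValue | null {
  if (value === undefined) return null;
  if (value === null) return { nullValue: "NULL_VALUE" };
  return { stringValue: String(value) };
}

// Option A: treat undefined as an explicit null value.
function encodeOrNull(value: unknown): ProtoValue {
  return encodeValue(value) ?? { nullValue: "NULL_VALUE" };
}

// Option B: reject undefined so the failure is visible to the caller.
function encodeOrThrow(value: unknown): ProtoValue {
  const encoded = encodeValue(value);
  if (encoded === null) {
    throw new Error("Cannot encode an undefined constant");
  }
  return encoded;
}
```

Either option replaces the non-null assertion (`!`), which would otherwise let a null proto flow into the request unnoticed.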

wu-hui and others added 5 commits October 24, 2024 14:46
* # Conflicts:
#	dev/conformance/runner.ts
#	dev/src/reference/query.ts
#	dev/system-test/firestore.ts
#	types/firestore.d.ts

* fix

Warning: This pull request is touching the following templated files:

tom-andersen and others added 2 commits January 16, 2025 20:40
* new stages

* add tests

* fix

* fix

* Fix code examples

* Add unnest options to `d.ts` file.

* Fix union test

* Add test, fix documentation, refactor.

* Documentation

* Undo
Labels
api: firestore Issues related to the googleapis/nodejs-firestore API. size: xl Pull request size is extra large.
4 participants