-
Notifications
You must be signed in to change notification settings - Fork 106
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Models update 2 #924
base: main
Are you sure you want to change the base?
Models update 2 #924
Conversation
Add validation and tests for the following models: - Bounding box - Pose - Segments Add conversion to/from different bounding box formats, including normalized versions.
Deploying datachain-documentation with
|
Latest commit: |
fdc7f1b
|
Status: | ✅ Deploy successful! |
Preview URL: | https://730b1e9f.datachain-documentation.pages.dev |
Branch Preview URL: | https://models-update-2.datachain-documentation.pages.dev |
from pydantic import Field | ||
|
||
from datachain.lib.data_model import DataModel | ||
from datachain.model.utils import BBoxType, convert_bbox | ||
|
||
|
||
class BBox(DataModel): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This model is trivial now. The only thing I've added is type conversion in convert
method.
The bounding box is defined by two points: | ||
- (x1, y1): The top-left corner of the box. | ||
- (x2, y2): The bottom-right corner of the box. | ||
coords (list[float]): The coordinates of the bounding box. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Floats are more universal here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why? could you give more details?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure! Since it is a generic model, user can decide to store bounding boxes of any format here: COCO, YOLO, Albumentation, etc. Normalized or not. It is just a bare model.
To be honest I don't think we even need these bare models for BBox, Pose, etc, we can leave only models for YOLO for now and add other dedicated models later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it can break some downstream stuff in data loader / training though? - if it expects ints
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also, should we include type into the class? to know is it yolo or something else?
name: list[str] = Field(default=[]) | ||
confidence: list[float] = Field(default=[]) | ||
box: list[list[float]] = Field(default=[]) | ||
orig_shape: list[int] = Field(default=[]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've decided it is also a good idea to keep orig_shape
here, to do not read image file when shape is needed.
else (0, 0) | ||
) | ||
|
||
def to_albumentations(self) -> Iterator[list[float]]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
YoloBox
have bounding box type conversion (also below)
right_ankle = 16 | ||
|
||
|
||
class YoloPose(YoloBox): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yolo pose result have everything yolo bbox have + poses.
) | ||
|
||
|
||
class YoloSeg(YoloBox): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yolo segment result have everything yolo bbox have + segments.
) | ||
|
||
|
||
class YoloCls(DataModel): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if anyone will use it but Yolo have models so I've added this model here too.
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #924 +/- ##
==========================================
+ Coverage 87.66% 88.19% +0.52%
==========================================
Files 130 128 -2
Lines 11700 11680 -20
Branches 1592 1612 +20
==========================================
+ Hits 10257 10301 +44
+ Misses 1043 979 -64
Partials 400 400
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
title=title, | ||
coords=[round(c) for c in coords], | ||
) | ||
def from_list(coords: Sequence[float], title: str = "") -> "BBox": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Leave this method for now since it is used in many places, but mark it as deprecated
. No need in separate method if one can build this model with __init__
method directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
internally?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean instead of using:
BBox.from_list([10, 20, 30, 40])
customers can use:
BBox(coords=[10, 20, 30, 40])
We don't use this model anymore in the datachain source code for now (except for the examples).
assert_array_almost_equal(yolo_boxes[1], [0.9, 0.9, 0.2, 0.2], decimal=3) | ||
|
||
|
||
def test_yolo_obb(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Q: what to we test with this kind of tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are testing if models were not changed, since other code may rely on them.
def test_pose(): | ||
pose = Pose(x=[10, 20, 30], y=[40, 50, 60]) | ||
assert pose.model_dump() == { | ||
"x": [10.0, 20.0, 30.0], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should these be ints?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
||
|
||
class BBox(DataModel): | ||
""" | ||
A data model for representing bounding box. | ||
|
||
Use `datachain.model.Yolo` or for YOLO-specific bounding boxes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like there is a typo
Returns: | ||
BBox: The bounding box instance. | ||
""" | ||
warn( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we don't need to deprecate anything - we can break it for now
Another approach to make better Yolo models along with bounding boxes conversions.
Instead of trying to solve unsolvable problem with having one universal model for Bounding Boxes (same for Poses and Segments), it looks like it is better to have dedicated models for every common format, e.g. Yolo. Plus to have really basic and trivial universal models and let users to override them and decide what to do next.
See also updated docs page.
Checklist: