-
Notifications
You must be signed in to change notification settings - Fork 153
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Re-factoring Categorical ROI #601
Conversation
np.array([True, True, False])) | ||
|
||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I remember the problem correctly, the issue is that CategoricalComponents are internally represented as integer arrays, with a mapping from index to category. This mapping was potentially different for two Components, even if their categories overlapped / were the same. Because the ROI logic was using the underlying numerical arrays and not the categories, they in general wouldn't filter properly.
Do I have that right? If so, let's add some tests that setup up multiple CategoricalCoponents with different number->category mappings, and ensure they filter correctly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct. In the CategoricalRoi code proper I have some logic to deal with inputs of list
, np.array
, pd.Series
and CategoricalComponents
and transparently deal with all variants.
With the components I access their underlying .categorical_data
to get the actual data instead of the digitized versions. This way it will be comparable across different Data
objects, something that didn't really work in the previous iterations.
This is great! This behavior has been broken for too long. Regarding integrating with the viewers: after a user draws an ROI on the screen viewers typically call apply_roi (/~https://github.com/glue-viz/glue/blob/master/glue/clients/scatter_client.py#L254). These functions take an roi as input, build a subset from it, and then apply the subset. So ideally you would hook into this function and customize the ROI before building a subset state. Note that multiple different ROIs flow into this function, so you shouldn't assume (eg) a RangeROI. |
Yeah, I saw that area, I figured there had to be something that returns an RoiSubsetState to be passed along to other areas ... but I guess that's done through the magic of callbacks. From a UI standpoint, do you think I should only accept Range and Rectangular ROIs when one (or both) axes are Categorical? I can't quite picture how I would implement polygonal ROI with one categorical axis and one continuous axis. |
I agree that polygonal selection probably doesn't make sense. In addition to rectangular/range selection it might be cool in future for categorical histograms to allow clicking on a single bin or multiple bins (pressing the shift or command key) to select multiple histogram bins. |
Oooo, that would be really slick. Refactoring the PointRoi should work for On Thu, Apr 2, 2015, 7:33 AM Thomas Robitaille notifications@github.com
|
This is actually the internal behavior of the range selector in the histogram (ie it expands to bin boundaries), but it would be good to actually expose that in the UI and let the user shift between smooth/snapped selection |
We rely on the kind of ugly use of the EditSubsetMode singleton to actually decide how to apply subsets to the datasets. It's dumb :). The upshot is that in that function you should just have to modify the roi, build an otherwise-normal RoiSubsetState, and let EditSubsetMode apply it. |
Good question. Unfortunately we don't have good UI mechanisms that alert the user why a particular subset isn't valid. So it might be confusing to implement this (unless you want to flat-out disable the mouse modes that let users draw an unsupported ROI). You could imagine implementing a hybrid categorical/continuous ROI -- there are a couple of ways to define it, but one way to do it is to take the bounding-box of the shape, apply a categorical filter on one dimension, a range filter on the other, and "and" them together. |
This is what I intended to do. But it only works with a rectangular ROI; an arbitrarily constructed polygon (or circle) can't really be built this way. Is there a mechanism for "graying out" particular mouse modes? If so, I could just gray-out the ones that aren't logical when the user makes one of the axes a categorical one. |
@JudoWill - thanks! could you rebase to make sure that Travis passes now? (there was an issue that should be fixed in master). Is this ready for a final review? |
@astrofrog - I want to add a few more tests and add more docs to how the categorical system works. |
@JudoWill - ok, thanks! |
9384003
to
ce4070e
Compare
@JudoWill - thanks! There is one failure on Travis which looks genuine and is only being caught by older versions of Numpy (but is in fact a real problem) - when
should not be called. The only reason this works in recent versions of Numpy is because it returns:
which I don't think is what we want. |
Yup, definitely a bug. Fixed it and added a test that should make sure it doesn't pop up again. |
This looks good! @JudoWill - did you want to include any docs about this? (you said before * I want to add ... more docs to how the categorical system works.*). Just thought I'd check before we go ahead and merge :) |
@astrofrog You can merge it, I'm making another PR with a whole bunch of sphinx docs edits. I put the immediately important docs in the doc-strings already in the code. |
Ok, sounds good! @ChrisBeaumont - does this look good to you too? @JudoWill - when preparing the other pull request, can you also add two entries to the |
+1 As a followup, I think we should refactor logic branches in the viewers that look like:
And instead add |
Re-factoring Categorical ROI
As part of the effort @aak65 is doing implementing some Bio-glue features he's going to need a more robust ability to pass around ROIs and subsets that are based on categorical data in a way that's more robust then my approach before ... which was basically convert everything to numbers and hope it all worked out.
This has all of the backings for the new ROI/Subset but I can't quite figure out where to fold it into the Historgram & Scatter viewers. My ideal situation would be after the user uses a Range-selector on an axis defined by a categorical component it would adjust the returned ROI accordingly. Any ideas on where I would find that logic? @ChrisBeaumont or @astrofrog