Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

De-hack uns for single gene #309

Closed
csweaver opened this issue Oct 11, 2018 · 10 comments · Fixed by #371
Closed

De-hack uns for single gene #309

csweaver opened this issue Oct 11, 2018 · 10 comments · Fixed by #371
Assignees
Labels
backend core backend (rest api, web, engine) cleanup

Comments

@csweaver
Copy link
Contributor

data.uns = cell_data.uns fails if there is a single gene. Need to fix.

@csweaver csweaver self-assigned this Oct 11, 2018
@csweaver csweaver added backend core backend (rest api, web, engine) cleanup labels Oct 11, 2018
@csweaver
Copy link
Contributor Author

This may be a scanpy error, get Alex minimal test case.

@csweaver
Copy link
Contributor Author

Submitted issue to scanpy: scverse/scanpy#323

@falexwolf
Copy link
Contributor

Fixed in scverse/anndata@ae8a1d0. Thank you!

@bkmartinjr
Copy link
Contributor

FYI I'm also seeing a non-trivial performance cost to the .uns deepcopy() on each slice of an anndata object. Still digging into why, but given that the cellxgene back-end has no dependency on this, we may just want to clobber .uns upon load

The code in question: anndata/base.py:_init_as_view(), lines 689::

        uns_new = deepcopy(self._adata_ref._uns)

@bkmartinjr
Copy link
Contributor

@falexwolf - thanks for the fix! Do you expect it to be released soon? It would allow us to remove a bunch of special-case code which handles the forced dimension drop. Thanks!

@bkmartinjr
Copy link
Contributor

Very related issue, just filed on scanpy -- you can't slice both dimensions if one of the results is single-valued. There appears to be an internal inconsistency with how dimension flattening is done in AnnData.

scverse/scanpy#332

@bkmartinjr bkmartinjr self-assigned this Oct 28, 2018
@bkmartinjr
Copy link
Contributor

@csweaver - I will clean up the uns work-around as part of the work I'm doing on slicing performance. It is closely related to the issue I just filed against scanpy.

@falexwolf
Copy link
Contributor

FYI I'm also seeing a non-trivial performance cost to the .uns deepcopy() on each slice of an anndata object. Still digging into why, but given that the cellxgene back-end has no dependency on this, we may just want to clobber .uns upon load

Sorry about the late response: .uns also stores the graph of cells that is used in the embeddings. This graph is sliced when slicing along the observations dimension. It is just copied when slicing along the variables dimension.

I see that this is an efficient solution if you want to extract single genes, etc.

Maybe one should find a more light-weight solution for that? But I'd need to know a bit more...

@bkmartinjr
Copy link
Contributor

wrt .uns copy time -- our current use case boils down to slicing, on both axis using boolean array selectors, to access either annotations or X. We do not (yet) expose/use the info stored in .uns to the visualization.

So the "need" is to slice and access X or annotations, without the overhead of copying .uns.

I'm not sure this is a high priority item, especially as it may require API changes, and isn't our largest time sink. Hence the air quotes.

@falexwolf
Copy link
Contributor

We should have a better way for serving this case in the future, agreed!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend core backend (rest api, web, engine) cleanup
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants