Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove df property and change to_pandas_dataframe to to_pandas #146

Merged
merged 9 commits into from
Sep 25, 2020

Conversation

tamargrey
Copy link
Contributor

@tamargrey tamargrey commented Sep 24, 2020

closes #116
closes #147
closes #148

  • removes .df property
  • changes to_pandas_dataframe to to_pandas
  • makes dataframe a protected attribute by changing to .dataframe
  • remove replace_none parameter and don't fillna in the DataTables init

@codecov
Copy link

codecov bot commented Sep 24, 2020

Codecov Report

Merging #146 into main will increase coverage by 0.00%.
The diff coverage is 100.00%.

Impacted file tree graph

@@           Coverage Diff           @@
##             main     #146   +/-   ##
=======================================
  Coverage   99.82%   99.82%           
=======================================
  Files          20       20           
  Lines        1740     1742    +2     
=======================================
+ Hits         1737     1739    +2     
  Misses          3        3           
Impacted Files Coverage Δ
woodwork/data_table.py 100.00% <100.00%> (+0.38%) ⬆️
woodwork/tests/data_table/test_datatable.py 100.00% <100.00%> (ø)
woodwork/tests/testing_utils/data_table_utils.py 100.00% <100.00%> (ø)
woodwork/tests/conftest.py 90.90% <0.00%> (-9.10%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 466a702...e82cbe5. Read the comment docs.

@thehomebrewnerd
Copy link
Contributor

I'm still thinking about whether to_pandas should return a copy of the dataframe or a reference to the dataframe (same with the .dataframe attribute on the table).

Consider this scenario:

  • User creates a data table object that contains an integer column - we set this up with Integer logical type and Int64 dtype
  • User accesses the dataframe from this data table to use in other calculations and during this the integer column is modified in place - say by a division operation that converts the integers into floats
  • Now, if the user runs dt.types on this DataTable, the LogicalType will still be Integer, but the underlying dataframe columns will have a dtype of float64

We are now out of sync - and the logical type information on the DataTable is not accurate. Should we allow this to happen? Same type of thing could happen if a string column got converted into integers/floats.

Copy link
Contributor

@gsheni gsheni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor sugggestions

woodwork/data_table.py Outdated Show resolved Hide resolved
woodwork/data_table.py Outdated Show resolved Hide resolved
def to_pandas(self, copy=False):
"""Retrieves the DataTable's underlying dataframe.

Note: Do not modify the dataframe unless copy=True has been set to avoid unexpected behavior
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice

Copy link
Contributor

@gsheni gsheni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@tamargrey tamargrey merged commit afc94eb into main Sep 25, 2020
@gsheni gsheni mentioned this pull request Sep 28, 2020
@gsheni gsheni deleted the to_pandas branch October 16, 2020 20:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants