Redshift table size estimates gigabytes as kilobytes. #2702

Closed · vogt4nick opened this issue Aug 14, 2020 · 2 comments · Fixed by #2703
Labels: bug (Something isn't working) · good_first_issue (Straightforward + self-contained changes, good for new contributors!) · redshift

Comments

@vogt4nick (Contributor)

Describe the bug

The data catalog offers a feature to estimate table size. For Redshift tables, this estimate reports GB as KB.

[Screenshot: the catalog's size stat for the table reads roughly 44 KB.]

I can verify this references a table that is 44 GB, not 44 KB.

/~https://github.com/fishtown-analytics/dbt/blob/03010ecde71634d11f80949018c39a63070f1008/plugins/redshift/dbt/include/redshift/macros/catalog.sql#L150

The problem behavior appears to be in this line of code where the bytes are divided by 1e6.
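
For a rough sense of the scale mismatch (hypothetical numbers, not dbt's actual code), note that Redshift's svv_table_info.size is a count of 1 MB blocks, so if that count reaches a byte formatter unconverted, a ~44 GB table surfaces as ~44 KB:

```python
# Back-of-the-envelope check of the symptom; values are illustrative only.
# Per the Redshift docs, svv_table_info.size counts 1 MB blocks.
size_mb_blocks = 45_056        # roughly a 44 GB table (44 * 1024 blocks)

# Handed to a byte formatter as-is, the block count reads as 45,056 *bytes*,
# which renders as about 44 KB -- off by a factor of roughly 1e6.
print(size_mb_blocks / 1024)   # 44.0
```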

Steps To Reproduce

AFAICT the bug reproduces for any Redshift table: add the table to a dbt project and build the data catalog.

Expected behavior

The catalog should report the size with the correct unit: roughly 44 GB for this table, not 44 KB.

Screenshots and log output

[Same screenshot as above: the catalog reports roughly 44 KB for a table that is actually 44 GB.]

System information

Which database are you using dbt with?

  • [ ] postgres
  • [x] redshift
  • [ ] bigquery
  • [ ] snowflake
  • [ ] other (specify: ____________)

The output of dbt --version:

installed version: 0.17.2
   latest version: 0.17.2

Up to date!

Plugins:
  - postgres: 0.17.2
  - redshift: 0.17.2
  - snowflake: 0.17.2
  - bigquery: 0.17.2

The operating system you're using: Docs are built with Docker image python:3.7.7-slim-buster. Docs are served with Docker image nginx:1.19.0-alpine.

The output of python --version: 3.7.7

Additional context

/~https://github.com/fishtown-analytics/dbt/blob/fb8065df2711d5f5978865a45cc0725a06e670f1/core/dbt/utils.py#L445

Noting this because it's related: I'm reasonably confident the issue is not in the format_bytes usage linked above.
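
For reference, the linked helper turns a raw byte count into a human-readable string. A minimal sketch of that kind of function (the exact implementation in utils.py may differ) shows it behaves correctly as long as it is handed bytes:

```python
def format_bytes(num_bytes: float) -> str:
    """Render a raw byte count as a human-readable size string (sketch only)."""
    for unit in ("Bytes", "KB", "MB", "GB"):
        if abs(num_bytes) < 1024.0:
            return f"{num_bytes:3.1f} {unit}"
        num_bytes /= 1024.0
    return f"{num_bytes:3.1f} TB"

print(format_bytes(45_056))          # "44.0 KB" -- an MB-block count mistaken for bytes
print(format_bytes(47_244_640_256))  # "44.0 GB" -- the same table expressed in actual bytes
```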

vogt4nick added the bug and triage labels on Aug 14, 2020
jtcohen6 added the good_first_issue and redshift labels and removed the triage label on Aug 14, 2020
@jtcohen6 (Contributor)

Nice catch! Thanks for opening the PR as well. I'll take a look now.

@vogt4nick (Contributor, Author) commented Aug 14, 2020

A minor update:

/~https://github.com/fishtown-analytics/dbt/blob/03010ecde71634d11f80949018c39a63070f1008/plugins/redshift/dbt/include/redshift/macros/catalog.sql#L150

The problem behavior appears to be in this line of code where the megabytes (not bytes, as I wrote above) are divided by 1e6.

The correction is to multiply by 1e6, not to remove the arithmetic entirely.
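
A quick sanity check of the direction, with hypothetical values (the actual fix belongs in the SQL macro linked above):

```python
# Sanity check of the correction; values are illustrative only.
# svv_table_info.size counts 1 MB blocks, and the downstream formatter expects bytes.
size_mb_blocks = 45_056           # roughly a 44 GB table

print(size_mb_blocks / 1e6)       # 0.045056      -> formats as ~0 bytes (dividing is wrong)
print(size_mb_blocks * 1e6)       # 45056000000.0 -> the right order of magnitude, tens of GB
```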
