Detailed formatcheck.py documentation

Functions

add_item

Parameter(s):

key - Item to be added or modified
value - Unit of measurement to be associated with key
dct - The Dictionary that this is being applied to

If key is already in the dictionary, add the value to the set associated with the key

Otherwise, associate the new key with a new set containing only value

[Ex 1.] key = "Gas", value = "mcf" -> { "Gas": {"mcf"} }

[Ex 2.] key = "Geothermal - Electrical Generation", value = "Kilowatt Hours" 
        -> { "Geothermal - Electrical Generation": {"Kilowatt Hours"} }

        key = "Geothermal - Electrical Generation", value = "Thousands of Pounds" 
        -> { "Geothermal - Electrical Generation": {"Kilowatt Hours", "Thousands of Pounds"} }
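The behaviour described above can be sketched as follows. This is a minimal illustration that assumes dct maps each key to a Python set; it is not necessarily the code in formatcheck.py.

```python
def add_item(key, value, dct):
    """Add value to the set stored under key, creating the set if needed."""
    if key in dct:
        # Key already known: extend its set of units.
        dct[key].add(value)
    else:
        # New key: start a set containing only this value.
        dct[key] = {value}


units = {}
add_item("Gas", "mcf", units)
add_item("Geothermal - Electrical Generation", "Kilowatt Hours", units)
add_item("Geothermal - Electrical Generation", "Thousands of Pounds", units)
print(units)
```

Because the values are sets, adding the same unit twice for one key leaves the dictionary unchanged.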

get_com_pro

Parameter(s):

cols - Columns from a Pandas DataFrame

Checks cols for "Commodity" or "Product"

Returns "n/a" if "Commodity" and "Product" are both present or both missing

Otherwise it returns whichever is present

[Ex 1.] cols = ["Commodity"] -> returns "Commodity"
[Ex 2.] cols = ["Product"] -> returns "Product"
[Ex 3.] cols = ["Commodity", "Product"] -> returns "Commodity"

get_data_type

Parameter(s):

name - Name of the Excel file

Field(s):

lower - name in all lowercase letters
prefixes = ["cy","fy","monthly","company","federal","native","production","revenue","distribution"]

Returns a String based on the Excel file given

If any entries from prefixes are found in name, they will be added to the final String

[Ex] name = "federal_production_CY03-18" -> returns "cyfederalproduction_"
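One way the lookup could work is sketched below. It assumes the prefixes are tested in list order as substrings of the lowercased name and that a trailing underscore is always appended, both inferred from the example above. The last entry of the list is spelled "distribution" here, and PREFIXES is a name chosen for this sketch.

```python
# Prefix list taken from the Field(s) listing; checked in this order.
PREFIXES = ["cy", "fy", "monthly", "company", "federal", "native",
            "production", "revenue", "distribution"]


def get_data_type(name):
    """Concatenate every prefix found in name, followed by an underscore."""
    lower = name.lower()
    # Substring test, so "cy" matches the "CY03-18" part of the file name.
    found = [prefix for prefix in PREFIXES if prefix in lower]
    return "".join(found) + "_"


print(get_data_type("federal_production_CY03-18"))  # cyfederalproduction_
```

Note that the result follows the order of the prefix list, not the order in which the words appear in the file name.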

split_unit

Parameter(s):

string - String to be split

Returns a List of two Strings, split at either the right-most opening parenthesis "(" or the left-most comma ","

[Ex 1.] string = "Gas (mcf)" -> ["Gas", "mcf"]
[Ex 2.] string = "Geothermal - Electrical Generation, Kilowatt Hours"
        -> ["Geothermal - Electrical Generation", "Kilowatt Hours"]
[Ex 3.] string = "Geothermal - sulfur" -> ["Geothermal - sulfur", ""]
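The three examples can be reproduced with the sketch below. It assumes the parenthesis is checked before the comma when both are present, and that surrounding whitespace and the closing ")" are stripped. The documentation does not state either detail, so treat this as one possible reading.

```python
def split_unit(string):
    """Split "Item (unit)" or "Item, unit" into [item, unit]."""
    if "(" in string:
        # Split at the right-most "(" and drop the closing ")".
        item, unit = string.rsplit("(", 1)
        unit = unit.replace(")", "")
    elif "," in string:
        # Split at the left-most ",".
        item, unit = string.split(",", 1)
    else:
        # No delimiter: the whole string is the item, with an empty unit.
        item, unit = string, ""
    return [item.strip(), unit.strip()]


print(split_unit("Gas (mcf)"))
print(split_unit("Geothermal - Electrical Generation, Kilowatt Hours"))
print(split_unit("Geothermal - sulfur"))
```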

Class: Setup

get_header

Parameter(s):

file - A Pandas DataFrame

Returns column names as a List

get_unit_dict

Returns a dictionary of items and their units. Calls split_unit and add_item

Product
Salt (tons)
Soda Ash (tons)
Sodium Bi-Carbonate (tons)
Gas (mcf)
Borate Products (tons)

Returns {"Salt" : "tons",
         "Soda Ash" : "tons",
         "Sodium Bi-Carbonate" : "tons",
         "Gas" : "mcf",
         "Borate Products" : "tons"}

Class: FormatChecker

read_config

Parameter(s):

type - Prefix for config file represented by a String

Returns a dictionary based on the JSON file

get_w_count

Parameter(s):

file - A Pandas DataFrame

Returns a tuple: (number of "W"s found in Volume, number of "Withheld"s found in State)

Calendar Year  Land Category  Land Class  State     ...  Product                     Volume
2003           Onshore        Federal     CA        ...  Salt (tons)                 33,622
2003           Onshore        Federal     CA        ...  Soda Ash (tons)             W
2003           Onshore        Federal     CA        ...  Sodium Bi-Carbonate (tons)  W
2003           Onshore        Federal     CA        ...  Gas (mcf)                   4,885.6
2003           Onshore        Federal     Withheld  ...  Borate Products (tons)      31,124

Returns (2,1)
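The counting in the example above can be sketched as follows. To keep the snippet free of dependencies it accepts any mapping from column name to values; a Pandas DataFrame supports the same file["Volume"] indexing. Exact string matching on "W" and "Withheld" is an assumption.

```python
def get_w_count(file):
    """Return (count of "W" in Volume, count of "Withheld" in State)."""
    volume_w = sum(1 for value in file["Volume"] if str(value).strip() == "W")
    state_withheld = sum(
        1 for value in file["State"] if str(value).strip() == "Withheld"
    )
    return (volume_w, state_withheld)


# The State and Volume columns of the sample table above.
table = {
    "State": ["CA", "CA", "CA", "CA", "Withheld"],
    "Volume": ["33,622", "W", "W", "4,885.6", "31,124"],
}
print(get_w_count(table))  # (2, 1)
```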

check_header

Parameter(s):

file - A Pandas DataFrame

Iterates through default header and checks if specific Field Names are present.

Prints out if a Field Name is missing or in the wrong order

Unexpected Field Names are printed separately.

[Ex] default = ["Month", "Calendar Year", "Land Class", "Land Category", "Commodity", "Volume"]
     columns = ["Moth", "Calendar Year", "Land Category", "Land Class", "Commodity", "Volume"]

-> "Month": Not Present, "Land Category": Unexpected Order, "Land Class": Unexpected Order
   New Cols: Moth
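The comparison in this example can be sketched as below. compare_headers is a hypothetical helper, not a function in formatcheck.py: it takes two plain lists and returns its findings instead of printing them. It also assumes "Unexpected Order" means the field sits at a different position than in the default header.

```python
def compare_headers(default, columns):
    """Return ({field: problem}, [unexpected column names])."""
    problems = {}
    for position, field in enumerate(default):
        if field not in columns:
            problems[field] = "Not Present"
        elif columns.index(field) != position:
            # Present, but not where the default header expects it.
            problems[field] = "Unexpected Order"
    # Columns that the default header does not know about.
    new_cols = [col for col in columns if col not in default]
    return problems, new_cols


default = ["Month", "Calendar Year", "Land Class", "Land Category", "Commodity", "Volume"]
columns = ["Moth", "Calendar Year", "Land Category", "Land Class", "Commodity", "Volume"]
problems, new_cols = compare_headers(default, columns)
print(problems)
print("New Cols:", ", ".join(new_cols))
```

With a purely positional check, a single missing column would shift every later field and flag all of them as out of order; the real implementation may handle that case differently.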

check_misc_cols

Parameter(s):

file - A Pandas DataFrame

Iterates through non-numerical fields and checks for unexpected entries.

Also checks Calendar Year

check_nan

Parameter(s):

file - A Pandas DataFrame

Iterates through specific columns and prints out each cell with missing information

check_unit_dict

Parameter(s):

file - A Pandas DataFrame

Iterates through the column with expected units. Splits each entry into item and unit, then compares them to the default unit dictionary to determine whether the entry is valid
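A rough sketch of this validation is shown below. find_unit_problems is a hypothetical helper that works on a plain list of entries rather than a DataFrame, handles only the "Item (unit)" form, and assumes the default dictionary maps each item to a set of allowed units. The sample entries are made up for illustration.

```python
def find_unit_problems(entries, default_units):
    """Return (entry, reason) pairs for entries that fail validation."""
    problems = []
    for entry in entries:
        # Split "Item (unit)" into its two parts.
        item, _, unit = entry.partition("(")
        item, unit = item.strip(), unit.replace(")", "").strip()
        if item not in default_units:
            problems.append((entry, "unknown item"))
        elif unit not in default_units[item]:
            problems.append((entry, "unexpected unit"))
    return problems


default_units = {"Gas": {"mcf"}, "Salt": {"tons"}}
entries = ["Gas (mcf)", "Salt (lbs)", "Gold (oz)"]  # illustrative data only
print(find_unit_problems(entries, default_units))
```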
