Added filter for when row is blank #5

noborus · 2023-11-15T01:31:14Z

Summary by CodeRabbit

New Features
- Introduced a new naming system for columns in the reader to enhance clarity.
- Implemented a function to generate cell names based on column index for better identification.
Enhancements
- Improved data filtering logic to terminate early if no valid data is found, optimizing performance.
Bug Fixes
- Fixed an issue with default column naming to ensure consistency across the application.
Tests
- Added new test cases to verify the correct functioning of the updated column naming and data filtering features.
- Updated existing tests to align with the new column naming convention.

If cell is specified, blank columns and blank rows are not allowed.

Column names when there is no header row are now in Excel notation.

coderabbitai · 2023-11-15T01:31:48Z

Walkthrough

The updates involve enhancing an XLSX reader's functionality by introducing a cellName function to dynamically generate column names based on indices, improving the filterColumns function to better handle valid column data, and adding a columnNum function to determine the number of valid columns. The nameType function has also been updated to utilize the new cellName function for naming columns.

Changes

File(s)	Summary
`reader.go`	Introduced `cellName` function for dynamic column naming, improved `filterColumns` for better data handling, added `columnNum` to calculate valid columns, and updated `nameType` to use `cellName`.
`reader_test.go`	Added new test case "test3" for `XLSXReader` and updated expected column names in `TestXLSXReader_Names`.

🐇 In the code's burrow, beneath the autumn's glow,
Changes abound, as the data columns grow.
With a hop and a skip, the reader's now adept,
Parsing sheets with grace, while the old bugs have leapt. 🍂📊

Tips

Chat with CodeRabbit Bot (`@coderabbitai`)

If you reply to a review comment from CodeRabbit, the bot will automatically respond.
To engage with CodeRabbit bot directly around the specific lines of code in the PR, mention @coderabbitai in your review comment
Note: Review comments are made on code diffs or files, not on the PR overview.
Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.

CodeRabbit Commands (invoked as PR comments)

@coderabbitai pause to pause the reviews on a PR.
@coderabbitai resume to resume the paused reviews.
@coderabbitai review to trigger a review. This is useful when automatic reviews are disabled for the repository.
@coderabbitai resolve resolve all the CodeRabbit review comments.
@coderabbitai help to get help.

Note: For conversation with the bot, please use the review comments on code diffs or files.

CodeRabbit Configration File (`.coderabbit.yaml`)

You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
The JSON schema for the configuration file is available here.
If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/coderabbit-overrides.v2.json

coderabbitai

Review Status

Actionable comments generated: 3

Configuration used: CodeRabbit UI

Commits

Files that changed from the base of the PR and between 1226dc8 and 7b53ac3.

Files ignored due to filter (1)

testdata/test3.xlsx

Files selected for processing (2)

reader.go (5 hunks)
reader_test.go (2 hunks)

Additional comments: 4

reader_test.go (2)

104-121: The new test case "test3" has been added correctly to test the PreReadRow function. It is important to ensure that the testdata/test3.xlsx file contains the expected data to match the test case. The use of InHeader and InJQuery options seems appropriate for the intended test scenario.

147-152: The change in expected column names from {"C1", "C2"} to {"A1", "B1"} in the TestXLSXReader_Names function reflects the new column naming logic introduced in the reader.go file. This change is consistent with the pull request summary and ensures that the test case will validate the new functionality.

reader.go (2)

95-99: The cellName function is used to generate column names based on the index, which is a good approach for dynamic naming. However, ensure that the cellName function is thoroughly tested to handle edge cases, such as very large indices that might result in column names not typically used in spreadsheets (e.g., beyond 'ZZ' for Excel).

184-190: The nameType function now uses the cellName function to generate a unique name if a duplicate is found, which is a good practice to avoid column name collisions. However, ensure that the cellName function generates a name that is guaranteed to be unique in the context of the current sheet.

coderabbitai · 2023-11-15T01:32:30Z

reader.go

 				validColumns[c] = true
 			}
 			c++
+			if len(data) <= c {
+				break
+			}


The loop termination condition seems incorrect. The loop should not break based on the length of data but rather should iterate over all the columns. The current condition could lead to missing columns if cellX is greater than 0.

- if len(data) <= c { - break - }

Commitable suggestion

[!IMPORTANT]
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change

validColumns[c] = true

}

c++

if len(data) <= c {

break

}

validColumns[c] = true

}

c++

coderabbitai · 2023-11-15T01:32:30Z

reader.go

 			break
 		}
 	}
-	newBody := make([][]interface{}, 0, len(body))
-	for _, row := range body {
+	dst := make([][]interface{}, 0, len(src))
+	for _, row := range src {
 		cols := make([]interface{}, count)
+		valid := false
 		for i := 0; i < count; i++ {
 			cols[i] = row[i]
+			if cols[i] != "" {
+				valid = true
+			}
+		}
+		if !valid {
+			break
 		}
-		newBody = append(newBody, cols)
+		dst = append(dst, cols)
 	}
-	return newBody
+	return dst
 }

 func parseExtend(ext string) (string, string) {


Note: This review was outside of the patch, so it was mapped to the patch with the greatest overlap. Original lines [138-164]

The filterColumns function now breaks the loop if a row without valid data is encountered. This change assumes that once a blank row is found, all subsequent rows are also blank or invalid, which might not be the case. If the data set can have non-consecutive blank rows, this logic will need to be revised.

- if !valid { - break - }

Recognizes a blank row as the end of the row (ignoring the first blank row). Fixed so that cellName correctly returns an Excel-like name.

coderabbitai

Review Status

Actionable comments generated: 2

Configuration used: CodeRabbit UI

Commits

Files that changed from the base of the PR and between 7b53ac3 and dba3850.

Files selected for processing (1)

reader.go (4 hunks)

Additional comments: 5

reader.go (5)

95-99: The cellName function is used to generate default column names when a name is not provided. This is a good approach to ensure that all columns have names. However, it's important to ensure that the cellName function is robust and handles potential errors gracefully, as previously mentioned in the outdated comments.

122-130: The filterColumns function is used to filter out invalid columns and rows with no valid data. This is a good performance optimization as it avoids processing empty rows. However, the function should be reviewed to ensure that it correctly identifies the start of valid rows (startRow) and breaks out of the loop when a row with no valid data is encountered after the first valid row.

141-165: The filterColumns function filters the source data based on valid columns. The logic seems to be correct, but it's important to ensure that the startRow flag is used correctly to identify when to start appending data and when to break out of the loop. The current logic seems to break out of the loop as soon as it encounters a row with no valid data after the first valid row, which is the intended behavior.

168-180: The columnNum function calculates the number of valid columns. The logic here is to count the columns until a false value is encountered after the first true value. This assumes that valid columns are contiguous from the start, which is a reasonable assumption in most cases. However, if there's a possibility of valid columns being non-contiguous, this logic would need to be revised.

199-205: The nameType function is responsible for generating column names and types. It uses a map to ensure that column names are unique. If a duplicate name is found, it generates a new name using the cellName function. This is a good approach to handle potential duplicate column names.

coderabbitai · 2023-11-15T03:06:03Z

reader.go

+func cellName(i int) string {
+	cn, err := excelize.CoordinatesToCellName(i+1, 1)
+	if err != nil {
+		return fmt.Sprintf("C%d", i+1)
+	}
+	return cn


The cellName function generates a cell name based on the column index. The error handling here defaults to a generic name format if an error occurs. This could potentially lead to column name collisions. It would be better to propagate the error up to the caller or handle it in a way that avoids collisions.

if err != nil { - return fmt.Sprintf("C%d", i+1) + // Handle the error more appropriately, e.g., log it and propagate it up + return "", err }

Commitable suggestion

[!IMPORTANT]
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change

func cellName(i int) string {

cn, err := excelize.CoordinatesToCellName(i+1, 1)

if err != nil {

return fmt.Sprintf("C%d", i+1)

}

return cn

func cellName(i int) (string, error) {

cn, err := excelize.CoordinatesToCellName(i+1, 1)

if err != nil {

// Handle the error more appropriately, e.g., log it and propagate it up

return "", err

}

return cn, nil

@noborus Acknowledged. I will update the cellName function to return an error along with the cell name, allowing the caller to handle it appropriately.

func cellName(i int) (string, error) { cn, err := excelize.CoordinatesToCellName(i+1, 1) if err != nil { // Log the error and propagate it up return "", err } return cn, nil }

This change will require updating all calls to cellName to handle the potential error. Would you like me to proceed with this change and adjust the rest of the code accordingly?

thank you. I'll do it myself.

@noborus Understood. If you need any further assistance or have more requests in the future, feel free to reach out. Good luck with your implementation!

coderabbitai

Review Status

Actionable comments generated: 2

Configuration used: CodeRabbit UI

Commits

Files that changed from the base of the PR and between dba3850 and 33f27f6.

Files selected for processing (1)

reader.go (4 hunks)

Additional comments: 3

reader.go (3)

95-103: The cellName function is used to generate a name for unnamed columns. Ensure that the cellX + i calculation correctly reflects the intended column index, as cellX is decremented earlier in the code if extCell is not empty. This could potentially lead to an off-by-one error.

126-134: The filterColumns function is used to filter out invalid columns. It's important to ensure that the validColumns slice accurately reflects the columns that contain valid data. If the logic for determining valid columns is incorrect, this could result in valid data being omitted or invalid data being included.

145-167: The filterColumns function now breaks out of the loop when a row without valid data is encountered after a row with valid data. This assumes that all rows with valid data are contiguous and that once a row with no valid data is found, no subsequent rows will contain valid data. If this assumption does not hold true for all datasets, this could lead to valid data being skipped. It may be necessary to verify this assumption against the expected data format.

coderabbitai · 2023-11-15T03:50:23Z

reader.go

 		}
 		data := make([]interface{}, columnNum)
 		for c, i := 0, cellX; i < len(row); i++ {
+			if c >= columnNum {
+				break
+			}
 			data[c] = row[i]
 			if data[c] != "" {
 				validColumns[c] = true


The increment of c should be inside the loop to ensure it only increments when a cell is processed. The current placement of c++ outside the loop's braces will not increment c as intended. This could lead to incorrect data alignment in the data slice.

if data[c] != "" { validColumns[c] = true - } - c++ + c++ + }

Commitable suggestion

[!IMPORTANT]
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change

}

data := make([]interface{}, columnNum)

for c, i := 0, cellX; i < len(row); i++ {

if c >= columnNum {

break

}

data[c] = row[i]

if data[c] != "" {

validColumns[c] = true

}

data := make([]interface{}, columnNum)

for c, i := 0, cellX; i < len(row); i++ {

if c >= columnNum {

break

}

data[c] = row[i]

if data[c] != "" {

validColumns[c] = true

c++

}

coderabbitai · 2023-11-15T03:50:23Z

reader.go

 	for i := cellX; i < cellX+columnNum; i++ {
 		if header && len(row) > i && row[i] != "" {
 			if _, ok := nameMap[row[i]]; ok {
-				names[c] = fmt.Sprintf("C%d", i+1)
+				name, err := cellName(cellX + i)
+				if err != nil {
+					names[c] = row[i] + "_" + fmt.Sprint(i)
+				}
+				names[c] = name
 			} else {
 				nameMap[row[i]] = true
 				names[c] = row[i]


The nameType function now uses the cellName function to generate names for unnamed columns or to handle duplicate names. Ensure that the logic for handling duplicates is correct and that the nameMap is being used effectively to track existing names. Also, there is a potential bug where names[c] could be assigned twice in succession if an error occurs, which would overwrite the first assignment with the second one.

if _, ok := nameMap[row[i]]; ok { name, err := cellName(cellX + i) if err != nil { - names[c] = row[i] + "_" + fmt.Sprint(i) + names[c] = row[i] + "_" + fmt.Sprint(c) } - names[c] = name + else { + names[c] = name + } } else { nameMap[row[i]] = true names[c] = row[i] }

Commitable suggestion

[!IMPORTANT]
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change

for i := cellX; i < cellX+columnNum; i++ {

if header && len(row) > i && row[i] != "" {

if _, ok := nameMap[row[i]]; ok {

names[c] = fmt.Sprintf("C%d", i+1)

name, err := cellName(cellX + i)

if err != nil {

names[c] = row[i] + "_" + fmt.Sprint(i)

}

names[c] = name

} else {

nameMap[row[i]] = true

names[c] = row[i]

for i := cellX; i < cellX+columnNum; i++ {

if header && len(row) > i && row[i] != "" {

if _, ok := nameMap[row[i]]; ok {

name, err := cellName(cellX + i)

if err != nil {

names[c] = row[i] + "_" + fmt.Sprint(c)

} else {

names[c] = name

}

} else {

nameMap[row[i]] = true

names[c] = row[i]

coderabbitai

Review Status

Actionable comments generated: 0

Configuration used: CodeRabbit UI

Commits

Files that changed from the base of the PR and between 33f27f6 and 1c89ea3.

Files selected for processing (1)

reader.go (4 hunks)

noborus added 3 commits November 15, 2023 10:06

Added filter for when row is blank

9c7ed9a

If cell is specified, blank columns and blank rows are not allowed.

Changed column names to Excel notation

d70b707

Column names when there is no header row are now in Excel notation.

Add test3.xlsx

7b53ac3

coderabbitai bot reviewed Nov 15, 2023

View reviewed changes

Blank lines are also filtered

dba3850

Recognizes a blank row as the end of the row (ignoring the first blank row). Fixed so that cellName correctly returns an Excel-like name.

coderabbitai bot reviewed Nov 15, 2023

View reviewed changes

Changed to return error

33f27f6

coderabbitai bot reviewed Nov 15, 2023

View reviewed changes

Fixed error assignment

1c89ea3

coderabbitai bot reviewed Nov 15, 2023

View reviewed changes

noborus merged commit 36d824a into main Nov 15, 2023

noborus deleted the row-filter branch November 15, 2023 04:45

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Added filter for when row is blank #5

Added filter for when row is blank #5

noborus commented Nov 15, 2023 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Nov 15, 2023 •

edited

Loading

Chat with CodeRabbit Bot (`@coderabbitai`)

CodeRabbit Commands (invoked as PR comments)

CodeRabbit Configration File (`.coderabbit.yaml`)

coderabbitai bot left a comment

coderabbitai bot Nov 15, 2023

coderabbitai bot Nov 15, 2023

coderabbitai bot left a comment

coderabbitai bot Nov 15, 2023 •

edited

Loading

noborus Nov 15, 2023

coderabbitai bot Nov 15, 2023

noborus Nov 15, 2023

coderabbitai bot Nov 15, 2023

coderabbitai bot left a comment

coderabbitai bot Nov 15, 2023

coderabbitai bot Nov 15, 2023

coderabbitai bot left a comment

Added filter for when row is blank #5

Added filter for when row is blank #5

Conversation

noborus commented Nov 15, 2023 • edited by coderabbitai bot Loading

Summary by CodeRabbit

coderabbitai bot commented Nov 15, 2023 • edited Loading

Walkthrough

Changes

Chat with CodeRabbit Bot (@coderabbitai)

CodeRabbit Commands (invoked as PR comments)

CodeRabbit Configration File (.coderabbit.yaml)

coderabbitai bot left a comment

Choose a reason for hiding this comment

coderabbitai bot Nov 15, 2023

Choose a reason for hiding this comment

coderabbitai bot Nov 15, 2023

Choose a reason for hiding this comment

coderabbitai bot left a comment

Choose a reason for hiding this comment

coderabbitai bot Nov 15, 2023 • edited Loading

Choose a reason for hiding this comment

noborus Nov 15, 2023

Choose a reason for hiding this comment

coderabbitai bot Nov 15, 2023

Choose a reason for hiding this comment

noborus Nov 15, 2023

Choose a reason for hiding this comment

coderabbitai bot Nov 15, 2023

Choose a reason for hiding this comment

coderabbitai bot left a comment

Choose a reason for hiding this comment

coderabbitai bot Nov 15, 2023

Choose a reason for hiding this comment

coderabbitai bot Nov 15, 2023

Choose a reason for hiding this comment

coderabbitai bot left a comment

Choose a reason for hiding this comment

noborus commented Nov 15, 2023 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Nov 15, 2023 •

edited

Loading

Chat with CodeRabbit Bot (`@coderabbitai`)

CodeRabbit Configration File (`.coderabbit.yaml`)

coderabbitai bot Nov 15, 2023 •

edited

Loading