-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added filter for when row is blank #5
Conversation
If cell is specified, blank columns and blank rows are not allowed.
Column names when there is no header row are now in Excel notation.
WalkthroughThe updates involve enhancing an XLSX reader's functionality by introducing a Changes
TipsChat with CodeRabbit Bot (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review Status
Actionable comments generated: 3
Configuration used: CodeRabbit UI
Files ignored due to filter (1)
- testdata/test3.xlsx
Files selected for processing (2)
- reader.go (5 hunks)
- reader_test.go (2 hunks)
Additional comments: 4
reader_test.go (2)
104-121: The new test case "test3" has been added correctly to test the
PreReadRow
function. It is important to ensure that thetestdata/test3.xlsx
file contains the expected data to match the test case. The use ofInHeader
andInJQuery
options seems appropriate for the intended test scenario.147-152: The change in expected column names from {"C1", "C2"} to {"A1", "B1"} in the
TestXLSXReader_Names
function reflects the new column naming logic introduced in thereader.go
file. This change is consistent with the pull request summary and ensures that the test case will validate the new functionality.reader.go (2)
95-99: The
cellName
function is used to generate column names based on the index, which is a good approach for dynamic naming. However, ensure that thecellName
function is thoroughly tested to handle edge cases, such as very large indices that might result in column names not typically used in spreadsheets (e.g., beyond 'ZZ' for Excel).184-190: The
nameType
function now uses thecellName
function to generate a unique name if a duplicate is found, which is a good practice to avoid column name collisions. However, ensure that thecellName
function generates a name that is guaranteed to be unique in the context of the current sheet.
reader.go
Outdated
validColumns[c] = true | ||
} | ||
c++ | ||
if len(data) <= c { | ||
break | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The loop termination condition seems incorrect. The loop should not break based on the length of data
but rather should iterate over all the columns. The current condition could lead to missing columns if cellX
is greater than 0.
- if len(data) <= c {
- break
- }
Commitable suggestion
[!IMPORTANT]
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.
validColumns[c] = true | |
} | |
c++ | |
if len(data) <= c { | |
break | |
} | |
validColumns[c] = true | |
} | |
c++ |
reader.go
Outdated
break | ||
} | ||
} | ||
newBody := make([][]interface{}, 0, len(body)) | ||
for _, row := range body { | ||
dst := make([][]interface{}, 0, len(src)) | ||
for _, row := range src { | ||
cols := make([]interface{}, count) | ||
valid := false | ||
for i := 0; i < count; i++ { | ||
cols[i] = row[i] | ||
if cols[i] != "" { | ||
valid = true | ||
} | ||
} | ||
if !valid { | ||
break | ||
} | ||
newBody = append(newBody, cols) | ||
dst = append(dst, cols) | ||
} | ||
return newBody | ||
return dst | ||
} | ||
|
||
func parseExtend(ext string) (string, string) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: This review was outside of the patch, so it was mapped to the patch with the greatest overlap. Original lines [138-164]
The filterColumns
function now breaks the loop if a row without valid data is encountered. This change assumes that once a blank row is found, all subsequent rows are also blank or invalid, which might not be the case. If the data set can have non-consecutive blank rows, this logic will need to be revised.
- if !valid {
- break
- }
Recognizes a blank row as the end of the row (ignoring the first blank row). Fixed so that cellName correctly returns an Excel-like name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review Status
Actionable comments generated: 2
Configuration used: CodeRabbit UI
Files selected for processing (1)
- reader.go (4 hunks)
Additional comments: 5
reader.go (5)
95-99: The
cellName
function is used to generate default column names when a name is not provided. This is a good approach to ensure that all columns have names. However, it's important to ensure that thecellName
function is robust and handles potential errors gracefully, as previously mentioned in the outdated comments.122-130: The
filterColumns
function is used to filter out invalid columns and rows with no valid data. This is a good performance optimization as it avoids processing empty rows. However, the function should be reviewed to ensure that it correctly identifies the start of valid rows (startRow
) and breaks out of the loop when a row with no valid data is encountered after the first valid row.141-165: The
filterColumns
function filters the source data based on valid columns. The logic seems to be correct, but it's important to ensure that thestartRow
flag is used correctly to identify when to start appending data and when to break out of the loop. The current logic seems to break out of the loop as soon as it encounters a row with no valid data after the first valid row, which is the intended behavior.168-180: The
columnNum
function calculates the number of valid columns. The logic here is to count the columns until a false value is encountered after the first true value. This assumes that valid columns are contiguous from the start, which is a reasonable assumption in most cases. However, if there's a possibility of valid columns being non-contiguous, this logic would need to be revised.199-205: The
nameType
function is responsible for generating column names and types. It uses a map to ensure that column names are unique. If a duplicate name is found, it generates a new name using thecellName
function. This is a good approach to handle potential duplicate column names.
reader.go
Outdated
func cellName(i int) string { | ||
cn, err := excelize.CoordinatesToCellName(i+1, 1) | ||
if err != nil { | ||
return fmt.Sprintf("C%d", i+1) | ||
} | ||
return cn |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The cellName
function generates a cell name based on the column index. The error handling here defaults to a generic name format if an error occurs. This could potentially lead to column name collisions. It would be better to propagate the error up to the caller or handle it in a way that avoids collisions.
if err != nil {
- return fmt.Sprintf("C%d", i+1)
+ // Handle the error more appropriately, e.g., log it and propagate it up
+ return "", err
}
Commitable suggestion
[!IMPORTANT]
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.
func cellName(i int) string { | |
cn, err := excelize.CoordinatesToCellName(i+1, 1) | |
if err != nil { | |
return fmt.Sprintf("C%d", i+1) | |
} | |
return cn | |
func cellName(i int) (string, error) { | |
cn, err := excelize.CoordinatesToCellName(i+1, 1) | |
if err != nil { | |
// Handle the error more appropriately, e.g., log it and propagate it up | |
return "", err | |
} | |
return cn, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
indeed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@noborus Acknowledged. I will update the cellName
function to return an error along with the cell name, allowing the caller to handle it appropriately.
func cellName(i int) (string, error) {
cn, err := excelize.CoordinatesToCellName(i+1, 1)
if err != nil {
// Log the error and propagate it up
return "", err
}
return cn, nil
}
This change will require updating all calls to cellName
to handle the potential error. Would you like me to proceed with this change and adjust the rest of the code accordingly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thank you. I'll do it myself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@noborus Understood. If you need any further assistance or have more requests in the future, feel free to reach out. Good luck with your implementation!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review Status
Actionable comments generated: 2
Configuration used: CodeRabbit UI
Files selected for processing (1)
- reader.go (4 hunks)
Additional comments: 3
reader.go (3)
95-103: The
cellName
function is used to generate a name for unnamed columns. Ensure that thecellX + i
calculation correctly reflects the intended column index, ascellX
is decremented earlier in the code ifextCell
is not empty. This could potentially lead to an off-by-one error.126-134: The
filterColumns
function is used to filter out invalid columns. It's important to ensure that thevalidColumns
slice accurately reflects the columns that contain valid data. If the logic for determining valid columns is incorrect, this could result in valid data being omitted or invalid data being included.145-167: The
filterColumns
function now breaks out of the loop when a row without valid data is encountered after a row with valid data. This assumes that all rows with valid data are contiguous and that once a row with no valid data is found, no subsequent rows will contain valid data. If this assumption does not hold true for all datasets, this could lead to valid data being skipped. It may be necessary to verify this assumption against the expected data format.
} | ||
data := make([]interface{}, columnNum) | ||
for c, i := 0, cellX; i < len(row); i++ { | ||
if c >= columnNum { | ||
break | ||
} | ||
data[c] = row[i] | ||
if data[c] != "" { | ||
validColumns[c] = true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The increment of c
should be inside the loop to ensure it only increments when a cell is processed. The current placement of c++
outside the loop's braces will not increment c
as intended. This could lead to incorrect data alignment in the data
slice.
if data[c] != "" {
validColumns[c] = true
- }
- c++
+ c++
+ }
Commitable suggestion
[!IMPORTANT]
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.
} | |
data := make([]interface{}, columnNum) | |
for c, i := 0, cellX; i < len(row); i++ { | |
if c >= columnNum { | |
break | |
} | |
data[c] = row[i] | |
if data[c] != "" { | |
validColumns[c] = true | |
} | |
data := make([]interface{}, columnNum) | |
for c, i := 0, cellX; i < len(row); i++ { | |
if c >= columnNum { | |
break | |
} | |
data[c] = row[i] | |
if data[c] != "" { | |
validColumns[c] = true | |
c++ | |
} |
reader.go
Outdated
for i := cellX; i < cellX+columnNum; i++ { | ||
if header && len(row) > i && row[i] != "" { | ||
if _, ok := nameMap[row[i]]; ok { | ||
names[c] = fmt.Sprintf("C%d", i+1) | ||
name, err := cellName(cellX + i) | ||
if err != nil { | ||
names[c] = row[i] + "_" + fmt.Sprint(i) | ||
} | ||
names[c] = name | ||
} else { | ||
nameMap[row[i]] = true | ||
names[c] = row[i] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The nameType
function now uses the cellName
function to generate names for unnamed columns or to handle duplicate names. Ensure that the logic for handling duplicates is correct and that the nameMap
is being used effectively to track existing names. Also, there is a potential bug where names[c]
could be assigned twice in succession if an error occurs, which would overwrite the first assignment with the second one.
if _, ok := nameMap[row[i]]; ok {
name, err := cellName(cellX + i)
if err != nil {
- names[c] = row[i] + "_" + fmt.Sprint(i)
+ names[c] = row[i] + "_" + fmt.Sprint(c)
}
- names[c] = name
+ else {
+ names[c] = name
+ }
} else {
nameMap[row[i]] = true
names[c] = row[i]
}
Commitable suggestion
[!IMPORTANT]
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.
for i := cellX; i < cellX+columnNum; i++ { | |
if header && len(row) > i && row[i] != "" { | |
if _, ok := nameMap[row[i]]; ok { | |
names[c] = fmt.Sprintf("C%d", i+1) | |
name, err := cellName(cellX + i) | |
if err != nil { | |
names[c] = row[i] + "_" + fmt.Sprint(i) | |
} | |
names[c] = name | |
} else { | |
nameMap[row[i]] = true | |
names[c] = row[i] | |
for i := cellX; i < cellX+columnNum; i++ { | |
if header && len(row) > i && row[i] != "" { | |
if _, ok := nameMap[row[i]]; ok { | |
name, err := cellName(cellX + i) | |
if err != nil { | |
names[c] = row[i] + "_" + fmt.Sprint(c) | |
} else { | |
names[c] = name | |
} | |
} else { | |
nameMap[row[i]] = true | |
names[c] = row[i] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Summary by CodeRabbit
New Features
Enhancements
Bug Fixes
Tests