Added filter for when row is blank #5

Merged
merged 6 commits into from
Nov 15, 2023
Changes from 3 commits

32 changes: 25 additions & 7 deletions reader.go
@@ -95,7 +95,7 @@ func NewXLSXReader(reader io.Reader, opts *trdsql.ReadOpts) (trdsql.Reader, erro
if r.names[i] != "" {
validColumns[i] = true
} else {
- r.names[i] = fmt.Sprintf("C%d", i+1)
+ r.names[i] = cellName(i)
}
}
for j, row := range rows {
@@ -109,6 +109,9 @@ func NewXLSXReader(reader io.Reader, opts *trdsql.ReadOpts) (trdsql.Reader, erro
validColumns[c] = true
}
c++
+ if len(data) <= c {
+ break
+ }
Contributor

The loop termination condition seems incorrect. The loop should not break based on the length of data but rather should iterate over all the columns. The current condition could lead to missing columns if cellX is greater than 0.

- if len(data) <= c {
-     break
- }

Committable suggestion

Important: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
- validColumns[c] = true
- }
- c++
- if len(data) <= c {
- break
- }
+ validColumns[c] = true
+ }
+ c++

}
body = append(body, data)
}
@@ -124,7 +127,15 @@ func NewXLSXReader(reader io.Reader, opts *trdsql.ReadOpts) (trdsql.Reader, erro
return r, nil
}

- func filterColumns(body [][]interface{}, dataFlag []bool) [][]interface{} {
+ func cellName(i int) string {
+ cn, err := excelize.CoordinatesToCellName(i+1, 1)
+ if err != nil {
+ return fmt.Sprintf("C%d", i+1)
+ }
+ return cn
Contributor

@coderabbitai coderabbitai bot Nov 15, 2023

The cellName function builds a cell name from the column index. If excelize returns an error, it silently falls back to the generic "C%d" format, which could produce a name that collides with another column's name. It would be better to propagate the error to the caller, or to handle it in a way that cannot produce collisions.

  if err != nil {
-     return fmt.Sprintf("C%d", i+1)
+     // Handle the error more appropriately, e.g., log it and propagate it up
+     return "", err
  }

Committable suggestion

Important: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
- func cellName(i int) string {
- cn, err := excelize.CoordinatesToCellName(i+1, 1)
- if err != nil {
- return fmt.Sprintf("C%d", i+1)
- }
- return cn
+ func cellName(i int) (string, error) {
+ cn, err := excelize.CoordinatesToCellName(i+1, 1)
+ if err != nil {
+ // Handle the error more appropriately, e.g., log it and propagate it up
+ return "", err
+ }
+ return cn, nil

Owner Author

indeed.

Contributor

@noborus Acknowledged. I will update the cellName function to return an error along with the cell name, allowing the caller to handle it appropriately.

func cellName(i int) (string, error) {
    cn, err := excelize.CoordinatesToCellName(i+1, 1)
    if err != nil {
        // Propagate the error to the caller
        return "", err
    }
    return cn, nil
}

This change will require updating all calls to cellName to handle the potential error. Would you like me to proceed with this change and adjust the rest of the code accordingly?

Owner Author

thank you. I'll do it myself.

Contributor

@noborus Understood. If you need any further assistance or have more requests in the future, feel free to reach out. Good luck with your implementation!

+ }
+
+ func filterColumns(src [][]interface{}, dataFlag []bool) [][]interface{} {
count := len(dataFlag)
start := false
for i, f := range dataFlag {
@@ -136,15 +147,22 @@ func filterColumns(body [][]interface{}, dataFlag []bool) [][]interface{} {
break
}
}
- newBody := make([][]interface{}, 0, len(body))
- for _, row := range body {
+ dst := make([][]interface{}, 0, len(src))
+ for _, row := range src {
cols := make([]interface{}, count)
+ valid := false
for i := 0; i < count; i++ {
cols[i] = row[i]
+ if cols[i] != "" {
+ valid = true
+ }
}
+ if !valid {
+ break
+ }
- newBody = append(newBody, cols)
+ dst = append(dst, cols)
}
- return newBody
+ return dst
}

func parseExtend(ext string) (string, string) {
Contributor

Note: This review was outside of the patch, so it was mapped to the patch with the greatest overlap. Original lines [138-164]

The filterColumns function now breaks the loop if a row without valid data is encountered. This change assumes that once a blank row is found, all subsequent rows are also blank or invalid, which might not be the case. If the data set can have non-consecutive blank rows, this logic will need to be revised.

- if !valid {
-     break
- }
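The difference this comment describes, stopping at the first blank row versus skipping blank rows, can be seen in a small standalone sketch. `stopAtBlank` mirrors the `break` added to `filterColumns`, and `skipBlank` is the `continue` alternative; both helpers are hypothetical. `isBlankRow` also treats a `nil` cell as blank, which is an assumption, since the diff compares only against `""`.

```go
package main

import "fmt"

// isBlankRow reports whether every cell is empty. Treating nil (a cell that
// was never filled) as blank is an assumption; the diff only checks "".
func isBlankRow(row []interface{}) bool {
	for _, v := range row {
		if v != nil && v != "" {
			return false
		}
	}
	return true
}

// stopAtBlank keeps rows up to, but not including, the first blank row,
// matching the `break` in filterColumns.
func stopAtBlank(rows [][]interface{}) [][]interface{} {
	dst := make([][]interface{}, 0, len(rows))
	for _, row := range rows {
		if isBlankRow(row) {
			break
		}
		dst = append(dst, row)
	}
	return dst
}

// skipBlank drops blank rows but keeps any data that follows them.
func skipBlank(rows [][]interface{}) [][]interface{} {
	dst := make([][]interface{}, 0, len(rows))
	for _, row := range rows {
		if isBlankRow(row) {
			continue
		}
		dst = append(dst, row)
	}
	return dst
}

func main() {
	rows := [][]interface{}{
		{"1", "apple"},
		{"", ""},
		{"3", "melon"},
	}
	fmt.Println(stopAtBlank(rows)) // [[1 apple]]
	fmt.Println(skipBlank(rows))   // [[1 apple] [3 melon]]
}
```

If a blank row marks the end of the table, `break` is the desired behavior; if blank rows can appear inside the data, the `skipBlank` form avoids dropping the rows that follow them.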

@@ -166,7 +184,7 @@ func nameType(row []string, cellX int, columnNum int, header bool) ([]string, []
for i := cellX; i < cellX+columnNum; i++ {
if header && len(row) > i && row[i] != "" {
if _, ok := nameMap[row[i]]; ok {
- names[c] = fmt.Sprintf("C%d", i+1)
+ names[c] = cellName(i)
} else {
nameMap[row[i]] = true
names[c] = row[i]
17 changes: 16 additions & 1 deletion reader_test.go
@@ -104,6 +104,21 @@ func TestXLSXReader_PreReadRow(t *testing.T) {
{"3", "melon"},
},
},
+ {
+ "test3",
+ xlsxReader{
+ fileName: "testdata/test3.xlsx",
+ opts: &trdsql.ReadOpts{
+ InHeader: true,
+ InJQuery: "Sheet1.C1",
+ },
+ },
+ [][]interface{}{
+ {"1", "apple"},
+ {"2", "orange"},
+ {"3", "melon"},
+ },
+ },
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
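The new `test3` case sets `InJQuery` to `Sheet1.C1`, which reads as a sheet name followed by the cell where the table starts, so the data begins at column index 2 (the `cellX > 0` case raised in the first review comment). How the plugin actually parses this string is not visible in the diff (`parseExtend`'s body is collapsed). The sketch below is a hypothetical parser, `splitSheetCell` plus `cellToIndex`, that assumes the cell reference follows the last `.` and uses uppercase A1 notation.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// splitSheetCell splits "Sheet1.C1" into a sheet name and a cell reference.
// Hypothetical: it assumes the cell follows the last '.', so sheet names that
// contain dots still work. This is not the plugin's parseExtend.
func splitSheetCell(s string) (sheet, cell string) {
	if i := strings.LastIndex(s, "."); i >= 0 {
		return s[:i], s[i+1:]
	}
	return s, ""
}

// cellToIndex converts an uppercase A1-style reference such as "C1" into
// 0-based column and row indexes, e.g. "C1" -> (2, 0).
func cellToIndex(cell string) (col, row int, err error) {
	i := 0
	for i < len(cell) && cell[i] >= 'A' && cell[i] <= 'Z' {
		col = col*26 + int(cell[i]-'A'+1) // bijective base-26 column letters
		i++
	}
	if i == 0 || i == len(cell) {
		return 0, 0, fmt.Errorf("invalid cell reference %q", cell)
	}
	r, err := strconv.Atoi(cell[i:])
	if err != nil || r < 1 {
		return 0, 0, fmt.Errorf("invalid cell reference %q", cell)
	}
	return col - 1, r - 1, nil
}

func main() {
	sheet, cell := splitSheetCell("Sheet1.C1")
	col, row, err := cellToIndex(cell)
	fmt.Println(sheet, cell, col, row, err) // Sheet1 C1 2 0 <nil>
}
```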
@@ -132,7 +147,7 @@ func TestXLSXReader_Names(t *testing.T) {
fileName: "testdata/test1.xlsx",
opts: &trdsql.ReadOpts{InPreRead: 1},
},
- []string{"C1", "C2"},
+ []string{"A1", "B1"},
false,
},
{
Binary file added testdata/test3.xlsx
Binary file not shown.