
Error with drive_get() #281

Closed
bshor opened this issue Oct 14, 2019 · 10 comments

bshor commented Oct 14, 2019

I was using googledrive successfully before the upgrade to version 1.0 to access and download Google Sheets as Excel files. But now with version 1.0 (I'm running R 3.6.1 on Windows), I'm getting a difficult-to-understand error.

title <- "D State Legislative Pres Endorsements 2020"
test <- drive_get(title)

And I get this error:

Error in add_id_path(nodes, root_id = root_id, leaf = leaf) : 
  !anyDuplicated(nodes$id) is not TRUE

Any ideas of what could be going on?

jennybc (Member) commented Oct 15, 2019

Sounds similar to #279, #277, #272.

jsstanley commented Nov 30, 2019

I'm also getting the same error, albeit intermittently and seemingly at random, despite running the same code every time:

for (i in 1:nrow(statementsList)) {
  # Pull the sheet name from the first column of the current row
  currentSheetName <- as.character(statementsList[i, 1, drop = TRUE])
  print(paste0('Deleting sheet: ', currentSheetName))
  drive_trash(currentSheetName)
}

The drive_trash() line seems to be the problem.

jwbenning commented Dec 4, 2019

I'm getting the same error, but here's some more info that's perhaps helpful. When I run:

herbMaster_gs <- drive_get("Herbivory_Individual")

I get the error:

Error in add_id_path(nodes, root_id = root_id, leaf = leaf) : 
  !anyDuplicated(nodes$id) is not TRUE

When I search for "Herbivory_Individual" in my Drive, only the Google Sheet I'm trying to access is returned. However, in R, when I run:

drive_find(pattern = "Herbivory_Individual")

it finds lots of items:

Items so far: 200 300 400 500 600 700 800 900 1000 1100 1200 1300 1400

Any idea what's up, and what all these items could be?

bshor (Author) commented Dec 4, 2019

The solution I found was to delete a file that had originally been created as a duplicate of the Drive document I wanted and had a nearly identical name. Once I did that, drive_get() worked with no problem.

Here is the SO answer that inspired me.

Searches with drive_find() take too long, as you discovered (I quit after a couple thousand documents), so I don't use them.
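
That said, here is a minimal sketch of how one might surface near-duplicate names with a server-side q filter before deleting anything (the title is the one from my original post; the trash step is commented out so you can inspect first):

library(googledrive)

# List every file whose name contains the title; near-duplicates show up here
dupes <- drive_find(q = "name contains 'D State Legislative Pres Endorsements 2020'")
dupes

# After inspecting, trash the unwanted copy by its file id, e.g.:
# drive_trash(as_id(dupes$id[2]))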

jennybc (Member) commented Dec 5, 2019

Some background on what drive_find() is messaging about and how to make it fast:

  • The pattern = argument is implemented locally. So we recursively fetch metadata for your whole Drive, then filter on pattern =. I too learned during development that I have access to a shocking number of files on Drive. Fetching all of this is what "Items so far ..." refers to. And yes, it can be slow.
  • The fast way to filter is to use the q = clause, because that is done on the server side. The documentation for drive_find() does mention this and includes some examples; see the sketch below.
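
A minimal sketch of the difference (the file name here is hypothetical):

library(googledrive)

# Slow: lists everything you can see, then filters locally on pattern =
drive_find(pattern = "quarterly-report")

# Fast: the q clause is sent to the Drive API and filtered server-side
drive_find(q = "name contains 'quarterly-report'")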

https://googledrive.tidyverse.org/reference/drive_find.html#search-parameters

https://googledrive.tidyverse.org/articles/articles/file-identification.html

https://developers.google.com/drive/api/v3/search-files

jwbenning commented

@bshor unfortunately that solution isn't working for me, so drive_get() is still failing when I supply the name of the spreadsheet. It does work when I supply the sheet URL:

kk <- "https://docs.google.com/spreadsheets/d/1mau6LUz8tWcgXTs6zwHo7Na5TyBkhPzzkfup7zSWioM/edit#gid=938724274"
herbMaster_gs <- drive_get(kk)

So it works, but it definitely would be nice to be able to refer to the spreadsheets by name instead of URL.

AllysonS commented Jan 9, 2020

Deleting the file with a similar name is not an option for me either: one file is a spreadsheet with the original data and the other is a companion document containing metadata. They have similar names so that we can easily tell which files go together. And we have a lot of these files.

The drive_get() error started for me when I updated the googledrive package to v 1.0.0.

As @jennybc suggested, using the q = argument in drive_find() greatly speeds up the search, so I used that to work around the drive_get() issue.

### simple version
file <- drive_find(q = "name = 'R02_dieback_2019-10-28_raw'") # equals seems to return the file with that exact name

### alternate approach: multiple q clauses are combined with AND
file <- drive_find(q = "name contains 'R02_dieback_2019-10-28_raw'",
                   q = "not name contains 'metadata'")

### version with a generic object to make the process repeatable with multiple files
fn <- "R02_dieback_2019-10-28_raw"
file <- drive_find(q = paste("name = '", fn, "'", sep = ""))

From here I can go ahead with drive_download() and just skip using drive_get().
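
For example, a minimal sketch of that next step (the Excel export type and overwrite = TRUE are my choices, not requirements):

drive_download(file, type = "xlsx", overwrite = TRUE)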

There are more ways to search on other file characteristics using q, listed here: https://developers.google.com/drive/api/v3/search-files

jennybc (Member) commented Jan 14, 2020

I still have yet to experience this phenomenon or get enough data to truly study it.

But I have formed an untestable hypothesis about the root cause and installed a fix 🤞

Needless to say, please open a new issue if you update to this dev version and still see the phenomenon.

bshor (Author) commented Feb 3, 2020

I thought I'd fixed it as I described above, but I got the anyDuplicated error again (this is on googledrive 1.0.0). I tried drive_find() with both a q and a pattern argument, and it was much faster and worked without error.

jennybc (Member) commented Feb 3, 2020

In the development version of googledrive, there is a fix for the anyDuplicated error (e56b3f5). But I now believe there is a general problem, from the Google side, re: exhaustively listing files (#288). One conclusion from all of these investigations is that when accuracy and performance become very important, you should maximize your use of the q clause for narrowing search on the server side.
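
For example, a minimal sketch of that kind of server-side narrowing (the name fragment and MIME type here are illustrative; multiple q clauses are ANDed together):

library(googledrive)

drive_find(
  q = "name contains 'Herbivory'",
  q = "mimeType = 'application/vnd.google-apps.spreadsheet'"
)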
