Add facilities to analyse lists #7770

rdstern · 2022-08-14T11:21:44Z

In File > New Data Frame > Lists > Words/Literature > Shakespeare/sonnets, the variable list.lines comes into R-Instat as a list.

There is a bit of initial work for @lilyclements, and then (assuming she agrees) most could perhaps be done by @anastasia-mbithe?

a) I have added the request for lists to have a proper data type to issue #7493, because that issue is about adding another type of data to R-Instat. But having a type label (L) is more urgent, because currently a list column shows no type label and hence looks numeric.
b) It looks like a complicated character variable, but the function data_book$convert_column_to_type(data_name="data", col_names="list.lines1", to_type="character") doesn't work: it converts all values to NA. Please could this function be made to work on lists? (This is probably a @lilyclements task?)
c) The menu Prepare > Column: Text mainly uses the stringr package. These functions seem to work directly on lists, but most of the dialogue options don't accept the variable, because it isn't factor or character. Please allow list variables to be selected. Perhaps @anastasia-mbithe could do this? The Split command does already allow list variables.
d) Improve the Prepare > Data Reshape > Stack > Unnest option so that it can stack these data line by line. It is an excellent example of multiple-response data.
0) The unnest function we use, unnest_tokens, is from the tidytext package. There is now a similarly named function, unnest, in tidyr. I think we are still OK using the one we have, but a check by @lilyclements would help.

I assume it will work directly on lists, so allow that type of data - just as discussed in c) above.
The paragraphs token option allows a separator, called paragraph_break, but there is no corresponding option in the dialogue. So add this option. We could perhaps call it Pattern?
Similarly, when token is regex there is a pattern argument.
Please check, but I think that in each case we could usefully add our regex keyboard as an option for the pattern.
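
For items b) and d), here is a minimal base-R sketch of what the two operations could look like on a list column. The data frame sonnets and its contents are hypothetical placeholders, not the actual R-Instat data, and this is not meant as the implementation of convert_column_to_type or of the Unnest dialog.

```r
# Hypothetical stand-in for a data frame with a list column of lines
sonnets <- data.frame(sonnet = 1:2)
sonnets$list.lines1 <- list(
  c("first line", "second line", "third line"),
  c("another first line", "another second line")
)

# b) One character value per row: join the lines of each element with a newline
as_text <- vapply(sonnets$list.lines1, paste, character(1), collapse = "\n")

# d) One row per line: repeat the row id once for every line in its element
stacked <- data.frame(
  sonnet = rep(sonnets$sonnet, lengths(sonnets$list.lines1)),
  line = unlist(sonnets$list.lines1),
  stringsAsFactors = FALSE
)
```

Joining with a newline is only one possible choice for b); any separator that cannot occur inside a line would serve equally well.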

Find what works for this example? I am almost there below:

I used the lines as follows:

# Code generated by the dialog, Stack (Pivot Longer)

list.lines1 <- data_book$get_columns_from_data(data_name = "data", col_names = "list.lines1")
data <- data_book$get_data_frame(data_name = "data")
data_unnest1 <- tidytext::unnest_tokens(input = list.lines1, tbl = data, output = "lines", token = "paragraphs", paragraph_break = "\", \"")
data_book$import_data(data_tables = list(data_unnest1 = data_unnest1))

rm(list=c("data_unnest1", "list.lines1", "data"))

That searched for the string ", " and you can see that it misses the occasions where there is either a second space, or perhaps a line return, in the string. My regex is not good enough. I am not even sure why I need just a single \ here as an escape.
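
On the escape question: inside a double-quoted R string, \" is simply a literal quote character, so "\", \"" is the four-character string made of a quote, a comma, a space and a quote. The backslash is consumed by R's string parser and never reaches any regex engine. A pattern that also tolerates extra spaces or a line break might look like the sketch below. The sample string is a made-up placeholder, and base strsplit is used only to demonstrate the pattern. Whether paragraph_break accepts a regex rather than a fixed string would need checking; if it does not, token = "regex" with a pattern may be the route to try.

```r
# Made-up example: four quoted lines separated by ", " then ",  " then ",\n"
x <- "\"line one\", \"line two\",  \"line three\",\n\"line four\""
x <- gsub("^\"|\"$", "", x)  # drop the outer quotes

# The separator from the dialog code: quote, comma, one space, quote
sep_fixed <- "\", \""
nchar(sep_fixed)

# Splitting on that exact string misses the double space and the line break
fixed_split <- strsplit(x, sep_fixed, fixed = TRUE)[[1]]

# As a regex, \\s* allows any run of whitespace (including a newline) after the comma
regex_split <- strsplit(x, "\",\\s*\"", perl = TRUE)[[1]]
```

Here the doubled backslash in \\s is needed because the backslash itself has to survive R's string parsing before the regex engine can see \s.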

Add to_lower = FALSE to the generated code when the option is set to false. The default is TRUE.

rdstern added the Menu: Prepare label Aug 14, 2022

rdstern added this to the 0.7.8 milestone Aug 14, 2022

rdstern assigned lilyclements Aug 14, 2022

anastasia-mbithe mentioned this issue Aug 19, 2022

Included list variables in the Prepare> Column: Text Options dialog #7790

Merged
