In File > New Data Frame > Lists > Words/Literature > Shakespeare/sonnets, the variable list.lines comes into R-Instat as a list.
There is a bit of initial work for @lilyclements, and then (assuming she agrees) most of it could perhaps be done by @anastasia-mbithe?
a) I have added the request for lists to have a proper data type to issue #7493, because that issue is about adding another type of data to R-Instat. But having a type label (L) is more urgent, because currently a list column shows no type letter and hence looks numeric.
b) The variable looks like a complicated character variable, but the function data_book$convert_column_to_type(data_name="data", col_names="list.lines1", to_type="character")
doesn't work: it converts all values to NA. Please could this function be made to work for list columns? (This is probably a task for @lilyclements?)
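Until convert_column_to_type handles lists, one possible workaround is to collapse each list element into a single string. This is a minimal base-R sketch, not R-Instat's own code. The helper name collapse_list_column and the choice of a newline as separator are assumptions made for illustration.

```r
# Hypothetical helper (not part of R-Instat): turn a list column into a
# character column by pasting each row's elements together with `sep`.
collapse_list_column <- function(x, sep = "\n") {
  vapply(x, function(el) {
    # Empty elements become NA rather than an empty string (an arbitrary choice)
    if (length(el) == 0) NA_character_ else paste(as.character(el), collapse = sep)
  }, character(1))
}

# Small made-up list column: one element per sonnet, one string per line
lines <- list(
  c("From fairest creatures we desire increase,",
    "That thereby beauty's rose might never die,"),
  "When forty winters shall besiege thy brow,"
)

collapsed <- collapse_list_column(lines)
collapsed
```

Because the separator is kept, the original lines can be recovered later with strsplit(collapsed, "\n", fixed = TRUE).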
c) The Prepare > Column: Text menu mainly uses the stringr package. Its functions seem to work directly on lists, but most of the dialogues don't accept the variable, because it isn't a factor or character. Please allow list variables in these dialogues. Perhaps @anastasia-mbithe could do this? The Split command already allows list variables.
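Whether each stringr function accepts a list column directly is worth checking one by one. A conservative pattern that certainly works is to apply the stringr function to each element with lapply(), so that the result is again a list with one entry per row. This is a sketch on a small made-up list, not the code the dialogues currently generate.

```r
library(stringr)

# Made-up list column: each element holds the lines of one sonnet
lines <- list(
  c("Shall I compare thee to a summer's day?",
    "Thou art more lovely and more temperate:"),
  "My mistress' eyes are nothing like the sun;"
)

# Apply stringr functions element by element; each result is a list
has_thou <- lapply(lines, str_detect, pattern = "Thou")
upper    <- lapply(lines, str_to_upper)
n_chars  <- lapply(lines, str_length)

# Summarising each element (here counting matching lines) gives an ordinary vector
thou_per_sonnet <- vapply(has_thou, sum, integer(1))
thou_per_sonnet
```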
d) Improve the Prepare > Data Reshape > Stack > Unnest option so that it can stack these data line by line. This is an excellent example of multiple-response data.
0) The unnest function we use is unnest_tokens() from the tidytext package. The tidyr package also has an unnest() function. I think we are still OK using the one we have, but a check by @lilyclements would help.
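For comparison during that check, tidyr's unnest_longer() (a sibling of tidyr::unnest()) stacks a list column into one row per element without any tokenising. When each element already holds the lines, that is exactly "line by line". This is a minimal sketch on made-up data. The indices_to argument adds the position of each line within its sonnet. It is not a claim about which function the dialogue should use.

```r
library(tibble)
library(tidyr)

# Made-up data: one row per sonnet, with its lines in a list column
sonnets <- tibble(
  sonnet = c(1L, 2L),
  lines = list(
    c("From fairest creatures we desire increase,",
      "That thereby beauty's rose might never die,"),
    "When forty winters shall besiege thy brow,"
  )
)

# One row per line; line_no is the position of the line within its sonnet
by_line <- unnest_longer(sonnets, lines, indices_to = "line_no")

# Base-R equivalent (without the line numbers), for comparison
by_line_base <- data.frame(
  sonnet = rep(sonnets$sonnet, lengths(sonnets$lines)),
  lines  = unlist(sonnets$lines),
  stringsAsFactors = FALSE
)
by_line
```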
I assume it will work directly on lists, so allow that type of data, just as discussed in c) above.
The paragraphs token option allows a separator, called paragraph_break, but there is no control for it in the dialogue. So please add this option. We could perhaps call it Pattern?
Similarly, when token is regex there is a pattern = argument.
This needs checking, but I think in each case we could usefully offer our regex keyboard for entering the pattern.
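For the regex case, tidytext::unnest_tokens() passes extra arguments such as pattern on to the tokenizers package. So a dialogue option would only need to add pattern = ... (and to_lower = FALSE where wanted) to the generated command. This is a minimal sketch that assumes the text has first been collapsed into an ordinary character column. The column is made up and uses irregular separators. The pattern ",\\s+" is only an illustration of what a Pattern box might hold.

```r
library(tibble)
library(tidytext)

# Made-up data: each row's lines are joined by "," plus irregular whitespace
poems <- tibble(
  poem = c(1L, 2L),
  text = c("First line, Second line,  Third line",
           "Fourth line,\nFifth line")
)

# token = "regex" splits on the regular expression given in pattern.
# ",\\s+" means a comma followed by one or more whitespace characters
# (spaces, tabs or line breaks). to_lower = FALSE keeps the original case.
by_line <- unnest_tokens(poems, output = line, input = text,
                         token = "regex", pattern = ",\\s+",
                         to_lower = FALSE)
by_line
```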
Can we find what works for this example? I am almost there below:
I used the lines as follows:
```r
# Code generated by the dialog, Stack (Pivot Longer)
list.lines1 <- data_book$get_columns_from_data(data_name="data", col_names="list.lines1")
data <- data_book$get_data_frame(data_name="data")
data_unnest1 <- tidytext::unnest_tokens(input=list.lines1, tbl=data, output="lines", token="paragraphs",paragraph_break = "\", \"")
data_book$import_data(data_tables=list(data_unnest1=data_unnest1))
rm(list=c("data_unnest1", "list.lines1", "data"))
```
That searched for the string ", " and, as you can see, it misses the cases where there is a second space, or perhaps a line return, in the string. My regex is not good enough. I am not even sure why I need just a single \ as an escape here.
Add to_lower = FALSE to the generated command when that option is set to false. The default is TRUE.
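On the single backslash: in "\", \"" the backslash is R's own string escape, which lets a double quote appear inside a double-quoted string. The value R actually passes on is just the four characters ", ". No regex escaping is involved, because, as far as I can tell from the tokenizers source, paragraph_break is matched as a fixed string rather than as a regular expression. That would also explain why a second space or a line break is missed. The base-R sketch below shows the difference on a made-up string shaped like the output above.

```r
sep <- "\", \""     # R escape: the value is the 4 characters  ", "
nchar(sep)
writeLines(sep)

# Made-up string with one regular and two irregular separators
x <- "\"First line\", \"Second line\",  \"Third line\",\n\"Fourth line\""

# Fixed-string split: only the exact 4 characters  ", "  are recognised
fixed_parts <- strsplit(x, sep, fixed = TRUE)[[1]]

# Regex split: quote, comma, any amount of whitespace (including \n), quote.
# Inside an R string the regex class \s has to be written "\\s".
regex_parts <- strsplit(x, "\",\\s*\"")[[1]]

length(fixed_parts)   # the irregular separators are not split
length(regex_parts)
```

Passing a regular expression like this as pattern, with token = "regex", should behave like the regex split here, though that is an assumption worth testing in R-Instat.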