Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implementing undo in Reogrid #9189

Merged
merged 33 commits into from
Oct 24, 2024
Merged

Conversation

N-thony
Copy link
Collaborator

@N-thony N-thony commented Oct 12, 2024

@rdstern, this is still a draft, but I would appreciate it if you could test it and share your feedback. This PR implements an undo operation for any changes made to the dataframe, which will then be reflected as usual in ReoGrid.
Just like in Excel, try anything and then click Ctrl+Z or Ctrl+Shift+Z to undo the operation on the grid.

@rdstern
Copy link
Collaborator

rdstern commented Oct 12, 2024

@N-thony I tried on the survey data, but nothing happened when I pressed ctrl Z (or ctrl Shift Z? Would be great if it did - probably better than my suggestions, mentioned earlier, if this could work.

@N-thony
Copy link
Collaborator Author

N-thony commented Oct 12, 2024

@N-thony I tried on the survey data, but nothing happened when I pressed ctrl Z (or ctrl Shift Z? Would be great if it did - probably better than my suggestions, mentioned earlier, if this could work.

You have to make sure Reogrid has the focus. Click anywhere on the Grid before Ctrl+Z or Ctrl+Shift+Z,

@rdstern
Copy link
Collaborator

rdstern commented Oct 12, 2024

@N-thony that's pretty amazing to me!
Undo seems to work, when I am in the grid and press ctrl + shift + Z. I have tried:
a) Edit cell, and delete cell and then undo.
b) Delete a column
c) Rename a column
d) Make a column into a factor

I tried first with the small survey data - 36 rows and 7 variables.
Then with dodoma that has 29000 rows, so lots of pages. I tried changes on the last page - edit cell, and also, from there, did delete of a column.

I then tried with the MICS survey data, where I made changes to other than the first 50 columns.

I'm not sure what you have done, but it seems excellent. It would be nice if ctrl Z would work as well, but not a problem if only ctrl + shift + Z is what has to be used.

I tried ctrl Y for a redo, and couldn't get that to work. Might that be possible.

I tried 2 changes on the same sheet and I think it only did undo for the first of them. Is that correct? I am happy with the one change limit per sheet. I deleted 2 columns, one at a time, and it got the second one back. Not the first. Then I deleted 2 variables in the same delete operation and it got both back.

I tried one change on each of 2 sheets. Then I corrected the first change I had made. That worked, and moved me to the second sheet. I did undo there and it worked too.

This is much better than my suggestions in an earlier issue. But I still don't understand how it works - can you explain easily?

Will there be a dialog, or will it just be those keys? If only the keys, because the focus needs to be in the grid, then that would be fine. We might then still have a simple dialog that mainly gives access to the help, and tells you to press the keys!

Finally I looked into Reogrid, to see if it was all automatic from there. To my surprise there seems to be an upgrade, or a second product called ReoGrid? It has a free version, but you have to pay if you want the upgraded one - and not cheap! Is this the same product re-born, or something different?

@N-thony
Copy link
Collaborator Author

N-thony commented Oct 14, 2024

@rdstern I have now improved the undo feature so that it now tracks operations performed on the sheets. This means that if multiple changes are made consecutively, you can undo each of them step by step. Additionally, I have set the focus to default to the grid after an operation is performed. I will think on the dialogue if possible. You can test.

@rdstern
Copy link
Collaborator

rdstern commented Oct 14, 2024

@N-thony exciting and good to have you back in this way. before I test, I wonder on your views on redo? Might that also be possible?
If you do a dialog, then I assume it would be in the Edit menu? Where? Should it be under Find? Anyway, if you do, the nthe Help context number is 702.

@N-thony
Copy link
Collaborator Author

N-thony commented Oct 14, 2024

@N-thony exciting and good to have you back in this way. before I test, I wonder on your views on redo? Might that also be possible? If you do a dialog, then I assume it would be in the Edit menu? Where? Should it be under Find? Anyway, if you do, the nthe Help context number is 702.

Yes, I see that possible too. I suggest we finish the undo first before moving to redo.

Copy link
Collaborator

@rdstern rdstern left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@N-thony it is still looking excellent. I enthused to Danny, who said he hopes it is being done through R, or he is not interested. I think so, as I see you have implemented a data_book$undo_last_action function?

a) I just note, while I reemember:

image

This will have to change, and could perhaps be added to this pull request as it is presumably so simple! The second sentence is no longer true. I suggest we still give the message, but now delete the second line.

b) I tried deleting multiple variables separately, and can now get them each back. Great.

c) I note that Z still doesn't work. It has to be Z. That's fine if it has to be.

d) I could now not get a single value back after I had used right-click > delete cell. It now stays as NA? That's the only fault I could find.

@N-thony
Copy link
Collaborator Author

N-thony commented Oct 14, 2024

@N-thony it is still looking excellent. I enthused to Danny, who said he hopes it is being done through R, or he is not interested. I think so, as I see you have implemented a data_book$undo_last_action function?

I just note, while I reemember:

@rdstern yes the undo is done in R.
How? The undo functionality here done in R allows users to revert to the last saved state by maintaining a history of previous states of a dataframe. When a user makes changes, the current state is appended to the history list, and upon an undo action, the latest state is removed from the history and the previous state is restored.

@N-thony
Copy link
Collaborator Author

N-thony commented Oct 14, 2024

@N-thony it is still looking excellent. I enthused to Danny, who said he hopes it is being done through R, or he is not interested. I think so, as I see you have implemented a data_book$undo_last_action function?

a) I just note, while I reemember:

image

This will have to change, and could perhaps be added to this pull request as it is presumably so simple! The second sentence is no longer true. I suggest we still give the message, but now delete the second line.

b) I tried deleting multiple variables separately, and can now get them each back. Great.

c) I note that Z still doesn't work. It has to be Z. That's fine if it has to be.

d) I could now not get a single value back after I had used right-click > delete cell. It now stays as NA? That's the only fault I could find.

@rdstern I have extended the undo to more operations we can do in data, like insert/remove rows, delete cells. I was thinking of adding the Edit > Undo menu as the first menu, which later will be followed by Edit > Redo like in Excel. I'm still thinking of the necessity of having an undo/redo dialogue, but all that can be done quickly in a separate PR as I will also make changes in frmMain.

rdstern
rdstern previously approved these changes Oct 14, 2024
Copy link
Collaborator

@rdstern rdstern left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@N-thony this is very nice indeed. I am approving even though it is only a draft.
@Patowhiz could you check and perhaps merge if ok
a) There's just one text to delete. When deleting rows it still sats it can't be undone. That's no longer true.

I checked with filtered data, there even delting all. I also tried reordering columns, and changing multiple names.

I would still like to know what is behind the function, especially as you can do multiple undeletes. Does it therefore make multiple copies of each data frame. If so, then is there a size limit we could set, so it doesn't slow down visibly. Might it also slow down - or get confused with multiple sheets. It doesn't seem to?

You are wondering about adding a menu item - to match Excel. No problem with that, partly so the Help can tell users about it, and it can indicate the short-cut keys. But may not be necessary. I am not keen on adding it to the column right-click, because that's already long enough, and I am sure we should consider a redo as well? But I would be happy to add it to the row-right click menu. That could include the short-cut keys . I know it works on more than rows, but it does work there. Abd that has quite a small number of items, and it has a Help button already.

@N-thony
Copy link
Collaborator Author

N-thony commented Oct 14, 2024

@N-thony this is very nice indeed. I am approving even though it is only a draft. @Patowhiz could you check and perhaps merge if ok a) There's just one text to delete. When deleting rows it still sats it can't be undone. That's no longer true.

I checked with filtered data, there even delting all. I also tried reordering columns, and changing multiple names.

I would still like to know what is behind the funcyion, especially as you can do multiple undeletes. Does it therefore make multiple copies of each data frame. If so, then is there a size limit we could set, so it doesn't slow down visibly. Might it also slow down - or get confused with multiple sheets. It doesn't seem to?

@rdstern there is a limit of 10 states tho I think we would like it to be unlimited. One approach I'm still studying is using temporary files instead of storing the entire history in memory to handle the memory limitation issue for undo functionality in R, especially when dealing with large data frames.

@rdstern
Copy link
Collaborator

rdstern commented Oct 14, 2024

@N-thony am I right that all the data frames, i.e. the whole data book is copied up whenever any sheet is changed? Or is it just that sheet. So we soon have 10 progressive data books?
Because I remain extremely keen on this feature, but we then can't permit this for very large data books. In particular we don't automatically prodiuce the column metadata when there are 1000 variables or more. I suggest 1000 variables might be a strict limit, but we might want a lower limit as the default, maybe with an offer for Undo to be enabled on larger dataframes (in the tools > Options dialog, for any particular session. I suggest, if we add that feature, then it needs to be set every session where we want quite large datasets to offer undo.

I call this spreadsheet-style undo, and we already have R-stype undo via the Tools > Restore Backup dialog.

So I see this as great for teaching, and really useful as a practical tool. But we don't want R-Instat to be slowed down too much by this feature. Also I anticipate that users of the spreadsheet-style undo will rarely be using very large data books.

@N-thony
Copy link
Collaborator Author

N-thony commented Oct 15, 2024

@N-thony am I right that all the data frames, i.e. the whole data book is copied up whenever any sheet is changed? Or is it just that sheet. So we soon have 10 progressive data books? Because I remain extremely keen on this feature, but we then can't permit this for very large data books. In particular we don't automatically prodiuce the column metadata when there are 1000 variables or more. I suggest 1000 variables might be a strict limit, but we might want a lower limit as the default, maybe with an offer for Undo to be enabled on larger dataframes (in the tools > Options dialog, for any particular session. I suggest, if we add that feature, then it needs to be set every session where we want quite large datasets to offer undo.

I call this spreadsheet-style undo, and we already have R-stype undo via the Tools > Restore Backup dialog.

So I see this as great for teaching, and really useful as a practical tool. But we don't want R-Instat to be slowed down too much by this feature. Also I anticipate that users of the spreadsheet-style undo will rarely be using very large data books.

@rdstern no, only the dataframe/sheet changed is referenced in the history list. We save up to 10 previous states i.e changes done in dataframe in memory; whenever this limit (10 states) is reached, the oldest state is removed.
We can have a quick call on this where I can explain more what is happening.

@rdstern
Copy link
Collaborator

rdstern commented Oct 15, 2024

@N-thony that's even better, namely that only the chosen sheet is copied. That means we can have a simple rule for allowing your undo to operate. It works by default unless the length of the longest data frame is more than xxx rows, or wider than yyy variables. This then gives us a chance to discuss the sizes we can handle in R-Instat easily.
Remember "your" undo is for beginners, and we have the R undo (if needed) for those who are happy to learn a bit more. Explaining about undo fits well into our training, with yours mentioned in the basic workshops and both methods discussed in the Further workshop!
@jkmusyoka you may have views, and @Patowhiz could be well placed to help specify the limits. I'm trhinking that the user can decide to ignore the limits for a given session - but that maybe silly if a data frame bepomes very wide (Patrick?) I don't mind it being quite silly - because then the user learns not to try the simple undo with such crazy data frames?
I'm wondering if copying a long data frame is always quite quick? It is only with wide daya frames that we have real time problems?
If we always have a limit then my first go is that the default limit (for now) is <600,000 rows or perhaps 1 million rows (is this too high). And < 500 variables. And we can optionally take off the limit for rows (unlimited), and at the same time increase the variable limit to 1000 variables. @Patowhiz you know all about speeds, when data frames are wide. You also made a limit of 1000 variables, so any dataframe wider than that R-Instat asks if the metadata is to be calculated? I suggest our undo limit is always the same as your metadata limit. I did wonder whether that was a bit conservative? Would you consider raising that to 2000 variables?

@N-thony
Copy link
Collaborator Author

N-thony commented Oct 15, 2024

@N-thony that's even better, namely that only the chosen sheet is copied. That means we can have a simple rule for allowing your undo to operate. It works by default unless the length of the longest data frame is more than xxx rows, or wider than yyy variables. This then gives us a chance to discuss the sizes we can handle in R-Instat easily. Remember "your" undo is for beginners, and we have the R undo (if needed) for those who are happy to learn a bit more. Explaining about undo fits well into our training, with yours mentioned in the basic workshops and both methods discussed in the Further workshop! @jkmusyoka you may have views, and @Patowhiz could be well placed to help specify the limits. I'm trhinking that the user can decide to ignore the limits for a given session - but that maybe silly if a data frame bepomes very wide (Patrick?) I don't mind it being quite silly - because then the user learns not to try the simple undo with such crazy data frames? I'm wondering if copying a long data frame is always quite quick? It is only with wide daya frames that we have real time problems? If we always have a limit then my first go is that the default limit (for now) is <600,000 rows or perhaps 1 million rows (is this too high). And < 500 variables. And we can optionally take off the limit for rows (unlimited), and at the same time increase the variable limit to 1000 variables. @Patowhiz you know all about speeds, when data frames are wide. You also made a limit of 1000 variables, so any dataframe wider than that R-Instat asks if the metadata is to be calculated? I suggest our undo limit is always the same as your metadata limit. I did wonder whether that was a bit conservative? Would you consider raising that to 2000 variables?

@rdstern Note that the function doesn't copy a data frame. It appends the reference of the history object (which could include data frames) to the list. This means that no matter how large or wide the data frame is, appending its reference to a list is efficient and does not involve copying the entire data frame, which would be slower.

@rdstern
Copy link
Collaborator

rdstern commented Oct 15, 2024

@N-thony now I am learning more! Does that mean that earlier we didn't include the data frames in the history object, and now we do, so we can have our undo?
Or it was always included, and now we are simply taking advantage and using it?

@N-thony
Copy link
Collaborator Author

N-thony commented Oct 15, 2024

@N-thony now I am learning more! Does that mean that earlier we didn't include the data frames in the history object, and now we do, so we can have our undo? Or it was always included, and now we are simply taking advantage and using it?

@rdstern we did not have a history object to record each state of a dataframe, I have added it for this purpose.

@rdstern
Copy link
Collaborator

rdstern commented Oct 15, 2024

@Patowhiz I am still happy - when you are - to have this merged as it stands. You may have other ideas.

There should then still be a further issue/pull request, to

a) Consider redo, but not if it takes long.
b) Put it somewhere as well as having the short-cut key. I would be happy if you decided that in the column right-click is sufficient.
c) Adding limits, so it is not for large data frames - whatever we decide is "large".

Here is a picture of R-Instat:

image

I am keen to show that undo doesn't just change the data, it can also change the meta-data at both column. That's really nice for our teaching about the meta-data.

However, I assume changing the data and metadata will be a strain, particularly on very wide datasets, (just as it is for reading them in) . I am keen for R-Instat to remain as efficient as possible for large datasets and don't want to always wonder whether some problems are caused by the undo facility. I also see no problem, because people handling very large datasets shouldn't need the simple undo so much. (We also have the R undo for them!)

@N-thony
Copy link
Collaborator Author

N-thony commented Oct 15, 2024

@rdstern I have tried with a dataset of 1 million rows and 20 columns from File > New Dataframe, it seems to work well. You might also check it out.

@ChrisMarsh82
Copy link
Contributor

@N-thony This will need to be tested with multiple large datasets where multiple changes are made on each of the datasets and R-Instat is kept running for at least half a day. We need to make sure there are no memory issues with the changes.

@N-thony
Copy link
Collaborator Author

N-thony commented Oct 15, 2024

@N-thony This will need to be tested with multiple large datasets where multiple changes are made on each of the datasets and R-Instat is kept running for at least half a day. We need to make sure there are no memory issues with the changes.

@rdstern I agree, this needs proper testing. Perhaps @jkmusyoka can also test this. We need more testing here.
Meanwhile,, I will work on offloading the dataset history management to temporary files. This will help prevent memory overload issues by offloading historical states to temporary files.

@rdstern
Copy link
Collaborator

rdstern commented Oct 15, 2024

@N-thony while I am very happy with this addition, I remain convinced that you also need to add (In Tools > Options) the option to switch off undo for large datasets. It goes here:

image

I'm not surprised, by the way, that it seems to work for a data frame of 1 million rows and 20 columns. I don't anticipate an obvious problem until you get very wide data sets, e.g. more than 1000 variables. (That's when @Patowhiz turns off the default addition to the metadata.) I am also quite happy if the undo is (at least initially) for relatively small datasets - and we need to define small. So please, please add the facility to turn off the undo for large datasets. I am happy that the trigger is dictated by the largest data frame. I suggest we might, for testing, want to be able to turn off undo anyway. But later - it is so nice, that the default would be to have undo as long as the as long as no data frame has more than xxx rows and yyy variables. If the tests all work, then I would quite like the option for the user to include the undo for any length and maybe up to 1000 variables. But that's just for the current session. It has to be extended from the default each run.

If you make xxx and yyy, at least for now, changeable by the user, then we can switch it on and off for the testing.

It could be
Switch off spreadsheet-syle undo when a dataframe has more than up-down with say 200 variables, (up to 1000 in steps of 100), or 200,000 rows (up/down in steps of 100,000 and maybe no upper limit!
(Any changes from the default only apply to the current session of R-Instat.)

Please also look at #6264. I have long wanted a timer in R-Instat and the tictoc package still seems a nice option. How were you planning to do any testing? I hope it will be pretty trivial to add, and again to the tools menu. There are other options discussed here. particularly the bench and rbenchmark packages.

@rdstern
Copy link
Collaborator

rdstern commented Oct 18, 2024

@N-thony very nice that you also added Undo to the Edit menu (with the short-cut) and it seems to work well there.

image

a) The up-downs could be still a bit wider still.
b) You say they reset each time - but they don't. I set it to 1.6 million and it remained at that the next time.
c) The switch off undo is checked by default - but doesn't switch off. Change the default to unchecked please.
d) In the whole dialog we have been inconsistent with most words starting capitalised. Could you chege that to be consistent. Maybe leave the resetting in brackets mainly in lower case?
e) I suggest 200,000 for the default rows may be ok?
f) You say you have added the times. Great - but where.? Can you guide on how it is used?

@N-thony
Copy link
Collaborator Author

N-thony commented Oct 18, 2024

@N-thony very nice that you also added Undo to the Edit menu (with the short-cut) and it seems to work well there.

image

a) The up-downs could be still a bit wider still. b) You say they reset each time - but they don't. I set it to 1.6 million and it remained at that the next time. c) The switch off undo is checked by default - but doesn't switch off. Change the default to unchecked please. d) In the whole dialog we have been inconsistent with most words starting capitalised. Could you chege that to be consistent. Maybe leave the resetting in brackets mainly in lower case? e) I suggest 200,000 for the default rows may be ok? f) You say you have added the times. Great - but where.? Can you guide on how it is used?

The text can be done in another PR since I think it will also be done in the database, I will request Fidel to make the change.
I have also made the nud control wider and set the default row to 200000.
The switch on undo is unchecked by default, try to reset your options to default by clicking on Factory Reset
Yes, I have added the timing, which is to be displayed in the script window, but not yet and working on it. Am I correct?

@rdstern
Copy link
Collaborator

rdstern commented Oct 18, 2024

@N-thony you've done it the other way round now. I wanted R-Instat to start with undo enabled. So the message in Tools Options should be unticked, and say Switch Off. When you tick, then it will switch off.
You are right, at this stage I am happy that you have added the timer simply to be used in the script window.
Have you looked at whether you can also add something to check memory, in the same way - also in the script window?

Then I am ready to ask @ChrisMarsh82 to check the code and merge. His requested tests will still need to be done, but I see that as a useful task as we test the release more generally. (If it shows a particular problem we can quickly hard wire the off switch to be the default, and disable it, for the released version. I don't think it will!)

@N-thony
Copy link
Collaborator Author

N-thony commented Oct 18, 2024

@rdstern memory.size(max = TRUE) will return the maximum memory that R has used during the session and memory.limit() will return the memory limit that R has used during the session, in megabytes.

@rdstern
Copy link
Collaborator

rdstern commented Oct 21, 2024

You chenged the message correctly to switch off. But it is still off to start with.

image

I want to be positive, so to start with Undo switched on. That's now consistent with the dialog - so that doesn't need to change. Just start with undo switched on.

image

By gthe way I noticed I ahd to swich off (i.e. tck) to switch it on. Then it enabled me to undo the changes I made before!!! Would that be the case for every data set or just the small undoabole data frames? Are you sure that's not dangerous?

rdstern
rdstern previously approved these changes Oct 22, 2024
Copy link
Collaborator

@rdstern rdstern left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@N-thony great. I am approving. @ChrisMarsh82 over to you, and I'd really like this in the next version if possible. My one concern is that I turned off the undo. That was fine. Then I turned it on again and I could now undo what I couldn't when it was turned off.
So I wonder how that happened and what actually is turned off? Maybe we should chat.

@N-thony
Copy link
Collaborator Author

N-thony commented Oct 22, 2024

@N-thony great. I am approving. @ChrisMarsh82 over to you, and I'd really like this in the next version if possible. My one concern is that I turned off the undo. That was fine. Then I turned it on again and I could now undo what I couldn't when it was turned off. So I wonder how that happened and what actually is turned off? Maybe we should chat.

@rdstern I see you have approved but I was adding the possiblity to switch on or off undo in different datasets. The default is on, so to reduce the risk if the dataset is very large you can switch off undo for that specific dataset. It has taken me the whole day to come up with this. I hope this will significantly reduce the risk in the memory. Have a look.

@rdstern
Copy link
Collaborator

rdstern commented Oct 22, 2024

@N-thony I'll check, but that's getting more complicated than I think is needed. I see this undo as an immediate one. So I would even be ok if the undos are cleared when you move data frames. Maybe you could discuss with Chris, when you meet him tomorrow morning.

@N-thony
Copy link
Collaborator Author

N-thony commented Oct 22, 2024

Then I turned it on again and I could now undo what I couldn't when it was turned off.

@rdstern in this version, I have fixed this, and it also clears and freed up the memory the undos of a given dataset when you switch off off undo.

@N-thony
Copy link
Collaborator Author

N-thony commented Oct 23, 2024

@rdstern I've learned valuable insights into how R manages memory, which is especially useful for optimizing my current approach to handling dataframes. The undo functionality relies on maintaining references to different states of dataframes in the lists. When the list is reset, cleared, or a specific dataframe is removed, the references to these dataframes remain in memory. R may not immediately reclaim this memory, even though the data is no longer needed.

To address this, I explicitly call gc() (the garbage collector) after such operations. This ensures that the memory previously occupied by the removed or reset dataframes is promptly released, preventing unnecessary memory usage and improving overall performance.

@N-thony
Copy link
Collaborator Author

N-thony commented Oct 23, 2024

@ChrisMarsh82 as discussed this morning, I have added the the code to report the loading process in the log file under the Calculator dialogue. Also I have done the test by loading 43 dataframes with at least hundred thousands rows and 10 columns each also made max undos (10 undos) in each dataframes, plus running some dialogue like Calculator, One variable summary graph. It did work well.

The screenshot below shows the loading process before loading the datasets, loading the datasets with undos, and after undo operationS in each dataset.
image

Copy link
Collaborator

@rdstern rdstern left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@N-thony still looks fine to me. @ChrisMarsh82 maybe back to you?

@N-thony
Copy link
Collaborator Author

N-thony commented Oct 24, 2024

@N-thony still looks fine to me. @ChrisMarsh82 maybe back to you?

@ChrisMarsh82 I'm planning to create the test version tomorrow. Can we take the risk of including this PR in that version?
On my laptop it works fine with more than 40 large datasets, maybe because I have a RAM of 8Gb. I'm curious to see how it will work on other laptops with different RAM capacities.

@N-thony N-thony merged commit 1765c3a into IDEMSInternational:master Oct 24, 2024
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants