---
title: "Working Document for the Improvement to nls() GSOC project"
author:
- Arkajyoti Bhattacharjee, Indian Institute of Technology, Kanpur
- John C. Nash, University of Ottawa, Canada
- Heather Turner, University of Warwick, UK
date: "`r format(Sys.time(), '%d %B, %Y')`"
bibliography: ImproveNLS.bib
output:
pdf_document:
keep_tex: false
---
## TODOS
- Show how `dir` works in nlsalt::numericDeriv(); need some examples/tests.
Currently it is used when approaching upper bounds. ?? Need to put this in
other codes and document with examples.
- Can we find a way to use analytic derivatives where they are available?
nlsderivchk.R shows deriv() not terribly smart.
[partly DONE]: the revision to rjfun now tries nlsderivchk() and then
uses numericDeriv() if analytic deriv() fails. Temporary changes to
nls.control() in nlsj.control(), but these choices need some rationalization,
cleanup and documentation.
[unfinished]: have not added numDeriv choices (of which there are several);
nor has nlfb been upgraded to use such choices.
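As one concrete example of such a numerical-derivative choice, a central-difference Jacobian of a residual function could be sketched as below. This is our illustration only: the name `jac_central`, the generic `resfn`, and the step-scaling rule are assumptions, not code from nlsr, nlfb, or numDeriv.

```
# Hypothetical central-difference Jacobian of a residual function resfn(p).
jac_central <- function(resfn, p, h = 1e-6) {
  r0 <- resfn(p)
  J <- matrix(0, nrow = length(r0), ncol = length(p))
  for (j in seq_along(p)) {
    hp <- h * max(abs(p[j]), 1)        # scale the step to the parameter size
    pp <- pm <- p
    pp[j] <- p[j] + hp
    pm[j] <- p[j] - hp
    J[, j] <- (resfn(pp) - resfn(pm)) / (2 * hp)
  }
  J
}
```

For a linear residual function the central difference is exact up to rounding, which makes it easy to sanity-check any such routine.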
- At some point, it will be important to know what forms of residual and
jacobian evaluation forms work best. That means measuring timing (which
we know how to do with microbenchmark) and memory, which is a little more
difficult because of R's "garbage collection" and other policies.
?? Not even begun, but eventually performance will become important.
- Does the presence of p1 and p2 objects in .GlobalEnv cause trouble for nls()
(it does seem to for nlsj)??
- Testing structure to enable us to compare different routines on all the
different tests.
- nlsModel needs better attention to WHICH routines from the `m` object are
used EXTERNALLY to nls(). `nlsj` will NOT have all of these. And `m` is now
(2021-7-12) built at the end of the computation rather than the beginning.
- When we evaluate derivatives, should we ignore bounded/masked parameters to
avoid stepping into inadmissible area?
- Mimic what model.frame.default does:
  `wts <- if (mWeights) rep_len(1, n) else eval(substitute(weights), data, environment(formula))`
- Can we handle indexed parameters??
Possibly working through model frame. Note original nls code varIndex.
## Abstract
nls() is the primary nonlinear modeling tool in base R. It has a great
many features, but it is about two decades old and has a number of
weaknesses, as well as some gaps in documentation. This document is an
ongoing record of work under the Google Summer of Code 2021 of the
first author. As such it is NOT meant to be a finished academic
report, but a form of extended diary of activity, issues and results.
## Project repository
\url{https://gitlab.com/nashjc/improvenls}
## ----------- MAIN OUTCOMES ------------
Here is an annotated listing of product of the project.
- RefactoringNLS.Rmd: the document which will become the main report of this project.
- DerivsNLS.Rmd: a document to explain different ways in which Jacobian information
is supplied to nonlinear least squares computation in R. File `ExDerivs.R` is
intended to provide examples. ?? needs documentation ??
- ImproveNLS.bib: a consolidated BibTex bibliography for all documents in this
project, possibly with wider application to nonlinear least squares in general
- MachineSummary.Rmd: an informal investigation of ways to report the characteristics
and identity of machines running tests. `MachID.R` offers a concise summary function
?? should revisit??
- nlsalt: attempt to mirror nls() behaviour in R. UNFINISHED?? The effort showed that
the structure of the programs was difficult to follow, undocumented, and unsuited
to adding improvements like Marquardt. Possible that we will abandon this.
`nls-flowchart.txt` is a partial documentation of the structure.
- nls-changes-for-small-residuals-in-nls-R-4.0.2.zip: collected material for the
JN fix to the relative offset convergence criterion failure when there are small
residuals in problems sent to nls().
- nlsj: a refactoring of the nls() functionality. `redesign2107.txt` gives some notes.
- nlsModelDoc.Rmd
- nlspkg: a packaged version of the nls() code from R-base. Thanks to Duncan Murdoch.
- nlsralt: a modified version of the Nash and Murdoch package `nlsr` with improvements
discovered as a result of this project. Need to add subset correctly and document
changes, e.g., to numerical derivatives. This is for JN rather than AB.
- PkgFromRbase.Rmd: an explanation of the construction of the `nlspkg` from the
code in R-base.
- TestsDoc.Rmd: a survey of testing tools in R. ?? need to incorporate MachID and
the many tests.
- VarietyInNonlinearLeastSquaresCodes.Rmd: a review of the different algorithms and
the many choices in their implementation for nonlinear least squares. ?? unfinished
but getting there.
- WorkingDocument4ImproveNLS.Rmd: this document
## ----------- DIARY PORTION ------------
* In reverse date order *
[2021-8-17] - [2021-8-20]
- JN worked on Refactoring document and a new svd-based Gauss-Newton method,
but this latter item is a long-term project.
- AB cleaned and documented nlsCompare and edited a number of items, fixing
outstanding issues.
- AB, JN, HT met August 20 for wrap-up.
[2021-8-7] - [2021-8-10]
- JN fixed and tested nlsj bounds -- needed to move the Marquardt stabilization
AFTER the bounds check.
- AB worked on tester, modularizing code, checking and cleaning code.
- both: meetings on Sunday Aug 8 and Tues Aug 10
Heather's comment:
For the internal data sets like "problems.csv" you might use sysdata.rda: https://r-pkgs.org/data.html#data-sysdata
[2021-8-6] AB: Worked on creating a package version of the runner folder called
"nlscompare". Still messy. [To be discussed in the meeting.]
Created a corresponding article using The R Journal template from
"rticles" package in RStudio. Need suggestions on that. [To be discussed
in the meeting. ]
[2021-8-5] AB: Did error handling on Testing part of the "run_JN.R" file. Made
changes in how "NLSErrorLog" outputs - improved presentation and clarity
of results.
JN: look into the possibility of a generic `getStart(formula, data)` function.
Test if SSxyzw functions show a class "selfStart" easily. No!
Note that one can try to have formulas of type
```
y ~ SSab(x, p1, p2, p4) + SScd(x, p2, p3)
```
and `nls()` seems not to decode their selfstarts. This is another INCONSISTENCY.
There is also the inconsistency in indexed parameter output. The output is numbered,
but is NOT in the form prmA[index] but prmAindex in output.
[2021-8-4] Looked into "RefactoringNLS" and suggested minor fixes.
[2021-8-3] Discussed problems with runner files and testthat files -
1. NLSlower, NLSupper, NLSsubset, NLSweights not used - later
2. gradient in run_JN causing problems - solved
3. How to set up nlsr in methods? "algorithm=" needs to be handled!
Idea: Before output_nls and output, define NLSrunline if
solver==nlsr::nlxb;
algorithm=default in output_nls;
how to setup control for nlsr::nlxb - setup runline -- FILTER IT
4. commented out gradient from "Passed" condition and changed 6-->5 --
SOLVED
5. minpack convInfo()?? -- SOLVED
6. where to put try statement? problems like gradient may occur?
SOLUTION - put try statements and write a function check2()
that outputs "ERROR" in passed?
7. Note - Tetra gives singular grad in nls but not in nlsj -- BETTER??
8. Compare "port" in nls with "default" in nlsj and "marquardt" in
nlsj for bounds. Do also (for bounds) nlsLM, nlxb.
```
------------------------------------------------
```
Latest test and runner folders attached in a single zip file.
SUGGESTIONS -
In GitLab repo - improvenls, inside "Tests", you can
DELETE: "singular Gradient Problems", "test_better.R", "test-Sulfi_1.R",
"test-Sacch2_1.R",
"test-Tetra_1.R"
KEEP: "test-Isom_1.R"
Note that both "Tests" and the "tests" inside "nlsj" are similar. I think
putting the attached (inside the zip file) "tests" in nlsj-tests would be
better. One file in "Tests" does have a comment - "better stub" - so maybe
you are using those files somewhere? If not, you may consider deleting
them as well.
[2021-8-1 to 2021-8-2] AB. Edited run file in runner directory to create
"nlsErrorLog.csv" that contains the solver, algorithm, control, filename,
date and machine that cause an error while running run.R. Added edited
Croucher_base.R in testthat. run.R is currently running without any error.
Maybe need to add more test files in runner??
[2021-7-28 to 2021-7-30] Fixed runner directory contents and edited
testthat files. Succeeded with "control" using JN's suggestion.
[2021-7-28] to [2021-7-30]
JN: worked on nlsj and on a stripped-down version nlsgn (only the Gauss-Newton
method and no bounds). Some issues with haveQRJ NOT being reset when a successful
line search completes. Also the tactical rules for setting the stepsize `fac`
had to be adjusted, as AB pointed out that the "Leaves" test was not getting
answers equivalent to those from `nls()`. Also some cleanup of several documents.
Both: meeting of July 30 found glitch in Gillespie's benchmarkme::get_ram().
Sorted out some of the issues in tests (for packages) and the run.R program,
in particular the string to be added for different "control" sections of the
calls.
AB: more work on the tests.
[2021-07-25 - 2021-07-26]
- AB: Got a working workaround function to resolve RAM
issue for Windows (probably; Machine ID is now machine dependent. Need to
make it robust??). Started "runner_v2" to compare various NLS methods.
Observed that "plinear" created an error `"Error in nls(formula =
NLSformula, data = NLSdata, start = NLSstart, algorithm =
NLSmethods$algorithm[j]) :
step factor 0.000488281 reduced below 'minFactor' of 0.000976562"`.
There is an issue with "control" in "methods.csv" file. Trying to resolve it.
Spreadsheet is running successfully, but with default options only. Should
column names be extended??
### [2021-7-25]
- JN: continued work on nlsj to check features working. Some rewrite of sections
of the Refactoring article.
### [2021-7-24]
- gradual working through the code to add Marquardt yet not lose Gauss Newton.
Looking to Hobbs to prepare problems to test different situations, in particular,
-- bounds
-- singular gradient
-- bad start
-- good start
and match `nls()` output.
### [2021-7-23]
- nlsj more or less running OK with base Gauss Newton. Putting in Marquardt started
but messy. Issues of object$m$resid() (which is WEIGHTED) vs. resid(object)
(which is NOT) partly resolved, but definitely needs documenting better.
- meeting resolved some ideas. Runner is progressing.
### [2021-7-22]
- JN did some cleanup of nlsj and made some modifications to the Croucher example
which we have been using to provide some initial checks of the code.
- AB provided some test files and infrastructure.
- JN suggested a function rmNLS() to remove test routine elements to keep the
workspace tidy. We need to use `rm(list=objectname, envir=.GlobalEnv)`
Note the "list=".
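A minimal sketch of the suggested rmNLS() helper, built on the `rm(list=...)` idiom noted above; the function name is from the suggestion, but the signature and scope here are our assumptions, not code from any package.

```
# Hypothetical helper: remove named test objects from the global workspace,
# skipping names that are not present. Note the "list=" idiom.
rmNLS <- function(objectnames) {
  present <- objectnames[objectnames %in% ls(envir = .GlobalEnv)]
  rm(list = present, envir = .GlobalEnv)
  invisible(present)   # return (invisibly) what was actually removed
}
```

A call like `rmNLS(c("Croucherdata", "Croucherform"))` would then tidy the workspace after a test run without erroring on names already gone.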
To discuss July 23:
- Should we create "base" files for families of test problems?
- Should we put file / problem name into a variable?
- Ditto for the description?
- How should we structure / build the problem list?
- In the "runner" program, how should the file location be specified?
This is a cross-platform issue, and we would like a standard program that needs
minimal change between platforms / users.
- Should we modify rmNLS() or have rm() statements at end of runs?
- How to specify which program is to be run?
- How to provide lists of problems to be run?
- Further discussion of the output.
### [2021-7-21]
[2021-7-20] Discussed problems with test file outputs over GMeet.
`tolerance` needs to be modified in certain expectations. Rather than
comparing entire lists, it was suggested to test particular list items.
Problems with `testthat` while executing R CMD check, saying: "Error in
library(testthat) : there is no package called ‘testthat’".
[2021-7-21] Fixed `testthat` issue. Apparently, the test file names in the
`testthat` sub-directory inside the `tests` directory need to be prefixed
with "test-filename.R". Realized this when a template test file was
created using `usethis::use_test("nlspkg")` named "test-nlspkg.R". Still,
tests require some work; errors may be due to `nlsj`??
[2021-7-24] Cleaned test files and ran R CMD check after moving the tests
directory inside nlsj. Seems to work!
Inside "run.R", benchmarkme::get_ram does not seem to work on Windows 10.
Reference issue: /~https://github.com/csgillespie/benchmarkme/issues/25
Workaround: RAM <- as.numeric(system('wmic MemoryChip get
Capacity', intern=TRUE))[2]/1024^3
Plans to improve structure of runner discussed over GMeet.
- JN looked into minpack.lm handling of bounds constraints, and also downloaded
the original **netlib** version of `minpack`. This does NOT handle bounds (at
least no mention of "bounds", "upper" or "lower" were found w.r.t. parameters).
Inside `minpack.lm` package, it seems that bounds are handled by the code
```
if (par[i] < NUMERIC_POINTER(OS->lower)[i])
    par[i] = NUMERIC_POINTER(OS->lower)[i];
if (par[i] > NUMERIC_POINTER(OS->upper)[i])
    par[i] = NUMERIC_POINTER(OS->upper)[i];
```
This is simply moving the parameters to the bounds.
Have I missed something?
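In R terms, that C fragment simply projects each trial parameter onto the box defined by the bounds. A sketch of the same operation (our illustration, not minpack.lm code; the name `clamp_to_bounds` is hypothetical):

```
# Sketch of the projection implied by the C fragment above: any trial
# parameter outside [lower, upper] is moved to the nearest bound.
clamp_to_bounds <- function(par, lower, upper) {
  pmin(pmax(par, lower), upper)
}
clamp_to_bounds(c(-2, 0.5, 3), lower = rep(0, 3), upper = rep(1, 3))
# -> 0.0 0.5 1.0
```

This projection is quite different from the active-set style bookkeeping in the bdmsk scheme used elsewhere in these notes, which is part of why the question above matters.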
### [2021-7-20]
Meeting online. Worked through some issues relating to testing.
Outside of meeting, JN got nlsj to do a "first run" with bounds. Much to be
checked and cleaned up.
AB working on tests, both for the "within package" and "across packages" roles.
### [2021-7-19]
Look at indexed parameters example from `nls()`. This does NOT work at moment for
nlsr[alt], even with wrapnlsr.
```
## The muscle dataset in MASS is from an experiment on muscle
## contraction on 21 animals. The observed variables are Strip
## (identifier of muscle), Conc (Cacl concentration) and Length
## (resulting length of muscle section).
utils::data(muscle, package = "MASS")
## The non linear model considered is
## Length = alpha + beta*exp(-Conc/theta) + error
## where theta is constant but alpha and beta may vary with Strip.
with(muscle, table(Strip)) # 2, 3 or 4 obs per strip
## We first use the plinear algorithm to fit an overall model,
## ignoring that alpha and beta might vary with Strip.
musc.1 <- nls(Length ~ cbind(1, exp(-Conc/th)), muscle,
start = list(th = 1), algorithm = "plinear")
## IGNORE_RDIFF_BEGIN
summary(musc.1)
## IGNORE_RDIFF_END
## Then we use nls' indexing feature for parameters in non-linear
## models to use the conventional algorithm to fit a model in which
## alpha and beta vary with Strip. The starting values are provided
## by the previously fitted model.
## Note that with indexed parameters, the starting values must be
## given in a list (with names):
b <- coef(musc.1)
musc.2 <- nls(Length ~ a[Strip] + b[Strip]*exp(-Conc/th), muscle,
start = list(a = rep(b[2], 21), b = rep(b[3], 21), th = b[1]))
## IGNORE_RDIFF_BEGIN
summary(musc.2)
## IGNORE_RDIFF_END
library(nlsr)
## THIS FAILS -- we don't handle indexed parameters yet
musc.2a <- try(nlxb(Length ~ a[Strip] + b[Strip]*exp(-Conc/th), muscle,
start = list(a = rep(b[2], 21), b = rep(b[3], 21), th = b[1])))
if (inherits(musc.2a, "try-error")) cat("nlxb indexed does not work")
# But
musc.2b <- try(wrapnlsr(Length ~ a[Strip] + b[Strip]*exp(-Conc/th), muscle,
start = list(a = rep(b[2], 21), b = rep(b[3], 21), th = b[1])))
if (inherits(musc.2b, "try-error")) cat("wrapnlsr indexed does not work")
```
### [2021-7-19]
- JN got first run bounds and masks working on Croucher example. Many questions
and issues remain, but it is getting seemingly correct answers compared to nlxb.
Test alternative coding for bounds. Note that the for-loop version seems about twice as fast in microbenchmark.
```
## tests of bounds setting - J Nash 210719
# setup
rm(list=ls())
bdmsk <- c(1,1,0,-1,-3,1)
set.seed(123)
res <- runif(10)*10
J <- matrix(1:60, nrow=10)
pnum<-1:6
npar<-6
gjty<-t(J)%*%res # raw gradient
tr1 <- function(){
  for (i in 1:npar){
    bmi<-bdmsk[i]
    if (bmi==0) {
      gjty[i]<-0 # masked
      J[,i]<-0
    }
    if (bmi<0) {
      if((2+bmi)*gjty[i] > 0) { # free parameter
        bdmsk[i]<-1
        # if (watch) cat("freeing parameter ",i," now at ",pnum[i],"\n")
      } else {
        gjty[i]<-0 # active bound
        J[,i]<-0
        # if (watch) cat("active bound ",i," at ",pnum[i],"\n")
      }
    } # bmi
  } # end for loop
  ans<-list(J=J, gjty=gjty, bdmsk=bdmsk)
  ans
} # end tr1
tr2 <- function(){
  gjty<-t(J)%*%res # raw gradient
  idx0 <- which(bdmsk==0)
  gjty[idx0]<-0 # masked
  J[,idx0]<-0
  idxb <- which((bdmsk < 0) & ((2+bdmsk)*gjty <= 0.0)) # active bounds
  idxf <- which((bdmsk < 0) & ((2+bdmsk)*gjty > 0.0))  # bounds to free
  bdmsk[idxf] <- 1 # free ONLY bounded parameters with favourable gradient
  gjty[idxb]<-0 # active bound
  J[,idxb]<-0
  ans<-list(J=J, gjty=gjty, bdmsk=bdmsk)
  ans
}
bdmsk
J
gjty
print(tr1())
bdmsk
gjty
J
print(tr2()$bdmsk)
print(tr1()$bdmsk)
library(microbenchmark)
microbenchmark(tr1())
microbenchmark(tr2())
```
### [2021-7-18] - [2021-7-19] AB
Working on creating test files based on `Examples
for tests`. Initial draft based on BOD2 data sent to Dr.Nash. It contains
nls vs nlsj for now; Will have to work on including nls vs
minpack.lm::nlsLM and nls vs nlsr::nlxb.
### [2021-7-16] + [2021-7-17]
- JN working on trying to put bounds into nlsj.R, but very early days.
- Started project report with background and goals: RefactoringNLS.Rmd
### [2021-07-14 to 2021-7-16] AB
Worked on `Examples for Tests` file. Includes those
data from NRAIA package whose help files have models explicitly defined.
Hobbs weed problem is also included in it. Will have to look into more
examples??
[2021-07-16] Via GMeet with JN, did a code walk through nlsj.R and discussed how to proceed
with tests.
### [2021-7-15]
- JN changed NAMESPACE of nlspkg to remove the exportPattern line and explicitly
export `nls` and `numericDeriv`. ?? More may be needed to be exported.
Now R CMD check clears OK, but "test" fails.
### [2021-7-14] [2021-7-15]
- some work on nlsj to purge errors raised by R CMD check. One WARNING was raised
by generic functions e.g., fitted.nlsj being "undocumented". It turns out the
same error is in `nlspkg`, and JN emailed Duncan Murdoch for advice.
- added asOneSidedFormula to nlsjextra.R but commented out. Not sure this is needed
or useful, but put in collection so it is available to us.
- JN created a `Performance` directory and `timessloop.R`. Found sum(x^2) by far
the fastest way to compute sum of squares.
- To check derivative computation in `nlsj`, JN created `testrj.R` (in `Performance`
though not about timing at this moment, but about values.)
- AB working on SelfStart functions and tests.
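The sum-of-squares finding from `timessloop.R` noted above can be reproduced with a sketch like this; only the agreement of the three forms is checked here, while the timing itself would use microbenchmark as in our other trials.

```
# Three ways to form a sum of squares; in our trials sum(x^2) was by far
# the fastest, though all three give the same value.
x <- as.numeric(1:1e5)
s1 <- sum(x^2)
s2 <- drop(crossprod(x))   # t(x) %*% x done efficiently
s3 <- drop(t(x) %*% x)     # explicit matrix product
all.equal(s1, s2)          # same value, different cost
# library(microbenchmark)
# microbenchmark(sum(x^2), crossprod(x), t(x) %*% x)
```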
### [2021-7-13]
- JN some work on nlsj -- more or less running with both analytic and numeric
derivs, but need to check sign of residuals and jacobian for equivalence,
and also for nlsr.
- AB edits on Variety paper incorporated by JN. (Excellent attention to detail
by AB.)
- HT, AB, JN meeting. Discussed directions. AB will focus on tests to check
equivalence of new codes with nls(). JN will concentrate on new code. Both
to work on checking each other. Will meet more frequently to do code walks.
- JN posted on r-devel asking about "subset". Note some replies pointing to
"model.frame" as the structure through which subsetting is applied, along
with some other things. JN to dig!
- HT suggestion in email led to notes on how we might structure test output.
This is included as follows.
```
TestingIdeas.txt
2021-7-13
Following up Heather's email about the need to rigorously test to see if
our "new" code gets equivalent output to nls(), I think we should discuss
the "how" of testing and build it into the TestsDoc.
1) To be able to check how we are doing, we need to have a set of tests and
know what they do. You have been looking into this, but we should try to see
how we can structure things so they can be run easily.
Can we put each test in a separate file of R code, with a filename having
what we can call the testID in the name. Hence testID.R.nltst as a filename.
As a check on whether this is feasible, let me try with Croucher, since it
is small
# TCroucher.R.nltst
# Implements tests of nonlinear least squares code for the Croucher example
# Reference: https://walkingrandomly.com/?p=5254
xdata = c(-2,-1.64,-1.33,-0.7,0,0.45,1.2,1.64,2.32,2.9)
ydata = c(0.699369,0.700462,0.695354,1.03905,1.97389,2.41143,1.91091,0.919576,-0.730975,-1.42001)
Croucherdata<-data.frame(xdata, ydata)
Croucherform <- ydata ~ p1*cos(p2*xdata) + p2*sin(p1*xdata)
# ?? Do we want an array of starts? Some problems have > 1
Croucherstart<-list(p1=1,p2=0.2)
Crouchersubset<-1:8 # Possibly have subsets -- multiple tests.
# ?? do we want separate files for each test. More storage, but then each test
# has a separate ID and can be run independently
## Example use
# library(nlsj)
# fitj = nlsjx(ydata ~ p1*cos(p2*xdata) + p2*sin(p1*xdata), start=list(p1=1,p2=.2), subset=1:8, trace=TRUE)
# summarise
# summary(fitj)
callstring<-"(Croucherform, start=Croucherstart, data=Croucherdata, subset=1:8)"
#?? can we embed this in different calls e.g.,
# runline <- paste("result<-nlxb(",callstring,")")
We can build a "database" or else simply documentation files (?? .Rd format like
in R, or something similar and simple) to explain the origin and purpose of each
test file. This collection will get large, but I think it will be best to keep things
simple for now.
2) My feeling is that EACH variant of a test gets its own file that builds the problem,
so that our output is indexed on machineID, testID, programID, and relevant time/date
and results information.
While we could (and likely will eventually) use a database, for the moment I think a
comma separated values (csv) file is more straightforward. As long as we agree the
structure in advance, we can simply concatenate such files and draw them into data
frames or databases easily.
3) Output fields from tests
machineID: the ID of the machine running the tests. We may need to set up a list
of these. For ourselves it won't be large, but if others join us we may need to
appoint a "naming authority". Just thinking ahead.
userID: the ID of the person who ran the test. Again we may eventually need a
"naming authority". JN would be nashjc, for example.
testID: the particular test and variation (subset, start, etc.) I suggest that we
think of test names and put a 2 or 3 digit suffix so we can expand over time.
UTCtime: a date and timestamp at UTC (GMT) in the form yyyymmddhhmmss (maybe
could leave off seconds). UTC so we don't get timezone confusion.
programID: the program used. I suggest names like nlsr::nlxb, nlspkg::nls, minpack.lm::nlsLM
so we identify as completely as possible. These can be the names used with the "runline"
above.
tResult: a 1 character output. 0 = success, and anything else will need documenting.
notes: a field allowing us to add extra information
?? Can you think of anything else?
Files like this will be plain text and can simply be concatenated. The structure
is independent of the testing program(s) that we decide to use, but hopefully can
be output by them.
-------------------------------------------------------------
It seems that given my experience in the nonlinear codes, I should probably continue
with working on nlsj and the other packages, and you (Arkajyoti) should work on the
tests. That way we won't overlap too much and tread on each other's toes. Are you OK
with that?
On the other hand, I think that we should probably schedule extra meetings to do
code walks where one of us explains each line of code in a program to the other. I've
found that sort of exercise shows up loose ends and forces the code to be tightened.
It's how I did the programming with Mary Walker-Smith in the mid-80s and we made a
lot of progress quite quickly.
Talk later, but I wanted to get some of these ideas down.
Best, JN
tryrunline.R test
xdata = c(-2,-1.64,-1.33,-0.7,0,0.45,1.2,1.64,2.32,2.9)
ydata = c(0.699369,0.700462,0.695354,1.03905,1.97389,2.41143,1.91091,0.919576,-0.730975,-1.42001)
Croucherdata<-data.frame(xdata, ydata)
Croucherform <- ydata ~ p1*cos(p2*xdata) + p2*sin(p1*xdata)
# ?? Do we want an array of starts? Some problems have > 1
Croucherstart<-list(p1=1,p2=0.2)
Crouchersubset<-1:8 # Possibly have subsets -- multiple tests.
# ?? do we want separate files for each test. More storage, but then each test
# has a separate ID and can be run independently
## Example use
# library(nlsj)
# fitj = nlsjx(ydata ~ p1*cos(p2*xdata) + p2*sin(p1*xdata), start=list(p1=1,p2=.2), subset=1:8, trace=TRUE)
# summarise
# summary(fitj)
callstring<-"(Croucherform, start=Croucherstart, data=Croucherdata)"
#?? can we embed this in different calls e.g.,
runline <- paste("result<-nlsr::nlxb",callstring)
runline
eval(parse(text=runline))
result
Heather Turner 12:00 -- in chat of meeting
So we can produce something like this: https://cran.r-project.org/web/checks/check_results_gnm.html
(Which is the CRAN package test results page for gnm.)
```
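The flat CSV record proposed in the notes above could be produced with something as simple as the following sketch. The field names follow the list in the notes; the function name `record_result` and the default handling are our placeholders, not an agreed design.

```
# Sketch: append one test-result row (machineID, userID, testID, UTCtime,
# programID, tResult, notes) to a results CSV, writing the header only once.
record_result <- function(file, machineID, userID, testID, programID,
                          tResult, notes = "") {
  row <- data.frame(machineID = machineID, userID = userID, testID = testID,
                    UTCtime = format(Sys.time(), "%Y%m%d%H%M%S", tz = "UTC"),
                    programID = programID, tResult = tResult, notes = notes)
  write.table(row, file = file, sep = ",", row.names = FALSE,
              col.names = !file.exists(file), append = file.exists(file))
}
```

Files produced this way are plain text, so concatenating results from different machines (as suggested above) is just a matter of stacking rows.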
### [2021-7-12]
- JN and AB completed evaluations for GSoC.
- JN working on nlsj package in last couple of days. Refactored into new
file nlsjx.R with NO nlsjModel() function. Got this "running" on
Croucher example (which is changed slightly to reflect new name.)
Notes:
- It seems nls() counts "iterations" whereas JN prefers jacobians and
residuals. As structured, the algorithm computes residuals an extra
time, since the internal backtracking does NOT compute jacobian.
Possibly we should. Timing will be needed to see what is most
efficient, though we will need a number of different sizes and types
of problem. Can possibly think of ways to short-circuit the computation.
- The particular choice of strategy for the backtracking always uses
fac=0.5 on first iteration (as far as JN can determine.) This may or
may not be a good choice, but for the moment it gives the same results
as nls().
- The "m" object is NOT built until we finish nlsj(). And some of the
functions are missing / wrong. We need to fix what we do need and
eliminate those that are not needed.
- note that by using zero weights, the `subset` argument is handled OK.
- Some functions for print and summary needed changes to allow them to run,
since they try to use the functions from nlsModel.
- Output is currently untidy. Need to fix up.
### [2021-7-10]
- AB noted error in PkgFromRbase document that was fixed.
- JN working on `nlsj` and issues raised, along with some documentation of ideas
in the "Variety" article.
- Serious issues of concern in the STRUCTURE of the `nlsj` package are mostly
related to the placement of functions and features. For example, if we put
functions in the nlsjModel() function and return them to nlsj() via the model
object `m`, then there are some awkward communication issues for the convergence
criterion information, the updated parameters, etc.
Consider, for example, if we were to change from the nls() step-halving method
of seeking a "better" set of parameters. The original method at each iteration
has a set of parameters $x_0$ and forms $x = x_0 + s * \delta$ where $s$ is
stored in the variable `fac`. We start with this at 1.0 and halve it until
the sum of squared weighted residuals is reduced in size. Then we double
`fac` for the next major iteration (i.e., with $x$ replacing $x_0$ and a
newly computed $\delta$).
However, if we use a parabolic inverse interpolation, we need to store 3
sets of parameters and their deviances e.g., $x_0$, $x$ and $x_{half}$,
where we use a step of `0.5*fac` for the step $s$. These have to be
shared with the objects and environment that nlsjModel() can access.
?? Arkajyoti: Give these issues some thought. I don't think there's a true
"answer", just some good and some not-so-good choices. I'll do some trials
to see what seems to work and how easy to understand and use.
By creating a small function that builds m <- nlsjModel(....) and trying
out some ideas, it appears the easiest way to work is to run the nlsjModel()
then ee <- m$getEnv() to get the environment and put items in it. This is
NOT necessarily how R gurus would do it, but likely will be safer (and maybe
faster -- need to check.)
### [2021-7-7] - [2021-7-9]
- JN had computer troubles (failed machine and disks)
- AB participating in UseR!
### [2021-7-5] [2021-7-6]
- some editing and extension of Variety document. ?? still needs some
tidying and rationalization.
- redesign2107.txt is a set of notes on some redesign issues.
- Note the new function nlsderivchk.R -- it checks whether derivatives can be found
analytically. So far it seems some partial derivatives cannot be obtained analytically.
- JN checked that nlsr will work with data in the current workspace
- note
```
# A function to evaluate f in environment e
with_env <- function(f, e = parent.frame()) {
  stopifnot(is.function(f))
  environment(f) <- e  # rebind f's enclosing environment to e
  f
}
```
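A hypothetical use, showing that the rebound function resolves its free variables in the supplied environment:

```r
# with_env as defined above
with_env <- function(f, e = parent.frame()) {
  stopifnot(is.function(f))
  environment(f) <- e
  f
}

e <- new.env()
e$x <- 42
f <- function() x        # x is a free variable, not defined inside f
g <- with_env(f, e)      # rebind f's environment to e
g()                      # finds x in e, so returns 42
```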
- JN: Looking at the ORIGINAL `nls.R` in `nlspkg`, it appeared that argument `subset`
is NOT used. However, Croucher-example1.R, modified to check using
`subset=1:8`, showed that it did work. ?? Documentation could be improved! The
input to subset should resolve to a set of indices within the range of the data
rows. Trying a similar call with nlsralt did NOT work -- subset was
ignored even when put in the calling args. It is difficult to see how nls() manages
this feature.
- We BELIEVE(??) that the subsetting is done via model.frame using match.call() in
nls() (nlspkg::nls.R#497), but the mechanism does not seem clear. However, we can
do a much simpler job by doing the subset within nlfb(x).
- Note that `subset` does NOT seem to be explicit in the code of `nlspkg`, nor
in the R-devel source tarball, in any obvious way. Where is it??
IDEAS: We could incorporate subset into the weights and ALWAYS use weights. This
may be inefficient in general, but easier and possibly safer to implement.
That is, weights (and their square roots) are initialized to 1, then set to 0 for
observations not in the subset. Another, possibly parallel, approach is to build a
wrap-subset. Part of the difficulty with subsetting is that users and programmers
can get confused. A wrap-subset would be useful if the active problem is much
smaller than the original data.
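The weights-based idea could be sketched as follows (the helper name `subset_to_weights` is made up for illustration):

```r
# Fold a subset specification into a weight vector: observations outside
# the subset get weight 0, so they contribute nothing to the sum of squares.
subset_to_weights <- function(n, subset = NULL, weights = rep(1, n)) {
  stopifnot(length(weights) == n)
  w <- weights
  if (!is.null(subset)) {
    keep <- rep(FALSE, n)
    keep[subset] <- TRUE
    w[!keep] <- 0
  }
  w
}

w <- subset_to_weights(10, subset = 1:8)
# sqrt(w) would then play the role of .swts in weighting residuals
# and rows of the Jacobian.
```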
- nls has a generic "residuals" function for nls() objects:
residuals(myfit). Note that myfit$m$resid() returns WEIGHTED residuals
(weighted by the square roots of the weights).
Note that getAllPars() adds the plinear parameters in nlsModel.plinear, but
is otherwise the same as getPars().
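The weighting of `m$resid()` can be checked directly with a small weighted fit (the model and data here are made up for the demonstration):

```r
set.seed(1)
x <- 1:10
y <- 2 * exp(-0.1 * x) + rnorm(10, sd = 0.01)
w <- runif(10, 0.5, 2)
fit <- nls(y ~ a * exp(b * x), start = list(a = 2, b = -0.1), weights = w)

# fit$m$resid() returns residuals weighted by the square roots of the
# weights, while fitted(fit) is on the response scale:
isTRUE(all.equal(as.vector(fit$m$resid()),
                 as.vector(sqrt(w) * (y - fitted(fit)))))
```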
### [2021-7-1] to [2021-7-4] nlsj and tentative program for rest of formal project
July 4: discussed tentative steps. JN made some progress with nlsj, but there are still
issues with the m$... functions not finding the correct inputs; in particular, the residual
function does NOT seem to use the correct parameters. It seems that parameters live in
both a vector `prm` and individual named variables, e.g., for Croucher, `p1` and `p2`.
Suggested directions for the latter half of the project.
We should agree on an approximate timeline for finishing items.
- Continue diary so we have a good record of what has been done. [Aug 31]
- AB: continue "testing" review document TestsDoc [July 16]
[Added July 4:] JN realizing that building tests for different
aspects of nls() (and, by natural extension, nlsr, minpack.lm etc.)
is a good way for AB to learn about nonlinear modeling. But the
current tests largely are "just there", so it would be useful to
add comments that tell what the test aims to check. For example,
we could be checking simple bounds, or a formula that includes
a special function (this will defeat analytic derivatives in nlsr
and future nls()).
- AB: review DerivsNLS, VarietyinNonlinearLeastSquaresCodes
MachineSummary and PkgFromRbase documents and report any
unfinished / erroneous / clumsy parts [July 16]
- JN: review TestsDoc [July 21]
- Both: List tests for nls() and its substitutes with explanation
of why they are included. [Aug 1]
- JN: continue nlsj package. This is a "from zero" attempt to
mimic nls(), but using a different set of routines and model
object "m" [July 21 to first version]
- Both: Try to clear all "??" issues and check working of nlsj.
Prepare: "nlsj -- a substitute for nls()" [July 28]
- Both: Try to document where issues remain in nlsalt package,
which is an attempt to put nls() (i.e., nlspkg) entirely in
R and to document and clean up. [Aug 7?? Probably won't be
fully finished]
- Clean up repo, removing incomplete or unhelpful elements.
AB: Look into how Google wants things put away.
/~https://github.com/rstats-gsoc/gsoc2020/wiki/table-of-proposed-coding-projects
[Aug 15??]
The following is from an email to Patrice Kiener, a co-mentor on the Google Summer
of Code NNBenchmark projects for 2019 and 2020.
```
The nls() stuff is going much more slowly than I'd like. I knew when we started that
the student is willing and generally capable but not very experienced in R internals.
But then, my own experience is strong on nonlinear optimization, but not so much on
the quirkiness of R + C interaction. Heather and I decided to try to segment the work
so there are some sub-projects that can be completed, though they might not be
directly on the original topic. I'm leading the code digging, and getting Arkajyoti
to try to follow and ask questions to force us to keep honest. This is working quite
well, and between us we are learning and moving the frontier forward. However, as my
doctoral supervisor (C. A. Coulson FRS) said "pi is the ratio of how long a task
takes over how long you thought it would take".
The big obstacle at the moment is a key routine created by nls() in its "model"
object "m" called "setPars". Here is the output of fit <- nls([a simple problem])
fit$m$setPars
function(newPars) {
setPars(newPars)
resid <<- .swts * (lhs - (rhs <<- getRHS())) # envir = thisEnv {2 x}
dev <<- sum(resid^2) # envir = thisEnv
if(length(gr <- attr(rhs, "gradient")) == 1L) gr <- c(gr)
QR <<- qr(.swts * gr) # envir = thisEnv
(QR$rank < min(dim(QR$qr))) # to catch the singular gradient matrix
}
<bytecode: 0x55f3e6c05cb0>
<environment: 0x55f3e6c27360>
>
You'll note:
1) it is recursive
2) it creates a bunch of functions and objects, but only returns a single LOGICAL
3) if you run nls() finding QR and other objects just doesn't seem possible easily
(I haven't been able to find them! But they are the key solution item.)
4) "global" assignments are used "<<-"
5) "hidden" objects are used (.swts)
6) if the problem has nonlinear parameters in the lhs, then this fails, because it
only builds the jacobian for the rhs of the model. Problems like
that aren't traditional in statistics, but can arise in optimization.
So a real mess. I've been in contact with Doug Bates, who is supposedly the author and who
said he would help if he could. He switched to Julia a few years ago. He can't recall why
the recursion is needed. But if you run the function outside of nls(), it blows the C
language stack! Sigh.
I plan to refactor, eliminating the recursion and globals and use a proper list() return
so all objects are in known locations. However, that will certainly cause us some pain when
it comes to profile.nls() and possibly some other features. And truthfully, I'd like to
understand the code, even if we replace it. My intention is to continue for a few more
days to run calls of the function under different initial conditions to create a
"reproducible example" (I think reprex is becoming the abbreviation) so I can post on
r-devel and see if there is anyone who can help to decode this. I strongly suspect that
the R-core maintainers are praying that no changes in the R and C infrastructure will
break the current code, as I'm pretty sure there's no developer documentation to provide
a guide.
On the bright side, I have got numericDeriv() function all in R. It's a bit slower than the
mixed R+C, but much shorter and more easily maintained and adapted -- the plan is to offer
the analytic derivatives as an option. Also to change the name to "jacobian" or "jacobian.nls".
And we've drafts of several short articles on some issues tangential to the project,
including one I'm preparing on the variety of ways nonlinear least squares can be
structured and implemented. Also one on a neat idea from Duncan Murdoch on how to extract
base-R code into a package that can be modified, thereby avoiding rebuilds of full R.
```
Since JN has NOT been able to satisfactorily understand the structure of setPars()
and setVarying() (the latter produces a huge set of functions, and it is not clear
how it has been defined), JN decided to do a zero-base build of a new set of functions
in a package called `nlsj`. This will hopefully have fully documented functions and
features. However, we expect that it will need quite a lot of work to add the
equivalent of `profile.nls()` and similar features. We note, from the `nls()`
documentation, that
```
An nls object is a type of fitted model object. It has methods for the generic
functions anova, coef, confint, deviance, df.residual, fitted, formula, logLik,
predict, print, profile, residuals, summary, vcov and weights.
```
Thus we will have to build equivalent capability, but we can possibly avoid a lot
of the extraneous material in the current infrastructure and also document what we
do provide.
### [2021-6-29] and [2021-6-30] Struggling with nls() structure
- JN: worked on the Croucher-expandednlsnoc.R file. A number of issues related
to the setPars() function. This was expanded to try to work around the fact
that it is recursively defined (why?). Then troubles with getRHS(), which
does not seem to be exposed for access in the top-level code. Moreover, issues
with .swts, which is hidden (why, again?).
- JN: discussion with Paul Gilbert about recursive calls and global assignment
in nlsModel and setPars. An email to Doug Bates got a rather discouraging reply:
Doug is unable to recall or figure out why setPars is recursive.
- setPars(): a search shows it is in nls-profile() as well as nls(). The calls
to it are unusual, first with no argument, then with an argument. We need
to understand how it works, though nls-profile is a tool that may not be
sensible to keep in R. However, "profile" is a generic in R, and "profile.nls"
is documented, with an example.
- Note also that setPars() works with the gradient of the rhs only. This is NOT
OK when both the lhs and rhs have parameters, which can be the case when we are
not modeling a single variable. It should use `resid` instead.
### [2021-6-28] continuing Croucher-expanded
- JN: found that the minpack.lm version of the convergence test could be put into the
code and appears to function correctly. It is much simpler than the code in
nlspkg::nlsModel(). Still need to sort out setPars.
- Online meeting -- some issues with git, Google Meet. Discussed diary portion
of this document, some issues related to testing, and plans for this week.
Agreed to meet Sunday July 4 to avoid collision with UseR!2021.
### [2021-6-26] and [2021-6-27]
- expanded VarietyInNonlinearLeastSquaresCodes substantially
- started unrolling the nls code in the Croucher examples. Some issues with if ... else
and brackets, possibly due to the last "}" in the function definition of nls().
- Note the strange `%||%` -- it comes from the `utils` package via base-internals.R in
`nlspkg` or `nlsalt`, BUT it looks like a language construct. ?? Should we replace it
with something inline so this does not trip up other programmers? In `minpack.lm`
the structure used is
```
env <- environment(formula)
if (is.null(env)) env <- parent.frame()
```
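For reference, `%||%` is just a null-coalescing operator, and an inline one-liner that mirrors the usual definition (the same operator was later added to base R in 4.4.0) is:

```r
# Null-coalescing: return x unless it is NULL, in which case return y.
`%||%` <- function(x, y) if (is.null(x)) y else x

NULL %||% 5   # 5
3 %||% 5      # 3
```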
- AB: looking up suitable examples in NRAIA package[23 June]
- AB: looking up various testing methods in R [21-28 June]
- AB: working on an Rmd file, documenting what testing strategies might be
taken up for our project [26-27 June]. (Made a branch, but I don't see
a new branch created in the repo. May have forgotten how to do this. Will look into it.)
### [2021-6-23] [2021-6-25] Tidying and documenting nlsModel
- JN cleaned up DerivsNLS. Extracted the ssasymptot model and tested it in nlsalt.
numericDeriv must be working, but it was not explicitly tested. Note that the
encapsulated version in the examples of nls() seems to fail.??
- used file import of the nls.R and nls.c parts (which are now separated out
in improvnls)
- nlsModelDoc is proceeding slowly. There are many undocumented features and
pieces of code in nlsModel() that need careful understanding, particularly
the scoping assignment `<<-` in some lines of the code. In `minpack.lm` there
is a version of nlsModel() that uses `assign()` rather than `<<-`, and some
documentation points out that this may be a more explicit mechanism (and
therefore easier to understand and keep correct). Note the R documentation
`?assignOps` as well as the `assign()` and `get()` functions.
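The two mechanisms can be contrasted in a small example: `<<-` searches up the chain of enclosing environments for an existing binding, while `assign()` names its target environment explicitly (the `maker` closure here is illustrative):

```r
maker <- function() {
  counter <- 0
  env <- environment()   # the environment holding counter
  list(
    # <<- walks up enclosing environments until it finds counter
    bump_super  = function() counter <<- counter + 1,
    # assign() writes to an explicitly named environment
    bump_assign = function() assign("counter", counter + 1, envir = env),
    get = function() counter
  )
}

m <- maker()
m$bump_super()
m$bump_assign()
m$get()   # 2
```

The `assign()` form makes the destination visible at the call site, which is the readability argument made above.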