[BUG] Precision Loss 🤷‍♀️ #16
Labels: Fixed
```r
# 1) generate vectors
# for values
matrix_method <- weighted_method <- reference <- numeric()

# 2) simulate sample
# sizes
i <- 1
for (N in c(1e2, 1e3, 1e4, 1e5, 1e6)) {

  # 2.1) generate classes
  actual    <- factor(sample(letters[1:3], N, replace = TRUE))
  predicted <- factor(sample(letters[1:3], N, replace = TRUE))
  weights   <- runif(length(actual))

  # 2.2) generate accuracy
  # from each method
  matrix_method[i] <- SLmetrics::accuracy(
    SLmetrics::cmatrix(
      actual    = actual,
      predicted = predicted,
      w         = weights
    )
  )

  weighted_method[i] <- SLmetrics::weighted.accuracy(
    actual    = actual,
    predicted = predicted,
    w         = weights
  )

  reference[i] <- weighted.mean(
    x = as.numeric(actual) == as.numeric(predicted),
    w = weights
  )

  i <- i + 1
}

# 3) display results
data.frame(
  reference,
  matrix_method,
  weighted_method
)
#>   reference matrix_method weighted_method
#> 1 0.3695044     0.3695044       0.3695044
#> 2 0.3241264     0.3241264       0.3241264
#> 3 0.3379269     0.3379269       0.3379269
#> 4 0.3362808     0.3362808       0.3362808
#> 5 0.3327261     0.3327261       0.3327261
```

Created on 2024-12-16 with reprex v2.1.1
serkor1 added a commit that referenced this issue on Dec 16, 2024:
* Changed all `IntegerMatrix` to `NumericMatrix` (see Issue #16) - this partially fixes the issue of inconsistent results.
* Bumped the default tolerance to 4e-4; the target is 1e-5.
* Disabled unit-testing for micro/macro averages, as the missing-value handling has been disabled for now, so they won't be equal as of now.
* Disabled testing for `ROC()` and `prROC()`; the tests are unstable. Looking for a fix.
* Disabled various functions for testing as they still differ from the target results. Reason currently unknown.
* Bug-fix in the `ckappa()`-function in the penalizing matrix.
serkor1 added a commit that referenced this issue on Dec 17, 2024:
* [UPDATE] `cmatrix()`-function 🔥
  * The `cmatrix()`-function now accepts the argument `w`, a <numeric>-vector of sample weights. 🚀
  * Unit-tests: Updated the unit-tests so they can accommodate weighted metrics.
  * Documentation: Updated the documentation of the `cmatrix()`-function.
  * NEWS: Updated news.
  * DESCRIPTION: Version bump! 🚀 All tests passed locally.
* [OPTIMIZE] Set compiler-flags 🚀
  * Created src/Makevars and set compiler-flags. No clue if it actually works. See https://stackoverflow.com/questions/32586455/how-to-change-and-set-rcpp-compile-arguments
* Updated unit-tests 🚀
  * The unit-tests now include a wider array of functions. At this stage the functions are mainly based on .cmatrix methods, as I am reconsidering the structure of the back-end.
  * Deleted the Makevars from commit c63e9a4; the flags were interfering with the functions and many regression functions broke. The compiler flags will be set at a later point.
* [OPTIMIZE] See commit message 📚
  * In Issue #15 a plan was set to optimize the code base to increase its maintainability. This commit is the first step in that direction. It introduces some breaking changes (locally and user-facing).
  * The TP, FP, TN and FN functions have been rewritten and now return arrays directly. This means that ALL functions that depended on them are now broken.
  * All metrics are now calculated using OOP, which is an awesome concept. This has significantly reduced the code-base and comes without any further breaking changes.
    - To accommodate this new structure, two new files are introduced in this commit: classification utilities and helpers. The helpers include all code that is common to all metrics. The utilities hold the class-constructor (or whatever it is called).
  * Breaking changes: All classification metrics have an option of NULL, TRUE, FALSE for the `micro`-argument. Adding the `w`-argument, which can be NULL or a NumericVector, introduces four different argument combinations. Although negligible, it does introduce some overhead in the function construction, which is just inefficient. At this stage I have decided to follow the syntax of early R, so all weighted metrics will now be called with `weighted.foo`.
    - To ensure consistency, this will apply to regression metrics too. So at a later point these will have breaking changes.
  > NOTE:
  >
  > NEWS and README are not updated, as the package can't build at this stage because of the locally breaking changes.
* [UPDATE] `precision()`- and `specificity()` 🚀
* [UPDATE] Class likelihood ratios 🚀
  * `plr()` updated
  * `nlr()` updated
  * `dor()` updated
* [UPDATE] Migrated (more) functions to classes 🚀
  * `fdr()` now has a weighted version
  * `fer()` now has a weighted version
  * `fpr()` now has a weighted version
  * `fbeta()` now has a weighted version
  * `jaccard()` now has a weighted version
  * `npv()` now has a weighted version
* [OPTIMIZE] `accuracy()` and `baccuracy()` 🔥
  * Both functions are now OOP, too. NOTE: The missing-value handling has been cancelled so far. It needs some revision.
  * The `baccuracy()`-function can only do adjusted scores at the moment. Looking for a fix. 🔨
* [Optimize] `ZeroOneLoss`-function 🚀
  * It now uses classes, too.
* [Optimize] See message 🔥
  * The base classification function that processes the data now uses a variadic template. This solution enables dynamic passing of arguments and reduces the need for overloaded functions (a minimal sketch of this pattern follows after this commit message).
  * All weighted metrics now have their own method; this was a necessary trade-off in order to limit branching.
* All remaining functions are now derived class functions, and they heavily utilize the Eigen library.
* All classification functions pass through the confusionMatrix function. This creates a bottleneck, but it gives the code-base decent maintainability.
* There is currently no missing-value handling in pair-wise functions; there is a need to find a better solution. More on this later.
* Unit-tests, broken functions and what-nots [🔨]
  * The tolerance in the unit-tests has been nerfed to 1e-2; otherwise the tests don't pass. The source of the error is not exactly known, but it **may** be floating-point inconsistencies between conversions. This will be investigated further.
  * S3 methods updated, and they now support weighted.foo for classification functions. NOTE to self: Be aware of passing arguments that are not used, as this will crash the program. If you pass it, you use it - or get rekt.
* [Optimize] `ckappa()`-function 🔥
  * Simplified the penalizing matrix by removing the redundant diag(0) operation.
* Updated README 📚
  * It now includes the weighted.foo()-methods.
* [BUG-FIX] Floating Precision 🎯
  * Changed all IntegerMatrix to NumericMatrix (see Issue #16) - this partially fixes the issue of inconsistent results.
  * Bumped the default tolerance to 4e-4; the target is 1e-5.
  * Disabled unit-testing for micro/macro averages, as the missing-value handling has been disabled for now, so they won't be equal as of now.
  * Disabled testing for `ROC()` and `prROC()`; the tests are unstable. Looking for a fix.
  * Disabled various functions for testing as they still differ from the target results. Reason currently unknown.
  * Bug-fix in the `ckappa()`-function in the penalizing matrix.
* [MAJOR BUG-FIX] See commit message 📚
  * An error in how FN was calculated has been fixed. This has solved many issues with the unit-testing. 🔨
  * The unit-test tolerance has been set to 1e-9, and all tests pass except for the `fmi`-, `mcc`- and `ckappa`-functions. This fix will come later. NEWS has been updated accordingly.
* [BUG-FIX] `weighted.mcc()`-function 🔨
  * The weighting argument was not being passed at all.
* [BUG-FIX] Added `weighted.ckappa()`-method 🔨
* [BUG-FIX] Error in Fowlkes Mallows Index fixed 🔨
  * An error in the `fmi()`-function has been fixed.
  * Unit-tests for weighted functions are now tested against the cmatrix methods.
* NEWS updated
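The variadic-template point in the commit above is easiest to see in code. The following is only a minimal sketch of that idea under stated assumptions, not SLmetrics' actual internals: the name `classification_base` and the `metric` callable are made up for illustration.

```cpp
#include <Rcpp.h>
#include <utility>

// Minimal sketch: one templated entry point replaces a family of
// overloads. Extra arguments (for example an optional weight vector)
// are forwarded untouched to whatever metric callable is supplied.
// These names are illustrative only, not SLmetrics' real internals.
template <typename Metric, typename... Args>
double classification_base(const Rcpp::IntegerVector& actual,
                           const Rcpp::IntegerVector& predicted,
                           Metric&& metric,
                           Args&&... args) {
  // The metric decides what to do with the forwarded arguments, so the
  // unweighted and weighted paths share the same calling code.
  return std::forward<Metric>(metric)(actual, predicted,
                                      std::forward<Args>(args)...);
}
```

Provided the metric callable accepts the forwarded arguments, the same template serves both the unweighted call and the call that passes a weight vector, which is the reduction in overloads the commit describes.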
**issue**

All classification functions are passed through the `cmatrix()`-function. But the `foo.cmatrix()`-functions, when a weighted confusion matrix is passed, produce different results from the `weighted.foo()`-functions.

**reprex**

The reprex at the top of this issue was created on 2024-12-16 with reprex v2.1.1.
**source of bug**

The output does converge, which suggests that it is a floating-point precision issue. This also explains why all functions suddenly failed their unit-tests. One possible source is as follows:
/~https://github.com/serkor1/SLmetrics/blob/f86d934b03910e8e385128563afb490d448a2199/src/classification_Helpers.h#L28C1-L41C2
The matrix is passed as an `IntegerMatrix` and converted to an `Eigen::MatrixXd`.
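To make the suspected mechanism concrete, here is a small hypothetical Rcpp snippet (the functions `via_integer_matrix()` and `via_numeric_matrix()` are not part of SLmetrics): when R hands a weighted, non-integer confusion matrix to a parameter declared as `IntegerMatrix`, the values are coerced to integers, so the fractional weights are discarded before any `Eigen::MatrixXd` conversion even happens.

```cpp
#include <Rcpp.h>

// Hypothetical illustration, not SLmetrics code: the same weighted
// confusion matrix summed through an IntegerMatrix parameter versus a
// NumericMatrix parameter.

// [[Rcpp::export]]
double via_integer_matrix(Rcpp::IntegerMatrix x) {
  // A numeric matrix passed from R is coerced to integer here,
  // so the fractional part of every weighted count is discarded.
  double total = 0.0;
  for (R_xlen_t i = 0; i < x.size(); ++i) total += x[i];
  return total;
}

// [[Rcpp::export]]
double via_numeric_matrix(Rcpp::NumericMatrix x) {
  // The weights survive intact when the matrix stays numeric.
  double total = 0.0;
  for (R_xlen_t i = 0; i < x.size(); ++i) total += x[i];
  return total;
}
```

Passing the same weighted confusion matrix (with fractional cells) to both functions gives different totals, which matches the kind of small discrepancies described in this issue.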
**fix**
One possibility is to 🤷, another is to 😱, or to change the following code:
/~https://github.com/serkor1/SLmetrics/blob/f86d934b03910e8e385128563afb490d448a2199/src/classification_Helpers.h#L28C1-L41C2
to accept a `NumericMatrix` instead of an `IntegerMatrix`; a rough sketch of what that change could look like follows below.

This goes for all the `.cpp`-files too! 😥
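The linked helper is not reproduced in this issue, so the following is only a rough sketch of the proposed change, assuming the helper's job is to turn the incoming confusion matrix into an `Eigen::MatrixXd`; the name `to_eigen()` is a placeholder, not the real function in `classification_Helpers.h`.

```cpp
#include <RcppEigen.h>
// [[Rcpp::depends(RcppEigen)]]

// Rough sketch of the proposed signature change; `to_eigen` is a
// placeholder name, not the helper in classification_Helpers.h.
//
// Before (sketch): taking the matrix as IntegerMatrix coerces the
// weighted, non-integer counts to int on entry.
//   Eigen::MatrixXd to_eigen(const Rcpp::IntegerMatrix& x);
//
// After (sketch): accept a NumericMatrix so the weights reach Eigen
// without being truncated.
Eigen::MatrixXd to_eigen(const Rcpp::NumericMatrix& x) {
  // Map the underlying double storage and copy it into an Eigen matrix;
  // no integer coercion happens anywhere along the way.
  return Eigen::Map<const Eigen::MatrixXd>(x.begin(), x.nrow(), x.ncol());
}
```

The Dec 16 commit above ("Changed all IntegerMatrix to NumericMatrix") applies this same idea across the code base.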