[BUG] Precision Loss 🤷‍♀️ #16
Labels: Fixed
```r
# 1) generate vectors
# for values
matrix_method <- weighted_method <- reference <- numeric()

# 2) simulate sample
# sizes
i <- 1
for (N in c(1e2, 1e3, 1e4, 1e5, 1e6)) {

  # 2.1) generate classes
  actual    <- factor(sample(letters[1:3], N, replace = TRUE))
  predicted <- factor(sample(letters[1:3], N, replace = TRUE))
  weights   <- runif(length(actual))

  # 2.2) generate accuracy
  # from each method
  matrix_method[i] <- SLmetrics::accuracy(
    SLmetrics::cmatrix(
      actual    = actual,
      predicted = predicted,
      w         = weights
    )
  )

  weighted_method[i] <- SLmetrics::weighted.accuracy(
    actual    = actual,
    predicted = predicted,
    w         = weights
  )

  reference[i] <- weighted.mean(
    x = as.numeric(actual) == as.numeric(predicted),
    w = weights
  )

  i <- i + 1
}

# 3) display results
data.frame(
  reference,
  matrix_method,
  weighted_method
)
#>   reference matrix_method weighted_method
#> 1 0.3695044     0.3695044       0.3695044
#> 2 0.3241264     0.3241264       0.3241264
#> 3 0.3379269     0.3379269       0.3379269
#> 4 0.3362808     0.3362808       0.3362808
#> 5 0.3327261     0.3327261       0.3327261
```

Created on 2024-12-16 with reprex v2.1.1
serkor1 added a commit that referenced this issue on Dec 16, 2024:
* Changed all `IntegerMatrix` to `NumericMatrix` (see Issue #16) - this partially fixes the issue of inconsistent results.
* Bumped the default tolerance to 4e-4; the target is 1e-5.
* Disabled unit-testing for micro/macro averages, as the missing-value handling has been disabled for now, so they won't be equal as of now.
* Disabled testing for `ROC()` and `prROC()`; the tests are unstable. Looking for a fix.
* Disabled various functions for testing as they still differ from the target results. Reason currently unknown.
* Bug-fix in the `ckappa()`-function in the penalizing matrix.
serkor1 added a commit that referenced this issue on Dec 17, 2024:
* [UPDATE] `cmatrix()`-function 🔥
  * The `cmatrix()`-function now accepts the argument `w`, a <numeric>-vector of sample weights. 🚀
  * Unit-tests: Updated the unit-tests so they can accommodate weighted metrics.
  * Documentation: Updated the documentation of the `cmatrix()`-function.
  * NEWS: Updated news.
  * DESCRIPTION: Version bump! 🚀 All tests passed locally.
* [OPTIMIZE] Set compiler-flags 🚀
  * Created src/Makevars and set compiler-flags. No clue if it actually works. See https://stackoverflow.com/questions/32586455/how-to-change-and-set-rcpp-compile-arguments
* Updated unit-tests 🚀
  * The unit-tests now include a wider array of functions. At this stage the functions are mainly based on .cmatrix methods, as I am reconsidering the structure of the back-end.
  * Deleted the Makevars from commit c63e9a4; the flags were interfering with the functions and many regression functions broke. The compiler flags will be set at a later point.
* [OPTIMIZE] See commit message 📚
  * In Issue #15 a plan was set to optimize the code base to increase its maintainability. This commit is the first step in that direction. It introduces some breaking changes (locally and user-facing).
  * The TP, FP, TN and FN functions have been rewritten and now return arrays directly. This means that ALL functions that depended on them are now broken.
  * All metrics are now calculated using OOP, which is an awesome concept. This has significantly reduced the code-base and comes without any further breaking changes.
    - To accommodate this new structure, two new files are introduced in this commit: classification utilities and helpers. The helpers include all code that is common to all metrics. The utilities hold the class-constructor (or whatever it is called).
  * Breaking changes: All classification metrics have an option of NULL, TRUE, FALSE for the `micro`-argument. Adding the `w`-argument, which can be NULL or a NumericVector, introduces four different argument combinations. Although negligible, it does introduce some overhead in the function construction, which is just inefficient. At this stage I have decided to follow the syntax of early R, so all weighted metrics will now be called with `weighted.foo`.
    - To ensure consistency, this will apply to regression metrics too. So at a later point these will have breaking changes.
  > NOTE:
  >
  > NEWS and README are not updated, as the package can't build at this stage because of the locally breaking changes.
* [UPDATE] `precision()`- and `specificity()` 🚀
* [UPDATE] Class likelihood ratios 🚀
  * `plr()` updated
  * `nlr()` updated
  * `dor()` updated
* [UPDATE] Migrated (more) functions to classes 🚀
  * `fdr()` now has a weighted version
  * `fer()` now has a weighted version
  * `fpr()` now has a weighted version
  * `fbeta()` now has a weighted version
  * `jaccard()` now has a weighted version
  * `npv()` now has a weighted version
* [OPTIMIZE] `accuracy()` and `baccuracy()` 🔥
  * Both functions are now OOP, too. NOTE: The missing-value handling has been cancelled so far. It needs some revision.
  * The `baccuracy()`-function can only do adjusted scores at the moment. Looking for a fix. 🔨
* [Optimize] `ZeroOneLoss`-function 🚀
  * It now uses classes, too.
* [Optimize] See message 🔥
  * The base classification function that processes the data now uses a variadic template. This solution enables dynamic passing of arguments and reduces the need for overloaded functions (a minimal sketch of this pattern follows after this commit message).
  * All weighted metrics now have their own method; this was a necessary trade-off in order to limit branching.
* All remaining functions are now derived class functions, and they heavily utilize the Eigen library.
* All classification functions pass through the confusionMatrix function. This creates a bottleneck, but it gives the code-base decent maintainability.
* There is currently no missing-value handling in pair-wise functions; there is a need to find a better solution. More on this later.
* Unit-tests, broken functions and what-nots [🔨]
  * The tolerance in the unit-tests has been nerfed to 1e-2; otherwise the tests don't pass. The source of the error is not exactly known, but it **may** be floating-point inconsistencies between conversions. This will be investigated further.
  * S3 methods updated, and they now support weighted.foo for classification functions. NOTE to self: Be aware of passing arguments that are not used, as this will crash the program. If you pass it, you use it - or get rekt.
* [Optimize] `ckappa()`-function 🔥
  * Simplified the penalizing matrix by removing the redundant diag(0) operation.
* Updated README 📚
  * It now includes the weighted.foo()-methods.
* [BUG-FIX] Floating Precision 🎯
  * Changed all IntegerMatrix to NumericMatrix (see Issue #16) - this partially fixes the issue of inconsistent results.
  * Bumped the default tolerance to 4e-4; the target is 1e-5.
  * Disabled unit-testing for micro/macro averages, as the missing-value handling has been disabled for now, so they won't be equal as of now.
  * Disabled testing for `ROC()` and `prROC()`; the tests are unstable. Looking for a fix.
  * Disabled various functions for testing as they still differ from the target results. Reason currently unknown.
  * Bug-fix in the `ckappa()`-function in the penalizing matrix.
* [MAJOR BUG-FIX] See commit message 📚
  * An error in how FN was calculated has been fixed. This has solved many issues with the unit-testing. 🔨
  * The unit-test tolerance has been set to 1e-9, and all tests pass except for the `fmi`-, `mcc`- and `ckappa`-functions. This fix will come later. NEWS has been updated accordingly.
* [BUG-FIX] `weighted.mcc()`-function 🔨
  * The weighting argument was not being passed at all.
* [BUG-FIX] Added `weighted.ckappa()`-method 🔨
* [BUG-FIX] Error in Fowlkes Mallows Index fixed 🔨
  * An error in the `fmi()`-function has been fixed.
  * Unit-tests for weighted functions are now tested against the cmatrix methods.
* NEWS updated
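The variadic-template point in the commit above is easiest to see in code. The following is only a minimal sketch of that idea under stated assumptions, not SLmetrics' actual internals: the name `classification_base` and the `metric` callable are made up for illustration.

```cpp
#include <Rcpp.h>
#include <utility>

// Minimal sketch: one templated entry point replaces a family of
// overloads. Extra arguments (for example an optional weight vector)
// are forwarded untouched to whatever metric callable is supplied.
// These names are illustrative only, not SLmetrics' real internals.
template <typename Metric, typename... Args>
double classification_base(const Rcpp::IntegerVector& actual,
                           const Rcpp::IntegerVector& predicted,
                           Metric&& metric,
                           Args&&... args) {
  // The metric decides what to do with the forwarded arguments, so the
  // unweighted and weighted paths share the same calling code.
  return std::forward<Metric>(metric)(actual, predicted,
                                      std::forward<Args>(args)...);
}
```

Provided the metric callable accepts the forwarded arguments, the same template serves both the unweighted call and the call that passes a weight vector, which is the reduction in overloads the commit describes.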
**issue**

All classification functions are passed through the `cmatrix()`-function. But the `foo.cmatrix()`-functions, when a weighted confusion matrix is passed, produce different results from the `weighted.foo()`-functions.

**reprex**

The reprex at the top of this issue was created on 2024-12-16 with reprex v2.1.1.
**source of bug**

The output does converge, which suggests that it is a floating-point precision issue. This also explains why all functions suddenly failed their unit-tests. One possible source is as follows:
/~https://github.com/serkor1/SLmetrics/blob/f86d934b03910e8e385128563afb490d448a2199/src/classification_Helpers.h#L28C1-L41C2
The matrix is passed as an `IntegerMatrix` and converted to an `Eigen::MatrixXd`.
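To make the suspected mechanism concrete, here is a small hypothetical Rcpp snippet (the functions `via_integer_matrix()` and `via_numeric_matrix()` are not part of SLmetrics): when R hands a weighted, non-integer confusion matrix to a parameter declared as `IntegerMatrix`, the values are coerced to integers, so the fractional weights are discarded before any `Eigen::MatrixXd` conversion even happens.

```cpp
#include <Rcpp.h>

// Hypothetical illustration, not SLmetrics code: the same weighted
// confusion matrix summed through an IntegerMatrix parameter versus a
// NumericMatrix parameter.

// [[Rcpp::export]]
double via_integer_matrix(Rcpp::IntegerMatrix x) {
  // A numeric matrix passed from R is coerced to integer here,
  // so the fractional part of every weighted count is discarded.
  double total = 0.0;
  for (R_xlen_t i = 0; i < x.size(); ++i) total += x[i];
  return total;
}

// [[Rcpp::export]]
double via_numeric_matrix(Rcpp::NumericMatrix x) {
  // The weights survive intact when the matrix stays numeric.
  double total = 0.0;
  for (R_xlen_t i = 0; i < x.size(); ++i) total += x[i];
  return total;
}
```

Passing the same weighted confusion matrix (with fractional cells) to both functions gives different totals, which matches the kind of small discrepancies described in this issue.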
**fix**
One possibility is to 🤷, another is to 😱, or to change the following code:
/~https://github.com/serkor1/SLmetrics/blob/f86d934b03910e8e385128563afb490d448a2199/src/classification_Helpers.h#L28C1-L41C2
to accept a `NumericMatrix` instead of an `IntegerMatrix`; a rough sketch of what that change could look like follows below.

This goes for all the `.cpp`-files too! 😥
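The linked helper is not reproduced in this issue, so the following is only a rough sketch of the proposed change, assuming the helper's job is to turn the incoming confusion matrix into an `Eigen::MatrixXd`; the name `to_eigen()` is a placeholder, not the real function in `classification_Helpers.h`.

```cpp
#include <RcppEigen.h>
// [[Rcpp::depends(RcppEigen)]]

// Rough sketch of the proposed signature change; `to_eigen` is a
// placeholder name, not the helper in classification_Helpers.h.
//
// Before (sketch): taking the matrix as IntegerMatrix coerces the
// weighted, non-integer counts to int on entry.
//   Eigen::MatrixXd to_eigen(const Rcpp::IntegerMatrix& x);
//
// After (sketch): accept a NumericMatrix so the weights reach Eigen
// without being truncated.
Eigen::MatrixXd to_eigen(const Rcpp::NumericMatrix& x) {
  // Map the underlying double storage and copy it into an Eigen matrix;
  // no integer coercion happens anywhere along the way.
  return Eigen::Map<const Eigen::MatrixXd>(x.begin(), x.nrow(), x.ncol());
}
```

The Dec 16 commit above ("Changed all IntegerMatrix to NumericMatrix") applies this same idea across the code base.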