{SLmetrics} Version 0.3-0 🚀 #33
Merged
* [OPTIMIZE] A new template for classification 🔨
  * The new template uses a template handler to avoid too many overloaded templates. This effectively reduces the amount of code needed to achieve the same thing. It is still variadic by construction to maintain flexibility for future additions.
* [HOT-FIX] See commit-message 📚
  * Root Relative Squared Error example script renamed so it aligns with documentation
  * Removed ellipsis from S3_Accuracy documentation to avoid roxygen2 producing "fully documented"-warnings
* Removed Redundant Logic 🔨
  * All overloaded functions in the classification class have been removed. The overall logic has been simplified, and the old functions are therefore no longer needed 🚀
  * The old templates have been removed as they are no longer needed.
* [HOT-FIX] Likelihood Ratios 🔨
  * All Likelihood Ratios have had the `micro`- and `na_rm`-arguments removed as they were not used.
  * The functions have been refactored and are now named more verbosely according to the metric.
* [REFACTOR] See commit message 📚
  * All classification functions are constructed more verbosely - all affiliated derived classes are named as FooClass in CamelCase.
  * All function logic and arguments are namespace qualified, and are now of the form Rcpp::ObjectType
  * All additional arguments other than `w` and `micro` are handled inside the derived class. This reduces the clutter in the classification class object.

All tests passed locally.
* `beta` is now correctly passed as a double parameter.
* Test-setup flexibility 🔨
  * Added interactive tests to ease the development flow. This enables sourcing all files via an external script.
* New Matrix Class 🔨
  * The matrix class reduces the amount of repeated code.
  * The matrix templates have been incorporated in the class, and use overloading instead of if-statements
  * Updated NAMESPACE, and associated methods.
* Deleted classification_Utils.H 🔨
  * The header file is no longer needed.
  * Deleted calls to the header file.
* Consolidated classification_Helpers.h 🔨
  * The classification_Utils.H contents are moved to the helper.
* [UPDATE] See commit-message 📚
  * Updated NEWS (but not rendered),
  * Updated unit-tests and references according to the new matrix method.
* [FEATURE] Relative Root Mean Squared Error 🚀
  * A new feature has been (re)introduced; Relative Root Mean Squared Error. The function normalizes the RMSE relative to the mean, range or IQR.
  * The quantile functions have been taken from `pinball()` - it is probably a good idea to create another header file for this.
  * Created unit-tests, example and documentation. NOTE: NEWS are not updated.
* [DOCUMENTATION] Updated NEWS 📚
* The returned vectors of classification metrics weren't named.
commit cfb8d5cbacebafc0997cd8f2c076940b18cd82ec
Author: serkor1 <77464572+serkor1@users.noreply.github.com>
Date: Sat Dec 28 11:53:22 2024 +0100

Re-written unit-tests for the new functions :hammer:

* Functions written in R have had the py_-prefix removed so the distinction between what is from scikit-learn and what is manual is clear.

NOTE: not all functions have been rewritten. This is a work-in-progress.

commit aa60bbbaa319b5ef7e512e87291a6fbd74e28623
Author: serkor1 <77464572+serkor1@users.noreply.github.com>
Date: Sat Dec 28 11:50:37 2024 +0100

[OPTIMIZE] Refactored reference functions :rocket:

* The prROC and ROC functions from scikit-learn now iterate through all available labels, to simplify the R side of the unit-tests
* The reference functions have been refactored such that the amount of repeated code is reduced by introducing a generalized metric function.
* The reference functions have been split between regression and classification functions
* The setup script has been wrapped in is.interactive() so the tests can be run directly
* Removed old artefacts; all functions are now treated equally - so there is no need to specify function names.
* Removed python-setup: This piece of code was created when transitioning from RStudio to Positron. The idea was that the python testing environment was to be self-contained in the project. But for reasons related either to my skills, {reticulate} or Positron it never worked as intended. And considering the state of {SLmetrics} it is not worth spending time on making it work at this stage. This is for later when all else is done.
* Added cleaning-commands to build and check before and after checks; it's always good to start from scratch.
* [FEATURE] Weighted `ROC()` and `prROC()` 🚀
  * The functions now have a weighted version.
  * The functions still support custom thresholds.
  * Micro and na.rm have been removed for now; the micro/macro average still needs some polishing
* [UNIT-TEST] Unit-tests updated 🔨
  * The unit-tests are a bit shaky for ROC as scikit-learn drops some thresholds. This is not implemented here.
* [DOCUMENTATION] Vignette Modified 📚
  * It was using max() which returns Inf; Inf can't be plotted...
* [UPDATE] NAMESPACE and .Rd updated 📚
* [DOCUMENTATION] Updated NEWS 📚
* [BUG-FIX] Added class to functions 🥷
* [BUG-FIX] .... Don't ask
* General layout changes and additions to the README

NOTE: The performance evaluation has been extended to include memory-checks via {bench}
## 📚 What?

* **New Feature:** Weighted and unweighted cross-entropy loss.
* **Bug-fix:** The {bench} package reference was misspelled, and `prROC()` was incorrectly sharing documentation with `ROC()`. This has been corrected.

## 👀 Showcase

The order of the <[factor]> doesn't matter, as long as the `response_matrix` is correctly specified probability-wise. I.e. the `classes` can be specified in any order, as long as the columns of the corresponding `response_matrix` follow the order of the classes. See below.

### Example 1: "Class A" followed by "Class B"

``` r
# 1) define classes and
# observed classes (actual)
classes <- c("Class A", "Class B")

actual <- factor(
  c("Class A", "Class B", "Class A"),
  levels = classes
)

# 2) define probabilities
# and construct response_matrix
response <- c(
  0.2, 0.8,
  0.8, 0.2,
  0.7, 0.3
)

response_matrix <- matrix(
  response,
  nrow = 3,
  ncol = 2,
  byrow = TRUE
)

colnames(response_matrix) <- classes

response_matrix
#>      Class A Class B
#> [1,]     0.2     0.8
#> [2,]     0.8     0.2
#> [3,]     0.7     0.3

# 3) calculate entropy
SLmetrics::entropy(
  actual,
  response_matrix
)
#> [1] 1.19185
```

<sup>Created on 2024-12-30 with [reprex v2.1.1](https://reprex.tidyverse.org)</sup>

### Example 2: "Class B" followed by "Class A"

``` r
# 1) define classes and
# observed classes (actual)
classes <- c("Class B", "Class A")

actual <- factor(
  c("Class A", "Class B", "Class A"),
  levels = classes
)

# 2) define probabilities
# and construct response_matrix
response <- c(
  0.2, 0.8,
  0.8, 0.2,
  0.7, 0.3
)

response_matrix <- 1 - matrix(
  response,
  nrow = 3,
  ncol = 2,
  byrow = TRUE
)

colnames(response_matrix) <- classes

response_matrix
#>      Class B Class A
#> [1,]     0.8     0.2
#> [2,]     0.2     0.8
#> [3,]     0.3     0.7

# 3) calculate entropy
SLmetrics::entropy(
  actual,
  response_matrix
)
#> [1] 1.19185
```

<sup>Created on 2024-12-30 with [reprex v2.1.1](https://reprex.tidyverse.org)</sup>

As shown, the cross-entropy is identical (1.19185 in both cases). The order of `classes` in the factor's levels just needs to match the order of columns in the `response_matrix`. With this new feature, you can also add observation-level weights via `SLmetrics::weighted.entropy()`.
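The showcased value can be checked by hand. The following is a minimal base-R sketch, not the {SLmetrics} implementation: `cross_entropy_sketch()` is a hypothetical helper, and normalizing the weighted loss by the sum of the weights is an assumption about the convention, not something the PR states.

``` r
# Hypothetical helper; a sketch of mean cross-entropy, not SLmetrics code.
cross_entropy_sketch <- function(actual, response_matrix, w = NULL) {
  # for each row, pick the probability assigned to the observed class;
  # as.integer(actual) indexes columns in the order of the factor levels
  idx <- cbind(seq_along(actual), as.integer(actual))
  p   <- response_matrix[idx]

  # unit weights reproduce the unweighted case
  if (is.null(w)) w <- rep(1, length(actual))

  # (assumed) weighted mean of the negative log-probabilities
  sum(w * -log(p)) / sum(w)
}

classes <- c("Class A", "Class B")
actual  <- factor(c("Class A", "Class B", "Class A"), levels = classes)

response_matrix <- matrix(
  c(0.2, 0.8,
    0.8, 0.2,
    0.7, 0.3),
  nrow = 3, ncol = 2, byrow = TRUE,
  dimnames = list(NULL, classes)
)

unweighted <- cross_entropy_sketch(actual, response_matrix)
round(unweighted, 5)
#> [1] 1.19185
```

The observed classes receive probabilities 0.2, 0.2 and 0.7, so the mean of their negative logs is (1.60944 + 1.60944 + 0.35667) / 3 = 1.19185, which agrees with both examples above.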
Labels
documentation
Improvements or additions to documentation
enhancement
New feature or request
optimze
Various optimizations to source code
Note
See NEWS or commit history for detailed changes.
📚 What?
🚀 New features
This update introduces four new features. These are described below,
Cross-Entropy Loss (PR #34): Weighted and unweighted cross-entropy loss. The function can be used as follows,
Relative Root Mean Squared Error (Commit 5521b5b):
The function normalizes the Root Mean Squared Error by a factor. There is no official way of normalizing it - and in {SLmetrics} the RMSE can be normalized using three options; mean-, range- and IQR-normalization. It can be used as follows,
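To make the three normalizations concrete, here is a minimal base-R sketch. It does not call {SLmetrics}: `rrmse_sketch()` and its string-valued `normalization` argument are hypothetical, and the use of `stats::IQR()` with its default quantile type is an assumption that may differ from the package's own quantile routine.

``` r
# Hypothetical helper; illustrates the idea, not the SLmetrics signature.
rrmse_sketch <- function(actual, predicted,
                         normalization = c("mean", "range", "iqr")) {
  normalization <- match.arg(normalization)

  # plain root mean squared error
  rmse <- sqrt(mean((actual - predicted)^2))

  # normalizing factor computed from the observed values
  denom <- switch(
    normalization,
    mean  = mean(actual),
    range = diff(range(actual)),
    iqr   = IQR(actual)
  )

  rmse / denom
}

actual    <- c(1, 2, 3, 4, 5)
predicted <- c(2, 3, 4, 5, 6)

rrmse_sketch(actual, predicted, "mean")
#> [1] 0.3333333
rrmse_sketch(actual, predicted, "range")
#> [1] 0.25
rrmse_sketch(actual, predicted, "iqr")
#> [1] 0.5
```

Every prediction is off by exactly one, so the RMSE is 1 and the three results are simply the reciprocals of the mean (3), the range (4) and the IQR (2) of `actual`.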
Weighted Receiver Operator Characteristics and Precision-Recall Curves (PR #31):
These functions return the weighted versions of `TPR`, `FPR` and `precision`, `recall` in `weighted.ROC()` and `weighted.prROC()` respectively. The `weighted.ROC()`-function<sup>1</sup> can be used as follows,

The `w`-argument in `cmatrix()` has been removed in favor of the more verbose weighted confusion matrix call, the `weighted.cmatrix()`-function. See below,

Prior to version `0.3-0` the weighted confusion matrix was a part of the `cmatrix()`-function and was called as follows,

This solution, although simple, was inconsistent with the remaining implementation of weighted metrics in {SLmetrics}. To regain consistency and simplicity the weighted confusion matrix is now retrieved as follows,
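What a weighted confusion matrix contains can be sketched in base R without calling {SLmetrics}. The data and the names `unweighted_cm` and `weighted_cm` below are made up for illustration; the sketch only shows the concept (summing observation weights per cell instead of counting), not the signature of `weighted.cmatrix()`.

``` r
# Illustrative data; all values are invented for this sketch.
classes   <- c("Class A", "Class B")
actual    <- factor(c("Class A", "Class B", "Class A", "Class B"), levels = classes)
predicted <- factor(c("Class A", "Class A", "Class A", "Class B"), levels = classes)
w         <- c(0.5, 2, 1, 1.5)

# unweighted: number of observations per (actual, predicted) cell
unweighted_cm <- table(actual = actual, predicted = predicted)

# weighted: sum of observation weights per (actual, predicted) cell
weighted_cm <- xtabs(w ~ actual + predicted)

unweighted_cm
weighted_cm
```

Keeping the weighted and unweighted variants as separate calls mirrors the split described above: the cell totals of the weighted matrix add up to `sum(w)` rather than to the number of observations.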
🐛 Bug-fixes
The classification metrics with `micro == NULL` were not returning named vectors. This has been fixed.

Footnotes
1. The syntax is the same for `weighted.prROC()`.