v. 0.15.0

This release contains numerous improvements, bug fixes, and new methods. The main highlights are a standalone DD-SIMCA implementation, overhauled preprocessing pipelines (with some breaking changes), and JSON/CSV interoperability with the mda.tools web-applications. The tutorial has been updated and improved accordingly.

New dedicated methods for DD-SIMCA

In previous versions DD-SIMCA was implemented via the more versatile method simca, which also supports other SIMCA implementations. While versatility is good in general, it limited the DD-SIMCA possibilities, so it was decided to implement it separately.

The method ddsimca can now be used for training, testing, and applying Data Driven SIMCA models. It matches the functionality of the corresponding web-application, including all plots and figures of merit (for example, estimation of beta, selectivity, etc.). It also lets you change the decision boundary parameters without rebuilding the main model.
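A minimal usage sketch (the classname and ncomp argument names are assumptions for illustration; check ?ddsimca for the actual interface):

```r
library(mdatools)

# train a one-class model for "setosa" using the built-in iris data
X.cal <- iris[1:40, 1:4]      # target class rows for calibration
X.test <- iris[41:150, 1:4]   # mixed new data for testing

m <- ddsimca(X.cal, classname = "setosa", ncomp = 2)
summary(m)

# apply the trained model to new data
res <- predict(m, X.test)
```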

See all details in the tutorial.

The original method simca is still available (and always will be) for compatibility.

Improvements to preprocessing methods

There are several new methods for preprocessing, including:

The following methods are considered deprecated. You can still use them (they will be kept for compatibility), but for new code it is recommended to use the alternatives:

In addition, the possibility to combine preprocessing methods into a preprocessing chain (we will call it a preprocessing model) has been improved. However, these improvements introduce breaking changes, so if you used this feature before, check the text below and the updated user guides very carefully.

First of all, the syntax for creating preprocessing items and combining them into a chain has changed. Parameters are now passed as named arguments instead of a list:
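For example, a sketch of the difference (the prep() constructor and the savgol parameter names are assumptions for illustration; see the user guides for the exact names):

```r
library(mdatools)

# old syntax (no longer supported): parameters were wrapped into a list
# p1 <- prep("savgol", list(width = 15, porder = 1, dorder = 1))

# new syntax: parameters are passed as named arguments
p1 <- prep("savgol", width = 15, porder = 1, dorder = 1)
p2 <- prep("snv")

# combine the items into a preprocessing model (chain)
pm <- list(p1, p2)
```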

Also, the option to add user-defined preprocessing methods to a preprocessing model has been removed, as it caused issues and unstable behavior in some cases. From this version, only selected methods can be combined into a preprocessing model. You can see a full list of the currently supported methods by running prep.list(). This list will be extended eventually, and you can still use user-defined methods, and methods which are not in the list, separately.

Second, a preprocessing model can now be integrated directly into pca, pls, simca, ddsimca, and plsda via the prep parameter — the model will train the preprocessing pipeline and apply it automatically during calibration and prediction.
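A sketch of this workflow (the item constructor and everything except the prep parameter itself are assumptions for illustration):

```r
library(mdatools)
data(people)  # example dataset shipped with the package

# preprocessing model with a single autoscaling step
pm <- list(prep("autoscale"))

# the pipeline is trained and applied automatically during calibration
m <- pca(people, ncomp = 4, prep = pm)

# and applied again automatically when predicting for new data
res <- predict(m, people)
```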

Finally, preprocessing methods can also be trained independently using prep.fit(), which pre-computes parameters that depend on the training set (e.g. values for centering, scaling, or the reference spectrum for EMSC). Applying the trained preprocessing model to new data is done with prep.apply().
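A sketch of the standalone workflow, assuming prep.fit() takes the preprocessing model plus the training data and returns the trained model (the item constructor is also an assumption):

```r
library(mdatools)
data(people)  # example dataset shipped with the package

X.cal <- people[1:24, ]
X.new <- people[25:32, ]

pm <- list(prep("autoscale"))

# pre-compute training-set dependent parameters (e.g. column means)
pm <- prep.fit(pm, X.cal)

# apply the trained preprocessing model to new data
X.new.p <- prep.apply(pm, X.new)
```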

Please check the updated documentation for all details and examples.

JSON and CSV interoperability

Models created with pca, pls, and ddsimca can now be exported to JSON using writeJSON() and imported from JSON using readJSON() method. This enables round-trip interoperability with the corresponding mda.tools web-applications — you can build a model in R, upload it to a browser for interactive use, or develop a model in a web-app and load it into R for predictions.
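A round-trip sketch (the argument order of writeJSON() and readJSON() is an assumption):

```r
library(mdatools)
data(people)  # example dataset shipped with the package

m <- pca(people, ncomp = 4)

# export the model for use in the web-application
writeJSON(m, "pcamodel.json")

# import a model created in R or in a web-app
m2 <- readJSON("pcamodel.json")
res <- predict(m2, people)
```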

When a model includes a preprocessing pipeline (via the prep parameter), the pipeline is saved as part of the JSON file and applied automatically on import.

Result objects from pca, pls, and ddsimca now also have a writeCSV() method that exports main outcomes in a format identical to the one produced by the web-applications.
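For example (the file name argument is an assumption):

```r
library(mdatools)
data(people)  # example dataset shipped with the package

m <- pca(people, ncomp = 4)
res <- predict(m, people)

# export the main outcomes in the web-application compatible format
writeCSV(res, "pcaresults.csv")
```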

Improvements and changes

Bug fixes

v. 0.14.2

v. 0.14.1

v. 0.14.0

The changes are relatively small, but some of them are potentially breaking, hence the version is bumped up to 0.14.0.

v. 0.13.1

v. 0.13.0

This release brings an updated implementation of the PLS algorithm (SIMPLS), which is more numerically stable and gives significantly fewer warnings about using too many components when you work with small y-values. The speed of the pls() method in general has also been improved.

Another important change is that cross-validation of regression and classification models has been re-written as a simpler solution, and you can now also use your own custom splits by providing a vector with segment indices associated with each measurement. For example, if you run PLS with the parameter cv = c(1, 2, 3, 4, 1, 2, 3, 4, 1, 2), it is assumed that you want to use a venetian blinds split with four segments and that your dataset has 10 measurements. See more details in the tutorial, where the description of the cross-validation procedure has been moved to a separate section.
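The example above as code (the simulated x and y data are for illustration only):

```r
library(mdatools)

# 10 measurements, 5 variables, simulated for illustration
X <- matrix(rnorm(50), nrow = 10)
y <- rnorm(10)

# venetian blinds split with four segments, given explicitly
# as a vector of segment indices, one per measurement
m <- pls(X, y, ncomp = 3, cv = c(1, 2, 3, 4, 1, 2, 3, 4, 1, 2))
```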

Other changes and improvements:

v. 0.12.0

This release is mostly about preprocessing: some new methods have been added, the existing ones have been improved, and it is now possible to combine preprocessing methods together (including parameter values) and apply them all in the correct sequence. See the preprocessing section in the tutorials for details.

New features and improvements

Bug fixes

Other changes

v. 0.11.5

v. 0.11.4

v. 0.11.3

v. 0.11.2

v. 0.11.1

v. 0.11.0

New features

Improvements and bug fixes

v. 0.10.4

v. 0.10.3

v. 0.10.2

v. 0.10.1

v. 0.10.0

Many changes have been made in this version, but most of them are under the hood. The code has been refactored significantly in order to improve its efficiency and make future support easier. Some functionality has been re-written from scratch. Most of the code is backward compatible, which means your old scripts should run with this version without problems. However, some changes are incompatible and can lead to occasional errors and warning messages. All details are given below; pay special attention to the breaking changes part.

Another important change is the way cross-validation works starting from this version. It was decided to use cross-validation only for computing performance statistics, e.g. prediction error in PLS or classification error in SIMCA or PLS-DA. Decomposition results, such as explained variance or residual distances, are no longer computed for cross-validation. It was a bad idea from the beginning, as the way it was implemented is not fully correct: distances and variances measured for different local models should not be compared directly. After long consideration it was decided to implement this part in a more correct and conservative way.

Finally, all model results (calibration, cross-validation and test set validation) are now combined into a single list, model$res. This makes a lot of things easier. However, the old way of accessing the result objects (e.g. model$calres or model$cvres) still works; you can access calibration results both as model$res$cal and as model$calres, so this change will not break compatibility.
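For example, both accessors below return the same calibration results object:

```r
library(mdatools)
data(people)  # example dataset shipped with the package

m <- pca(people, ncomp = 4)

r1 <- m$res$cal   # new way
r2 <- m$calres    # old way, kept for compatibility
```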

Below is a more detailed list of changes. The tutorial has been updated accordingly.

Breaking changes

Here are changes which can potentially lead to error messages in previously written code.

General

Plotting functions

PCA

As mentioned above, the biggest change which can potentially lead to issues with your old code is that cross-validation is no longer available for PCA models.

Other changes:

* Default value for the lim.type parameter is "ddmoments" (before it was "jm"). This changes the default method for computing critical limits for orthogonal and score distances.
* Added new tools for assessing model complexity (e.g. DoF plots, see tutorial for details).
* More options are available for the analysis of residual distances (e.g. marking objects as extremes, etc.).
* Method setResLimits() is renamed to setDistanceLimits() and has an extra parameter, lim.type, which allows changing the method for critical limits calculation without rebuilding the PCA model itself.
* Extended output for summary() of a PCA model, including DoF for distances (Nh and Nq).
* plotExtreme() is now also available for PCA models (it was used only for SIMCA models before).
* For most of the PCA model plots you can now provide a list with result objects to show the plot for. This makes it possible to combine, for example, results from the calibration set and new predictions on the same plot.
* You can now add a convex hull or confidence ellipse to groups of points on scores or residuals plots made for a result object.
* New method categorize() allows categorizing data rows as "regular", "extreme" or "outlier" based on residual distances and the corresponding critical limits.
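A sketch combining two of the points above (the "chisq" value for lim.type is an assumption for illustration):

```r
library(mdatools)
data(people)  # example dataset shipped with the package

m <- pca(people, ncomp = 4)

# change the method for critical limits without rebuilding the model
m <- setDistanceLimits(m, lim.type = "chisq")

# categorize rows of the calibration set based on residual distances
c <- categorize(m, m$res$cal)
```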

SIMCA/SIMCAM

Regression coefficients

PLS regression

As mentioned above, the PLS calibration has been simplified: selectivity ratio and VIP scores are no longer computed automatically when a PLS model is created. This makes the calibration faster and makes the parameter light unnecessary (it has been removed). Also, Jack-Knifing is now applied every time you run cross-validation, so there is no need to specify the parameters coeffs.alpha and coeffs.ci anymore (both have been removed). It does not add any computational time, and therefore it was decided to do it automatically.

Other changes are listed below:

v. 0.9.6

v. 0.9.5

v. 0.9.4

v. 0.9.3

v. 0.9.2

v. 0.9.1

v. 0.9.0

v. 0.8.4

v. 0.8.3

v. 0.8.2

v. 0.8.1

v. 0.8.0

v. 0.7.2

v. 0.7.1

v. 0.7.0

v. 0.6.2

v. 0.6.1

v. 0.6.0

v. 0.5.3

v. 0.5.2

v. 0.5.1

v. 0.5.0

v. 0.4.0

v. 0.3.2

v. 0.3.1

v. 0.3.0