RapidMiner Studio Changelog

New in RapidMiner Studio 10.2.0 (Aug 17, 2023)

  • Features:
  • Added Delete Amazon S3 Resource operator
  • Added user interaction after a project was cleaned up on AI Hub. The user can choose to:
  • Ignore it and keep project disconnected
  • Overwrite local version by clean check out from AI Hub
  • Archive local changes and then overwrite local version as above
  • Enhancements:
  • Migrated Generate ID and Split Data operators to the new Belt data core, future-proofing them and improving their speed.
  • Added new setting in the preferences to control whether RapidMiner Studio should favour speed over memory footprint or vice versa. If memory is critical, it can be changed to reduce the memory footprint at the cost of runtime. The setting can be found under System and is called Memory Management.
  • Added repository web action to go to deployment endpoints
  • Improved error recovery and error messages for date_parse_str function of the expression parser.
  • Trailing white spaces are no longer treated as errors in the expression parser.
  • Improved the experience of opening URLs on certain Linux distributions that do not support launching a browser programmatically
  • The Correlation Matrix operator now uses the new and improved subset selector
  • Further reduced start-up time of RapidMiner Studio:
  • Introduced lazy loading of operators
  • Improved utilization of operator signature cache
  • Introduced shallow plugin initialization
  • Bugfixes:
  • Fixed broken error messages in the Edit Expressions dialog that operators like Generate Attributes use to display the expression parser
  • Removed deprecated Stream Database operator (deprecated since version 7.5, six years ago)
  • Fixed bug in data splitting code that prevented empty partitions in some cases.
  • Fixed Synchronize Meta Data with Real Data not working even though it was selected. The selection is now remembered after restart.
  • When Synchronize Meta Data with Real Data is activated and the process has been run, Read operators like Read Excel and Read CSV remember the real metadata even if another operator is added to the process.
  • Fixed parameters stay above value and stay below value of Prescriptive Analytics operator
  • Fixed a possible concurrency issue when writing JSON IOObjects in parallel
  • Fixed potential access denied error for Read Azure Data Lake Storage Gen2 operator when reading larger files
  • Development:
  • Added com.rapidminer.repository.recent.RecentDataManager to allow global access to the recently used data sets. It comes with a listener mechanism and currently keeps track of data opened in the Results view, as well as used in the Interactive Decision Tree wizard.
  • Removed deprecated classes and methods pertaining to the old concept of managing Perspectives (including MainFrame#getPerspectives())
  • Added DeveloperTools#shouldDeveloperToolsBeShown() to allow for an easy way to check whether you want to offer developer tools of some capacity when appropriate
  • Fixed bug that caused TableMetaData#columns() to return a meta data sub-table with random column order
  • Fixed bug when registering IOObjects from operator signature
  • Plugins now properly also look up resources like icons from the default com/rapidminer/extension/resources path. The old additional lookup for com/rapidminer/resources is kept for compatibility reasons.
  • Deprecated: SwingTools#addIconStoragePath(String), it never worked
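
The resource lookup described in the Development notes above (default path first, old path kept for compatibility) can be pictured with a small sketch. This is only an illustration in Python, not RapidMiner code: the function `resolve_resource`, the `available` set, and the lookup order are assumptions made for the example; only the two path prefixes come from the changelog.

```python
from typing import Iterable, Optional

# Prefixes named in the changelog entry. Trying the default path first and
# the legacy path second is an assumption made for this sketch.
DEFAULT_PREFIX = "com/rapidminer/extension/resources"
LEGACY_PREFIX = "com/rapidminer/resources"


def resolve_resource(name: str, available: Iterable[str]) -> Optional[str]:
    """Return the first existing path for `name`, or None if nothing matches.

    `available` is a hypothetical stand-in for the resources a plugin
    class loader can see; it is not part of any RapidMiner API.
    """
    known = set(available)
    for prefix in (DEFAULT_PREFIX, LEGACY_PREFIX):
        candidate = f"{prefix}/{name}"
        if candidate in known:
            return candidate
    return None


bundled = {
    "com/rapidminer/extension/resources/icons/16/new.png",
    "com/rapidminer/resources/icons/16/old.png",
}
print(resolve_resource("icons/16/new.png", bundled))      # found under the default path
print(resolve_resource("icons/16/old.png", bundled))      # found under the legacy path
print(resolve_resource("icons/16/missing.png", bundled))  # None
```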

New in RapidMiner Studio 10.1.3 (Jun 19, 2023)

  • Enhancements:
  • Drastically reduced time needed for RapidMiner Studio to start up, especially in scenarios with many installed extensions.
  • Improved logging for AI Hub logins in case of problems
  • Added recursive option to Delete Azure Data Lake Storage Resource and Delete Azure Data Lake Storage Gen2 Resource operators to delete non-empty folders
  • Bugfixes:
  • Fixed an issue that could lead to memory leaks when applying a model
  • Fixed an issue where get updates and create snapshot on the snapshot history panel might work even though the connection to AI Hub was severed
  • Fixed a rare issue which could cause extensions installed via the Marketplace to no longer load on startup due to a corrupted index
  • Fixed Update Salesforce operator, it no longer reports that no ID attribute was found
  • The quickfix for a missing role no longer uses the deprecated Set Role operator, but the new one instead
  • Fixed errors being sometimes silently hidden from inside Execute Process operators

New in RapidMiner Studio 10.1.2 (Mar 23, 2023)

  • Enhancements:
  • Excel .xlsx file import is significantly more robust now. This should mean that almost any Excel file can be read successfully now. This applies to both the GUI import wizard and the Read Excel operator.
  • Added link to onboarding dialog to enable using Altair Units license.
  • Improved error message if infinite values are in the data set when displaying a histogram visualization.
  • Details of error messages during process execution now also show up in the log.
  • Altair Units License:
  • Added default limit of 8 logical cores. If you want to utilize more cores, please increase the number in the License tab of the Settings dialog.
  • Added setting to reserve threads for background execution. These count against the logical core limit.
  • Bugfixes:
  • Fixed multiple issues with Auto Model result saving and re-opening
  • Fixed an issue that prevented disconnecting from projects
  • Fixed EULA in Windows installer being displayed with strange characters

New in RapidMiner Studio 10.1.1 (Mar 23, 2023)

  • Features:
  • Enabled usage of RapidMiner Studio with Altair licenses
  • Introduced new operators which are powered by the new Belt data core. Existing processes will continue to use the previous operator versions, so they will continue to work as before.
  • Generate Attributes:
  • The expression parser can now access other rows via index, allowing for much more powerful expressions (e.g. Fibonacci, aggregations, etc.)
  • Added new lead/lag/cell_value/row_number functions
  • Added new time functions for time arithmetic
  • Improved date-time functions for a more consistent user experience across different time zones and locales.
  • Select Attributes:
  • The new column types available in the new data core are now available here. This is a feature not yet used, however it will be in the future.
  • Greatly improved the user interface for the selection to make it much more user-friendly
  • Removed the very rarely used filter types 'block type' and 'numeric value filter'
  • Set Role:
  • Roles can now be assigned more than once, e.g. selecting multiple label columns. This is a feature not yet used, however it will be in the future.
  • The new data core does not allow dynamic roles anymore, so only predefined roles are now available. The 'metadata' role can be used to mark special columns that should be ignored during operator calculations.
  • Advanced columns are now allowed outside operators:
  • The new advanced column types text, text-list, text-set and real-array can now be used in Belt Data Tables between operators
  • They can be filtered out by type with Select Attributes to use the data table in operators that still operate on Example Sets
  • For now, this feature will not be visible to you unless you install future extensions that make use of these new column types. This is just the foundation so new extensions can start making use of these new capabilities.
  • Enabled future-proof JSON serialization for all IOObjects
  • Enhancements:
  • Removed parameter create view from all preprocessing operators and Apply Model. This parameter was both virtually unused and broken, and it is not supported by the new internal data structure we introduced a while ago.
  • Improved automatic storage of custom result visualizations in cases where you have more than one file with the same name (but of different types) in the same repository folder
  • Activated JSON serialization for all IOObjects (besides ExampleSets, Belt IOTables and IOObjectCollections, which all already have a new, special file format)
  • Old .ioo files can still be read
  • Non-converted IOObjects are still stored as .ioo
  • Works in local repositories and projects, but not in legacy repositories
  • Added admin setting with key rapidminer.disallow.decryption.storage. This impacts a user's interaction with connections:
  • When set to true, any and all interactions with connections that need to decrypt values require a login (edit, create, move/copy, get metadata in a process)
  • If a project with disabled decryption storage is deleted from AI Hub, it cannot be recreated
  • Added support for OAuth for Salesforce connections
  • Added support for manually configuring the AI Hub frontend URL when connecting to a project (optional)
  • Added setting to activate or deactivate automatic detection of remote changes for a project
  • Added setting to activate or deactivate automatic detection of local changes for a project, to allow reducing filesystem access
  • Bugfixes:
  • Fixed UI issue when opening an operator chain (e.g. Subprocess operator) with a breakpoint
  • Fixed issues with the data import wizard not closing on completion
  • Studio is now respecting the max memory setting again when started via the .exe on Windows
  • Fixed date-time formatting when starting via the .exe on Windows
  • Fixed some UI scaling issues when starting via the .exe on Windows
  • Error dialog improvements when using AI Hub
  • Fixed some JSON serialization that was behaving incorrectly
  • Fixed bug in Remember operator that could lead to unexpected modification of the stored IOObject
  • Fixed a bug in Weight by Relief which led to an incorrectly empty weight table when the sample ratio was smaller than one
  • Fixed a bug in SVM (Evolutionary) that occurred for nominal labels, when the hold out set ratio was non-zero.
  • Fixed a bug in SVM (Evolutionary) that occurred when the hold out ratio led to empty training data.
  • Fixed a bug in the view for kernel models with infinite parameters.
  • Fixed possible NPE in process tool
  • Fixed issue in the Statistics tab of a data table failing when data contained infinite values in numerical columns
  • Fixed missing error message when trying to run a process execution on AI Hub from a project where the encryption key is not known anymore
  • Fixed Google Cloud Services connections using Service account for Google Drive access
  • Development:
  • Server client now exposes loaded and failed extensions on AI Hub (ServerClient#getLoadedExtensions() and ServerClient#getFailedExtensions())
  • Renamed ServerClient methods listVersionedRepositories and deleteVersionedRepository to listProjects and deleteProject
  • Added ServerClient methods for getProject and createProject to match the naming convention
  • Deprecated and forwarded old ServerClient methods getVersionedRepository and createVersionedRepository to getProject and createProject
  • Exposed json serialization for IOObjects in API of JsonIOObjectEntry
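
To make the row-index functions listed under Generate Attributes above more concrete, here is a minimal Python sketch of what lag, lead and row_number style functions typically compute. It does not use the RapidMiner expression syntax. The function signatures, the use of NaN for rows that do not exist, and the 1-based row numbering are assumptions made for illustration.

```python
import math
from typing import List, Sequence


def lag(column: Sequence[float], offset: int = 1) -> List[float]:
    """Value from `offset` rows earlier; NaN where no such row exists."""
    n = len(column)
    return [
        column[i - offset] if 0 <= i - offset < n else math.nan
        for i in range(n)
    ]


def lead(column: Sequence[float], offset: int = 1) -> List[float]:
    """Value from `offset` rows later; NaN where no such row exists."""
    return lag(column, -offset)


def row_number(column: Sequence[float]) -> List[int]:
    """Row index per row (1-based numbering is an assumption of this sketch)."""
    return list(range(1, len(column) + 1))


values = [10.0, 12.0, 15.0, 11.0]
previous = lag(values)
# Row-to-row difference, the kind of expression that needs access to other rows.
differences = [v - p for v, p in zip(values, previous)]

print(previous)      # first entry is NaN, then 10.0, 12.0, 15.0
print(lead(values))  # 12.0, 15.0, 11.0, then NaN
print(row_number(values))
print(differences)   # first entry is NaN, then 2.0, 3.0, -4.0
```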

New in RapidMiner Studio 10.1.0 (Jan 25, 2023)

  • Features:
  • Enabled usage of RapidMiner Studio with Altair licenses
  • Introduced new operators which are powered by the new Belt data core. Existing processes will continue to use the previous operator versions, so they will continue to work as before.
  • Generate Attributes:
  • The expression parser can now access other rows via index, allowing for much more powerful expressions (e.g. Fibonacci, aggregations, etc.)
  • Added new lead/lag/cell_value/row_number functions
  • Added new time functions for time arithmetic
  • Improved date-time functions for a more consistent user experience across different time zones and locales.
  • Select Attributes:
  • The new column types available in the new data core are now available here. This is a feature not yet used, however it will be in the future.
  • Greatly improved the user interface for the selection to make it much more user-friendly
  • Removed the very rarely used filter types 'block type' and 'numeric value filter'
  • Set Role:
  • Roles can now be assigned more than once, e.g. selecting multiple label columns. This is a feature not yet used, however it will be in the future.
  • The new data core does not allow dynamic roles anymore, so only predefined roles are now available. The 'metadata' role can be used to mark special columns that should be ignored during operator calculations.
  • Advanced columns are now allowed outside operators:
  • The new advanced column types text, text-list, text-set and real-array can now be used in Belt Data Tables between operators
  • They can be filtered out by type with Select Attributes to use the data table in operators that still operate on Example Sets
  • For now, this feature will not be visible to you unless you install future extensions that make use of these new column types. This is just the foundation so new extensions can start making use of these new capabilities.
  • Enabled future-proof JSON serialization for all IOObjects
  • Enhancements:
  • Removed parameter create view from all preprocessing operators and Apply Model. This parameter was both virtually unused and broken, and it is not supported by the new internal data structure we introduced a while ago.
  • Improved automatic storage of custom result visualizations in cases where you have more than one file with the same name (but of different types) in the same repository folder
  • Activated JSON serialization for all IOObjects (besides ExampleSets, Belt IOTables and IOObjectCollections, which all already have a new, special file format)
  • Old .ioo files can still be read
  • Non-converted IOObjects are still stored as .ioo
  • Works in local repositories and projects, but not in legacy repositories
  • Added admin setting with key rapidminer.disallow.decryption.storage. This impacts a user's interaction with connections:
  • When set to true, any and all interactions with connections that need to decrypt values require a login (edit, create, move/copy, get metadata in a process)
  • If a project with disabled decryption storage is deleted from AI Hub, it cannot be recreated
  • Added support for manually configuring the AI Hub frontend URL when connecting to a project (optional)
  • Added setting to activate or deactivate automatic detection of remote changes for a project
  • Added setting to activate or deactivate automatic detection of local changes for a project, to allow reducing filesystem access
  • Bugfixes:
  • Fixed UI issue when opening an operator chain (e.g. Subprocess operator) with a breakpoint
  • Fixed issues with the data import wizard not closing on completion
  • Studio is now respecting the max memory setting again when started via the .exe on Windows
  • Fixed date-time formatting when starting via the .exe on Windows
  • Fixed some UI scaling issues when starting via the .exe on Windows
  • Error dialog improvements when using AI Hub
  • Fixed some JSON serialization that was behaving incorrectly
  • Fixed bug in Remember operator that could lead to unexpected modification of the stored IOObject
  • Fixed a bug in Weight by Relief which led to an incorrectly empty weight table when the sample ratio was smaller than one
  • Fixed a bug in SVM (Evolutionary) that occurred for nominal labels, when the hold out set ratio was non-zero.
  • Fixed a bug in SVM (Evolutionary) that occurred when the hold out ratio led to empty training data.
  • Fixed a bug in the view for kernel models with infinite parameters.
  • Fixed possible NPE in process tools
  • Fixed Google Cloud Services connections using Service account for Google Drive access
  • Added support for OAuth for Salesforce connections
  • Development:
  • Server client now exposes loaded and failed extensions on AI Hub (ServerClient#getLoadedExtensions() and ServerClient#getFailedExtensions())
  • Renamed ServerClient methods listVersionedRepositories and deleteVersionedRepository to listProjects and deleteProject
  • Added ServerClient methods for getProject and createProject to match the naming convention
  • Deprecated and forwarded old ServerClient methods getVersionedRepository and createVersionedRepository to getProject and createProject
  • Exposed json serialization for IOObjects in API of JsonIOObjectEntry
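
The "deprecated and forwarded" pattern mentioned in the Development notes above can be sketched as follows. This is a Python illustration only: `ServerClientSketch` is a hypothetical stand-in, the snake_case method names merely mirror the Java names from the changelog, and the in-memory project store is invented for the example. The real ServerClient is a Java class whose signatures are not shown here.

```python
import warnings
from typing import Dict, Optional


class ServerClientSketch:
    """Hypothetical stand-in used to show deprecated methods forwarding to new ones."""

    def __init__(self) -> None:
        self._projects: Dict[str, dict] = {}

    # New names
    def create_project(self, name: str) -> dict:
        project = {"name": name}
        self._projects[name] = project
        return project

    def get_project(self, name: str) -> Optional[dict]:
        return self._projects.get(name)

    # Old names: still callable, but they warn and delegate to the new methods.
    def create_versioned_repository(self, name: str) -> dict:
        warnings.warn(
            "create_versioned_repository is deprecated, use create_project",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.create_project(name)

    def get_versioned_repository(self, name: str) -> Optional[dict]:
        warnings.warn(
            "get_versioned_repository is deprecated, use get_project",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.get_project(name)


client = ServerClientSketch()
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    created = client.create_versioned_repository("demo")
    fetched = client.get_versioned_repository("demo")

print(fetched is created)                   # old and new names reach the same object
print([str(w.message) for w in caught])     # one deprecation message per old call
```

Forwarding keeps existing callers working while steering them to the new names, which is why old code keeps running after such a rename.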

New in RapidMiner Studio 10.0.0 (Nov 8, 2022)

  • Features:
  • RapidMiner Studio now finally uses Java 11 as opposed to Java 8!
  • AI Hub X now also uses Java 11, and as a consequence, RapidMiner Studio X cannot connect to AI Hub 9 or earlier! Both Studio and AI Hub need to be upgraded to version 10!
  • Windows & OS X users will get the updated Java runtime automatically, but Unix users (or anyone using the platform independent release) need to provide Java 11 manually for running Studio.
  • Some extensions might no longer work with Java 11 and require an update, please check the Marketplace for updates.
  • Visualizations: Added ability to sort results when using aggregations in all charts where it makes sense. Sorting can be ascending/descending either on the aggregated result value, or the aggregation column name.
  • Time Series: Added the Windowing Model as a preprocessing model for the Windowing operator.
  • The model can be used to apply the configured windowing operation on any data set (having the same columns) by using Apply Model operator
  • The model can be grouped together with other models using the Group Model operator.
  • Cloud Connectivity: Added Google Drive operators to read, write, delete and loop files, as well as create folders.
  • Connectivity: Added Snowflake as a first-class citizen for database connections
  • Enhancements:
  • Added preprocessing model to Pivot operator
  • Improved High-DPI scaling on Windows
  • The tooltip for date-time entries in the result view now shows the time-stamp in ISO format (including potential nanoseconds)
  • Copy&pasting data from date-time cells is now consistent with what is displayed in the precise tooltip
  • Added setting to disable repository indexing for searching altogether via the Enable repository search indexing setting. This can be used for very large repositories or ones behind a slow network drive or when a virus scanner is involved
  • Time Series: Added the parameter sort time series to all time series operators where an indices column is mandatory or optional
  • If selected, the input time series is automatically sorted before the time series operation is applied. The output of original ports will also contain the sorted data set.
  • Time Series: Improved UserError for indices attributes which are not sorted or have non-unique values
  • Bugfixes:
  • Fixed a problem where collections with empty sub-collections might not be readable
  • Fixed problems with empty (sub-) collections not being readable
  • Fixed problems with repeatedly extracting collections because of an incorrectly set timestamp
  • Fixed the storage of the LFS & editable flags in the repositories.xml file for projects
  • Fixed issue that could cause an error when a better license was installed automatically
  • Fixed creation of new Google Cloud Services connection after the recent Google OAuth flow changes
  • Development:
  • RapidMiner Studio is now running with Java 11, as are all bundled extensions. Starting from this version, all extensions targeting RapidMiner X and beyond must be compatible with Java 11 as well!
  • We upgraded ALL libraries RapidMiner Studio uses to their latest available versions. This includes libraries where the version jump comes with API changes. Please thoroughly test your extensions to ensure they not only run with Java 11, but are also working as expected given all the library upgrades.
  • Added new (Belt-based) expression parser that replaces the old one. The new expression parser comes with an improved API for developers, it can handle the new Belt types and index-based functions.
  • Some index-based functions have already been added to the new expression parser: lead, lag and row_number
  • Date-time functions have been revised
  • Time functions have been added
  • The behavior of the and / or functions for missing values changed
  • Back ported separation of Tools class to extract number formatting methods and make them available without core dependencies.
  • Upgraded JxBrowser to version 7.26 for HTML5-based visualizations. This should not affect extensions, unless they accessed the Browser creation directly.
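
The sorting options for aggregated chart results described under Features above (ascending or descending, by aggregated value or by name) can be illustrated with a short Python sketch. The helper names, the choice of a sum aggregation, and reading "aggregation column name" as the grouping key are assumptions made for this example. It does not reflect how the visualizations are implemented.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def aggregate_sum(rows: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Sum the values per group key."""
    totals: Dict[str, float] = defaultdict(float)
    for group, value in rows:
        totals[group] += value
    return dict(totals)


def sort_aggregated(
    totals: Dict[str, float], by: str = "value", descending: bool = False
) -> List[Tuple[str, float]]:
    """Sort aggregated results by aggregated value or by group name."""
    if by not in ("value", "name"):
        raise ValueError("by must be 'value' or 'name'")
    index = 1 if by == "value" else 0
    return sorted(totals.items(), key=lambda item: item[index], reverse=descending)


rows = [("north", 4.0), ("south", 9.0), ("east", 2.0), ("north", 1.0)]
totals = aggregate_sum(rows)

print(sort_aggregated(totals, by="value", descending=True))
print(sort_aggregated(totals, by="name"))
```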

New in RapidMiner Studio 9.10.11 (Aug 12, 2022)

  • Bugfixes:
  • Fixed an issue that caused an unauthorized rc=401 error when trying to create a new snapshot in a project / when connecting to LFS (this was meant to be included in 9.10.10 already, but accidentally was not)
  • Abnormally large nominal values no longer lead to an error when opening a data table from the repository

New in RapidMiner Studio 9.10.10 (Jul 18, 2022)

  • Enhancements:
  • Improved IOObject creation time by making meta data creation in file system repository and projects rely more on in-memory data
  • Getting remote project updates should no longer fail due to a MERGING_RESOLVED error
  • Added new method to create copy of polynominal mapping
  • Improved performance of Obfuscate and Deobfuscate operators
  • Bugfixes:
  • Fixed an issue that caused an unauthorized rc=401 error when trying to create a new snapshot in a project
  • Fixed an issue that could make storing or retrieving legacy IOObjects in new local repositories and projects extremely slow

New in RapidMiner Studio 9.10.8 (May 3, 2022)

  • Enhancements:
  • Dropbox connections now use TLS1.2 for secure transport
  • Rebranded RapidMiner Studio

New in RapidMiner Studio 9.10.7 (May 3, 2022)

  • Bugfixes:
  • Fixed an issue with storing data in context locations for some Java versions
  • Fixed problems with De-Obfuscate operator if an attribute name or attribute value contained special characters or whitespace
  • Fixed possible deadlock at startup of 9.10.6

New in RapidMiner Studio 9.10.6 (May 3, 2022)

  • Enhancements:
  • Fixed a memory & file leak when using large numbers of repeated JDBC connections
  • Visualizations: Added options to customize Wordcloud word orientations
  • Visualizations: Added Jamaica to the map collection
  • Updated PostgreSQL JDBC driver to version 42.3.2
  • Added skip inaccessible parameter for Loop Files to skip inaccessible files/directories, instead of a silent failure. If unchecked, the operator does not loop at all and will throw a proper error.
  • Stopping Loop Files is now always possible in a timely manner, even if you selected a directory with millions of files.
  • Updated H2 DB library due to security advisory
  • Added new parameter fitting error handling to the ARIMA Trainer operator.
  • In case of a fitting error during training, either a proper error is thrown or a fallback Default Forecast Model is provided.
  • Removed the meta data warning that the number of parameters is too large for the ARIMA Trainer operator.
  • Added new option to Amazon S3 connections that allows for much more flexible authentication schemes, like credential profiles and IAM roles.
  • Bugfixes:
  • Fixed character corruption issue with Read Database and Execute SQL when reading a query via a file from disk on certain operating systems
  • Fixed a memory leak when using database connections
  • Fixed a general file leak when using connections
  • Fixed a problem when creating dynamically suffixed attributes through the AttributeFactory in parallel
  • Fixed side effects for models when executing in parallel
  • Fixed an issue in projects that could sometimes cause Execute Process or Retrieve operators within parallel loops or similar setups to fail with an error message like "Cannot retrieve 'entry', it does not exist"
  • Fixed an issue that could sometimes cause Execute Process operators within parallel loops or similar setups to fail with an error message like "Cannot connect to the RapidMiner AI Hub repository '_LOCAL'" when running on an AI Hub legacy repository
  • Fixed a wrong error, which was thrown during Apply Forecast when a Multiply operator was used on the Holt-Winters model
  • Fixed calculation errors for Holt-Winters models with additive seasonality
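
The two behaviours of the skip inaccessible parameter of Loop Files described above (skip unreadable entries, or refuse to loop and raise an error) can be sketched in Python. The function `accessible_files` and the up-front check are assumptions made for illustration. The sketch only mirrors the behaviour stated in the changelog, not the operator's implementation.

```python
import os
import tempfile
from typing import List


def accessible_files(paths: List[str], skip_inaccessible: bool) -> List[str]:
    """Return the readable files, checked before any file is processed.

    With skip_inaccessible=False the first unreadable entry raises an
    error, so no looping over files happens at all.
    """
    readable = []
    for path in paths:
        if os.path.isfile(path) and os.access(path, os.R_OK):
            readable.append(path)
        elif not skip_inaccessible:
            raise OSError(f"Cannot access '{path}'")
    return readable


with tempfile.TemporaryDirectory() as folder:
    good = os.path.join(folder, "data.csv")
    with open(good, "w") as handle:
        handle.write("a,b\n1,2\n")
    missing = os.path.join(folder, "missing.csv")  # never created

    # Option checked: the inaccessible entry is skipped.
    kept = accessible_files([good, missing], skip_inaccessible=True)

    # Option unchecked: an explicit error instead of a silent failure.
    try:
        accessible_files([good, missing], skip_inaccessible=False)
        error = None
    except OSError as exc:
        error = str(exc)

print(kept == [good])
print(error)
```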

New in RapidMiner Studio 9.10.1 (Oct 25, 2021)

  • Enhancements:
  • Improved potential bias detection by producing fewer false positives
  • Added further explanations in the bias warning tooltip to help educate users better about why it occurred - and what can be done to mitigate the problem
  • Replaced DBSCAN operator by new version
  • Deprecated Expectation Maximization Clustering operator
  • Improved/minimized operator instantiation for documentation/search, leading to a reduced startup time
  • Bugfixes:
  • Fixed metadata of Apply Model in rare cases
  • Fixed wrong results after applying the Single Rule Induction model in case of a different ordering of the columns
  • Single Rule Induction model can now be stored in the repository
  • Fixed wrong results after applying the Subgroup Discovery model in case of a different ordering of the columns
  • Fixed table capability store/retrieve in signatures
  • Fixed wrong URL when opening the link in project connections when using AI Hub vault injections
  • Time Series: Fixed a bug in Process Windows which caused an Exception for input data with long gaps when the parameter "empty window handling" is set to skip
  • Time Series: Fixed a bug in Holt-Winters when the input data contains a section with 0 as values, or if every n-th value is 0 (with n being the period):
  • A section with 0 as values is now ignored in the smoothing of the seasonal component in Holt-Winters
  • If every n-th value is 0 (with n being the period), a UserError is thrown for the multiplicative seasonality model

New in RapidMiner Studio 9.10.0 (Aug 16, 2021)

  • Features:
  • Added Function Fitting operator that can optimize parameters in a function of the attributes to fit the label. It can be used to create an optimal function to fit the data points in your data.
  • Bias Awareness: if the use of a specific column is more likely to add unwanted bias to your models, it is highlighted as such. This happens in various places such as in the Statistics view of data, the model simulator, in Turbo Prep, in Auto Model, during model training, in model annotations among others.
  • Enhancements:
  • The De-Normalization operator has a new parameter to also de-normalize predictions.
  • Based on attribute name: prediction(abc) tries to use de-normalization of abc if no explicit de-normalization available
  • The label (or other special attributes) can be included in normalization already in the normalize operator. The changes allow for multiple prediction attributes to be affected
  • Added date format parameter to Write CSV in case format date attributes is selected
  • Improved performance of Append operator
  • Handled yet another case of JDBC drivers ignoring the JDBC standard gracefully (here: Infor Data Lake DatabaseMetaData#getTypeInfo())
  • Introduced operator signatures to improve the startup of Studio
  • Signatures contain meta information that is used in operator registration, global search setup and documentation browser display
  • Signatures are persisted between starts for an improved startup time
  • Signature persistence can be configured or cleared with the setting System -> Local File Cache -> Keep Operator Signatures
  • Time Series: Enabled the usage of constant values for the replace types in the Equalize Numerical Indices and Equalize Time Stamps operators
  • The operators can now be used to fill gaps in non-equal data sets with constant values
  • Time Series: All Time Series operators (except for Multi Horizon Forecast, Multi Horizon Performance) now work with Belt IOTable (as in- and output)
  • Bugfixes:
  • In rare instances, operator parameters were not saved correctly if a default value was set for them. This affected, for example, date parameters used in extensions.
  • The Generate Attributes max and min functions now always return a missing value if any of the values is missing.
  • Fixed missing operator help for Azure Blob Storage and Data Lake Storage operators
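
The name-based fallback described for the De-Normalization operator above (prediction(abc) uses the de-normalization of abc when no explicit one exists) can be sketched in Python. The z-transformation, the statistics dictionary, and the function names are assumptions chosen for the example. Only the name-matching rule comes from the changelog.

```python
import re
from typing import Dict, Optional, Tuple

# (mean, standard deviation) per normalized attribute. A z-transformation
# is assumed here purely for illustration.
Stats = Tuple[float, float]

_PREDICTION = re.compile(r"^prediction\((.+)\)$")


def resolve_stats(attribute: str, stats: Dict[str, Stats]) -> Optional[Stats]:
    """Use explicit statistics if present, else fall back by attribute name."""
    if attribute in stats:
        return stats[attribute]
    match = _PREDICTION.match(attribute)
    if match:
        return stats.get(match.group(1))
    return None


def denormalize(attribute: str, value: float, stats: Dict[str, Stats]) -> float:
    """Invert the z-transformation; values without statistics pass through."""
    found = resolve_stats(attribute, stats)
    if found is None:
        return value
    mean, std = found
    return value * std + mean


stats = {"abc": (50.0, 10.0)}
print(denormalize("abc", -1.0, stats))             # 40.0
print(denormalize("prediction(abc)", 1.5, stats))  # 65.0, via the fallback on "abc"
print(denormalize("other", 0.3, stats))            # 0.3, left unchanged
```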

New in RapidMiner Studio 9.9.0 (Mar 26, 2021)

  • New Features:
  • Data is the central piece in any RapidMiner process. The way RapidMiner internally deals with data has fundamentally changed in this release with the new Data Core (codename Belt). Its new columnar table representation provides a quantum leap in processing speed and memory efficiency for RapidMiner processes. Multiple operators already use it internally and it becomes fully available now for extension developers to create fast and efficient operators.
  • Added a Set Positive Value operator for the new Data Core which can make nominal attributes binominal or change the positive value of binominal attributes
  • Enhancements:
  • Replaced the Rename by Example Values operator by a new and improved version
  • Replaced the Rename operator by a new one that can additionally handle a renaming dictionary
  • Replaced the Sort operator by one that can sort by multiple attributes (currently already part of the Operator Toolbox extension)
  • Improved the FP-Growth operator so that it only works with explicitly defined positive values (either via binominal attributes or the positive value parameter) for items in dummy coded columns
  • Improved memory consumption of Cross Validation in certain circumstances
  • The operators Read CSV and Read Excel were improved to use the new data core
  • Pivot now supports Least and Mode aggregations for numerical attributes as well
  • Annotate now adds the annotations to the meta data as well
  • Added warning when trying to run a process on an AI Hub with a lower feature version than the current Studio version
  • Added a reason when displaying incompatible extensions in the dialog after startup to show why an extension failed to load. Details available via tooltip.
  • Upgraded integrated Chromium to version 84
  • Improved some metadata transformation w.r.t. nominal value sets
  • The splashscreen no longer shows duplicate extension icons during startup if more than one copy of an extension is installed
  • Visualizations now also support Least and Mode aggregations for numerical attributes
  • Improved concurrent execution in some corner cases
  • Deprecated the Exchange Roles operator
  • Model viewer for Gradient Boosted Tree models now respects the Number format settings in Studio preferences
  • Auto Model uses new clustering algorithms which no longer require one-hot encoding on the data set and therefore reduce the memory footprint for data sets with nominal columns with many values. As a result, users can no longer specify the minimum number of clusters in the X-Means case (automatic determination of the optimal number of clusters). The minimum is now fixed at 2.
  • Time Series: Added the option to ignore invalid values to the Moving Average Filter operator: Invalid values (missing, positive and negative infinity) are now ignored when calculating the filtered value
  • This also results in valid values at the beginning and end of the filtered time series
  • As the Classic Decomposition and the Function and Seasonal Component Forecast are based on the Moving Average Filter, they now also have the "ignore invalid values" option
  • Bugfixes:
  • Fixed Data Table reading/writing when LFS light checkout is enabled
  • Fixed a problem where an uncaught exception could go through when using date/time attributes with values in the far future/past
  • Fixed an uncaught exception that could happen when the process run via Execute Process failed, the user opened it via the popup and ran it directly after fixing the problem
  • Fixed wrong attribute weights for Random Forest regression
  • Fixed error in Store operator when used after application of k-Means model
  • Fixed issue that Save dialogs did not accept any selection if a wildcard (.*) filter was provided (e.g. for Write Document)
  • Fixed Pivot meta data column names not matching the real data
  • Fixed missing text for the file restoring confirm dialog in projects
  • Fixed an issue that could cause Studio startup to silently fail
  • Fixed a possible error during startup w.r.t. port preconditions on some operators
  • Fixed a bug that could cause project creation to not show an error and appear to do nothing
  • Removed check for preprocessing models in model deployments for custom models. This check caused certain grouped models to fail if they contained models which technically are not preprocessing models (e.g. PCA).
  • Time Series: Fixed a bug for the Lag operator, which caused original data to be changed at preceding ports as well
  • Time Series: Fixed some small errors in the description of two tutorial processes for Sliding Window Validation
  • Time Series: Fixed an error which occurred in time-based windowing when the end of the last window is equal to the last timestamp in the input data. This affects all windowing operators (Windowing, Process Windows, Forecast Validation, Sliding Window Validation).
  • Cloud Connectivity: File browser now adds the correct path separator character on Windows, and resolves macros properly for AWS, Azure, and Google Cloud file operators

New in RapidMiner Studio 9.8.1 (Dec 4, 2020)

  • New Features:
  • Added new operators to delete data from Azure Cloud:
  • Delete Azure Blob Storage Resource
  • Delete Azure Data Lake Storage Resource
  • Delete Azure Data Lake Storage Gen2 Resource
  • Enhancements:
  • All Loop cloud operators (e.g. Loop Amazon S3, Loop Azure Blob Storage, etc) now only download a file when another operator reads its content. The memory footprint may also decrease by 50%, and unnecessary writes to the disk are avoided.
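The idea of downloading a file only when its content is actually read can be illustrated with a small Python sketch. All names here (LazyRemoteFile, fake_fetch) are hypothetical stand-ins and do not correspond to RapidMiner classes; the sketch only shows the deferred-download pattern.

```python
class LazyRemoteFile:
    """Hypothetical stand-in for a file handed out by a looping cloud reader."""

    def __init__(self, name, fetch):
        self.name = name
        self._fetch = fetch      # callable that performs the actual download
        self._content = None
        self.downloads = 0       # counts how often the download really ran

    def read(self):
        # Download on first access only; later reads reuse the cached bytes.
        if self._content is None:
            self._content = self._fetch(self.name)
            self.downloads += 1
        return self._content


def fake_fetch(name):
    # Placeholder for a real network call.
    return ("content of " + name).encode("utf-8")


files = [LazyRemoteFile(n, fake_fetch) for n in ("a.csv", "b.csv", "c.csv")]
data = files[1].read()                       # only this file is fetched
total_downloads = sum(f.downloads for f in files)
```

Listing many files stays cheap in this pattern, because the cost of a transfer is only paid for entries that a later step actually reads.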
  • Bugfixes:
  • Continue RapidMiner Studio start if proxy discovery fails
  • Added missing Cluster attribute to metadata when applying a KMeans model via Apply Model
  • Fixed a regression in Generalized Linear Model (GLM) model training. It again accepts weighted training data
  • Auto Model Clustering showed incorrect results, ignoring training data normalization and attribute reordering
  • Fixed AbstractMethodError when using very old JDBC drivers (built for Java 6 and earlier) to connect to SQL databases
  • Fixed inconsistent parameter order and two unused parameters displayed in the parameter panel of Loop Google Storage
  • Fixed result view in open source version
  • Time Series: Fixed spelling errors in help texts
  • Time Series: Fixed missing indices attribute in the meta data of Apply Forecast, if a Function and Seasonal Component Forecast model is used
  • Fixed an issue that could cause connection tests to AI Hubs running behind a federated login via KeyCloak to not properly declare credentials as invalid but instead return a weird error message.

New in RapidMiner Studio 9.8.0 (Oct 15, 2020)

  • New Features:
  • Utilize AI Hub 9.8 support for large files in Projects. Files with more than 10MB and stored ExampleSets are automatically handled to be versioned as expected, but stored more efficiently. This is backed by Git LFS, which means Python or R coders can continue to easily work with these projects as long as they have the Git LFS extension installed.
  • Time Series Windowing Update:
  • Added time based (window parameters are specified in time units) and custom windowing (start and stop values of the windows are provided by an additional example set) for all windowing operators (Windowing, Process Windows, Forecast Validation, Sliding Window Validation)
  • Added a few more parameters: expert settings (hides a few expert parameters if it is not selected), windows defined (specifies from which point windows are defined), empty window handling
  • Changed the computation of the final model for the Forecast Validation and Sliding Window Validation operators to compute the model on a final window with the same size as the training windows and which ends at the last example of the input series
  • Time Series: Added new aggregation methods (median, maximum, minimum, standard deviation, variance) to Moving Average Filter
  • Cloud Connectivity:
  • Added connectivity to Azure Data Lake Storage Gen2:
  • Read Azure Data Lake Storage Gen2
  • Loop Azure Data Lake Storage Gen2
  • Write Azure Data Lake Storage Gen2
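Time-based windowing, where window size and step size are given in time units instead of a number of examples, can be sketched as follows. This is an illustration under stated assumptions, not the operators' implementation: the function name is hypothetical, the timestamps are assumed to be sorted, and each window is assumed to cover start <= t < start + window_size.

```python
from datetime import datetime, timedelta

def time_based_windows(timestamps, window_size, step_size):
    """Group sorted timestamps into windows defined by durations.

    Returns a list of (start, end, members) tuples. Hypothetical helper.
    """
    if step_size <= timedelta(0):
        raise ValueError("step_size must be positive")
    windows = []
    start = timestamps[0]
    last = timestamps[-1]
    while start <= last:
        end = start + window_size
        # Assumed convention: start inclusive, end exclusive.
        members = [t for t in timestamps if start <= t < end]
        windows.append((start, end, members))
        start += step_size
    return windows

hourly = [datetime(2020, 10, 15, h) for h in range(6)]
windows = time_based_windows(hourly, timedelta(hours=2), timedelta(hours=2))
```

With irregularly spaced data the number of examples per window varies, which is the main difference from windows defined by example counts.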
  • Enhancements:
  • H2O:
  • New operator: K-Means (H2O), which implements K-Means clustering using the bundled H2O library. Key features include:
  • Estimate the optimal value of k, when a good initial guess is not available from the user
  • Built-in standardization and nominal encoding
  • Quick and memory efficient execution
  • Note: estimate k strongly favors low k values. Make sure to double-check that the results are in line with expectations.
  • Newly created repositories and projects are now by default stored in the current user's "Documents" folder. The location continues to be customizable on repository / project creation
  • When opening a process or RapidMiner file using "Open with..." RapidMiner Studio, the process will be loaded from the repository registered for the path. Process files that are not stored in a repository will be imported just like the menu item "Import Process" would
  • IOObject collections are now stored in a new, zip-based file format, ending with .collection
  • Incorporated a new library to better make use of system proxy settings if "system" is selected in the preferences, especially w.r.t. Windows and WPAD/PAC files. This will drastically improve the experience in complex corporate network setups
  • HTML5 safe mode is now way more performant
  • Upgraded Chromium binaries to version 79
  • Improved error message for remote repository creation (central AI Hub repository and projects) when the authentication is mismatched (user/password vs SSO)
  • Added Settings option to optimize internal file browser for mapped network drives
  • Time Series: Moved Moving Average Filter into the Transformation operator group and removed the obsolete Filter operator group
  • Time Series: Reordered the output ports of the Multi Label Performance and Multi Horizon Performance operators
  • Bugfixes:
  • Fixed wrong metadata after renaming in the new repositories and then creating a new entry with the previous name
  • Fixed rare issues that could cause problems when trying to view Visualizations on certain machines
  • Fixed Mixed Euclidean Distance for nominal values and Nominal Distance
  • A JNA library on the Windows PATH no longer results in an error
  • Fixed issue that could cause charts in the Deployments view to not show up.
  • Fixed problem that caused the legacy SMTP password setting in the Preferences dialog to become broken when the dialog was saved more than once after changing the value. Note that this setting is no longer recommended; use the new Send Mail connection instead.
  • Fixed a similar problem with the legacy connection UI encrypting passwords and tokens multiple times
  • Auto Model Results calculated on AI Hub can now be opened via Results view after the folder with all results has been moved/copied
  • Upgraded bundled JRE to 8u265
  • Deployments keep working now after the Server repository has been renamed
  • Fixed a problem where unsigned extensions could not make use of the new connection objects inside operators
  • Fixed potential IllegalArgumentException in Google Storage operators when running on Server
  • ExampleSets with huge nominal values can be retrieved again from the repository
  • Time Series: Fixed a bug in Equalize Time Stamps which caused an infinite loop in some cases when the calendar time was set to 'domain' and the input data consists of already partwise equidistant time stamps
  • Known issues:
  • H2O K-Means:
  • Apply Model does not work with cluster models produced with the K-Means (H2O) operator
  • Label and ID roles from the input dataset are lost if add_cluster_attribute is set to true

New in RapidMiner Studio 9.7.2 (Aug 5, 2020)

  • Enhancements:
  • Send Mail once again allows multiple comma-separated recipients, and uses UTF-8 encoding again as opposed to UTF-16
  • Added advanced parameter settings to Database connections (JDBC). This allows you to conveniently set the fetch size and restrict which catalogs, schemas, tables and/or table types should be available w.r.t. meta data. Setting these parameters can speed up data retrieval significantly.
  • Added a confirm dialog before repository folders get moved to avoid accidents while dragging the mouse
  • Improved loading of meta data in the repository tooltip
  • Bugfixes:
  • The one class LibSVM can now handle labels with more than one value
  • Fixed rare issue where a file browser might trigger a crash on startup
  • Fixed parameters being less tall than they used to be
  • Made submission of Auto Model processes to Server available for older Server versions again (9.3+)
  • Fixed CSV and XML import wizards not releasing file handles in some cases
  • Metadata of Join now matches the actual result
  • Fixed an issue that could sometimes cause the connection to the AI Hub repository or a project to drop after a while and show the error "authentication cancelled by user" when using Enterprise Login
  • Trying to connect to a project that already exists no longer destroys the connections inside the existing project
  • Fixed rare error when storing collections
  • Fixed meta data loading loop in Auto Model and Model Deployment
  • Fixed problem that prevented URLs from being opened on Linux
  • The dates generated by Generate Sales Data are now all at 00:00:00.000 GMT

New in RapidMiner Studio 9.7.1 (Jun 25, 2020)

  • Enhancements:
  • The Welcome Dialog is now immediately shown after startup
  • Join now opens the key attributes dialog on double-click
  • Errors when submitting Auto Model jobs to an AI Hub now show up in the UI instead of claiming everything went well
  • Inside the Schedule process dialog you can now only select projects and AI Hub repositories
  • Improved operator parameter panel to only increase input area when resizing
  • Bugfixes:
  • One Hot Encoding no longer produces only missing values as results
  • Viewing a past version of a file in the Snapshot History or restoring an old Snapshot no longer requires an active login for the project
  • Viewing a past version of a file in the Snapshot History now also works if the encryption context has changed
  • Restoring a past version of a file in the Snapshot History is no longer possible if the encryption has changed
  • Fixed connection test results being removed right after each test when connecting to a project
  • R2 was re-added to the model string representation in H2O Logistic Regression and H2O Generalized Linear Model operators
  • Development:
  • Added method in com.rapidminer.tools.OperatorService to create embedded operators to ensure concurrency context is always present so that you can safely use operators that need parallel execution capabilities.

New in RapidMiner Studio 9.7.0 (Jun 5, 2020)

  • New Features:
  • Added versioned projects which are tied to RapidMiner Server. You can have as many versioned projects as you like, no limits! The versioning is backed by Git and can be accessed by any regular Git clients. This means sharing between Python/R coders and RapidMiner users has never been easier!
  • Added dialog to select which version of a file to keep in case of a conflict in the versioned projects while getting Snapshots from Server.
  • Versioning happens on a project level. As you can now have as many projects as you like, this is the most sensible behavior because most of the time many entries are interconnected in a project. Thus the entire state is saved and can be later restored, without having to worry about dependency versions.
  • Projects support ALL files you may have on your computer! You can put your .py scripts, your .md files, your .png files, your .pdf files, etc. all into a project. They will be neatly displayed in RapidMiner Studio.
  • Of course, all those files can be versioned together, so RapidMiner users and Python coders can share the same git repository. The Python coders can even use their native Git client to do so, no magic required. This will make collaboration between RapidMiner users and Python coders easier than ever before!
  • Processes in versioned projects can also be run and scheduled on RapidMiner Server as they can for an existing Server central repository
  • All the files live locally on your computer, but are also shared via Git. This gives you the performance of a local repository when working with it during prototyping, but also allows for easy collaboration with your colleagues.
  • Added new panel "Snapshot History" which allows you to browse the history of your versioned projects, as well as see the changes you've made since the latest snapshot. It can also be used to restore an earlier state of the project, view past versions of individual files, and to restore those past versions.
  • ExampleSets are now written to disk in a new file format: HDF5. This is a well-established format used e.g. by NASA to store large amounts of data. This also means that Python and RapidMiner Studio can exchange data via HDF5 files much more easily and faster than ever before.
  • Local repositories that will be created with RapidMiner Studio 9.7 or later can also take advantage of supporting all files you may have on your computer (.py, .jpeg, .pdf, etc).
  • New operator Target Encoding which can remove nominal attributes with too many values and performs a target encoding (also known as mean encoding) on the remaining attributes
  • Auto Model: some processes (e.g. SVM, FLM, or weight calculations) now use the new Target Encoding instead of one-hot encoding which reduces memory usage and run times
  • Time Series: New operator Integrate to integrate time series with different methods (cumulative sum / left and right Riemann sum / trapezoidal rule)
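The integration methods named for the Integrate operator are standard numerical rules. The following Python sketch shows the left Riemann sum, the right Riemann sum and the trapezoidal rule as running integrals. The function name and signature are assumptions for this example and say nothing about how the operator is implemented or parameterized.

```python
def integrate(x, y, method="trapezoidal"):
    """Return the running integral of y over x, one value per example.

    Hypothetical helper; x holds the index values, y the series values.
    """
    total = 0.0
    result = [0.0]  # nothing has been accumulated at the first example
    for i in range(1, len(x)):
        dx = x[i] - x[i - 1]
        if method == "left":
            total += y[i - 1] * dx                 # height taken at the left edge
        elif method == "right":
            total += y[i] * dx                     # height taken at the right edge
        elif method == "trapezoidal":
            total += 0.5 * (y[i - 1] + y[i]) * dx  # average of both edges
        else:
            raise ValueError("unknown method: " + method)
        result.append(total)
    return result

x = [0.0, 1.0, 2.0, 3.0]
y = [0.0, 1.0, 2.0, 3.0]
left = integrate(x, y, "left")
right = integrate(x, y, "right")
trapezoid = integrate(x, y, "trapezoidal")
```

For this linearly increasing series the left rule underestimates and the right rule overestimates the area, while the trapezoidal rule is exact.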
  • Enhancements:
  • It is now possible to have a folder with the same name as a data entry in the repository (might not work for some old repositories)
  • It is now possible to have a process and a data entry with the same name in the repository (might not work for some old repositories)
  • Replaced Send Mail operator with new version which supports file attachments
  • Improved memory usage for Aggregate and Pivot operators for nominal columns with potentially a lot of unused values
  • Improved dealing with whitespaces in repository entry names
  • Improved cleanup of temp files, to reduce disk space clutter when Studio runs for a long time, i.e. in a Server environment
  • Made log tables in Result View behave more like other results, adding more actions and a shortcut to the context menu
  • Process background images are now using a relative path to the image if possible, instead of an absolute path. This only applies for background images set from now on, it does not work retroactively
  • For binominal attributes the Statistics tab shows the positive and the negative value
  • Renamed RapidMiner Server to RapidMiner AI Hub
  • The Process panel is now opened/moved into the foreground when opening a process while in the Design view, to make it more obvious that something happened
  • Auto Model: remote executions on Server require the central repository as storage location
  • Turbo Prep: only local file based repositories can now be used as temporary repositories for the handover to Auto Model
  • Model Ops: only local repositories or central Server repositories can be used as storage locations for deployed models (also known as "deployment location")
  • Model Ops: keep unused and ID columns in the results after scoring
  • The operators Explain Predictions and Model Simulator now also support grouped models where arbitrary models have been grouped instead of only preprocessing models
  • The operator Explain Predictions now offers a parameter to limit the number of important features also for the "importances" output
  • Both local repositories and versioned projects (tied to RM Server) have been completely rebuilt to get rid of many old limitations. Benefits include:
  • Enhanced throughput and performance
  • Better meta data caching
  • Concurrent access support
  • Displaying all files (no matter what they are, e.g. Python scripts, images, ...)
  • Allowing different file types (e.g. data, processes) and folders to share the same name
  • Note: Your existing local repositories have (Legacy) after their name, indicating they still run on the old technology and still have some of the limitations! If you create a new local repository, it will have (Local) after its name and have all the capabilities listed above. You can copy your data over via Studio from the old repository to a new one to migrate.
  • Time Series:
  • Added options to use padding for Fast Fourier Transformation and calculate the frequency of the amplitude value.
  • Added the option to specify negative lags for the Lag operator
  • Added the option to specify a default lag for a set of attributes (selected by an attribute subset selector) to the Lag operator
  • Unfortunately, due to parameter key incompatibilities, the old version of the Lag operator is deprecated and a new version with the same name but a different operator key has been added.
  • H2O:
  • Updated H2O library to version 3.30.0.1.
  • Added monotonicity constraints to Gradient Boosted Trees
  • Added weights port to Deep Learning
  • Expanded whitelist of accepted expert parameters, now supports all parameters provided by H2O
  • Deep Learning and Logistic Regression now work with datasets that have nominal columns with only one value
  • Bugfixes:
  • Fixed an issue that could cause Studio startup to never complete
  • Made Studio startup more robust so that it quits the process instead of silently hanging on the splash screen forever
  • Fixed issue that could cause panels to sometimes not open if they had been closed previously in this session
  • Fixed an issue that caused CTAs to not work when HTML5 safe mode was enabled
  • Fixed an issue with back propagation of changes to performance vectors
  • Fixed a problem for JDBC drivers that do not implement a certain set of functionality by adding a fallback (e.g. SQLite writing)
  • Fixed potential cause for complete UI freeze when interacting with a CTA notification banner
  • Fixed an issue with process navigation and property panel if operator names contain HTML
  • Generate Multi-Label Data now works correctly in non-regression mode
  • Fixed memory leak caused by the Visualizations
  • Fixed rare issue where data sets could not be downsampled automatically if license limit was exceeded
  • Fixed an issue in Automatic Feature Engineering if all input features were nominal in the feature selection case
  • Fixed "Edit Access Rights" dialog for Server repositories not getting the permissions correctly when using Enterprise SSO
  • Fixed an issue that caused Studio to lag and increase memory consumption when using the right-click "Insert operator" popup menu in the Process panel.
  • Fixed replacing of data entries when moving them to a different repository (the entry was duplicated instead of replaced)
  • Auto Model: remote executions now show a new submission screen which only allows resetting Auto Model to load the results. This avoids problems with multiple remote submissions within the same session
  • Auto Model: reordering the columns in the column selection table no longer leads to graphics problems
  • Time Series: Fixed a bug in Extract Peaks that caused all "_position" features to have an offset of 1 to the Example number
  • Known issues:
  • One Hot Encoding does not produce the desired results, this will be fixed with the next patch release.
  • Special notes:
  • Columns of type "Integer" that were previously stored as integers are now stored as their double representation. This of course means more range (~53 bit precision), but also means that values are no longer capped. This might have an impact when storing data to disk and rereading it.
  • Columns of type "Date" no longer store the milliseconds due to the new file format. This might have an impact on equality tests and matching when storing data to disk and rereading it.
  • Visualizations that have been created locally for data sets stored in repositories will not be found anymore after the update, causing the result visualization to reset to its default. If you have set up complex visualizations that you absolutely want to restore, you can follow these steps:
  • Open the data set in the Results view of RapidMiner Studio.
  • Navigate on your disk via your filesystem explorer into the "USER_HOME/.RapidMiner/internal cache/content mapper" folder. There you can find a folder structure matching your repository names and structure.
  • Find the exact path to the data set (e.g. "C:/Users/xyz/.RapidMiner/internal cache/content mapper/Local Repository/Charts/Demo/12. Pie")
  • You should see a very similar path right next to it, either ending in ".ioo" or ".rmhdf5table" (e.g. "C:/Users/xyz/.RapidMiner/internal cache/content mapper/Local Repository/Charts/Demo/12. Pie.ioo")
  • Go into the folder from step 3 (the one without the .ioo ending), and copy the "pc.json" file from it to the folder from step 4 (the one with the .ioo ending)
  • Close the data set in the Results view
  • Open it again. It should now have its configuration back!
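The two storage notes above (integers stored as doubles, dates stored without milliseconds) can be made concrete with a short Python sketch. It only demonstrates general IEEE 754 double and timestamp behavior. That the milliseconds are truncated rather than rounded is an assumption made for the example.

```python
from datetime import datetime

LIMIT = 2 ** 53  # doubles represent every integer up to this magnitude exactly

# Integers within the 53-bit range survive a round trip through a double.
round_trip_ok = int(float(LIMIT - 1)) == LIMIT - 1

# Beyond that range, neighboring integers collapse onto the same double.
neighbors_collide = float(LIMIT) == float(LIMIT + 1)

# Dropping the sub-second part changes equality after a store/reload cycle.
original = datetime(2020, 6, 5, 12, 30, 15, 123000)  # 123 milliseconds
reloaded = original.replace(microsecond=0)            # assumed truncation
dates_equal = original == reloaded
```

In practice this means that joins or filters on very large integer IDs, or on timestamps with sub-second resolution, may behave differently after the data has been stored and reread.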
  • Development:
  • The introduction of versioned projects (backed by Git) has forced a major redesign of the Repository API. Up until 9.7, a RepositoryLocation was represented by a string like "//RepositoryName/folder/test" and "test" was guaranteed to be unique. It was either a folder, a process, an ioobject (data) object, or a blob. This is no longer the case!
  • Since collaboration with Git can introduce naming conflicts which are not actually file-level conflicts (so Git is fine with them), we had to allow these "non-conflicts" into the Repository world as well.
  • Now a repository location that ends with "test" as the last path element can either depict a folder (RepositoryLocationType#FOLDER), or data (RepositoryLocationType#DATA_ENTRY). Sometimes this is unknown, which is also fine: RepositoryLocationType#UNKNOWN can be used in that case. However, it does not stop there. Since for Git, "test.rmp" and "test.ioo" are also perfectly fine, we had to go one step further and also allow that. Therefore, a RepositoryLocation now also has an expected DataEntry (sub-)type which is used to determine what specific type of a DataEntry to locate (a ProcessEntry, an IOObjectEntry, a ConnectionEntry, or a BinaryEntry).
  • You can even end up in the undesirable situation of having a "test.ioo" and a "test.rmhdf5table" (both IOObjects) in the same location. Because we cannot determine which IOObject a process should potentially use, these situations must be rectified by the user - the Retrieve operator will throw an error in that case! Looking at the data and renaming one of the entries will work fine, though. This scenario can only happen after a Git pull with the new versioned projects.
  • In other words, "test" can in our example now be a folder, a process, a data ioobject, a connection entry, or a binary entry. And they can all exist at the very same time in the very same folder. So be sure to specify in the new RepositoryLocationBuilder what exactly you want from the repository, or you may end up getting the first name match it finds, which may be of an unexpected type.
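The consequence of names no longer being unique can be illustrated with a toy model in Python. This is explicitly not the RapidMiner Repository API: EntryType, ToyRepository and locate are hypothetical names invented for this sketch, which only shows why a lookup by name alone may return an entry of an unexpected type.

```python
from enum import Enum

class EntryType(Enum):
    FOLDER = "folder"
    PROCESS = "process"
    IOOBJECT = "ioobject"
    CONNECTION = "connection"
    BINARY = "binary"

class ToyRepository:
    """Toy model: entries are identified by path AND type, not by path alone."""

    def __init__(self):
        self._entries = {}  # (path, EntryType) -> payload

    def add(self, path, entry_type, payload):
        self._entries[(path, entry_type)] = payload

    def locate(self, path, entry_type=None):
        if entry_type is not None:
            # Typed lookup: unambiguous even if several entries share a name.
            return self._entries.get((path, entry_type))
        # Untyped lookup: the first name match wins, whatever its type is.
        for (candidate, _type), payload in self._entries.items():
            if candidate == path:
                return payload
        return None

repo = ToyRepository()
repo.add("//Repo/folder/test", EntryType.FOLDER, "a folder")
repo.add("//Repo/folder/test", EntryType.PROCESS, "a process")

untyped = repo.locate("//Repo/folder/test")
typed = repo.locate("//Repo/folder/test", EntryType.PROCESS)
```

In the toy model the untyped lookup returns the folder simply because it was added first, which mirrors the warning above about getting the first name match of an unexpected type.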
  • Repositories now distinguish between data and folders, and even between different data subtypes (process, ioobject, connection, binary entry) which means you can have a folder called "A" and e.g. a process called "A" at the same time. This has implications for a large number of APIs, most notably:
  • com.rapidminer.repository.Repository interface:
  • locateFolder(String) and locateData(String, Class) have been added and can be implemented, their default implementation points to the RepositoryManager()#locateFolder(String) and locateData(String, Class

New in RapidMiner Studio 9.6.0 (Jun 5, 2020)

  • New Features:
  • Added buttons for copying/pasting the active process to the process toolbar.
  • Equalize Time Series:
  • Added two new operators (Equalize Numerical Indices and Equalize Time Stamps) which provide the functionality to equalize input time series. The output time series will have new equidistant index values. The operators provide different possibilities to configure the number of examples, the start value and the stop value and the step size of the new index values. The corresponding values of the output time series are computed by using a Replace Missing Values (Series) operation.
  • Equalize Numerical Indices: Equalize numerical indices into equidistant numerical indices with a numerical step size.
  • Equalize Time Stamps: Equalize date-time indices into equidistant date-time indices. Either with an exact duration (with millisecond precision) as the step size, or with a period (multiple of days, weeks, months or years) as the step size.
  • Peak Transformations:
  • Added two new operators (Z-Score Peak Transformation and Highest Peak Transformation) which perform a peak detection and transformation on time series. They detect peaks in a time series and add an indicator peak series (with the values -1,0,1 as peak flag values) and a peaked series (original values if a peak was detected, missing for non-peak areas).
  • Z-Score Peak Transformation: performs the peak detection by calculating the local mean and standard deviation and identifies values as peaks when they have a large deviation to this local mean
  • Highest Peak Transformation: performs the peak detection by dividing the time series in different areas and checking if local minima and maxima are valid peaks or only noise effects.
  • Peak Feature Extraction:
  • New operator Extract Peaks which performs a peak detection (by utilizing one of the new Peak Transformation operators) and extracts features describing the peaks
  • Added optional custom endpoint parameter to Amazon S3 connections. This enables you to use an S3 API compatible storage service other than Amazon S3.
  • Deployments / Model Ops:
  • All custom prediction models are now supported in model ops, i.e. models created with the Design view, in addition to Auto Model models
  • Grouped models are now supported as well which allows combinations of preprocessing models with a prediction model
  • Model Simulator in Deployments now uses raw data columns as input and performs data prep on the fly
  • Added a setting that controls whether scores should be explained (scoring is about 100x faster without explanations). New deployments have this disabled by default; existing deployments keep it enabled
  • Show if scores should be explained in overview table
  • Model Ops initialization happens in background now – no longer blocking UI start of RM if a remote location is not available (anymore)
  • Some speed improvements for model ops (fewer objects are loaded from repos, which makes things a bit faster for remote deployments)
  • Model Simulator operator now also supports grouped models
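The idea behind a z-score based peak detection, as described for the Z-Score Peak Transformation, can be sketched in Python. This is a simplified illustration and not the operator's algorithm: the function name, the use of a trailing window of the previous `lag` values, and the population standard deviation are assumptions. Only the outputs follow the description above: an indicator series with the values -1, 0, 1 and a peaked series that is missing outside of peaks.

```python
import math
import statistics

def z_score_peaks(values, lag, threshold):
    """Flag values that deviate strongly from the local mean.

    Hypothetical helper: the local window is the `lag` values before position i.
    """
    indicator = [0] * len(values)
    peaked = [math.nan] * len(values)   # missing for non-peak areas
    for i in range(lag, len(values)):
        window = values[i - lag:i]
        mean = statistics.mean(window)
        std = statistics.pstdev(window)
        deviation = values[i] - mean
        if std > 0 and abs(deviation) > threshold * std:
            indicator[i] = 1 if deviation > 0 else -1
            peaked[i] = values[i]       # keep the original value at a peak
    return indicator, peaked

series = [1.0, 1.2, 0.9, 1.1, 1.0, 5.0, 1.0, 0.9, 1.1, -4.0]
indicator, peaked = z_score_peaks(series, lag=4, threshold=3.0)
```

In this simplified version a detected peak still enters the following windows and inflates their standard deviation, so values right after a peak are harder to flag.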
  • Enhancements:
  • Connections to external data sources like Cassandra or MongoDB are now properly re-used (within reason) and closed when a process is finished. This should lead to fewer connections to an external data source when using loop constructs, as well as properly closed connections after a process is finished.
  • Windows and OS X builds now ship with OpenJDK (version 8u232)
  • Added new timezone parameter to JDBC connections. Note: date handling in databases (and generally) is a tricky subject, and there are quite a few ways to make mistakes while doing so. Some databases/JDBC drivers also don't implement date handling properly. Last but not least, keep in mind that a date_time/date is a fixed point in time, but when it is displayed in a more human-readable format than "milliseconds since 01-01-1970 UTC", the display string converts that instant to your display timezone. So even if, for example, a date is the 13th of Jan in UTC, you may see the 14th of Jan when viewing it in Australia, due to the display timezone offset. The actual point in time (milliseconds since 01-01-1970 UTC) however would be identical. See documentation for further information.
  • When parsing a string to time with Nominal to Date, the associated timestamp now represents that time on the 1st of January, 1970 instead of 1st of February 1970
  • Added Default User-Agent setting to Preferences / System
  • Updated MariaDB JDBC driver
  • You can now see which Java version is being used when looking at the "About" dialog
  • Improved meta data warning in case the time series attribute selection of time series operators is empty
  • Added option to autodetect S3 region in Amazon S3 connections
  • Improved Google Cloud Services connection UI
  • File chooser icons on OS X are now also supporting HiDPI
  • When removing a repository, the repository.xml file now gets updated immediately
  • Visualizations: Tick interval input field now allows you to set much larger values for datetime axes as it's using milliseconds as a unit to split the chunks
  • Updated the Step by Step In-Product Tutorial content
  • Added more search tags to various performance and aggregation operators
  • Improved error message when download/deserialization of data from a remote repository fails
  • Improved error message when SSL certificate was invalid when attempting to connect to a RM Server repository.
  • Improved logging when trying to connect to a RM Server and unusual exceptions occur, e.g. more details about why SSL connection failed, what the network problem is, etc.
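The note on the JDBC timezone parameter, that a date is a fixed instant whose displayed calendar date depends on the display timezone, can be checked with a short Python sketch. The fixed offsets of UTC+10 and UTC-8 are assumptions chosen for illustration. Real timezones also have daylight saving rules, which are ignored here.

```python
from datetime import datetime, timedelta, timezone

# One fixed point in time, expressed in UTC.
instant = datetime(2020, 1, 13, 20, 0, tzinfo=timezone.utc)

east = timezone(timedelta(hours=10))   # assumed display zone ahead of UTC
west = timezone(timedelta(hours=-8))   # assumed display zone behind UTC

shown_utc = instant.date().isoformat()
shown_east = instant.astimezone(east).date().isoformat()
shown_west = instant.astimezone(west).date().isoformat()

# The underlying instant is the same, whatever zone it is displayed in.
millis_utc = int(instant.timestamp() * 1000)
millis_east = int(instant.astimezone(east).timestamp() * 1000)
```

Only the rendered string changes between zones, so comparisons should be made on the instant itself and not on its displayed form.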
  • Bugfixes:
  • Fixed issue that could cause Studio to stop starting and be stuck at the splash screen forever.
  • Fixed an issue where storing datasets in a database using the automatically created primary key was not possible.
  • Declare Missing Value no longer crashes if the expression mode is selected and the expression itself returns a missing value. Instead, it will evaluate to false and thus NOT set a missing value for that row.
  • Fixed models and other IOObjects coming from extensions not being identified correctly in Server repositories.
  • Fixed Auto Model not being able to use results of a Join operator in some cases.
  • Fixed broken properties when storing data tables in rare cases.
  • It is no longer possible to create RapidMiner Server repositories with an invalid name.
  • Filter Examples now correctly resolves all macros in parameters, including in custom filter attribute names.
  • Fixed error that could sometimes cause result tables not being able to move to Auto Model via the button in the Results tab.
  • Fixed an issue that caused Visualizations to not appear on certain Linux systems.
  • Fixed file chooser icons on OS X.
  • Fixed bug for scoring in Deployments: if column types are incompatible, they are actually dropped now (which was documented as such but did not happen)
  • Auto Model will now be restored if the user cancels a deployment by closing the deployment dialog
  • Other:
  • It is no longer possible to create legacy connections and other connections which have been replaced with the new repository connection objects in RapidMiner 9.3. Existing connections can still be edited and used, but this functionality will be removed eventually as well. Make sure to migrate existing legacy connections to repository connection objects! See documentation for reference.
  • Development:
  • Added caching for connections based on ConnectionAdapterHandler to reduce the connection count and make it possible to clean up connections once they are no longer needed (e.g. when the process is finished).
  • GlobalSearch is no longer available in headless mode (aka command line, job container execution, etc)
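
The Declare Missing Value fix listed under Bugfixes above treats a missing expression result as false. A minimal sketch of that rule, assuming a null Boolean stands in for a "missing" result (the class and method names are hypothetical and not part of RapidMiner):

```java
final class MissingAsFalse {
    /**
     * Hypothetical helper: decides whether a row should be set to missing.
     * Assumption: null represents a missing expression result.
     */
    static boolean shouldDeclareMissing(Boolean expressionResult) {
        // Only an explicit TRUE triggers the change; FALSE and null leave the row untouched.
        return Boolean.TRUE.equals(expressionResult);
    }

    public static void main(String[] args) {
        System.out.println(shouldDeclareMissing(Boolean.TRUE));
        System.out.println(shouldDeclareMissing(Boolean.FALSE));
        System.out.println(shouldDeclareMissing(null));
    }
}
```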

New in RapidMiner Studio 9.5.1 (Nov 20, 2019)

  • Enhancements:
  • The expression parser (used for example in Generate Attributes) can now use real columns in addition to integer columns for the following functions:
  • date_add
  • date_set
  • rand
  • binominal
  • Note that for obvious reasons it will only use the integer portion of the real value in this case.
  • Operator problems are now sorted by severity
  • Attribute type icons are now also shown in the metadata UI
  • Bugfixes:
  • Attributes of unknown type (attribute_value) are no longer shown as date_time attributes in some UI elements
  • Visualizations: The plot type selection popup and the other style configuration UI popups should now open on the correct screen in a multi-monitor setup on OS X
  • Visualizations: Fixed order of aggregated values for a nominal group-by column in Line/Area/Bar/Column/Heatmap plots
  • Development:
  • Development of extensions just got easier: If you have a Developer license, you can activate a new checkbox in the Start-up section of the RapidMiner Studio settings: Grant development permissions to unsigned extensions. Once this setting is enabled and Studio has been restarted, unsigned extensions are granted the same permissions as the Studio code itself. Note: With great power comes great responsibility, so be sure that you don't have untrusted extensions loaded while this setting is active, as they would no longer be constrained in any way.
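
The expression parser note under Enhancements above says that only the integer portion of a real value is used. A minimal sketch of what that could look like for a date_add-style call, assuming "integer portion" means truncation toward zero (the changelog does not say how negative values are handled, and the helper below is hypothetical, not the parser's own code):

```java
import java.time.LocalDate;

final class IntegerPortion {
    /** Hypothetical stand-in for a date_add call with a real-valued number of days. */
    static LocalDate addDays(LocalDate date, double amount) {
        // Assumption: the fractional part is discarded by truncating toward zero.
        long wholeDays = (long) amount;
        return date.plusDays(wholeDays);
    }

    public static void main(String[] args) {
        LocalDate start = LocalDate.of(2019, 11, 20);
        System.out.println(addDays(start, 2.9));
        System.out.println(addDays(start, -1.7));
    }
}
```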

New in RapidMiner Studio 9.5 (Nov 6, 2019)

  • Upgrade RapidMiner Studio independently from Server:
  • Long awaited and finally here: Connect to and access data and processes on older Server versions (9.0 and above) with any current or future Studio version! The latest Studio release can also verify executability of processes stored on Server.
  • Enhancements and bug fixes:
  • The following pages describe the enhancements and bug fixes in RapidMiner Studio 9.5 releases

New in RapidMiner Studio 9.4.1 (Sep 30, 2019)

  • New Automated Model Ops
  • Follow the fully automated data science path: prepare your data using Turbo Prep, create prediction models via Auto Model and finally put them into production with Model Ops.
  • Deploy the most promising models with one click and score new data via flexible web services or in the UI.
  • Track model performance on an intuitive dashboard and swap easily to the best performing one. Set up an email alert to get notified if a model outperforms the one in production.
  • Evaluate each model with respect to its financial impact instead of pure Data Science metrics.
  • Detect changes in data and their impact on model performance early to address problems.
  • Use our integrated dashboard to keep track of data drift and model performance.
  • New map visualizations:
  • Visualize geospatial data with the new map visualizations. You can choose from multiple map types with many different configuration options, as well as dozens of maps for geographic regions, continents, and countries. Available map types:
  • Choropleth maps: Used to display numeric values associated to regions (e.g. a country or a state) via a color gradient
  • Categorical maps: Used to visualize regions that belong to a number of distinct categories
  • Point maps: These maps offer latitude and longitude support and display a marker for each coordinate on the selected map
  • New charts:
  • Three new chart types have been added in addition to some tweaks and fixes to the existing charts:
  • Sunburst chart
  • Chord diagram
  • Parliament chart
  • Improved Auto Model:
  • Auto Model features several improvements under the hood as well as a few more visible enhancements:
  • All predictive processes generated by Auto Model are now much cleaner, well-structured, and much easier to understand.
  • Cost-sensitive learning has been added to show the costs / benefits in the validation result. This makes it possible to solve problems (e.g. fraud detection) that involve highly imbalanced data sets (e.g. credit card transaction data).
  • New data prep and modeling capabilities:
  • Several new operators have been added to ease and enhance data preparation and machine learning:
  • New operators Replace All Missings, Handle Unknown Values, One Hot Encoding and Append (Robust) to easily prepare data for modeling and scoring.
  • New operator Rescale Confidences (Logistic) to rescale confidences even for classification with more than two classes.
  • New operator Cost-Sensitive Scoring: Novel approach for cost-sensitive learning which works for more than two classes.
  • New operators Multi Label Modeling and Multi Label Performance to train and validate a combined model for multiple label columns in a single step.
  • Enhanced time series forecasting:
  • New operators have been added for:
  • Forecasting multiple horizons of a time series with any machine learning model (Multi Horizon Forecast)
  • Validating performance of multi horizon forecasts (Multi Horizon Performance)
  • Sliding window validation for time series data science problems
  • Enhanced data source connection framework:
  • All RapidMiner-supported connectivity extensions on the Marketplace now use the new data source connection framework, which includes handling connections to:
  • MongoDB
  • Cassandra
  • Splunk
  • Solr
  • Mozenda
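
The sliding window validation mentioned under enhanced time series forecasting above can be pictured as a series of train/test index ranges that move along the series. The following is a generic sketch of that idea, not the operator's implementation; the parameter names trainSize, testSize and step are assumptions chosen for illustration:

```java
import java.util.ArrayList;
import java.util.List;

final class SlidingWindowSplits {
    /**
     * Returns splits as {trainStart, trainEnd, testEnd}:
     * train on [trainStart, trainEnd), test on [trainEnd, testEnd).
     */
    static List<int[]> splits(int length, int trainSize, int testSize, int step) {
        if (trainSize <= 0 || testSize <= 0 || step <= 0) {
            throw new IllegalArgumentException("sizes and step must be positive");
        }
        List<int[]> result = new ArrayList<>();
        // Slide the window forward until training plus test no longer fit into the series.
        for (int start = 0; start + trainSize + testSize <= length; start += step) {
            int trainEnd = start + trainSize;
            result.add(new int[] {start, trainEnd, trainEnd + testSize});
        }
        return result;
    }

    public static void main(String[] args) {
        for (int[] s : splits(10, 4, 2, 2)) {
            System.out.println("train [" + s[0] + ", " + s[1] + ") test [" + s[1] + ", " + s[2] + ")");
        }
    }
}
```

Because each test range always lies after its training range, no future values leak into training.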

New in RapidMiner Studio 9.1.0 (Dec 14, 2018)

  • New Features:
  • The Aggregate Operator got the percentile function where the percentile can be changed in the aggregation attributes functions list. It is possible to use an integer like 75 or a floating point value like 80.5 here. It is of course also possible to use a macro here.
  • SSL certificates stored in .RapidMiner/cacert are now trusted on startup. See trust-certificates for more information.
  • Added support to open operator tutorial processes directly from the web.
  • Split the setting to keep operators connected upon disabling or deleting them into these settings:
  • Drop or bridge operator connections upon deletion
  • Drop, bridge or keep connections upon disabling
  • Enhancements:
  • The "Import Data" dialog for CSV files will try to guess the best matching date format and preselect date for attributes that contain mostly matching date entries
  • The "Import Data" dialog for Excel files now differentiates between date, time and datetime columns specified in Excel
  • Improved CSV import wizard to use the structure found in the header or starting row
  • Parse Numbers and the Data Import wizards now support exponents in numbers with a leading '+' for positive exponents, e.g. "5.9876E+7"
  • Improved Cross Validation error handling when the Performance port is not connected
  • The XML Panel no longer hides default values
  • Split thread settings in foreground and background threads (for the currently opened process and processes running in the background, respectively)
  • Updated bundled Java for Windows and OS X to version 8u181. This should fix right-click issues on OS X
  • Added support for aggregation functions for Pivot operator and improved performance
  • When moving operators in the Process view, connected operators will be rearranged and moved to the right if necessary
  • Bugfixes:
  • For large ExampleSets with more than ~71.5 million rows, the result table will compress the height of each row a bit to accommodate them. Data sets with more than ~86 million rows will only display the first ~86 million rows and show a warning that the rest is cut off.
  • Fixed an issue that could cause Studio to be stuck for up to ~2 minutes on start-up.
  • Fixed very rare process error when working with attribute weights.
  • X-Means item count of cluster model will now show the correct size.
  • Fixed an issue where (temporary) Access files could not be deleted in a RapidMiner process.
  • Development:
  • Added registerLanguage method to the I18N class, which allows adding new languages to the Settings->Preferences->Language selection. The i18n is picked up by providing resource bundles in the usual form, for example GUI_ja.properties and Error_ja.properties. If you want to get a list of not-yet-translated keys, add a file called translation_help.txt to your .RapidMiner folder. After you shut down Studio with your new language selected, it will write all keys for which it did not find a translation to that file. This should help you identify keys that you still need to translate.
  • Added the OperatorPortActionRegistry to add actions to operator ports.
  • Added identifier for last delivering port to the IOObject's userdata via IOObject.getUserData(DeliveringPortManager.LAST_DELIVERING_PORT)
  • Added support for parameter dependencies and hidden state to the settings dialog.
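
To illustrate the exponent notation mentioned for Parse Numbers and the Data Import wizards above, the following minimal sketch uses Java's standard Double.parseDouble, which accepts an optional '+' in the exponent. It only shows what the notation means; it is not RapidMiner's parser, and the class name is hypothetical:

```java
final class ExponentNotation {
    /** Parses a number in scientific notation; the exponent may carry an explicit sign. */
    static double parse(String text) {
        return Double.parseDouble(text.trim());
    }

    public static void main(String[] args) {
        // "E+7" and "E7" denote the same positive exponent.
        System.out.println(parse("5.9876E+7") == parse("5.9876E7"));
        System.out.println(parse("5.9876E+7"));
    }
}
```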

New in RapidMiner Studio 9.0.3 (Oct 4, 2018)

  • Enhancements:
  • The Windows installer now automatically goes to the Finish page after all files have been copied instead of waiting for the user to click "Next".
  • Bugfixes:
  • Performance (Ranking) and Performance (Costs) no longer report a wrong micro average when used inside a Cross Validation.
  • Stacking now works inside other Ensemble operators.
  • Fixed Outlier Detection in Auto Model.
  • Bugfix for some Time Series operators (notably Process Windows) which did not reset the data from the input port after the operators were executed.
  • Bugfix for Windowing, in case the attribute selection results in no attributes selected.
  • Fixed location of some dialogs in multi-screen setups.

New in RapidMiner Studio 9.0.2 (Sep 5, 2018)

  • Enhancements:
  • Improved user interface responsiveness when running processes
  • Changed progress bar progression for updates to better reflect the actual update process when downloading from the Marketplace
  • Improved default parameters for K-Means, K-Means (fast), K-Means (kernel), X-Means, K-nn, Parallel Decision Tree, ID3, CHAID, Parallel Random Forest, Gradient Boosted Trees, Neural Net and Join to better reflect commonly used values
  • Fixed a problem where entering the wrong credentials to connect to a remote repository could take a long time to ask for new credentials
  • Improved password input dialog to show that the credentials were invalid or something else went wrong
  • Improved configuration of remote repository when editing the repository
  • Viewing Collection results no longer displays lots of wasted space on the left side
  • Restore Default View is now only available for the Design and Result view
  • Enabled "Show location of current Process" for Training Resources
  • Bug fixes:
  • Fixed a bug in which the process panel became invisible.
  • Fixed a bug where the process panel was displayed only partially
  • Fixed possible crash on startup on Windows
  • Date to Numerical now produces a Real attribute instead of an Integer attribute to prevent truncation
  • Fixed behavior of the Unify Item Sets
  • Fixed bug in Join when using date-time attributes as key
  • Fixed bug with K-Means (fast) causing Determine good start values parameter to be ignored
  • Fixed bug if a process is opened from a file system path that contains more than 150 characters
  • Fixed an issue that prevented Studio from starting
  • Fixed process background image location when zooming in
  • Fixed potential infinite loop with K-Means and X-Means if Determine good start values was used.
  • Fixed a bug in time series operators regarding parameter misbehavior

New in RapidMiner Studio 9.0.1 (Aug 14, 2018)

  • Enhancements:
  • It is no longer possible to close or hide panels in RapidMiner Studio by accidentally pressing certain obscure key-commands. Panel manipulation can now be done solely via right-click on the panel header. Note that you can still press Ctrl-W to close result tabs.
  • Optimized educational & community repository to remove UI freezes.
  • When an operator has no parameters, that information is now displayed in the Parameters panel instead of just showing a completely empty panel.
  • Handle view switch errors more gracefully.
  • Bug fixes:
  • Added runtime check for Loop operators to require at least one iteration
  • Fixed roles bug in X-Means
  • The Configure RapidMiner Server Repository Check connection settings function no longer gives false-positive results when valid RapidMiner Server credentials exist in the Password Manager.
  • Fixed potential UI freezes when switching views due to breakpoints etc.
  • Fixed sometimes missing notification in the event of upload errors to RM Server
  • Fixed icon on operators depicting hidden notes when zoomed

New in RapidMiner Studio 9.0.0 (Aug 8, 2018)

  • NEW FEATURES:
  • Added TurboPrep, your interactive data preparation in a data-centric UI
  • Added new Time Series functionality
  • Added support for Google Cloud Storage with Read Google Storage, Write Google Storage, and Loop Google Storage operators. They work similarly to their existing Amazon S3 and Azure Blob Storage counterparts.
  • Added new online repositories which contain up-to-date help content. These contents are used by our online educational materials.
  • Added concatenation function to Generate Aggregation
  • Added a new "admin configuration" feature (see the documentation):
  • Operator Blacklisting
  • Extension Whitelisting
  • Telemetry
  • Studio Settings
  • ENHANCEMENTS:
  • Global Search results can now be navigated by keyboard
  • Operators can now be renamed by double-clicking on their name (indicated by a text cursor)
  • Improved operator renaming visuals when zoomed in/out of the process
  • Process panel in Design view can no longer be closed
  • Updated behavior for Result History panel outside of Result view
  • Uncloseable panels no longer have close buttons
  • Updated import wizards for Read CSV and Read Excel operators to make them consistent with the Add Data repository action
  • Added Remove All Breakpoints entry to Edit menu and right click context menus
  • A warning is shown for correlation matrices that could not be calculated
  • Improved the guessing for type of Quotes during CSV import
  • Improved the guessing on decimal separator in CSV import
  • Twitter operators now correctly warn about the rate limit when it is exceeded instead of throwing a generic error
  • Hyperlinks in process notes are now clickable and open the default browser
  • Repository actions that need write access are now grayed out when a read-only entry is selected
  • Inserting an operator via Global Search will now correctly grant focus to the Process panel, so you can immediately use the keyboard to manipulate the operator
  • Added workaround for a bug in the Amazon Redshift JDBC driver so that it can be used now
  • Saving a process in a read-only repository now offers the SaveAs dialog instead
  • Repository location chooser (for opening and for saving) no longer sometimes appears as a separate instance of RM Studio in the operating system taskbar
  • BUG FIXES:
  • Clicking on a selected operator no longer sometimes selects an operator behind it
  • Fixed process panel sometimes being opened in other views
  • Fixed an issue where icons did not show up on Retina displays
  • Updated vulnerable libraries
  • Fixed potential UI freeze during the Import Data process
  • A rare error concerning parallel loops in combination with Generate Attributes was fixed
  • Fixed an issue where RapidMiner Studio always started in fullscreen mode on Mac OS X
  • Fixed results view not showing the latest result as the active tab
  • DEVELOPMENT:
  • Added callback hook for DataImportWizardBuilder. The caller can use the callback to determine what should happen after the user has concluded the data import.

New in RapidMiner Studio 8.2.1 (Jul 6, 2018)

  • New Features:
  • Added possibility to disconnect from RapidMiner Server repositories
  • Enhancements:
  • Edit Access Rights dialog is now read-only if the user does not have enough permissions to make changes
  • Generate Weight Stratification now warns about mismatching data
  • Updated tutorial process for Loop Attributes
  • Bug fixes:
  • Fixed broken preview when using the Guess value types or Reload data buttons in the Import Configuration Wizard of the Read Excel and Read CSV operators, after manually changing the attribute selection or an attribute role.
  • Fixed a metadata problem with the Singular Value Decomposition operator showing the wrong type of preprocessing model.
  • Fixed a bug causing Aggregate to concatenate the same value multiple times even though only distinct was set.
  • It is no longer possible to toggle breakpoints if Process panel is not visible.
  • Write CSV is no longer writing Integer values as floating points.
  • Updated mode aggregation function of Aggregate to take missing values into account.
  • Remember can now be used in every iteration of a parallel operator, instead of only the last. No execution order is guaranteed.
  • The New Revision server repository action no longer blocks the UI.
  • Fixed bug preventing SVM Kernel Scatter Plot from displaying certain variables.
  • The macro command line argument -M now works as expected when passed to the rapidminer-batch.bat launcher.
  • Fixed rare bug that could occur when looking at a subprocess of a parallel operator while zoomed out and trying to run the process.
  • Fixed pass through port of the Correlation Matrix operator (returned a subset of the input for some data sets).
  • Fixed missing visual indicator in the top bar for the currently selected view when resizing RM Studio horizontally.
  • Fixed spelling error in Direct Marketing template.
  • Fixed spelling error for mikro/makro.
  • Fixed a problem using undo/redo during a tutorial.
  • Fixed a rare bug that might occur on restoring a process on startup.
  • Fixed uncommon bug where Views would break when switching between them too quickly.
  • Fixed bug making Apply Threshold use the wrong mapping.
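
The Remember fix listed above means that every iteration of a parallel operator can store an object, but in no guaranteed order. A minimal sketch of the consequence, assuming each iteration stores under its own name (the key scheme "result_" + i and the class are hypothetical; RapidMiner's object store is not modeled here):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.IntStream;

final class ParallelRemember {
    /** Runs the iterations in parallel; each one stores a value under a distinct key. */
    static Map<String, Integer> run(int iterations) {
        Map<String, Integer> store = new ConcurrentHashMap<>();
        IntStream.range(0, iterations)
                .parallel()
                // Iterations may finish in any order, so keys must not depend on that order.
                .forEach(i -> store.put("result_" + i, i * i));
        return store;
    }

    public static void main(String[] args) {
        System.out.println(run(8).size());
    }
}
```

If all iterations wrote to the same name instead, the surviving value would depend on which iteration happened to finish last.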

New in RapidMiner Studio 8.2.0 (May 10, 2018)

  • Enhancements:
  • Double-click on an unconnected operator port will connect it to a matching output port of the process.
  • The menu View -> Show Panel is now scrollable.
  • Updated visualization of tutorial's next button to go to next tutorial or back to tutorial overview when reaching the end of a tutorial or a chapter respectively.
  • Removed search button from search bar and changed result dialog to open with one-click logic.
  • Creating a RapidMiner Server repository no longer stores the credentials automatically. However, if desired you can still do so by selecting the "Remember Password" checkbox when creating the repository.
  • Panels now always have proper tooltips.
  • Improved visualization of nested Operators.
  • Added primary parameter mechanic to some Operators; double clicking an Operator now opens the editor of a primary parameter. This also works for operators that have subprocesses. In that case, pressing the Alt-key while double-clicking activates the primary parameter.
  • Quickfixes now can be directly accessed after a process run fails from the error bubble.
  • Improved performance of FP Growth and added support for additional input formats.
  • The status bar (found at the very bottom of RM Studio) now more precisely displays possible actions when editing a process.
  • Pressing the arrow keys in the process panel when no operators are selected will now select the first operator.
  • Bug fixes:
  • Parallel operators now produce identical results when running in parallel and when running sequentially
  • Removed several sources for redundant undo steps
  • Fixed a bug that could lead to incomplete output of Execute Program
  • Fixed and improved on generic process runtime errors
  • Fixed erratic behaviour of EMClusterer
  • Date to Nominal no longer removes the role of the selected attribute
  • Fixed a bug where results from Data to Similarity Data could not be processed further
  • Fixed an issue that could result in the "Drag here" annotation being shown in the process all the time when using the Global Search
  • Fixed a bug that allowed operators to connect to themselves
  • Fixed Web Analytics template

New in RapidMiner Studio 8.1.3 (Apr 20, 2018)

  • Enhancements:
  • Added feedback form to final step of each tutorial to help us improve the tutorials.
  • Added feedback form to operator help panel to help us improve the documentation.
  • Scrolling when multiple scrollable areas are nested within each other now works as expected.
  • Creating a RapidMiner Server repository no longer stores the credentials automatically. However, if desired you can still do so by selecting the "Remember Password" checkbox when creating the repository.
  • Bug fixes:
  • Fixed concurrent access of repository entries from RM Server (e.g. inside a parallel Loop operator).
  • The choice to downsample data if the data has more rows than the current license permits now works correctly inside parallel operators.
  • Typing attribute names in a field which has auto-completion no longer causes the field to lose focus mid-typing.
  • Fixed a bug with Aggregate that could crash a process.
  • Fixed a very rare bug that crashed the process when using the old and no longer supported Tree panel.
  • Fixed a problem where one could accidentally create connection loops when dragging an operator.
  • Fixed problem with Mac OSX fullscreen mode.
  • Fixed refresh issue in the Repository Panel.
  • Fixed an issue that could cause endless dialogs asking for login credentials when connecting to RM Server.
  • Development:
  • Fixed a crash that occurred when using NominalToNumericModel programmatically.

New in RapidMiner Studio 8.1.1 (Mar 8, 2018)

  • Enhancements:
  • Deleting a RapidMiner Server repository now removes its credentials from the wallet.
  • Whitespaces at the beginning or end of attribute names are now automatically discarded for newly imported data to prevent invalid attribute names.
  • Bug fixes:
  • Fixed bug causing multiple operators to reject attribute names with leading or trailing whitespaces
  • Fixed bug causing failure to load templates

New in RapidMiner Studio 8.1.0 (Feb 6, 2018)

  • Model Wizard and Explorer:
  • A new working mode for rapid creation, comparison, and exploration of new models. The Modeling Wizard will save you a lot of time in creating processes for multiple models.
  • Global search:
  • Find anything within your repository and the operator list using a central search engine: processes, models, operators, extensions… even your past actions! No need to search through your entire folder structure any more: everything is now at hand!
  • Security:
  • User passwords are now hidden and replaced by stars once typed.
  • Passwords are now kept encrypted in the .RapidMiner user folder.
  • Improved performance:
  • We have refactored a few operators, including Join, Correlation Matrix and K-Means, to drastically improve performance, with speed-ups of up to 10x.
  • New Features:
  • Added Auto Model feature, a new working mode for rapid creation, comparison, and exploration of new models. It can be found as a new view at the top.
  • Added a powerful global search functionality which can be found in the top-right corner and activated via Ctrl+F shortcut. You can currently search for operators, repository contents, UI actions, and Marketplace content. See the documentation for more information if you are interested in more complex and powerful search queries (e.g. finding data/models that contain a specific attribute, or were last modified before a certain date, etc).
  • Enhancements:
  • New Process Templates upgraded to use the latest operator versions.
  • Read Excel now allows sheet selection by name.
  • Read CSV, Read XML and Read Excel have a new expert parameter read all values as polynominal, which allows the user to disable type guessing.
  • Hide passwords in the Password Manager dialog and store them with a stronger encryption.
  • Search Twitter and Get Twitter User Statuses now support 280-character tweets.
  • All Twitter operators moved from numerical to nominal attributes for user and status IDs.
  • Made the Views display at the top more dynamic on resizing to prevent squashed GUI elements for low(er) resolutions and to show more views for high(er) resolutions. To achieve this, both the Undo and Redo buttons for process editing were removed. You can still undo/redo via the top Edit menu, or by pressing Ctrl+Z/Ctrl+Y, or even via the new global search by searching for Undo or Redo.
  • Bug fixes:
  • Secured XML parsing against XXE vulnerability
  • Fixed a rare error when logging inside parallel operators
  • Fixed problem that caused Parse Numbers to fail if input was an empty value
  • Fixed a rare error when running Join, Replace Missing Values, or Add inside a parallel loop
  • Fixed handling of polynominal attributes in Apply Model when applying a Cluster Model
  • Updated Regularized/Linear/Quadratic Discriminant Analysis to avoid uncaught errors and give more information if an error occurs
  • Fixed uncaught Runtime Exception when using Loop Parameters and Optimize Parameters (Grid) with log_all_criteria
  • Fixed issues with duplicated or missing entries, as well as missing groups in the Manage Connections dialog
  • Refreshing folders in a RapidMiner Server repository no longer blocks the entire Studio interface
  • Renaming entries in a RapidMiner Server repository no longer blocks the entire Studio interface
  • Pressing Ctrl-A in an empty process no longer makes the process parameters disappear
  • Hotkeys for view switches now work properly from all views
  • Upgraded MSSQL JDBC driver to version 4.2
  • Upgraded PostgreSQL JDBC driver to version 42.2.1
  • Development:
  • The Global Search feature is highly flexible and open to extensions - look at com.rapidminer.search.GlobalSearchable and com.rapidminer.gui.search.GlobalSearchableGUIProvider to get started!
  • Unsigned 3rd party extensions can now call ParameterService#setParameterValue(String, String) without causing a SecurityException
  • Please note: We have accumulated lots of outdated code over the years. Anything that is annotated with @Deprecated will be removed at some point in the future. Removal will start with RapidMiner Studio 9.0, so please prepare your extensions by not using any deprecated code anymore. JavaDoc will help guide you to replacement classes/interfaces/methods.
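
The XXE hardening listed under Bug fixes above is not described in detail. As a general illustration of the technique, the following sketch configures a standard JAXP DocumentBuilderFactory so that documents containing a DOCTYPE are rejected. This is a common hardening recipe and an assumption about the approach, not RapidMiner's actual code; the class and method names are hypothetical:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

final class SafeXml {
    /** Parses XML with DOCTYPE declarations (and therefore external entities) disallowed. */
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        // Rejecting DOCTYPE outright removes the entry point for XXE payloads.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        factory.setXIncludeAware(false);
        factory.setExpandEntityReferences(false);
        return factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    /** Returns true if the hardened parser refuses the document. */
    static boolean rejects(String xml) {
        try {
            parse(xml);
            return false;
        } catch (SAXException e) {
            return true;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String attack = "<!DOCTYPE r [<!ENTITY x SYSTEM \"file:///etc/passwd\">]><r>&x;</r>";
        System.out.println(rejects(attack));
        System.out.println(rejects("<r>ok</r>"));
    }
}
```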

New in RapidMiner Studio 8.0.1 (Jan 15, 2018)

  • Bug fixes:
  • "Manage Database Connections" dialog is no longer shy and will appear again if requested.

New in RapidMiner Studio 8.0.0 (Dec 4, 2017)

  • New features:
  • New parallel Loop Parameters operator which can make use of all your CPU cores.
  • New parallel Optimize Parameters (Grid) operator which can make use of all your CPU cores.
  • Enhancements:
  • Searching for operators now automatically ignores simple spelling mistakes.
  • Decision Tree and Random Forest can now handle numerical labels and solve regression problems.
  • For Random Forest there is a new option to select splits of numerical attributes randomly.
  • Decision Tree and Random Forest now provide a new port that outputs feature weights.
  • Connections of removed operators can now be kept if feasible.
  • Parallel loops have a new log value iteration_number that is independent of parallel execution. This should be used instead of applycount.
  • Improved warnings for wrong input data types for Loop Attributes and Loop Values.
  • More informative error messages for Seemingly Unrelated Regression.
  • Improved documentation for the following operators: Naive Bayes, Normalize, k-Means, k-NN, Join, Performance (Binominal Classification), and Seemingly Unrelated Regression.
  • Bug fixes:
  • When using parallel operators inside other parallel operators (e.g. Cross Validation inside Loop), your CPU cores are now fully utilized instead of most of them sitting idle. This amounts to a drastic performance increase in those situations.
  • Fixed a bug that caused Connections stored on RM Server via the Studio UI to not work for processes executed on RapidMiner Server. If this affects you, please open Manage Connections and press Save all changes.
  • Fixed median computation for Aggregate in edge cases.
  • Fixed a bug in cluster model results when clicking on elements in the Folder view.
  • Fixed a bug in the visualization of slider elements in the UI when used with certain min/max values.
  • Fixed Churn template input size.
  • Fixed problem when exiting RapidMiner Studio while an error bubble was shown.
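
The note on iteration_number under Enhancements above can be illustrated with a small sketch. It assumes that applycount behaves like a shared counter incremented whenever an iteration runs, which is an interpretation and not confirmed by the changelog; the class below is hypothetical and only contrasts a position-based number with such a counter:

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.IntStream;

final class IterationNumbers {
    /**
     * Runs a parallel loop and returns two arrays indexed by loop position:
     * [0] the position-based iteration number, [1] the value of a shared counter.
     */
    static int[][] run(int iterations) {
        int[] iterationNumber = new int[iterations];
        int[] sharedCount = new int[iterations];
        AtomicInteger counter = new AtomicInteger();
        IntStream.range(0, iterations).parallel().forEach(i -> {
            iterationNumber[i] = i + 1;                  // fixed by the loop position
            sharedCount[i] = counter.incrementAndGet();  // depends on thread scheduling
        });
        return new int[][] {iterationNumber, sharedCount};
    }

    public static void main(String[] args) {
        int[][] logged = run(8);
        System.out.println(Arrays.toString(logged[0]));
        System.out.println(Arrays.toString(logged[1]));
    }
}
```

The first array is the same on every run, while the second may differ from run to run, which is why a position-based value is the more reliable thing to log.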

New in RapidMiner Studio 7.6.1 (Sep 6, 2017)

  • Enhancements:
  • Random Forest results are now reproducible between runs
  • Bug fixes:
  • Fixed "Invalid ice_root" error if the Windows username contains whitespaces when running Logistic Regression, Generalized Linear Model, Gradient Boosted Trees or Deep Learning
  • Advanced properties of database connections now support special characters
  • Fixed an issue when storing in a repository inside parallel loops
  • Fixed Support Vector Machine (LibSVM) crashing when running in one-class mode
  • Advanced settings for Oracle database connections are no longer ignored
  • RapidMiner Studio no longer freezes if a RapidMiner Server database connection has no password
  • Fixed freeze in case the EULA is declined
  • Developers:
  • ParameterTypeEnumeration#getXML properly returns the default value

New in RapidMiner Studio 7.6.0 (Sep 6, 2017)

  • New features:
  • Sending notification emails can now be configured in the preferences to make use of all modern connection security and authentication mechanisms like TLS 1.2 + PFS
  • Enhancements:
  • The sender of notification emails can now be configured in the preferences
  • Licenses are now valid for the full last day until midnight
  • Improved handling of infeasible parameter values for Self-Organizing Map
  • Changed default sampling type parameter for Validation operators to automatic
  • Write Message now has a parameter option to append to existing files instead of overwriting them
  • Logistic Regression and Generalized Linear Model learners now have a threshold output where they deliver a threshold value optimized for maximal F-measure
  • Improved handling of missing and infinite values for Normalize
  • Improved handling of missing or broken compatibility numbers in the process xml
  • Made behavior of add as label parameter consistent for all cluster operators
  • Improved checks for empty example sets in cluster operators
  • Improved shown capabilities for cluster operators and added quick fixes for inconsistent parameter selection
  • Reduced some internal logging by moving it behind the debug flag which can be activated in the preferences
  • Updated Java for Windows and Mac OS X to version 8u141
  • Bug fixes:
  • Fixed reproducibility of results when concurrent operators (e.g. Loops) are involved.
  • Changing the default connection timeout setting in the preferences now takes effect immediately.
  • Sending notification emails now uses the default connection timeout.
  • Fixed metadata of Flatten Clustering.
  • Fixed behavior of Loop Parameter inside parallel loops.
  • Removed unnecessary warning for clustering operators with nominal input data
  • Generate Weights (LPR) and Local Polynomial Regression now provide additional kernel parameters for the numerical measure KernelEuclideanDistance instead of failing
  • Fixed Gradient Boosted Trees renderer, it no longer shows wrong edge labels and incorrect value sets
  • Logistic Regression, Generalized Linear Model, Gradient Boosted Trees and Deep Learning operators no longer crash the software if certain temporary folder permissions are missing
  • Logistic Regression and Generalized Linear Model learners now use 0.5 as the threshold, like other binominal learners
  • Fixed behavior of Loop Attributes when only one attribute is selected for parallel execution
  • Fixed Average for Performance inputs that contain AUC
  • Fixed side-effects of Apply Threshold in other branches of the process
  • Fixed rare crash in Create Association Rules under certain parameter configurations
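
The threshold output of Logistic Regression and Generalized Linear Model mentioned above delivers a cut-off optimized for maximal F-measure. The following is a minimal Python sketch of that idea, not the operators' actual implementation: the function names are hypothetical, labels are assumed to be 0/1, and every distinct confidence is simply tried as a candidate threshold.

```python
def f_measure(labels, confidences, threshold):
    """F1 of the positive class when predicting positive at confidence >= threshold."""
    pairs = list(zip(labels, confidences))
    tp = sum(1 for y, c in pairs if c >= threshold and y == 1)
    fp = sum(1 for y, c in pairs if c >= threshold and y == 0)
    fn = sum(1 for y, c in pairs if c < threshold and y == 1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(labels, confidences):
    """Try each distinct confidence as a cut-off and keep the one with the highest F1."""
    candidates = sorted(set(confidences))
    return max(candidates, key=lambda t: f_measure(labels, confidences, t))


# Illustrative data only (1 = positive class).
labels = [0, 0, 1, 1, 1]
confidences = [0.1, 0.4, 0.35, 0.8, 0.9]
print(best_threshold(labels, confidences))
```

On this toy data the fixed 0.5 cut-off is not the F-measure optimum, which is why an optimized threshold can be useful as a separate output.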

New in RapidMiner Studio 7.5.3 (Jun 30, 2017)

  • Enhancements:
  • Font configuration added to the RapidMiner Studio preferences.
  • Dropbox support updated to API v2.
  • Bug fixes:
  • Added some missing permissions which users with a Large license can unlock for unsigned extensions via the preferences toggle.
  • Fixed error which sometimes occurred when calculating result statistics.
  • Loop Repository now supports (re-)move operations on the entry which it is currently iterating over.
  • Developers:
  • Added NetPermission("specifyStreamHandler") to unsigned extensions.

New in RapidMiner Studio 7.5.1 (May 10, 2017)

  • Enhancements:
  • Fewer unnecessary copies of example sets while running processes.
  • Added missing source description when opening data from the App Objects panel.
  • Bug fixes:
  • Fixed tracking of example set source for certain ExampleSet collections
  • Fixed closing a tab by clicking 'x'
  • Fixed macro support for process root parameters

New in RapidMiner Studio 7.5.0 (May 3, 2017)

  • New features:
  • The first iteration of the new data core, which manages data sets in a much more efficient way, has arrived! This results in both better performance and less memory usage for the vast majority of operators.
  • Added support for Microsoft Azure Blob Storage with Read Azure Blob Storage, Write Azure Blob Storage, and Loop Azure Blob Storage operators. They work exactly like their existing Amazon S3 counterparts.
  • Added support for Amazon Key Management Service (AWS KMS) for all Amazon S3 operators. You can now optionally add an encryption key id to your Amazon S3 connection to decrypt/encrypt files when working with Amazon S3.
  • Added a new mechanism to provide help, advice messages, and even important announcements to the user.
  • Enhancements:
  • Completely revised result graph interaction, presentation, and visualization (e.g. decision trees, clusters, etc.).
  • It is now possible to highlight the path to a node of a decision tree in the Results view.
  • Cluster nodes in the Results view are now scaled according to their relative size.
  • Undo and redo functionality is now much more intuitive when working with the process canvas. It will now not only restore the process state, but also restore canvas location, operator selection, and the zoom level.
  • Navigating up and down through subprocesses in the UI is now more user friendly. When entering a subprocess and later going back up, you will see the same part of the process you were looking at before entering the subprocess.
  • Remove Duplicates now features a new output port called duplicates which returns the examples identified as duplicates.
  • Fixed memory leaks for Handle Exception, Select Subprocess, and Branch.
  • Execute Script now caches the parsed scripts for significantly faster execution, especially inside Loop operators or other highly concurrent environments. General performance of script execution has also been improved. Also added operator tags and added a default example script to make usage of the operator easier. Last but not least, error messages now include the causing stacktrace for easier debugging.
  • Improved AutoMLP performance.
  • Loading context data shows progress now.
  • Added new global process macro: %{process_start} which captures the timestamp when a process was started.
  • It is now possible to close result tabs with the same shortcut as in your web browser: ctrl+w (command+w on OS X)
  • Added new tutorials for RapidMiner Server and RapidMiner Radoop.
  • Added some more usable date and datetime format defaults to choose from when importing data.
  • Added folder buildingblocks in the .RapidMiner directory which will also be searched for .buildingblock files on startup.
  • The dialog letting you know about an available RapidMiner Studio update now also displays the version number of the update.
  • Bug fixes:
  • Fixed a bug making all parallel Loop operators incredibly resource hungry when running hundreds of thousands of iterations
  • Error bubbles indicating the source of an error in the process now work correctly in nested loops again
  • Removed empty confidence columns when applying the model from Linear Discriminant Analysis, Quadratic Discriminant Analysis, Regularized Discriminant Analysis, Single Rule Induction, Subgroup Discovery
  • Regularized Discriminant Analysis no longer ignores the alpha parameter
  • The median for Aggregate now takes the middle point of both middle values in case of an even number of values
  • Fixed error that made operators which use a connection (e.g. Read Salesforce) unusable after importing a process
  • Fixed layout of marketplace search link in operator panel
  • Fixed broken dialog title for package download error
  • Fixed broken configurable entries due to unnecessary escaping
  • Fixed delay when trying to view decision trees in the Results view
  • Fixed major memory leak for Loop, Loop Values, Loop Attributes, and Loop Files
  • Fixed some operator parameter help tooltips being cut off
  • Fixed behaviour of Fast Large Margin if learned with bias (parameter)
  • Fixed pdf/svg image export of the scatter matrix chart
  • Fixed some spelling errors
  • Fixed Linear Regression calculation in case use bias is not selected
  • Fixed confidences of Ada Boost in border cases
  • Logistic Regression and Generalized Linear Model no longer allow p-value calculation without adding intercept
  • Fixed problem when trying to delete extensions of which more than one version was installed
  • Developers:
  • Concurrency API introduced with 7.4.0 is now available for unsigned extensions
  • Notes:
  • Changes to Fast Large Margin might affect behaviour of models learned with prior versions of RapidMiner. If you have an existing Fast Large Margin model which was learned using bias, we suggest you learn the model again with this release to ensure correct predictions.
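
The median fix for Aggregate listed under the bug fixes above (taking the middle point of both middle values for an even number of values) can be illustrated with a short Python sketch. This is a generic illustration of the rule, not RapidMiner's code, and the function name is hypothetical.

```python
def median(values):
    """Median that averages the two middle values when the count is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty sequence is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd count: the single middle value.
        return ordered[mid]
    # Even count: the midpoint of the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([4, 1, 3, 2]))  # middle values are 2 and 3
print(median([3, 1, 2]))
```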

New in RapidMiner Studio 7.4.0 (Feb 14, 2017)

  • New features:
  • Processes can now be executed in the background of Studio while you work on a different process in the user interface. This feature is only available for users with a Large license.
  • New parallelized Loop operator.
  • New parallelized Loop Values operator.
  • New parallelized Loop Attributes operator.
  • New parallelized Loop Files operator.
  • Repository entries can now be sorted by date.
  • Users with Large licenses can now grant additional permissions to unsigned extensions.
  • Enhancements:
  • Added a few new templates which can be used as a starting point when creating a new process.
  • Improved performance of Polynomial Regression.
  • Improved performance of Linear Regression.
  • Improved error message in case a selected input attribute for an operator is of the wrong type.
  • Improved operator progress for Generate Massive Data and several segmentation operators.
  • Improved performance of LibSVM and Fast Large Margin when sparse input data is not in sparse data format.
  • Small performance improvements for several operators that read parameters unnecessarily often.
  • Performance improvement for operators that iterate over all attributes.
  • Optimize by Generation (Evolutionary Aggregation) no longer shows unnecessary popup.
  • Repository entry sorting by name now ignores capitalization.
  • Users with Large licenses can now grant additional permissions to unsigned extensions via a new setting in the Start-up tab in the preferences.
  • The Log table in the results panel now also uses the new UI look and feel.
  • Bug fixes:
  • Fixed useless cipher error when starting Studio for the very first time.
  • Fixed swapped title in models of Linear Discriminant Analysis and Quadratic Discriminant Analysis.
  • Fixed side-effects of application of preprocessing models in other branches of the process.
  • Fixed side-effects of Impute Missing Values in other branches of the process.
  • Fixed wrong behavior when dismissing confirmation dialog asking for interruption of currently running process.
  • Fixed Delete File not being able to handle relative paths.
  • Meta data calculation of Generate Nominal Data can no longer cause freezing.
  • Optimize by Generation (Evolutionary Aggregation) no longer runs one iteration too many.
  • Fixed Number of threads setting having no effect for Decision Tree and Random Forest if it was set to 1 and then increased again.
  • Fixed rare error that could occur when displaying a grouped model in the results view.
  • Developers:
  • Added a temporary API for operators which should run in a parallelized fashion. Use the com.rapidminer.studio.concurrency.internal.ConcurrencyExecutionServiceProvider to access it.
  • Notes:
  • The existing Read SAS operator has been deprecated. There is a new SAS connector extension available on the Marketplace which provides an up-to-date replacement of the operator.
  • Removed the compatibility level 7.1.1 of the operators Normalize, Replace Missing Values, Replace Infinite Values, and Add Noise. These operators will no longer affect other branches of the process, even for processes created with compatibility level 7.1.0 or below.

New in RapidMiner Studio 7.3.1 (Dec 14, 2016)

  • Enhancements:
  • Improved error messages
  • Improved speed of chart calculation for many nominal attributes
  • Improved performance of operator Remove Duplicates
  • Improved support for Salesforce objects with missing values
  • Bug fixes:
  • Fixed model applying and concurrency issue of new Cross Validation
  • Fixed side-effects of Remap Binominals
  • Fixed display of tutorial process descriptions in the operator help
  • Fixed discretization steps in Decision Tree (Multiway) models

New in RapidMiner Studio 7.3.0 (Nov 7, 2016)

  • Enhancements:
  • New parallel Cross Validation operator replaces X-Validation, Batch X-Validation, and X-Prediction.
  • Operator search now also searches for matching Marketplace extensions
  • Greatly improved Proxy UI and logic
  • Logistic Regression, Generalized Linear Model and Gradient Boosted Trees now return Attribute Weights output as well
  • Added reproducible parameter to Logistic Regression, Generalized Linear Model and Gradient Boosted Trees. If checked, the result is guaranteed to be the same, because the parallelization level is fixed.
  • Improved sorting for repository entries.
  • Performance improvement for Rule Induction and Perceptron operators.
  • Improved high DPI support.
  • Improved operator progress for Apply Model and Logistic Regression (SVM).
  • Improved welcome dialog layout.
  • Bug fixes:
  • Fixed NullPointerException in Logistic Regression and Generalized Linear Model with compute p-values on and solver set to AUTO on an input with large number of nominal values
  • Changed the default of the max_w2 parameter of Deep Learning to 10, as the operator help describes; it also became a non-advanced parameter
  • Fixed some minor tutorial inconsistencies
  • If there is a security error, Logistic Regression, Generalized Linear Model, Gradient Boosted Trees and Deep Learning operators can recover without Studio / Server restart
  • Input data rebalancing in Logistic Regression, Generalized Linear Model, Gradient Boosted Trees and Deep Learning no longer depends on the number of cores but the number of threads (configurable)
  • Logistic Regression, Generalized Linear Model, Gradient Boosted Trees and Deep Learning operators are now loaded even if javafx package is missing from the Java Runtime Environment
  • Fixed multiple problems with the GSP operator
  • Operator progress now vanishes if operator is successfully stopped
  • Fixed operator progress animation being stuck sometimes
  • Fixed import excel data UI issues on Mac OS X
  • In-Hadoop scoring of Logistic Regression, Generalized Linear Model, Gradient Boosted Trees and Deep Learning models in RapidMiner Radoop no longer logs a message for each row (leads to significant performance improvement)
  • Development:
  • Added a centralized API for data table creation: From now on a new ExampleSet should be created via an ExampleSetBuilder provided by the ExampleSets class instead of using MemoryExampleTable
  • Tweaked project structure for the open source core. This does not affect the functionality of RapidMiner Studio.
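
The reproducible parameter described in the enhancements above guarantees identical results by fixing the parallelization level. One general reason why the level of parallelism can change numeric results is that floating-point addition is not associative, so splitting the same values into different chunks can give slightly different sums. The Python sketch below shows only this general effect; it is an assumption that it is relevant here, and it says nothing about how the operators are actually implemented. The function name is hypothetical.

```python
def combine(chunks):
    """Sum each chunk on its own, then add the partial sums in order."""
    total = 0.0
    for chunk in chunks:
        partial = 0.0
        for value in chunk:
            partial += value
        total += partial
    return total


# The same three values, split across two "workers" in two different ways.
split_a = combine([[0.1, 0.2], [0.3]])
split_b = combine([[0.1], [0.2, 0.3]])
print(split_a, split_b, split_a == split_b)
```

With a fixed split, the additions always happen in the same order, so the result is the same on every run.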

New in RapidMiner Studio 7.2.3 (Oct 11, 2016)

  • Bug fixes:
  • Read XML wizard UI crash fixed
  • Guess Types no longer fails in case an attribute only consists of missing values
  • Fixed display of Visualize Model by SOM operator results
  • Fixed exception handling for unsafe Exceptions which might be thrown by some operators
  • Fixed some problems with the Password Manager UI

New in RapidMiner Studio 6.5.001 (Sep 21, 2015)

  • Bug fixes:
  • BUGFIX: Fixed an expressions evaluation error that occurred when referencing attributes with an index of 251 or higher.
  • BUGFIX: Fixed automatic license downloading on startup for RapidMiner Studio.
  • BUGFIX: The backspace button now works again to navigate to the parent operator in the process editor.
  • BUGFIX: Average of Performance (Cost) now displays the correct micro performance.
  • BUGFIX: The ID attribute can now be shown in charts.
  • BUGFIX: Unsupported parameters for Optimize Parameters (Evolutionary) can no longer be selected.

New in RapidMiner Studio 6.5.000 (Sep 21, 2015)

  • Expression engine now offers clearer interface, simpler syntax, and significant performance gains
  • Improved pre-flight check and runtime error messages
  • Hive Connector
  • Enhancements:
  • Completely overhauled problem and error notifications when running processes
  • All Learner Models will show an error rather than log a warning when applied on incompatible data
  • Repositories are now sorted by type and name
  • Improved churn template when using custom data
  • Improved performance when navigating RapidMiner Server repositories over a slow connection
  • Execute Process nesting depth is now limited to prevent endless loops; the maximum depth can be tweaked in the preferences
  • Added Netezza 7.0 JDBC support
  • Added a new "Move into new Subprocess" action that allows moving a group of selected operators into a Subprocess operator
  • Standard dialogs now support hyperlinks in the description
  • API: ParameterTypeText is now able to handle template text that is shown in the TextPropertyDialog if no text is set
  • API: Removed SassyReader and kdb dependencies, increased SLF4J API dependency to version 1.7.12
  • Bug fixes:
  • BUGFIX: Fixed possible startup problems when the _JAVA_OPTIONS environment variable is set
  • BUGFIX: Fixed rare cases of Studio becoming unresponsive because dialogs opened behind other dialogs
  • BUGFIX: When opening a process from the Server Processes view, confirmation is now required before an unsaved process is discarded
  • BUGFIX: Fixed rare problem when trying to save preferences
  • BUGFIX: Fixed some copy and paste problems of process notes
  • BUGFIX: Fixed Generate Data performance when selecting gaussian mixture clusters as the target function
  • BUGFIX: Fixed several problems when both Process and XML views were open and visible at the same time
  • BUGFIX: "Sample (Bootstrapping)" now duplicates examples when upsampling data
  • BUGFIX: Averaging of Performance Vectors can now handle additional or fewer classes after the first iteration
  • BUGFIX: Aggregate operator now supports non-alphanumerical attribute names for grouping
  • BUGFIX: Execution order is now up-to-date even if process validation has not finished
  • BUGFIX: Fixed computation of binary classification criteria (performance) for remapped binominal labels
  • BUGFIX: Decision Tree and Random Forest can now handle an unbounded number of different label values
  • BUGFIX: 'Principal Components Analysis', 'Generalized Hebbian Algorithm', 'Independent Component Analysis' or 'Principal Component Analysis (Kernel)' in combination with Apply Model no longer modify the original example set
  • BUGFIX: Decision Tree(rule) model edge labels now correctly display dates instead of Unix timestamps in the Results perspective
  • BUGFIX: Read Access and Write Access now work with 64-bit Java and Java 8
  • BUGFIX: Log operator no longer silently fails if duplicate column names have been entered
  • BUGFIX: Fixed rare case where the Chart view in the Results perspective was broken
  • BUGFIX: Fixed rare case where the date format field vanished in data import dialogs
  • BUGFIX: Context data is no longer loaded when the input port is not connected
  • BUGFIX: Generate Attributes no longer forgets roles in metadata if an attribute is overwritten
  • BUGFIX: Read Excel, Read CSV, and Read XML can now be stopped
  • BUGFIX: Metadata of Execute Process operators is no longer calculated if an endless process loop is suspected
  • BUGFIX: Loop Files operator now shows an error message if the directory is invalid or the user has insufficient privileges
  • BUGFIX: Fixed an error that occurred in Write Database with an empty JNDI name
  • BUGFIX: Fixed problems with reconnecting operators after the 'Replace Operator' action
  • BUGFIX: Fixed displayed number of combinations for integer parameters in Optimize Parameters (Grid)
  • BUGFIX: Fixed jumping to correct subprocess when clicking on the cause of a failed process in the error dialog
  • BUGFIX: Generate Attributes can now be stopped
  • BUGFIX: Fixed a bug that occurred when trying to install a non-existent extension via one-click installation
  • BUGFIX: Fixed reading of XLSX files with cells that contain mixed font formats
  • BUGFIX: Now max 100 attributes are shown in regex dialogs to prevent GUI freezes
  • BUGFIX: Fixed a rare bug that occurred while refreshing a remote repository with a remote database
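
The "Sample (Bootstrapping)" fix above concerns upsampling: when the requested sample is larger than the input, examples have to be duplicated, which is what sampling with replacement does. Below is a minimal Python sketch of sampling with replacement, assuming a plain list of examples. The function name and the seed parameter are hypothetical, and this is not the operator's implementation.

```python
import random


def bootstrap_sample(examples, sample_size, seed=None):
    """Draw sample_size examples with replacement, so duplicates are possible."""
    if not examples:
        raise ValueError("cannot sample from an empty example set")
    rng = random.Random(seed)
    return [rng.choice(examples) for _ in range(sample_size)]


# Upsampling 3 examples to 10 necessarily repeats some of them.
sample = bootstrap_sample([1, 2, 3], 10, seed=42)
print(len(sample), len(set(sample)))
```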

New in RapidMiner Studio 6.4.000 (Sep 21, 2015)

  • A new method of workflow annotation:
  • Collaboration among stakeholders is key for analytics initiatives and projects. With the new workflow annotation capabilities of RapidMiner Studio, you can now annotate RapidMiner processes using stickers on the Process view canvas. These stickers can be freely placed and re-sized anywhere on the canvas, including attached to individual operators.
  • With this tool, you can easily and visually document whole analytic processes, highlighted parts of a process, or individual steps within a process--as you build. These capabilities greatly improve collaboration among users as well as ease and streamline the maintenance and auditing of analytic processes. The new workflow annotation feature replaces the old process and operator commenting functionality. Any existing process or operator comment is automatically converted into workflow annotations when loading a process.
  • New extensions:
  • Improved R integration: RapidMiner Studio 6.4 features an improved integration of the well-adopted statistical programming language R. The integration focuses on providing the core functionality needed when combining RapidMiner with R. Now, you can execute R code from within a RapidMiner process, passing data to R and passing the result of the R code execution back to RapidMiner after executing the R script. The integration has been completely revised, resulting in not only an easier installation and configuration in RapidMiner Studio and RapidMiner Server, but also in a more stable and secure integration solution. The R integration is delivered as a new extension called R Scripting, which supersedes the earlier R Extension.
  • Python integration: Analogous to the R integration, RapidMiner Studio 6.4 introduces integration with the data scientist-friendly Python programming language. You can now easily integrate Python code into your RapidMiner processes. As with R, data can be passed seamlessly from RapidMiner to Python, where it can be manipulated and used for model building or charting; Python results can then be transferred back and made available in RapidMiner.
  • Splunk connector: RapidMiner now provides native connectivity to Splunk, a platform for storing, searching, monitoring, and analyzing machine-generated data. With the RapidMiner Studio 6.4 connector operator, you can now build Splunk data ingestion into RapidMiner processes for deeper analysis.
  • Extension development kit: RapidMiner Studio 6.4 makes it much simpler to develop new extensions. First, we provide an extension template on Github that users can easily clone. Using Gradle as a modern build tool, we then provide scaffolding capabilities to quickly create a new extension stub. Also provided is documentation on how to: extend RapidMiner, implement specific operators, make use of RapidMiner's data structures, and more.
  • One-click extension installation: With RapidMiner Studio 6.4, you can install extensions directly from the RapidMiner Marketplace website with a single click. Each Extension page displays a button that, when clicked, starts up RapidMiner Studio and then the automatic extension installation.
  • New Mac version of RapidMiner Studio:
  • The RapidMiner Studio 6.4 Mac download contains an installer app that significantly eases and accelerates Mac installation. RapidMiner Studio now feels and behaves like a native Mac application.
  • Enhancements:
  • Improved Process history view
  • Connections to RapidMiner Server no longer require equal license editions for Studio and Server. For example, professional-level RapidMiner Studio can now connect to Enterprise-level RapidMiner Server.
  • Improved visual feedback for port and connection interactions in the Process view
  • Drastically improved Process view performance
  • Cleaned up right-click context menu in the Process view
  • RapidMiner Server connections are now editable in RapidMiner Studio
  • Breakpoints in subprocesses are now indicated in the top right corner of the Process view
  • Dragging multiple repository entries into a process is now possible
  • Updated keyboard shortcuts and mouse handling improve the Mac user experience
  • Ctrl + Backspace is now available for text inputs and deletes an entire word instead of a single character
  • On opening, problem display only occurs if a critical problem was detected
  • In Select Attribute operators, numeric conditions now ignore blank spaces
  • Improved error message shown when class weights are specified for classes that do not exist
  • Added display of release platform to the About screen
  • Unmanaged extensions are now also loaded from ~/.RapidMiner/extensions if not specified otherwise in Preferences
  • All sample processes have been updated and improved to be compatible with the current version
  • Added new sampling type of automatic to the X-Validation operator
  • Operator search only expands groups with hits inside
  • Operator search is case sensitive when search term starts with an upper case letter
  • API: Added draw decorator and event hooks for the Process view. See ProcessRendererView#addDrawDecorator() and ProcessRendererView#addEventDecorator().
  • Bug fixes:
  • BUGFIX: Safemode dialog on startup is no longer sometimes hidden behind other windows
  • BUGFIX: Update Database now closes database connections after finishing
  • BUGFIX: Restarting after activating a license with more memory now correctly increases available memory on Windows
  • BUGFIX: A more meaningful error message is displayed when an invalid numeric condition is entered as a parameter
  • BUGFIX: Adding new database drivers via the Manage Database Drivers dialog no longer requires a restart
  • BUGFIX: Fixed rare error that could prevent the Manage Database Connections dialog from opening
  • BUGFIX: Fixed broken parameter help content for some operator parameters
  • BUGFIX: Calculation of a SOM-plot can now be cancelled
  • BUGFIX: It is no longer possible to drag operators out of the Process view
  • BUGFIX: Fixed rare error that could occur during automatic operator port connection
  • BUGFIX: Scrolling speed in the Process view is increased
  • BUGFIX: Fixed duplicate entry error in the History view
  • BUGFIX: Fixed Guess Types operator which occasionally took only the last numerical value into account
  • BUGFIX: A more meaningful error message is displayed when using Add generated primary keys for writing to MSSQL databases
  • BUGFIX: Fixed broken Execute Process operator help
  • BUGFIX: Disabled zoom functionality in Histogram Charts
  • BUGFIX: A more meaningful error message is displayed when using the Hyper Hyper operator with invalid input
  • BUGFIX: Principal Component Analysis operator works when applied on special attributes with missing values
  • BUGFIX: Fixed Read Excel operator encoding errors on Windows 8.1
  • BUGFIX: In Excel import wizard, wrong-typed values are parsed as missing instead of causing an error
  • BUGFIX: Removed unused parameter attribute type from Discretize by User Specification operator
  • BUGFIX: Fixed some broken templates and sample processes
  • BUGFIX: Clustering models now work with special attributes that contain missing values
  • BUGFIX: K-Medoids operator now always uses the selected measure type
  • BUGFIX: Fixed rare cases of broken standard coefficients for Linear Regression operator
  • BUGFIX: Right-clicking an operator now selects it before opening the popup menu (Linux/Mac)
  • BUGFIX: When installing extensions from Marketplace, dependencies are only added if not yet installed
  • BUGFIX: Marketplace dialogs now always open in the correct order
  • BUGFIX: The date functions of Generate Attributes operator now add correct metadata for new attributes
  • BUGFIX: Operator text parameter dialogs (e.g., the SQL query dialog) can now be closed by pressing Ctrl + Enter
  • BUGFIX: The log level of the Log view is now correctly restored on each start
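
The operator search rule listed in the enhancements above (case sensitive when the search term starts with an upper case letter) can be sketched in Python as follows. Substring matching is an assumption made for illustration, the function name is hypothetical, and "Spread Data" is a made-up operator name used only to show the difference.

```python
def matches(search_term, operator_name):
    """Case-sensitive match if the term starts with an upper case letter, else case-insensitive."""
    if search_term[:1].isupper():
        return search_term in operator_name
    return search_term.lower() in operator_name.lower()


print(matches("read", "Read CSV"))     # lower case term: case is ignored
print(matches("Read", "Read CSV"))     # upper case term: exact case required
print(matches("Read", "Spread Data"))  # "Read" does not occur with this capitalization
```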

New in RapidMiner Studio 6.3.000 (Sep 21, 2015)

  • Improved Startup and Onboarding:
  • RapidMiner Studio 6.3 greatly improves the first-time startup and onboarding experience. Manual installation of a license key has become obsolete. Now, simply log on to RapidMiner.com and either a trial license or any commercial license associated with your user account is automatically installed. After license installation, a new onboarding dialog recommends next steps, helping you start using RapidMiner quickly and effectively.
  • Wisdom of Crowds: New and Improved Recommenders. With the Wisdom of Crowds features, RapidMiner users can get help designing and implementing analytical workflows and building predictive models. These features offer next-step recommendations based on the knowledge and best practices of other RapidMiner users. RapidMiner Studio 6.3 provides the following enhancements:
  • Context-aware operator recommender: The operator recommender, first introduced in RapidMiner 6.1, helps you design by recommending operators to add to your process. Initially the feature recommended operators based on the complete process; now, the recommender evaluates the current subprocess selection for more granular assistance. For example, recommendations differ significantly when you are looking at the top-level process and when you have drilled into a subprocess (e.g., an X-Validation operator). By considering the context, the operator recommender provides much more focused and accurate recommendations.
  • Parameter recommender: The new parameter recommender helps you configure operators and set the parameters of a selected operator. The tool not only provides recommendations on which parameters to change, it also suggests appropriate values to select for those parameters.
  • Improved Excel Import:
  • RapidMiner Studio 6.3 dramatically improves one of the most widely used RapidMiner features — the import of Excel files. Previously, due to suboptimal parsing of XML-based Excel files (Excel 2007 and above), an Excel import caused excessive memory consumption, and reading large files took quite some time. RapidMiner Studio 6.3 reduces memory consumption overhead (by up to 30x in some test cases) and speeds file reading (up to 5x faster) by moving away from the formerly used library and implementing the necessary functionality within RapidMiner itself.
  • Version Control:
  • A new version management feature allows you to start new revisions of a process while keeping them in parallel with older versions. Available with RapidMiner Server 2.3 running with RapidMiner Studio 6.3, processes can now be rolled back and forth between old and new revisions.
  • Other Changes and Bugfixes:
  • Progress dialog no longer opens when saving the process to a remote location
  • The file chooser dialog for 'Read Excel' now defaults to .xlsx and .xls files
  • 'Write Excel' format is now XLSX instead of XLS
  • The operator 'Execute Process' now shows a button to open the selected process in the parameter view
  • Parameter help is shown in a tool tip window when hovering over the information symbol
  • Histogram Charts now use date instead of numerical axis in case more than one date attribute is selected
  • Added Netezza JDBC support
  • The Application Wizard is now called Accelerator
  • BUGFIX: The 'Read Salesforce' operator can now handle relationship queries
  • BUGFIX: Operator recommendations now always appear when creating a new process or switching to the Design perspective
  • BUGFIX: Fixed process recovery encoding problem on Windows which could break umlauts and other symbols
  • BUGFIX: Fixed row deletion error in 'Edit Parameter List' dialog
  • BUGFIX: The recent analysis list in Home perspective no longer extends below the visible area of the monitor
  • BUGFIX: Naive Bayes now handles dates correctly
  • BUGFIX: SVM models can now only be applied on ExampleSets with the same attributes
  • BUGFIX: Stratified sampling with a defined local random seed now produces the same output on every system
  • BUGFIX: The Surface 3D chart now limits the number of data points (to ensure good performance)
  • BUGFIX: The chart for distribution model attributes limits the number of nominal values (to ensure good performance)
  • BUGFIX: Fixed a validation error that occurred when choosing an inverted set of attributes
  • BUGFIX: Fixed a validation error that occurred when 'Execute Process' referenced the operator's process
  • BUGFIX: Fixed local repository not being created in some special cases when starting for the first time
  • BUGFIX: Tooltips now work with modal dialogs after being focused via F3
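
The stratified sampling fix above is about reproducibility: with a fixed local random seed, the sample should not depend on the system. The Python sketch below shows one way to get a seed-determined stratified sample; it is not RapidMiner's algorithm, and the function and parameter names are hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(examples, label_of, ratio, seed):
    """Sample the same ratio from every label group using a dedicated, seeded generator."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for example in examples:
        groups[label_of(example)].append(example)
    sample = []
    # Visit labels in sorted order so the draws do not depend on grouping order.
    for label in sorted(groups):
        members = groups[label]
        sample.extend(rng.sample(members, round(len(members) * ratio)))
    return sample


data = [("a", i) for i in range(8)] + [("b", i) for i in range(4)]
first = stratified_sample(data, lambda e: e[0], 0.5, seed=1992)
second = stratified_sample(data, lambda e: e[0], 0.5, seed=1992)
print(first == second)
```

Using a private generator per call, rather than shared global random state, keeps the result independent of whatever else has consumed random numbers before.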

New in RapidMiner Studio 6.2.000 (Sep 21, 2015)

  • Added operators 'Publish to App' and 'Recall from App' and a new view 'App Objects' for RapidMiner Server App manipulations
  • Resizing the attribute name column in the Statistics view of process results is now possible
  • New processes can now be saved via save button or ctrl+s
  • Improved error messages for broken custom filters in the 'Filter Examples' operator
  • Improved error message when selecting special attributes in an operator despite special attributes not being included
  • Show Git revision of RapidMiner Studio release in About window
  • Improved speed and behavior of 'Decision Tree' and 'Random Forest' operators
  • BUGFIX: Fixed problems with single parameter selection for several Java implementations
  • BUGFIX: Fixed opening of stored results via the result history
  • BUGFIX: Operator port tooltips should no longer cover the port
  • BUGFIX: Charts should now display 'Missing' instead of '1.1.1970' for missing values in date attributes
  • BUGFIX: 'Update Database' should throw a more reasonable error message in case the database user lacks permission
  • BUGFIX: 'Neural Net' operator works again when applied on special attributes with missing values
  • BUGFIX: 'Neural Net' can no longer be applied on incompatible data
  • BUGFIX: The expression parser function round() now returns a missing value instead of 0 when applied on a missing value
  • BUGFIX: 'Sample (Bootstrapping)' operator now throws a reasonable error message in case the input example set is empty
  • BUGFIX: Moving colors in the color scheme dialog of Advanced Charts does not save duplicates anymore
  • BUGFIX: Fixed a bug which occurred when an optional password field was left empty
  • BUGFIX: Fixed overwriting an already existing file in Import Binary File Wizard
  • BUGFIX: Fixed a UI problem that occurred when a Collection with empty ExampleSets was displayed
  • BUGFIX: Fixed operator tree display in log view which is shown in case of a process error
  • API: Introduced AbstractConfigurator which deprecates the Configurator class. The AbstractConfigurator improves parameter dependency handling for Configurables
  • API: removed Encog dependency and all deprecated classes that used Encog
  • API: Added capability to allow parallel processing inside operators
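
The round() fix above means a missing input stays missing instead of turning into 0. A minimal sketch of that behavior in plain Python follows. It is not the expression parser's implementation: the helper name `round_or_missing` and the use of NaN to stand for a missing value are assumptions.

```python
import math


def round_or_missing(value, digits=0):
    """Round a number, but propagate a missing value (modelled here as NaN)."""
    if value is None or math.isnan(value):
        return math.nan  # missing in, missing out (never 0)
    return round(value, digits)


print(round_or_missing(3.7))        # an ordinary value is rounded
print(round_or_missing(math.nan))   # a missing value stays missing
```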

New in RapidMiner Studio 6.1.000 (Sep 21, 2015)

  • Overhauled Repositories view: Now multiple elements can be selected, copied, moved and deleted at the same time
  • Completely revised preferences dialog to make customization of RapidMiner Studio more accessible
  • Drastically sped up Log view for larger logs
  • Improved startup code to reduce launch problems. Memory settings are now based on the actual free memory at startup for Win32 versions. Also added a property in the 'System' tab of the preferences where the maximum amount of memory for RapidMiner Studio can be configured
  • Improved SQL editor dialog responsiveness
  • It is now possible to ignore meta data for the 'Filter Examples' GUI
  • 'Weight by' operators: The default value of the parameter normalize weights is now false
  • BUGFIX: Results containing missing values are sorted correctly
  • BUGFIX: Update Database now throws a meaningful error when the input example set contains no attributes
  • BUGFIX: Improved error message when applying a PCA model to incompatible data
  • BUGFIX: More meaningful error message when a mandatory attribute is not selected
  • BUGFIX: Loop/Optimize parameters are no longer dismissed if selection changes
  • BUGFIX: Distribution Models can no longer be applied on subsets of the training set or on sets with the same name but a different type
  • BUGFIX: Log Operator now uses modern UI to show the result
  • BUGFIX: Fixed Linear Regression matrix calculation corner cases which could lead to missing values for standard error, t-stat, and p-value
  • BUGFIX: Fixed an issue that caused Top Down Clustering to fail
  • BUGFIX: Replace (Dictionary) maps each value only once
  • BUGFIX: Fixed an issue that sometimes caused data in the results perspective to be shown with a null source
  • API: Added support for parameter dependencies in the Configurable framework (see Configurator#getParameterHandler())
  • API: Added operator parameter type which can display a file chooser for arbitrary remote file systems (see ParameterTypeRemoteFile)
  • API: Added greater control over preferences internationalization and layout (see SettingsDialog)
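
One way to read the Replace (Dictionary) fix above is that a value already produced by one dictionary entry is not fed into another entry. The sketch below contrasts the two behaviors in plain Python. This reading is an assumption, the function names are hypothetical, and the real operator may work differently (for example on substrings).

```python
def replace_once(values, dictionary):
    """Look every value up exactly once; results are never re-mapped."""
    return [dictionary.get(value, value) for value in values]


def replace_chained(values, dictionary):
    """Apply the entries one after another, so earlier results can be re-mapped."""
    result = list(values)
    for source, target in dictionary.items():
        result = [target if value == source else value for value in result]
    return result


mapping = {"a": "b", "b": "c"}
print(replace_once(["a", "b", "x"], mapping))     # ['b', 'c', 'x']
print(replace_chained(["a", "b", "x"], mapping))  # ['c', 'c', 'x']
```

With the chained variant, "a" ends up as "c" because the intermediate "b" is mapped a second time. The single-pass variant avoids that.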

New in RapidMiner Studio 6.0.008 (Sep 21, 2015)

  • The performance of the Read XML operator has been increased dramatically, especially for large files
  • OK buttons in dialogs can now be pressed via keyboard (ALT+O)
  • BUGFIX: K-Means no longer erroneously requires a label
  • BUGFIX: Clicking on the cause in a 'Process Failed' dialog now switches to the design perspective in addition to selecting the operator
  • BUGFIX: The Split Operator now displays an Error when an attribute does not exist
  • BUGFIX: Tooltip windows are now displayed on the same screen as RapidMiner Studio
  • BUGFIX: Fixed rare histogram display error in the Result Statistics
  • BUGFIX: Fixed a bug where custom Perspectives were still visible after their removal
  • BUGFIX: Import Wizards don't freeze anymore when certain encodings are being selected
  • BUGFIX: Reorder Attributes no longer shows a warning for undefined "attribute ordering" when it is not needed
  • BUGFIX: Generate Massive Data now returns correct meta data
  • BUGFIX: Fixed error occurring while opening the SQL query editor when no connection is selected
  • BUGFIX: Dates are now correctly displayed on x-axis of Histogram Charts
  • BUGFIX: Entering a license directly after the last one expired now works again

New in RapidMiner Studio 6.0.007 (Sep 21, 2015)

  • BUGFIX: Adds missing compatibility level for Aggregate operator

New in RapidMiner Studio 6.0.006 (Sep 21, 2015)

  • Improved copy and paste functionality of the process editor
  • Added new logging mechanism which can also be used by extensions to display their own logs in the default log view
  • Added parameter to Parse Numbers operator to show an error message or use missing values if a value can't be parsed
  • On lower screen resolutions smaller plot preview icons will be used
  • Aggregate operator throws an error when the example set does not contain attributes selected by the parameter "group by"
  • Improved the ability to stop the process while executing a Join operator
  • Ports of disabled operators are now highlighted to indicate that interaction is possible
  • Loop/Optimize Parameters GUI now automatically selects newly added parameter
  • Refreshing a repository folder is now possible regardless of whether a folder or a data entry is selected
  • New chart type added: Web
  • Improved tooltip behavior
  • Improved resizing of subprocesses
  • Added parameter to Loop/Optimize Parameters which specifies how errors occurring in the inner process should be handled
  • Switching perspectives now remembers focused tabs and the position of all scroll bars
  • BUGFIX: Fixed problem with prepared statements in Read Database and Execute SQL operators
  • BUGFIX: Data readers will no longer automatically choose binominal as the value type to avoid import failures
  • BUGFIX: Saving a process can no longer freeze the user interface
  • BUGFIX: Storing/Reading models in XML representation works again when executing the process on RapidMiner Server
  • BUGFIX: Pasting process xml into the process view directly no longer messes up the layout and the connections
  • BUGFIX: Execute Process: Number of ports shown by operator matches ports used by embedded process.
  • BUGFIX: Weighting Operators which require a label attribute now throw an error if no label is present
  • BUGFIX: Superset and Union operators now fail with a better error message if the special attributes do not match
  • BUGFIX: macro() can now be used in the expression condition at Branch
  • BUGFIX: Loop Repository: using the parent folder name as filtered string does not throw an error anymore
  • BUGFIX: The Cumulative Variance plot for the PCA now displays the correct values
  • BUGFIX: Excel Operators show a human readable Error if wrong sheet is selected
  • BUGFIX: Aggregate now detects DATE_TIME in MetaData
  • BUGFIX: Predefined operator macros are working again
  • BUGFIX: Data import operators of extensions are no longer sometimes displayed as disabled for some licenses
  • BUGFIX: Use correct file filter for Loop Zip-File Entries file chooser
  • BUGFIX: Read and Update Database operators can now be stopped
  • BUGFIX: Generate Macro will no longer add unnecessary zeros to the end of numbers
  • BUGFIX: Reduced logging at Generate Function Set if NaN was generated
  • BUGFIX: Operators which provide a subset selection now show an error if selected attributes are not present
  • BUGFIX: Correct display of operator status when starting a process
  • BUGFIX: Catch errors when trying to parse empty strings to numbers
  • BUGFIX: Remember/Recall operators now use a more sensible default for the io object type
  • BUGFIX: Fixed endless loop in Logistic Regression
  • BUGFIX: Generate Data can now be stopped
  • BUGFIX: Import wizards now ignore the check for duplicate names regarding columns that are disabled
  • BUGFIX: Linear, Quadratic and Regularized Discriminant Analysis can now be stopped
  • BUGFIX: K-Means, Linear Regression and SVM now ignore missing values in special attributes, except for the label
  • BUGFIX: Generate Nominal Data operator can now be stopped
  • BUGFIX: The arrange operators function no longer adds horizontal space between operators unnecessarily
  • BUGFIX: Fixed Filter Examples operator failing on date filters for dates before 1970
  • BUGFIX: The Split operator correctly outputs missing values if the input value was missing
  • BUGFIX: The Replace (Dictionary) operator now displays a meaningful error message if the to or from parameters are left undefined
  • BUGFIX: The displayed error, when using an invalid expression in the Branch operator, now contains a link to the operator
  • BUGFIX: Fixed a rare error while loading extensions on startup
  • BUGFIX: RapidMiner remembers all tabs that are visible and keeps them focused between perspective switches
  • BUGFIX: Tooltips in New Operator Dialog are now correctly formatted
  • BUGFIX: The Loop Repository operator now shows an error when the selected repository location does not exist
  • BUGFIX: Building Block Numerical X-Validation now defaults to shuffled sampling
  • BUGFIX: Improved error handling when pasting an unsupported file into the process editor
  • BUGFIX: More meaningful error message when a wrong attribute is selected in some operators

New in RapidMiner Studio 6.0.005 (Sep 21, 2015)

  • Improved copy and paste functionality of the process editor
  • Added new logging mechanism which can also be used by extensions to display their own logs in the default log view
  • Added parameter to Parse Numbers operator to show an error message or use missing values if a value can't be parsed
  • On lower screen resolutions smaller plot preview icons will be used
  • Aggregate operator throws an error when the example set does not contain attributes selected by the parameter "group by"
  • Improved the ability to stop the process while executing a Join operator
  • Ports of disabled operators are now highlighted to indicate that interaction is possible
  • Loop/Optimize Parameters GUI now automatically selects newly added parameter
  • Refreshing a repository folder is now possible regardless of whether a folder or a data entry is selected
  • New chart type added: Web
  • Improved tooltip behavior
  • Improved resizing of subprocesses
  • Added parameter to Loop/Optimize Parameters which specifies how errors occurring in the inner process should be handled
  • Switching perspectives now remembers focused tabs and the position of all scroll bars
  • BUGFIX: Data readers will no longer automatically choose binominal as the value type to avoid import failures
  • BUGFIX: Saving a process can no longer freeze the user interface
  • BUGFIX: Storing/Reading models in XML representation works again when executing the process on RapidMiner Server
  • BUGFIX: Pasting process xml into the process view directly no longer messes up the layout and the connections
  • BUGFIX: Execute Process: Number of ports shown by operator matches ports used by embedded process.
  • BUGFIX: Weighting Operators which require a label attribute now throw an error if no label is present
  • BUGFIX: Superset and Union operators now fail with a better error message if the special attributes do not match
  • BUGFIX: macro() can now be used in the expression condition at Branch
  • BUGFIX: Loop Repository: using the parent folder name as filtered string does not throw an error anymore
  • BUGFIX: The Cumulative Variance plot for the PCA now displays the correct values
  • BUGFIX: Excel Operators show a human readable Error if wrong sheet is selected
  • BUGFIX: Aggregate now detects DATE_TIME in MetaData
  • BUGFIX: Predefined operator macros are working again
  • BUGFIX: Data import operators of extensions are no longer sometimes displayed as disabled for some licenses
  • BUGFIX: Use correct file filter for Loop Zip-File Entries file chooser
  • BUGFIX: Read and Update Database operators can now be stopped
  • BUGFIX: Generate Macro will no longer add unnecessary zeros to the end of numbers
  • BUGFIX: Reduced logging at Generate Function Set if NaN was generated
  • BUGFIX: Operators which provide a subset selection now show an error if selected attributes are not present
  • BUGFIX: Correct display of operator status when starting a process
  • BUGFIX: Catch errors when trying to parse empty strings to numbers
  • BUGFIX: Remember/Recall operators now use a more sensible default for the io object type
  • BUGFIX: Fixed endless loop in Logistic Regression
  • BUGFIX: Generate Data can now be stopped
  • BUGFIX: Import wizards now ignore the check for duplicate names regarding columns that are disabled
  • BUGFIX: Linear, Quadratic and Regularized Discriminant Analysis can now be stopped
  • BUGFIX: K-Means, Linear Regression and SVM now ignore missing values in special attributes, except for the label
  • BUGFIX: Generate Nominal Data operator can now be stopped
  • BUGFIX: The arrange operators function no longer adds horizontal space between operators unnecessarily
  • BUGFIX: Fixed Filter Examples operator failing on date filters for dates before 1970
  • BUGFIX: The Split operator correctly outputs missing values if the input value was missing
  • BUGFIX: The Replace (Dictionary) operator now displays a meaningful error message if the to or from parameters are left undefined
  • BUGFIX: The displayed error, when using an invalid expression in the Branch operator, now contains a link to the operator
  • BUGFIX: Fixed a rare error while loading extensions on startup
  • BUGFIX: RapidMiner remembers all tabs that are visible and keeps them focused between perspective switches
  • BUGFIX: Tooltips in New Operator Dialog are now correctly formatted
  • BUGFIX: The Loop Repository operator now shows an error when the selected repository location does not exist
  • BUGFIX: Building Block Numerical X-Validation now defaults to shuffled sampling
  • BUGFIX: Improved error handling when pasting an unsupported file into the process editor
  • BUGFIX: More meaningful error message when a wrong attribute is selected in some operators

New in RapidMiner Studio 6.0.003 (Sep 21, 2015)

  • Added new dialog to create and manage various connections
  • Tasks (shown in the lower right corner) should no longer unintentionally block each other
  • Process result display creation should be much faster now
  • Added attribute statistics when hovering over a table header in the example set result view.
  • New order for special attributes in data and meta data result view
  • Execute SQL dialog now has syntax highlight and content assist (ctrl+space)
  • Extensions can now declare more than one dependency
  • Added 'unmatched example set' output port to Filter Examples operator which outputs all examples that did not match the specified condition
  • Added parameter to De-Normalize operator to control handling of missing attributes
  • Added parameter to Execute Process which controls whether the process should fail if you define a macro that is not defined in the context of the embedded process
  • Added GUI parameter rapidminer.gui.plotter.default.maximum which defines the maximum size of an example set for which a default plot will be created
  • BUGFIX: Vote operator should be functional again
  • BUGFIX: Excel 2007 import no longer fails when the sheet contains nominal formula values
  • BUGFIX: Custom filters for the Filter Examples operator should no longer crash when selecting the 'matches' filter on empty input
  • BUGFIX: FindThreshold operator now throws error if the confidence role has the wrong name or does not exist
  • BUGFIX: Fixed bug preventing storage of Lift charts in the repository
  • BUGFIX: Fixed bug in expression parser which did not remove faulty expressions, leading to errors in later runs
  • BUGFIX: Fixed bug that prevented the usage of global process-related macros
  • BUGFIX: Loop Repositories operator can now be stopped
  • BUGFIX: Fixed recent processes being sometimes cut off in the Welcome perspective
  • BUGFIX: Fixed wrong default file extension for directory and file parameters
  • BUGFIX: Fixed rearranging of operators in subprocesses
  • BUGFIX: Fixed bug when creating charts for an empty example set
  • BUGFIX: Optimize Parameters Operator now interrupts with an understandable explanation when no performance values were delivered
  • BUGFIX: Fixed error with password fields when the password is less than 4 characters long
  • BUGFIX: Vector Linear Regression now checks for missing values
  • BUGFIX: Fixed scrolling when moving operators outside of visible area
  • BUGFIX: Support Vector Machine(LibSVM) can now be stopped
  • BUGFIX: Fixed result of Join operator with only missing values in nominal ID attribute
  • BUGFIX: Decision Tree operators no longer fail with a cryptic error message when the label attribute contains missing values
  • BUGFIX: Generate Macro no longer proceeds if an error occurred during macro generation
  • BUGFIX: Using undefined macros as operator parameters now causes an error when executing the process
  • BUGFIX: Applying a k-NN model can now be stopped
  • BUGFIX: Logistic Regression (Evolutionary) can now be stopped
  • BUGFIX: NominalToNumerical can now be stopped
  • BUGFIX: Optimize Parameters (Evolutionary) can now be stopped
  • BUGFIX: Polynomial Regression can now be stopped
  • BUGFIX: Remove Duplicates operator can now be stopped
  • BUGFIX: Self-Organizing Map operator can now be stopped
  • BUGFIX: Support Vector Machine (Evolutionary) can now be stopped
  • BUGFIX: In most cases programs executed with Execute Program operator can now be stopped properly
  • BUGFIX: The chart selection menu in the results perspective should no longer appear in strange locations
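
The 'unmatched example set' output port added to Filter Examples in this release returns the examples that fail the condition alongside those that pass. A minimal sketch of that two-way split in plain Python follows. The function name, the dictionary-based examples, and the condition are assumptions for illustration, not the operator's API.

```python
def filter_examples(examples, condition):
    """Split examples into those matching the condition and those that do not."""
    matched, unmatched = [], []
    for example in examples:
        (matched if condition(example) else unmatched).append(example)
    return matched, unmatched


# Hypothetical example set with a single numeric attribute.
examples = [{"age": 17}, {"age": 42}, {"age": 65}, {"age": 12}]
adults, minors = filter_examples(examples, lambda e: e["age"] >= 18)
print(adults)
print(minors)
```

Every input example lands in exactly one of the two outputs, so nothing is dropped by the filter.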

New in RapidMiner Studio 5.2.008 (Jul 10, 2012)

  • Send Mail operator has a new behavior: Stop process and show error if mail cannot be sent
  • Send Mail operator has a new parameter: 'Ignore errors'
  • Write CSV operator has a new parameter: 'Append to file'
  • Write Excel operator has a new parameter: File Format (xls, xlsx)
  • New Operator: Reorder Attributes
  • Set Macro: can now define empty macros
  • BUGFIX: Keep old settings after updating RapidMiner
  • BUGFIX: 'Cancel' ParameterTypeList Dialog works correctly now
  • BUGFIX: FP-Growth correctly handles parameter 'must contain'
  • BUGFIX: Join Operator displays MetaData correctly

New in RapidMiner Studio 5.2.003 (Mar 27, 2012)

  • Remove Correlated Attributes uses deterministic random numbers
  • Improved Repository Tree handling (save expansion state on refresh and improved tree selection on entry removal)
  • Improved exporting of Advanced Charts View

New in RapidMiner Studio 5.2.002 (Mar 7, 2012)

  • Remove Correlated Attributes uses deterministic random numbers
  • Improved Repository Tree handling (save expansion state on refresh and improved tree selection on entry removal)
  • Improved exporting of Advanced Charts View

New in RapidMiner Studio 5.2.001 (Feb 24, 2012)

  • Added operators to manage repository entries: Copy, Move, Delete, Rename

New in RapidMiner Studio 5.2.000 (Feb 2, 2012)

  • Added "File" objects to pass to reader operators.
  • Added operators to open files and URL connections
  • Added operators to iterate ZIP files
  • Superset and Union operator can handle special attributes
  • Catch block subprocess for Handle Exception operator
  • Database connections can define driver properties
  • XML import
  • Join operator can operate on multiple columns
  • Easier bug reporting: Direct connection to Bugzilla
  • Added new Operators:
  • Denormalization Operator
  • Remove Unused Values Operator
  • Loop Repository
  • Open File, Write File, Loop Zip-File Entries
  • Read Excel with Format
  • Aggregation Operator now supports default Aggregation for a set of attributes and is implemented more efficiently

New in RapidMiner Studio 5.1.016 (Jan 5, 2012)

  • Added "File" objects to pass to reader operators.
  • Added operators to open files and URL connections
  • Added operators to iterate ZIP files
  • Superset and Union operator can handle special attributes
  • Catch block subprocess for Handle Exception operator
  • Database connections can define driver properties
  • XML import
  • Join operator can operate on multiple columns
  • Easier bug reporting: Direct connection to Bugzilla
  • Added new Operators:
  • Denormalization Operator
  • Remove Unused Values Operator
  • Loop Repository
  • Open File, Write File, Loop Zip-File Entries
  • Read Excel with Format
  • Aggregation Operator now supports default Aggregation for a set of attributes and is implemented more efficiently

New in RapidMiner Studio 5.1.015 (Dec 21, 2011)

  • Added "File" objects to pass to reader operators.
  • Added operators to open files and URL connections
  • Added operators to iterate ZIP files
  • Superset and Union operator can handle special attributes
  • Catch block for Handle Exception operator
  • Database connections can define driver properties
  • XML import
  • Join operator can operate on multiple columns
  • Easier bug reporting: Direct connection to Bugzilla
  • Added new Operators:
  • Denormalization Operator
  • Remove Unused Values Operator
  • Aggregation Operator now supports default Aggregation for a set of attributes and is implemented more efficiently

New in RapidMiner Studio 5.1.006 (Mar 31, 2011)

  • Easier bug reporting: Direct connection to Bugzilla
  • Added new Operators:
  • Denormalization Operator
  • Remove Unused Values Operator
  • Aggregation Operator now supports default Aggregation for a set of attributes

New in RapidMiner Studio 5.1.000 (Dec 16, 2010)

  • Added RapidAnalytics connectivity
  • Added new repository type that reflects database connections
  • Added type-specific icons to repository tree
  • Added annotations to IOObjects
  • Import operators and wizards remake
  • Most wanted feature: "Rename" and "Set Role" can handle multiple attributes at a time
  • Versioned operators allow easier updates
  • "Generate Attributes" has new UI and supports more text and date functions
  • Operator documentation uses Wiki (http://rapid-i.com/wiki/).
  • IOObjects can be annotated, e.g. with file source or SQL statement
  • Added new Operators:
  • Print to Console
  • Unset Macro
  • "Auto MLP" and "k-Means (fast)" contributed by DFKI
  • Hierarchical Classification
  • Numerical to Date
  • Delay
  • Database operators can prepare statements
  • Revised import wizards
  • Background tasks stoppable
  • Added process profiling and resource consumption annotations
  • Added Support for R Extension
  • Added new boolean GUI property rapidminer.gui.fetch_data_base_table_names which suppresses fetching of database table names in the SQLQueryBuilder
  • More efficient meta data handling for Excel, CSV, and database readers
  • Meta data propagation uses context macros
  • Splash screen shows plugins
  • Aggregate operator can compute product
  • Various smaller fixes
  • Various UI improvements
  • Major Bugfixes:
  • Fixed memory leak causing RapidMiner to run out of memory when processing many large example sets
  • Re-added descriptive error messages

New in RapidMiner Studio 5.0.010 (Aug 9, 2010)

  • Added annotations to IOObjects
  • Versioned operators allow easier updates
  • Added new Operators:
  • Print to Console
  • Unset Macro
  • Revised import wizards
  • Added Process Profiling
  • Added Support for R Extension
  • Major Bugfixes:
  • Fixed memory leak causing RapidMiner to run out of memory when processing many large example sets
  • Re-added descriptive error messages

New in RapidMiner Studio 4.5 (Jul 21, 2009)

  • Implementation Details:
  • New properties for additional ioobjects.xml
  • Bugfixes:
  • Fixed bug for reporting images smaller than 800 x 600
  • Fixed class loader problem occurring when more than one plugin was used
  • Fixed bug for iterative operator chain
  • Fixed XML export bug where XML in parameters was not properly escaped

New in RapidMiner Studio 4.4 (Mar 16, 2009)

  • New operators:
  • ExampleSetSuperset
  • ExampleSetUnion
  • MacroConstruction
  • CumulateSeries
  • FastLargeMargin
  • Split
  • Construction2Names
  • NeuralNetSimple
  • Parameters will now be adapted according to an operator rename, for example the settings of operators like the ProcessLog or the parameter optimization operators are automatically corrected to the new operator names
  • Graphs like the similarity graph display the strengths of the edges now by their color
  • Added new tree layout algorithm for the decision trees preventing most overlapping, the old tighter version is available as layout type "Tree (Tight)"
  • Decision trees now show the subtree size as tool tip for the inner nodes, the edges are now darker for larger subtrees and brighter for smaller ones
  • Decision trees are learned faster now due to internal optimizations in the split example set handling
  • Tables like the (meta) data view now support a new context menu for common table operations like column sorting or row / column selection
  • The "New Operator" dialog now also supports full text search in the description texts of the operators
  • RapidMiner now stores all parameter values in the process files including the default values which ensures a better compatibility with future versions. The XML tab, however, only shows the values differing from the default
  • Plugins can now define a class com.rapidminer.PluginInit providing a method "initPlugin()" which will be invoked during plugin initialization
  • Univariate and multivariate series windowing operators now also support nominal attributes and even mixed types in cases where the series is represented by the examples (rows) of the data set
  • The range statistics of nominal attributes in the meta data view now show the values with the highest and lowest occurrence counts, sort the values according to the counts, and display only an excerpt of the occurring values if large amounts of different values exist
  • List of recent files is now directly saved after opening a new process and not only during shutdown
  • Changes in the process setup are now allowed even during process runtime, e.g. when waiting at a breakpoint
  • NaiveBayes can now handle new nominal values during the model application phase
  • Deprecated operators are now rendered with a gray color in the new operator tab and dialog
  • Updated to the latest version of Weka (as of February 26th, 2009)
  • Updated to the latest version of Joone, optimized some of the neural network default parameters
  • Added some new sample processes to the sample directory as well as to the tutorial
  • ExampleFilter and most important discretization parameters are no longer expert parameters
  • ArffExampleSource now reports an error in cases where attributes contain a space which is not quoted
  • New binominal classification performance measures:
  • positive predictive value
  • negative predictive value
  • psep
  • Implementation details:
  • SplittedExampleSet has been improved leading to faster data access times for operators like cross validation or decision tree learning
  • Plugins can now define a class com.rapidminer.PluginInit providing a method "initPlugin()" which will be invoked during plugin initialization
  • Bugfixes:
  • Fixed bug in the accuracy criterion for the revised decision tree learner
  • Fixed bug in parameter list of ValueSubgroupIterator
  • Fixed bug in ExceptionHandling which sometimes led to doubled outputs
  • Fixed bug in ProcessBranch which sometimes led to doubled outputs
  • ViewAttributes did not add min and max statistics so that those statistics were not calculated on data table views
  • Fixed bug in Windows GUI start script (linebreak)
  • Fixed bug for surface 3D plot where x and y were replaced by each other
  • Fixed paths to icons for building blocks
  • Fixed issue with ROC plots in cases where several points with same confidence occurred
  • Fixed potential thread deadlock during the filling of the plotter list
  • Fixed bug for distance weighted vote and k = 1 in NearestNeighbors
  • Fixed a bug in ChiSquaredWeighting for mixed-type data sets where the number of bins was smaller than the maximum number of nominal values
  • The default global random seed in the preferences dialog was not allowed to be set to -1
  • The property keys of the preferences dialog were editable
  • Fixed bug in PolynomialRegression
  • Range normalization now delivers maximum value for constant attributes
  • Weighted precision and recall no longer deliver NaN if a class did not occur
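
Two of the binominal performance measures listed for this release, positive predictive value and negative predictive value, follow directly from the confusion matrix counts. The sketch below uses their standard textbook definitions in plain Python. It is not RapidMiner's implementation, the function name and the counts are hypothetical, and psep is left out because the changelog does not define it.

```python
def predictive_values(tp, fp, tn, fn):
    """Return (PPV, NPV) from confusion matrix counts.

    PPV = TP / (TP + FP): share of positive predictions that are correct.
    NPV = TN / (TN + FN): share of negative predictions that are correct.
    A measure is NaN when its denominator is zero (no such predictions).
    """
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    npv = tn / (tn + fn) if (tn + fn) else float("nan")
    return ppv, npv


# Hypothetical counts: 50 positive and 50 negative predictions.
ppv, npv = predictive_values(tp=40, fp=10, tn=45, fn=5)
print(ppv, npv)
```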

New in RapidMiner Studio 4.3 (Nov 24, 2008)

  • New operators:
  • AccessExampleSource
  • Example2AttributePivoting
  • Attribute2ExamplePivoting
  • PolynomialRegression
  • Similarity2ExampleSet
  • ExampleSet2SimilarityExampleSet
  • Nominal2String
  • String2Nominal
  • Date2Numerical
  • Real2Integer
  • Numerical2Real
  • Nominal2Numerical
  • Numerical2Binominal
  • Numerical2Polynominal
  • AbsoluteDiscretization
  • ConditionedFeatureGeneration
  • AttributeAggregation
  • SupportVectorCounter
  • MutualInformationMatrix
  • GaussFeatureConstructionOperator
  • ProductGenerationOperator
  • AbsoluteValues
  • MovingAverage
  • ExponentialSmoothing
  • SeriesMissingValueReplenishment
  • DifferentiateSeries
  • IndexSeries
  • FillDataGaps
  • EnsureMonotonicity
  • WindowExamples2ModelingData
  • WindowExamples2OriginalData
  • ProcessLog2AttributeWeights
  • Mapping
  • Substring
  • Trim
  • Replace
  • AddValue
  • MergeValues
  • AttributeConstruction
  • ValueIterator
  • IOStorer
  • IORetriever
  • SQLExecution
  • ClearProcessLog
  • ProcessLog2ExampleSet
  • Data2Performance
  • Data2Log
  • Macro2Log
  • DataMacroDefinition
  • LiftParetoChart
  • Deprecated Operators:
  • Nominal2Numeric (please use Nominal2Numerical instead)
  • Numeric2Binominal (please use Numerical2Binominal instead)
  • Numeric2Polynominal (please use Numerical2Polynominal instead)
  • LinearCombination (please use AttributeAggregation instead)
  • AttributeValueMapper (please use Mapping instead)
  • AttributeValueSubstring (please use Substring instead)
  • AddNominalValue (please use AddValue instead)
  • MergeNominalValues (please use MergeValues instead)
  • New implementation of clusterings for more efficient computing and memory usage:
  • Reimplemented or adapted operators:
  • AgglomerativeClustering
  • ClusterModel2ExampleSet
  • DBScanClustering
  • ExampleSet2ClusterModel
  • FlattenClusterModel
  • KMeans
  • KMedoids
  • KernelKMeans
  • RandomFlatClustering
  • SupportVectorClustering
  • TopDownClustering
  • ClusterModelWriter
  • ClusterModelReader
  • TransitionMatrix
  • Removed operators:
  • AgglomerativeFlatClustering, use AgglomerativeClustering and FlattenClusterModel instead - BregmanHardClustering, use KMeans with BregmanDivergences instead - ExampleSet2ClusterConstraintList - MPCKMeans - TopDownRandomClustering, use TopDownClustering with RandomFlatClustering as inner learner - UPGMAClustering, use AgglomerativeClustering with average link instead - SimilarityComparator
  • The new AttributeConstruction operator supports infix written formulas, a simple format for constants and new calculation methods
  • Better support for special characters in process XML
  • Macros are now also supported in parameter lists and for numerical parameters
  • Added new overwriting mode to the DatabaseExampleSetWriter named "first overwrite, then append"
  • Replaced "append" parameter in ExampleSetWriter by the new overwriting modes "none", "overwrite", "append", and "first overwrite, then append"
  • ExampleFilter can now use regular expressions for the values of the nominal attribute value filtering
  • New Plotter: Pareto Chart
  • New Plotter: Series Multiple
  • New Plotter: Scatter Multiple
  • The old scatter plotter has been divided into a new Scatter plot and the new Scatter Multiple plot
  • Most plotters now support panning during zooming by pressing the Ctrl Key while dragging the mouse
  • The file chooser in the modern look and feel now always remembers the last directory from which a file was chosen as an additional default bookmark (on the left)
  • Changed the order in which models are added to the grouped model (ModelGrouper), i.e. the last created model will now be added last
  • The wizards of the database reading and writing operators are now initialized with the last settings
  • The feature selection and feature weighting operators are now based on double arrays which should lead to smaller memory footprints
  • Added new performance measures: sensitivity, specificity, Youden index, relative error lenient, relative error strict
  • The CachedDatabaseExampleSource operator now has a more appropriate wizard
  • The plotters now provide consistent colors for classes
  • Improved the names of the features of the (multi-)variate windowing operators
  • Multivariate windowing now also supports a name for the label column in addition to the index
  • Multivariate windowing can now also be applied without the creation of a label and even with horizon 0
  • Improved the graph and plotter panel for long column / item names, long names are now displayed in a short fashion and the full name is shown as tool tip
  • DecisionTree now supports a new parameter min_size_for_split
  • Added new process branch conditions: attribute_available, min_examples, max_examples, min_attributes, max_attributes.
  • The viewers for symmetrical matrices like correlations etc. now always show the values of the first column
  • Improved the range names of discretized data
  • Added selection of criterion to AssociationRulesGenerator, also improved the visualization of association rules by adding a selector for the criterion used for the minimum value slider
  • Added new option for Normalization. You can now choose from z-transformation, range-transformation or the new proportional transformation via category selection.
  • LinearRegression is now also applicable on binominal classification tasks
  • Added support for logging only the top-k or bottom-k objects with the ProcessLog operator
  • Improved the parameter optimization / iteration dialog: small numbers are no longer cut off, GUI is more consistent, dialog now uses icons
  • Improved the CachedDatabaseExampleSource operator and database handling: now arbitrary tables are accepted and primary keys (index) and / or mapping tables are automatically handled
  • Integrated the latest version of the JFreeChart library
  • A dialog informs the users now if any unknown parameters were part of the process during loading
  • A SimpleVoteModel now supports the output of textual results
  • (Multivariate) Windowing on example based input representations now keeps the input id attribute
  • Added writing of intermediate weights for GeneticAlgorithm (feature selection) and EvolutionaryWeighting (feature weighting), both operators now also support the initialization with attribute weights (e.g. from the last run)
  • Implementation Details:
  • Moved AnovaMatrix(Operator) into the package com.rapidminer.operator.visualization.dependencies
  • Moved all attributes based matrix operators (correlation, covariance etc.) into the new package com.rapidminer.operator.visualization.dependencies
  • Moved aggregation functions into package com.rapidminer.tools.math.function.aggregation
  • Bugfixes:
  • processes now only write the logged information from the run, not the global information for example collected from the GUI. Hence, the logging will also no longer directly overwrite old log files right after loading
  • switch workspace and initial workspace selection now prevent the selection of the RapidMiner main directory and all subdirectories in order to prevent a recursive copy
  • switched weight "direction" for corpus based weighting
  • fixed bug in evolutionary parameter optimization in combination with logging
  • fixed bug in Wizard for ExampleSource preventing the correct guess of value types (were always nominal)
  • fixed error in nominal re-mapping for cases where the nominal values of training and test set did not match
  • fixed jittering bug in Histogram plots causing the bins to drop out of the plotter
  • fixed minor bug in ExampleSetWriter which caused the ExampleSource operator to state a warning
  • fixed bug if special characters were part of the process XML
  • DistributionModel is updatable now
  • AttributeValueSubstring ignores missing values and is able to extract single characters now
  • Fixed a GUI error only occurring in Java 6 Update 10
  • Fixed bug in FeatureSubsetIteration where the specified maximum number of features was not used
  • Fixed bug in PerformanceVector writing from the result dialog (Save button) which led to large data files and long runtimes until the data was actually saved
  • Fixed bug in uninstaller which under certain circumstances also removed non-RapidMiner files in the installation directory
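
Three of the performance measures added in the release above (sensitivity, specificity and the Youden index) can be derived from the counts of a binary confusion matrix. The following is a minimal sketch using the standard textbook definitions. It is not RapidMiner code, and the function name `binary_measures` and its parameters are hypothetical names chosen for this illustration.

```python
def binary_measures(tp, fp, tn, fn):
    """Return (sensitivity, specificity, youden) from confusion matrix counts.

    tp, fp, tn, fn are the numbers of true positives, false positives,
    true negatives and false negatives (hypothetical parameter names).
    """
    # Sensitivity: fraction of actual positives that were predicted positive.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    # Specificity: fraction of actual negatives that were predicted negative.
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    # Youden index: J = sensitivity + specificity - 1.
    youden = sensitivity + specificity - 1.0
    return sensitivity, specificity, youden


sens, spec, j = binary_measures(tp=40, fp=10, tn=45, fn=5)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} youden={j:.3f}")
```

A Youden index of 0 corresponds to a classifier that is no better than chance under these definitions, and 1 to a perfect one.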

New in RapidMiner Studio 4.1 (May 19, 2008)

  • New operators
  • New 64 bit version for Windows x64 OS now provided; other 64 bit systems are supported by using a 64 bit Java version
  • Parameter optimization operators now provide a nicer wizard dialog for setting the parameters
  • All GUI elements now provide longer descriptions for operators
  • SplitChain and AbsoluteSplitChain were moved from the postprocessing into the meta group
  • Meta group was restructured and two subgroups (control and other) were added
  • Fixed a memory leak in the result history which was affecting the GUI for multiple processes if they were performed in a single sequence
  • SOMDimensionalityReduction and SVDReduction are now able to create a preprocessing model
  • BruteForce and GeneticAlgorithm feature selection now support a minimum and maximum number of features and also the selection of an exact number of features
  • RapidMiner now offers two different look and feels: modern (recommended) and classic
  • Improved comment tab so that it already registers and saves new text directly after it was typed (instead of changing the tab)
  • DataStatistics (IOObject) now shows the standard deviation like in the GUI instead of the variance
  • Robustified ExampleSource wizard: the same output files as the input file are no longer allowed
  • Series Plotter no longer scales the axis ranges in a way that zero must be contained
  • All SVM and other hyperplane models now support the visualization of a sortable data table for the coefficients (weights)
  • An error message now indicates if XML entities are used for operator names, which is not allowed
  • Anova calculator now allows value editing in table and the specification of the significance level
  • Meta data views can now be correctly sorted according to sum or unknown value columns
  • MissingValueImputation: added warnings in the case that not all values could be imputed, improved attribute ordering (ascending and descending sorting, sort by number of missing values), added log messages
  • Naive Bayes distribution model now uses the same class coloring for both numerical and nominal distributions
  • Latest available Weka version integrated (as of 2008/05/09)
  • The AttributeParser no longer supports batch generations
  • The ClusterModel reader is now able to read both compressed and uncompressed files
  • PCA and GHA now use global covariance matrix calculation
  • LibSVMLearner now provides the correct range for the nu parameter
  • Fixed bug in AttributeParser which prevents the correct calculation for nested generations or cases where the generation is divided into several operators
  • Fixed bug in value type guessing for numerical columns with missing values
  • Fixed bug in ExampleSetTranspose for missing values in nominal attributes
  • Fixed bug in DatabaseExampleSource Wizards for user defined URLs
  • Parameter lists are now cloned correctly
  • Fixed bug for quoted input files occurring in some cases where the quoted string was part of the line before
  • Fixed a bug for learning with example weights with the JMySVM learner
  • Fixed a NPE if empty example sets were used as input for feature selection operators
  • Fixed wrong normalization for confidences predicted by distribution models (e.g. NaiveBayes)
  • AttributeEditor and ExampleSource wizard did not regard the decimal point character (and quotes)
  • The value type guessing operators did not take a possible decimal point character different from '.' into account
  • Fixed tool tip for z-transform in Normalization operator: changed "variance" to "standard deviation"
  • Fixed locale for Ok / Cancel dialogs to US locale like the rest of RapidMiner
  • Fixed bug in operator tree which caused the reconstruction of the expansion state to be faulty in some cases
  • Fixed statistics copy bug introduced in 4.1beta2 for predicted label statistics
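
The tool tip fix for the Normalization operator listed above reflects that a z-transformation divides by the standard deviation, not the variance. A minimal sketch of the textbook z-transformation follows. It is not RapidMiner code: the function name `z_transform` is hypothetical, and the use of the sample standard deviation (n - 1 in the denominator) is an assumption made for this illustration.

```python
import math


def z_transform(values):
    """Return values shifted to mean 0 and scaled to standard deviation 1.

    Assumption: the sample standard deviation (n - 1 denominator) is used.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    std = math.sqrt(variance)  # divide by this, not by the variance
    if std == 0.0:
        # Constant column: every value equals the mean.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
scaled = z_transform(data)
print(scaled)
```

Dividing by the variance instead would leave the result with a standard deviation of 1 / std rather than 1, which is why the wording of the tool tip matters.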

New in RapidMiner Studio 4.1 Beta 2 (Feb 19, 2008)

  • New operators:
  • ProcessBranch
  • FileEcho
  • ExchangeAttributeRoles
  • ChangeAttributeRole
  • SeriesPrediction
  • Deprecated operators: ChangeAttributeType (use ChangeAttributeRole instead)
  • New version of chart plotting library
  • New plotter: Series
  • Removed the numerical sample sizes for the tree and rule learners
  • Introduced different shapes for plotter points
  • Use bigger strokes for plotter lines
  • Added max_items parameter for FPGrowth
  • Changed default mode for view creation of preprocessing models
  • Added signum generator for manual feature generation and for generation with YAGGA2
  • Relief can now handle missing values
  • Changed default data representation back to double because the number of rounding errors is otherwise too high for larger data ranges
  • Introduced AttributeDescriptions and AttributeTransformations in order to lower large memory consumptions due to clones and to avoid re-wrappings for new views on the example set view stack
  • removed clone of mappings for clones of nominal attributes
  • Changed DataRow methods from package private to protected
  • ConditionedExampleSets no longer support dynamical conditions
  • Changed default data representation back to "double"
  • The visualization of integers and the nominal statistics calculation are now based on longs instead of integers
  • Fixed MAJOR bug introduced in 4.1beta in example sets / views which occurred after a new view was created on top of a split example set (e.g. in a cross validation) and then hid the partition
  • Fixed some problems (due to too many cloned objects, see above) which caused much more memory usage in 4.1beta
  • Fixed bug in PredictionTrendAccuracy calculation
  • Fixed wrong linefeeds in unix start scripts
  • Fixed bug in aggregation function selection of the chart plotters
  • Fixed ID handling bug for example sets (views) which prevented the correct application of Id-based operators like the ExampleSetJoin operator
  • Fixed bug in table index assignment of view attributes
  • Fixed bug in SortedExampleSet
  • Fixed bug in some plotters based on JMathplot
  • Removed remapId() call in IdUtils which increased the runtime of some clustering schemes (especially DBScan and SupportVectorClustering)
  • Fixed bug in RuleLearner for nominal attributes
  • Fixed bug for (operator / parameter) pair parameter values for the parameter iteration and optimization operators
  • Fixed wrong name for continuous attributes in C45 loader
  • ConditionedExampleSet caused some problems if the base attributes for conditions were removed after the filtering
  • Fixed a bug in getNominalValue(Attribute) of Example which delivered the first nominal value instead of missing values
  • File filters now accept lower and upper case extensions
  • Fixed wrong colors after sorting a column of the ANOVA matrix
  • Removed unnecessary statistics registration in nominal attributes consuming unused memory and runtime
  • Fixed rounding error in the stepwise parameter operators
  • Removed data representation type query during first startup since rounding errors are often too high
  • AbsoluteSampling produced sample with duplicates