Skip to main content
Skip table of contents

Data cleansing

Basics

Large amounts of data are historicized by the process engine in the database for each completed process instance and instance action. This data is needed for process reports. There are, however, implementation processes for which no historicized data is needed. Data from several years ago is furthermore no longer relevant. Each process can now have a data cleansing job assigned to clean up the historicized data. Data cleansing is tied to the customizing transport.

Definition

The standard provides three data cleansing definitions. New processes are assigned to the data cleansing SHORT. A cleansing run always consists of 2 steps.

  • Script processes without reporting (SHORT)

    1. Deleting of historical process instance information – Default 1 day

    2. Deleting of outdated process definitions - Default 2 years

  • Interaction processes with a short reporting period (MEDIUM)

    1. Deleting of historical process instance information – Default 1 month

    2. Deleting of outdated process definitions - Default 2 years

  • Interaction processes with a long reporting period (DEFAULT)

    1. Deleting of historical instance information - Default 1 year

    2. Deleting of outdated process definitions - Default 2 years

There are 2 date settings for each job separated by year (max. 99), month (max. 11) and day (max. 30). The settings calculate a value in days, how old entries must be before they are deleted.

Example: 1 year, 6 months, 7 days=> 1*365 + 6*30 + 7 = 742 days

The time value of the process definition will always be greater than that of the instance information.

Each data cleansing run can now have processes assigned. The previous assignment will be removed. An assignment is possible only for processes from the current customizing layer. The assignment will also apply for the standard or partner module processes of overloaded processes.

Execution

The data cleansing job is triggered by the timer Adjust system tables. This timer starts up once a day in the standard.

Depending on cleansing level, various data types can be deleted from the process history.

A deletion cannot be undone.

Cleansing instance information

The first cleansing step deletes the main data volume in the process history. That includes process variables and action information, also for user tasks.

  • Tables

    • ACT_HI_VARINST -> ACT_GE_BYTEARRAY

    • ACT_HI_ACTINST -> ACT_HI_TASKINST, ACT_HI_IDENTITYLINK

    • ACT_HI_PROCINST

    • ACT_RE_PROCDEF, ACT_RE_DEPLOYMENT

    • ProcessInstance

    • ProcessWaitState

    • ProcessErrorLog

Limitation in BPM

Reports can only be created up to the most recent cleanup. The start count and the completion overview are no longer accurate.

Process definitions cleanup

The second level deletes older versions of process definitions created in CURSOR-BPM, but for which no instance information exists.

  • Tables

    • ProcessContainer, ProcessModel

    • ProcessMask, ProcessSearch

    • Translation

Limitation in BPM

An older version cannot potentially be recovered.

Prevent execution when critical processes are still running

It is possible that long running processes with large transactions run at the same time as data cleansing and thereby database locks may occur. To avoid problems with such critical processes with the database, these processes can be stored as application parameters. The action started by a system timer to clean up the data within the Activiti tables checks whether critical processes are still running at the start. If this is the case, the data cleansing does not begin but starts:

  • a corresponding log entry

  • a mail notification to the system administrator (central recipient for service requests in the system)

The note "critical process" is made via an application parameter in which the process names of the important processes can be recorded. The sender of the mail notification is stored in the system settings.

CRM-11083_SuspendCleanJobProcesses.png

Configuration suggestion

The data cleanup should be used differently in test systems and production systems. When utilizing the customizing transport, it is advisable to try the data cleansing on a development/test system with special settings and then change the settings later for use on the production system.

Data cleansing in the test system

Test systems are used for process development. That is why there usually are a large number of process versions, the process definitions of which will over time strain the database and the server caches. It is therefore necessary to clean up the old processes from time to time. The current process version should be backed up externally via an export from time to time.

The following settings are meaningful for that purpose:

  • DEFAULT

    • Instance information: 1 month

    • Process definition: 1 month

  • MEDIUM/SHORT

    • Process definition: 1 month

The process instances of actively running processes will not be deleted. The most recent version of a process will always remain intact.

Data cleansing in the production system

Once the data cleansing was done in the test system, the DEFAULT configuration should be reset to the standard. A production system usually only contains 3 process categories:

  1. Processes that are timer-started and/or carry out server-side logics only

  2. Processes with user interaction but with a runtime of under one day

  3. Processes with user interaction and a runtime of several days or even weeks

Long-running process instances are a good basis for the analysis of wait times and processing times. Purely server-side runs, on the other hand, have a much too short runtime for any meaningful reporting. These are, however, executed very frequently, which may result in high data loads. The following SQL statement can be used to find out, which process has been executed how many times.

SQL
SELECT ACT_RE_PROCDEF.NAME_, COUNT(ACT_HI_PROCINST.ID_) AS count
 FROM ACT_RE_PROCDEF,ACT_HI_PROCINST
 WHERE ACT_HI_PROCINST.PROC_DEF_ID_ = ACT_RE_PROCDEF.ID_
 GROUP BY ACT_RE_PROCDEF.NAME_
 ORDER BY count DESC

Three separate data cleansing settings should be used and the processes should be relevantly assigned to the process categories named above.

Deleting invalid variables

In older versions it can happen that variables of the type IScriptWorkSpace remain in the database for running but waiting instances. By an update the variables become invalid and must be deleted.

SQL
/* Selection der Prozess-Instanzen */
SELECT ACT_HI_PROCINST.*
  FROM ACT_RU_EXECUTION, ACT_RU_VARIABLE, ACT_HI_PROCINST
 WHERE ACT_HI_PROCINST.ID_ = ACT_RU_EXECUTION.PROC_INST_ID_
   AND ACT_RU_EXECUTION.ID_ = ACT_RU_VARIABLE.EXECUTION_ID_
   AND ACT_RU_VARIABLE.NAME_ = 'workSpace'


/* Selection der Prozess-Ids */
SELECT DISTINCT NAME_
  FROM ACT_RE_PROCDEF
 WHERE ID_ IN
  (SELECT ACT_RU_EXECUTION.PROC_DEF_ID_
     FROM ACT_RU_EXECUTION, ACT_RU_VARIABLE
    WHERE ACT_RU_EXECUTION.ID_ = ACT_RU_VARIABLE.EXECUTION_ID_ AND ACT_RU_VARIABLE.NAME_ = 'workSpace')

These variables can be easily deleted from instances because they are invalid at the end of a client session.

SQL
/* BLOB-History-Daten löschen */
DELETE FROM ACT_GE_BYTEARRAY WHERE ID_ IN
 (SELECT BYTEARRAY_ID_ FROM ACT_HI_VARINST WHERE NAME_ = 'workSpace')

/* ORACLE */
CREATE TABLE TMPACT_GE_BYTEARRAY AS
  (SELECT BYTEARRAY_ID_ FROM ACT_RU_VARIABLE WHERE NAME_ = 'workSpace')
/* MSSQL /*
SELECT BYTEARRAY_ID_ INTO TMPACT_GE_BYTEARRAY FROM ACT_RU_VARIABLE WHERE NAME_ = 'workSpace'

/* Variablen löschen */
DELETE FROM ACT_HI_VARINST WHERE NAME_ = 'workSpace'
DELETE FROM ACT_RU_VARIABLE WHERE NAME_ = 'workSpace'

/* BLOB-Runtime-Daten löschen */
DELETE FROM ACT_GE_BYTEARRAY WHERE ID_ IN
 (SELECT BYTEARRAY_ID_ FROM TMPACT_GE_BYTEARRAY)
DROP TABLE TMPACT_GE_BYTEARRAY 

JavaScript errors detected

Please note, these errors can depend on your browser setup.

If this problem persists, please contact our support.