Downloading event logs via API
This example demonstrates how we can easily download well-known process mining event logs from the 4TU.Centre for Research Data using the skpm.event_logs module.
The skpm.event_logs module provides a set of ready-to-use event logs, such as Sepsis and BPI 2012.
The API overview
Implementing each event log as a class is a design choice that lets us handle each log according to its specific characteristics. One of the main challenges in process mining is that datasets differ substantially from one another, since each one reflects its own business rules.
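As a rough illustration of this design, the sketch below keeps shared behaviour in a base class and lets each subclass carry its own dataset-specific settings. All class names, attributes, and values are hypothetical and are not part of skpm.

```python
from datetime import datetime


class ToyEventLog:
    """Hypothetical base class holding behaviour shared by all logs."""

    # Dataset-specific settings; each subclass is expected to override them.
    name = "toy"
    split_time = datetime(2000, 1, 1)  # illustrative cut-off, not a real value

    def describe(self):
        # Shared logic that only relies on the per-class settings above.
        return f"{self.name} (split at {self.split_time:%Y-%m})"


class ToyLogA(ToyEventLog):
    name = "log-a"
    split_time = datetime(2012, 6, 1)  # illustrative value only


class ToyLogB(ToyEventLog):
    name = "log-b"
    split_time = datetime(2016, 3, 1)  # illustrative value only


descriptions = [log.describe() for log in (ToyLogA(), ToyLogB())]
print(descriptions)
```

With this layout, adding a new log means adding one small subclass, while the shared logic stays in one place.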
For instance, an unbiased split of event logs was proposed in [1]. Roughly speaking, each event log is split according to its own temporal characteristics, which are hard-coded in the corresponding event log class. You can check this feature in :ref:`Unbiased split <sphx_glr_auto_examples_unbiased_split.py>`. Now, let us see how to download event logs.
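To make the idea of a temporal split concrete first, here is a minimal sketch in plain Python. It is a simplified strict split by a cut-off time, not the exact procedure of [1] and not skpm's implementation. The function name, the toy events, and the cut-off are all assumptions made for illustration.

```python
from datetime import datetime


def temporal_split(events, split_time):
    """Assign whole cases to train or test around split_time.

    events: iterable of (case_id, timestamp) pairs.
    Cases that cross split_time are dropped to avoid leakage.
    """
    # Track the first and last timestamp seen for every case.
    bounds = {}
    for case_id, ts in events:
        start, end = bounds.get(case_id, (ts, ts))
        bounds[case_id] = (min(start, ts), max(end, ts))

    # Train: cases that finished before the cut-off.
    train = {c for c, (_, end) in bounds.items() if end < split_time}
    # Test: cases that started at or after the cut-off.
    test = {c for c, (start, _) in bounds.items() if start >= split_time}
    return train, test


toy_events = [
    ("c1", datetime(2020, 1, 1)),
    ("c1", datetime(2020, 1, 5)),
    ("c2", datetime(2020, 1, 8)),
    ("c2", datetime(2020, 2, 3)),  # c2 crosses the cut-off, so it is dropped
    ("c3", datetime(2020, 2, 10)),
]
train_cases, test_cases = temporal_split(toy_events, datetime(2020, 2, 1))
print(train_cases, test_cases)
```

In the toy data, c1 ends before the cut-off and lands in the training set, c3 starts after it and lands in the test set, and c2 is discarded.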
Downloading the BPI 2013 event log
The BPI 2013 event log is a well-known event log that contains data about closed problems from Volvo IT Belgium. We can download it as follows:
[1]:
from skpm.event_logs import BPI13ClosedProblems
bpi13 = BPI13ClosedProblems() # automatically downloads and caches the file
bpi13
[1]:
BPI13ClosedProblems Event Log
Cases: 1,487
Events: 6,660
Activities: 4
Notice that the __repr__ method returns a brief overview of the event log. To access the underlying dataframe, use the dataframe attribute.
[2]:
bpi13.dataframe.head()
[2]:
| | org:group | resource country | organization country | org:resource | organization involved | org:role | concept:name | impact | product | lifecycle:transition | time:timestamp | case:concept:name |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Org line A2 | INDIA | se | Minnie | J11 2nd | A2_2 | Queued | High | PROD191 | Awaiting Assignment | 2006-01-11 14:49:42+00:00 | 1-109135791 |
| 1 | Org line A2 | Sweden | cn | Tomas | M1 2nd | A2_2 | Accepted | Medium | PROD753 | In Progress | 2006-11-07 09:00:36+00:00 | 1-147898401 |
| 2 | Org line A2 | Sweden | cn | Tomas | M1 2nd | A2_2 | Accepted | Medium | PROD753 | In Progress | 2006-11-07 12:05:44+00:00 | 1-147898401 |
| 3 | Org line A2 | Sweden | cn | Tomas | M1 2nd | A2_2 | Accepted | Medium | PROD753 | In Progress | 2007-03-20 08:06:25+00:00 | 1-165554831 |
| 4 | Org line A2 | Sweden | cn | Tomas | M1 2nd | A2_2 | Accepted | Low | PROD753 | In Progress | 2007-05-10 14:21:54+00:00 | 1-172473423 |
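The columns in the preview follow the XES naming convention, where case:concept:name identifies the case and concept:name the activity. As a hedged sketch, the summary counts shown earlier could be derived from such a dataframe with plain pandas. The toy data below is made up, and this is an assumption about how the numbers can be computed, not skpm's actual implementation.

```python
import pandas as pd

# Made-up stand-in for an event log dataframe with XES-style columns.
toy = pd.DataFrame(
    {
        "case:concept:name": ["c1", "c1", "c2", "c2", "c2"],
        "concept:name": ["Queued", "Accepted", "Queued", "Accepted", "Completed"],
        "time:timestamp": pd.to_datetime(
            ["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-04", "2020-01-05"],
            utc=True,
        ),
    }
)

n_cases = toy["case:concept:name"].nunique()   # distinct case identifiers
n_events = len(toy)                            # one row per event
n_activities = toy["concept:name"].nunique()   # distinct activity labels

print(f"Cases: {n_cases:,}")
print(f"Events: {n_events:,}")
print(f"Activities: {n_activities:,}")
```

The same three expressions should apply to bpi13.dataframe, provided it uses these column names as the preview suggests.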
In this tutorial, we showed how to use our API to automatically download event logs from the `4TU Repository <https://data.4tu.nl/>`_. We hope you find it useful for your projects.
References
[1] Hans Weytjens, Jochen De Weerdt. Creating Unbiased Public Benchmark Datasets with Data Leakage Prevention for Predictive Process Monitoring, 2021. doi: 10.1007/978-3-030-94343-1_2