{ "cells": [ { "cell_type": "markdown", "id": "674e52fc", "metadata": {}, "source": [ "# Downloading event logs via API\n", "\n", "This example demonstrates how to download well-known process mining event logs\n", "from the 4TU.Centre for Research Data using the `skpm.event_logs` module.\n", "\n", "The `skpm.event_logs` module provides a set of event logs, such as Sepsis and BPI 2012." ] }, { "cell_type": "markdown", "id": "b44f78c3", "metadata": {}, "source": [ "## The API overview\n", "\n", "Implementing each event log as a class is a design choice that allows us to\n", "handle each of them according to its specific characteristics.\n", "One of the main challenges in process mining is that datasets differ widely\n", "in nature, since\n", "each of them reflects very particular business rules.\n", "\n", "For instance, an unbiased split of event logs was proposed in [1]. Roughly\n", "speaking, each event log is split based on specific temporal\n", "characteristics, which are hard-coded within each event log class. You can\n", "check this feature in the *Unbiased split* documentation.\n", "Now, let us see how to download event logs." ] }, { "cell_type": "markdown", "id": "704591a7", "metadata": {}, "source": [ "## Downloading the BPI 2013 event log\n", "\n", "The BPI 2013 event log is a well-known event log that contains data about\n", "closed problems from Volvo IT Belgium. 
We can easily download it as\n", "follows:" ] }, { "cell_type": "code", "execution_count": 1, "id": "f574a352", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "BPI13ClosedProblems Event Log\n", " Cases: 1,487\n", " Events: 6,660\n", " Activities: 4" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from skpm.event_logs import BPI13ClosedProblems\n", "\n", "bpi13 = BPI13ClosedProblems() # automatically downloads and caches the file\n", "bpi13" ] }, { "cell_type": "markdown", "id": "e7ccf024", "metadata": {}, "source": [ "Note that the `__repr__` method returns a brief overview of the event log.\n", "To access the dataframe, use the `dataframe` attribute." ] }, { "cell_type": "code", "execution_count": 2, "id": "629ac591", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
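<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n",
"    <tr style=\"text-align: right;\">\n", "      <th></th>\n", "      <th>org:group</th>\n", "      <th>resource country</th>\n", "      <th>organization country</th>\n", "      <th>org:resource</th>\n", "      <th>organization involved</th>\n", "      <th>org:role</th>\n", "      <th>concept:name</th>\n", "      <th>impact</th>\n", "      <th>product</th>\n", "      <th>lifecycle:transition</th>\n", "      <th>time:timestamp</th>\n", "      <th>case:concept:name</th>\n", "    </tr>\n",
"  </thead>\n", "  <tbody>\n",
"    <tr>\n", "      <th>0</th>\n", "      <td>Org line A2</td>\n", "      <td>INDIA</td>\n", "      <td>se</td>\n", "      <td>Minnie</td>\n", "      <td>J11 2nd</td>\n", "      <td>A2_2</td>\n", "      <td>Queued</td>\n", "      <td>High</td>\n", "      <td>PROD191</td>\n", "      <td>Awaiting Assignment</td>\n", "      <td>2006-01-11 14:49:42+00:00</td>\n", "      <td>1-109135791</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>1</th>\n", "      <td>Org line A2</td>\n", "      <td>Sweden</td>\n", "      <td>cn</td>\n", "      <td>Tomas</td>\n", "      <td>M1 2nd</td>\n", "      <td>A2_2</td>\n", "      <td>Accepted</td>\n", "      <td>Medium</td>\n", "      <td>PROD753</td>\n", "      <td>In Progress</td>\n", "      <td>2006-11-07 09:00:36+00:00</td>\n", "      <td>1-147898401</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>2</th>\n", "      <td>Org line A2</td>\n", "      <td>Sweden</td>\n", "      <td>cn</td>\n", "      <td>Tomas</td>\n", "      <td>M1 2nd</td>\n", "      <td>A2_2</td>\n", "      <td>Accepted</td>\n", "      <td>Medium</td>\n", "      <td>PROD753</td>\n", "      <td>In Progress</td>\n", "      <td>2006-11-07 12:05:44+00:00</td>\n", "      <td>1-147898401</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>3</th>\n", "      <td>Org line A2</td>\n", "      <td>Sweden</td>\n", "      <td>cn</td>\n", "      <td>Tomas</td>\n", "      <td>M1 2nd</td>\n", "      <td>A2_2</td>\n", "      <td>Accepted</td>\n", "      <td>Medium</td>\n", "      <td>PROD753</td>\n", "      <td>In Progress</td>\n", "      <td>2007-03-20 08:06:25+00:00</td>\n", "      <td>1-165554831</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>4</th>\n", "      <td>Org line A2</td>\n", "      <td>Sweden</td>\n", "      <td>cn</td>\n", "      <td>Tomas</td>\n", "      <td>M1 2nd</td>\n", "      <td>A2_2</td>\n", "      <td>Accepted</td>\n", "      <td>Low</td>\n", "      <td>PROD753</td>\n", "      <td>In Progress</td>\n", "      <td>2007-05-10 14:21:54+00:00</td>\n", "      <td>1-172473423</td>\n", "    </tr>\n",
"  </tbody>\n", "</table>\n", "</div>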
" ], "text/plain": [ " org:group resource country organization country org:resource \\\n", "0 Org line A2 INDIA se Minnie \n", "1 Org line A2 Sweden cn Tomas \n", "2 Org line A2 Sweden cn Tomas \n", "3 Org line A2 Sweden cn Tomas \n", "4 Org line A2 Sweden cn Tomas \n", "\n", " organization involved org:role concept:name impact product \\\n", "0 J11 2nd A2_2 Queued High PROD191 \n", "1 M1 2nd A2_2 Accepted Medium PROD753 \n", "2 M1 2nd A2_2 Accepted Medium PROD753 \n", "3 M1 2nd A2_2 Accepted Medium PROD753 \n", "4 M1 2nd A2_2 Accepted Low PROD753 \n", "\n", " lifecycle:transition time:timestamp case:concept:name \n", "0 Awaiting Assignment 2006-01-11 14:49:42+00:00 1-109135791 \n", "1 In Progress 2006-11-07 09:00:36+00:00 1-147898401 \n", "2 In Progress 2006-11-07 12:05:44+00:00 1-147898401 \n", "3 In Progress 2007-03-20 08:06:25+00:00 1-165554831 \n", "4 In Progress 2007-05-10 14:21:54+00:00 1-172473423 " ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "bpi13.dataframe.head()" ] }, { "cell_type": "markdown", "id": "a3c5fa2a", "metadata": {}, "source": [ "In this tutorial, we showed how to user our API to automatically\n", "download event logs from the `4TU Repository `_.\n", "We hope you find it useful for your projects.\n", "\n", "## References\n", "\n", "[1] Hans Weytjens, Jochen De Weerdt. Creating Unbiased Public Benchmark Datasets with Data Leakage Prevention for Predictive Process Monitoring, 2021. doi: 10.1007/978-3-030-94343-1_2" ] } ], "metadata": { "kernelspec": { "display_name": ".venv", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.0" } }, "nbformat": 4, "nbformat_minor": 5 }