Event Feature Extraction

In this tutorial, we introduce a few feature extraction techniques. Let’s start by loading some data:

[39]:
from skpm.event_logs import split, BPI17
from skpm.feature_extraction import TimestampExtractor

# download the dataset
log = BPI17()
log
[39]:
BPI17 Event Log
    Cases: 31,509
    Events: 1,202,267
    Activities: 26

We’ll use the unbiased split strategy in this tutorial and extract features accordingly.

[40]:
train, test = split.unbiased(log.dataframe, **log.unbiased_split_params)

Inter-case features

Inter-case features are computed from the relationships between different cases. They aim to quantify and model, for instance, the resource sharing between cases. In the current version of our library, we only have a simple example of such a feature: the number of cases in progress simultaneously. This feature is commonly called work in progress (WIP).

Let’s see how it works:

[46]:
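import matplotlib.pyplot as plt
import pandas as pd
# elc holds the event log column names (e.g. elc.timestamp);
# the import path below is assumed to match the one used in earlier cells
from skpm.config import EventLogConfig as elc
from skpm.feature_extraction import WorkInProgress

# compute the work in progress for every event in the training set
wip = WorkInProgress()
wip.fit(train)
train["wip"] = wip.transform(train)

# average the per-event WIP values by calendar day
data = (
    train
    .set_index(elc.timestamp)
    .resample("D")[["wip"]]
    .mean()
    .reset_index()
)
plt.figure(figsize=(10, 3))
plt.plot(pd.to_datetime(data[elc.timestamp]), data["wip"])
plt.title("Average daily \nWork in Progress (WIP) over time")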
[46]:
Text(0.5, 1.0, 'Average daily \nWork in Progress (WIP) over time')
[Figure: Average daily Work in Progress (WIP) over time]
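
To make the definition of work in progress concrete, here is a minimal pandas sketch that counts, for each event, how many cases are open at that event's timestamp. It is an illustration under stated assumptions, not how WorkInProgress is implemented: the helper `work_in_progress`, the toy log, and the column names `case_id` and `timestamp` are all hypothetical, and a case is treated as open from its first to its last event.

```python
import pandas as pd


def work_in_progress(df, case_col="case_id", time_col="timestamp"):
    """Hypothetical helper: number of open cases at each event's timestamp."""
    # A case is considered open between its first and last event (inclusive).
    spans = df.groupby(case_col)[time_col].agg(start="min", end="max")
    starts = spans["start"].to_numpy()
    ends = spans["end"].to_numpy()
    ts = df[time_col].to_numpy()

    # Boolean matrix of shape (n_events, n_cases): is case j open at event i?
    # This is quadratic in memory, so it only suits small illustrative logs.
    is_open = (starts[None, :] <= ts[:, None]) & (ts[:, None] <= ends[None, :])
    return pd.Series(is_open.sum(axis=1), index=df.index, name="wip")


# Toy log: case B runs entirely inside case A; case C starts after both end.
toy = pd.DataFrame(
    {
        "case_id": ["A", "A", "B", "B", "C", "C"],
        "timestamp": pd.to_datetime(
            [
                "2016-01-01 09:00", "2016-01-01 12:00",
                "2016-01-01 10:00", "2016-01-01 11:00",
                "2016-01-01 13:00", "2016-01-01 14:00",
            ]
        ),
    }
)
toy["wip"] = work_in_progress(toy)
print(toy)
```

In the toy log, only the two events of case B see a second open case (case A), so their WIP is 2 while every other event has a WIP of 1.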

In this tutorial, we showed how to extract features from timestamps, resources, and the inter-case perspective. We hope you find it useful for your projects. If you have any questions or suggestions, please open an issue on our GitHub repository.

[48]:
train.head().T
[48]:
 | 0 | 1 | 2 | 3 | 4
Action | Created | statechange | Created | Deleted | Created
org:resource | User_1 | User_1 | User_1 | User_1 | User_1
concept:name | A_Create Application | A_Submitted | W_Handle leads | W_Handle leads | W_Complete application
EventOrigin | Application | Application | Workflow | Workflow | Workflow
EventID | Application_652823628 | ApplState_1582051990 | Workitem_1298499574 | Workitem_1673366067 | Workitem_1493664571
lifecycle:transition | complete | complete | schedule | withdraw | schedule
time:timestamp | 2016-01-01 09:51:15.304000 | 2016-01-01 09:51:15.352000 | 2016-01-01 09:51:15.774000 | 2016-01-01 09:52:36.392000 | 2016-01-01 09:52:36.403000
case:LoanGoal | Existing loan takeover | Existing loan takeover | Existing loan takeover | Existing loan takeover | Existing loan takeover
case:ApplicationType | New credit | New credit | New credit | New credit | New credit
case:concept:name | Application_652823628 | Application_652823628 | Application_652823628 | Application_652823628 | Application_652823628
case:RequestedAmount | 20000.0 | 20000.0 | 20000.0 | 20000.0 | 20000.0
FirstWithdrawalAmount | None | None | None | None | None
NumberOfTerms | None | None | None | None | None
Accepted | None | None | None | None | None
MonthlyCost | None | None | None | None | None
Selected | None | None | None | None | None
CreditScore | None | None | None | None | None
OfferedAmount | None | None | None | None | None
OfferID | None | None | None | None | None
accumulated_time | 0.0 | 0.048 | 0.47 | 81.088 | 81.099
execution_time | 0.048 | 0.422 | 80.618 | 0.011 | 0.01
remaining_time | 1144676.119 | 1144676.071 | 1144675.649 | 1144595.031 | 1144595.02
day_of_month | -0.5 | -0.5 | -0.5 | -0.5 | -0.5
day_of_week | 0.166667 | 0.166667 | 0.166667 | 0.166667 | 0.166667
day_of_year | -0.5 | -0.5 | -0.5 | -0.5 | -0.5
hour_of_day | -0.108696 | -0.108696 | -0.108696 | -0.108696 | -0.108696
min_of_hour | 0.364407 | 0.364407 | 0.364407 | 0.381356 | 0.381356
month_of_year | -0.5 | -0.5 | -0.5 | -0.5 | -0.5
numerical_timestamp | 1451641875.0 | 1451641875.0 | 1451641875.0 | 1451641956.0 | 1451641956.0
sec_of_min | -0.245763 | -0.245763 | -0.245763 | 0.110169 | 0.110169
secs_since_sunday | -0.441344 | -0.441344 | -0.441344 | -0.44121 | -0.44121
secs_within_day | -0.08941 | -0.08941 | -0.08941 | -0.088472 | -0.088472
week_of_year | 0.5 | 0.5 | 0.5 | 0.5 | 0.5
resource_role | 1 | 1 | 1 | 1 | 1
wip | 19.0 | 19.0 | 19.0 | 19.0 | 19.0