Event Feature Extraction
In this tutorial, we introduce a few feature extraction techniques. Let’s start by loading some data:
[39]:
from skpm.event_logs import split, BPI17
from skpm.feature_extraction import TimestampExtractor
# download the dataset
log = BPI17()
log
[39]:
BPI17 Event Log
Cases: 31,509
Events: 1,202,267
Activities: 26
We’ll use the unbiased split strategy in this tutorial and extract features accordingly.
[40]:
train, test = split.unbiased(log.dataframe, **log.unbiased_split_params)
Inter-case features
Inter-case features are computed from the relationships between different cases. They aim to quantify and model, for instance, resource sharing between cases. The current version of our library provides only one simple example of such a feature: the number of cases in progress simultaneously, commonly called work in progress (WIP).
Let’s see how it works:
[46]:
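import matplotlib.pyplot as plt
import pandas as pd
from skpm.config import EventLogConfig as elc  # column-name constants (import path assumed)
from skpm.feature_extraction import WorkInProgress
# fit on the training split and store the WIP feature as a new column
wip = WorkInProgress()
wip.fit(train)
train["wip"] = wip.transform(train)
# average the WIP values per calendar day
data = (
    train
    .set_index(elc.timestamp)
    .resample("D")[["wip"]]
    .mean()
    .reset_index()
)
# plot the daily average over time
plt.figure(figsize=(10, 3))
plt.plot(pd.to_datetime(data[elc.timestamp]), data["wip"])
plt.title("Average daily \nWork in Progress (WIP) over time")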
[46]:
[Figure: Average daily Work in Progress (WIP) over time]
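To make the idea of "cases in progress simultaneously" concrete, here is a minimal sketch in plain pandas and NumPy. It is not the implementation behind `WorkInProgress`, which may count differently (for example, per time window). The function name `open_cases_at_events` and the column names `case_id` and `timestamp` are hypothetical and chosen only for this illustration. A case is treated as open from its first event to its last event, inclusive.

```python
import numpy as np
import pandas as pd


def open_cases_at_events(df, case_col="case_id", time_col="timestamp"):
    """For each event, count the cases that have started and not yet ended.

    Illustrative only: names and counting rule are assumptions,
    not the skpm implementation.
    """
    # first and last timestamp of every case
    spans = df.groupby(case_col)[time_col].agg(["min", "max"])
    starts = np.sort(spans["min"].to_numpy())
    ends = np.sort(spans["max"].to_numpy())
    t = df[time_col].to_numpy()
    # number of cases whose first event is at or before t
    started = np.searchsorted(starts, t, side="right")
    # number of cases whose last event is strictly before t
    finished = np.searchsorted(ends, t, side="left")
    return pd.Series(started - finished, index=df.index, name="wip")


# toy log: case A spans 09:00-12:00, B spans 10:00-11:00, C spans 13:00-14:00
toy = pd.DataFrame(
    {
        "case_id": ["A", "B", "B", "A", "C", "C"],
        "timestamp": pd.to_datetime(
            [
                "2016-01-01 09:00",
                "2016-01-01 10:00",
                "2016-01-01 11:00",
                "2016-01-01 12:00",
                "2016-01-01 13:00",
                "2016-01-01 14:00",
            ]
        ),
    }
)
toy_wip = open_cases_at_events(toy)
print(toy_wip.tolist())
```

For the toy log, the counts are 1, 2, 2, 1, 1, 1: cases A and B overlap between 10:00 and 11:00, while C runs alone. Sorting the start and end times once and using a binary search per event avoids comparing every event against every case.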
In this tutorial, we showed how to extract features from timestamps, resources, and the inter-case perspective. We hope you find it useful for your projects. If you have any questions or suggestions, please open an issue on our GitHub repository.
[48]:
train.head().T
[48]:
| | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| Action | Created | statechange | Created | Deleted | Created |
| org:resource | User_1 | User_1 | User_1 | User_1 | User_1 |
| concept:name | A_Create Application | A_Submitted | W_Handle leads | W_Handle leads | W_Complete application |
| EventOrigin | Application | Application | Workflow | Workflow | Workflow |
| EventID | Application_652823628 | ApplState_1582051990 | Workitem_1298499574 | Workitem_1673366067 | Workitem_1493664571 |
| lifecycle:transition | complete | complete | schedule | withdraw | schedule |
| time:timestamp | 2016-01-01 09:51:15.304000 | 2016-01-01 09:51:15.352000 | 2016-01-01 09:51:15.774000 | 2016-01-01 09:52:36.392000 | 2016-01-01 09:52:36.403000 |
| case:LoanGoal | Existing loan takeover | Existing loan takeover | Existing loan takeover | Existing loan takeover | Existing loan takeover |
| case:ApplicationType | New credit | New credit | New credit | New credit | New credit |
| case:concept:name | Application_652823628 | Application_652823628 | Application_652823628 | Application_652823628 | Application_652823628 |
| case:RequestedAmount | 20000.0 | 20000.0 | 20000.0 | 20000.0 | 20000.0 |
| FirstWithdrawalAmount | None | None | None | None | None |
| NumberOfTerms | None | None | None | None | None |
| Accepted | None | None | None | None | None |
| MonthlyCost | None | None | None | None | None |
| Selected | None | None | None | None | None |
| CreditScore | None | None | None | None | None |
| OfferedAmount | None | None | None | None | None |
| OfferID | None | None | None | None | None |
| accumulated_time | 0.0 | 0.048 | 0.47 | 81.088 | 81.099 |
| execution_time | 0.048 | 0.422 | 80.618 | 0.011 | 0.01 |
| remaining_time | 1144676.119 | 1144676.071 | 1144675.649 | 1144595.031 | 1144595.02 |
| day_of_month | -0.5 | -0.5 | -0.5 | -0.5 | -0.5 |
| day_of_week | 0.166667 | 0.166667 | 0.166667 | 0.166667 | 0.166667 |
| day_of_year | -0.5 | -0.5 | -0.5 | -0.5 | -0.5 |
| hour_of_day | -0.108696 | -0.108696 | -0.108696 | -0.108696 | -0.108696 |
| min_of_hour | 0.364407 | 0.364407 | 0.364407 | 0.381356 | 0.381356 |
| month_of_year | -0.5 | -0.5 | -0.5 | -0.5 | -0.5 |
| numerical_timestamp | 1451641875.0 | 1451641875.0 | 1451641875.0 | 1451641956.0 | 1451641956.0 |
| sec_of_min | -0.245763 | -0.245763 | -0.245763 | 0.110169 | 0.110169 |
| secs_since_sunday | -0.441344 | -0.441344 | -0.441344 | -0.44121 | -0.44121 |
| secs_within_day | -0.08941 | -0.08941 | -0.08941 | -0.088472 | -0.088472 |
| week_of_year | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 |
| resource_role | 1 | 1 | 1 | 1 | 1 |
| wip | 19.0 | 19.0 | 19.0 | 19.0 | 19.0 |
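Several timestamp columns in the table above (for example `hour_of_day` and `day_of_week`) hold scaled values rather than raw calendar fields. The displayed numbers are consistent with dividing each field by its maximum value and subtracting 0.5, which maps it to the range [-0.5, 0.5]. The sketch below reproduces four of these columns for the first and fourth events using plain pandas. This scaling rule is inferred from the table, not taken from the library's code, and the function name `scaled_calendar_features` is hypothetical.

```python
import pandas as pd


def scaled_calendar_features(ts: pd.Series) -> pd.DataFrame:
    """Scale calendar fields to [-0.5, 0.5].

    Assumed scheme inferred from the table values, not the skpm implementation.
    """
    return pd.DataFrame(
        {
            "day_of_week": ts.dt.dayofweek / 6 - 0.5,  # Monday=0 ... Sunday=6
            "hour_of_day": ts.dt.hour / 23 - 0.5,
            "min_of_hour": ts.dt.minute / 59 - 0.5,
            "sec_of_min": ts.dt.second / 59 - 0.5,
        },
        index=ts.index,
    )


# timestamps of events 0 and 3 from the table
ts = pd.Series(
    pd.to_datetime(["2016-01-01 09:51:15.304", "2016-01-01 09:52:36.392"])
)
features = scaled_calendar_features(ts).round(6)
print(features.T)
```

The first timestamp falls on a Friday at 09:51:15, so the sketch yields 4/6 - 0.5 ≈ 0.166667 for `day_of_week` and 9/23 - 0.5 ≈ -0.108696 for `hour_of_day`, matching the table.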