\documentclass[a4paper]{IEEEtran}
% A few helpful packages
\usepackage[utf8]{inputenc}
\usepackage[english]{babel}
\usepackage{graphicx}
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage{amsthm}
\usepackage{mathtools}
\usepackage{subcaption}
\usepackage{hyperref}
% Drawing
\usepackage{verbatim}
\usepackage{tikz}
\usepackage{tikz-uml} % UML Class Diagrams
\tikzumlset{fill class=white}
% Algorithm
\usepackage{algorithm}
\usepackage[noend]{algpseudocode}
% New commands for theorem and definition typesetting
\mathtoolsset{showonlyrefs}
\newtheorem{definition}{\textbf{Definition}}
\newenvironment{defin}[2][Definition]{\begin{trivlist}
\item[\hskip \labelsep {\bfseries #1}\hskip \labelsep {\bfseries #2.}]}{\end{trivlist}}
\markboth{Proseminar WS 18/19: Anthropomatics: From Theory to Practice}{Proseminar WS 18/19: Anthropomatics: From Theory to Practice}
\sloppy
% Enter the title of your own Proseminar paper here
\title{Scalable Inductive Process Mining}
% Your own name here
\author{Maximilian L. Franz}
\begin{document}
\theoremstyle{plain}
\newtheorem{thm}{Theorem}[] % reset theorem numbering for each chapter
\theoremstyle{definition}
\newtheorem{defn}[thm]{Definition} % definition numbers are dependent on theorem numbers
\newtheorem{exmp}[thm]{Example} % same for example numbers
\maketitle
% Abstract
\begin{abstract}
Since the invention of the transistor, the speed of information processing at constant cost has roughly doubled every two years. With this increase in processing power, our ability to sense, store and transfer information has also grown exponentially, so that it is now easier than ever to collect and store huge amounts of data. Data by itself, however, provides neither actionable nor even understandable information. To retrieve useful information from it, we must find meaningful, recurring patterns.
In this work, \textit{process mining} is introduced as a discipline for discovering such patterns in event logs, and \textit{inductive mining} in particular is examined as a modular approach for dealing with very large event logs that yields block-structured process trees with desirable quality guarantees.
\end{abstract}
% Erster Abschnitt
\section{Introduction}
It is no surprise that the amount of data we are able to store and process is increasing rapidly along the lines of \textit{Moore's Law}. So-called Big Data is an omnipresent theme nowadays, and not only in information technology companies: the importance of data as a source of competitive advantage and of a better understanding of one's own processes is becoming self-evident in essentially every industrial sector \cite{manyika2011bigdata}.
Considering the bulk of readily available data that industry is collecting \cite{hilbert2011worldcapacity}, we need reliable tools to retrieve information from it, because data by itself is not useful. To enable the management of companies to make insightful decisions, we have to extract actionable pieces of information. Working with data, discovering such useful patterns in large datasets and making predictions for future events is the core task of \textit{data mining} \cite{DataMiningPrinciples}. Data mining has seen big improvements in recent years through the introduction of more powerful computers and more expressive statistical models.
Process management, on the other hand, has been a part of business development for quite some time. It is the discipline of modelling the behaviours and procedures inside organizations in a formal and graphical manner \cite{jeston2014business}. However, traditional \textit{process modelling} has some major drawbacks due to the nature of its approach, which is to use human cognition either to model how the process should look or to abstract a process from vaguely observed behaviour. Being based on human judgment, process modelling introduces strong biases into the model.
Common errors of traditional \textit{process modelling} \cite{process_mining, pitfalls} include:
\begin{itemize}
\item The model describes an idealized version of reality.
\item Human behaviour cannot be captured in simply stated forms.
\item The model is at the wrong abstraction level.
\end{itemize}
This is exactly where \textit{process mining}, a combination of \textit{data mining} and \textit{process modelling}, comes in and alleviates some of these problems.
Process mining can facilitate the construction of better models in less time \cite{process_mining}. Mined models are based on the empirical reality captured in event logs and can either help discover errors in previously hand-designed process models or create new models from scratch.
\textit{Inductive process mining} is a fairly new method for discovering process trees from very large event logs by recursing on smaller sub-logs. Inductive mining has the neat property of producing only sound models by construction, and it provides quality guarantees for its results \cite{leemans2015scalable, inductivemining-constructive}.
In Section \ref{sec:terminology} we introduce the terminology and basic notation of the task at hand. We also cover some quality criteria along with the main challenges of process mining, which lead us to the choice of \textit{inductive mining}. In Section \ref{sec:inductivemining}, we introduce the idea of inductive mining to discover block-structured process models, i.e., process trees. Lastly, in Section \ref{sec:related} we briefly discuss the future prospects of the method and other common approaches.
% Notation
\section{Process Mining: Task, Terminology and Preliminaries}
\label{sec:terminology}
Process mining describes a field at the intersection of process modelling and data mining. In order to speak about \textit{inductive mining} as a particular process mining algorithm, we first introduce some basic notation for processes and the necessary data mining techniques.
\subsection{Processes, Cases and Logs}
Since we want to present an algorithm for process discovery on event logs, we first ought to define formally what is meant by a process and how it relates to a log, following \cite{process_mining}.
A process may have an arbitrary number of activities. A case belongs to one specific process and can itself have many activity instances, which manifest precisely one activity. On the event level, a case and an activity instance can have many events, which in turn have multiple event attributes. A simplified overview can be found in Figure \ref{fig:processes}. With our algorithms we want to model processes by defining the related activities. Only when we instantiate the model can we observe cases and activities which manifest themselves in events. Thus, the only source we have for our construction is a set of observed events called a log.
\begin{figure}[h!]
\centering
\begin{tikzpicture}
\umlclass[]{Process}{
}{}
\umlclass[x=3]{Case}{
}{}
\umlclass[x=6]{Event}{
}{}
\umlclass[y=-2]{Activity}{
}{}
\umlassoc[geometry=-|-, mult1=1, mult2=*, pos2=2.9, align2=right]{Process}{Case}
\umlassoc[geometry=-|-, mult1=1, mult2=*, pos2=2.9, align2=right]{Case}{Event}
\umlassoc[geometry=--, mult1=1, mult2=*]{Process}{Activity}
\end{tikzpicture}
\caption{Relation of Processes, Cases and Events. Simplified representation after \cite{process_mining} to illustrate the dependencies. }
\label{fig:processes}
\end{figure}
For practicality we only consider \textit{simple event logs} over a set of activity names $\mathcal{A}$. The \textit{simple event log} $L$ can be constructed out of a complete event log $\hat{L}$ by considering only the activity classifiers of the events without their attributes. More formally:
\begin{defn} (Simple Event Log, Simple Trace) \cite{inductivemining-constructive}
Let $\mathcal{A}$ be a set of activity names as above. A \textit{simple trace} $\sigma$ is a sequence of activities ($\sigma \in \mathcal{A}^*$) and a \textit{simple event log} is a multi-set of traces over $\mathcal{A}$, i.e., $L \in \mathbb{B}(\mathcal{A}^*)$, where $\mathbb{B}(\mathcal{A}^*)$ denotes the set of multi-sets over the domain $\mathcal{A}^*$ and $\mathcal{A}^*$ is the set of all finite sequences over $\mathcal{A}$.
\end{defn}
To illustrate, let $\mathcal{A} = \{ a,b,c \}$; then $ \sigma_1 = \langle a, c, b\rangle$ and $\sigma_2 = \langle c, b\rangle \in \mathcal{A}^*$ are traces and $L_1 = \{ \sigma_1, \sigma_2 \}$ is a log.
In the following we denote by $L$ simple event logs over generic activities identified by $\mathcal{A} = \{a, b, c, \dots \}$, by $\epsilon$ the empty trace and by $\Vert L \Vert = \sum_{\sigma \in L} |\sigma|$ the size of the log, e.g., $ \Vert L_1 \Vert = 5$.
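To make the notation concrete, the following Python sketch (purely illustrative; the names are ours and not part of any process mining library) represents a simple event log as a multi-set of traces and computes $\Vert L \Vert$:
\begin{verbatim}
from collections import Counter

# A trace is a tuple of activity names; a simple
# event log is a multi-set of traces, modelled as
# a Counter mapping each trace to its multiplicity.
L1 = Counter({("a", "c", "b"): 1, ("c", "b"): 1})

def log_size(log):
    # ||L|| = total number of events in all traces
    return sum(len(t) * m for t, m in log.items())

print(log_size(L1))  # 5, matching ||L_1|| above
\end{verbatim}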
\subsection{Process Trees}
A \textit{process tree} after \cite{leemans2015scalable} is a compact, abstract representation of a block-structured workflow net, a subclass of Petri nets, and can be transformed into most process modelling notations. Process trees are sound process models by construction and describe a regular language.
A process model is informally called \textit{sound} if (i) any execution terminates in one of the pre-defined end states, and (ii) for every activity in the model there exists at least one execution that includes it \cite{soundness}.
\begin{defn} (Process Trees) \cite{inductivemining-constructive}
A \textit{process tree} is a graph $Q = (V, E)$ with the structure of a rooted tree. Vertices $v \in V$ are either inner nodes or leaves. Leaves are labeled with activities $a \in \mathcal{A}$ and inner nodes are labeled with operators from $\bigoplus = \{ \times, \, \circlearrowleft, \rightarrow, \wedge \}$, which denote different causal relationships between their children. Edges $e \in E$ connect nodes with sub-trees or activities, never activities with activities, since trees are \textit{directed acyclic graphs} \cite{treeIntroduction}.
\end{defn}
To formally describe the semantics of a process tree, we look at the recursive monotonic function $\mathcal{L}(Q)$, which maps a tree to the language (i.e., the set of traces) it can produce:
\begin{align*}
\mathcal{L}(a) &= \{ \langle a \rangle\} \text{ for } a \in \mathcal{A}, \\
\mathcal{L}(\tau) &= \{ \epsilon \} \text{ and }\\
\mathcal{L}(\oplus(Q_1, \dots, Q_n)) &= \oplus_l(\mathcal{L}(Q_1), \dots, \mathcal{L}(Q_n)),
\end{align*}
where the $Q_i$ for $ i=1, \dots, n$ are sound subtrees (or single activities) and $\tau$ is the silent activity, which cannot be observed in the log. Thus, process trees are defined recursively.
Formally, the different language join functions $\oplus_l$ are
\begin{align}
\label{eq:operators}
\times_l(L_1, \dots, L_n) &= \bigcup_{i} L_i, \\
\rightarrow_l(L_1, \dots, L_n) &= \{ t_1 \cdot \ldots \cdot t_n \,|\, \forall i : t_i \in L_i\}, \\
\wedge_l(L_1, \dots, L_n) &= L_1 \diamond L_2 \diamond \dots \diamond L_n, \\
\circlearrowleft_l(L_1, \dots, L_n) &= \{ t_1 \cdot t'_1 \cdot t_2 \cdot t'_2 \cdots t_m \,|\, \forall i: \\
& \qquad t_i \in L_1 \wedge t'_i \in \times_l(L_2,\dots, L_n)\},
\end{align}
where $L_i = \mathcal{L}(Q_i)$ for $i = 1, \dots, n$. In the loop join $\circlearrowleft_l$, the $t_i$ are traces of the loop body $L_1$ and the $t'_i$ are traces of the redo parts $L_2, \dots, L_n$, so every loop execution starts and ends with a body trace.
In order to understand the $\wedge$ join, we introduce a shuffle operator $\diamond$, which takes two sequences $\sigma_1, \sigma_2$ and generates the set of all interleaved sequences, where the ordering of the original sequences is preserved. E.g.,
\begin{align*}
\langle p,r\rangle \diamond \langle o,m \rangle = &\{ \langle p,r,o,m \rangle, \langle p,o,r,m \rangle, \langle p,o,m,r \rangle, \\
& \langle o,m,p,r \rangle \langle o,p,m,r\rangle, ... \}.
\end{align*}
This operator can be generalized to sets of sequences by $S_1 \diamond S_2 = \bigcup \{ \sigma_1 \diamond \sigma_2 \,|\, \sigma_1 \in S_1 \wedge \sigma_2 \in S_2 \}$ and, since it is associative, also to multiple sets $S_1, \dots, S_n$.
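To make these semantics tangible, here is a small Python sketch (illustrative code under the definitions above, not taken from the cited papers) that implements the join functions for $\times$, $\rightarrow$ and $\wedge$ on finite languages represented as sets of tuples; the loop join is omitted because its language is infinite:
\begin{verbatim}
from itertools import product

def choice(*langs):   # x_l: union of the languages
    return set().union(*langs)

def seq(*langs):      # ->_l: concatenations t_1...t_n
    return {sum(ts, ()) for ts in product(*langs)}

def shuffle(s1, s2):  # all interleavings of 2 traces
    if not s1: return {s2}
    if not s2: return {s1}
    return {(s1[0],) + t for t in shuffle(s1[1:], s2)} | \
           {(s2[0],) + t for t in shuffle(s1, s2[1:])}

def parallel(*langs): # /\_l: shuffle one trace per language
    result = {()}
    for lang in langs:
        result = {t for r, s in product(result, lang)
                    for t in shuffle(r, s)}
    return result

# Language of the subtree /\(x(b,c), d) of the
# example tree in Figure 2:
print(parallel(choice({("b",)}, {("c",)}), {("d",)}))
# {('b','d'), ('d','b'), ('c','d'), ('d','c')}
\end{verbatim}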
\begin{figure}[h!]
\centering
\usetikzlibrary{graphdrawing.trees}
\begin{tikzpicture}[nodes={draw, circle}, ->, level 2/.style={sibling distance=15mm}, level distance=0.7cm,]
\node{$\rightarrow$}
child { node{$a$} }
child { node {$\circlearrowleft$}
child { node {$\rightarrow$}
child {node {$\wedge$}
child {node {$\times$}
child {node {$b$}}
child {node {$c$}}
}
child {node {$d$}}
}
child {node {$e$}}
}
child { node {$f$} }
}
\end{tikzpicture}
\caption{An example process tree $Q_e$}
\label{fig:tree}
\end{figure}
See Figure \ref{fig:tree} for an example process tree $Q_e$. We can also represent this tree textually starting from the root:
\begin{equation}
Q_e \cong \quad \rightarrow\big(a, \circlearrowleft(\rightarrow(\wedge(\times(b,c),d),e),f)\big).
\end{equation}
Possible traces $\sigma_1, \sigma_2 \in \mathcal{L}(Q_e)$ that can be replayed on this model are $\sigma_1 = \langle a, c, d, e\rangle $ or $\sigma_2 = \langle a, c, d, e, f, d, b, e\rangle$, where $\sigma_2$ makes use of the loop construct and of the shuffle operator in the second loop execution $(f, d, b, e)$.
Each process tree operator has a formal translation to a sound, block-structured workflow Petri net (WF-net). Thus any process tree can be translated into a Petri net, which in turn can be translated into almost any workflow notation (e.g. YAWL, BPMN, EPCs). For details, see \cite{buijs2012treetranslation}.
Given these notions we can now more or less formally define what we mean by \textit{process discovery}.
\begin{defn} (Process Discovery Algorithm) \cite{process_mining}
A \textit{process discovery algorithm} can be understood as a function $\gamma$ which maps a simple event log $L$ onto a marked Petri net, which is ideally sound.
\end{defn}
In our case the function maps to a \textit{process tree}, which is sound by construction and equivalent up to isomorphism to a Petri net.
\subsection{Directly-Follows Graph}
Within the \textit{Inductive Mining} we will work with \textit{directly-follows relations} and corresponding \textit{directly-follows graphs}, which we quickly introduce here.
\begin{defn}(Directly-Follows Graph (DFG)) \cite{process_mining} Let $L$ be a simple event log. The \textit{directly-follows graph} of $L$ is $G(L) = ( A_L, \mapsto_L, A_L^{start}, A_L^{end})$ where
\begin{align}
A_L &= \{ a \in \sigma \,|\, \sigma \in L\}, \text{ the set of activities}, \\
\mapsto_L &= \{ (a,b) \in A_L \times A_L \,|\, a >_L b \}, \text{ the follows relation}, \\
A_L^{start} &= \{ a \in A_L \,|\, \exists_{\sigma \in L}\, a = first(\sigma) \}, \text{ the start activities, and } \\
A_L^{end} &= \{ a \in A_L \,|\, \exists_{\sigma \in L}\, a = last(\sigma) \}, \text{ the end activities}.
\end{align}
For $a, b \in A_L$, $a >_L b$ means that $a$ is directly followed by $b$ somewhere in the log. In the following we use the short form DFG for a directly-follows graph.
\end{defn}
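A DFG can be extracted in a single pass over the log. The following Python sketch (illustrative, not library code) computes $G(L)$ for a log given as a list of traces, here for the log $L_1$ from above:
\begin{verbatim}
def dfg(log):
    # Build (A_L, ->_L, A_start, A_end) from an
    # iterable of traces (tuples of activity names).
    acts, follows = set(), set()
    start, end = set(), set()
    for trace in log:
        if not trace:
            continue
        acts.update(trace)
        start.add(trace[0])
        end.add(trace[-1])
        follows.update(zip(trace, trace[1:]))
    return acts, follows, start, end

A, F, S, E = dfg([("a", "c", "b"), ("c", "b")])
print(F)     # {('a', 'c'), ('c', 'b')}
print(S, E)  # start {'a', 'c'}, end {'b'}
\end{verbatim}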
Using standard graph algorithms, we can perform meaningful cuts on DFGs.
\begin{defn} (Cut) \cite{process_mining} Let $G(L)$ be a DFG. An \textit{$n$-ary cut} of $G(L)$ is a partition of $A_L$ into pairwise disjoint sets $A_1, \dots, A_n$. Based on the semantic relation of the partitions, we write a cut as $(\oplus, A_1, \dots, A_n)$. An \textit{$n$-ary cut} is \textit{maximal} if there exists no cut $(\oplus, (A_1, \dots, A_{m}))$ of $G(L)$ with $m > n$. A cut is \textit{nontrivial} if $n > 1$.
\end{defn}
A log $L$ is called \textit{directly-follows complete} with respect to a process tree $Q$ if (i) all activities in $Q$ appear in the log, (ii) for every start or end activity in $Q$ there is a trace starting or ending with that activity, and (iii) $a \mapsto_L b$ if $b$ can directly follow $a$ in $Q$.
\subsection{Quality Criteria of Process Models}
\label{quality}
In \cite{process_mining}, van der Aalst introduces four competing quality criteria for process models, which we quickly present here to get a sense of the challenges posed to \textit{process discovery} at large.
\begin{itemize}
\item \textit{Fitness}: A model is called \textit{fit} if it is able to replay the log from which it was generated.
\item \textit{Simplicity}: Along the lines of \textit{Occam's Razor}, the best model is the one with the simplest structure.
\item \textit{Generalization}: Refers to the problem of \textit{overfitting} in machine learning, where the model is heavily specialized to the traces encountered in the log. A model \textit{generalizes} if it can account for behaviour generated by the original process that is not in the log.
\item \textit{Precision}: A model is \textit{precise} if it does not allow for `too much' behaviour, that is, if it properly replays structures visible in the log but does not allow for random behaviour.
\end{itemize}
To illustrate, consider a \textproc{FlowerModel}$(L)$, a model that is able to replay any log over the activities in $L$. It is thus \textit{fit} and reasonably \textit{simple}, because it only contains one redo-loop node. However, a \textproc{FlowerModel}$(L)$ is obviously not \textit{precise}, since it allows for arbitrary behaviour. An enumeration model \textproc{Enum}$(L)$, on the other hand, is built by having a distinct start-to-end path for every trace encountered in the log. This way, \textproc{Enum}$(L)$ is perfectly \textit{precise} but horribly \textit{overfitted}, because no behaviour is possible that is not in the log $L$.
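In process tree notation, the flower model over activities $\{a_1, \dots, a_n\}$ can be written compactly as $\circlearrowleft(\tau, a_1, \dots, a_n)$: the loop body is the silent activity $\tau$, and each redo part replays a single activity, so every finite trace over these activities is in the language of the model.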
\section{Inductive Mining}
\label{sec:inductivemining}
Inductive Mining describes a framework which aims to discover block-structured process models (i.e. \textit{process trees}) from event logs \cite{inductivemining-constructive}.
\subsection{Inductive Mining - General Idea}
Given a set $\bigoplus$ of operators as above, the idea is to search for possible splits of $L$ into smaller sub-logs $L_1, \dots, L_n$ such that the sub-logs, combined by the respective operator, yield $L$ again. This selection step is performed by some, for now abstract, procedure $select(L)$, for which certain essential properties must hold. The algorithm then recurses on the found sub-logs until a base case is reached (e.g. $L = \{ \langle a \rangle \}$, $L = \{ \epsilon \}$ or the empty log).
The framework, independent of the actual discovery of cuts, is depicted in Algorithm \ref{inductive}. Due to space constraints we will focus on one concrete implementation of the framework provided in \cite{inductivemining-constructive}, which is specified by the selection algorithm $select_{B'}(L)$ in Section \ref{ssub:efficientselect}.
The counter variable $\phi$ allows the $select(L)$ procedure to return a sub-log of equal size $\Vert L_i \Vert = \Vert L \Vert$, while only doing so finitely often; otherwise, termination could not be guaranteed. The base cases in lines 1 to 6 describe the behaviour when the algorithm encounters empty logs or logs with only one activity, and thus define the terminating steps of the recursion.
\begin{algorithm}[h!]
\caption{Recursive $B_{select}(L, \phi)$}
\begin{algorithmic}[1]
\Require the log $L$, counter variable $\phi$
% \State base $\gets $ \Call{SetBaseCase}{L}
\If{$L = \{\epsilon\}$} \Comment{handle base cases}
\State base $\gets \{\tau\}$
\ElsIf{$\exists a \in \mathcal{A} : L = \{ \langle a \rangle \}$}
\State base $\gets \{ a \}$
\Else
\State base $\gets \varnothing$
\EndIf
\State $P \gets select(L)$
\If{$|P| = 0$}
\If{base $= \varnothing$}
\State \textbf{return} \Call{FlowerModel}{L}
\Else
\State \textbf{return} base
\EndIf
\EndIf
\State \textbf{return} $B(P)$ \Comment{Recursion occurs here}
\end{algorithmic}
\label{inductive}
\end{algorithm}
Here,
\begin{align*}
B(P) = & \Big\{ \oplus (M_1, \dots ,M_n) \quad | \\ &(\oplus, ((L_1, \phi_1), \dots, (L_n, \phi_n))) \in P \\ &\wedge M_i \in B(L_i, \phi_i) \Big\} \cup \text{base}
\end{align*}
and $\textproc{FlowerModel}(L)$ returns a loop that can replay any log containing the activities in $L$.
We see that the generic procedure $select(L)$ returns tuples $(\oplus, ((L_1, \phi_1), \dots, (L_n, \phi_n)))$, for which all of the following conditions must hold.
\begin{itemize}
\item The operator applied to the sub-logs $L_i$ must yield a super-set of $L$.
\item $\forall i : \Vert L_i \Vert + \phi_i < \Vert L \Vert + \phi $, so that the recursion terminates.
\item $\forall i : \Vert L_i \Vert \leq \Vert L \Vert$.
\item $\forall i : \phi_i \leq \phi$.
\item Activities in $L_i$ must also occur in $L$, so that each $L_i$ is a sound sub-log (i.e., $A_{L_i} \subseteq A_L$).
\item The number of sub-logs must satisfy $n \leq \Vert L \Vert + \phi$.
\label{list:properties}
\end{itemize}
Given this, termination and fitness of the method can be proven for a generic $select(L)$ (\cite{inductivemining-constructive}, Theorems 2 and 3).
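To summarize the control flow, the following Python sketch mirrors Algorithm \ref{inductive} for a pluggable selection procedure. It is a simplification under the properties above (in particular, it returns the base case directly instead of collecting all options in $B(P)$), and all names are ours, not from \cite{inductivemining-constructive}:
\begin{verbatim}
def inductive_mine(log, select, phi=0):
    # Base cases: only the empty trace, or a single
    # one-activity trace (possibly repeated).
    if set(log) == {()}:
        return "tau"
    acts = {a for trace in log for a in trace}
    if len(acts) == 1 and all(len(t) == 1 for t in log):
        return acts.pop()
    # Ask the selection procedure for an operator
    # and sub-logs; None models |P| = 0.
    found = select(log, phi)
    if found is None:
        return flower_model(acts)  # fall-through model
    op, parts = found              # e.g. ("seq", [...])
    return (op, [inductive_mine(sub, select, p)
                 for sub, p in parts])

def flower_model(acts):
    # Flower model as a tree: loop with silent body.
    return ("loop", ["tau"] + sorted(acts))
\end{verbatim}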
\subsection{Efficient Select}
\label{ssub:efficientselect}
Let us now consider a concrete selection procedure called $select_{B'}(L)$ in \cite{inductivemining-constructive}:
Using the DFG of a log $L$, we try to find the dominant operator that orders the behaviour in the graph. A \textit{directed acyclic graph}, for example, can always be split meaningfully by a sequence cut. Given the cut and the corresponding operator, we split the log and perform the same procedure on the sub-logs $L_1, \dots, L_n$, as outlined above. The cuts explained below closely resemble existing graph algorithms for (strongly) connected components. Given the sub-graphs (i.e., subsets of activities $A_i$), each cut has a corresponding \textproc{Split}$(L, (A_1, \dots, A_n))$ function, which maps the log $L$ onto the sub-logs $L_i$ according to the cut specified by $A_1, \dots, A_n$.
We now define the four cuts (after \cite{process_mining, inductivemining-constructive}) corresponding to the operators in \eqref{eq:operators}. Let $G$ be a DFG and let $A_1, \dots, A_n$ be sets of activities with $A_i \subseteq A_L$.
\begin{defn} (Exclusive Choice Cut)
An \textit{exclusive choice cut} is a cut ($\times, (A_1, \dots, A_n)$) of $G$ such that
$$
\forall i,j \in \{1, \dots, n\},\ a \in A_i, b \in A_j: i \neq j \Rightarrow a \not \mapsto_L b.
$$
\end{defn}
\begin{defn} (Sequence Cut)
A \textit{sequence cut} is an ordered cut ($\rightarrow, (A_1, \dots, A_n$)) of $G$ such that
$$
\forall i,j \in \{1,\dots, n\},\ a \in A_i, b \in A_j: i < j \Rightarrow (a \mapsto_L^{+} b \,\wedge\, b \not\mapsto^{+}_L a),
$$
where $a \mapsto_L^{+} b$ means that $a$ is eventually (not necessarily directly) followed by $b$ in the log, i.e., $\mapsto_L^{+}$ is the transitive closure of $\mapsto_L$.
\end{defn}
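The relation $\mapsto_L^{+}$ can be obtained from $\mapsto_L$ with any standard transitive closure algorithm; a minimal Python sketch (illustrative only):
\begin{verbatim}
def eventually_follows(acts, follows):
    # Transitive closure ->+_L of ->_L via the
    # Floyd-Warshall scheme, O(|A_L|^3).
    reach = set(follows)
    for k in acts:
        for a in acts:
            for b in acts:
                if (a, k) in reach and (k, b) in reach:
                    reach.add((a, b))
    return reach
\end{verbatim}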
\begin{defn} (Parallel Cut)
A \textit{parallel cut} is a cut ($\wedge,( A_1, \dots, A_n$)) of $G$ such that
\begin{align*}
&\forall i \in \{1,\dots, n\}: A_i \cap A_L^{start} \neq \varnothing \wedge A_i \cap A_L^{end} \neq \varnothing, \text{ and } \\
&\forall i,j \in \{1, \dots, n\},\ a \in A_i, b \in A_j : i \neq j \Rightarrow a \mapsto_L b \wedge b \mapsto_L a.
\end{align*}
\end{defn}
\begin{defn} (Loop Cut)
A \textit{loop cut} is a partially ordered cut ($\circlearrowleft,( A_1, \dots, A_n$)) of $G$ such that
\begin{enumerate}
\item $n \geq 2 \text{ and } A_L^{start} \cup A_L^{end} \subseteq A_1 $,
\item $ \{ a \in A_1 \,|\, \exists i \geq 2\, \exists b \in A_i: a \mapsto_L b \} \subseteq A_L^{end} $ \\ and, similarly, $ \{ a \in A_1 \,|\, \exists i \geq 2\, \exists b \in A_i: b \mapsto_L a \} \subseteq A_L^{start} $,
\item $ \forall i,j \geq 2,\, i \neq j,\ \forall a \in A_i, b \in A_j : a \not \mapsto_L b$,
\item $ \forall i \geq 2\ \forall a \in A_i : (\exists b \in A_L^{end} : b \mapsto_L a) \Rightarrow (\forall b' \in A_L^{end} : b' \mapsto_L a)$ \\ and, similarly, $(\exists b \in A_L^{start} : a \mapsto_L b) \Rightarrow (\forall b' \in A_L^{start} : a \mapsto_L b')$.
\end{enumerate}
\end{defn}
Thus, in the concrete $select_{B'}$ (Algorithm \ref{select}), we take a log, construct a directly-follows graph, search for cuts and create the sub-trees. Using the cuts, we project the original log onto the sub-logs allowed by the respective sub-trees; this projection is called \textproc{Split}. In this regard, a process tree $ Q $ is called \textit{language-rediscoverable} by an inductive miner $B$ if, for any directly-follows complete event log $L$ generated from $Q$, it holds that $\mathcal{L}(B(L)) = \mathcal{L}(Q)$ \cite{process_mining}.
Using these cuts and their respective \textproc{Split} functions we can construct $select_{B'}(L)$ as in Algorithm \ref{select}. We write \textit{n.m.\ cut} as short for nontrivial maximal cut.
\begin{algorithm}[h!]
\caption{$select_{B'}(L)$ from \cite{inductivemining-constructive}}
\begin{algorithmic}[1]
\If{$\epsilon \in L \vee L = \{ \langle a \rangle \}$ for $a \in A_L$}
\State \textbf{return} $\varnothing$
\EndIf
\If{$c \gets$ n.m.\ exclusive-choice cut of $G(L)$}
\State $L_1, \dots, L_n \gets \Call{ECSplit}{L, (A_1, \dots, A_n)} $
\State \textbf{return} $\{\times, ((L_1, 0), \dots, (L_n, 0))\}$
\EndIf
\If{$c \gets$ n.m.\ sequence cut of $G(L)$}
\State $L_1, \dots, L_n \gets \Call{SequenceSplit}{L, (A_1, \dots, A_n)} $
\State \textbf{return} $\{\rightarrow, ((L_1, 0), \dots, (L_n, 0))\}$
\EndIf
\If{$c \gets$ n.m.\ parallel cut of $G(L)$}
\State $L_1, \dots, L_n \gets \Call{ParallelSplit}{L, (A_1, \dots, A_n)} $
\State \textbf{return} $\{\wedge, ((L_1, 0), \dots, (L_n, 0))\}$
\EndIf
\If{$c \gets$ n.m.\ loop cut of $G(L)$}
\State $L_1, \dots, L_n \gets \Call{LoopSplit}{L, (A_1, \dots, A_n)} $
\State \textbf{return} $\{\circlearrowleft, ((L_1, 0), \dots, (L_n, 0))\}$
\EndIf
\State \textbf{return} $\varnothing$
\end{algorithmic}
\label{select}
\end{algorithm}
Note that the order in which the different cuts in $\bigoplus$ are considered is irrelevant (\cite{inductivemining-constructive}, Lemma 16), but it is handy to consider them in the order depicted above. We can then prove that the properties required of $select(L)$ listed above hold by considering them for a fixed $\phi = 0$ \cite{inductivemining-constructive}.
Note also that this algorithm has polynomial time complexity, although the generation of the DFG and the cut detection run in linear time \cite{evermann2016scalable, tarjan1972depth}; the polynomial complexity comes from recursing on each returned sub-log. The efficiency of inductive mining can be further improved by recursing directly on the DFG rather than on the sub-logs.
\subsection{Finding Cuts in the Directly-Follows Graph}
To finish our discussion of \textit{inductive mining}, we consider how the cuts outlined above can be found in the \textit{directly-follows graph} $G$ of a log $L$ using existing graph algorithms for \textit{connected components} (simple breadth-first search) as well as \textit{strongly connected components} (see \cite{tarjan1972depth} for details). For clarity, we do this for an example log $L_2 = \{ \langle a,b,c,d \rangle, \langle a,c,b,d \rangle, \langle a,e,d \rangle \}$. Constructing the DFG by simply considering which activity follows which yields $G_L = G(L_2)$, as shown in Figure \ref{graph:dfg}.
\begin{figure}[h!]
\centering
\begin{tikzpicture}[
> = stealth, % arrow head style
shorten > = 1pt, % don't touch arrow head to node
auto,
node distance = 1.5cm, % distance between nodes
semithick % line style
]
\tikzstyle{every state}=[
draw = black,
thick,
fill = white,
minimum size = 4mm
]
\node[state] (a) {$a$};
\node[state] (c) [right of=a] {$c$};
\node[state] (b) [above of=c] {$b$};
\node[state] (e) [below of=c] {$e$};
\node[state] (d) [right of=c] {$d$};
\path[->] (a) edge node {} (b);
\path[->] (a) edge node {} (c);
\path[->] (a) edge node {} (e);
\path[->] (c) edge[bend right] node {} (b);
\path[->] (b) edge[bend right] node {} (c);
\path[->] (b) edge node {} (d);
\path[->] (c) edge node {} (d);
\path[->] (e) edge node {} (d);
\draw[red, dashed] (0.5, 1) -- (0.5, -1);
\draw[red, dashed] (2.5, 1) -- (2.5, -1);
\draw[green, dashed] (1, -0.7) -- (2, -0.7);
\draw[blue, dashed] (1, 0.7) -- (2, 0.7);
\end{tikzpicture}
\caption{DFG $ G_L = G(L_2)$ constructed from $L_2$}
\label{graph:dfg}
\end{figure}
Now we look for cuts in the order $(\times, \rightarrow, \wedge, \circlearrowleft)$. We note that no \textit{exclusive-choice cut} can be found in $G_L$, because all nodes are connected. Thus we consider a \textit{sequence cut} and find that the red dashed lines provide one. To compute it, one would collapse all strongly connected components and pairwise unreachable nodes into single nodes, which then represent the sub-graphs of a sequence cut. We now continue on the subgraphs $G_{La}, G_{Lbce}, G_{Ld}$ that are created by splitting $G_L$ at the dashed red lines. Note that $G_{La}$ and $G_{Ld}$ already represent base cases (only one activity) and the recursion finishes there (the sub-tree that $select_{B'}(Ld)$ would return is simply the single node $Q_{Ld} = (d)$). On $G_{Lbce}$ we find an \textit{exclusive-choice} cut, represented by the green dashed line in Figure \ref{graph:dfg}, yielding the sub-graphs $ G_{Lbc}, G_{Le}$, of which we again only consider $G_{Lbc}$. There we finally find a parallel cut, depicted by the blue dashed line. All in all, we found the cuts $(\rightarrow, \times, \wedge)$ in top-down order and no \textit{loop cut}. Thus the inductive miner yields
$$
Q_L = B_{select}(L_2,0) = \rightarrow(a, \times(\wedge(b,c), e),d),
$$
or, more illustratively, the tree in Figure \ref{fig:mined_tree}.
\begin{figure}[h!]
\centering
\usetikzlibrary{graphdrawing.trees}
\begin{tikzpicture}[nodes={draw, circle}, ->, level 2/.style={sibling distance=15mm}, level distance=1.0cm,]
% Tree structure
\node{$\rightarrow$}
child { node{$a$} }
child { node {$\times$}
child {node {$\wedge$}
child {node {$b$}}
child {node {$c$}}
}
child {node {$e$}}
}
child { node{$d$} }
\end{tikzpicture}
\caption{ Process tree $Q_L$ mined from $L_2$ with inductive mining after \cite{inductivemining-constructive}}
\label{fig:mined_tree}
\end{figure}
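The cut detection in this walkthrough can be phrased directly in terms of standard graph algorithms. As a final hedged sketch (illustrative code, not the ProM implementation), an exclusive-choice cut is simply the partition of the DFG, viewed as an undirected graph, into its connected components:
\begin{verbatim}
def exclusive_choice_cut(acts, follows):
    # Connected components of the undirected DFG; a
    # partition with more than one component is an
    # exclusive-choice cut, otherwise none exists.
    nbrs = {a: set() for a in acts}
    for a, b in follows:
        nbrs[a].add(b)
        nbrs[b].add(a)
    parts, seen = [], set()
    for a in acts:
        if a in seen:
            continue
        comp, stack = set(), [a]  # depth-first search
        while stack:
            v = stack.pop()
            if v not in comp:
                comp.add(v)
                stack.extend(nbrs[v] - comp)
        seen |= comp
        parts.append(comp)
    return parts if len(parts) > 1 else None

# Sub-graph G_Lbce from above: only b->c, c->b remain.
print(exclusive_choice_cut({"b", "c", "e"},
                           {("b", "c"), ("c", "b")}))
# two components {'b','c'} and {'e'}:
# the cut (x, {b,c}, {e})
\end{verbatim}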
Considering again the original log $L_2$, we see that all traces in the log can be replayed; thus the model is \textit{fit} and, by construction, also \textit{sound}. That is, every execution of $Q_L$ concludes in the end state $d$ and there are no dead ends. If the log $L_2$ were \textit{directly-follows complete} with respect to the original model from which it was drawn, then our tree $Q_L$ would be language-equivalent to that model. Mind, however, that such an \textit{original model} does not usually exist, since we are mining precisely in order to extract a new model.
\section{Discussion \& Related Work}
\label{sec:related}
As outlined in Section \ref{quality}, process discovery faces some of the common statistical problems that keep recurring due to the nature of the task. Further challenges for the field are described in \cite{van2004processagenda}. Some of them are technical in nature, like \textit{noisy data}, \textit{incompleteness} or the \textit{visualization of results}, while others point to a more abstract epistemological problem. Mining processes, for example, always imposes a \textit{representational bias} by limiting the range of models that can possibly be discovered (\textit{process trees} in our case). Furthermore, process mining as presented here can by nature only consider one perspective of organizational processes, namely the \textit{control-flow perspective}, while other views like the \textit{information} or \textit{organization perspective} are neglected \cite{van2004processagenda}. One also has to keep in mind that, as of now, most discovery techniques work only with positive examples: they consider only events found in the log, while negative examples, i.e., behaviour that must not occur, are unavailable.
Keeping these structural limitations in mind is an essential requirement for further work on the topic.
Since \cite{processessoftware}, many approaches from various fields have been tried and tested to tackle some of the more technical problems mentioned above. Genetic algorithms have proven to achieve very high precision \cite{van2011geneticsoundness}, but have impractical run-times due to their evolutionary approach. Heuristic mining uses statistics to deal with infrequent behaviour (i.e. noise) during model construction. In \cite{processessoftware}, \textit{neural nets} and \textit{Markov models} are also considered, besides a simpler algorithmic approach.
Among this vast collection of possibilities, \textit{inductive mining} has been shown to have desirable properties both in quality (soundness, rediscoverability, normalization) and in performance, due to its recursive approach.
Most importantly, inductive mining provides a modular framework which can be extended in the future. For example, normalization can be applied to simplify models post hoc, as in \cite{fahland2013simplifying}. Performance can be increased by working on a single DFG rather than projecting the original log onto sub-logs \cite{leemans2015scalable, evermann2016scalable}.
To close, inductive mining opens a promising path toward a better understanding of the bulk of process data that is collected, while doing so in a deterministic and rather straightforward manner.
\bibliographystyle{ieeetr}
\bibliography{references}
\end{document}