Process discovery can be powerful, but it often fails for a simple reason: the event log is messy. Real operational systems produce logs with duplicates, missing values, inconsistent activity names, bot-driven actions, and edge-case variants that make the discovered model look like “spaghetti”. Filtering event logs is not about hiding reality. It is about preparing the data so the discovered process is readable, diagnosable, and useful for improvement work. If you are building skills through a business analyst course, log filtering is a practical capability because it connects raw system data to clear process insights that stakeholders can act on.
This article explains advanced event log filtering techniques to remove noise, manage outliers, and exclude specific activities so that process discovery becomes simpler and more meaningful.
1) Why event logs become noisy in the first place
An event log is typically a table with at least: Case ID (process instance), Activity (event name), Timestamp, and sometimes Resource, Channel, or Cost. Noise enters when:
- Multiple systems record the same action with different labels.
- Automated jobs create events that are not part of the “human process”.
- Users perform workarounds that create rare but complex paths.
- Data quality issues generate missing timestamps or duplicate events.
If you run discovery directly on such logs, you may get a model with too many variants, unclear decision points, and loops that represent logging artefacts rather than genuine operational behaviour. Filtering helps you focus on the “dominant process” first and then reintroduce complexity intentionally.
2) Noise and outlier filtering at the event level
Event-level filtering removes or corrects problematic records without removing entire cases.
Remove duplicates and near-duplicates
Duplicates can occur when systems retry logging or write multiple events for the same user action. Common approaches:
- Deduplicate by (Case ID, Activity, Timestamp) within a small time window.
- Keep the first event and drop subsequent duplicates within, for example, 1-5 seconds depending on the system.
This prevents artificial loops such as “Submit → Submit → Submit”.
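As a minimal sketch of time-window deduplication with pandas (the column names, log contents, and the 5-second window are illustrative assumptions, not a standard schema):

```python
import pandas as pd

# Hypothetical event log; column names are assumptions, not a standard.
log = pd.DataFrame({
    "case_id":   ["C1", "C1", "C1", "C1"],
    "activity":  ["Submit", "Submit", "Submit", "Review"],
    "timestamp": pd.to_datetime([
        "2024-01-01 09:00:00",
        "2024-01-01 09:00:02",   # retry 2 s later -> near-duplicate, dropped
        "2024-01-01 09:00:10",   # 10 s later -> treated as a genuine repeat, kept
        "2024-01-01 10:00:00",
    ]),
})

log = log.sort_values(["case_id", "activity", "timestamp"])
# Gap to the previous event with the same case and activity
gap = log.groupby(["case_id", "activity"])["timestamp"].diff()
# Keep the first event; drop repeats arriving within 5 seconds of the previous one
deduped = log[gap.isna() | (gap > pd.Timedelta(seconds=5))]
```

Note the window is measured against the previous surviving duplicate's predecessor, so a long chain of rapid retries may need an extra pass; for most logging-retry patterns a single pass is enough.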
Standardise activity names
Before filtering, normalise labels:
- Trim whitespace, unify case, fix typos.
- Map synonyms to a single activity (“Approve Request” vs “Approval Completed”).
- Split overloaded events if needed (e.g., “Payment Updated” might include success/failure states that should be separate attributes rather than separate activities).
This is one of the highest-impact steps because process discovery is sensitive to activity name variation.
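A small normalisation sketch, assuming the synonym map is maintained by hand (the example labels and mappings are invented for illustration):

```python
import pandas as pd

raw = pd.Series([" approve request", "Approve Request ",
                 "Approval Completed", "aprove request"])

# Hypothetical synonym/typo map; in practice this comes from a reviewed mapping table.
synonyms = {
    "aprove request": "approve request",      # typo
    "approval completed": "approve request",  # synonym from another system
}

cleaned = (raw.str.strip()          # trim whitespace
              .str.lower()          # unify case before matching
              .replace(synonyms)    # map typos and synonyms to one canonical label
              .str.title())         # consistent display form
```

All four raw labels collapse to a single canonical activity, which is exactly what discovery algorithms need to avoid splitting one real step into several nodes.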
Filter known “non-process” events
Some events represent infrastructure rather than the business flow:
- heartbeat events
- autosave events
- system sync events
- background notification logs
If these do not represent actual steps in the process you want to analyse, filter them out early. Keep a rule list and document it so the analysis is reproducible.
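The rule list can be as simple as a documented, versioned set of excluded labels (the entries below are illustrative):

```python
import pandas as pd

log = pd.DataFrame({
    "case_id":  ["C1", "C1", "C1", "C2"],
    "activity": ["Case Created", "heartbeat", "autosave", "Case Created"],
})

# Documented exclusion list; keep this under version control so the
# analysis is reproducible and auditable.
NON_PROCESS_EVENTS = {"heartbeat", "autosave", "system sync",
                      "background notification"}

filtered = log[~log["activity"].isin(NON_PROCESS_EVENTS)]
```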
3) Case-level filtering to simplify variants and focus the analysis
Case-level filtering removes entire process instances (cases) that distort the model or fall outside the scope.
Variant frequency filtering
Most processes have a long tail of rare variants. A practical technique is:
- Keep only the top variants that cover, say, 80-95% of cases.
- Or remove variants that appear fewer than N times.
This produces a readable “core model” that you can validate with stakeholders. Later, you can analyse excluded variants separately as exceptions, compliance risks, or special handling cases.
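A coverage-based variant cut can be sketched as follows, assuming the log is already time-ordered within each case (the 80% threshold and the toy data are illustrative):

```python
import pandas as pd

log = pd.DataFrame({
    "case_id":  ["C1", "C1", "C2", "C2", "C3", "C3", "C4", "C4", "C4"],
    "activity": ["A", "B", "A", "B", "A", "B", "A", "C", "B"],
})

# A variant is the ordered sequence of activities in a case
variants = log.groupby("case_id")["activity"].agg(tuple)
counts = variants.value_counts()

# Keep the most frequent variants that together cover 80% of cases
coverage = counts.cumsum() / counts.sum()
kept_variants = coverage[coverage <= 0.80].index
if len(kept_variants) == 0:          # always keep at least the top variant
    kept_variants = counts.index[:1]

core_cases = variants[variants.isin(kept_variants)].index
core_log = log[log["case_id"].isin(core_cases)]
```

In this toy log the variant (A, B) covers three of four cases, so the rare (A, C, B) case is set aside for separate exception analysis rather than discarded silently.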
Duration-based outlier filtering
Cases with extreme durations can dominate performance analysis and create misleading bottlenecks. Use robust thresholds:
- Identify outliers using percentiles (e.g., remove above the 99th percentile) rather than mean ± standard deviation.
- Alternatively, separate “paused” cases (waiting for customer, external approval) from active processing time if those states can be identified.
This helps the discovered process represent typical execution rather than worst-case delay scenarios.
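A percentile-based duration cut might look like this (case durations here are taken as first-to-last event; the 99th-percentile threshold and the data are illustrative):

```python
import pandas as pd

log = pd.DataFrame({
    "case_id": ["C1", "C1", "C2", "C2", "C3", "C3"],
    "timestamp": pd.to_datetime([
        "2024-01-01", "2024-01-02",   # C1: 1 day
        "2024-01-01", "2024-01-03",   # C2: 2 days
        "2024-01-01", "2024-03-01",   # C3: 60 days -> extreme outlier
    ]),
})

ts = log.groupby("case_id")["timestamp"]
durations = ts.max() - ts.min()

# Percentile cut is robust to the heavy right skew typical of case durations
cutoff = durations.quantile(0.99)
typical_cases = durations[durations <= cutoff].index
```

Unlike mean ± standard deviation, the percentile cut is not itself dragged upward by the very outliers it is meant to remove.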
Start/end boundary filtering
Many logs include partial cases due to system cutovers or missing early events. You can:
- Keep only cases that contain an expected start activity and end activity.
- Or truncate cases to start at the first meaningful activity (e.g., “Case Created”) and end at “Closed”.
Defining clean boundaries is essential for meaningful process maps and conformance checks.
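The "complete cases only" option can be sketched as below, assuming events are already time-ordered within each case (the boundary activity names are examples):

```python
import pandas as pd

log = pd.DataFrame({
    "case_id":  ["C1", "C1", "C1", "C2", "C2"],
    "activity": ["Case Created", "Review", "Closed",   # complete case
                 "Review", "Closed"],                  # missing start -> partial
})

START, END = "Case Created", "Closed"   # expected boundaries (example names)

# Events are assumed sorted by timestamp within each case
per_case = log.groupby("case_id")["activity"]
complete = per_case.agg(lambda a: a.iloc[0] == START and a.iloc[-1] == END)
bounded_log = log[log["case_id"].isin(complete[complete].index)]
```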
4) Activity-based filtering: remove, collapse, or segment
Not all activities need to be shown at the same level of detail.
Remove low-value steps
Some steps add clutter without insight (e.g., “Open Screen”, “View Details”). If the objective is to understand handoffs and approvals, remove purely navigational activities.
Collapse repeated micro-steps into macro-activities
In many digital processes, one logical step produces multiple events:
- “Fill Form” may generate 10 field-change events.
- “Underwriting Review” may generate several internal status updates.
You can aggregate these into a single macro-activity using rules such as:
- collapse sequences within a time window
- collapse all “status update” events into one “Status Updated” step with an attribute recording how many updates occurred
This preserves signal while reducing complexity.
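A run-collapsing sketch that merges consecutive identical events per case and records how many were merged (column names and data are illustrative):

```python
import pandas as pd

log = pd.DataFrame({
    "case_id":  ["C1"] * 4,
    "activity": ["Status Updated"] * 3 + ["Approve"],
    "timestamp": pd.to_datetime([
        "2024-01-01 09:00", "2024-01-01 09:05", "2024-01-01 09:10",
        "2024-01-01 10:00",
    ]),
})

# Flag the start of each run of consecutive identical activities per case
new_run = ((log["activity"] != log["activity"].shift())
           | (log["case_id"] != log["case_id"].shift()))
run_id = new_run.cumsum()

collapsed = (log.groupby(run_id)
                .agg(case_id=("case_id", "first"),
                     activity=("activity", "first"),
                     timestamp=("timestamp", "first"),
                     n_events=("activity", "size")))  # update count kept as attribute
```

The three status updates become one "Status Updated" step with `n_events = 3`, so the signal (repeated rework or churn) survives as an attribute instead of cluttering the map.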
Segment logs by meaningful attributes
Instead of filtering everything out, split the analysis:
- channel (web vs branch)
- customer type (new vs returning)
- priority (normal vs urgent)
- region or product line
Segmentation often produces clearer models than aggressive filtering because each segment has a more consistent process behaviour.
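Segmentation is a one-line split once the attribute is on the log; each sub-log is then fed to discovery separately (the `channel` attribute is an example):

```python
import pandas as pd

log = pd.DataFrame({
    "case_id":  ["C1", "C1", "C2", "C2"],
    "activity": ["Apply", "Approve", "Apply", "Approve"],
    "channel":  ["web", "web", "branch", "branch"],
})

# One sub-log per segment; discover a model on each instead of over-filtering one
segments = {channel: sub_log for channel, sub_log in log.groupby("channel")}
```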
5) Validation: how to ensure you did not “filter away” the truth
Filtering must be defensible. A simple validation checklist:
- Track how many events and cases were removed at each step.
- Compare KPIs before and after filtering (cycle time, rework rate, handoff count).
- Share the filtered vs unfiltered view with process owners to confirm the model still reflects reality.
- Keep filters versioned so results can be reproduced.
This discipline is emphasised in a business analysis course because stakeholders will ask, “Are we seeing the real process or a cleaned-up story?”
Conclusion
Filtering event logs is a critical step in process discovery because it converts noisy system traces into a process model that people can read and improve. By applying event-level cleaning, case-level outlier controls, and activity-based simplification, you can reduce “spaghetti” models and focus on the operational truth that matters. Done well, filtering does not hide problems; it makes them visible in a structured way. These are practical, job-relevant techniques for anyone building process mining and operational analytics skills through a business analyst course or a business analysis course.
Business Name: ExcelR- Data Science, Data Analytics, Business Analyst Course Training Mumbai
Address: Unit no. 302, 3rd Floor, Ashok Premises, Old Nagardas Rd, Nicolas Wadi Rd, Mogra Village, Gundavali Gaothan, Andheri E, Mumbai, Maharashtra 400069, Phone: 09108238354, Email: enquiry@excelr.com.

