Real-time fraud scoring with Apache Spark

In short

The outcome we're after.

A payments fintech wins or loses on two numbers that pull against each other: how much fraud it catches, and how many genuine customers it blocks by mistake. Score a transaction too slowly and the payment stalls at checkout. Score it too crudely and a legitimate card gets declined while real fraud slips through. Apache Spark Structured Streaming scores each transaction as it arrives, drawing on features and history held in Snowflake and a transactional record in PostgreSQL, so the catch rate climbs without genuine customers paying for it in declined cards and abandoned baskets.

The two numbers a payments fintech lives between

A payments fintech is judged on two numbers that pull in opposite directions: how much fraud it stops, and how many genuine customers it wrongly blocks. Push the rules harder to catch more fraud and the false declines climb, so real customers see a card rejected at checkout and walk away. Loosen the rules to let genuine customers through and fraud finds the gap. Every fraud team in payments lives in the tension between those two numbers, and a crude system makes both worse at once.

The clock makes it harder. The fraud decision has to land inside the authorisation window, the fraction of a second between a customer pressing pay and the payment being approved or declined. A richer model that reads more signals is more accurate, but if it scores too slowly it delays the payment, and a slow checkout costs sales on its own. So the team is squeezed from two sides. The decision must be both more accurate and faster than the rules it replaces.

Then there is the shape of the data. Fraud is rare. In a healthy book the overwhelming majority of transactions are genuine, so a model that simply predicts “not fraud” looks accurate and is useless. That rarity is exactly what produces false positives. A model straining to catch the few real frauds flags a slice of genuine activity along with them, and each of those is a real customer turned away.
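To make the rarity point concrete, here is a small worked example. The 0.2% fraud rate and the book of 100,000 transactions are assumed figures chosen for illustration, not numbers from a real portfolio.

```python
# Hypothetical book: 100,000 transactions, 0.2% of them fraudulent (assumed rate).
total = 100_000
fraud = 200
genuine = total - fraud

# A "model" that always answers "not fraud".
true_positives = 0          # it never flags anything, so it never catches fraud
true_negatives = genuine    # every genuine transaction is labelled correctly
false_negatives = fraud     # every fraudulent transaction is missed

accuracy = (true_positives + true_negatives) / total
catch_rate = true_positives / fraud  # share of real fraud caught (recall)

print(f"accuracy   = {accuracy:.3f}")    # 0.998
print(f"catch rate = {catch_rate:.3f}")  # 0.000
```

The headline accuracy is 99.8% and the catch rate is zero, which is why accuracy alone is the wrong yardstick for a fraud model.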

The obligations sit over all of it. Transaction monitoring underpins the AML/CTF reporting expected by AUSTRAC, customer data is governed by the Privacy Act 1988, and an APRA-regulated entity carries CPS 234 information-security duties. The fraud system is not just a model. It is monitored, logged and retained infrastructure that other people audit.

Why Apache Spark, and what sits around it

The engine has to score a live stream of transactions in real time, on data that arrives without pause, and keep up at the busiest minute of the day. That is what Apache Spark is built for, so it headlines the build. Spark Structured Streaming treats the transaction feed as an unbounded stream and scores each event as it lands, with the same processing logic that also runs the historical batch jobs. One framework covers both the live decision and the training and back-testing behind it, which keeps the streaming path and the offline path from drifting apart.
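The idea of one piece of scoring logic serving both the live and the offline path can be sketched in a few lines. This is plain Python standing in for the pattern, not Spark's API. The function names, the event fields and the toy scoring formula are all assumptions made for illustration. In Spark the equivalent is applying the same DataFrame transformation to a batch source and to a streaming source.

```python
from typing import Dict, Iterable, Iterator, List

Event = Dict[str, float]

def score_event(event: Event) -> float:
    """Toy risk score in [0, 1]; a hypothetical stand-in for the real model."""
    amount_ratio = event["amount"] / max(event["avg_amount"], 1.0)
    return min(1.0, 0.2 * amount_ratio)

def score_batch(events: List[Event]) -> List[float]:
    """Offline path: score a finite history for training or back-testing."""
    return [score_event(e) for e in events]

def score_stream(events: Iterable[Event]) -> Iterator[float]:
    """Live path: score events one at a time as they arrive."""
    for e in events:
        yield score_event(e)

# Hypothetical history replayed through both paths.
history = [
    {"amount": 40.0, "avg_amount": 50.0},
    {"amount": 900.0, "avg_amount": 60.0},
]
batch_scores = score_batch(history)
stream_scores = list(score_stream(iter(history)))

# Both paths call the same function, so they cannot drift apart.
assert batch_scores == stream_scores
```

Because both paths call `score_event`, a change to the scoring logic reaches the live decision and the back-test at the same time.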

The alternative is batch, and batch is the wrong tool here. A nightly or hourly batch job tells you a transaction was fraudulent after the money has already moved. For a payment authorised in under a second, that is too late to matter. Streaming moves the decision inside the authorisation window, so the choice to allow, review or block is made before the payment completes. Spark gives that low-latency scoring while still handling the volume at peak, which a single-node approach cannot.

The supporting pieces hold the data Spark needs. Features and account history live in Snowflake: the recent behaviour that tells genuine activity from fraud, such as rolling counts, device and location patterns and the customer’s normal spend, is modelled once and served to the scoring pipeline rather than recomputed per transaction. PostgreSQL is the transactional store, the system of record for the live decisions and the case state that fraud analysts work from. Spark scores the stream, reads features from Snowflake, and writes outcomes to PostgreSQL.

The scoring itself is tiered on purpose. Cheap deterministic rules run first and clear the clearly genuine transactions and stop the clearly fraudulent ones. The model is spent only on the uncertain middle, where the answer is genuinely in doubt. That keeps the average latency low, because most transactions never touch the heavier model, and concentrates the expensive computation where it changes the decision.
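A minimal sketch of the tiered decision, assuming illustrative rules, thresholds and field names. The blocklist check, the known-device rule, the 0.5 and 0.9 cut-offs and the toy model are all hypothetical. A real build would derive them from the fintech's own data.

```python
from typing import Any, Callable, Dict, List, Optional

Txn = Dict[str, Any]

# Hypothetical thresholds; real values would be tuned on labelled data.
REVIEW_THRESHOLD = 0.5
BLOCK_THRESHOLD = 0.9

def rules_tier(txn: Txn) -> Optional[str]:
    """Cheap deterministic checks. Returns a decision, or None when unsure."""
    if txn["card_on_blocklist"]:
        return "block"   # clearly fraudulent
    if txn["known_device"] and txn["amount"] <= 50.0:
        return "allow"   # clearly genuine
    return None          # uncertain middle: hand over to the model

def decide(txn: Txn, model: Callable[[Txn], float]) -> str:
    """Run the rules first and call the model only when the rules abstain."""
    decision = rules_tier(txn)
    if decision is not None:
        return decision
    score = model(txn)
    if score >= BLOCK_THRESHOLD:
        return "block"
    if score >= REVIEW_THRESHOLD:
        return "review"
    return "allow"

model_calls: List[Txn] = []

def toy_model(txn: Txn) -> float:
    """Stand-in for the heavier model; records each call it receives."""
    model_calls.append(txn)
    return 0.95 if txn["amount"] > 1000.0 else 0.6

transactions = [
    {"amount": 20.0, "known_device": True, "card_on_blocklist": False},
    {"amount": 20.0, "known_device": True, "card_on_blocklist": True},
    {"amount": 300.0, "known_device": False, "card_on_blocklist": False},
    {"amount": 5000.0, "known_device": False, "card_on_blocklist": False},
]
decisions = [decide(t, toy_model) for t in transactions]
print(decisions, "model calls:", len(model_calls))
```

In this toy run the rules settle two of the four transactions, so the model is called only twice. That is the mechanism that keeps average latency down.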

Building it, and where it got hard

The model was rarely the hard part. The friction lived in the trade-off the whole system is built around, and two examples stand in for the rest.

The first squeeze was latency against accuracy. The richer the model and the more features it read, the better it caught fraud, and the longer it took to score. Early on the scoring pipeline was accurate and too slow, because it recomputed features per transaction and called the feature store on every event. A check that should sit inside the authorisation window was instead nudging payments toward a timeout. The fix was the tiered design plus a fast feature path. Rules cleared the easy majority so the model ran on a fraction of traffic. Features were precomputed and served from Snowflake rather than derived in the hot path, and the hottest ones were cached so the model read them in single-digit milliseconds. Accuracy held, and the slow path shrank to the cases that warranted it.
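The caching step can be sketched as a small time-to-live cache in front of the feature lookup. This is an illustration under assumptions, not the production design. `FeatureCache`, `load_from_store`, the feature names and the 60-second TTL are all hypothetical, and the loader is a stub standing in for a real feature-store query.

```python
import time
from typing import Callable, Dict, List, Tuple

Features = Dict[str, float]

class FeatureCache:
    """Serve recently loaded features from memory; reload after ttl_seconds."""

    def __init__(self, loader: Callable[[str], Features], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Features]] = {}

    def get(self, account_id: str) -> Features:
        now = self._clock()
        entry = self._entries.get(account_id)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]                   # fresh hit: no store round trip
        features = self._loader(account_id)   # miss or stale: go to the store
        self._entries[account_id] = (now, features)
        return features

store_calls: List[str] = []

def load_from_store(account_id: str) -> Features:
    """Stub for the slow lookup; records each call so we can count them."""
    store_calls.append(account_id)
    return {"txn_count_1h": 3.0, "avg_amount_30d": 42.5}

fake_now = [0.0]  # controllable clock so the example does not need to sleep
cache = FeatureCache(load_from_store, ttl_seconds=60.0, clock=lambda: fake_now[0])

cache.get("acct-1")   # first read goes to the store
cache.get("acct-1")   # second read is served from memory
fake_now[0] = 61.0
cache.get("acct-1")   # entry is older than the TTL, so it is reloaded
print("store calls:", len(store_calls))
```

The TTL is the trade-off to tune. A longer TTL means fewer store reads but staler rolling counts, so the hottest, fastest-moving features usually want the shortest TTL.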

The second was the class imbalance and the false positives it produced. With fraud so rare, the first models flagged too much genuine activity, and every wrong flag was a real customer declined. Tuning purely for catch rate made it worse. The answer was to tune thresholds against the real cost of a false positive, treating a wrongly declined genuine customer as a quantified loss rather than an afterthought, and to set a review tier so borderline cases were held for a check instead of being blocked outright. Then a feedback loop closed the gap. Confirmed-fraud labels from analysts and chargebacks fed back into the features and thresholds, so new attack patterns were reflected within days rather than waiting for a full model rebuild.
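Tuning a threshold against cost rather than catch rate can be illustrated as follows. The labelled scores, the candidate thresholds and the two cost figures (30 for a wrongly declined genuine customer, 100 for a missed fraud) are invented for the example. A real build would estimate them from the fintech's own losses.

```python
from typing import List, Tuple

Scored = List[Tuple[float, bool]]  # (model score, confirmed fraud?)

def total_cost(scored: Scored, threshold: float,
               cost_false_decline: float, cost_missed_fraud: float) -> float:
    """Sum the cost of every wrong decision made at this threshold."""
    cost = 0.0
    for score, is_fraud in scored:
        flagged = score >= threshold
        if flagged and not is_fraud:
            cost += cost_false_decline   # genuine customer turned away
        elif not flagged and is_fraud:
            cost += cost_missed_fraud    # fraud let through
    return cost

def best_threshold(scored: Scored, candidates: List[float],
                   cost_false_decline: float, cost_missed_fraud: float) -> float:
    """Pick the candidate threshold with the lowest total cost."""
    return min(candidates, key=lambda t: total_cost(
        scored, t, cost_false_decline, cost_missed_fraud))

# Hypothetical labelled history from analysts and chargebacks.
labelled: Scored = [
    (0.05, False), (0.20, False), (0.35, False), (0.55, False),
    (0.60, True), (0.80, False), (0.95, True),
]
candidates = [0.3, 0.5, 0.7, 0.9]

chosen = best_threshold(labelled, candidates,
                        cost_false_decline=30.0, cost_missed_fraud=100.0)
for t in candidates:
    print(t, total_cost(labelled, t, 30.0, 100.0))
print("chosen threshold:", chosen)
```

In this toy data, thresholds of 0.3 and 0.5 both catch every fraud, but 0.3 declines one more genuine customer. Tuning for catch rate alone cannot tell them apart, while pricing the false decline does. Re-running the same selection as new confirmed labels arrive is one simple form the feedback loop could take.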

One constraint shaped the rest. Because every decision is auditable and customer data is involved, scoring, logging and retention were built to support the fintech’s AML/CTF and privacy obligations, with personal data minimised in logs and the decision trail kept intact for review.
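One way to keep a decision trail while minimising personal data in logs is to record a keyed token in place of the raw identifier. The sketch below assumes hypothetical field names and uses HMAC-SHA256 from the Python standard library. Whether this approach meets a particular obligation is for the fintech and its advisers to confirm.

```python
import hashlib
import hmac
import json
from typing import Any, Dict

def pseudonymise(value: str, key: bytes) -> str:
    """Replace an identifier with a keyed HMAC-SHA256 token (hex string)."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()

def decision_log_record(txn: Dict[str, Any], decision: str, score: float,
                        tier: str, key: bytes) -> Dict[str, Any]:
    """Build the audit record: what was decided and why, without the raw card."""
    return {
        "txn_id": txn["txn_id"],
        "card_token": pseudonymise(txn["card_number"], key),
        "decision": decision,       # allow, review or block
        "score": round(score, 4),
        "tier": tier,               # which tier made the call: rules or model
    }

# Hypothetical key and transaction; a real key would come from a secrets manager.
log_key = b"example-key-not-for-production"
txn = {"txn_id": "t-1001", "card_number": "4111111111111111", "amount": 300.0}

record = decision_log_record(txn, decision="review", score=0.61234,
                             tier="model", key=log_key)
log_line = json.dumps(record, sort_keys=True)
print(log_line)
```

The same card always maps to the same token under a given key, so analysts can still link decisions for one card across the log without the card number itself being written down.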

What changed

In a representative build the pipeline scored each transaction in well under a second end to end, so the fraud check sat inside the authorisation window instead of delaying the payment. Tiered scoring and threshold tuning against the real cost of a wrong decline roughly halved false positives against a rules-only baseline while holding the catch rate, which meant fewer genuine customers declined for the same fraud stopped. The feedback loop from confirmed-fraud labels let new attack patterns reach the live system within days rather than the weeks a full rebuild would take.

These figures are illustrative. They describe the pattern we see rather than a published result for a named fintech. The shape is the point. Real-time scoring brings the fraud decision inside the moment that matters, the tiered design keeps it fast, and tuning to the real cost of a false positive means catching more fraud no longer means turning away more genuine customers.

Where this fits

Real-time fraud scoring is one application of our Data Insights and Analysis service, built on Apache Spark, for the FinTech and Banking sector. It is a contained, high-return starting point, because the transaction stream already exists and the value comes from scoring it fast, tuning it to the real cost of a decline, and keeping it current as fraud shifts. It is distinct from a credit-risk model build or a core-banking migration. This is the live payments decision. If false declines are costing you customers or fraud is slipping through, the place to start is to map your transaction flow and decide where a streaming score would change the outcome.

Frequently asked.

How can AI be used in banking?

Mostly to read signals at a scale and speed people cannot. In payments and retail banking that means scoring each transaction for fraud as it happens, flagging accounts likely to default or leave, and triaging alerts so analysts see the cases that matter. Here the job is narrow and concrete. Score a live transaction for fraud risk in the moment the payment is authorised, using the account's recent behaviour and known fraud patterns.

What is a common use case of machine learning in banking?

Fraud and risk scoring is the most common. A model learns the patterns that separate genuine activity from fraud, then scores new transactions or applications against them. The value is not the model alone. It is the data pipeline feeding it current features fast enough to act, and the thresholds that decide when to block, review or allow. We treat those as the real engineering, with the model as one component.

How do you catch more fraud without blocking genuine customers?

By making the decision tiered rather than a single yes or no. Cheap rules clear the obvious genuine transactions and stop the obvious fraud, and the model is spent only on the uncertain middle. Thresholds are tuned against the real cost of a wrong decline, not just a catch rate, because a blocked genuine customer is expensive too. A feedback loop from confirmed-fraud labels keeps both the model and the thresholds honest as patterns shift.

Why use streaming rather than batch for fraud scoring?

Because a payment cannot wait for a nightly batch. Batch scoring tells you a transaction was fraudulent after the money has gone. Spark Structured Streaming scores each transaction as it arrives, inside the authorisation window, so the decision to allow, review or block happens before the payment completes. Batch still has a place for model training, back-testing and reporting, but the live decision has to be streaming.

How are privacy and AML obligations handled?

Conservatively, and with the obligations designed in rather than bolted on. Transaction monitoring supports the AML/CTF reporting expected by AUSTRAC, personal data is handled under the Privacy Act 1988, and the information-security controls align with APRA's CPS 234 where the entity is regulated. We do not make regulatory promises. We build the pipeline so monitoring, logging and retention can meet the obligations the fintech and its advisers confirm apply to it.
