๐Ÿ“š 100 DAYS SQL

CODING CHALLENGE

DAY 13

Master SQL One Problem at a Time

โœจ Daily SQL Problems
โœจ Real-World Scenarios
โœจ Complete Solutions
โœจ Interview Ready

๐Ÿ’ณ Banking Fraud Detection System

Database Schema - RAW DATA (Messy!)

Table 1: transactions - Sample data (25 rows total)

txn_idcustomer_idmerchant_idamounttxn_datestatus
TXN001  C12345  M001$2,500.002024-01-15 14:23:11COMPLETED
TXN002c67890m00245.515/01/2024pending
TXN003C12345M003$-1502024-01-15 14:25:00Refund
TXN004C99999M00115000.002024/01/16 03:45FLAGGED

...and 21 more messy rows! Download the full dataset above.

Table 2: customers - Customer accounts (20 rows)

customer_idfull_nameemailaccount_status
C12345john smithJOHN.SMITH@EMAIL.COMactive
c67890SARAH JONESsarah@invalidACTIVE
C99999anna garciaanna@email.coFLAGGED

Table 3: merchants - Merchant risk levels

merchant_idmerchant_namecategoryrisk_level
M001Amazon StoreRetailLOW
M004Crypto Exchange ProFinanceHIGH
M005Online Casino 24/7GamblingHIGH

Table 4: fraud_reports - Reported fraud cases

report_idtxn_idfraud_typeverified
FR001TXN004Suspicious AmountYes
FR003TXN015Multiple AttemptsYes

๐Ÿ“‹ Problem Requirements

Build a fraud detection pipeline to identify suspicious transactions, validate data quality, and generate risk scores for banking security.

๐Ÿงน Data Cleaning Tasks:
  • Standardize customer_id & merchant_id: uppercase, trim spaces
  • Clean amount: remove $, commas, handle NULL/negative values
  • Standardize dates: convert all to YYYY-MM-DD HH:MM:SS format
  • Standardize status: lowercase, consistent naming
  • Clean emails: trim spaces, lowercase, validate format with REGEXP
  • Standardize country: uppercase, remove extra spaces
๐Ÿ”— JOIN & Enrichment:
  • JOIN with customers to get account details and status
  • JOIN with merchants to get risk levels and categories
  • LEFT JOIN with fraud_reports to check if already reported
  • Flag orphan transactions (no customer/merchant match)
  • Identify transactions with HIGH risk merchants
โš ๏ธ Fraud Detection Rules:
  • Flag transactions with amount โ‰ฅ $10,000 as "High Amount"
  • Flag transactions with HIGH risk merchant (Crypto, Gambling)
  • Use LAG() to find time between consecutive transactions by customer
  • Flag rapid transactions (< 5 minutes apart) as suspicious
  • Flag transactions from suspended/flagged customer accounts
  • Calculate fraud risk score: 0-100 (sum of all risk factors)
  • Categorize as: Low Risk (0-30), Medium (31-60), High (61-100)

๐Ÿ’ก SQL Skills Required: TRIM, UPPER, LOWER, REPLACE, CASE, CAST, STR_TO_DATE, REGEXP, Window Functions (LAG, TIMESTAMPDIFF), JOINs, CTEs, COALESCE

This mirrors real-world banking fraud detection systems used by financial institutions daily!

๐ŸŽ‰ Day 13 Complete!

Congratulations!

You've learned banking fraud detection using window functions, complex JOINs, and multi-factor risk scoring! This is exactly what financial data engineers build in production systems - fraud detection pipelines that protect billions in assets.

๐Ÿ”— Connect & Follow

Join me on the 100 Days SQL journey!

LinkedIn: linkedin.com/in/rahulkanche
GitHub: github.com/rahulkanche/100-days-sql-data

๐Ÿ’ช Keep Practicing!

โœ“ Master fraud detection patterns
โœ“ Build risk scoring systems
โœ“ Use window functions for time analysis
โœ“ Prepare for Banking/FinTech interviews

Follow the 100 Days SQL Challenge
Day 14 tomorrow - Master SQL one problem at a time!

Rahul Kanche โ€” exploring data, decoding clarity!