๐Ÿ“š 100 DAYS SQL

CODING CHALLENGE

DAY 12

Master SQL One Problem at a Time

โœจ Daily SQL Problems
โœจ Real-World Scenarios
โœจ Complete Solutions
โœจ Interview Ready

๐Ÿ”ง E-Commerce Data Cleaning Pipeline

Database Schema - RAW DATA (Messy!)

Table 1: raw_transactions - Sample data (10 rows total)

trans_idcustomer_emailproduct_skuamounttrans_datestatus
TX001  JOHN@EMAIL.COM  SKU-123-A$125.502024-01-15 14:30:00completed
TX002sarah.j@invalidsku-456-b4515/01/2024PENDING
TX003mike@shop.comSKU-789-C$02024-01-20Cancelled
TX004emma@store.iosku-123-a125.502024/01/25 10:15COMPLETED

...and 6 more messy rows! Download the full dataset above.

Table 2: product_catalog - Clean reference data

skuproduct_namecategoryprice
SKU-123-AWireless MouseElectronics125.50
SKU-456-BUSB CableAccessories45.00
SKU-789-CLaptop StandFurniture299.99

Table 3: customer_profiles - Also messy data!

emailfull_namejoin_datecountry
john@email.comjohn smith2023-05-10  USA  
SARAH.J@INVALIDSarah Jones05/10/2023uk
mike@shop.comMIKE WILSON2023-06-15Canada

๐Ÿ“‹ Problem Requirements

Build a complete data cleaning pipeline using SQL to transform messy e-commerce data into clean, analytics-ready format.

๐Ÿงน Data Cleaning Tasks:
  • Standardize emails: trim spaces, lowercase
  • Standardize SKU: uppercase format
  • Clean amount: remove $, handle NULL, negative values
  • Standardize dates: convert all to YYYY-MM-DD format
  • Standardize status: lowercase, consistent values
  • Clean names: proper case (Title Case), trim spaces
  • Standardize country: uppercase
๐Ÿ”— JOIN & Enrichment:
  • JOIN with product_catalog to get product details
  • JOIN with customer_profiles to get customer info
  • LEFT JOIN to keep all transactions (even without matches)
  • Flag orphan records (transactions with no product/customer match)
โœ… Validation & Quality Checks:
  • Validate email format using REGEXP pattern
  • Flag amount mismatches with catalog price
  • Identify potential duplicate transactions
  • Calculate data quality score for each record (0-100)
  • Categorize quality: Excellent (90+), Good (70-89), Poor (<70)

๐Ÿ’ก SQL Skills Required: TRIM, UPPER, LOWER, REPLACE, CASE, CAST, STR_TO_DATE, REGEXP_LIKE, JOINs, CTEs, COALESCE, NULLIF

This is exactly what Data Engineers do daily - cleaning messy data from various sources!

๐ŸŽ‰ Day 12 Complete!

Congratulations!

You've learned complete data cleaning using all SQL concepts! This is exactly what Data Engineers work with in real-world scenarios - messy data from multiple sources that needs standardization, validation, and quality scoring.

๐Ÿ”— Connect & Follow

Join me on the 100 Days SQL journey!

LinkedIn: linkedin.com/in/rahulkanche
GitHub: github.com/rahulkanche/100-days-sql-data

๐Ÿ’ช Keep Practicing!

โœ“ Practice on messy real-world data
โœ“ Build complete cleaning pipelines
โœ“ Master data quality validation
โœ“ Prepare for Data Engineering interviews

Follow the 100 Days SQL Challenge
Master SQL one problem at a time! ๐Ÿš€

Rahul Kanche โ€” exploring data, decoding clarity!