Specializing in Large-Scale Spark Processing

Seongju Hwang

Data Engineer leading hybrid-cloud analytics, managing 300+ Spark pipelines that process 1B+ rows daily with Spark JDBC and DataFrames.

  • 100M+ rows via JDBC ingestion
  • 300+ Oozie workflows
  • 1B+ rows processed daily
  • ~4h end-to-end latency reduction

Professional Experience

Hanwha General Insurance

Jul 2024 – Present

Data Engineer | Seoul, Korea

Platform & Pipeline

  • Large-Scale Ingestion: Built parallel extraction pipelines using Spark JDBC to ingest 100M+ rows from MySQL/Tibero into Azure HDFS.
  • Workflow Management: Designed and maintained 300+ Oozie workflows processing ~1B rows/day with robust retry and failure isolation.
  • Optimization: Migrated Sqoop-based CDC to Spark JDBC, reducing end-to-end latency from 7h to 3h.
  • Platform Ops: Managed Azure HDInsight-based Spark/Hive platforms with YARN-level resource tuning.
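
The parallel JDBC ingestion above relies on Spark splitting a numeric key range into per-partition reads. A pure-Python sketch of that split logic (simplified from Spark's JDBCRelation.columnPartition; the `id` column and bounds here are illustrative, not the production values):

```python
# Sketch of how Spark JDBC turns (column, lowerBound, upperBound, numPartitions)
# into one WHERE predicate per parallel reader. Simplified: Spark also handles
# uneven strides and non-numeric partition columns.

def jdbc_partition_predicates(column, lower, upper, num_partitions):
    """Return one WHERE predicate per partition, covering [lower, upper]."""
    stride = (upper - lower) // num_partitions
    preds = []
    bound = lower
    for i in range(num_partitions):
        lo, hi = bound, bound + stride
        if i == 0:
            # First partition also sweeps up NULL keys.
            preds.append(f"{column} < {hi} OR {column} IS NULL")
        elif i == num_partitions - 1:
            # Last partition is open-ended so no rows above upper are lost.
            preds.append(f"{column} >= {lo}")
        else:
            preds.append(f"{column} >= {lo} AND {column} < {hi}")
        bound = hi
    return preds

# Example: 4 parallel readers over ids 0..100,000,000
preds = jdbc_partition_predicates("id", 0, 100_000_000, 4)
```

In Spark itself the same split is requested declaratively, e.g. `spark.read.jdbc(url, table, column="id", lowerBound=0, upperBound=100_000_000, numPartitions=4, properties=props)`.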

Real-time & BI

  • IoT Streaming: Developed Spark Structured Streaming pipelines ingesting 100K events/sec from 700K devices via Kafka.
  • BI Infrastructure: Built end-to-end Power BI infrastructure, including an on-premises data gateway and security coordination.
  • Modern Stack: Led PoCs for Apache Superset and Databricks to modernize the analytical ecosystem.
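
The IoT streaming bullet maps to a fairly compact Kafka-to-HDFS path in Spark Structured Streaming. A minimal sketch, assuming hypothetical broker, topic, and path names (the Kafka source option names themselves are standard ones from Spark's Kafka integration):

```python
# Sketch of the Kafka -> HDFS Structured Streaming path. Broker, topic, and
# output paths are hypothetical placeholders.

def kafka_source_options(brokers: str, topic: str) -> dict:
    """Options for a spark.readStream Kafka source."""
    return {
        "kafka.bootstrap.servers": brokers,
        "subscribe": topic,
        "startingOffsets": "latest",
        # Cap rows per micro-batch so a backlog cannot blow up latency.
        "maxOffsetsPerTrigger": "500000",
    }

def start_iot_stream(spark, brokers, topic, out_path, checkpoint_path):
    """Start a parquet-sink stream; `spark` is an existing SparkSession."""
    # pyspark is imported lazily so the option helper above stays usable
    # without a Spark installation.
    from pyspark.sql.functions import col

    reader = spark.readStream.format("kafka")
    for key, value in kafka_source_options(brokers, topic).items():
        reader = reader.option(key, value)
    events = reader.load().select(
        col("key").cast("string"),
        col("value").cast("string"),
        col("timestamp"),
    )
    return (
        events.writeStream.format("parquet")
        .option("path", out_path)
        .option("checkpointLocation", checkpoint_path)
        .trigger(processingTime="30 seconds")
        .start()
    )
```

The `checkpointLocation` is what lets the job restart from its last committed offsets after a failure, which matters at 100K events/sec.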

Technical Architecture: Hanwha Data Platform

graph TD
    subgraph Sources [Data Sources]
        direction LR
        MySQL[(MySQL)]
        Tibero[(Tibero)]
        IoT[700K IoT Devices]
    end

    subgraph Compute [Spark Processing Engine]
        direction TB
        Kafka[Kafka Cluster]
        JDBC[Spark JDBC Ingestion]
        DF[Spark DataFrame Transform]
        SSS[Structured Streaming]
        
        JDBC --> DF
        Kafka --> SSS
    end

    subgraph Storage [Azure HDInsight Lake]
        HDFS[Azure HDFS]
        Hive[Hive Metastore]
        HDFS --- Hive
    end

    subgraph Consumption [Analytics Layer]
        JH[Jupyter Hub]
        PB[Power BI]
        SS[Apache Superset]
    end

    MySQL --> JDBC
    Tibero --> JDBC
    IoT --> Kafka
    DF --> HDFS
    SSS --> HDFS
    HDFS --> JH
    HDFS --> PB
    HDFS --> SS

    style Sources fill:#1e293b,stroke:#475569,color:#fff
    style Compute fill:#312e81,stroke:#818cf8,color:#fff
    style Storage fill:#0f172a,stroke:#4f46e5,color:#fff
    style Consumption fill:#064e3b,stroke:#10b981,color:#fff

Visualization of the hybrid cloud data flow and transformation layers.

Carrot Insurance

Apr 2024 – Jul 2024

DW/BI Intern | Seoul, Korea

  • Data Mart Design: Developed Hive SQL-based data marts aligned with business KPIs.
  • Performance Migration: Migrated legacy Tez-based workflows to Spark, reducing batch execution time from 2 hours to 30 minutes (75% improvement).
  • BI Automation: Established automated refresh environments by connecting Power BI dashboards with on-premise data gateways.
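
The Tez-to-Spark migration above largely meant running the same HiveQL mart refreshes through Spark's engine. A minimal sketch, with hypothetical table and column names:

```python
# Sketch of a daily KPI mart refresh run on Spark instead of Tez.
# Table, partition, and column names are hypothetical.

def kpi_mart_sql(fact_table: str, mart_table: str, ds: str) -> str:
    """Build the INSERT OVERWRITE statement for one daily partition."""
    return (
        f"INSERT OVERWRITE TABLE {mart_table} PARTITION (ds = '{ds}') "
        f"SELECT product_id, COUNT(*) AS policy_cnt, SUM(premium) AS premium_amt "
        f"FROM {fact_table} WHERE ds = '{ds}' GROUP BY product_id"
    )

def refresh_mart(spark, ds: str) -> None:
    # `spark` is a SparkSession built with .enableHiveSupport(); executing
    # the existing HiveQL through spark.sql() is what replaced the Tez jobs.
    spark.sql(kpi_mart_sql("dw.fact_contract", "mart.daily_kpi", ds))
```

Because Spark accepts the same HiveQL against the same metastore, a migration like this can swap the execution engine without rewriting the mart logic.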

Technical Skills

Advanced Spark

  • Spark JDBC Ingestion
  • DataFrame API Optimization
  • Structured Streaming
  • YARN / Spark Tuning

Platforms

  • Azure HDInsight
  • Airflow & Oozie
  • HDFS / Hive Metastore
  • Databricks (PoC)

Databases

  • MySQL & Tibero
  • PostgreSQL
  • Kafka (Pub/Sub)

Visualization

  • Power BI (Expert)
  • JupyterHub
  • Apache Superset