Data Engineer leading hybrid cloud analytics.
Managing 300+ Spark pipelines that process 1B+ rows daily, built on Spark JDBC and DataFrames.
Jul 2024 – Present
Data Engineer | Seoul, Korea
graph TD
    subgraph Sources [Data Sources]
        direction LR
        MySQL[(MySQL)]
        Tibero[(Tibero)]
        IoT[700K IoT Devices]
    end
    subgraph Compute [Spark Processing Engine]
        direction TB
        Kafka[Kafka Cluster]
        JDBC[Spark JDBC Ingestion]
        DF[Spark DataFrame Transform]
        SSS[Structured Streaming]
        JDBC --> DF
        Kafka --> SSS
    end
    subgraph Storage [Azure HDInsight Lake]
        HDFS[Azure HDFS]
        Hive[Hive Metastore]
        HDFS --- Hive
    end
    subgraph Consumption [Analytics Layer]
        JH[Jupyter Hub]
        PB[Power BI]
        SS[Apache Superset]
    end
    MySQL --> JDBC
    Tibero --> JDBC
    IoT --> Kafka
    DF --> HDFS
    SSS --> HDFS
    HDFS --> JH
    HDFS --> PB
    HDFS --> SS
    style Sources fill:#1e293b,stroke:#475569,color:#fff
    style Compute fill:#312e81,stroke:#818cf8,color:#fff
    style Storage fill:#0f172a,stroke:#4f46e5,color:#fff
    style Consumption fill:#064e3b,stroke:#10b981,color:#fff
Visualization of the hybrid cloud data flow and transformation layers.
Apr 2024 – Jul 2024
DW/BI Intern | Seoul, Korea