Backend Software Engineer – Data Ecosystem (Data Lake) 2957

Singapore, SG-Singapore
Posted 2 weeks ago
About The Company

This company pioneers short-form video creation and social engagement, boasting a vast, engaged user base. Its platform empowers users with creative tools, filters, and effects. With a diverse content ecosystem, it’s a hub of creativity and expression. The proprietary algorithm ensures personalized content feeds, enhancing user engagement and satisfaction. This company wields significant influence on digital media, making it an invaluable partner for innovative collaborations and marketing endeavors.


About The Team

The Data Ecosystem Team has the vital role of crafting and implementing a storage solution for offline data in the recommendation system, which caters to more than a billion users. Their primary objectives are to guarantee system reliability, uninterrupted service, and seamless performance. They aim to create a storage and computing infrastructure that can adapt to various data sources within the recommendation system, accommodating diverse storage needs. Their ultimate goal is to deliver efficient, affordable data storage with easy-to-use data management tools for the recommendation, search, and advertising functions.


What you will be doing

– Design and implement an offline/real-time data architecture for large-scale recommendation systems.
– Design and implement a flexible, scalable, stable, and high-performance storage system and computation model.
– Troubleshoot production systems, and design and implement necessary mechanisms and tools to ensure the overall stability of production systems.
– Build industry-leading distributed systems such as offline and online storage, batch, and stream processing frameworks, providing reliable infrastructure for massive data and large-scale business systems.


What you should have

– Bachelor’s degree or above in Computer Science or a related field, with 3+ years of experience building scalable systems
– Source-code-level proficiency in common big data processing systems such as Spark or Flink is required; experience customizing or extending these systems is preferred
– A deep understanding of the source code of at least one data lake technology, such as Hudi, Iceberg, or Delta Lake, is highly valued and should be prominently showcased in your resume, especially if you have practical implementation or customization experience
– Knowledge of HDFS principles is expected, and familiarity with columnar storage formats like Parquet/ORC is an additional advantage
– Prior experience in data warehouse modeling
– Proficiency in programming languages such as Java, C++, and Scala is essential, along with strong coding skills and the ability to troubleshoot effectively
– Experience with other big data systems/frameworks like Hive, HBase, or Kudu is a plus
– A willingness to tackle challenging problems without clear solutions, a strong enthusiasm for learning new technologies, and prior experience in managing large-scale data (in the petabyte range) are all advantageous qualities

Job Features

Job Category: Backend
Seniority: Senior IC / Tech Lead
Recruiter: stone.zhou@ocbridge.ai
