Senior Software Engineer - Big Data/AI - remote

Scrapinghub
Posted 4 years ago
Stack Overflow

Scrapinghub is looking for a Senior Backend Engineer to develop and grow a new web crawling and extraction SaaS.

The new SaaS will include our recently released AutoExtract, which provides an API for automated e-commerce and article extraction from web pages using Machine Learning. AutoExtract is a distributed application written in Java, Scala and Python; its components communicate via Apache Kafka and HTTP and are orchestrated using Kubernetes.

You will be designing and implementing distributed systems: building a large-scale web crawling platform, integrating Deep Learning based web data extraction components, working on queue algorithms and large datasets, creating a development platform for other company departments, and more. This is going to be a challenging journey for any backend engineer!

As a Senior Backend Engineer, you will have a large impact on the system we’re building, since the new SaaS is still in the early stages of development.

Job Responsibilities:

  • Work on the core platform: develop and troubleshoot a Kafka-based distributed application, and write and change components implemented in Java, Scala and Python.
  • Work on new features, including design and implementation. You should be able to own and be responsible for the complete lifecycle of your features and code.
  • Solve distributed systems problems, such as scalability, transparency, failure handling, security, and multi-tenancy.

Requirements:

  • 3+ years of experience building large-scale data processing systems or high-load services.
  • Strong background in algorithms and data structures.
  • Strong track record in at least two of these technologies: Java, Scala, Python, C++. 3+ years of experience with at least one of them.
  • Experience working with Linux and Docker.
  • Good communication skills in English.
  • Computer Science or other engineering degree.

Bonus points for:

  • Kubernetes experience.
  • Apache Kafka experience.
  • Experience building event-driven architectures.
  • Understanding of web browser internals.
  • Good knowledge of at least one RDBMS.
  • Knowledge of today’s cloud provider offerings: GCP, Amazon AWS, etc.
  • Web data extraction experience: web crawling, web scraping.
  • Experience with web data processing tasks: finding similar items, mining data streams, link analysis, etc.
  • History of open source contributions.