Scott Ringwelski's Blog

Serverless Use Cases At Startups

Recently, as the serverless hype seems to be calming down and stabilizing, I’ve become more and more interested in what it can offer to mid-size startups. In particular, I want real-world use cases, not a “Hello World” that costs $0.0000001 to run (how wonderful is that!). Why mid-size startups? Well, that is the environment I am in every day at Handshake, and I like to work on real-world problems.

Goals

The two main wins I hear about with serverless are:

  1. it’s cheaper to run the infrastructure and
  2. it’s super scalable

Although these are great, I think that for many businesses they should be less of a focus.

Low infrastructure costs are generally expected of the cloud already, and they are relatively small compared to the cost of engineers. Scalability is great, but any infrastructure that autoscales is equally capable (except perhaps when workloads are very spiky).

My goals, for my environment, are instead:

  • Allow all engineers to deploy and run their services without requiring platform/ops/infrastructure team involvement. For example, avoid needing to write Kubernetes YAML or create Postgres databases.
  • Minimize the people costs of running services, for example by keeping a lean Platform team and avoiding time spent by the engineering team “bootstrapping” new services.
  • Quickly solve business problems with the least possible friction in setup, especially when prototyping and building MVPs of features within existing products.
  • Keep services easy to maintain in production by minimizing time spent on compute instances, networking, boilerplate code, library upgrades, CVE fixes, and general on-call issues.

Use Cases

Serverless has been growing on me as a great solution to these goals, and here are some use cases I am excited to explore.

Databases

So much of data storage in the cloud is already serverless. S3/GCS are clearly serverless, and they are among those technologies I simply take for granted. I couldn’t be happier to not worry about managing file servers and to leave that to the cloud providers. By letting the cloud provider manage file storage, every engineer has file storage available automatically, at almost no cost in dollars or time.

I don’t know many startups that would argue against S3/GCS for most business needs, but serverless databases are probably less common. At Handshake, our databases of choice are Postgres, Elasticsearch, and Redis.

For these databases we always choose a managed option, but not necessarily a fully Serverless option. We are still thinking about scalability, CPU utilization, memory management (don’t get me started on Elasticsearch memory usage), storage and disk size.

These are the easy parts. Worse are requirements like read replica scale-outs, user/permission management via command line tools, and HA failover systems. These are all things I’d rather not think about at all.

I want a database that lets me store data, and query data, and not worry about anything else. It should be available, fast, and Just Work, just like my S3/GCS does.

Pipeline Glue

Any SaaS application of a certain size has data pipelines: analytics event streams, audit logs, etc. All fairly standard stuff that a SaaS business would want.

In my mind, there are a few high level parts to these pipelines:

  1. Ingest, to receive external data
  2. Pub/Sub, to hold the data and reliably deliver it
  3. Transform, to take the data and put it somewhere
  4. Database, to store the data

A whole lot of a pipeline is what I call “glue”: moving data around and connecting it to different pieces of the system. The word “glue” might not do it justice though, because these are critical systems that need to Just Work. At the same time, I’m not interested in building Ingest and Pub/Sub systems, let alone running them.

These systems are great candidates for Serverless options that allow us to ingest messages, read messages, and Just Work. This is especially true if these systems can run in a fully managed and integrated environment, such as an AWS or GCP project, where data connections can work seamlessly and automatically.

If the transform code can also run Serverless, such as on Google Dataflow, even better!
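To make the four parts listed above a bit more concrete, here is a minimal sketch in Python. Everything in it is a stand-in I made up for illustration: `InMemoryTopic` plays the role of a managed Pub/Sub service, a plain list plays the role of the database, and the event fields (`user_id`, `action`) are hypothetical. The point is only how little of the code is actual business logic compared to glue.

```python
import json
from collections import deque
from typing import Any, Deque, Dict, List


class InMemoryTopic:
    """Stand-in for a managed Pub/Sub service: holds messages until pulled."""

    def __init__(self) -> None:
        self._messages: Deque[bytes] = deque()

    def publish(self, data: bytes) -> None:
        self._messages.append(data)

    def pull(self) -> List[bytes]:
        # Drain everything currently queued, oldest first.
        batch = list(self._messages)
        self._messages.clear()
        return batch


def ingest(topic: InMemoryTopic, raw_event: Dict[str, Any]) -> None:
    """1. Ingest: receive external data and hand it to Pub/Sub."""
    topic.publish(json.dumps(raw_event).encode("utf-8"))


def transform(message: bytes) -> Dict[str, Any]:
    """3. Transform: reshape a message into the row we want to store."""
    event = json.loads(message.decode("utf-8"))
    return {"user_id": event["user_id"], "action": event["action"].lower()}


def run_pipeline(topic: InMemoryTopic, table: List[Dict[str, Any]]) -> int:
    """Pull from Pub/Sub (2), transform (3), and store in the 'database' (4)."""
    rows = [transform(message) for message in topic.pull()]
    table.extend(rows)
    return len(rows)
```

In a real setup, only `transform` would be code I want to own; the topic, the delivery guarantees, and the storage are exactly the parts I would rather hand to a managed service.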

“Weird Functions”

At Handshake we work primarily in a monolithic Rails app (which is currently being transformed to SOA) with a few smaller services. Although we face some pain points with that Rails app, there are a lot of positives to having a single consistent codebase to work within for almost all features.

One thing that does not sound productive is breaking that out into a bunch of Cloud Functions. For the most part, if I have a function to write, I’m going to write it in Ruby and put it in that main codebase.

Sometimes though… there are “Weird Functions”. Weird Functions are functions with anomalous characteristics:

  • It requires 500 MB of underlying Linux libraries, such as HTML -> PDF or Docx -> PDF conversions
  • It has spiky or odd capacity requirements, such as an import API

For these use cases, I’m excited to explore Cloud Functions as a way to serve these unique requirements in a fully isolated and fully managed way, without having to push them onto the rest of the codebase.
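As a rough illustration of the Docx -> PDF case, here is what the body of such a function might look like in Python. This is a sketch under assumptions, not something we run: it assumes LibreOffice is installed in the function's image and that its `soffice` binary is on the `PATH`, and the function names and the timeout are made up for the example. The heavy Linux dependency lives only in that image, not in the main app.

```python
import subprocess
from pathlib import Path
from typing import List


def build_convert_command(docx_path: Path, out_dir: Path) -> List[str]:
    """Command line for a headless LibreOffice Docx -> PDF conversion."""
    return [
        "soffice",  # assumption: LibreOffice is installed in the image
        "--headless",
        "--convert-to", "pdf",
        "--outdir", str(out_dir),
        str(docx_path),
    ]


def convert_docx_to_pdf(docx_path: Path, out_dir: Path) -> Path:
    """Run the conversion and return where the PDF should have been written."""
    subprocess.run(
        build_convert_command(docx_path, out_dir),
        check=True,   # raise if the converter exits with an error
        timeout=120,  # hypothetical limit so a stuck conversion fails fast
    )
    # LibreOffice names the output after the input file, with a .pdf suffix.
    return out_dir / (docx_path.stem + ".pdf")
```

The main codebase would then only need to call this function over HTTP or a queue, and never install the converter itself.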

Internal Automation

Lastly, internal automation. A vendor equivalent might be something like Zapier, but for your internal tools and systems that Zapier doesn’t have a connector for. Oftentimes, this kind of service is just one function that does one thing in response to something else.

Some simpler, perhaps-not-the-best examples:

  • Listen to CI webhooks to update a GitHub Status
  • Email an employee when a Google Form is submitted
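The first example could be sketched roughly like this in Python. The CI payload shape (`repo`, `commit_sha`, `result`, `build_url`) and the `ci/build` context name are hypothetical, since every CI vendor sends something different. The target is GitHub's commit status endpoint, which accepts the states `error`, `failure`, `pending`, and `success`. Actually sending the request with an authenticated HTTP client is left out.

```python
from typing import Any, Dict, Tuple

# Hypothetical mapping from a CI vendor's result strings to the states the
# GitHub commit status API accepts (error, failure, pending, success).
CI_RESULT_TO_GITHUB_STATE = {
    "passed": "success",
    "failed": "failure",
    "running": "pending",
}


def build_status_request(ci_event: Dict[str, Any]) -> Tuple[str, Dict[str, str]]:
    """Turn a (hypothetical) CI webhook payload into a GitHub status request.

    Returns the URL to POST to and the JSON body to send.
    """
    # Anything we do not recognize is reported as an error state.
    state = CI_RESULT_TO_GITHUB_STATE.get(ci_event["result"], "error")
    url = "https://api.github.com/repos/{repo}/statuses/{sha}".format(
        repo=ci_event["repo"], sha=ci_event["commit_sha"]
    )
    body = {
        "state": state,
        "target_url": ci_event["build_url"],
        "description": "CI build {}".format(ci_event["result"]),
        "context": "ci/build",
    }
    return url, body
```

That is the whole service: one function, one trigger, and nothing to bootstrap or keep running in between.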

Conclusion

My approach to exploring Serverless is cautious, but my interest is growing. I think there is still a lot to figure out compared to more proven ways of running services and applications, but even with those shortcomings there is a lot to be excited about.