Quick Read - ByteDance's Presentation on Serverless at GMTC 2021

Let's start with the text version and then move on to the audio version.

GMTC 2021 Presentation "ByteDance's Frontend Development Mode Upgrade Based on Serverless"
WeChat Link

by Wang Lei, ByteDance Web Infra, 2021-07-07

Below are my notes: part original text, part my own thoughts. I will mark the original text where I quote it.

Serverless is not a new concept, especially for big companies. Big companies almost certainly have their own BaaS systems. FaaS is relatively easy to build, but the underlying infrastructure still has to be customized; keep that in mind.

Overview#

Original Text: Next, I will introduce today's content from the following six aspects:

First, summarize the responsibilities and challenges of frontend in the era of big frontend
Then introduce the common business forms of ByteDance
ByteDance's traditional development mode and challenges
Then introduce how we upgraded the frontend development mode based on Serverless.
In order to ensure stability, we have also done a lot of work in monitoring and operation and maintenance.
Finally, a brief summary and outlook.

1. Multiple Job Responsibilities#

The first part is not surprising. The job responsibilities have expanded: beyond plain front-end development, the role now covers SSR, BFF, micro frontends, cross-platform work, integration, and Serverless.

I didn't quite understand the progressive development from BFF to micro frontends.

In terms of knowledge, traditional framework knowledge is still needed, along with build tools and backend knowledge. Redis, message queues (MQ), object storage, and monitoring and alerting all need to be picked up. The main gap is still backend and operations knowledge.

2. Multiple Business Forms#

The second part is also not surprising: there is both toC and toB business, so CSR + SSR + BFF is unavoidable, and micro frontends are highly regarded. As a global company, global deployment is essential, and global deployment requires distributed data synchronization.

3. Needs and Challenges#

The third part gives examples of web development problems. Let's take a closer look at this.

CSR+BFF

CSR is the basic frontend deployment; scaling it up means integrating a CDN with k8s cluster deployment.
- Business highlights: handling object storage, login authentication, AB testing, cluster operation and maintenance
BFF deployment cannot avoid a k8s cluster either.
- Business highlights: permissions, operation and maintenance, traffic control, domain name integration (probably using DNS)
Deployment system
- Project management, release system, travel management, AB management

There is not much new here, but for an ordinary front-end developer it is a lot to take on, and it is hard. The problem is that the scope is too large, which is where Serverless comes in.

Alright, let's get to the point.

4. Solutions based on sls (Serverless)#

The fourth part is about the development mode based on Serverless.

Concept Alignment and Expansion#

Serverless industry practices:

FaaS already exists, covering scheduling, cold start, etc.
Combination with BaaS
Cloud functions, Node frameworks, Runtimes, etc.

So, let's create an all-in-one frontend solution based on Serverless. To create a solution, we need to start with a diagram: architecture diagram + lifecycle roadmap.

Continuing: the all-in-one platform should provide basic platform capabilities, offer commonly used capabilities out of the box, and be developer-friendly (just like nuxt.js).

So, what are the platform capabilities?

No need to read the text explanation. Let's focus on the platformization. Let's talk about architecture first, and then implementation.

This part is similar to modern.js. If you want to enable SSR, it takes one click, although building that is not easy. Like nuxt.js, it offers CSR/SSR/SSG modes and automatically maps the API directory. Routes and domain names for the deployment artifact are allocated through online configuration.
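To make "automatically maps the API directory" concrete for myself, here is a minimal sketch of file-based route mapping in the Nuxt style. The `AppConfig` shape, the `fileToRoute` helper, and the `[id]` bracket convention are my own assumptions for illustration; the talk does not show the platform's actual rules.

```typescript
// Hypothetical sketch: turn files under an API directory into HTTP routes.
// Names and conventions here are illustrative, not ByteDance's real config.

type RenderMode = "csr" | "ssr" | "ssg";

interface AppConfig {
  mode: RenderMode; // rendering mode chosen for the app
  apiDir: string;   // directory whose files become BFF routes
}

// Convert a file path such as "api/users/[id].ts" to "/api/users/:id".
// Returns null for files outside the API directory.
function fileToRoute(filePath: string, apiDir: string): string | null {
  const base = apiDir.replace(/\/$/, "");
  const prefix = base + "/";
  if (!filePath.startsWith(prefix)) return null;

  const segments = filePath
    .slice(prefix.length)
    .replace(/\.(ts|js)$/, "")                   // drop the extension
    .split("/")
    .filter((s) => s !== "index")                // a segment named "index" maps to its parent path
    .map((s) => s.replace(/^\[(.+)\]$/, ":$1")); // [id] -> :id

  return "/" + [base, ...segments].join("/");
}

const config: AppConfig = { mode: "ssr", apiDir: "api" };
const routes = ["api/users/[id].ts", "api/index.ts", "src/App.tsx"]
  .map((f) => fileToRoute(f, config.apiDir))
  .filter((r): r is string => r !== null);

console.log(routes); // [ '/api/users/:id', '/api' ]
```

The point is only that the directory layout is the route table, so the developer never writes routing code by hand.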

Here comes the architecture diagram:

There is a lot of text here. The left side introduces the architecture diagram, and the right side explains the lifecycle and data flow.

After the architecture comes CI/CD. Based on the capabilities provided by Coding, the flow is: code submission - compilation - linting - security checks - manual testing and approval - lighthouse performance checks - manual approval for deployment.

Service orchestration pipeline, configuration file and process conversion.
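The pipeline above mixes automated checks with manual gates. A rough sketch of how such a sequence could be modeled, assuming a simple "stop at the first failure or missing approval" rule; the `Stage` type, stage names, and `runPipeline` function are hypothetical and not taken from the talk:

```typescript
// Hypothetical sketch of a pipeline with automated stages and manual gates.

type StageResult = "passed" | "failed" | "waiting_approval";

interface Stage {
  name: string;
  manual: boolean;     // true for human gates (test sign-off, release approval)
  run?: () => boolean; // automated check; unused for manual stages
}

interface StageLog {
  stage: string;
  result: StageResult;
}

// Run stages in order; stop at the first failure or unapproved manual gate.
function runPipeline(stages: Stage[], approvals: Set<string>): StageLog[] {
  const log: StageLog[] = [];
  for (const stage of stages) {
    if (stage.manual) {
      if (!approvals.has(stage.name)) {
        log.push({ stage: stage.name, result: "waiting_approval" });
        break;
      }
      log.push({ stage: stage.name, result: "passed" });
      continue;
    }
    const ok = stage.run ? stage.run() : true;
    log.push({ stage: stage.name, result: ok ? "passed" : "failed" });
    if (!ok) break;
  }
  return log;
}

const pipelineStages: Stage[] = [
  { name: "compile", manual: false, run: () => true },
  { name: "lint", manual: false, run: () => true },
  { name: "security-check", manual: false, run: () => true },
  { name: "manual-test-approval", manual: true },
  { name: "lighthouse", manual: false, run: () => true },
  { name: "deploy-approval", manual: true },
];

// Only the testing gate has been approved so far.
const pipelineLog = runPipeline(pipelineStages, new Set(["manual-test-approval"]));
console.log(pipelineLog[pipelineLog.length - 1]);
// { stage: 'deploy-approval', result: 'waiting_approval' }
```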

Now let's move on to the implementation.

Implementation of CSR#

First, for regular CSR, the artifacts are automatically uploaded to the CDN, with one copy in ES5 and one copy in ES6. Why separate them? Is it for ESM? In the deployment process there is a platform control diagram, and the steps seem to be:

Allocate domain names
Select the folder for object storage
Choose to publish

Here is an illustration:

I see. The ES6 copy is there for dynamic polyfilling: the response is chosen dynamically based on the user agent. It's a bit like https://polyfill.io/, filling in the gaps on demand. Big companies really do use everything they can.
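Here is how I picture the user-agent-based choice between the two copies. This is a minimal sketch under my own assumptions: the version thresholds are illustrative rather than the platform's real rules, and `pickBundle` is a made-up helper. Real user-agent sniffing is messier than three regular expressions.

```typescript
// Hypothetical sketch: choose the ES5 or ES6 build from the User-Agent header.

type Bundle = "es5" | "es6";

// [pattern capturing the major version, minimum version assumed "modern"].
// Thresholds are illustrative assumptions, not taken from the talk.
const MODERN_BROWSERS: Array<[RegExp, number]> = [
  [/Chrome\/(\d+)/, 61],
  [/Firefox\/(\d+)/, 60],
  [/Version\/(\d+).*Safari\//, 11],
];

function pickBundle(userAgent: string): Bundle {
  for (const [pattern, minVersion] of MODERN_BROWSERS) {
    const match = pattern.exec(userAgent);
    if (match) return Number(match[1]) >= minVersion ? "es6" : "es5";
  }
  return "es5"; // unknown browsers get the safe build
}

const chromeUA =
  "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36";
const ie11UA = "Mozilla/5.0 (Windows NT 10.0; Trident/7.0; rv:11.0) like Gecko";

console.log(pickBundle(chromeUA)); // es6
console.log(pickBundle(ie11UA));   // es5
```

Defaulting to ES5 for anything unrecognized is the conservative choice: a modern browser can still run the ES5 build, but not the other way round.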

Implementation of SSR#

When allocating routes, you can choose SSR and micro frontend modules. This corresponds to the configuration mode in Nuxt.

For users accessing SSR, caching is still necessary. First check Redis; on a miss, call the corresponding service, located through service discovery (distributed architecture). In exceptional cases, CSR is used as a fallback.
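The cache, render, fallback order can be sketched as follows. I use an in-memory `Map` as a stand-in for Redis and a plain function as a stand-in for the SSR service found through service discovery; `handlePage`, `MemoryCache`, and `CSR_SHELL` are names I made up, not anything shown in the talk.

```typescript
// Hypothetical sketch: serve from cache, else render via SSR, else fall back to CSR.

interface PageCache {
  get(key: string): Promise<string | null>;
  set(key: string, value: string): Promise<void>;
}

// In-memory stand-in for Redis, so the sketch runs without a server.
class MemoryCache implements PageCache {
  private store = new Map<string, string>();
  async get(key: string): Promise<string | null> {
    return this.store.get(key) ?? null;
  }
  async set(key: string, value: string): Promise<void> {
    this.store.set(key, value);
  }
}

type Renderer = (url: string) => Promise<string>;
type Source = "cache" | "ssr" | "csr-fallback";

// Static shell that lets the browser render the page itself (CSR).
const CSR_SHELL = '<div id="root"></div><script src="/app.js"></script>';

async function handlePage(
  url: string,
  cache: PageCache,
  render: Renderer,
): Promise<{ html: string; source: Source }> {
  const cached = await cache.get(url);
  if (cached !== null) return { html: cached, source: "cache" };
  try {
    const html = await render(url); // stands in for the SSR service call
    await cache.set(url, html);
    return { html, source: "ssr" };
  } catch {
    // SSR failed: degrade to the CSR shell and do not cache it
    return { html: CSR_SHELL, source: "csr-fallback" };
  }
}

(async () => {
  const cache = new MemoryCache();
  const ok: Renderer = async (url) => `<h1>${url}</h1>`;
  console.log((await handlePage("/home", cache, ok)).source); // ssr
  console.log((await handlePage("/home", cache, ok)).source); // cache
})();
```

Not caching the fallback shell matters: otherwise one transient SSR failure would pin every later visitor to CSR until the entry expires.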

Implementation of BFF#

Oh, I understand now. It maps onto something I already know, Nuxt's Nitro API directory, while the SSR mentioned earlier is just SSR. After compilation, the artifacts include a BFF file that represents the corresponding service, and deploying that service gives you a BFF service.

It's quite extensive, and this is as far as I can understand.

SSR service discovery uses RPC calls, while CSR uses HTTP calls, as usual.

Implementation of Micro Frontends#

Micro frontends are about embedding one system's pages into another system's pages. I won't explain it here. Internally, it is integrated into the Garfish Micro Frontend system, which can be compared to micro-app/wujie.

Big companies clearly aim for a smooth experience and build for it starting from the project infrastructure.

In terms of goals, I need to develop CSR/SSR/BFF/micro frontend applications, and develop and deliver them all in the same way. Micro frontends involve parent-child module relationships, micro frontend menus, and other features.

I speculate that the parent-child module relationship requires defining a root container and corresponding URL routes. It has a page and needs to be associated to be unified. I didn't quite understand its purpose.

I speculate that for micro frontends, the menu can be configured, as the menu is the entry point. The pages hit by the menu are empty and serve as containers for integration. After all, dynamic authorization is needed. It reminds me of my past nightmares, hahaha.

That's as far as I can understand. Traffic comes in, goes through the gateway, reaches the container page, and the container page loads the micro frontends.
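My guess at the container page's job, as a sketch: given the current path, pick the micro frontend whose route prefix matches best, then load its entry. The `MicroApp` fields and `resolveMicroApp` are illustrative assumptions on my part; I am not claiming this is how Garfish is configured or implemented.

```typescript
// Hypothetical sketch: a container page resolving which micro frontend to load.

interface MicroApp {
  name: string;
  activeWhen: string; // route prefix under which the app is mounted
  entry: string;      // URL of the app's entry artifact
}

// Longest matching prefix wins, so "/admin/users" beats "/admin".
function resolveMicroApp(path: string, apps: MicroApp[]): MicroApp | null {
  let best: MicroApp | null = null;
  for (const app of apps) {
    const prefix = app.activeWhen.replace(/\/$/, "");
    const matches = path === prefix || path.startsWith(prefix + "/");
    if (matches && (best === null || app.activeWhen.length > best.activeWhen.length)) {
      best = app;
    }
  }
  return best;
}

const microApps: MicroApp[] = [
  { name: "admin", activeWhen: "/admin", entry: "https://cdn.example.com/admin/index.html" },
  { name: "users", activeWhen: "/admin/users", entry: "https://cdn.example.com/users/index.html" },
];

console.log(resolveMicroApp("/admin/users/42", microApps)?.name); // users
console.log(resolveMicroApp("/admin/settings", microApps)?.name); // admin
console.log(resolveMicroApp("/administrator", microApps));        // null
```

Matching on whole path segments rather than raw string prefixes is what keeps "/administrator" from landing in the "/admin" app.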

5. Monitoring and Operation and Maintenance#

After playing with so many things in the frontend, we still need to look at load and operation and maintenance. With k8s, it becomes much simpler. I speculate that there can be dashboard monitoring, monitoring and alerting solutions, automatic restart and rollback, and maybe even flame graphs.

First, define metrics and set thresholds through a rule engine.

404 alerts
5xx alerts
QPS exceeded alerts
SSR failure alerts
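A threshold rule engine for the four alerts above can be very small. A minimal sketch, assuming each rule compares one metric against a fixed threshold; the metric names, threshold values, and `evaluateRules` function are made up for illustration:

```typescript
// Hypothetical sketch: threshold rules evaluated against a metrics snapshot.

interface Metrics {
  notFoundRate: number;    // share of responses that are 404
  serverErrorRate: number; // share of responses that are 5xx
  qps: number;             // requests per second
  ssrFailureRate: number;  // share of SSR renders that failed
}

interface Rule {
  name: string;
  metric: keyof Metrics;
  threshold: number; // the alert fires when the metric exceeds this value
}

// Illustrative thresholds only.
const alertRules: Rule[] = [
  { name: "404 alert", metric: "notFoundRate", threshold: 0.05 },
  { name: "5xx alert", metric: "serverErrorRate", threshold: 0.01 },
  { name: "QPS exceeded alert", metric: "qps", threshold: 10000 },
  { name: "SSR failure alert", metric: "ssrFailureRate", threshold: 0.02 },
];

// Return the names of all rules whose metric is above its threshold.
function evaluateRules(rules: Rule[], metrics: Metrics): string[] {
  return rules.filter((r) => metrics[r.metric] > r.threshold).map((r) => r.name);
}

const snapshot: Metrics = {
  notFoundRate: 0.01,
  serverErrorRate: 0.03,
  qps: 12000,
  ssrFailureRate: 0.0,
};

console.log(evaluateRules(alertRules, snapshot));
// [ '5xx alert', 'QPS exceeded alert' ]
```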

Once alerts fire, they need to be handled, via pop-ups. The options when there is a problem:

Acknowledge it without handling it
Mute it for half an hour, or for 6 hours
Cancel the mute
Mark it as a correct alert, a false alert, or one that needs handling, and report it with tracking
Copy the content to ask for help
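The timed blocking options can be sketched as a small store keyed by alert name. This is my own illustration: `SilenceStore` is a hypothetical name, and the current time is passed in explicitly so the behavior is easy to check.

```typescript
// Hypothetical sketch: silencing an alert for a fixed period, with cancel.

const HALF_HOUR_MS = 30 * 60 * 1000;
const SIX_HOURS_MS = 6 * 60 * 60 * 1000;

class SilenceStore {
  // alert name -> timestamp (ms) until which the alert is silenced
  private until = new Map<string, number>();

  silence(alert: string, durationMs: number, now: number): void {
    this.until.set(alert, now + durationMs);
  }

  cancel(alert: string): void {
    this.until.delete(alert);
  }

  isSilenced(alert: string, now: number): boolean {
    const end = this.until.get(alert);
    return end !== undefined && now < end;
  }
}

const silences = new SilenceStore();
const t0 = 0;

silences.silence("5xx alert", HALF_HOUR_MS, t0);
console.log(silences.isSilenced("5xx alert", t0 + 10 * 60 * 1000)); // true
console.log(silences.isSilenced("5xx alert", t0 + HALF_HOUR_MS));   // false

silences.silence("404 alert", SIX_HOURS_MS, t0);
silences.cancel("404 alert");
console.log(silences.isSilenced("404 alert", t0 + 1000));           // false
```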

With metrics in place, we can build a big-screen dashboard. I'm too alert. Let's try it and see how the system behaves under load.

Logs still need to go into the logging system, which handles management and filtering separately. The point is to dig information out of massive data, with traceability as the goal.

When there is a problem, how do we preserve the scene and debug? This part is a bit of a blind spot for me; probably remote sourcemaps and flame graph analysis. The speaker also mentions that there are custom internal tools similar to alinode for Node runtime debugging.

Snapshot/CPU profile analysis is a bit like black magic, I don't understand it, respect.

6. Summary and Outlook#

Future development of serverless:

Runtime needs to better handle cold starts and dynamic scaling
Better BaaS platform construction
Serverless+ as an all-in-one service

Their infra team:

APM platform
Test infra
Low code
Cross-platform solution
Web development engine
Engineering platform
Mobile center
Serverless
Node.js architecture
Web IDE
Micro frontend solution
Design ops

Final Thoughts on the Reading#

Big companies are indeed big companies, very honest and real, going deep into the details.

I'm getting better at bragging about myself, I can guess what they are talking about. Happy!