
Starting to work on the secondary consumption of the awesome English podcast content I bragged about earlier.

Later update: I ended up making audio only, no video. Details can be found here:

00 - Yiya, as long as it runs! 辛宝's new idea: a personal solo podcast

This is a casual, low-effort post; just a chat, going wherever it goes.

Origin#

I am one of the hosts of a Chinese audio podcast called "Web Worker". In a previous episode, I bragged that I wanted to do secondary consumption of English audio podcasts, and I already had a process that barely worked. Today I wrote some code to make the process easier, so I'm introducing it and taking notes here.

If anyone sees it, feel free to interact and give feedback.

What is secondary consumption of English front-end podcasts?#

English front-end podcasts#

Podcasts are a long-established medium abroad, and many of them focus on front-end programmers. That is what gave me the idea of secondary consumption.

Foreign front-end podcasts are high quality. They invite guests such as the technical director of Vercel, core members of popular frameworks like React/Vue/Angular, the author of the popular tool ESLint, and bestselling authors. The content quality is guaranteed.

I had collected some of these before. The 2022 State of JS survey included a specific question about podcasts in its learning-resources section; the results are here: https://2022.stateofjs.com/en-US/resources/podcasts/

So the front-end podcasts recognized by the community are all there. Of course, Web Worker is not on the 2022 list, but it will be included in 2023. That's beside the point.

That's where the content comes from.

Secondary consumption#

Since these foreign resources are so good, and they're in English (obviously), why not translate and adapt them?

For me, it is a way to broaden my horizons, deepen my knowledge of front-end topics, and learn authentic technical English. If I turn it into a show, it will also force me to produce high-quality content, killing two birds with one stone.

With this in mind, I have the motivation to start a new column or program.

Workflow#

Let's get to work. The process of consuming audio podcasts roughly includes the following steps:

  1. Find interesting podcast content.
  2. Convert audio to text with translation.
  3. Read, listen, understand, and digest the content, and create an outline.
  4. Record and output audio and video.

Let me analyze each step and see how to optimize the process.

Find interesting podcast content#

Working from the podcast list mentioned above, I subscribed to each show and got a feel for its content, excluding some channels:

  • Non-English shows, for example in German or Russian; I only focus on English ones.
  • Channels that haven't updated in a long time; I require at least one episode within the past year.

For the remaining channels, I collected their RSS feeds and subscribed to them using the RSSANT website. Since everything comes through RSS, downloading the audio files is easy as well.

In this step, I found interesting podcasts and obtained the mp3 files for each episode.
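
As a rough illustration of this step, here is a minimal Node sketch (not my actual tooling) that pulls the audio enclosure URLs out of a podcast RSS feed; the feed URL is only an example, and the regex-based parsing is deliberately naive.

```typescript
// fetch-episodes.ts — rough sketch: list the mp3 enclosure URLs in a podcast RSS feed.
// Assumes Node 18+ (global fetch); the feed URL below is just a placeholder.

async function listEpisodeAudio(feedUrl: string): Promise<string[]> {
  const res = await fetch(feedUrl);
  const xml = await res.text();

  // Podcast feeds expose each episode's audio file via an <enclosure url="..."> tag.
  const urls: string[] = [];
  for (const match of xml.matchAll(/<enclosure[^>]*\burl="([^"]+)"/g)) {
    if (match[1]) urls.push(match[1]);
  }
  return urls;
}

listEpisodeAudio("https://example.com/podcast/feed.xml").then((urls) =>
  console.log(urls.slice(0, 5)),
);
```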

Convert audio to text with translation#

I had previously used the services provided by Tencent Cloud, Alibaba Cloud, and Feishu Miaoji, which were usable and good. Later, I was delighted to see OpenAI release the Whisper project, which can be deployed locally and uses my own computer's CPU/GPU to convert audio to text.

I don't have a GPU, but running on the CPU isn't slow either. After some trial and error, I settled on the solution provided by Whisper.cpp, which does the computation on the CPU. Recently they added an acceleration path for Apple's M1 chip, but it was slow to start up for me, so I still mostly use the CPU.

Here I could show a chart of how long Whisper.cpp with the medium.en model takes to process an hour-long English podcast. I will add it later.

I could also show a chart comparing the different models: their pros and cons and processing times. I will add it later. The conclusion is that medium.en is sufficient.

By using the command line, I quickly obtained the SRT English subtitle file.
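
Concretely, the conversion boils down to two command-line calls: ffmpeg to turn the mp3 into the 16 kHz mono WAV that Whisper.cpp expects, then the Whisper.cpp binary with SRT output enabled. Below is a minimal sketch of wrapping that in a Node script; the binary path, model path, and file names are assumptions about my local build, not anything universal.

```typescript
// transcribe.ts — rough sketch of driving Whisper.cpp from Node.
// The ffmpeg binary, the ./main binary, and the model path are local assumptions.
import { execFileSync } from "node:child_process";

function transcribe(mp3Path: string): void {
  const wavPath = mp3Path.replace(/\.mp3$/, ".wav");

  // Whisper.cpp expects 16 kHz mono 16-bit PCM audio as input.
  execFileSync("ffmpeg", [
    "-i", mp3Path,
    "-ar", "16000",
    "-ac", "1",
    "-c:a", "pcm_s16le",
    wavPath,
  ]);

  // -osrt asks Whisper.cpp to write an .srt subtitle file next to the input.
  execFileSync("./main", ["-m", "models/ggml-medium.en.bin", "-f", wavPath, "-osrt"]);
}

transcribe("episode-01.mp3");
```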

For translation, I used the laf.run website to access a machine translation service in China, specifically Huoshan Translation. Huoshan's machine translation service has a monthly free quota of 5 million words (as of May 2, 2023), which is more than enough for me. Machine translation is a mature technology, and there is no need for anything else. I know you might be thinking of OpenAI's ChatGPT, but hold that thought.

A quick note on laf.run: it is a serverless platform that can also be independently deployed and hosted. I am currently on its free tier, and the API responds very quickly. I'll skip the details, since it isn't the focus of this article.

Now that I have an API, the next step is to create a GUI page. I used vue3 and arco-design to create a simple frontend page.

Don't laugh; here's a screenshot.

[Screenshot 2023-05-02 00.59.22]

This page implements the following features (the core logic is sketched right after the list):

  • Import and parse SRT files using FileReader.
  • Extract plain text content.
  • Split the text into segments with a maximum of 5000 characters for translation.
  • Call the API in a loop for segment translation.
  • Preview the translation results and export bilingual subtitles.
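
Below is a minimal sketch of that logic in plain TypeScript, assuming a hypothetical laf.run cloud function at /translate that accepts { text } and returns { translation }; the real endpoint name and payload shape are whatever your own function defines.

```typescript
// translate-srt.ts — rough sketch: SRT text -> ≤5000-char chunks -> translation API loop.
// The laf.run endpoint and its request/response shape are assumptions for illustration.

const TRANSLATE_API = "https://your-app.laf.run/translate"; // hypothetical cloud function
const MAX_CHARS = 5000;

// Keep only the spoken lines of an SRT file (drop the indexes and timestamps).
function extractPlainText(srt: string): string[] {
  return srt
    .split(/\r?\n/)
    .filter((line) => line.trim() !== "" && !/^\d+$/.test(line.trim()) && !line.includes("-->"));
}

// Pack lines into chunks no longer than maxChars, so each request stays under the limit.
function chunk(lines: string[], maxChars = MAX_CHARS): string[] {
  const chunks: string[] = [];
  let current = "";
  for (const line of lines) {
    if (current && current.length + line.length + 1 > maxChars) {
      chunks.push(current);
      current = "";
    }
    current += (current ? "\n" : "") + line;
  }
  if (current) chunks.push(current);
  return chunks;
}

// Call the translation function once per chunk and join the results.
async function translateAll(srt: string): Promise<string> {
  const results: string[] = [];
  for (const part of chunk(extractPlainText(srt))) {
    const res = await fetch(TRANSLATE_API, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ text: part }),
    });
    const data = await res.json();
    results.push(data.translation);
  }
  return results.join("\n");
}
```

In the actual page, the SRT string comes from FileReader.readAsText on the uploaded file, and the translated lines are matched back to the original timestamps when exporting the bilingual subtitles.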

This way, I have completed the translation work.

Read, listen, understand, and digest the content, and create an outline#

To truly understand the content, I still need to listen to it myself. With bilingual subtitles, it's not difficult to listen.

I can also directly read the subtitle file.

By listening and understanding, I can create an outline.

Because laf.run is very useful, I also integrated ChatGPT to help me summarize, just to be on the safe side.

Technical points (a rough sketch follows the list):

  • laf.run provides an API that wraps the GPT service.
  • Frontend segmentation of English text and calling the API in a loop.
  • Preview and download summary content.
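
Under the same caveat as before (a hypothetical laf.run function, here /summarize, wrapping the GPT API; the endpoint, payload, and prompt are mine to define), the summarization loop is basically the same shape as the translation one:

```typescript
// summarize.ts — rough sketch: split the English transcript into segments and ask a
// GPT-wrapping laf.run function to summarize each one. Endpoint and fields are assumptions.

const SUMMARIZE_API = "https://your-app.laf.run/summarize"; // hypothetical cloud function

// Naive segmentation by paragraph, capped at maxChars per request.
function segment(text: string, maxChars = 3000): string[] {
  const parts: string[] = [];
  let current = "";
  for (const para of text.split(/\n+/)) {
    if (current && current.length + para.length > maxChars) {
      parts.push(current);
      current = "";
    }
    current += (current ? "\n" : "") + para;
  }
  if (current) parts.push(current);
  return parts;
}

async function summarize(text: string): Promise<string[]> {
  const summaries: string[] = [];
  for (const part of segment(text)) {
    const res = await fetch(SUMMARIZE_API, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        // The prompt still needs tuning, as mentioned later in the post.
        prompt: "Summarize the key points of this podcast transcript segment:",
        text: part,
      }),
    });
    const data = await res.json();
    summaries.push(data.summary);
  }
  return summaries;
}
```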

I don't want to do manual work, so I created a simple page.

[Screenshot 2023-05-02 01.06.46]

The prompt can be optimized, but I'll talk about that later.

Record and output audio and video#

I uploaded one episode to Bilibili, but I'm not very satisfied with it; recording takes a lot of energy. I'll keep working on it tomorrow.

I prefer to have both audio and video, targeting multiple platforms.

I'll update the rest later.
