Introducing Distill CLI: An efficient, Rust-powered tool for media summarization



Distill CLI summarizing The Frugal Architect

A few weeks ago, I wrote about a project our team has been working on called Distill: a simple application that summarizes and extracts important details from our daily meetings. At the end of that post, I promised you a CLI version written in Rust. After a few code reviews from Rustaceans at Amazon and a bit of polish, today I’m ready to share the Distill CLI.

After you build from source, simply pass Distill CLI a media file and select the S3 bucket where you’d like to store the file. Today, Distill supports outputting summaries as Word documents, text files, and printing directly to the terminal (the default). You’ll find that it’s easily extensible: my team (OCTO) is already using it to export summaries of our team meetings directly to Slack (and working on support for Markdown).

Tinkering is a good way to learn and be curious

The way we build has changed quite a bit since I started working with distributed systems. Today, if you want it, compute, storage, databases, networking are available on demand. As builders, our focus has shifted to faster and faster innovation, and along the way tinkering at the system level has become a bit of a lost art. But tinkering is as important now as it has ever been. I vividly remember the hours spent fiddling with BSD 2.8 to make it work on PDP-11s, and it cemented my never-ending love for OS software. Tinkering provides us with an opportunity to really get to know our systems. To experiment with new languages, frameworks, and tools. To look for efficiencies big and small. To find inspiration. And this is exactly what happened with Distill.

We rewrote one of our Lambda functions in Rust, and observed that cold starts were 12x faster and the memory footprint decreased by 73%. Before I knew it, I began to think about other ways I could make the entire process more efficient for my use case.

The original proof of concept stored media files, transcripts, and summaries in S3, but since I’m running the CLI locally, I realized I could store the transcripts and summaries in memory and save myself a few writes to S3. I also wanted an easy way to upload media and monitor the summarization process without leaving the command line, so I cobbled together a simple UI that provides status updates and lets me know when anything fails. The original showed what was possible, it left room for tinkering, and it was the blueprint that I used to write the Distill CLI in Rust.

I encourage you to give it a try, and let me know if you find any bugs or edge cases, or have ideas to improve on it.

Builders are choosing Rust

As technologists, we have a responsibility to build sustainably. And this is where I really see Rust’s potential. With its emphasis on performance, memory safety and concurrency there is a real opportunity to decrease computational and maintenance costs. Its memory safety guarantees eliminate obscure bugs that plague C and C++ projects, reducing crashes without compromising performance. Its concurrency model enforces strict compile-time checks, preventing data races and maximizing multi-core processors. And while compilation errors can be bloody aggravating in the moment, fewer developers chasing bugs, and more time focused on innovation are always good things. That’s why it’s become a go-to for builders who thrive on solving problems at unprecedented scale.
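To make the compile-time checks concrete, here is a minimal sketch using only the Rust standard library. The function name `parallel_sum` and the chunking scheme are illustrative assumptions and are not taken from the Distill CLI. The shared total has to be wrapped in `Arc<Mutex<...>>` before worker threads can touch it; handing the threads a plain mutable reference instead would be rejected by the compiler rather than turning into a data race at runtime.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical helper: sum `values` by splitting them across `workers` threads.
fn parallel_sum(values: Vec<u64>, workers: usize) -> u64 {
    let workers = workers.max(1);
    // Shared state must be explicitly thread-safe: Arc for shared ownership,
    // Mutex for exclusive access while updating.
    let total = Arc::new(Mutex::new(0u64));
    // Ceiling division, clamped so an empty input never yields a chunk size of 0.
    let chunk_size = ((values.len() + workers - 1) / workers).max(1);

    let handles: Vec<_> = values
        .chunks(chunk_size)
        .map(|chunk| {
            let chunk = chunk.to_vec(); // each thread owns its own slice of the data
            let total = Arc::clone(&total);
            thread::spawn(move || {
                let partial: u64 = chunk.iter().sum();
                *total.lock().unwrap() += partial;
            })
        })
        .collect();

    for handle in handles {
        handle.join().expect("worker thread panicked");
    }

    let result = *total.lock().unwrap();
    result
}

fn main() {
    let values: Vec<u64> = (1..=100).collect();
    println!("sum = {}", parallel_sum(values, 4)); // prints "sum = 5050"
}
```

The locking here is deliberately simple. A real workload would more likely return partial sums from each thread and combine them after `join`, but the shape of the guarantee is the same: unsynchronized sharing does not compile.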

Since 2018, we have increasingly leveraged Rust for critical workloads across various services like S3, EC2, DynamoDB, Lambda, Fargate, and Nitro, especially in scenarios where hardware costs are expected to dominate over time. In his guest post last year, Andy Warfield wrote a bit about ShardStore, the bottom-most layer of S3’s storage stack that manages data on each individual disk. He described how Rust was chosen to get type safety and structured language support to help identify bugs sooner, and how the team wrote libraries to extend that type safety to on-disk structures. If you haven’t already, I recommend that you read the post, and the SOSP paper.

This trend is mirrored across the industry. Discord moved their Read States service from Go to Rust to address large latency spikes caused by garbage collection. It’s 10x faster, with their worst tail latencies reduced almost 100x. Similarly, Figma rewrote performance-sensitive parts of their multiplayer service in Rust, and they’ve seen significant server-side performance improvements, such as reducing peak average CPU utilization per machine by 6x.

The point is that if you are serious about cost and sustainability, there is no reason not to consider Rust.

Rust is hard…

Rust has a reputation for being a difficult language to learn and I won’t dispute that there is a learning curve. It will take time to get familiar with the borrow checker, and you will fight with the compiler. It’s a lot like writing a PRFAQ for a new idea at Amazon. There is a lot of friction up front, which is sometimes hard when all you really want to do is jump into the IDE and start building. But once you’re on the other side, there is tremendous potential to pick up velocity. Remember, the cost to build a system, service, or application is nothing compared to the cost of operating it, so the way you build should be continually under scrutiny.

But you don’t have to take my word for it. Earlier this year, The Register published findings from Google that showed their Rust teams were twice as productive as teams using C++, and that the same size team using Rust instead of Go was as productive with more correctness in their code. There are no bonus points for growing headcount to tackle avoidable problems.

Closing thoughts

I want to be crystal clear: this is not a call to rewrite everything in Rust. Just as monoliths are not dinosaurs, there is no single programming language to rule them all, and not every application will have the same business or technical requirements. It’s about using the right tool for the right job. This means questioning the status quo and continuously looking for ways to incrementally optimize your systems: to tinker with things and measure what happens. Something as simple as switching the library you use to serialize and deserialize JSON from Python’s standard library to orjson might be all you need to speed up your app, reduce your memory footprint, and lower costs in the process.
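As a rough illustration of "tinker and measure", the sketch below times two interchangeable implementations of the same small task with `std::time::Instant`. Everything in it is a hypothetical stand-in (`measure`, `join_naive`, `join_preallocated`); it is not the JSON comparison mentioned above, and wall-clock timing in a loop is only a first approximation of a proper benchmark.

```rust
use std::time::{Duration, Instant};

/// Run `f` `iterations` times; return the last result and the total elapsed time.
fn measure<T>(iterations: u32, mut f: impl FnMut() -> T) -> (T, Duration) {
    assert!(iterations > 0, "need at least one iteration");
    let start = Instant::now();
    let mut last = f();
    for _ in 1..iterations {
        last = f();
    }
    (last, start.elapsed())
}

/// Baseline: builds a brand-new String on every step.
fn join_naive(words: &[&str]) -> String {
    let mut out = String::new();
    for &w in words {
        out = format!("{}{} ", out, w);
    }
    out
}

/// Candidate: reserves the full capacity once and appends in place.
fn join_preallocated(words: &[&str]) -> String {
    let capacity: usize = words.iter().map(|w| w.len() + 1).sum();
    let mut out = String::with_capacity(capacity);
    for &w in words {
        out.push_str(w);
        out.push(' ');
    }
    out
}

fn main() {
    let words = vec!["tinker"; 2_000];

    let (naive, naive_time) = measure(20, || join_naive(&words));
    let (fast, fast_time) = measure(20, || join_preallocated(&words));

    // Only compare timings once both versions agree on the answer.
    assert_eq!(naive, fast);
    println!("naive:        {:?}", naive_time);
    println!("preallocated: {:?}", fast_time);
}
```

Build with `--release` before reading anything into the numbers, and treat the result as a prompt for further investigation rather than a verdict.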

If you take nothing else away from this post, I encourage you to actively look for efficiencies in all aspects of your work. Tinker. Measure. Because everything has a cost, and cost is a pretty good proxy for a sustainable system.

Now, go build!

A special thank you to AWS Rustaceans Niko Matsakis and Grant Gurvis for their code reviews and feedback while developing the Distill CLI.
