Generating Waveforms for Podcasts in Winds 2.0

One of the cool new features of Winds is the waveform functionality. In this blog post, I’ll talk a little bit about what a waveform is exactly and how to go about creating it.

A waveform is a visual representation of the volume over time of an audio file — making it pretty easy to pick out the loud/quiet parts of a song or podcast.

For those visual learners, here’s what a waveform looks like:

There’s gotta be a library for that…

Wavesurfer is an awesome project, but didn’t quite satisfy our requirements. Wavesurfer uses the HTML5 canvas (which wasn’t going to work for us — more on that later!), and it looked like we were going to have to work around a lot of issues integrating the old-school JS library into our new-school React/Redux/Electron app.

We’ve already built the audio player — we just need a waveformrenderer…

It’s also important to note that we also need to display the progress on the waveform, so the user knows how many glorious minutes of sultry radio announcer voice they have left before another pledge drive break.

Before we can render a waveform, we have to take the audio data and convert it into a format that we can actually use to put on a page.

Keep in mind that digital audio files are literally just huge arrays of numbers that, when played in sequence quickly enough, sound like a podcast! Or dark incantations, if played in reverse.

In order for us to grab that data and process it, we’ll need to load it into an AudioContext, then use the getChannelData method to grab the array of audio values. Basically, we’re just turning a JS ArrayBuffer into a Float32Array:
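Here's a minimal sketch of that step (the fetch setup and variable names are placeholders rather than the exact Winds code):

```javascript
// Fetch the MP3, decode it, and grab the raw samples.
const audioContext = new AudioContext();

fetch('/car-talk.mp3')
  .then(response => response.arrayBuffer())
  .then(arrayBuffer => audioContext.decodeAudioData(arrayBuffer))
  .then(audioBuffer => {
    // One Float32Array of samples (-1 to 1) for the first channel
    const decodedAudioData = audioBuffer.getChannelData(0);
    console.log(decodedAudioData.length); // 143,669,376 for our MP3
  });
```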

So now I have an array of 143,669,376 32-bit floats. The MP3 has a sample rate of 44,100 samples per second, and runs for 54 minutes and 18 seconds — so, math:

(44,100 samples/sec) × (3,258 seconds) = 143,677,800 samples total

Which is pretty close to our array’s 143,669,376 samples. Neat!

At the end of this process, I want an array of 100-or-so values from 0 to 1 that I can map to the heights of each bar on the waveform. (I mean, I could try to map all 143,669,376 values to bars on the waveform, but I feel like that might cause some usability problems…)

We’re going to use a bucketing algorithm to generate N buckets — each bin will be a representation of the maximum (or average, or minimum, depending on what you’re looking for) of the volume of that time interval of the audio file.

As you can see above, audio data can be negative! That's just how audio data works: each sample can have a value from -1 to 1. However, we don't want negative values in our waveform, so we'll just use the maximum value in each bucket.

Our highly stylized waveform doesn’t have to be a 100% precise reproduction of the data in the audio file. We just want an array of 100 bars, each of which indicates the general “loudness” of the audio file in that section.

Here’s the bucketing algorithm:

After running our decodedAudioData (the Float32Array with 143,669,376 samples) through this algorithm with 100 buckets, we end up with a plain array of 100 values between 0 and 1.

Much easier for us to render this to the page!

Okay so — now we’ve got our array of 100 values from 0 to 1, each of which indicates the max loudness of that section of our audio file. We can use this data to actually render the waveform to the page.

Remember, this is what we want our waveform to look like at the end:

Based on this mockup, we're going to have 3 main requirements:

1. Render one bar per bucket, with the bar height mapped to the loudness of that section
2. Easy control over the styling of the component (bar height, width, spacing, color)
3. Show playback progress by letting the progress bar "shine through" the waveform

There’s a couple of different ways that we can go about implementing this — each with a couple tradeoffs.

HTML5 Canvas

Pro: Canvas is awesome for 2D and 3D games, as well as pixel-perfect implementations of layered or animated visual elements. Sounds perfect for us, right?

Con: Canvas doesn’t really satisfy our #2 requirement, which is easy control over the styling of the component. Scaling the waveform “bar” height, width and spacing is tough, and it’s also difficult for us to provide easy parameters to change.

(Canvas does provide some ways to use clipping masks, but that wasn’t something that I dove super deep into.)

Pro: It’s super easy for web developers to throw a bunch of customization on a DOM waveform using CSS. It scales and stretches nicely, we can set padding, height and width params easily, and we can even add animations as the podcast loads or plays.

Con: What the DOM won’t do easily for us is masking/clipping — e.g., having the progress bar “shine through” the waveform as it plays. Sure, we could probably set up some situation where we calculate how many bars to fill based on the progress through the audio file, then calculate the % to fill the “currently playing” bar, but it turns into a bit of a mess.

SVG

Pro: SVG gives us some styling options (fill color, rectangle border radius, and other SVG attributes can all be set via CSS). Most importantly, SVG allows us to do "masking" (or clipping) on DOM elements.

Con: One drawback of using SVG is that we don't get layout control via CSS. Instead, we'll have to use absolute positioning for each of the "bars". However, SVG does give us a couple of stretching and scaling options, like preserveAspectRatio="none". With JavaScript, we can specify those parameters (number of bars, space between bars, etc.), and use math to figure out where to draw them.

We talked about the different ways we could implement our waveform, but the main requirement we were circling around was “masking”, or letting the progress bar “shine through” the waveform.

There’s 3 parts to making our SVG mask work:

Now, remember this is SVG, which means that style declarations and attributes will look slightly different, but the same principles apply as when styling DOM elements.

The first part — our “progress bar” element along with a “background” element:
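Here's a sketch of that markup (the class names and exact attribute values are assumptions on my part):

```html
<svg class="waveform" viewBox="0 0 100 100" preserveAspectRatio="none">
  <!-- gray background across the entire waveform -->
  <rect class="waveform-background" x="0" y="0" width="100" height="100" />
  <!-- progress bar; JavaScript animates its width as the track plays -->
  <rect class="waveform-progress" x="0" y="0" width="0" height="100" />
</svg>
```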

The viewBox attribute defines the cartesian grid space for our SVG, with (0,0) starting in the upper left corner of the element, going to (100, 100) in the lower right.

The preserveAspectRatio attribute is the opposite of what you normally see on SVGs: instead of using "meet" or "slice" (which ensure the image is expanded/shrunk to fit in the SVG box, or cropped to fill the SVG box), we're using "none", which forces the image to stretch and scale to fit whatever CSS dimensions we throw at it.

Finally, we define two SVG rectangles — the first one is our waveform “background” (the gray color across the entire waveform) and the second is the actual “progress bar” that animates the width as the track plays.

Those two rectangles aren't the bars on the waveform! They're just the background and the progress bar. In order to create the bars, we need to create another SVG element, this time with a height and width of 0:
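A sketch of that zero-sized SVG, assuming a clipPath does the "masking" (the post uses masking and clipping interchangeably):

```html
<svg height="0" width="0">
  <defs>
    <clipPath id="waveform-mask">
      <!-- one <rect> per bucket gets appended here by JavaScript -->
    </clipPath>
  </defs>
</svg>
```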

Because we don’t know what the bars look like until the page loads (and fetches the audio data), we’ll use JavaScript to create a bunch of rects — one for each bucket:

Keep in mind, we’re drawing these bars in a 0-to-100 coordinate space, because that’s what the other SVGs on the page use. The bars will stretch and scale with viewBox based on whatever CSS height and width are set.

Remember that buckets is just our array of N values from 0 to 1, indicating the general loudness of the track over this time interval. We iterate over all the buckets, and then use some Math™ to figure out exactly where to place these rectangle bars:
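A sketch of that loop (the bar spacing and y-axis math here are illustrative choices):

```javascript
// Append one <rect> per bucket to the #waveform-mask clipPath.
const clipPath = document.getElementById('waveform-mask');
const svgNS = 'http://www.w3.org/2000/svg';

const barGap = 0.5;                              // in viewBox units
const barWidth = 100 / buckets.length - barGap;  // viewBox is 100 wide

buckets.forEach((bucket, i) => {
  const height = bucket * 100;                   // bucket is 0 to 1
  const rect = document.createElementNS(svgNS, 'rect');

  rect.setAttribute('x', i * (barWidth + barGap));
  rect.setAttribute('y', 100 - height);          // anchor bars to the bottom
  rect.setAttribute('width', barWidth);
  rect.setAttribute('height', height);

  clipPath.appendChild(rect);
});
```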

Finally, we’ve got the CSS that ties the two (well, three, if you count the background) SVG elements together:

This selects both the "background" and "progress bar" rectangles and says "use the #waveform-mask SVG as a mask": as if the #waveform-mask SVG were just a paper cutout over the progress bar, and we can see the pixels through the cutout.

Don’t forget! We need to update the width of the .waveform-progress SVG while the track is playing. Depending on the audio player implementation, there might be an onListen callback, or maybe we’re just using setIntervalto fire every 500-or-so milliseconds. All we have to do is divide the track’s current progress by the length of the track, multiply by 100 and change the width of the .waveform-progress rectangle.

Boom! Look at that — our episode has some loud parts, some quiet parts, the audio plays, the progress bar fills in… and somehow we’ve wasted another perfectly good hour listening to Car Talk.

In this example, I’ve just used an audio element as the audio player. I’m checking the progress of the audio player every 100ms, then adjusting the width of the progress bar.
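Roughly like this (the element id and selector are assumptions from my sample setup):

```javascript
// Poll the <audio> element every 100ms and resize the progress rect.
const audio = document.getElementById('player');
const progressRect = document.querySelector('.waveform-progress');

setInterval(() => {
  if (!audio.duration) return; // metadata not loaded yet
  const percent = (audio.currentTime / audio.duration) * 100;
  progressRect.setAttribute('width', percent);
}, 100);
```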

In nontrivial applications, the audio implementation will vary a lot. For example, in Winds 2.0, we've got separate React components for PlayerContainer, ReactAudioPlayer and Waveform. ReactAudioPlayer calls an onListen prop every 500ms, which calculates the progress, stores it in PlayerContainer's state, and then renders the Waveform component with a progress prop.

Just one network call, please.

In the sample code, you might notice that there are actually two network calls to get the /car-talk.mp3 file: one from our script, which fetches the audio as an ArrayBuffer, and one from the audio element on the page.

Unfortunately, this is a far trickier problem than it initially seems, and something I'm still sorting out. My best suggestion is to either rely on the browser's HTTP cache (so the second request for the file is served locally), or fetch the file once and hand the same bytes to the audio element via an object URL.
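A sketch of the second idea, which is an assumption on my part rather than the Winds solution:

```javascript
// Fetch the MP3 once, give the bytes to the <audio> element via an
// object URL, and decode the same buffer for the waveform.
fetch('/car-talk.mp3')
  .then(response => response.arrayBuffer())
  .then(arrayBuffer => {
    // The Blob constructor copies the bytes, so create it before
    // decodeAudioData() detaches the original ArrayBuffer.
    const blob = new Blob([arrayBuffer], { type: 'audio/mpeg' });
    document.getElementById('player').src = URL.createObjectURL(blob);

    return new AudioContext().decodeAudioData(arrayBuffer);
  })
  .then(audioBuffer => {
    const decodedAudioData = audioBuffer.getChannelData(0);
    // ...bucket and render as before
  });
```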

Maybe maximum loudness isn’t the variable we should use?

It might be as simple as taking the average or minimum volume over the bucket interval. Alternatively, we can get more complex by figuring out some kind of nonlinear scale to use, or calculating the maximum volume and scaling everything from there. Just something to experiment with!
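For example, an average-volume variant of the earlier bucketing sketch might look like this (illustrative only):

```javascript
// Average absolute volume per bucket instead of the max.
function getAverageBuckets(decodedAudioData, bucketCount) {
  const bucketSize = Math.floor(decodedAudioData.length / bucketCount);
  const buckets = [];

  for (let i = 0; i < bucketCount; i++) {
    let sum = 0;
    for (let j = i * bucketSize; j < (i + 1) * bucketSize; j++) {
      sum += Math.abs(decodedAudioData[j]);
    }
    buckets.push(sum / bucketSize);
  }

  return buckets;
}
```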

But what if I want more bars? Or to round the edges of the bars?! And make the component responsive!?

These are mostly just limitations of SVG; it's a 90% solution. Rounding the corners of a div is trivial, but rounding the corners of really tiny SVG rectangles doesn't work well. And, if we resize the overall SVG, the number of bars stays the same, but the space between the bars scales. The waveform isn't very responsive, so we're better off sticking to a fixed width when placing it on a page.

Other than a few SVG limitations, this component is awesome! Instead of having to do all of this server-side with ffmpeg or some other audio processing library, we can have the user's client handle all waveform generation.
