Searching and filtering across 100mm+ songs

I was the sole designer on Attribution Engine - Pex's flagship music licensing product. From August 2019 to September 2020, we went from concept to a Series A raise valuing the company at over $180mm.

Our team size grew 6x in less than a year, and it was time to deliver on our promises.

Initially, I was tasked with designing the platform broadly, owning every module within AE. As the company grew, I was able to dive deeper into specific areas.

Beyond our Series A, my main area of ownership within AE was the content management system. We first designed a system that could easily scale beyond 100mm songs.

Now, we needed to figure out how to find and discover songs.

My role

Senior staff designer through the end-to-end process: discovery, user research, requirements, design, testing, and support through launch.

The team

1 product manager, 4 backend engineers, 1 frontend engineer

Timeline

December 2021 - March 2022

Discoverability at scale

We were able to get Universal, Warner, and Sony's catalogs imported in record time. This dramatically decreased their time to value with Pex as a new vendor. It was time to build tools that enabled their existing workflows.

From our early interviews with the majors and other enterprise power users, we learned a lot about their day-to-day tasks. These insights heavily influenced how I broke down the problem and looked for solutions.

Problem statement

How might we enable findability of specific songs, and discoverability of high-value songs, across millions of songs at scale?

Breakdown of the problem

Hundreds of millions of songs
The majors have large catalogs that grow on a near-daily basis. We needed to create tools that made those catalogs accessible to novices and power users alike.
Dirty data
The dirtier the data, the harder it is to index and search against. This wasn't an issue unique to our platform, and it was one the majors were already used to dealing with, but we wanted to look for ways to improve and delight.
Replicating high-value assets
The majors were struggling to determine which songs were the highest value. Beyond the obvious chart toppers, it was hard to discover which songs were heating up on social media. If we could identify those songs, labels could more effectively replicate their success.

User research to build empathy internally

Pex had historically struggled to see value in user research, and as a result, often lacked understanding of the day-to-day lives of our users.

One of my key contributions was changing the tone and reframing user research: showing by doing, and routinely bringing gold nuggets back from the field. This helped us build solutions that were more on the mark.

User interviews

Content ID power users

Content ID was Pex's biggest competitor in the space. Focusing on Content ID power users allowed us to start with enterprise-adjacent users, but not the major labels themselves. We needed to prove a bit of value internally before the sales and exec teams were comfortable with us approaching the majors.

I sat down with 5 different Content ID power users, all with slightly different use cases.

I wanted to understand:

Major labels

Similarly, I went deep with 5 different Content ID users from major and mid-level labels.

I wanted to understand:

Key insights

Some key areas and patterns began to emerge right away from very few interviews. It was time to organize our findings.

Overall, both the majors and Content ID power users saw basic search and filtering as a table-stakes feature. Fair enough.

What both groups surfaced, though, was that in these systems, and in the other internal tools they used, it was quite difficult to find or discover high-value songs.

The majors own millions of songs. Not all of them are created equal.

How might we enable discovery of high-value songs across millions at scale, so that replicating past success became more science than art?

Potential data points

Views
We held a treasure trove of data: not only what was being supplied to us, but also which songs were being viewed on social media. The view count for a song was a good indicator that it was currently getting traction across platforms.

Licenses
Similar to views, the number of times a song was being licensed by UGC creators was a good indicator of its traction and virality.

Trending
Could we look at songs that were heating up? These were less obvious picks: songs with sizable movement in, say, the last 2 weeks (see the sketch after this list).

Ownership
The average hit song of the last decade has 15+ owners across artists, writers, labels, and publishers. This means that while a song can be a huge hit, it could still be low value to a given label, depending on that label's stake in the song. Could we filter out songs where the stake is too low, or vice versa?
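
To make "heating up" concrete, here's a minimal sketch of what a trending signal could look like, assuming we track daily view counts per song. The shape, names, and two-week windows are illustrative, not Pex's actual implementation.

```ts
// Hypothetical trending signal: compare a song's views in the last
// two weeks against the two weeks prior. All names and window sizes
// here are illustrative assumptions.
interface DailyViews {
  date: string; // ISO date, e.g. "2022-01-15"
  views: number;
}

function trendingScore(history: DailyViews[], today: Date): number {
  const msPerDay = 24 * 60 * 60 * 1000;
  const daysAgo = (d: string) =>
    (today.getTime() - new Date(d).getTime()) / msPerDay;

  const sumViews = (from: number, to: number) =>
    history
      .filter((h) => daysAgo(h.date) >= from && daysAgo(h.date) < to)
      .reduce((sum, h) => sum + h.views, 0);

  const recent = sumViews(0, 14); // the last 2 weeks
  const prior = sumViews(14, 28); // the 2 weeks before that

  // A ratio above 1 means the song is heating up; guard divide-by-zero.
  return prior === 0 ? (recent > 0 ? Infinity : 0) : recent / prior;
}
```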

Market research

I spent a good deal of time researching other enterprise CMS products, learning their inner workings and identifying key areas that were worth modeling.

We had an enormous amount of data that was searchable and filterable. Most other platforms were opening the data floodgates to their users. I didn't believe that was the right approach, though.

Instead, I wanted to give users fewer choices with filters that had the highest impact and reduce feature bloat.

Determining filters

Armed with the early interviews and insights, I got to work distilling it all down. What filters had the highest impact?

I believed it would be a blend of broader filters used in combination to yield a result set - for example, configuring various filters together to discover high-value assets (a sketch at the end of this section makes this concrete).

In tandem, we also needed more direct filters. For example, entering a specific ID that yields one result.

Label

Record labels often own a multitude of sub-labels and sister labels. Being able to filter by label felt like table stakes.

In my interviews, many folks told me about having to jump between multiple systems to get to various labels' catalogs. Obviously painful.

With AE, we were pulling all of that together, which is only helpful if you can split it back out when needed.

Artist

Labels own and control many artists' catalogs. This filter let folks get a little further into the weeds. They could look for one artist or hundreds.

ISRC

An ISRC (International Standard Recording Code) is a unique code assigned to a recording at the point of release. This was how most labels identified specific songs.

In talking with labels, I learned they were often downloading big CSV files, opening Excel, finding the ISRC column, and then copy and pasting various ISRC codes into other platforms. It was pretty painful.

I opted to create a filter that supported that same workflow, but let them copy the entire column if they'd like. They could paste as many ISRC codes as they wished. Whatever results were shown were direct hits based on the codes.

Ownership percentage

The average hit song of the last decade has 15+ owners. Although certain songs could appear high-value, in practice, for a specific label, that wasn't always the case.

Filtering by ownership allowed them to narrow the catalog broadly by their % share of global ownership.

In a future iteration, I'd like to add a layer for countries. For instance, you may own 100% of a song in Norway, but that's not as high value as owning 100% of it in the US.

Policy

Policy got to the heart of the matter. Filtering by policy meant you could show only songs that were being monetized. Or maybe you wanted to investigate songs that were being blocked from licensing.
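
To make the combination of broad and direct filters concrete, here's a rough sketch of what a filter query might look like as a data structure. The field names and policy values are assumptions for illustration, not AE's actual API.

```ts
// Illustrative shape of a combined filter query. Field names and
// policy values are assumptions for this sketch.
interface CatalogFilter {
  labels?: string[];   // broad: one label, or many sub and sister labels
  artists?: string[];  // broad: one artist or hundreds
  isrcs?: string[];    // direct: exact hits on unique codes
  ownership?: {        // broad: % share of global ownership
    minPercent?: number;
    maxPercent?: number;
  };
  policy?: "monetize" | "block";
}

// e.g. "monetized songs where the label owns at least half"
const highValueQuery: CatalogFilter = {
  ownership: { minPercent: 50 },
  policy: "monetize",
};
```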

Interaction design

It was time to explore various ways we could allow users to interact with our filters.

The goal was to aim for speed and clarity. Speedy interactions are fairly obvious. Clarity is a bit harder to pin down: clear filters lead to fewer mistakes and less confusion about what's active and what isn't.

Number of auto suggested search results

I spent a great deal of time researching and exploring this. Auto-suggestion is only helpful when the suggestions are on point and not overwhelming. In researching other products, I found the opposite: suggestions were weak, and the number shown was too high.

I started exploring the number first. What was the optimal number of suggestions?

Competitor products showed around 8-10 suggestions. I found that to be quite a lot of cognitive load, even when the suggestions were of decent quality.

I created prototypes using 3, 4, 5, 6, and 8 suggestions. I asked for internal feedback in our company's Slack channel - the equivalent of the in-person hallway test.

Overall, 5 was our winner by a small margin, with 4 a strong second-place contender.

Ranking auto suggested search results

The next obstacle was ranking those suggestions. We had a lot of data to cross-reference. We also had to balance finding the perfect query with a speedy load time.

I explored ranking by:

As expected, while some of these queries provided really robust results, they were largely too expensive and time-consuming to run.

In the end, we combined the user’s search query with the number of associated songs. This was a solid predictive indicator and was a reasonable guess at where the user was trying to get to. It was also a decently inexpensive query for us to run.

This was a simple exercise in clear communication and managing tradeoffs with my engineering team.
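
Here's a minimal sketch of that blend, assuming each suggestion carries its associated song count. The scoring weights and helper names are illustrative, not the production query.

```ts
// Blend a cheap text-match signal from the user's query with the
// number of songs associated with each suggestion. Weights are
// illustrative assumptions.
interface Suggestion {
  name: string;      // e.g. a label or artist name
  songCount: number; // number of associated songs
}

function scoreSuggestion(query: string, s: Suggestion): number {
  const q = query.toLowerCase();
  const name = s.name.toLowerCase();

  // Exact prefix beats substring beats no match at all.
  const textScore = name.startsWith(q) ? 2 : name.includes(q) ? 1 : 0;
  if (textScore === 0) return 0;

  // Log-scale the song count so a huge catalog doesn't drown out
  // a strong text match.
  return textScore + Math.log10(1 + s.songCount);
}

function topSuggestions(query: string, all: Suggestion[], limit = 5): Suggestion[] {
  return all
    .map((s) => ({ s, score: scoreSuggestion(query, s) }))
    .filter((x) => x.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, limit) // 5, our winning suggestion count
    .map((x) => x.s);
}
```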

Bulk search pattern

Our first filter to allow bulk pasting was ISRC, given that it aligned well with labels' current workflows and behaviors. I knew that if we kept this simple, it could become a very reusable pattern down the road.

I opted to contain the pasting of bulk ISRC codes in a modal. Users could write or paste to their heart's content. All they had to do was separate codes by comma, or put one per line - something any spreadsheet tool can produce.

Longer queries would inherently take longer to run, but users expected this. We did cap it at 50,000 codes so the browser tab wouldn't crash.

Given the nature of ISRCs, the results were very high quality since the user was getting direct hits on each unique code.
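
A sketch of that parsing step, under the behavior described above (commas or newlines as separators, a 50,000-code cap); the helper name is hypothetical.

```ts
// Accept codes separated by commas or newlines, trim whitespace,
// drop duplicates, and enforce the 50,000-code cap. The helper name
// is hypothetical.
const MAX_CODES = 50_000;

function parseBulkIsrcs(input: string): string[] {
  const codes = input
    .split(/[\n,]+/) // commas or one-per-line, as pasted from a spreadsheet
    .map((c) => c.trim())
    .filter((c) => c.length > 0);

  // De-duplicate while preserving paste order, then enforce the cap
  // so a massive paste can't lock up the tab.
  return [...new Set(codes)].slice(0, MAX_CODES);
}
```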

Adding filters

I played around with many different iterations for showing the status of a filter and what had previously been selected.

Where I landed was a model that gives a clear signal as to what's been applied, while also allowing users to type into the input field to add new filters. There's no clunky clicking through modals or dropdowns. Just click into the input field and start typing away.

Removing filters

Just as important as adding is taking away. I wanted to keep this pattern lean and tight. Being able to remove filters inline felt familiar, and it passed the internal Slack / hallway test.

I also added a hard reset at the top. It's always good to have an eject button when you get too far into the weeds.
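
Taken together, adding, removing, and resetting boil down to a small piece of state. Here's a minimal sketch, with an assumed store shape and hypothetical names.

```ts
// Minimal sketch of the add / remove / reset filter state described
// above. The store shape and names are assumptions for illustration.
type FilterKind = "label" | "artist" | "isrc" | "ownership" | "policy";

interface AppliedFilter {
  kind: FilterKind;
  value: string;
}

class FilterBar {
  private applied: AppliedFilter[] = [];

  // Committing a typed value adds a filter in place: no modals or
  // dropdowns to open.
  add(filter: AppliedFilter): void {
    const exists = this.applied.some(
      (f) => f.kind === filter.kind && f.value === filter.value
    );
    if (!exists) this.applied.push(filter);
  }

  // Inline removal of a single applied filter.
  remove(filter: AppliedFilter): void {
    this.applied = this.applied.filter(
      (f) => !(f.kind === filter.kind && f.value === filter.value)
    );
  }

  // The hard reset "eject button" at the top.
  reset(): void {
    this.applied = [];
  }
}
```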

What's next?

The next thing on the roadmap is bulk select and bulk actions. Some light exploratory work has already taken place, but working out the kinks on this across millions of database rows is no easy feat.

Lots of deep thinking will go into determining what can be done in combination and what is too destructive and needs a proper safeguard.

Closing thoughts

Searching and filtering is often deemed table-stakes work: overlooked and half-baked into products.

By setting the stage early and really building empathy internally, we were able to craft lightweight solutions that not only aided labels in their daily work, but also enabled new discoveries.

Labels were able to identify new high-value songs they were previously unaware of. This could lead to more revenue within our platform, and on other lucrative platforms as well.

The details matter. Understanding the end user deeply matters.