Designing a content management system for 100mm+ songs

I was the sole designer on Attribution Engine (AE), Pex’s flagship music licensing product. From August 2019 to September 2020, we went from concept to a Series A raise that valued the company at over $180mm.

Our team grew 6x in less than a year, and it was time to deliver on our promises.

Initially, I was tasked with designing the platform broadly, owning every module within AE. As the company grew, I was able to dive deeper into specific areas.

After our Series A, my main area of ownership within AE was the content management system (CMS), which in many ways was the heart of the platform.

My role

Senior staff designer across the end-to-end process: discovery, user research, requirements, design, testing, and support through launch.

The team

2 product managers, 7 backend engineers, 1 database architect, 1 frontend engineer

Timeline

October 2020 - December 2021

Approaching research creatively and making internal allies along the way

Market research

YouTube’s ContentID tool was AE’s biggest direct competitor. In its simplest form, AE was ContentID for all web platforms.

I spent a lot of time researching how enterprise clients were getting their data into ContentID, and from there, how ContentID was organizing and structuring the data.

Enterprises were already familiar with ContentID and actively using it. I learned that the most widely used protocol for importing song catalogs was DDEX.

Rather than reinvent the wheel, it was time to learn what was working well and what wasn’t.

Learning about DDEX

I tried to learn everything I could about DDEX.

DDEX was the industry's gold standard for transporting music catalog data. It was used to push catalogs back and forth between labels, and to deliver catalogs to services and vendors such as ContentID, Spotify, and Pandora.
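To make the idea of a catalog delivery more concrete, here is a minimal Python sketch of pulling per-recording fields out of one. The XML is a toy fragment loosely inspired by DDEX's Electronic Release Notification (ERN) messages: its element names are simplified for illustration, it is not schema-valid DDEX, and the ISRC values are placeholders.

```python
import xml.etree.ElementTree as ET

# Toy delivery loosely inspired by a DDEX ERN message. Element names are
# simplified and the ISRCs are placeholders; this is NOT valid DDEX.
TOY_MESSAGE = """
<ReleaseDelivery>
  <SoundRecording>
    <ISRC>USXXX0000001</ISRC>
    <Title>Example Song</Title>
    <Artist>Example Artist</Artist>
  </SoundRecording>
  <SoundRecording>
    <ISRC>USXXX0000002</ISRC>
    <Title>Another Song</Title>
    <Artist>Example Artist</Artist>
  </SoundRecording>
</ReleaseDelivery>
"""


def parse_recordings(xml_text: str) -> list[dict[str, str]]:
    """Return one flat dict per <SoundRecording> element."""
    root = ET.fromstring(xml_text.strip())
    return [
        {
            "isrc": rec.findtext("ISRC", default=""),
            "title": rec.findtext("Title", default=""),
            "artist": rec.findtext("Artist", default=""),
        }
        for rec in root.iter("SoundRecording")
    ]


if __name__ == "__main__":
    for record in parse_recordings(TOY_MESSAGE):
        print(record)
```

Real ERN messages are far richer, separating resources, releases, and deals, which is exactly why learning what worked and what didn't mattered before designing our own import.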

My approach to system design

At Pex, I was the go-to systems thinker. I had a strong understanding of the inner workings of our platforms and knew intimately where things were running smoothly and where there was room for improvement.

I approach system design much like one approaches biohacking. The parallels between system design and the body are immense. It’s easy to identify obvious issues like broken arms and obesity. It’s harder to identify things like vitamin deficiencies or overactive glands.

I take a methodical, deep-in-the-weeds approach to system design. 30,000-ft views are easier, but being hyper-focused on the smallest details often leads to more robust, thoughtful solutions that scale for years to come.

Making sense of the data

Now that we’d received all of this data, we needed to make sense of it.

Many songs had multiple owners, which meant multiple parties were sending data about the same songs. Each sent only its portion of the data, and we needed to interpret the pieces correctly.
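A minimal sketch of why partial deliveries have to be merged per song before they can be trusted, assuming each delivery boils down to a hypothetical (song ID, owner, share percentage) claim. Real ownership data is far richer than this; the point is only that claims from different senders land on the same song and must be checked together.

```python
from collections import defaultdict

# Hypothetical claim shape: (song_id, owner, share_percent).
Claim = tuple[str, str, float]


def merge_claims(claims: list[Claim]) -> dict[str, dict[str, float]]:
    """Collect every owner's claimed share under the song it refers to."""
    merged: defaultdict[str, dict[str, float]] = defaultdict(dict)
    for song_id, owner, share in claims:
        # If one owner sends the same song twice, the later claim replaces it.
        merged[song_id][owner] = share
    return dict(merged)


def overclaimed(merged: dict[str, dict[str, float]], limit: float = 100.0) -> list[str]:
    """Songs whose combined claimed shares exceed the limit need review."""
    return [song for song, shares in merged.items() if sum(shares.values()) > limit]


if __name__ == "__main__":
    claims = [
        ("song-1", "Label A", 50.0),
        ("song-1", "Publisher B", 30.0),
        ("song-2", "Label A", 60.0),
        ("song-2", "Distributor C", 50.0),
    ]
    merged = merge_claims(claims)
    print(overclaimed(merged))  # ['song-2']
```

Here song-1 is simply incomplete (80% claimed), while song-2 is contradictory (110% claimed), which is the kind of case that needs a rule for deciding whose word to take.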

Metadata

Most other CMSs started and ended with matching data on metadata: things like song titles and artists. Most did it poorly, and most stopped there.

One very difficult concept when sifting through music metadata is that a song can be released in many forms. Take Better Now by Post Malone. At the time of writing, Better Now has been released 38 times: as a single in both clean and explicit versions, on the album Beerbongs and Bentleys on CD and on vinyl, and in 34 other releases. Each release had a slightly different ownership and data profile. This was a huge challenge.

After much trial and error working through various models on whiteboards and in Miro, we opted to use both the metadata and the audio itself to guide us towards clean data buckets.

Audio

Audio matching is what Pex does best and what almost no other CMS does. We believed this could be a differentiator.

Audio matching let us bucket records together whenever their audio matched 100%, which consolidated a lot of the data in the process. What it didn’t account for was the exact same audio being released on different products, CD vs. vinyl for instance.

This was OK. From there, we factored in bits of metadata. Things like product codes helped guide us. These auxiliary pieces of metadata were helpful, but they were often incomplete and couldn’t always be fully relied on due to human error.
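The audio-first, metadata-second bucketing can be sketched in Python as follows. This is an illustration under stated assumptions, not Pex's implementation: `audio_id` is a hypothetical identifier shared by records whose audio matches exactly, and `product_code` stands in for a product identifier such as a UPC, which may be missing.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Record:
    source: str
    audio_id: str                # hypothetical: shared by exact audio matches
    product_code: Optional[str]  # hypothetical: e.g. a UPC, often missing
    title: str


def bucket(records: list[Record]) -> dict[tuple[str, str], list[Record]]:
    """Group by exact audio match first, then split by product code."""
    buckets: defaultdict[tuple[str, str], list[Record]] = defaultdict(list)
    for r in records:
        # Identical audio shares an audio_id; a product code (when present)
        # separates the same audio released on, say, CD vs. vinyl. Records
        # lacking a code fall into an "UNKNOWN" bucket for that audio.
        buckets[(r.audio_id, r.product_code or "UNKNOWN")].append(r)
    return dict(buckets)


if __name__ == "__main__":
    records = [
        Record("label", "a1", "111", "Example Song"),
        Record("distributor", "a1", "111", "Example Song (Album Version)"),
        Record("label", "a1", "222", "Example Song"),
        Record("indie", "a1", None, "Example Song"),
    ]
    for key, group in bucket(records).items():
        print(key, [r.source for r in group])
```

Note that the two "111" records land together despite different titles, which is where audio beats title matching; the record with no product code stays attached to its audio but unassigned to a product, reflecting how incomplete metadata can't be fully relied on.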

Trust score

The final piece of the puzzle was a trust score system. Not all data sources or partners were created equal. Major labels had decent data integrity. They also had strong incentive to keep it that way, since bad data led to millions in lost revenue. DIY distributors like TuneCore or DistroKid were a different story, though. Same with small indie labels and publishers. They meant well, but they lacked the resources.

We implemented a trust score system to guide us in assembling the data, with majors scoring highest, followed by large distributors, then mid-level indie labels, and so on. For many reasons, I can’t go into the details here.

This allowed us to make sense of what we received with confidence and automation. In a sense, the trust score system let us automatically take one source’s word over another’s.
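A minimal sketch of trust-based resolution, assuming per-field resolution and hypothetical numeric tiers (the actual scores and tiers aren't public). For each field, the value from the most trusted source that supplied one wins; blank values are skipped so a trusted source with a gap doesn't erase a less trusted source's data.

```python
from typing import Optional

# Hypothetical tiers: higher means more trusted. Real scores aren't public.
TRUST_TIERS = {"major": 4, "large_distributor": 3, "mid_indie": 2, "diy": 1}

Delivery = tuple[str, dict[str, str]]  # (source tier, field -> value)


def resolve_field(candidates: list[tuple[str, Optional[str]]]) -> Optional[str]:
    """Return the value from the most trusted tier, ignoring blank values."""
    present = [(tier, value) for tier, value in candidates if value]
    if not present:
        return None
    # max() returns the first of several equal maxima, so ties between
    # same-tier sources fall back to delivery order. Unknown tiers score 0.
    _, value = max(present, key=lambda c: TRUST_TIERS.get(c[0], 0))
    return value


def resolve_record(deliveries: list[Delivery]) -> dict[str, Optional[str]]:
    """Resolve every field that appears in any delivery."""
    fields = sorted({name for _, data in deliveries for name in data})
    return {
        name: resolve_field([(tier, data.get(name)) for tier, data in deliveries])
        for name in fields
    }


if __name__ == "__main__":
    deliveries = [
        ("diy", {"title": "Example Song (Clean)", "label": "Small Label"}),
        ("major", {"title": "Example Song", "label": ""}),
    ]
    print(resolve_record(deliveries))
    # {'label': 'Small Label', 'title': 'Example Song'}
```

Resolving field by field, rather than picking one winning delivery wholesale, is one way to let lower-trust sources fill gaps that higher-trust sources leave open.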

Nothing is ever perfect, and iterations were plentiful. Overall, though, the model worked, clients understood it, and we needed 1/10th of the manpower we had initially expected to onboard new clients and large catalogs.
