How to differentiate homonym artists

A common problem on music platforms is that several artists can legitimately share the exact same name. In other cases, questionable providers and artists deliver content under a famous artist’s name to get more attention from the platform’s audience.

Both cases leave the platform with the same problem: how to tell these artists apart. If they are not differentiated, artist pages and discographies get mixed up:

  • Correct albums may end up on another artist’s page
  • Wrong albums may appear in an artist’s discography
The problem. “Nirvana” is just an example used in this post, as are the artist IDs (313 and 887).

First of all, the artist name alone is not enough to identify an artist uniquely. Another piece of information is needed. Here are a few options, from a platform point of view.

An artist name / provider pair


If an artist like Nirvana has a contract with Universal Music, we can legitimately assume that any delivery featuring Nirvana that does not come from Universal Music refers not to the “true” (famous) Nirvana but to another artist. This way, an artist name / provider pair would identify this unique artist.

Pros:

  • The provider is an actor that always exists for a platform. The platform does not rely on additional metadata given by the provider, but on the provider itself
  • This suits major artists well: they often need immediate action, and they are signed to a single label / major for a long period of time

Cons:

  • The provider is not robust information over time. The same content can arrive from a different provider if the label decides to use another technical delivery channel or another legal entity
  • Manual action is needed to determine which artist belongs to which provider
  • An artist can change labels over time. Another manual action is then needed, and two artist name / provider pairs will refer to the same artist
  • A provider can have homonym artists! Think of a huge distributor (Believe, …) delivering for many independent labels. Artists known only by a first name (like Rafael) are likely to have homonyms. As they all come from the same provider, this method won’t differentiate them.
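
As a rough illustration, the pair approach boils down to a lookup table keyed by name and provider. This is only a minimal sketch: the table `PAIR_TO_ARTIST`, the function `resolve_by_pair` and the provider "Some Distributor" are hypothetical, and the local ID reuses the example ID from this post.

```python
# Hypothetical lookup table: (normalized artist name, provider) -> local artist ID.
# The entry is illustrative and reuses the example ID 313 from this post.
PAIR_TO_ARTIST = {
    ("nirvana", "Universal Music"): 313,
}


def resolve_by_pair(artist_name, provider):
    """Return the local artist ID for a known (name, provider) pair, else None."""
    key = (artist_name.strip().lower(), provider)
    return PAIR_TO_ARTIST.get(key)


print(resolve_by_pair("Nirvana", "Universal Music"))   # 313
print(resolve_by_pair("Nirvana", "Some Distributor"))  # None
```

The last drawback is visible in the key itself: two artists called Rafael delivered by the same distributor produce the same key, so the table can only point to one of them.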

Delivering a unique internal artist ID


In every database, every element carries a unique ID. Two artists with the same name will have different IDs (in our platform case: 313 and 887). If the music provider delivers this unique internal ID to the platform, the platform will be able to differentiate two artists from the same label or distributor.

Pros:

  • The platform relies on the provider to identify artists properly, which is the provider’s role in a way
  • It is easy and reliable to generate unique IDs and very light to deliver them.

Cons:

  • Not all platforms are able to transfer their artist ID. Many large-audience services won’t be able to identify and deliver an artist with anything other than an artist name
  • An artist delivered by two different music providers will never have the same internal ID at both. The two IDs will need to be associated manually.
  • If an artist moves from one provider to another, the internal ID won’t follow. Internal IDs are, by definition, not consistent from one DB to another.
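
A minimal sketch of what this could look like on the platform side, assuming the provider's internal ID is stored together with the provider name. The table, the function names, the providers and the provider-side IDs ("A-42", "A-77", "B-9") are all hypothetical; only the local IDs 313 and 887 come from the example in this post.

```python
# Hypothetical mapping: (provider, provider-side internal ID) -> local artist ID.
# An internal ID only means something inside one provider's database,
# so the provider has to be part of the key.
INTERNAL_TO_LOCAL = {
    ("Provider A", "A-42"): 313,
    ("Provider A", "A-77"): 887,  # homonym from the same provider, different ID
}


def resolve_by_internal_id(provider, internal_id):
    """Return the local artist ID for a provider-side ID, or None if unknown."""
    return INTERNAL_TO_LOCAL.get((provider, internal_id))


def link_internal_id(provider, internal_id, local_artist_id):
    """Record a manual association, e.g. when an artist moves to a new provider."""
    INTERNAL_TO_LOCAL[(provider, internal_id)] = local_artist_id
```

When the artist behind local ID 313 starts arriving from "Provider B" under its own ID, nothing matches until someone calls `link_internal_id("Provider B", "B-9", 313)`, which is the manual association mentioned in the list above.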

Delivering an external artist ID


This seems to be the right option. However, some issues must also be considered here.

Pros:

  • The external ID represents the exact same artist from one DB to another. If the platform associates an artist with this external ID, no discography mix-up is to be expected.

Cons:

  • The external database must already store the artist, which won’t be the case for new artists unless the music provider creates the entry (an additional action)
  • No external database is recognized as a common standard. This means the platform and the music providers have to agree on a common external database, which in practice leads both sides to support several external databases
  • The external database may properly differentiate two homonymous artists, but both the music provider and the music platform still need to associate this artist (external ID) with their own internal ID. It is hard to map a local artist to an external artist. Manually, only a very small portion of artists would be covered; algorithmically, the mapping must take into consideration additional data such as birth dates, discography, or anything else common to the contributor and the external DB.
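
To give an idea of the algorithmic route, here is a minimal sketch that compares discographies only. It scores each external candidate by the overlap (Jaccard similarity) between its release titles and the local ones. This is one possible heuristic, not an established method; the function names, the 0.5 threshold and the external IDs "ext-1" and "ext-2" are assumptions made for the example.

```python
def normalize(title):
    """Lowercase a title and collapse whitespace so trivial variants compare equal."""
    return " ".join(title.lower().split())


def discography_overlap(local_titles, external_titles):
    """Jaccard similarity between two sets of release titles, from 0.0 to 1.0."""
    local = {normalize(t) for t in local_titles}
    external = {normalize(t) for t in external_titles}
    if not local or not external:
        return 0.0
    return len(local & external) / len(local | external)


def best_external_match(local_titles, candidates, threshold=0.5):
    """Pick the external ID whose discography overlaps most, or None if too weak.

    `candidates` maps a (hypothetical) external ID to its list of release titles.
    """
    best_id, best_score = None, 0.0
    for external_id, titles in candidates.items():
        score = discography_overlap(local_titles, titles)
        if score > best_score:
            best_id, best_score = external_id, score
    return best_id if best_score >= threshold else None


candidates = {
    "ext-1": ["Bleach", "Nevermind", "In Utero"],
    "ext-2": ["Some Other Album"],  # placeholder title for a homonym
}
print(best_external_match(["Nevermind", "In Utero"], candidates))  # ext-1
```

A real matcher would have to combine several signals (birth dates, labels, track lists) and cope with reissues and title variants, which is exactly why this mapping is hard.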

If you look at the Nirvana page on Discogs, you’ll get an idea of why this remains a complicated option.

Then, is there any good option?

In this emerging market, as is often the case, the current best option is… a bit of everything. The mass of incoming recordings and artists forces platforms to find the correct homonym artist by trying all the available options in turn:

  1. An external ID is given, and this external ID is mapped to a local artist.
  2. Else, an internal ID is delivered and mapped to a local artist.
  3. Else, this artist name associated with the music provider ID is locally known to be one artist and not another.
  4. Else, map this artist to a “random” local artist with the same name. This is the worst case: major artist releases get mapped to an unknown / suspect artist page, or the reverse.
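
The four steps above can be sketched as a simple fallback chain. This is a minimal illustration under stated assumptions, not a reference implementation: the `Delivery` structure, the four lookup tables, the provider names and the IDs "ext-1" and "A-77" are all hypothetical, and the local IDs 313 and 887 are the example IDs used in this post.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Delivery:
    """Hypothetical shape of the artist data attached to an incoming release."""
    artist_name: str
    provider: str
    internal_id: Optional[str] = None
    external_id: Optional[str] = None


# Hypothetical local tables, one per step of the list above.
EXTERNAL_TO_LOCAL = {"ext-1": 313}
INTERNAL_TO_LOCAL = {("Provider A", "A-77"): 887}
PAIR_TO_LOCAL = {("nirvana", "Universal Music"): 313}
NAME_TO_LOCALS = {"nirvana": [313, 887]}


def resolve(delivery):
    """Return (local_artist_id, step): the step tells how reliable the match is."""
    name = delivery.artist_name.strip().lower()
    # 1. External ID already mapped to a local artist.
    if delivery.external_id in EXTERNAL_TO_LOCAL:
        return EXTERNAL_TO_LOCAL[delivery.external_id], 1
    # 2. Provider-side internal ID already mapped to a local artist.
    internal_key = (delivery.provider, delivery.internal_id)
    if internal_key in INTERNAL_TO_LOCAL:
        return INTERNAL_TO_LOCAL[internal_key], 2
    # 3. Known artist name / provider pair.
    if (name, delivery.provider) in PAIR_TO_LOCAL:
        return PAIR_TO_LOCAL[(name, delivery.provider)], 3
    # 4. Worst case: arbitrary pick among local artists with the same name.
    homonyms = NAME_TO_LOCALS.get(name)
    if homonyms:
        return homonyms[0], 4
    return None, 0  # unknown name: a new artist would have to be created
```

Returning the step number along with the artist is a design choice made for this sketch: it lets the platform flag every release resolved at step 4 for manual review instead of trusting it silently.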

Metadata manual management

To limit the impact of the worst case above (4.), the platform will always need a manual watchdog to associate a given release with the right artist. This is a one-off action and can hardly be applied seriously to the thousands of releases coming in every day.

Local music provider ID / artist ID associations have to be set manually. Each association then applies to all forthcoming deliveries carrying this internal ID.


Other, more complex tools can be defined. If an internal ID is given by the music provider but unknown locally, an alert could be raised to prompt someone to create a new generic rule. Another tool could automatically suggest one or several artists from an external database in order to create such a rule.
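
The alert idea could be sketched as follows. Everything here is hypothetical (the rule table, the alert format, the provider and its IDs), and a plain list stands in for whatever queue or ticketing tool a platform would really use.

```python
# Hypothetical rule table: (provider, provider-side internal ID) -> local artist ID.
KNOWN_RULES = {("Provider A", "A-42"): 313}

# Alerts waiting for a human to create a rule; a plain list stands in for a real queue.
pending_alerts = []


def check_internal_id(provider, internal_id, artist_name):
    """Return the local artist ID, or queue an alert when the internal ID is unknown."""
    key = (provider, internal_id)
    if key in KNOWN_RULES:
        return KNOWN_RULES[key]
    alert = {"provider": provider, "internal_id": internal_id, "artist_name": artist_name}
    if alert not in pending_alerts:  # one alert per unknown ID, not one per release
        pending_alerts.append(alert)
    return None
```

Deduplicating the alerts matters here: a new artist delivering a twenty-track album should create one task for the metadata team, not twenty.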

Even more complex tools, drawing on a wide range of public data interpreted semantically, could eventually pick the best artist from a list of homonyms, based on the album or track it is associated with.

 
