Avoid guessable IDs

When developing a web application, you often have to store user-generated entities such as user profiles, photos, posts and videos.

Chances are you store them in a database, using an integer ID, generated by a sequence.

So, for example, each newly posted photo gets a numeric ID that is one higher than the previously stored one.

From your pages, you just refer to them like this:

<img src="/photo/1">
...
<img src="/photo/2">

Your web server extracts the ID from the image URL, selects the image from the database and returns it.
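As a rough illustration of that lookup, here is a minimal Python sketch using the standard library's sqlite3 module. The table name, column names and the two helper functions are made up for this example and are not taken from any particular framework.

```python
import sqlite3
from typing import Optional

# Hypothetical schema: one table with a sequence-generated integer key.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE photos (id INTEGER PRIMARY KEY AUTOINCREMENT, data BLOB NOT NULL)"
)

def store_photo(data: bytes) -> int:
    """Insert a photo and return the ID the database assigned to it."""
    cur = conn.execute("INSERT INTO photos (data) VALUES (?)", (data,))
    conn.commit()
    return cur.lastrowid

def load_photo(path: str) -> Optional[bytes]:
    """Extract the numeric ID from a path like '/photo/2' and fetch the row."""
    photo_id = int(path.rsplit("/", 1)[-1])
    row = conn.execute(
        "SELECT data FROM photos WHERE id = ?", (photo_id,)
    ).fetchone()
    return row[0] if row else None

first = store_photo(b"first image bytes")
second = store_photo(b"second image bytes")
print(first, second)  # consecutive IDs handed out by the sequence
```

Each stored photo simply receives the next number, which is exactly what the URLs expose.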

While this is technically perfectly fine, it hides a security problem:

Beware of scrapers

Storing the photos with consecutive IDs makes it effortless to guess valid img URLs and download all the available photos. One would just try /photo/1, /photo/2, /photo/3, and so on. Almost every request is likely to return a valid photo.

Now imagine an online service like MindMates stored user photos like this. With a single one-liner, you could fetch all existing user photos:

curl -O https://MindMat.es/photo/[0-9999999].jpg

This would be a privacy nightmare.

Solution

To avoid such easy scraping, use hard-to-guess IDs instead of simple consecutive ones.

UUIDs

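Universally unique identifiers (UUIDs) are 128-bit numbers. The random kind (version 4) is chosen in such a way that the values are "globally unique" - that is, collisions are extremely unlikely. Stick to random UUIDs here: time-based versions such as version 1 are also unique, but they follow a predictable pattern.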

A literal representation of an example UUID is 37ce9786-5614-4def-9c47-7bb06eecba22.
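Generating such an ID takes one call in most languages. The sketch below uses Python's standard uuid module; the parse_photo_id helper is a hypothetical name, added only to show how a server might reject malformed IDs before touching the database.

```python
import uuid

# uuid4() builds a version 4 UUID from the operating system's random source.
photo_id = uuid.uuid4()

print(photo_id)          # textual form: 8-4-4-4-12 hex digits
print(photo_id.version)  # 4, the random variant

def parse_photo_id(text: str):
    """Return a UUID for well-formed input, or None so the caller can answer 404."""
    try:
        return uuid.UUID(text)
    except ValueError:
        return None
```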

So, if you used UUIDs to store and refer to your photos - most modern database systems support UUIDs natively - instead of simple integer IDs, your img tags would look like this:

<img src="/photo/71d22a5f-3dd6-4058-a750-802b0be85cb3">
...
<img src="/photo/d7ea87b3-1e2f-4930-8841-61e0fb7a496d">

This makes it practically impossible to scrape your site's photos in a reasonable time.
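To put a rough number on "practically impossible", here is a back-of-the-envelope calculation in Python. The photo count and the request rate are assumptions picked for illustration, not measurements of any real site.

```python
RANDOM_BITS = 122  # a version 4 UUID has 122 random bits; 6 bits are fixed
ID_SPACE = 2 ** RANDOM_BITS

stored_photos = 10 ** 9        # assumed: one billion photos on the site
guesses_per_second = 10 ** 6   # assumed: one million requests per second

# Chance that a single random guess hits any stored photo.
hit_probability = stored_photos / ID_SPACE

# Average number of guesses, and the time they take, until the first hit.
expected_guesses = ID_SPACE / stored_photos
expected_years = expected_guesses / guesses_per_second / (365 * 24 * 3600)

print(f"{hit_probability:.2e} per guess, about {expected_years:.2e} years to the first hit")
```

Under these assumptions a scraper would need on the order of 10^14 years on average to find a single photo, whereas with consecutive IDs almost every guess succeeds.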

The only downside is that UUIDs may be a tad slower depending on your database, need a bit more storage, and are generally less human-readable than plain integers.

If this bothers you, you can add another column to your table that stores the UUID separately from the numeric ID used internally.
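A minimal sketch of that two-column layout, again with Python and sqlite3. The column name public_id and both helper functions are hypothetical; SQLite has no native UUID type, so the value is stored as text here, while other databases may offer a dedicated type.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE photos (
        id        INTEGER PRIMARY KEY AUTOINCREMENT,  -- internal, never exposed
        public_id TEXT NOT NULL UNIQUE,               -- random UUID used in URLs
        data      BLOB NOT NULL
    )
    """
)

def store_photo(data: bytes) -> str:
    """Store a photo and return the public ID to embed in the page."""
    public_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO photos (public_id, data) VALUES (?, ?)", (public_id, data)
    )
    conn.commit()
    return public_id

def load_photo(public_id: str):
    """Look the photo up by its public ID only; the integer key stays private."""
    row = conn.execute(
        "SELECT data FROM photos WHERE public_id = ?", (public_id,)
    ).fetchone()
    return row[0] if row else None
```

Joins and foreign keys can keep using the compact integer key, and the UNIQUE constraint gives the public column its own index for lookups.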

Hashids

Quite similarly, you could use hashids or Nano IDs instead of UUIDs. They are shorter, easier to read, and still good enough to make scraping almost impossible.
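For a feel of what such a short random ID looks like, the sketch below builds one with Python's standard secrets module. It is a stand-in written for this example and uses neither the hashids nor the Nano ID library; the function name short_id and the alphabet constant are assumptions.

```python
import secrets
import string

# URL-safe alphabet with 64 symbols, so each character carries 6 bits.
ALPHABET = string.ascii_letters + string.digits + "_-"

def short_id(length: int = 21) -> str:
    """Return a random ID; 21 characters give roughly 126 bits of randomness."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))

print(short_id())  # a different 21-character string on every run
```

At 21 characters the ID is shorter than the 36-character UUID text form while carrying a comparable amount of randomness.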