# Avoid guessable IDs

When developing a web application, you often have to store user-generated entities such as user profiles, photos, posts and videos.

Chances are you store them in a database, using an integer ID, generated by a sequence.

So, for example, each newly posted photo gets a numeric ID that is one higher than that of the previously stored photo.

From your pages, you just refer to them like this:


```html
<img src="/photo/1">
...
<img src="/photo/2">
``` 

Your web server extracts the ID from the image URL, selects the image from the database and returns it.

While this is technically perfectly fine, there is a lurking security problem:

# Beware of scrapers

**Storing the photos with consecutive IDs makes it effortless to guess valid `img` URLs** and to download all the available photos.
One would just try `/photo/1`, `/photo/2`, `/photo/3`, and so on.
Each guess is very likely to hit a valid photo.
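
To make the risk concrete, here is a minimal Python sketch that simulates the guessing against an in-memory store. The `photos` dict and the `fetch_photo` helper are hypothetical stand-ins for the database and the web server; no network access is involved.

```python
from typing import Optional

# Hypothetical in-memory stand-in for the photo table: sequential integer IDs.
photos = {1: b"alice.jpg", 2: b"bob.jpg", 3: b"carol.jpg"}


def fetch_photo(photo_id: int) -> Optional[bytes]:
    """Simulates GET /photo/<id>: returns the image bytes, or None for a 404."""
    return photos.get(photo_id)


# A scraper needs no prior knowledge: it simply counts upwards.
scraped = {}
for guess in range(1, 10):
    data = fetch_photo(guess)
    if data is not None:
        scraped[guess] = data

print(f"{len(scraped)} of {len(photos)} photos scraped")
```

Every stored photo ends up in `scraped`, because the valid IDs form one dense, predictable range.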

Now imagine an online service like [MindMates](//MindMat.es) stored user photos like this. With a single one-liner, you could fetch *all* existing user photos:

```
curl -O "https://MindMat.es/photo/[0-9999999].jpg"
``` 

**This would be a privacy nightmare.**

# Solution

To avoid such easy scraping, **use hard-to-guess IDs** instead of simple consecutive ones.


## UUIDs

[Universally unique identifiers](https://en.wikipedia.org/wiki/Universally_unique_identifier) are 128-bit numbers. The random variant (version 4) is chosen in such a way that the IDs are "globally unique", that is, collisions are extremely unlikely.

A literal representation of an example UUID is `37ce9786-5614-4def-9c47-7bb06eecba22`.

So, if you used UUIDs instead of simple integer IDs to store and refer to your photos (most modern database systems support UUIDs natively), your `img` tags would look like this:

```html
<img src="/photo/71d22a5f-3dd6-4058-a750-802b0be85cb3">
...
<img src="/photo/d7ea87b3-1e2f-4930-8841-61e0fb7a496d">
``` 
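
Generating such IDs takes very little code. The following Python sketch uses the standard library's `uuid.uuid4()`, which produces random (version 4) UUIDs. The `photos` dict and the `store_photo` helper are illustrative assumptions, not part of any particular framework.

```python
import uuid
from typing import Dict

# Hypothetical in-memory stand-in for the photo table, keyed by UUID.
photos: Dict[uuid.UUID, bytes] = {}


def store_photo(data: bytes) -> uuid.UUID:
    """Stores a photo under a freshly generated random UUID and returns it."""
    photo_id = uuid.uuid4()
    photos[photo_id] = data
    return photo_id


photo_id = store_photo(b"...image bytes...")
print(f'<img src="/photo/{photo_id}">')
```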

This makes it practically impossible to scrape your site's photos by guessing URLs in any reasonable amount of time.
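
A rough back-of-the-envelope calculation shows why guessing does not work. A version 4 UUID carries 122 random bits (6 of the 128 bits are fixed by the format). The number of stored photos and the guess rate below are made-up assumptions, chosen purely for illustration.

```python
# Assumed numbers, chosen for illustration only.
RANDOM_BITS = 122                # random bits in a version 4 UUID
STORED_PHOTOS = 1_000_000        # hypothetical number of photos on the site
GUESSES_PER_SECOND = 1_000_000   # hypothetical (very generous) scraper speed

id_space = 2 ** RANDOM_BITS
# On average, one guess in (id_space / STORED_PHOTOS) hits an existing photo.
expected_guesses_per_hit = id_space / STORED_PHOTOS
seconds = expected_guesses_per_hit / GUESSES_PER_SECOND
years = seconds / (60 * 60 * 24 * 365)

print(f"Expected time until the first valid guess: about {years:.1e} years")
```

Under these assumptions the result is on the order of 10^17 years for a single hit, which is far longer than the age of the universe.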

The only downside is that UUIDs may be a tad slower depending on your database, need a bit more storage, and are generally less human-readable than plain integers.

If this bothers you, you might just add another column to your database that stores the UUID separately from, and independently of, an *internally* used numeric ID, so that only the UUID ever appears in URLs.
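
A minimal sketch of this two-column layout, using Python's built-in `sqlite3` module with an in-memory database. The table name `photo`, the column names, and the helper functions are assumptions made for this example; SQLite has no native UUID type, so the UUID is stored as text here.

```python
import sqlite3
import uuid
from typing import Optional

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE photo (
        id        INTEGER PRIMARY KEY,   -- internal, sequential, never exposed
        public_id TEXT NOT NULL UNIQUE,  -- random UUID, the only ID used in URLs
        data      BLOB NOT NULL
    )
    """
)


def add_photo(data: bytes) -> str:
    """Inserts a photo and returns the public ID to use in URLs."""
    public_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO photo (public_id, data) VALUES (?, ?)", (public_id, data)
    )
    return public_id


def get_photo(public_id: str) -> Optional[bytes]:
    """Looks a photo up by its public ID; returns None if it does not exist."""
    row = conn.execute(
        "SELECT data FROM photo WHERE public_id = ?", (public_id,)
    ).fetchone()
    return row[0] if row else None


public_id = add_photo(b"...image bytes...")
print(f'<img src="/photo/{public_id}">')
```

The `UNIQUE` constraint also gives the `public_id` column an index, so lookups by public ID stay fast while joins can keep using the compact integer `id`.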

## Hashids

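Quite similarly, you could use [hashids](https://hashids.org) or [Nano IDs](https://github.com/ai/nanoid) instead of UUIDs.
They are shorter, easier to read, and still good enough to make scraping almost impossible. Keep in mind that Hashids merely obfuscate an integer using a salt, whereas Nano IDs are generated randomly.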
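
To illustrate the idea behind short random IDs, here is a Python sketch built on the standard library's `secrets` module. It is not the Nano ID library itself: the `random_id` helper is hypothetical, and the 64-symbol alphabet and the length of 21 characters are assumptions meant to mirror Nano ID's defaults.

```python
import secrets
import string

# URL-safe alphabet of 64 symbols (assumed here to mirror Nano ID's default).
ALPHABET = string.ascii_letters + string.digits + "_-"


def random_id(length: int = 21) -> str:
    """Returns a random URL-safe ID drawn from a cryptographically secure source."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


photo_id = random_id()
print(f'<img src="/photo/{photo_id}">')
```

With 64 symbols, each character contributes 6 bits, so 21 characters yield 126 random bits, which is comparable to the 122 random bits of a version 4 UUID in a much shorter string.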
