I'm thinking of usage of text IDs in SQL database, similar to how Stripe generates them, e.g.: ch_19iRv22eZvKYlo2CAxkjuHxZ
.
I found this StackOverflow very helpful.
I could use the function to generate such IDs with some entropy provided by pgcrypto
extension function gen_random_bytes
, but still, I'm hesitant.
Especially, after reading this comment:
No, it absolutely is not safe to use in any way shape or form which is why I'm for closing this question. If you want something that's safe to use, use UUID. If you want to play around with something likely to burn you severely and leave you crying. A solution which requires you create your own function that is worse in every way than the stock feature set to do this, then have it at. =)
I would not like to reach a point later on production when the IDs start to duplicate.
Therefore, I'm more biased to use UUIDs, which on the other hand are not very user-friendly, e.g. when they are used in URLs, reports, etc. I personally like the way Stripe does it.
What's your experience? Did anyone do sth similar to Stripe? Did you face any problems?
Top comments (9)
For distributed databases such as DynamoDB (NoSQL) it makes sense to use UUIDs for ID's. For a typical SQL database, is there a reason you would want to use this instead of integer ID's?
One reason is that I don't want to expose some information in the API, e.g. by using auto-incremented integer IDs in product orders I let outside world to know how many orders my system processes. I know I could translate the int IDs to API IDs, maybe this is better approach?
you can still have the database auto increment ID's, you just need to index on another ID that may be a composite identifier based on who buys it for instance (customer details, customer id), the actual order number, and some prefix based on the date of the order.
There are all kinds of ways to mask the actual number of orders and still use auto increment id's in your database
Thanks for the explanation.
Order numbers is an example that came to my head, but imagine that someone wants to exploit my security mechanism and sees url somedomain/api/images/123, then comes with an idea that image IDs may be auto-incremented and I had a security issue that image access is not protected for only owners of an image. Someone tries somedomain/images/124 and gets an image of the other user. These mistakes happen, with text ID it's harder to guess..
You also wrote that for distributed DBs it makes sense to use UUIDs. We can have the same case when some day my data grows to the amount that I want to create shards, then having incremented IDs I would have to remap them somehow based on shard ID to avoid same IDs on different shards.
I agree that for most cases incremented IDs work fine, but my question was not about pros and cons of using text IDs vs numeric IDs but rather how to properly implement such text IDs.
You're mixing two different things here. One is database identifier the other is role based security and general security.
If you need to protect image from access, security by obscurity is not the answer. So the fact they can guess the ID's is not the problem, it's your security mechanism that is the issue that needs to be resolved.
How to implement them? UUIDs can be generated. There's a very minuscule chance of collision, also dependent on the implementation you use. So that would be a first to research, a good library for v4 UUID's
I understand the difference between security and DB IDs, but again, would you agree with me on the statement that it's easier to uncover the security issue with auto-incremented IDs?
I know how to generate UUIDs but again, the question was about other mechanisms to generate text IDs than UUIDs.
Anyway, I got the answer from others in the comments so thanks for your time spent on helping me.
No, auto incremented IDs or not does not have any bearing on security whatsoever. As I've already mentioned, obscurity does not equal security. Just because you choose a random identifier does not make it suddenly more secure.
Your statement would only be correct if it is indeed obscurity that was the only mechanism of protection.
I would say if there's a business need for certain identifier to exist, that should be the primary reason next to any technical requirement that may arise due to sharding/distributed data storage.
You should have a look at Snowflake ID generator, it generates time based 64 bits timestamp.
One way to use auto-incrementing integers and easily mask them from the user is with hashids.org/