-
Notifications
You must be signed in to change notification settings - Fork 102
Create an 'id' field per card #63
Comments
You could potentially You could then truncate a portion of the result (maybe down to only 15 characters), albeit it does increase the risk for collisions. I've yet seen a cockatrice hash collide, so this doesn't seem like a terrible way to go. |
Hey, I'm a dev for www.ozguild.co and we have been using your database to create unique hashes already. I just sent you an email about some changes to Collectors edition basic lands which messed with our system (multiple cards have the exact same details). We would be happy to let you know how we have been doing it and working with you to get this implemented as it would be very useful to us |
Sounds awesome @RavenousWolf :) |
Ok, an 'id' field will be coming soon to mtgjson. It is composed of sha1(setCode + cardName + imageName) |
I'm doing a bit of work with the foreign names and I just found that this "pronto" solution to de UUID of any card might not be the best. First, it uses an arbitrary string field ('imageName') which is already deprecated (even if its still supported). Second it uses a 160bit cryptographic hash for something as a uuid in a set with (at most) 200k records (I'm exagerating and even counting like there is every translation for every print), this is not only not efficient, it is exagerated in most aspects and can't be reproduced without knowledge of this specific environment. IMHO an ID should represent a card by itself in its printing form, though it must include the fields that better represent that card (physical or virtual), in the most straightforward and simple way, those fields are:
This 4 fields can represent ANY card (even if number is missing it should not collide in it's set ie. Promos) ever printed, and this 4 fields can be obtained without any external reference (like this project and the imageName field) so they are a more universal way of describing that specific card. On the other point, even if SHA1 is a very well known Hashing function, this is definitely not it's use, first of all is a correct cryptographic hashing algorithm, which makes it pretty secure, but also slow, and it produces a 160bits Sum which is actually HUGE for a set of records less than 0.0001% the size. I'm already doing collision tests using 32 and 64bit NON-Cryptographic functions like Fast-Hash, SipHash, Murmur3 (I recommend this one) using the fields provided above, NAME+SET+LANG+NUMBER? and it's working perfectly with 32bit Sums (a single integer) although I have found a lot of collisions on alternate art cards. So I'm currently trying to identify those. Thoughts? |
The four fields you mention are not unique enough to represent a card. Early sets do not have card numbers, so LEA for example has two swamps that have identical data, even the same artist, but they have different art so they are different cards. This is why the imageName field exists. Yes, I agree it is a mtgjson specific field, but it's used in the id, and other apps/websites use it too for uniqueness purposes, so therefore I'll never be dropping support for it. As for language, gatherer lacks foreign language info on many sets and mtgjson does not yet support multiple languages anyways (Issue #40) Still, when the day comes to add full foreign language support, I've detailed how I'll handle it in Issue #40 in regards to the id field so existing id's will not change. As far as the choice of hash function and the size of the id field, I don't necessarily disagree with you on that. |
A unique 'id' field per card should be added.
This has been requested a LOT by many people, as it would make importing into a database a lot easier. Also, this id field would be needed in order to implement #55 and #47
There are a number of different ways the id could be created. Some additional discussion and thought will be needed to pick the best way.
My initial hunch is a hash of (setCode + cardName + imageName) would work. Depending on the type of hash used though the id field might be kinda long, which I may want to avoid. This sort of 'predictable hash' would be needed so if anyone were to generate all the files from scratch the id's would end up being the same.
The text was updated successfully, but these errors were encountered: