Skip to content
This repository has been archived by the owner on Dec 21, 2018. It is now read-only.

Create an 'id' field per card #63

Closed
Sembiance opened this issue Aug 3, 2015 · 6 comments
Closed

Create an 'id' field per card #63

Sembiance opened this issue Aug 3, 2015 · 6 comments

Comments

@Sembiance
Copy link
Contributor

A unique 'id' field per card should be added.

This has been requested a LOT by many people, as it would make importing into a database a lot easier. Also, this id field would be needed in order to implement #55 and #47

There are a number of different ways the id could be created. Some additional discussion and thought will be needed to pick the best way.

My initial hunch is a hash of (setCode + cardName + imageName) would work. Depending on the type of hash used though the id field might be kinda long, which I may want to avoid. This sort of 'predictable hash' would be needed so if anyone were to generate all the files from scratch the id's would end up being the same.

@ZeldaZach
Copy link
Member

You could potentially md5 the desired combination (setCode + cardName + imageName) as that's always a predictable length, 32.

You could then truncate a portion of the result (maybe down to only 15 characters), albeit it does increase the risk for collisions. I've yet seen a cockatrice hash collide, so this doesn't seem like a terrible way to go.

@Snawboll
Copy link

Snawboll commented Aug 6, 2015

Hey, I'm a dev for www.ozguild.co and we have been using your database to create unique hashes already. I just sent you an email about some changes to Collectors edition basic lands which messed with our system (multiple cards have the exact same details). We would be happy to let you know how we have been doing it and working with you to get this implemented as it would be very useful to us

@ZeldaZach
Copy link
Member

Sounds awesome @RavenousWolf :)

@Sembiance
Copy link
Contributor Author

Ok, an 'id' field will be coming soon to mtgjson.

It is composed of sha1(setCode + cardName + imageName)

@phrozen
Copy link

phrozen commented Sep 14, 2015

I'm doing a bit of work with the foreign names and I just found that this "pronto" solution to de UUID of any card might not be the best. First, it uses an arbitrary string field ('imageName') which is already deprecated (even if its still supported). Second it uses a 160bit cryptographic hash for something as a uuid in a set with (at most) 200k records (I'm exagerating and even counting like there is every translation for every print), this is not only not efficient, it is exagerated in most aspects and can't be reproduced without knowledge of this specific environment.

IMHO an ID should represent a card by itself in its printing form, though it must include the fields that better represent that card (physical or virtual), in the most straightforward and simple way, those fields are:

This 4 fields can represent ANY card (even if number is missing it should not collide in it's set ie. Promos) ever printed, and this 4 fields can be obtained without any external reference (like this project and the imageName field) so they are a more universal way of describing that specific card.

On the other point, even if SHA1 is a very well known Hashing function, this is definitely not it's use, first of all is a correct cryptographic hashing algorithm, which makes it pretty secure, but also slow, and it produces a 160bits Sum which is actually HUGE for a set of records less than 0.0001% the size. I'm already doing collision tests using 32 and 64bit NON-Cryptographic functions like Fast-Hash, SipHash, Murmur3 (I recommend this one) using the fields provided above, NAME+SET+LANG+NUMBER? and it's working perfectly with 32bit Sums (a single integer) although I have found a lot of collisions on alternate art cards. So I'm currently trying to identify those.

Thoughts?

@Sembiance
Copy link
Contributor Author

@phrozen

The four fields you mention are not unique enough to represent a card.

Early sets do not have card numbers, so LEA for example has two swamps that have identical data, even the same artist, but they have different art so they are different cards.

This is why the imageName field exists. Yes, I agree it is a mtgjson specific field, but it's used in the id, and other apps/websites use it too for uniqueness purposes, so therefore I'll never be dropping support for it.

As for language, gatherer lacks foreign language info on many sets and mtgjson does not yet support multiple languages anyways (Issue #40) Still, when the day comes to add full foreign language support, I've detailed how I'll handle it in Issue #40 in regards to the id field so existing id's will not change.

As far as the choice of hash function and the size of the id field, I don't necessarily disagree with you on that.
However, the 'id' field has already gone live, and tons of websites and apps utilize mtgjson. Undoubtedly several have already started relying on the 'id' field and changing it at this stage would make life very painful for those folks. So I won't be changing the id field at this point.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

4 participants