Some data are switched in code list csv #37

Open
dwaam opened this issue Nov 12, 2024 · 14 comments

Comments

@dwaam

dwaam commented Nov 12, 2024

Hi,

I wrote an issue previously: #34

The duplicated headers are gone and some UNLOCODEs are fixed, but there are still issues:

  • all the data is duplicated
    • each row is present twice
    • one entry has 4 lines: country GR, code JSY

Examples:

// Line 6909 - 6911
,GR,JSY,Syra (Syros),Syra (Syros),,AI,1-------,9601,,,
,GR,SYO,Syra Island,Syra Island,,RL,--3-----,0201,,3726N 02455E,
,GR,JSY,Syros (Syra),Syros (Syra),,AI,1-------,9601,,,
// Line 122939 - 122941
,GR,JSY,Syra (Syros),Syra (Syros),,AI,1-------,9601,,,
,GR,SYO,Syra Island,Syra Island,,RL,--3-----,0201,,3726N 02455E,
,GR,JSY,Syros (Syra),Syros (Syra),,AI,1-------,9601,,,

// Here the status is empty in both rows, but in the second entry it is swapped with the function
// Line 107845
,FR,LR7,L'Aiguillon-la-Rouge,L'Aiguillon-la-Rouge,61,-----6--,,2101,,4816N 00042E,
// Line 223874 
,FR,LR7,L'Aiguillon-la-Rouge,L'Aiguillon-la-Rouge,61,,-----6--,2101,,4816N 00042E,

// Same here, with a non-empty status
// Line 37130
,SV,PDC,Paso de la Ceiba,Paso de la Ceiba,SA,RQ,--3----B,0901,,1425N 08926W,
// Line 153159
,SV,PDC,Paso de la Ceiba,Paso de la Ceiba,SA,--3----B,RQ,0901,,1425N 08926W,

Is it possible to fix the data?
Thanks !

@sabas
Contributor

sabas commented Nov 12, 2024

@gradedSystem

@gradedSystem
Member

@sabas @dwaam looking through it right now

@gradedSystem
Member

@sabas @dwaam
I cleaned up the dataset by removing duplicate rows and improving the swap function between the Function and Status columns. If any other issue comes up, just tag me and I will fix it.
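
For illustration, a check for swapped Status and Function values could look roughly like the sketch below. This is not the repository's actual code. It assumes rows are dicts with `Status` and `Function` keys, and that a Function value is always an 8-character string of digits, `B`, and `-` (as in the rows quoted above, e.g. `--3----B`), while a Status value is a short letter code such as `RQ`. The helper names are hypothetical.

```python
import re

# Assumption: Function is an 8-character classifier such as "--3----B" or "1-------".
# The pattern is inferred from the rows quoted in this issue, not from a specification.
FUNCTION_RE = re.compile(r"[-0-9B]{8}")


def looks_like_function(value: str) -> bool:
    """True if the value has the shape of a Function classifier."""
    return FUNCTION_RE.fullmatch(value) is not None


def unswap_status_function(row: dict) -> dict:
    """Return a copy of the row, swapping Status and Function back if they look reversed."""
    status = row.get("Status", "")
    function = row.get("Function", "")
    # Only swap when Status holds a Function-shaped value and Function does not.
    if looks_like_function(status) and not looks_like_function(function):
        return {**row, "Status": function, "Function": status}
    return dict(row)


# The two swapped cases reported above (SV PDC and FR LR7).
print(unswap_status_function({"Status": "--3----B", "Function": "RQ"}))
print(unswap_status_function({"Status": "-----6--", "Function": ""}))
```

Rows that already have the Function pattern in the Function column are returned unchanged, so the check can be run over the whole file.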

@dwaam
Author

dwaam commented Nov 14, 2024

Thanks!
I see that the data at the UN itself does not seem correct:
https://service.unece.org/trade/locode/gr.htm

image

The same UNLOCODE has two entries, and both are validated.

Do you think the data is correct?
If not, do you know if there is a way to ask them to fix it?

@gradedSystem
Member

gradedSystem commented Nov 14, 2024

  1. Great observations @dwaam. I am not sure whether the data is correct: in other issues I have seen, people were complaining about the source. You can refer to this older issue, which I think addresses what you are pointing out: Double unlocode AXMHQ #17
  2. I am not sure. I emailed the un.org website and pointed out your issue there. I will wait for their answer and reply here as soon as I hear back 🫡

@dwaam
Author

dwaam commented Nov 14, 2024

Cool thanks @gradedSystem ;)

@sabas
Contributor

sabas commented Nov 14, 2024

@dwaam @gradedSystem that's normal: note that the two entries are in the form X (Y) and Y (X), the same as in #17.
Probably the "fix" would be to make it an alias entry (see CHGVA for an example), but who takes responsibility for choosing which is the primary entry?
Can you generate a list of duplicates? I can ask at the next meeting, but I think it's a "feature" more than a bug.

@dwaam
Author

dwaam commented Nov 14, 2024

Not really, because the UNLOCODE will be our natural ID and key. Among all the UNLOCODEs, this is the only one I have seen like that, so it seems to be a mistake in the UN data.
If the two entries at least had different statuses, we could choose the right one, but they do not.

@dwaam
Author

dwaam commented Nov 15, 2024

Hi,
The correction looks good and the data in the pull request seems OK, thanks ;)

┌───────┬─────────┬──────────┐
│ count │ Country │ Location │
│ int64 │ varchar │ varchar  │
├───────┼─────────┼──────────┤
│     4 │ US      │ TRI      │
│     3 │ US      │ BGM      │
│     3 │ US      │ LEB      │
│     3 │ US      │ GGG      │
│     3 │ US      │ MBS      │
│     3 │ US      │ PHF      │
│     2 │ HR      │ GRA      │
│     2 │ HU      │ FEL      │
│     2 │ HU      │ SZN      │
│     2 │ KH      │ PPT      │
│     2 │ PL      │ KET      │
│     2 │ SK      │ VRA      │
│     2 │ TR      │ MGL      │
│     2 │ BE      │ SPI      │
│     2 │ CZ      │ PKR      │
│     2 │ CZ      │ 9YI      │
│     2 │ FI      │ EJO      │
│     2 │ FI      │ HMN      │
│     2 │ FI      │ KIM      │
│     2 │ FI      │ MAX      │
│     2 │ FI      │ MIK      │
│     2 │ FI      │ RAU      │
│     2 │ FI      │ TER      │
│     2 │ GR      │ LEV      │
│     2 │ HU      │ UJR      │
│     2 │ SK      │ PDT      │
│     2 │ SK      │ ZKL      │
│     2 │ US      │ CVO      │
│     2 │ US      │ GON      │
│     2 │ BE      │ ODE      │
│     2 │ FI      │ DLS      │
│     2 │ FI      │ HKO      │
│     2 │ FI      │ KAA      │
│     2 │ FI      │ PAR      │
│     2 │ HR      │ MET      │
│     2 │ HU      │ TEY      │
│     2 │ MG      │ IVA      │
│     2 │ MT      │ SJN      │
│     2 │ SK      │ BNI      │
│     2 │ SK      │ MES      │
│     2 │ SK      │ R4B      │
│     2 │ SK      │ VEP      │
│     2 │ SO      │ DOW      │
│     2 │ TR      │ OPR      │
│     2 │ US      │ HIB      │
│     2 │ US      │ GSP      │
│     2 │ US      │ POY      │
│     2 │ BE      │ LNY      │
│     2 │ BE      │ SJN      │
│     2 │ BE      │ SLW      │
│     2 │ BE      │ SPO      │
│     2 │ FI      │ PRV      │
│     2 │ FI      │ KOK      │
│     2 │ FI      │ LAP      │
│     2 │ FI      │ UKI      │
│     2 │ FI      │ RYM      │
│     2 │ HU      │ BLA      │
│     2 │ HU      │ VES      │
│     2 │ HU      │ HAF      │
│     2 │ IN      │ MRM      │
│     2 │ IT      │ PFX      │
│     2 │ LV      │ SKR      │
│     2 │ US      │ MDJ      │
│     2 │ VN      │ VAG      │
│     2 │ AX      │ MHQ      │
│     2 │ BE      │ ESE      │
│     2 │ CZ      │ MUV      │
│     2 │ CZ      │ SIB      │
│     2 │ CZ      │ SVR      │
│     2 │ CZ      │ TEC      │
│     2 │ CZ      │ TZK      │
│     2 │ FI      │ HOU      │
│     2 │ FI      │ KAJ      │
│     2 │ FI      │ NLI      │
│     2 │ FI      │ TOR      │
│     2 │ HR      │ CAK      │
│     2 │ HU      │ BOY      │
│     2 │ PL      │ MRC      │
│     2 │ TR      │ MKP      │
│     2 │ US      │ RDM      │
│     2 │ BE      │ WBV      │
│     2 │ CZ      │ BOO      │
│     2 │ CZ      │ BVZ      │
│     2 │ FI      │ SKV      │
│     2 │ FI      │ KOR      │
│     2 │ FI      │ MHQ      │
│     2 │ FI      │ NRP      │
│     2 │ FI      │ SIP      │
│     2 │ GR      │ VTH      │
│     2 │ HU      │ VCS      │
│     2 │ HU      │ ZZB      │
│     2 │ HU      │ OTN      │
│     2 │ IN      │ NSA      │
│     2 │ MT      │ SGW      │
│     2 │ PL      │ GWM      │
│     2 │ US      │ LEW      │
│     2 │ US      │ BWI      │
│     2 │ US      │ PTN      │
│     2 │ CZ      │ PRY      │
│     2 │ FI      │ ESP      │
│     2 │ FI      │ KJA      │
│     2 │ FI      │ LPP      │
│     2 │ GR      │ LAV      │
│     2 │ HR      │ SDA      │
│     2 │ IT      │ FCO      │
│     2 │ PL      │ SLA      │
│     2 │ PL      │ STJ      │
│     2 │ RU      │ YEK      │
│     2 │ SK      │ HOK      │
│     2 │ SK      │ VUE      │
│     2 │ SN      │ TOU      │
│     2 │ TR      │ IZM      │
│     2 │ BE      │ VOS      │
│     2 │ BE      │ MOS      │
│     2 │ BE      │ SGI      │
│     2 │ CZ      │ JNR      │
│     2 │ FI      │ TKU      │
│     2 │ FI      │ KRK      │
│     2 │ FI      │ PRS      │
│     2 │ FI      │ NAU      │
│     2 │ LT      │ DID      │
│     2 │ SK      │ PEL      │
│     2 │ US      │ MRY      │
│     2 │ US      │ OXR      │
│     2 │ CZ      │ KTA      │
│     2 │ CZ      │ MAE      │
│     2 │ CZ      │ OST      │
│     2 │ FI      │ KIN      │
│     2 │ FI      │ INK      │
│     2 │ FI      │ LHI      │
│     2 │ FI      │ SVL      │
│     2 │ FI      │ POH      │
│     2 │ FI      │ TMP      │
│     2 │ LT      │ KEL      │
│     2 │ LU      │ SKK      │
│     2 │ BE      │ BRU      │
│     2 │ FI      │ KAL      │
│     2 │ FI      │ KVH      │
│     2 │ FI      │ MLX      │
│     2 │ FI      │ UKP      │
│     2 │ FI      │ TOK      │
│     2 │ GR      │ HER      │
│     2 │ HR      │ GSP      │
│     2 │ HR      │ OTO      │
│     2 │ HU      │ LOV      │
│     2 │ LV      │ BRC      │
│     2 │ PL      │ BED      │
│     2 │ PL      │ BEL      │
│     2 │ SK      │ VLN      │
│     2 │ US      │ HTS      │
│     2 │ DE      │ LAA      │
│     2 │ ES      │ LDT      │
│     2 │ FI      │ IIS      │
│     2 │ FI      │ VAT      │
│     2 │ GR      │ SYS      │
│     2 │ GR      │ JSY      │
│     2 │ LT      │ MOS      │
│     2 │ PL      │ MIK      │
│     2 │ BE      │ BTS      │
│     2 │ BE      │ SBK      │
│     2 │ BE      │ UKE      │
│     2 │ FI      │ RAA      │
│     2 │ FI      │ KAS      │
│     2 │ FI      │ PER      │
│     2 │ FI      │ SBG      │
│     2 │ SK      │ CEL      │
│     2 │ US      │ GSO      │
│     2 │ US      │ SUN      │
│     2 │ BE      │ KAN      │
│     2 │ CZ      │ CYD      │
│     2 │ FI      │ PIR      │
│     2 │ FI      │ TAI      │
│     2 │ FI      │ KUS      │
│     2 │ HR      │ KAS      │
│     2 │ LT      │ AGM      │
│     2 │ RO      │ RGU      │
│     2 │ SK      │ TOR      │
│     2 │ SN      │ DUR      │
│     2 │ US      │ SRQ      │
│     2 │ US      │ BYI      │
│     2 │ US      │ RDU      │
│     2 │ US      │ MSL      │
│     2 │ BE      │ OST      │
│     2 │ CZ      │ BAT      │
│     2 │ CZ      │ UCN      │
│     2 │ FI      │ ENF      │
│     2 │ FI      │ HEL      │
│     2 │ FI      │ KEM      │
│     2 │ FI      │ VAA      │
│     2 │ GR      │ KIM      │
│     2 │ HU      │ FZS      │
│     2 │ HU      │ MZK      │
│     2 │ LT      │ DEW      │
│     2 │ PL      │ KMS      │
│     2 │ BE      │ ITR      │
│     2 │ CZ      │ CVA      │
│     2 │ CZ      │ PEV      │
│     2 │ FI      │ HYV      │
│     2 │ FI      │ JPA      │
│     2 │ FI      │ LHJ      │
│     2 │ FI      │ LOV      │
│     2 │ FI      │ MER      │
│     2 │ HU      │ HAS      │
│     2 │ HU      │ KOZ      │
│     2 │ PL      │ DOL      │
│     2 │ PL      │ WLR      │
│     2 │ SK      │ VOC      │
│     2 │ TR      │ SRS      │
│     2 │ US      │ EWB      │
│     2 │ BE      │ MSJ      │
│     2 │ CZ      │ KAD      │
│     2 │ FI      │ JVP      │
│     2 │ FI      │ RUO      │
│     2 │ FI      │ VKO      │
│     2 │ GR      │ HYD      │
│     2 │ JP      │ AGC      │
│     2 │ LT      │ EMK      │
│     2 │ LV      │ MPS      │
│     2 │ MD      │ VUL      │
│     2 │ RO      │ DIM      │
│     2 │ SK      │ VNV      │
│     2 │ US      │ MPV      │
│     2 │ US      │ PSB      │
│     2 │ US      │ FHU      │
│     2 │ US      │ MFE      │
│     2 │ BE      │ TRN      │
│     2 │ BE      │ ZUN      │
│     2 │ FI      │ POR      │
│     2 │ FI      │ HMY      │
│     2 │ FI      │ KER      │
│     2 │ FI      │ KRS      │
│     2 │ FI      │ OUL      │
│     2 │ FI      │ TVS      │

Those are the remaining duplicates, but they seem to be an issue directly in the UN data.
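
A list like the one above can also be produced with a few lines of Python. This is only a minimal sketch: the header row is an assumption about the CSV layout (the `Country` and `Location` names come from the table above, the rest are guesses), and the sample rows are the GR entries quoted at the top of this issue.

```python
import csv
import io
from collections import Counter

# Assumed header; only Country and Location are used by the function below.
SAMPLE = """\
Change,Country,Location,Name,NameWoDiacritics,Subdivision,Status,Function,Date,IATA,Coordinates,Remarks
,GR,JSY,Syra (Syros),Syra (Syros),,AI,1-------,9601,,,
,GR,SYO,Syra Island,Syra Island,,RL,--3-----,0201,,3726N 02455E,
,GR,JSY,Syros (Syra),Syros (Syra),,AI,1-------,9601,,,
"""


def duplicate_keys(csv_text: str) -> dict:
    """Map each (Country, Location) pair that occurs more than once to its row count."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter((row["Country"], row["Location"]) for row in reader)
    return {key: n for key, n in counts.items() if n > 1}


print(duplicate_keys(SAMPLE))  # {('GR', 'JSY'): 2}
```

For the real file, the text can be read with `open(path, newline="", encoding="utf-8").read()` and passed to the same function.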

@sabas
Contributor

sabas commented Nov 15, 2024

Yeah, I suggest treating those as aliases...
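
One possible way to treat them as aliases on the consumer side is sketched below. It assumes rows are dicts with `Country`, `Location` and `Name` keys, and it simply keeps the first row seen for each code as the primary entry. That choice is arbitrary, which is exactly the open question raised earlier in this thread. The `Aliases` field and the function name are made up for the example.

```python
def collapse_aliases(rows):
    """Keep one entry per (Country, Location); collect other names in an 'Aliases' list."""
    merged = {}
    for row in rows:
        key = (row["Country"], row["Location"])
        if key not in merged:
            # First row seen for this code becomes the primary entry (arbitrary choice).
            merged[key] = {**row, "Aliases": []}
        else:
            primary = merged[key]
            name = row["Name"]
            # Record a differing name once; exact repeats are dropped.
            if name != primary["Name"] and name not in primary["Aliases"]:
                primary["Aliases"].append(name)
    return list(merged.values())


rows = [
    {"Country": "GR", "Location": "JSY", "Name": "Syra (Syros)"},
    {"Country": "GR", "Location": "SYO", "Name": "Syra Island"},
    {"Country": "GR", "Location": "JSY", "Name": "Syros (Syra)"},
]
for entry in collapse_aliases(rows):
    print(entry["Country"], entry["Location"], entry["Name"], entry["Aliases"])
```

With this, each UNLOCODE maps to exactly one record, so it can be used as a unique key while the alternative spelling is still kept.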

@gradedSystem
Member

Hi @sabas @dwaam, sorry for the long silence, I was ill, but I am ready to work on this again. As far as I understand, we should keep the aliases, right?

@dwaam
Author

dwaam commented Nov 19, 2024

Hi, I handled the duplicates directly on my side, so there is no need for a fix for me now ;). Still, it's weird that the UN allows duplicated entries.

@gradedSystem
Member

@dwaam so the PR is correct, right?

@dwaam
Author

dwaam commented Nov 19, 2024

Yep, for the duplication it's good, I took care of it on my side.
I'll keep watching your repo in case you change the behavior. If in the future the CSV has only one line per UNLOCODE, that would be even better for me :)
