Pattern TZ does not include most timezones #172

Open
tithar opened this issue Sep 15, 2016 · 9 comments · May be fixed by #235

Comments

@tithar

tithar commented Sep 15, 2016

From: https://github.com/logstash-plugins/logstash-patterns-core/blob/master/patterns/grok-patterns

The TZ pattern doesn't include IST (India Standard Time).
https://www.timeanddate.com/time/zones/ist

From:
TZ (?:[APMCE][SD]T|UTC)

To:
TZ (?:[APMCEI][SD]T|UTC)
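
A minimal sketch of what the proposed change does, assuming the two TZ definitions above can be tested as plain regular expressions with Python's `re` module (grok itself uses a different regex engine, and the helper name `matches` is only illustrative):

```python
import re

# Both patterns are copied from the comment above.
OLD_TZ = re.compile(r"(?:[APMCE][SD]T|UTC)")
NEW_TZ = re.compile(r"(?:[APMCEI][SD]T|UTC)")

def matches(pattern, text):
    """Return True when the pattern matches the whole string."""
    return pattern.fullmatch(text) is not None

# Compare the current and proposed patterns on a few abbreviations.
for abbrev in ["EST", "PDT", "UTC", "IST", "BST"]:
    print(abbrev, matches(OLD_TZ, abbrev), matches(NEW_TZ, abbrev))
```

Adding `I` to the character class picks up IST, but an abbreviation such as BST still falls outside both patterns.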

@talevy

talevy commented May 9, 2017

Nor does it support British Summer Time (BST).

@jordansissel
Contributor

It's missing lots of timezones because it is continental-US focused for whatever reason.

So, do we try to add all known abbreviations for timezones? Or do we change TZ to be something like [A-Z]{3,4} to match any 3-4 character capitalized text?

@robin13 robin13 changed the title from "Pattern TZ does not include India Standard Time (IST)" to "Pattern TZ does not include most timezones" on Sep 11, 2018
@robin13

robin13 commented Sep 11, 2018

Here is a pretty comprehensive list of timezones:
https://www.timeanddate.com/time/zones/

+1 to the [A-Z]{1,5} pattern, because some abbreviations are only one letter (e.g. military G) and some are up to 5 letters (e.g. IRKST for Irkutsk Summer Time).
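
To make the length argument concrete, here is a small sketch comparing the two candidate quantifiers on abbreviations mentioned in this thread. It assumes Python's `re` module as a stand-in for grok's regex engine; the helper name `covers` is hypothetical.

```python
import re

THREE_TO_FOUR = re.compile(r"[A-Z]{3,4}")  # suggested earlier in the thread
ONE_TO_FIVE = re.compile(r"[A-Z]{1,5}")    # suggested in this comment

def covers(pattern, abbrev):
    """Return True when the pattern matches the whole abbreviation."""
    return pattern.fullmatch(abbrev) is not None

# G (military) has one letter, IRKST (Irkutsk Summer Time) has five.
for abbrev in ["G", "IST", "AEST", "IRKST"]:
    print(abbrev, covers(THREE_TO_FOUR, abbrev), covers(ONE_TO_FIVE, abbrev))
```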

@robin13

robin13 commented Sep 11, 2018

I've made a pull request with this change: #235 but I'm not 100% sure it's the right thing to do...
Pro: it would now match all timezone abbreviations, which is what one would expect a TZ pattern to do.
Con: it now matches any one to five uppercase letters from A to Z, which includes a lot more than actual timezones.

@guyboertje

@robin13
I guess we don't want INFO and DEBUG matched though.
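
A quick sketch of that concern, assuming Python's `re` module and a made-up log line (the line and the word boundaries around the pattern are illustrative, not taken from the pull request):

```python
import re

# The broad pattern under discussion, wrapped in word boundaries for searching.
ANY_UPPER = re.compile(r"\b[A-Z]{1,5}\b")

# Hypothetical log line that contains a log level but no timezone at all.
line = "2018-09-11 12:00:00 INFO Service started"
hits = ANY_UPPER.findall(line)
print(hits)

# Each of these log levels is five letters or fewer, so the pattern accepts them.
levels = ["INFO", "DEBUG", "WARN", "ALERT"]
print([level for level in levels if ANY_UPPER.fullmatch(level)])
```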

@guyboertje

guyboertje commented Sep 11, 2018

This is a list of abbreviations (scraped from the timeanddate.com page above). INFO, DEBUG, ALERT and WARN are appended at the end as negative cases and should not be matched.

ACDT ACST ACT ACWST ADT AEDT AEST AFT AKDT AKST ALMT AMST AMT ANAST ANAT AQTT ART AST AWDT AWST AZOST AZOT AZST AZT BNT BOT BRST BRT BST BTT CAST CAT CCT CDT CEST CET CHADT CHAST CHOST CHOT CHUT CIDST CIST CKT CLST CLT COT CST CVT CXT DAVT DDUT EASST EAST EAT ECT EDT EEST EET EGST EGT EST FET FJST FJT FKST FKT FNT GALT GAMT GET GFT GILT GMT GST GYT HDT HKT HOVST HOVT HST ICT IDT IOT IRDT IRKST IRKT IRST IST JST KGT KOST KRAST KRAT KST KUYT LHDT LHST LINT MAGST MAGT MART MAWT MDT MHT MMT MSD MSK MST MUT MVT MYT NCT NDT NFT NOVST NOVT NPT NRT NST NUT NZDT NZST OMSST OMST ORAT PDT PET PETST PETT PGT PHOT PHT PKT PMDT PMST PONT PST PWT PYST PYT QYZT RET ROTT SAKT SAMT SAST SBT SCT SGT SRET SRT SST SYOT TAHT TFT TJT TKT TLT TMT TOST TOT TRT TVT ULAST ULAT UYST UYT UZT VET VLAST VLAT VOST VUT WAKT WARST WAST WAT WEST WET WFT WGST WGT WIB WIT WITA WST WT YAKST YAKT YAPT YEKST YEKT INFO DEBUG ALERT WARN

here is one regex...

\b(?:((A[CEFKLMNRWZ]?[AMOW]?|AQT|B[NORT]?R?|C[ACEHIKLOSVX]?[AOU]?|DAV|DDU|E[ACEG]?S?|F[EJKN]|G[AEFIMY]?[LM]?|H[K]?|HOV|I[CORS]?[K]?|JAY|J|K[GOR]?|KRA|KUY|L[HIK]|LIG|LIN|M[AEHMPUVY]?[GRW]?|N[CFPRUZ]?|NOV|OMS|ORA|P[EGHKMOWY]?[ONT]?|QYZ|RE|ROT|S[ABCGR]?[EKM]?|SYO|T[FJKLMORV]|TAH|TRU|U[CUYZ]?|ULA|V[EU]|VLA|VOL|VOS|W[AEFG]?|WAK|WAR|WET|WI|Y[AE][KP])(D?ST|D?T|T)|BRA|MEZ|MSD|MSK|WIB|WITA|UTC|ZULU|Z))\b

@robin13 WDYT?
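
One way to try the regex above is a small harness like the following. It is only a sketch: it assumes Python's `re` module behaves like grok's engine for this pattern, it uses a handful of abbreviations rather than the full scraped list, and the helper name `is_tz` is hypothetical.

```python
import re

# The regex from the comment above, copied verbatim.
TZ_RE = re.compile(
    r"\b(?:((A[CEFKLMNRWZ]?[AMOW]?|AQT|B[NORT]?R?|C[ACEHIKLOSVX]?[AOU]?|DAV|DDU|E[ACEG]?S?|F[EJKN]|G[AEFIMY]?[LM]?|H[K]?|HOV|I[CORS]?[K]?|JAY|J|K[GOR]?|KRA|KUY|L[HIK]|LIG|LIN|M[AEHMPUVY]?[GRW]?|N[CFPRUZ]?|NOV|OMS|ORA|P[EGHKMOWY]?[ONT]?|QYZ|RE|ROT|S[ABCGR]?[EKM]?|SYO|T[FJKLMORV]|TAH|TRU|U[CUYZ]?|ULA|V[EU]|VLA|VOL|VOS|W[AEFG]?|WAK|WAR|WET|WI|Y[AE][KP])(D?ST|D?T|T)|BRA|MEZ|MSD|MSK|WIB|WITA|UTC|ZULU|Z))\b"
)

def is_tz(token):
    """Return True when the whole token is accepted by the regex."""
    return TZ_RE.fullmatch(token) is not None

# A small subset of the scraped list, plus the four words that must be rejected.
should_match = ["EST", "IST", "BST", "GMT", "CEST", "IRKST", "UTC"]
should_not_match = ["INFO", "DEBUG", "ALERT", "WARN"]

print("missed:", [t for t in should_match if not is_tz(t)])
print("false positives:", [t for t in should_not_match if is_tz(t)])
```

Extending `should_match` to the full list is the obvious next step before relying on the pattern.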

@guyboertje

I think we should create a separate MIL_TZ pattern with [A-IK-Z], i.e. every single letter except J.
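
As a sanity check on that character class, a short sketch using Python's `re` module (MIL_TZ is only the name proposed in this comment, not an existing grok pattern):

```python
import re
import string

# Character class proposed above: A through I and K through Z, skipping J.
MIL_TZ = re.compile(r"[A-IK-Z]")

matched = [c for c in string.ascii_uppercase if MIL_TZ.fullmatch(c)]
skipped = [c for c in string.ascii_uppercase if not MIL_TZ.fullmatch(c)]
print(len(matched), skipped)
```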

@robin13

robin13 commented Sep 14, 2018

Nice work @guyboertje ! :)
I'm not sure about excluding the military timezones; they are actually used. Many Grok patterns can match things they are not intended for, but it's all about sequence and context. If your pattern is:

%{DATE_EU} %{TIME} %{TZ}

This would match nicely against "01.01.2018 12:00:00 G", but it would not match against "This is Great", even though that string contains two potential single-letter timezones (T and G).
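
The sequence-and-context point can be sketched in Python. The three fragments below are simplified stand-ins written for this example; they are assumptions, not the real grok definitions of DATE_EU, TIME and TZ.

```python
import re

# Simplified, hypothetical stand-ins for the grok patterns named above.
DATE_EU = r"\d{2}\.\d{2}\.\d{4}"
TIME = r"\d{2}:\d{2}:\d{2}"
TZ = r"[A-Z]{1,5}"

# Equivalent of "%{DATE_EU} %{TIME} %{TZ}" with a named group for the timezone.
combined = re.compile(rf"{DATE_EU} {TIME} (?P<tz>{TZ})")

with_context = combined.search("01.01.2018 12:00:00 G")
no_context = combined.search("This is Great")

# On its own, the TZ fragment does find a capital letter in the plain sentence.
bare = re.search(TZ, "This is Great")

print(with_context.group("tz"), no_context, bare.group())
```

The broad TZ fragment only becomes a problem when it is used without anchoring context like the date and time in front of it.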

@guyboertje

@robin13

I guess the final Z in that regex can be replaced with [A-IK-Z] to cover the single-letter military zones.
