Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"url:" tags parsing is surprising #5

Closed
Nadrieril opened this issue Mar 26, 2018 · 6 comments
Closed

"url:" tags parsing is surprising #5

Nadrieril opened this issue Mar 26, 2018 · 6 comments
Labels
Milestone

Comments

@Nadrieril
Copy link
Contributor

Tags whose value contain a colon confuse the parser. Granted, it's ruled out by the todo.txt spec but the url: tag it quite useful to me.
Would it be possible to not follow the standard for this use-case ?

If the task is:

2018-03-26 test url:http://example.org

then parsing and printing the task out displays:

2018-03-26 test//example.org url:http:                                                                                                                                                                          
@Nadrieril Nadrieril changed the title todo-txt doesn't like url: tags url: tags parsing is surprising Mar 26, 2018
@Nadrieril Nadrieril changed the title url: tags parsing is surprising "url:" tags parsing is surprising Mar 26, 2018
@sanpii sanpii added the bug label Mar 26, 2018
@sanpii
Copy link
Owner

sanpii commented Mar 26, 2018

It’s a difficult bug to balance: if the parser accept the / character in tag values, the URL in subject will be parsed as tag. Maybe we can exclude value beginning with a / character. It’s probably not possible to do that with a regex.

sanpii added a commit that referenced this issue Mar 26, 2018
@sanpii sanpii mentioned this issue Mar 26, 2018
@sanpii sanpii added this to the 1.2.0 milestone Mar 26, 2018
@Ekleog
Copy link
Contributor

Ekleog commented Mar 26, 2018

The spec states “Both key and value must consist of non-whitespace characters, which are not colons.”. So I guess the behavior in case of a tag that would contain two colons in the word is implementation-defined.

As such, I think we should think about what we want to do here. In my mind, a tag must always be enclosed by spaces: there is no reason a:b:c should be parsed as a:b being a tag and :c in the subject line.

Which means a:b:c should (at least in my mind) either be parsed as not being a tag at all (thus being all in the subject line), or as being a tag a:b with value c, or as a tag a with value b:c.

The tag a:b with value c does not make sense imo. Which leaves the other two options.

The first one is likely the closest to the todo.txt specification, but I'd argue use cases like putting URLs in tags trump strict specification compliance, and thus : should be allowed in tag values.

Adding a special-case for values beginning with a / character seems pretty fragile to me, as it'll exclude eg. mailto: URLs, etc. Also, it reduces overall consistency -- again, in my opinion only. :)

@Nadrieril
Copy link
Contributor Author

Nadrieril commented Mar 26, 2018

Nah, the problem is that there is already a workaround to prevent plain urls in the subject to be parsed as tags. This workaround simply disallows the / char in tags altogether.
From what I understand, in order to allow for url: tags, #6 changes the workaround to instead detect subjects that start with /. As such, the new workaround is closer to the spec than the previous one.
With the new workaround, a:b:c is parsed as a tag with key a and value b:c. The only tricky case was related to / chars.

@Ekleog
Copy link
Contributor

Ekleog commented Mar 26, 2018

Oh, I didn't even think there may already be a workaround for not detecting URLs as tags. Nevermind, then!

@sanpii
Copy link
Owner

sanpii commented Mar 27, 2018

Adding a special-case for values beginning with a / character seems pretty fragile to me, as it'll exclude eg. mailto: URLs, etc.

It’s impossible to say if mailto: is an URL or a tag. Maybe by using a blacklist for tag key, but it’s another issue (feel free to open another bug if you are interested by this problem).

Here I would like to find a good workaround to let URL (mostly begining with https://) in the subject and parse url:https://example.com as we can expect navely.

I think we can to not follow standard for this case, and propose our reflexion for the v2.

@Ekleog
Copy link
Contributor

Ekleog commented Mar 27, 2018

Good point indeed! I guess both options are good enough until the spec can catch up 😃

sanpii added a commit that referenced this issue Mar 28, 2018
@sanpii sanpii closed this as completed Jun 12, 2018
sanpii added a commit that referenced this issue Jun 12, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants