Unwanted unescaping of http url strings #355
Wouldn't it make more sense to escape them with the character functions, using backslashes instead of percent codes?
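For comparison, RFC 5545 (section 3.3.11) escapes TEXT values with backslashes only: \\, \;, \, and \n (or \N). Below is a minimal sketch of that scheme. The helpers escape_text and unescape_text are hypothetical names, not icalendar's API. It shows that percent sequences such as %3A pass through a backslash round trip untouched.

```python
# Minimal sketch of RFC 5545 TEXT escaping (section 3.3.11): special
# characters are escaped with backslashes, never with percent codes.
# escape_text/unescape_text are hypothetical names, not icalendar's API.


def escape_text(value: str) -> str:
    """Escape a TEXT value; the backslash goes first so it is not doubled later."""
    return (value.replace("\\", "\\\\")
                 .replace(";", "\\;")
                 .replace(",", "\\,")
                 .replace("\n", "\\n"))


def unescape_text(value: str) -> str:
    """Undo escape_text in one left-to-right pass."""
    out = []
    chars = iter(value)
    for ch in chars:
        if ch != "\\":
            out.append(ch)
            continue
        nxt = next(chars, None)
        if nxt is None:        # lone trailing backslash: keep it as-is
            out.append("\\")
        elif nxt in "nN":      # \n or \N encodes a newline
            out.append("\n")
        else:                  # \\, \; and \, stand for the character itself
            out.append(nxt)
    return "".join(out)


url = "https://example.com/?q=%7B%22a%22%3A1%7D"
print(unescape_text(escape_text(url)) == url)  # True: %3A is left alone
```

Decoding in a single pass matters. A chain of str.replace calls can misread an escaped backslash followed by "n" as a newline, whereas the loop consumes each escape pair exactly once.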
I see what the issue is:
This is my tested rewrite of this function. It removes the escaping functions entirely, as they are no longer required. What I am not sure about is whether, inside quotes, a backslash should also escape the quote character itself. Example:
I can implement that, but it was not implemented before and I am not sure the spec even allows it. This document states that it is not implemented very often, so we could just stick to that?

def parts(self):
    """
    Split the content line up into (name, parameters, values) parts.
    Example with parameter:
    DESCRIPTION;ALTREP="cid:part1.0001@example.org":The Fall'98 Wild
    Example without parameters:
    DESCRIPTION:The Fall'98 Wild
    https://icalendar.org/iCalendar-RFC-5545/3-2-property-parameters.html
    """
    try:
        st = self
        name_split = None
        value_split = None
        in_quotes = False
        # Any character can be escaped using a backslash, e.g.: "test\:test"
        quote_character = False
        for i, ch in enumerate(st):
            # We can also quote using quotation marks. This ignores any input until another quote appears.
            if ch == '"':
                in_quotes = not in_quotes
                continue
            # Ignore input, as we are currently inside quotation marks
            if in_quotes:
                continue
            # Skip the backslash-escaped character
            if quote_character:
                quote_character = False
                continue
            # The next character should be ignored
            if ch == '\\':
                quote_character = True
                continue
            # The name ends at the first parameter or value delimiter
            if ch in ':;' and not name_split:
                name_split = i
            # The value starts after the first value delimiter
            if ch == ':' and not value_split:
                value_split = i
        # Get name
        name = st[:name_split]
        if not name:
            raise ValueError('Key name is required')
        validate_token(name)
        # Reject lines without a delimiter, without a ':' or with an empty parameter list
        if not name_split or value_split is None or name_split + 1 == value_split:
            raise ValueError('Invalid content line')
        # Get parameters (text between ; and :)
        params = Parameters.from_ical(st[name_split + 1: value_split],
                                      strict=self.strict)
        # Get the value after the :
        values = st[value_split + 1:]
        return (name, params, values)
    except ValueError as exc:
        raise ValueError(
            "Content line could not be parsed into parts: '%s': %s"
            % (self, exc)
        ) from exc
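To see how the scanning loop above picks the delimiters without any escape/unescape step, here is a dependency-free sketch of the same loop. split_points and split_line are hypothetical helper names. Parameters.from_ical and validate_token are deliberately left out, so the parameter text is returned raw. Because the double-quote check runs before the backslash check, a backslash never escapes a '"', which is exactly the open question raised above.

```python
def split_points(line: str):
    """Return (name_split, value_split) using the same rules as the loop above."""
    name_split = value_split = None
    in_quotes = False
    quote_character = False
    for i, ch in enumerate(line):
        if ch == '"':              # a double quote toggles quoted mode
            in_quotes = not in_quotes
            continue
        if in_quotes:              # ignore everything inside "..."
            continue
        if quote_character:        # skip the character after a backslash
            quote_character = False
            continue
        if ch == "\\":
            quote_character = True
            continue
        if ch in ":;" and name_split is None:
            name_split = i
        if ch == ":" and value_split is None:
            value_split = i
    return name_split, value_split


def split_line(line: str):
    """Split into (name, raw parameter text, value); nothing is decoded."""
    name_split, value_split = split_points(line)
    if name_split is None or value_split is None:
        raise ValueError("no ':' delimiter outside quotes")
    return line[:name_split], line[name_split + 1:value_split], line[value_split + 1:]


print(split_line("DESCRIPTION:see https://example.com/?a=%3A"))
# ('DESCRIPTION', '', 'see https://example.com/?a=%3A')
```

The value is a plain slice of the original line, so percent sequences in URLs reach the caller unchanged, and colons after the first unquoted one (such as the one in "https:") stay part of the value.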
Here is a VEVENT to test with:
BEGIN:VEVENT
DTSTART:20220305T200000Z
DTSTAMP:20220612T093000Z
UID:6co62d1l6cs3eb9lcgp3cb9k6ssm6b9ochim8b9g71hjedb4c8pj6p9pc4@google.com
CREATED:20220223T074954Z
DESCRIPTION:<html-blob>Feier Deine Jugend! <a href="https://www.facebook.co
 m/events/1213722619037860?acontext=%7B%22event_action_history%22%3A[%7B%22s
 urface%22%3A%22page%22%7D]%7D">https://www.facebook.com/events/121372261903
 7860?acontext=%7B%22event_action_history%22%3A[%7B%22surface%22%3A%22page%2
 2%7D]%7D</a></html-blob>
LAST-MODIFIED:20220225T121837Z
LOCATION:Removed
SEQUENCE:1
STATUS:CONFIRMED
SUMMARY:BRAVO HITS Party
TRANSP:OPAQUE
END:VEVENT
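In the .ics file the long DESCRIPTION is folded. RFC 5545 (section 3.1) splits long content lines and starts each continuation line with a single space or tab, and a parser removes those folds before splitting a line into name, parameters and value. Here is a minimal sketch of unfolding. unfold is a hypothetical helper, and accepting bare LF as well as CRLF is an assumption made for leniency.

```python
import re

# RFC 5545, section 3.1: a line break followed by one space or tab is a fold.
_FOLD = re.compile(r"\r?\n[ \t]")


def unfold(ics_text: str) -> list[str]:
    """Join folded lines and return the logical content lines."""
    return _FOLD.sub("", ics_text).splitlines()


sample = ("BEGIN:VEVENT\r\n"
          "DESCRIPTION:<a href=\"https://www.facebook.co\r\n"
          " m/events/1?x=%7B%22s\r\n"
          " urface%22%3A1%7D\">\r\n"
          "END:VEVENT\r\n")
for line in unfold(sample):
    print(line)
```

After unfolding, the DESCRIPTION content line still contains sequences such as %3A. That whole line is what parts() receives, so any percent decoding inside parts() alters the URL.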
And this is my code as a monkey patch, so people arriving here via Google can hotfix their library right now:

# Monkey patch icalendar bug
# https://medium.com/@chipiga86/python-monkey-patching-like-a-boss-87d7ddb8098e
# https://github.com/collective/icalendar/issues/355
from icalendar.parser import validate_token
from icalendar.parser import Parameters


def parts_patched(self):
    """
    Split the content line up into (name, parameters, values) parts.
    Example with parameter:
    DESCRIPTION;ALTREP="cid:part1.0001@example.org":The Fall'98 Wild
    Example without parameters:
    DESCRIPTION:The Fall'98 Wild
    https://icalendar.org/iCalendar-RFC-5545/3-2-property-parameters.html
    """
    try:
        st = self
        name_split = None
        value_split = None
        in_quotes = False
        # Any character can be escaped using a backslash, e.g.: "test\:test"
        quote_character = False
        for i, ch in enumerate(st):
            # We can also quote using quotation marks. This ignores any input until another quote appears.
            if ch == '"':
                in_quotes = not in_quotes
                continue
            # Ignore input, as we are currently inside quotation marks
            if in_quotes:
                continue
            # Skip the backslash-escaped character
            if quote_character:
                quote_character = False
                continue
            # The next character should be ignored
            if ch == '\\':
                quote_character = True
                continue
            # The name ends at the first parameter or value delimiter
            if ch in ':;' and not name_split:
                name_split = i
            # The value starts after the first value delimiter
            if ch == ':' and not value_split:
                value_split = i
        # Get name
        name = st[:name_split]
        if not name:
            raise ValueError('Key name is required')
        validate_token(name)
        # Reject lines without a delimiter, without a ':' or with an empty parameter list
        if not name_split or value_split is None or name_split + 1 == value_split:
            raise ValueError('Invalid content line')
        # Get parameters (text between ; and :)
        params = Parameters.from_ical(st[name_split + 1: value_split],
                                      strict=self.strict)
        # Get the value after the :
        values = st[value_split + 1:]
        return (name, params, values)
    except ValueError as exc:
        raise ValueError(
            "Content line could not be parsed into parts: '%s': %s"
            % (self, exc)
        ) from exc


from icalendar import parser
parser.Contentline.parts = parts_patched
# End of monkey patch
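Why assigning to parser.Contentline.parts is enough: replacing a method on the class reaches every reference to that class object, including names other modules imported earlier and instances created before the patch. The sketch below demonstrates this with a hypothetical stand-in module and class, so it runs without icalendar installed. The stand-in only mimics the shape of the real code.

```python
import types

# Hypothetical stand-in for icalendar.parser; the real module is not imported here.
parser = types.ModuleType("parser")


class Contentline(str):
    def parts(self):
        return "original"


parser.Contentline = Contentline

# Simulates another module that ran `from icalendar.parser import Contentline`
# before the patch: it holds a reference to the same class object.
ImportedContentline = parser.Contentline
early_line = ImportedContentline("DESCRIPTION:x")


def parts_patched(self):
    return "patched"


# Replacing the attribute on the class changes the method for every reference
# to that class, including instances created before the patch.
parser.Contentline.parts = parts_patched

print(early_line.parts())                  # patched
print(ImportedContentline("X:y").parts())  # patched
```

By contrast, rebinding parser.Contentline itself to a new class would not reach names that were imported before the rebinding, which is why patching the method attribute is the safer hotfix.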
@NicoHood Would it be OK for you to create a pull request for this? It could be just the code. I am not a contributor to this project, but I am willing to look at it.
Sure. #356
Hi!
I have a URL in my iCal event description that is already percent-encoded (URL-encoded). Here is an example:
This is how it looks in the ical file:
But some characters now get unescaped for some unknown reason:
It turns out that this code causes the issue:
https://github.com/collective/icalendar/blob/master/src/icalendar/parser.py#L273
It converts %3A to ":", which in my case is NOT wanted, because the URL is then broken. Why was this unescaping introduced, and how can we fix it?
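The failure can be reproduced in isolation, assuming the parser hides backslash-escaped delimiters behind percent codes before splitting and later decodes every percent code it finds. The helper names and replacement pairs below are an approximation for illustration; the linked parser.py line is the authority on the real code. The decode step cannot tell a code it produced from one that was already in the data.

```python
# Hypothetical re-creation of the percent-code round trip described above;
# names and pairs approximate the linked parser.py, they are not copied from it.
PAIRS = [("\\,", "%2C"), ("\\:", "%3A"), ("\\;", "%3B"), ("\\\\", "%5C")]


def escape_string(val: str) -> str:
    # Hide backslash-escaped delimiters behind percent codes before splitting.
    for escaped, code in PAIRS:
        val = val.replace(escaped, code)
    return val


def unescape_string(val: str) -> str:
    # Turn every percent code back into the bare character, whatever its origin.
    for escaped, code in PAIRS:
        val = val.replace(code, escaped[1])
    return val


url = "https://example.com/?q=%7B%22a%22%3A1%7D"
print(unescape_string(escape_string(url)))
# https://example.com/?q=%7B%22a%22:1%7D  (the original %3A became ':')
```

The rewrite proposed in this thread avoids the problem by never rewriting the line: it skips backslash-escaped and quoted characters while scanning and slices the original string.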