-
Notifications
You must be signed in to change notification settings - Fork 870
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize raw HTML post-processor #1510
base: master
Are you sure you want to change the base?
Changes from all commits
4334dee
d1eadc3
1bc8c54
d4813dc
6a4bfcb
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -29,7 +29,7 @@ | |
import warnings | ||
from functools import wraps, lru_cache | ||
from itertools import count | ||
from typing import TYPE_CHECKING, Generic, Iterator, NamedTuple, TypeVar, TypedDict, overload | ||
from typing import TYPE_CHECKING, Callable, Generic, Iterator, NamedTuple, TypeVar, TypedDict, overload | ||
|
||
if TYPE_CHECKING: # pragma: no cover | ||
from markdown import Markdown | ||
|
@@ -44,7 +44,269 @@ | |
""" | ||
|
||
|
||
BLOCK_LEVEL_ELEMENTS: list[str] = [ | ||
# TODO: Raise errors from list methods in the future. | ||
# Later, remove this class entirely and use a regular set. | ||
class _BlockLevelElements: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Elsewhere the type is specified as There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. My feeling was that it was OK to lie about the type, so that users actually get type warnings when they use it as a list instead of a set. That's one more (very effective) way of communicating the change. And since we implement all set functionality, the lie is not too big 🙂 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We could do that, but some users run type validation on their code and this could result in bug reports. Either we need to fully document our reasoning for the inconsistency in the code comments or we need to make a change to the code. |
||
def __init__(self, elements: list[str], /) -> None: | ||
self._list = elements.copy() | ||
self._set = set(self._list) | ||
|
||
def __add__(self, other: list[str], /) -> list[str]: | ||
warnings.warn( | ||
"Using block level elements as a list is deprecated, use it as a set instead.", | ||
DeprecationWarning, | ||
) | ||
Comment on lines
+55
to
+58
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I emit the same warning every time, so that users only get it once. The message goes straight to the point: don't use it as a list. |
||
# Using `+` means user expects a list back. | ||
return self._list + other | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Methods with no side-effects return a There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Not necessarily. The same object is likely to be accessed by multiple extensions. It is reasonable to assume that some newly updated extensions will assume a set and some older (not yet updated) extensions will assume a list. Whatever solution we chose needs to continue to work with both throughout the life of the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah I thought about that, though since these methods return a new object (that's what I meant with "no side-effect", which wasn't super clear), I figured there are very few chances multiple extensions would actually share references to such newly created lists/sets (each new list/set would probably be isolated to the extension that created it). Anyway, I can definitely change that to return a There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sorry, I missed that this comment was in a specific method which returns a modified object. Yeah, in this case, returning the same object type makes sense. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. No worries, I wasn't very clear about it. @facelessuser expressed the same concern, so we might want to return a new instance of |
||
|
||
def __and__(self, other: set[str], /) -> set[str]: | ||
# Using `&` means user expects a set back. | ||
return self._set & other | ||
|
||
def __contains__(self, item: str, /) -> bool: | ||
return item in self._set | ||
|
||
def __delitem__(self, key: int, /) -> None: | ||
warnings.warn( | ||
"Using block level elements as a list is deprecated, use it as a set instead.", | ||
DeprecationWarning, | ||
) | ||
element = self._list[key] | ||
del self._list[key] | ||
self._set.remove(element) | ||
|
||
def __getitem__(self, index: int, /) -> str: | ||
warnings.warn( | ||
"Using block level elements as a list is deprecated, use it as a set instead.", | ||
DeprecationWarning, | ||
) | ||
return self._list[index] | ||
|
||
def __iadd__(self, other: list[str], /) -> set[str]: | ||
warnings.warn( | ||
"Using block level elements as a list is deprecated, use it as a set instead.", | ||
DeprecationWarning, | ||
) | ||
# In-place addition should update both list and set. | ||
self._list += other | ||
self._set.update(set(other)) | ||
return self # type: ignore[return-value] | ||
|
||
def __iand__(self, other: set[str], /) -> set[str]: | ||
# In-place intersection should update both list and set. | ||
self._list = [element for element in self._list if element in other] | ||
self._set &= other | ||
return self # type: ignore[return-value] | ||
|
||
def __ior__(self, other: set[str], /) -> set[str]: | ||
# In-place union should update both list and set. | ||
for element in other: | ||
if element not in self._set: | ||
self._list.append(element) | ||
self._set |= other | ||
return self # type: ignore[return-value] | ||
|
||
def __iter__(self) -> Iterator[str]: | ||
return iter(self._list) | ||
|
||
def __len__(self) -> int: | ||
# Length of the list, for backwards compatibility. | ||
# If used as a set, both lengths will be the same. | ||
return len(self._list) | ||
|
||
def __or__(self, value: set[str], /) -> set[str]: | ||
# Using `|` means user expects a set back. | ||
return self._set | value | ||
|
||
def __rand__(self, value: set[str], /) -> set[str]: | ||
# Using `&` means user expects a set back. | ||
return value & self._set | ||
|
||
def __ror__(self, value: set[str], /) -> set[str]: | ||
# Using `|` means user expects a set back. | ||
return value | self._set | ||
|
||
def __rsub__(self, value: set[str], /) -> set[str]: | ||
# Using `-` means user expects a set back. | ||
return value - self._set | ||
|
||
def __rxor__(self, value: set[str], /) -> set[str]: | ||
# Using `^` means user expects a set back. | ||
return value ^ self._set | ||
|
||
def __sub__(self, value: set[str], /) -> set[str]: | ||
# Using `-` means user expects a set back. | ||
return self._set - value | ||
|
||
def __xor__(self, value: set[str], /) -> set[str]: | ||
# Using `^` means user expects a set back. | ||
return self._set ^ value | ||
|
||
def __reversed__(self) -> Iterator[str]: | ||
warnings.warn( | ||
"Using block level elements as a list is deprecated, use it as a set instead.", | ||
DeprecationWarning, | ||
) | ||
return reversed(self._list) | ||
|
||
def __setitem__(self, key: int, value: str, /) -> None: | ||
warnings.warn( | ||
"Using block level elements as a list is deprecated, use it as a set instead.", | ||
DeprecationWarning, | ||
) | ||
# In-place item-setting should update both list and set. | ||
old = self._list[key] | ||
self._list[key] = value | ||
self._set.discard(old) | ||
self._set.add(value) | ||
|
||
def __str__(self) -> str: | ||
return str(self._set) | ||
|
||
def add(self, element: str, /) -> None: | ||
# In-place addition should update both list and set. | ||
self._set.add(element) | ||
self._list.append(element) | ||
|
||
def append(self, element: str, /) -> None: | ||
warnings.warn( | ||
"Using block level elements as a list is deprecated, use it as a set instead.", | ||
DeprecationWarning, | ||
) | ||
# In-place addition should update both list and set. | ||
self._list.append(element) | ||
self._set.add(element) | ||
|
||
def clear(self) -> None: | ||
self._list.clear() | ||
self._set.clear() | ||
|
||
def copy(self) -> _BlockLevelElements: | ||
# We're not sure yet whether the user wants to use it as a set or list. | ||
return _BlockLevelElements(self._list) | ||
|
||
def count(self, value: str, /) -> int: | ||
warnings.warn( | ||
"Using block level elements as a list is deprecated, use it as a set instead.", | ||
DeprecationWarning, | ||
) | ||
# Count in list, for backwards compatibility. | ||
# If used as a set, both counts will be the same (1). | ||
return self._list.count(value) | ||
|
||
def difference(self, *others: set[str]) -> set[str]: | ||
# User expects a set back. | ||
return self._set.difference(*others) | ||
|
||
def difference_update(self, *others: set[str]) -> None: | ||
# In-place difference should update both list and set. | ||
self._set.difference_update(*others) | ||
self._list.clear() | ||
self._list.extend(sorted(self._set)) | ||
|
||
def discard(self, element: str, /) -> None: | ||
# In-place discard should update both list and set. | ||
self._set.discard(element) | ||
while True: | ||
try: | ||
self._list.remove(element) | ||
except ValueError: | ||
break | ||
|
||
def extend(self, elements: list[str], /) -> None: | ||
warnings.warn( | ||
"Using block level elements as a list is deprecated, use it as a set instead.", | ||
DeprecationWarning, | ||
) | ||
# In-place extension should update both list and set. | ||
for element in elements: | ||
if element not in self._list: | ||
self._list.append(element) | ||
self._set.update(elements) | ||
|
||
def index(self, value, start: int = 0, stop: int = sys.maxsize, /): | ||
warnings.warn( | ||
"Using block level elements as a list is deprecated, use it as a set instead.", | ||
DeprecationWarning, | ||
) | ||
return self._list.index(value, start, stop) | ||
|
||
def insert(self, index: int, element: str, /) -> None: | ||
warnings.warn( | ||
"Using block level elements as a list is deprecated, use it as a set instead.", | ||
DeprecationWarning, | ||
) | ||
# In-place insertion should update both list and set. | ||
self._list.insert(index, element) | ||
self._set.add(element) | ||
|
||
def intersection(self, *others: set[str]) -> set[str]: | ||
# User expects a set back. | ||
return self._set.intersection(*others) | ||
|
||
def intersection_update(self, *others: set[str]) -> None: | ||
# In-place intersection should update both list and set. | ||
self._set.intersection_update(*others) | ||
self._list.clear() | ||
self._list.extend(sorted(self._set)) | ||
Comment on lines
+250
to
+251
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I took a shortcut, but we should probably maintain the list order, while preventing duplicates. |
||
|
||
def isdisjoint(self, other: set[str], /) -> bool: | ||
return self._set.isdisjoint(other) | ||
|
||
def issubset(self, other: set[str], /) -> bool: | ||
return self._set.issubset(other) | ||
|
||
def issuperset(self, other: set[str], /) -> bool: | ||
return self._set.issuperset(other) | ||
|
||
def pop(self, index: int = -1, /) -> str: | ||
# In-place pop should update both list and set. | ||
element = self._list.pop(index) | ||
self._set.remove(element) | ||
return element | ||
|
||
def remove(self, element: str, /) -> None: | ||
# In-place removal should update both list and set. | ||
self._list.remove(element) | ||
self._set.remove(element) | ||
|
||
def reverse(self) -> None: | ||
warnings.warn( | ||
"Using block level elements as a list is deprecated, use it as a set instead.", | ||
DeprecationWarning, | ||
) | ||
self._list.reverse() | ||
|
||
def sort(self, /, *, key: Callable | None = None, reverse: bool = False) -> None: | ||
warnings.warn( | ||
"Using block level elements as a list is deprecated, use it as a set instead.", | ||
DeprecationWarning, | ||
) | ||
self._list.sort(key=key, reverse=reverse) | ||
|
||
def symmetric_difference(self, other: set[str], /) -> set[str]: | ||
# User expects a set back. | ||
return self._set.symmetric_difference(other) | ||
|
||
def symmetric_difference_update(self, other: set[str], /) -> None: | ||
# In-place symmetric difference should update both list and set. | ||
self._set.symmetric_difference_update(other) | ||
self._list.clear() | ||
self._list.extend(sorted(self._set)) | ||
Comment on lines
+294
to
+295
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Same shortcut, though we should maintain list order by first removing then adding elements, preventing duplicates, probably. |
||
|
||
def union(self, *others: set[str]) -> set[str]: | ||
# User expects a set back. | ||
return self._set.union(*others) | ||
|
||
def update(self, *others: set[str]) -> None: | ||
# In-place union should update both list and set. | ||
self._set.update(*others) | ||
self._list.clear() | ||
self._list.extend(sorted(self._set)) | ||
Comment on lines
+304
to
+305
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Shortcut again, we should maintain order and only append elements, preventing duplicates. |
||
|
||
|
||
# Type it as `set[str]` to express our intent for it to be used as such. | ||
BLOCK_LEVEL_ELEMENTS: set[str] = _BlockLevelElements([ | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I can't guarantee I didn't make subtle mistakes, so it should probably need proper testing. If this all seems too complicated, happy to try something simpler (it was implemented quickly, no problem with discarding all this 😄). There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It should definitely get tested. Eventually it'll go away as well. There's probably ways in which people can still mess things up, but I think this tries to provide a reasonable attempt to keep stuff from breaking. I'm open to other ideas if someone has them as well. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Alright, I'll start writing tests 🙂 Just wanted initial feedback before tackling them 😊 |
||
# Elements which are invalid to wrap in a `<p>` tag. | ||
# See https://w3c.github.io/html/grouping-content.html#the-p-element | ||
'address', 'article', 'aside', 'blockquote', 'details', 'div', 'dl', | ||
|
@@ -56,9 +318,9 @@ | |
'math', 'map', 'noscript', 'output', 'object', 'option', 'progress', 'script', | ||
'style', 'summary', 'tbody', 'td', 'textarea', 'tfoot', 'th', 'thead', 'tr', 'video', | ||
'center' | ||
] | ||
]) # type: ignore[assignment] | ||
""" | ||
List of HTML tags which get treated as block-level elements. Same as the `block_level_elements` | ||
Set of HTML tags which get treated as block-level elements. Same as the `block_level_elements` | ||
attribute of the [`Markdown`][markdown.Markdown] class. Generally one should use the | ||
attribute on the class. This remains for compatibility with older extensions. | ||
""" | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just noticed this, but this was already being converted to a set in this extension. Maybe we should keep that behavior rather than leave it as our custom type.