Improved docstrings for Formex4Parser
AlessioNar committed Dec 23, 2024
1 parent b7e7db5 commit e44247f
Showing 4 changed files with 57 additions and 27 deletions.
6 changes: 3 additions & 3 deletions docs/source/index.rst
@@ -14,17 +14,17 @@ Contents
 --------

 .. toctree::
-   :maxdepth: 2
+   :maxdepth: 3

    getting_started

 .. toctree::
-   :maxdepth: 2
+   :maxdepth: 3

    download

 .. toctree::
-   :maxdepth: 2
+   :maxdepth: 3

    parsers

8 changes: 4 additions & 4 deletions docs/source/parsers.rst
@@ -3,22 +3,22 @@ Parsers

 This package contains modules for parsing various types of legal documents. Below are the details for each module.

-.. automodule:: parsers.parser
+.. automodule:: tulit.parsers.parser
    :members:
    :undoc-members:
    :show-inheritance:

-.. automodule:: parsers.formex
+.. automodule:: tulit.parsers.formex
    :members:
    :undoc-members:
    :show-inheritance:

-.. automodule:: parsers.akomantoso
+.. automodule:: tulit.parsers.akomantoso
    :members:
    :undoc-members:
    :show-inheritance:

-.. automodule:: parsers.html
+.. automodule:: tulit.parsers.html
    :members:
    :undoc-members:
    :show-inheritance:
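
The `automodule` targets now use the fully qualified `tulit.parsers.*` paths, so Sphinx autodoc has to be able to import the top-level `tulit` package. This commit does not show `conf.py`; the excerpt below is only a sketch of what that typically requires, assuming `conf.py` lives in `docs/source/` and the `tulit` directory sits at the repository root. The extension list is likewise an assumption (`sphinx.ext.napoleon` is the usual choice for NumPy-style docstrings such as the ones added here).

```python
# Hypothetical excerpt of docs/source/conf.py (not part of this commit).
import os
import sys

# Assumption: the repository root is two levels above docs/source/.
# Sphinx evaluates conf.py with the configuration directory as the
# working directory, so a relative path is the common idiom here.
REPO_ROOT = os.path.abspath(os.path.join("..", ".."))
sys.path.insert(0, REPO_ROOT)

# Assumption: autodoc resolves the automodule directives and napoleon
# renders the NumPy-style "Attributes" / "Parameters" sections.
extensions = [
    "sphinx.ext.autodoc",
    "sphinx.ext.napoleon",
]
```

If the package is installed into the documentation build environment instead, the `sys.path` line is unnecessary.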
39 changes: 33 additions & 6 deletions tulit/parsers/formex.py
@@ -4,10 +4,6 @@
 from lxml import etree
 from .parser import Parser

-FMX_NAMESPACES = {
-    'fmx': 'http://formex.publications.europa.eu/schema/formex-05.56-20160701.xd'
-}
-
 class Formex4Parser(Parser):
     """
     A parser for processing and extracting content from Formex XML files.
@@ -20,15 +16,46 @@ class Formex4Parser(Parser):
     ----------
     namespaces : dict
         Dictionary mapping namespace prefixes to their URIs.
+    schema : lxml.etree.XMLSchema or None
+        The XML schema used for validation.
+    valid : bool or None
+        Indicates whether the XML file is valid against the schema.
+    root : lxml.etree.Element or None
+        The root element of the parsed XML document.
+    metadata : dict
+        Extracted metadata from the XML document.
+    preface : str or None
+        Extracted preface text from the XML document.
+    preamble : lxml.etree.Element or None
+        The preamble section of the XML document.
+    formula : None
+        Placeholder for future use.
+    citations : list or None
+        List of extracted citations from the preamble.
+    recitals : list or None
+        List of extracted recitals from the preamble.
+    body : lxml.etree.Element or None
+        The body section of the XML document.
+    chapters : list
+        List of extracted chapters from the body.
+    articles : list
+        List of extracted articles from the body.
+    articles_text : list
+        List of extracted article texts.
+    conclusions : None
+        Placeholder for future use.
     """

     def __init__(self):
         """
         Initializes the parser.
         """
         # Define the namespace mapping
-        self.namespaces = {}
-        self.namespaces = FMX_NAMESPACES
+
+        self.namespaces = {
+            'fmx': 'http://formex.publications.europa.eu/schema/formex-05.56-20160701.xd'
+        }
+
         self.schema = None
         self.valid = None

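
The `namespaces` attribute documented above maps the `fmx` prefix to the Formex schema URI, so that prefixed paths can be resolved in `findall` calls. The sketch below illustrates that mechanism only. It is not the code from `formex.py`: it uses the standard library's `xml.etree.ElementTree` in place of `lxml` so that it runs without dependencies, and the class name `NamespacedParser`, the method `find_titles`, and the sample document are all invented for illustration.

```python
import xml.etree.ElementTree as ET


class NamespacedParser:
    """Hypothetical stand-in for Formex4Parser; only namespace handling is shown."""

    def __init__(self):
        # Same prefix-to-URI mapping that the commit moves into __init__.
        self.namespaces = {
            'fmx': 'http://formex.publications.europa.eu/schema/formex-05.56-20160701.xd'
        }

    def find_titles(self, root):
        # The 'fmx:' prefix is resolved through self.namespaces.
        matches = root.findall('.//fmx:TITLE', namespaces=self.namespaces)
        return [element.text for element in matches]


# Invented sample document that declares the same URI as its default namespace.
SAMPLE = (
    '<ACT xmlns="http://formex.publications.europa.eu/schema/formex-05.56-20160701.xd">'
    '<TITLE>Example title</TITLE>'
    '</ACT>'
)

parser = NamespacedParser()
titles = parser.find_titles(ET.fromstring(SAMPLE))
print(titles)  # ['Example title']
```

Keeping the mapping on the instance rather than in a module-level constant lets each parser subclass carry its own prefixes, which matches how `remove_node` in `parser.py` reads `self.namespaces`.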
31 changes: 17 additions & 14 deletions tulit/parsers/parser.py
@@ -7,7 +7,10 @@

 class Parser(ABC):
     @abstractmethod
-    def parse(self, data):
+    def parse(self):
+        """
+        Abstract method to parse the data. This method must be implemented by the subclass.
+        """
         pass

     def get_root(self, file: str):
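
The hunk above drops the `data` parameter from the abstract `parse` method. As a rough illustration of what the `@abstractmethod` contract enforces, the sketch below redefines a minimal `Parser` and adds a hypothetical subclass, `LineParser`, which is not in the repository. It assumes the input is stored on the instance beforehand, which the commit does not confirm.

```python
from abc import ABC, abstractmethod


class Parser(ABC):
    @abstractmethod
    def parse(self):
        """Must be implemented by the subclass."""


class LineParser(Parser):
    """Hypothetical subclass, used only to illustrate the contract."""

    def __init__(self, text):
        self.text = text
        self.lines = None

    def parse(self):
        # Input is read from instance state rather than passed as an argument.
        self.lines = self.text.splitlines()
        return self


# The abstract base class itself cannot be instantiated.
try:
    Parser()
    instantiated = True
except TypeError:
    instantiated = False

parsed = LineParser("first\nsecond").parse()
```

Note that `abc` only checks that `parse` is overridden, not that the override has a matching signature, so a subclass that still accepts extra arguments would continue to instantiate.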
@@ -31,19 +34,19 @@ def get_root(self, file: str):

     def remove_node(self, tree, node):
         """
-        Removes specified nodes from the XML tree while preserving their tail text.
-
-        Parameters
-        ----------
-        tree : lxml.etree._Element
-            The XML tree or subtree to process.
-        node : str
-            XPath expression identifying the nodes to remove.
-
-        Returns
-        -------
-        lxml.etree._Element
-            The modified XML tree with specified nodes removed.
+        Removes specified nodes from the XML tree while preserving their tail text.
+
+        Parameters
+        ----------
+        tree : lxml.etree._Element
+            The XML tree or subtree to process.
+        node : str
+            XPath expression identifying the nodes to remove.
+
+        Returns
+        -------
+        lxml.etree._Element
+            The modified XML tree with specified nodes removed.
         """
         if tree.findall(node, namespaces=self.namespaces) is not None:
             for item in tree.findall(node, namespaces=self.namespaces):
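
The body of `remove_node` is collapsed in this view. The docstring describes removing matched nodes while preserving their tail text, and the sketch below shows one way that technique can work. It is not the repository's implementation: it uses the standard library's `xml.etree.ElementTree` instead of `lxml`, the function name `remove_nodes_keep_tail` is invented, and the `NOTE` element in the usage example is an arbitrary choice.

```python
import xml.etree.ElementTree as ET


def remove_nodes_keep_tail(tree, path, namespaces=None):
    """Remove elements matching `path`, re-attaching each one's tail text."""
    # ElementTree elements have no parent pointer, so build a child -> parent map.
    parent_of = {child: parent for parent in tree.iter() for child in parent}

    # findall returns a list (possibly empty), so no None check is needed.
    for item in tree.findall(path, namespaces):
        parent = parent_of.get(item)
        if parent is None:
            continue
        tail = item.tail or ''
        siblings = list(parent)
        index = siblings.index(item)
        if index > 0:
            # Append the tail to the previous sibling's tail.
            previous = siblings[index - 1]
            previous.tail = (previous.tail or '') + tail
        else:
            # No previous sibling: the tail joins the parent's leading text.
            parent.text = (parent.text or '') + tail
        parent.remove(item)
    return tree


root = ET.fromstring('<P>Text<NOTE>footnote</NOTE> continues<NOTE>x</NOTE> here.</P>')
remove_nodes_keep_tail(root, './/NOTE')
result = ET.tostring(root, encoding='unicode')
print(result)  # <P>Text continues here.</P>
```

With `lxml`, the parent map is unnecessary because elements expose `getparent()` and `getprevious()`.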
