Is your feature request related to a problem? Please describe.
Currently, Nevron lacks the ability to parse content from web links. Users cannot provide a link to fetch and process its content, limiting the framework's ability to gather information directly from online resources such as articles, blogs, and news sites.
Describe the solution you'd like
Add functionality to fetch and parse content from a given web link. This feature should enable the agent to extract meaningful information from web pages for use in workflows like memory updates, action planning, or contextual analysis.
Proposed Implementation Steps:
Add a Link Parsing Utility:
- Use the `requests` library to fetch the content of the provided link.
- Use `BeautifulSoup` and `Goose3` to parse and extract meaningful content, such as:
  - Article title
  - Main text/body
  - Meta description
  - Relevant keywords
- Handle various web content structures, including standard HTML and minimal HTML layouts.
Integration with Workflows:
- Extend workflows (e.g., `analyze_signal`, `analyze_news_workflow`) to accept and process web links.
- Store parsed content in the memory module for future reference.
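The integration step could look like the following sketch. `parse_link_content` is stubbed here, and `InMemoryStore` is a hypothetical stand-in for Nevron's memory module; its API is assumed, not taken from the codebase:

```python
# Hypothetical sketch of the workflow integration; `parse_link_content`
# is stubbed, and InMemoryStore stands in for Nevron's memory module.

def parse_link_content(url: str) -> dict:
    # Stub for the requests/BeautifulSoup/Goose3 utility.
    return {"title": "Example", "meta_description": "", "content": "Body text"}


class InMemoryStore:
    """Minimal stand-in for the memory module."""

    def __init__(self) -> None:
        self.records: list[tuple[str, dict]] = []

    def store(self, key: str, value: dict) -> None:
        self.records.append((key, value))


def analyze_news_workflow(link: str, memory: InMemoryStore) -> dict:
    """Fetch and parse a news link, persisting the result for later analysis."""
    parsed = parse_link_content(link)
    memory.store(link, parsed)  # keep parsed content for future reference
    return parsed


memory = InMemoryStore()
result = analyze_news_workflow("https://example.com/article", memory)
```

The workflow stays agnostic about how the link is parsed; swapping the stub for the real utility changes nothing in the workflow body.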
Error Handling:
- Gracefully handle exceptions such as:
  - Invalid or unreachable links.
  - Unsupported or malformed HTML.
  - Parsing errors due to complex or unexpected layouts.
- Log detailed error messages for debugging.
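One way to keep a single bad link from crashing a workflow is a wrapper that converts each failure class into a logged error. This is a sketch: `safe_parse` and `fetch` are illustrative names, and the exception mapping is an assumption about which errors the real utility would raise:

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger("nevron.link_parser")  # logger name is illustrative


def safe_parse(url: str, fetch: Callable[[str], dict]) -> Optional[dict]:
    """Run fetch(url), turning failures into logged errors instead of crashes.

    `fetch` is any callable returning a parsed-content dict; in practice it
    would be the requests/Goose3 utility.
    """
    try:
        return fetch(url)
    except ValueError as e:        # invalid or malformed URL
        logger.error("Invalid link %r: %s", url, e)
    except ConnectionError as e:   # unreachable host
        logger.error("Could not reach %r: %s", url, e)
    except Exception as e:         # malformed HTML, parser failures, ...
        logger.error("Failed to parse %r: %s", url, e)
    return None


def failing_fetch(url: str) -> dict:
    raise ValueError("no scheme in URL")


result = safe_parse("not-a-url", failing_fetch)  # logs the error, returns None
```

Returning `None` on failure lets the calling workflow decide whether to skip the link or retry, while the log keeps the detailed message for debugging.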
Configuration Options:
- Allow users to configure link parsing settings in `settings.py`, such as:
  - User-Agent for HTTP requests.
  - Maximum allowed content size.
  - Timeout for HTTP requests.
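A possible `settings.py` fragment for these options; all setting names are illustrative, not existing Nevron settings:

```python
# settings.py -- suggested link-parsing options (names are illustrative)

LINK_PARSER_USER_AGENT = "Nevron/1.0"            # User-Agent header for HTTP requests
LINK_PARSER_MAX_CONTENT_SIZE = 2 * 1024 * 1024   # reject responses larger than 2 MiB
LINK_PARSER_TIMEOUT = 10                         # seconds before an HTTP request is aborted
```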
Unit Tests:
- Write unit tests that validate link parsing against mocked responses, covering:
  - Valid web pages with standard HTML structures.
  - Edge cases, such as pages with minimal or malformed HTML.
  - Links that are unreachable or return HTTP errors.
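The test cases above could be sketched with `unittest.mock`. To stay self-contained, the function under test is a stub built on `urllib` rather than the real `requests`-based utility, so the patch targets and names are assumptions:

```python
# Sketch of unit tests with unittest.mock; the stub below stands in for
# the real utility (urllib is used only to keep this standard-library).
import unittest
from unittest import mock


def parse_link_content(url: str) -> dict:
    import urllib.request
    with urllib.request.urlopen(url) as resp:
        return {"content": resp.read().decode()}


class ParseLinkContentTests(unittest.TestCase):
    def test_valid_page(self):
        fake = mock.MagicMock()
        fake.read.return_value = b"<html><p>hello</p></html>"
        fake.__enter__.return_value = fake  # context manager yields the fake
        with mock.patch("urllib.request.urlopen", return_value=fake):
            parsed = parse_link_content("https://example.com")
        self.assertIn("hello", parsed["content"])

    def test_unreachable_link(self):
        with mock.patch("urllib.request.urlopen", side_effect=ConnectionError):
            with self.assertRaises(ConnectionError):
                parse_link_content("https://unreachable.example")


suite = unittest.TestLoader().loadTestsFromTestCase(ParseLinkContentTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

With the real utility, the same structure applies: patch `requests.get` and assert on the dict returned (or the exception raised) by `parse_link_content`.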
Describe alternatives you've considered
- Use dedicated scraping tools/APIs: Services like Scrapy or Puppeteer could be used for more advanced scraping but may introduce overhead and complexity.
- Rely only on BeautifulSoup: A simpler approach but limited in extracting structured content like article metadata and main body text.
Additional Context
Suggested utility function for link parsing:
```python
import requests
from bs4 import BeautifulSoup
from goose3 import Goose


def parse_link_content(url: str) -> dict:
    """Parse the content of a web link.

    Args:
        url (str): The URL to fetch and parse.

    Returns:
        dict: Parsed content including title, body text, and meta description.
    """
    try:
        # Timeout prevents the agent from hanging on unresponsive hosts.
        response = requests.get(url, headers={"User-Agent": "Nevron/1.0"}, timeout=10)
        response.raise_for_status()
        soup = BeautifulSoup(response.content, "html.parser")
        goose = Goose()
        article = goose.extract(raw_html=str(soup))
        return {
            "title": article.title,
            "meta_description": article.meta_description,
            "content": article.cleaned_text,
        }
    except Exception as e:
        raise RuntimeError(f"Failed to parse link content: {e}") from e
```
Example use case:
A user provides a link to a news article. The framework fetches and parses the content, storing the extracted text in memory for further analysis.