chatflock.web_research.page_retrievers.selenium_retriever

Module Contents

class chatflock.web_research.page_retrievers.selenium_retriever.SeleniumPageRetriever(headless=False, main_page_timeout=10, main_page_min_wait=2, driver_implicit_wait=1, driver_page_load_timeout=None, include_iframe_html=False, iframe_timeout=10, user_agent=None)

Bases: chatflock.web_research.page_retrievers.base.PageRetriever

Page retriever backed by a Selenium Chrome WebDriver. retrieve_html(url) returns the HTML of the page at url as a string.

Parameters:
  • headless (bool), default False
  • main_page_timeout (int), default 10
  • main_page_min_wait (int), default 2
  • driver_implicit_wait (int), default 1
  • driver_page_load_timeout (Optional[int]), default None
  • include_iframe_html (bool), default False
  • iframe_timeout (int), default 10
  • user_agent (Optional[str]), default None

main_page_min_wait
main_page_timeout
driver_implicit_wait
driver_page_load_timeout
include_iframe_html
iframe_timeout
user_agent
headless
service: selenium.webdriver.chrome.service.Service | None = None
driver: selenium.webdriver.chrome.webdriver.WebDriver | None = None
create_driver_and_service()
Return type:
  Tuple[selenium.webdriver.chrome.webdriver.WebDriver, selenium.webdriver.chrome.service.Service]
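
The listing does not show how the driver and service are built. As a rough sketch only, assuming plain Selenium 4 with Chrome, a (driver, service) pair that honours the constructor options could be produced as follows. The helper names chrome_arguments and build_driver_and_service are hypothetical and not part of chatflock, and the library's own implementation may differ.

```python
from typing import Any, List, Optional, Tuple


def chrome_arguments(headless: bool = False, user_agent: Optional[str] = None) -> List[str]:
    """Translate the retriever options into Chrome command-line switches."""
    args: List[str] = []
    if headless:
        args.append("--headless=new")  # headless switch used by recent Chrome versions
    if user_agent is not None:
        args.append(f"--user-agent={user_agent}")
    return args


def build_driver_and_service(
    headless: bool = False,
    driver_implicit_wait: int = 1,
    driver_page_load_timeout: Optional[int] = None,
    user_agent: Optional[str] = None,
) -> Tuple[Any, Any]:
    """Hypothetical stand-in for create_driver_and_service(); needs Selenium and Chrome."""
    # Imported lazily so chrome_arguments() stays usable without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.chrome.service import Service

    options = Options()
    for arg in chrome_arguments(headless, user_agent):
        options.add_argument(arg)

    service = Service()  # assumes chromedriver can be located automatically
    driver = webdriver.Chrome(service=service, options=options)
    driver.implicitly_wait(driver_implicit_wait)  # Selenium takes seconds here
    if driver_page_load_timeout is not None:
        driver.set_page_load_timeout(driver_page_load_timeout)  # also in seconds
    return driver, service
```

Keeping the switch construction in a pure function makes that part checkable without launching a browser; only build_driver_and_service needs a working Chrome installation.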

extract_html_from_driver(driver)
Parameters:
  • driver (selenium.webdriver.chrome.webdriver.WebDriver)

Return type:
  str
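
How extract_html_from_driver treats iframes is not documented in this listing. The sketch below shows one way to collect the main document plus each iframe's HTML with standard Selenium calls, assuming the driver has already loaded the page. collect_html is a hypothetical helper, not the library's method, and joining the documents with a newline is an arbitrary choice.

```python
from typing import Any, List


def collect_html(driver: Any, include_iframe_html: bool = False) -> str:
    """Return the page source, optionally followed by each iframe's source."""
    parts: List[str] = [driver.page_source]
    if include_iframe_html:
        # "tag name" is the string value of Selenium's By.TAG_NAME strategy.
        for frame in driver.find_elements("tag name", "iframe"):
            driver.switch_to.frame(frame)
            try:
                parts.append(driver.page_source)
            finally:
                # Always return to the top-level document before the next frame.
                driver.switch_to.default_content()
    return "\n".join(parts)
```

With a live driver this would be called after driver.get(url), for example collect_html(driver, include_iframe_html=True). Nested iframes are not followed in this sketch.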

retrieve_html(url, **kwargs)
Parameters:
  • url (str)
  • kwargs (Any)

Return type:
  str

close()
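
A minimal usage sketch follows, assuming close() releases the driver and service (the listing implies this but does not state it). The constructor keywords come from the signature above; the values passed, and the helper names fetch_html and make_selenium_retriever, are illustrative assumptions and not part of the library.

```python
from contextlib import closing
from typing import Any


def fetch_html(retriever: Any, url: str) -> str:
    """Fetch one page and always call the retriever's close() afterwards."""
    # contextlib.closing calls retriever.close() on exit, even if an error is raised.
    with closing(retriever):
        return retriever.retrieve_html(url)


def make_selenium_retriever() -> Any:
    """Build a SeleniumPageRetriever; requires chatflock, Selenium and Chrome."""
    # Imported lazily so fetch_html() can be used and tested without chatflock.
    from chatflock.web_research.page_retrievers.selenium_retriever import (
        SeleniumPageRetriever,
    )

    # Illustrative values only; every keyword is taken from the documented signature.
    return SeleniumPageRetriever(
        headless=True,
        main_page_timeout=15,
        include_iframe_html=True,
    )
```

With the dependencies installed, a page could then be fetched with fetch_html(make_selenium_retriever(), "https://example.com"). Because fetch_html closes the retriever, a retriever passed to it should not be assumed usable afterwards.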