Utils

ScraperFC.utils.botasaurus_getters.botasaurus_browser_get_json(url: str, headless: bool = True, block_images_and_css: bool = True, wait_for_complete_page_load: bool = True, delay: int = 0) → dict

Use the Botasaurus BROWSER module to get JSON from a page.

Parameters:
  • url (str) – The URL to scrape

  • headless (bool) – Whether to run the browser in headless mode

  • block_images_and_css (bool) – Whether to block images and CSS

  • wait_for_complete_page_load (bool) – Whether to wait for the page to load completely

  • delay (int) – Seconds to wait after the request (default: 0)

Raises:
  • TypeError – If any of the parameters are the wrong type

  • ValueError – If delay is negative

Returns:

JSON data

Return type:

dict
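The Raises entries above define an argument contract: a wrong type raises TypeError and a negative delay raises ValueError. The following is a minimal sketch of such a check, written only from that documented contract. `validate_getter_args` is a hypothetical name, not part of ScraperFC, and the real implementation may differ (for example, in whether it accepts a `bool` for `delay`, since `bool` is a subclass of `int` in Python).

```python
def validate_getter_args(url, delay=0, **flags):
    """Hypothetical check mirroring the documented Raises section.

    Raises TypeError for a non-str url, a non-bool flag (headless,
    block_images_and_css, wait_for_complete_page_load) or a non-int
    delay, and ValueError for a negative delay.
    """
    if not isinstance(url, str):
        raise TypeError(f"url must be str, not {type(url).__name__}")
    for name, value in flags.items():
        if not isinstance(value, bool):
            raise TypeError(f"{name} must be bool, not {type(value).__name__}")
    # bool is a subclass of int; this sketch rejects it explicitly (an assumption).
    if not isinstance(delay, int) or isinstance(delay, bool):
        raise TypeError(f"delay must be int, not {type(delay).__name__}")
    if delay < 0:
        raise ValueError(f"delay must be non-negative, got {delay}")


# A valid combination passes silently and returns None.
validate_getter_args("https://example.com/api", delay=2, headless=True)
```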

ScraperFC.utils.botasaurus_getters.botasaurus_browser_get_soup(url: str, headless: bool = False, block_images_and_css: bool = False, wait_for_complete_page_load: bool = True, delay: int = 0) → BeautifulSoup

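Use the Botasaurus BROWSER module to get a BeautifulSoup object from a page. Note that headless and block_images_and_css default to False here, unlike in botasaurus_browser_get_json.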

Parameters:
  • url (str) – The URL to scrape

  • headless (bool) – Whether to run the browser in headless mode

  • block_images_and_css (bool) – Whether to block images and CSS

  • wait_for_complete_page_load (bool) – Whether to wait for the page to load completely

  • delay (int) – Seconds to wait after the request (default: 0)

Raises:
  • TypeError – If any of the parameters are the wrong type

  • ValueError – If delay is negative

Returns:

BeautifulSoup object

Return type:

BeautifulSoup
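Because the two browser getters use different defaults for headless and block_images_and_css, passing every option explicitly keeps their behavior consistent. The sketch below is one way to do that; `BROWSER_OPTIONS` and `call_browser_getter` are hypothetical names, and the option values chosen are assumptions to adjust per use case. The getter is passed in as an argument so the sketch runs without ScraperFC installed.

```python
# Hypothetical shared option set applied to either browser getter, so the
# differing defaults (headless, block_images_and_css) no longer matter.
BROWSER_OPTIONS = {
    "headless": True,
    "block_images_and_css": True,
    "wait_for_complete_page_load": True,
    "delay": 0,
}


def call_browser_getter(getter, url, **overrides):
    """Call `getter` (e.g. botasaurus_browser_get_soup) with explicit options.

    Keyword arguments in `overrides` replace the shared values above;
    unknown option names raise TypeError instead of being passed through.
    """
    unknown = set(overrides) - set(BROWSER_OPTIONS)
    if unknown:
        raise TypeError(f"unexpected options: {sorted(unknown)}")
    options = {**BROWSER_OPTIONS, **overrides}  # copy; shared dict is untouched
    return getter(url, **options)


# Usage (requires ScraperFC; not executed here):
# from ScraperFC.utils.botasaurus_getters import botasaurus_browser_get_soup
# soup = call_browser_getter(botasaurus_browser_get_soup, "https://example.com", delay=1)
```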

ScraperFC.utils.botasaurus_getters.botasaurus_request_get_json(url: str, delay: int = 0) → dict

Use the Botasaurus REQUESTS module to get JSON from a page.

Parameters:
  • url (str) – The URL to request

  • delay (int) – Seconds to wait after the request (default: 0)

Raises:
  • TypeError – If any of the parameters are the wrong type

  • ValueError – If delay is negative

Returns:

JSON data

Return type:

dict
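The delay parameter waits after the request, so calling a getter in a loop spaces consecutive requests at least delay seconds apart. The sketch below illustrates a "fetch, parse, then wait" sequence under that reading of the docs. `get_json_then_wait` is a hypothetical name, the `fetch_text` callable stands in for the Botasaurus REQUESTS layer so the sketch runs offline, and the order of parsing versus sleeping is an assumption rather than ScraperFC's actual code.

```python
import json
import time


def get_json_then_wait(fetch_text, url, delay=0):
    """Hypothetical sketch: fetch `url`, parse it as JSON, then sleep `delay` s.

    `fetch_text(url)` is a stand-in for the HTTP request and must return
    the response body as a string.
    """
    # Validate before making any request, as the Raises section documents.
    if not isinstance(delay, int) or isinstance(delay, bool):
        raise TypeError(f"delay must be int, not {type(delay).__name__}")
    if delay < 0:
        raise ValueError(f"delay must be non-negative, got {delay}")
    data = json.loads(fetch_text(url))
    time.sleep(delay)  # wait after the request, spacing out repeated calls
    return data


# Offline usage: N calls in a loop sleep for at least N * delay seconds in total.
pages = [get_json_then_wait(lambda u: '{"ok": true}', u, delay=0) for u in ("a", "b")]
```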

ScraperFC.utils.botasaurus_getters.botasaurus_request_get_soup(url: str, delay: int = 0) → BeautifulSoup

Use the Botasaurus REQUESTS module to get a BeautifulSoup object from a page.

Parameters:
  • url (str) – The URL to request

  • delay (int) – Seconds to wait after the request (default: 0)

Raises:
  • TypeError – If any of the parameters are the wrong type

  • ValueError – If delay is negative

Returns:

BeautifulSoup object

Return type:

BeautifulSoup
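A plain HTTP request is usually cheaper than launching a browser, so one possible pattern is to try the REQUESTS getter first and fall back to the BROWSER getter only when it fails. The sketch below assumes failures surface as exceptions; a request that "succeeds" with an error or bot-check page would not trigger the fallback. `get_soup_with_fallback` is a hypothetical helper, not part of ScraperFC. TypeError and ValueError are re-raised because, per the Raises sections, they signal bad arguments that a browser retry cannot fix.

```python
def get_soup_with_fallback(request_getter, browser_getter, url, delay=0):
    """Hypothetical helper: try the REQUESTS getter, fall back to BROWSER.

    Both getters are passed in, so this runs with stand-ins as well as with
    botasaurus_request_get_soup / botasaurus_browser_get_soup.
    """
    try:
        return request_getter(url, delay=delay)
    except (TypeError, ValueError):
        raise  # invalid arguments: retrying in a browser will not help
    except Exception:
        # Assumption: other failures (network errors, blocked requests, ...)
        # might succeed when the page is loaded in a real browser.
        return browser_getter(url, delay=delay)


# Usage (requires ScraperFC; not executed here):
# from functools import partial
# from ScraperFC.utils.botasaurus_getters import (
#     botasaurus_browser_get_soup,
#     botasaurus_request_get_soup,
# )
# # headless=True overrides botasaurus_browser_get_soup's default of False.
# browser = partial(botasaurus_browser_get_soup, headless=True)
# soup = get_soup_with_fallback(botasaurus_request_get_soup, browser, "https://example.com", delay=1)
```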