ScraperFC modules
Capology
- class ScraperFC.Capology.Capology
Bases:
object
- close()
Closes and quits the Selenium WebDriver instance.
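Each scraper holds a Selenium WebDriver until close() is called. One way to make sure the browser is always quit is contextlib.closing, which calls close() when the with block exits, even if the scrape raises. The sketch below uses a hypothetical StubScraper so it runs without a browser or network. With ScraperFC you would pass a real instance such as ScraperFC.Capology.Capology() instead. The league key "EPL" is an assumption; check shared_functions.py for the actual keys.

```python
from contextlib import closing


class StubScraper:
    """Hypothetical stand-in for a ScraperFC scraper that owns a WebDriver."""

    def __init__(self):
        self.closed = False

    def scrape_payrolls(self, year, league, currency):
        # A real scraper would drive Selenium here; the stub just echoes its arguments.
        return {"year": year, "league": league, "currency": currency}

    def close(self):
        # Mirrors the documented close(): release the browser.
        self.closed = True


scraper = StubScraper()
# closing() calls scraper.close() on exit, whether or not the body raises.
with closing(scraper) as s:
    payrolls = s.scrape_payrolls(2023, "EPL", "eur")  # "EPL" is an assumed league key
print(scraper.closed)  # True
```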
- scrape_payrolls(year, league, currency)
Scrapes team payrolls for the given league season.
- Parameters:
year (int) – Calendar year that the season ends in (e.g. 2023 for the 2022/23 season)
league (str) – League. Look in shared_functions.py for the available leagues for each module.
currency (str) – The currency for the returned salaries. Options are “eur” for Euro, “gbp” for British Pound, and “USD” for US Dollar.
- Returns:
The payrolls of all teams in the given league season
- Return type:
Pandas DataFrame
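The year argument names the calendar year in which the season ends. The hypothetical helper season_label below (not part of ScraperFC) makes that mapping explicit, which can help when labelling output or checking that the intended season was requested.

```python
def season_label(year: int) -> str:
    """Return the 'YYYY/YY' label for a season that ends in `year`.

    Hypothetical helper illustrating ScraperFC's year convention.
    """
    start = year - 1
    # Two-digit, zero-padded end year, e.g. 2000 -> "00".
    return f"{start}/{year % 100:02d}"


print(season_label(2023))  # 2022/23
print(season_label(2000))  # 1999/00
```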
- scrape_salaries(year, league, currency)
Scrapes player salaries for the given league season.
- Parameters:
year (int) – Calendar year that the season ends in (e.g. 2023 for the 2022/23 season)
league (str) – League. Look in shared_functions.py for the available leagues for each module.
currency (str) – The currency for the returned salaries. Options are “eur” for Euro, “gbp” for British Pound, and “USD” for US Dollar.
- Returns:
The salaries of all players in the given league season
- Return type:
Pandas DataFrame
ClubElo
- class ScraperFC.ClubElo.ClubElo
Bases:
object
- scrape_team_on_date(team, date)
Scrapes a team’s ELO score on a given date.
- Parameters:
team (str) – The team name exactly as it appears in the team’s clubelo.com URL. To find it, go to clubelo.com, open the team’s page, and copy and paste the name from the URL.
date (str) – Must be formatted as YYYY-MM-DD
- Returns:
elo (int) – ELO score of the given team on the given date
-1 (int) – -1 if the team has no score on the given date
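The date must be a YYYY-MM-DD string, and a missing score comes back as -1 rather than raising an error. The two hypothetical helpers below (not part of ScraperFC) sketch how a caller might validate the date before scraping and convert the -1 sentinel into None.

```python
from datetime import datetime
from typing import Optional


def check_elo_date(date: str) -> str:
    """Return `date` unchanged, or raise ValueError unless it is exactly YYYY-MM-DD."""
    parsed = datetime.strptime(date, "%Y-%m-%d")  # rejects other layouts and invalid days
    # strptime also accepts unpadded input such as "2023-1-5", so round-trip to enforce padding.
    if parsed.strftime("%Y-%m-%d") != date:
        raise ValueError(f"date must be formatted as YYYY-MM-DD, got {date!r}")
    return date


def elo_or_none(score: int) -> Optional[int]:
    """Map the documented -1 'no score on this date' sentinel to None."""
    return None if score == -1 else score


print(check_elo_date("2023-01-05"))  # 2023-01-05
print(elo_or_none(-1))               # None
```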
FBRef
- class ScraperFC.FBRef.FBRef
Bases:
object
ScraperFC module for FBRef
- close()
Closes and quits the Selenium WebDriver instance.
- complete_report_from_player_link(player_link)
Scrapes the FBRef scouting reports for a player.
- Parameters:
player_link (str) – URL to an FBRef player page
- Returns:
cleaned_complete_report (Pandas DataFrame) – Complete report with a MultiIndex of stats categories and statistics. Columns for per90 and percentile values.
player_name (str)
player_pos (str)
minutes (int)
- get(url)
Custom get function just for the FBRef module.
Calls .get() from the Selenium WebDriver and then waits in order to avoid a Too Many Requests HTTPError from FBRef.
- Parameters:
url (str) – The URL to get
- Return type:
None
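The documented behaviour of get() is a WebDriver page load followed by a pause. A minimal sketch of that pattern follows; throttled_get and its 5-second default are illustrative assumptions, not ScraperFC’s actual function or delay. Any object with a get(url) method, such as a Selenium WebDriver, can be passed as driver.

```python
import time


def throttled_get(driver, url: str, wait_seconds: float = 5.0) -> None:
    """Load `url` with `driver`, then pause to stay under the site's rate limit.

    The 5-second default is an assumption, not ScraperFC's real value.
    """
    driver.get(url)           # e.g. Selenium's WebDriver.get() navigates to the URL
    time.sleep(wait_seconds)  # back off before the next request
```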
- get_match_links(year, league)
Gets all match links for the chosen league season.
- Parameters:
year (int) – Calendar year that the season ends in (e.g. 2023 for the 2022/23 season)
league (str) – League. Look in shared_functions.py for the available leagues for each module.
- Returns:
FBRef links to all matches for the chosen league season
- Return type:
list
- get_season_link(year, league)
Returns the URL for the chosen league season.
- Parameters:
year (int) – Calendar year that the season ends in (e.g. 2023 for the 2022/23 season)
league (str) – League. Look in shared_functions.py for the available leagues for each module.
- Returns:
URL to the FBRef page of the chosen league season
- Return type:
str
- requests_get(url)
Custom requests.get function for the FBRef module
Calls requests.get() until the status code is 200.
- Parameters:
url (str) – The URL to get
- Returns:
The response
- Return type:
requests.Response
- scrape_all_stats(year, league, normalize=False)
Scrapes all stat categories.
Runs scrape_stats() for each stat category and collects the returned tuples of DataFrames into a dict.
- Parameters:
year (int) – Calendar year that the season ends in (e.g. 2023 for the 2022/23 season)
league (str) – League. Look in shared_functions.py for the available leagues for each module.
normalize (bool) – OPTIONAL, default is False. If True, will normalize all stats to Per90.
- Returns:
Keys are stat category names, values are tuples of 3 dataframes, (squad_stats, opponent_stats, player_stats)
- Return type:
dict
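scrape_all_stats() is documented as a loop over stat categories that stores each scrape_stats() result in a dict. The sketch below reproduces that shape with an injected scraper function. The names scrape_all_stats_sketch and fake_scrape_stats, and the category names "standard" and "shooting", are hypothetical; consult the module for the real category list.

```python
def scrape_all_stats_sketch(scrape_stats, year, league, categories, normalize=False):
    """Collect scrape_stats() output for every category into a dict.

    `scrape_stats` is any callable with the documented signature
    (year, league, stat_category, normalize) returning a 3-tuple.
    """
    return {
        category: scrape_stats(year, league, category, normalize=normalize)
        for category in categories
    }


# Hypothetical stub standing in for FBRef.scrape_stats, so the sketch runs offline.
def fake_scrape_stats(year, league, stat_category, normalize=False):
    return (f"{stat_category} squad", f"{stat_category} opponent", f"{stat_category} player")


all_stats = scrape_all_stats_sketch(fake_scrape_stats, 2023, "EPL", ["standard", "shooting"])
squad, opponent, player = all_stats["shooting"]  # (squad_stats, opponent_stats, player_stats)
print(player)  # shooting player
```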
- scrape_complete_scouting_reports(year, league, goalkeepers=False)
Scrapes the FBRef scouting reports for all players in the chosen league season.
- Parameters:
year (int) – Calendar year that the season ends in (e.g. 2023 for the 2022/23 season)
league (str) – League. Look in shared_functions.py for the available leagues for each module.
goalkeepers (bool) – OPTIONAL, default is False. If True, will scrape reports for only goalkeepers. If False, will scrape reports for only outfield players.
- Returns:
per90 (Pandas DataFrame) – DataFrame of reports with Per90 stats.
percentiles (Pandas DataFrame) – DataFrame of reports with stats percentiles (versus other players in the top 5 leagues)
- scrape_league_table(year, league)
Scrapes the league table of the chosen league season
- Parameters:
year (int) – Calendar year that the season ends in (e.g. 2023 for the 2022/23 season)
league (str) – League. Look in shared_functions.py for the available leagues for each module.
- Returns:
Pandas DataFrame – The league table. The DataFrame may be empty if the league has no table.
tuple – If the league has multiple tables (e.g. Champions League, Liga MX, MLS), a tuple of DataFrames is returned instead.
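Because scrape_league_table() returns either a single DataFrame or a tuple of DataFrames, calling code has to handle both shapes. A small hypothetical wrapper, as_table_tuple, normalizes the result so it can always be iterated. It relies only on the return types documented for this method.

```python
def as_table_tuple(result):
    """Wrap scrape_league_table() output so callers can always iterate over tables.

    A single DataFrame (possibly empty) becomes a one-element tuple;
    a tuple of DataFrames is passed through unchanged.
    """
    if isinstance(result, tuple):
        return result
    return (result,)
```

An empty DataFrame still comes back wrapped in a one-element tuple, so check each table’s empty attribute before using it.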
- scrape_match(link)
Scrapes an FBRef match page.
- Parameters:
link (str) – URL to the FBRef match page
- Returns:
DataFrame containing most parts of the match page if they’re available (e.g. formations, lineups, scores, player stats, etc.). The fields that are available vary by competition and year.
- Return type:
Pandas DataFrame
- scrape_matches(year, league, save=False)
Scrapes all matches of the chosen league season from FBRef.
Works by gathering all of the match URLs from the homepage of the chosen league season on FBRef and then calling scrape_match() on each one.
- Parameters:
year (int) – Calendar year that the season ends in (e.g. 2023 for the 2022/23 season)
league (str) – League. Look in shared_functions.py for the available leagues for each module.
save (bool) – OPTIONAL, default is False. If True, will save the returned DataFrame to a CSV file.
- Returns:
Pandas DataFrame – If save is False, will return the Pandas DataFrame with the stats.
filename (str) – If save is True, will return the filename the CSV was saved to.
- scrape_stats(year, league, stat_category, normalize=False)
Scrapes a single stats category
Adds team and player ID columns to the stats tables
- Parameters:
year (int) – Calendar year that the season ends in (e.g. 2023 for the 2022/23 season)
league (str) – League. Look in shared_functions.py for the available leagues for each module.
stat_category (str) – The stat category to scrape.
normalize (bool) – OPTIONAL, default is False. If True, will normalize all stats to Per90.
- Returns:
tuple of 3 Pandas DataFrames, (squad_stats, opponent_stats, player_stats).
- Return type:
tuple
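With normalize=True, stats are rescaled to per-90-minute rates. The usual arithmetic is value × 90 / minutes played; for example, 12 goals in 1,800 minutes is 12 × 90 / 1800 = 0.6 per 90. The function below sketches only that arithmetic. Which columns ScraperFC rescales and how it treats zero minutes are not documented here, so the ValueError for non-positive minutes is an assumption of this sketch.

```python
def per90(value: float, minutes: float) -> float:
    """Scale a counting stat to a per-90-minutes rate: value * 90 / minutes."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")  # assumption: refuse to divide by zero
    return value * 90 / minutes


print(per90(12, 1800))  # 0.6
```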
FiveThirtyEight
- class ScraperFC.FiveThirtyEight.FiveThirtyEight
Bases:
object
- close()
Closes and quits the Selenium WebDriver instance.
- scrape_matches(year, league, save=False)
Scrapes matches for the given league season
- Parameters:
year (int) – Calendar year that the season ends in (e.g. 2023 for the 2022/23 season)
league (str) – League. Look in shared_functions.py for the available leagues for each module.
save (bool) – OPTIONAL, default is False. If True, output will be saved to a CSV file.
- Returns:
Pandas DataFrame – If save=False, FiveThirtyEight stats for all matches of the given league season
filename (str) – If save=True, filename of the CSV that the stats were saved to
- up_season(string)
Increments a string of the season year
- Parameters:
string (str) – String of a calendar year (e.g. “2022”)
- Returns:
Incremented calendar year
- Return type:
str
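up_season() increments a season-year string. Assuming it does no more than integer arithmetic on the year, an equivalent sketch (named up_season_sketch to make clear it is not the library’s code) is:

```python
def up_season_sketch(string: str) -> str:
    """Return the next calendar year as a string, e.g. "2022" -> "2023"."""
    return str(int(string) + 1)


print(up_season_sketch("2022"))  # 2023
```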
SofaScore
Transfermarkt
- class ScraperFC.Transfermarkt.Transfermarkt
Bases:
object
- close()
Closes and quits the Selenium WebDriver instance.
- get_club_links(year, league)
Gathers all Transfermarkt club URLs for the chosen league season.
- Parameters:
year (int) – Calendar year that the season ends in (e.g. 2023 for the 2022/23 season)
league (str) – League. Look in shared_functions.py for the available leagues for each module.
- Returns:
List of the club URLs
- Return type:
list
- get_player_links(year, league)
Gathers all Transfermarkt player URLs for the chosen league season.
- Parameters:
year (int) – Calendar year that the season ends in (e.g. 2023 for the 2022/23 season)
league (str) – League. Look in shared_functions.py for the available leagues for each module.
- Returns:
List of the player URLs
- Return type:
list
- get_players(year, league)
Gathers all player info for the chosen league season.
- Parameters:
year (int) – Calendar year that the season ends in (e.g. 2023 for the 2022/23 season)
league (str) – League. Look in shared_functions.py for the available leagues for each module.
- Returns:
Each row is a player and contains some of the information from their Transfermarkt player profile.
- Return type:
Pandas DataFrame
- class ScraperFC.Transfermarkt.TransfermarktPlayer(url)
Bases:
object
Class to represent Transfermarkt player profiles.
Initialize with the URL to a player’s Transfermarkt profile page.
Understat
- class ScraperFC.Understat.Understat
Bases:
object
- close()
Closes and quits the Selenium WebDriver instance.
- get_match_links(year, league)
Gets all of the match links for the chosen league season
- Parameters:
year (int) – Calendar year that the season ends in (e.g. 2023 for the 2022/23 season)
league (str) – League. Look in shared_functions.py for the available leagues for each module.
- Returns:
List of match links of the chosen league season
- Return type:
list
- get_season_link(year, league)
Gets URL of the chosen league season.
- Parameters:
year (int) – Calendar year that the season ends in (e.g. 2023 for the 2022/23 season)
league (str) – League. Look in shared_functions.py for the available leagues for each module.
- Returns:
URL to the Understat page of the chosen league season.
- Return type:
str
- get_team_links(year, league)
Gets all of the team links for the chosen league season
- Parameters:
year (int) – Calendar year that the season ends in (e.g. 2023 for the 2022/23 season)
league (str) – League. Look in shared_functions.py for the available leagues for each module.
- Returns:
List of team URLs from the chosen season.
- Return type:
list
- remove_diff(string)
Removes the plus/minus from some stats like xG.
- Parameters:
string (str) – The string to remove the difference from
- Returns:
String passed in as arg with the difference removed
- Return type:
str
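Understat shows some stats, such as xG, with a signed difference appended. Assuming that format looks like "62.35+4.35" (the value followed directly by a + or - and a number), the hypothetical remove_diff_sketch below strips the trailing part with a regular expression. The lookbehind requires a digit before the sign, so a leading minus on a negative value is kept.

```python
import re

# A signed number at the end of the string, immediately preceded by a digit,
# e.g. the "+4.35" in "62.35+4.35". Assumed format, not taken from ScraperFC.
_DIFF = re.compile(r"(?<=\d)[+-]\d+(?:\.\d+)?$")


def remove_diff_sketch(string: str) -> str:
    """Strip a trailing +/- difference from an Understat stat string."""
    return _DIFF.sub("", string)


print(remove_diff_sketch("62.35+4.35"))  # 62.35
print(remove_diff_sketch("45.12-3.88"))  # 45.12
print(remove_diff_sketch("-5.10+1.20"))  # -5.10
print(remove_diff_sketch("31.07"))       # 31.07
```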
- scrape_attack_speeds(year, league)
Scrapes the attack speeds for each team in the year and league
- Parameters:
year (int) – Calendar year that the season ends in (e.g. 2023 for the 2022/23 season)
league (str) – League. Look in shared_functions.py for the available leagues for each module.
- Returns:
DataFrame containing the attack speeds of each team
- Return type:
Pandas DataFrame
- scrape_formations(year, league)
Scrapes the stats for each team in the year and league, broken down by formation used by the team.
- Parameters:
year (int) – Calendar year that the season ends in (e.g. 2023 for the 2022/23 season)
league (str) – League. Look in shared_functions.py for the available leagues for each module.
- Returns:
Keys are team names. Values are nested dicts whose keys are the formations each team used and whose values are the team’s stats in that formation.
- Return type:
dict
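scrape_formations() returns a nested dict keyed first by team and then by formation. A common next step is flattening it into one record per (team, formation) pair, for example to build a DataFrame. The sketch assumes each formation’s stats is itself a dict of stat name to value; if the library stores them in another form (such as a pandas Series), convert them first. The sample data and stat names are invented.

```python
def flatten_formations(formations: dict) -> list:
    """Turn {team: {formation: stats}} into one flat record per (team, formation)."""
    rows = []
    for team, by_formation in formations.items():
        for formation, stats in by_formation.items():
            # Assumes `stats` is a dict of stat name -> value.
            rows.append({"team": team, "formation": formation, **stats})
    return rows


# Hypothetical sample; real stat names and values come from Understat.
sample = {
    "Team A": {"4-2-3-1": {"min": 1500, "xG": 20.1}, "4-4-2": {"min": 300, "xG": 3.2}},
    "Team B": {"3-5-2": {"min": 1800, "xG": 18.7}},
}
rows = flatten_formations(sample)
print(len(rows))  # 3
```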
- scrape_game_states(year, league)
Scrapes the game states for each team in the year and league
- Parameters:
year (int) – Calendar year that the season ends in (e.g. 2023 for the 2022/23 season)
league (str) – League. Look in shared_functions.py for the available leagues for each module.
- Returns:
DataFrame containing the game states
- Return type:
Pandas DataFrame
- scrape_home_away_tables(year, league, normalize=False)
Scrapes the home and away league tables for the chosen league season.
- Parameters:
year (int) – Calendar year that the season ends in (e.g. 2023 for the 2022/23 season)
league (str) – League. Look in shared_functions.py for the available leagues for each module.
normalize (bool) – OPTIONAL, default False. If True, normalizes stats to per90
- Returns:
home (Pandas DataFrame) – Home league table
away (Pandas DataFrame) – Away league table
- scrape_league_table(year, league, normalize=False)
Scrapes the league table for the chosen league season.
- Parameters:
year (int) – Calendar year that the season ends in (e.g. 2023 for the 2022/23 season)
league (str) – League. Look in shared_functions.py for the available leagues for each module.
normalize (bool) – OPTIONAL, default False. If True, normalizes stats to per90
- Returns:
The league table of the chosen league season.
- Return type:
Pandas DataFrame
- scrape_match(link)
Scrapes a single match from Understat.
- Parameters:
link (str) – URL to the match
- Returns:
match – The match stats
- Return type:
Pandas DataFrame
- scrape_matches(year, league, save=False)
Scrapes all of the matches from the chosen league season.
Gathers all match links from the chosen league season and then calls scrape_match() on each one.
- Parameters:
year (int) – Calendar year that the season ends in (e.g. 2023 for the 2022/23 season)
league (str) – League. Look in shared_functions.py for the available leagues for each module.
save (bool) – OPTIONAL, default False. If True, saves the DataFrame of match stats to a CSV.
- Returns:
matches (Pandas DataFrame) – If save=False
filename (str) – If save=True, the filename the DataFrame was saved to
- scrape_shot_results(year, league)
Scrapes the shot results for each team in the year and league
- Parameters:
year (int) – Calendar year that the season ends in (e.g. 2023 for the 2022/23 season)
league (str) – League. Look in shared_functions.py for the available leagues for each module.
- Returns:
DataFrame containing the shot results data
- Return type:
Pandas DataFrame
- scrape_shot_xy(year, league, save=False, format='json')
Scrapes the info for every shot in the league and year.
- Parameters:
year (int) – Calendar year that the season ends in (e.g. 2023 for the 2022/23 season)
league (str) – League. Look in shared_functions.py for the available leagues for each module.
save (bool) – OPTIONAL, default is False. If True, the shot XYs will be saved to a file whose type is determined by the format argument.
format (str) – OPTIONAL, default is “json”. Format of the output. Options are “json” and “dataframe”.
- Returns:
dict if save=False and format=“json”; Pandas DataFrame if save=False and format=“dataframe”; str if save=True. The file type of the saved output is determined by the format argument.
- Return type:
dict, Pandas DataFrame, or str
- scrape_shot_zones(year, league)
Scrapes the shot zones for each team in the year and league
- Parameters:
year (int) – Calendar year that the season ends in (e.g. 2023 for the 2022/23 season)
league (str) – League. Look in shared_functions.py for the available leagues for each module.
- Returns:
DataFrame containing the shot zones data
- Return type:
Pandas DataFrame
- scrape_situations(year, league)
Scrapes the situations leading to shots for each team in the chosen league season.
- Parameters:
year (int) – Calendar year that the season ends in (e.g. 2023 for the 2022/23 season)
league (str) – League. Look in shared_functions.py for the available leagues for each module.
- Returns:
DataFrame containing the situations
- Return type:
Pandas DataFrame
- scrape_timing(year, league)
Scrapes the timing of goals for each team in the year and league
- Parameters:
year (int) – Calendar year that the season ends in (e.g. 2023 for the 2022/23 season)
league (str) – League. Look in shared_functions.py for the available leagues for each module.
- Returns:
DataFrame containing the timing stats
- Return type:
Pandas DataFrame
- unhide_stats(columns)
Understat doesn’t display all stats by default.
This function uses the stats currently shown in the table’s columns to unhide the stats that aren’t being displayed.
- Parameters:
columns (Pandas DataFrame.columns) – The columns currently shown in the table being scraped
- Return type:
None
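The core of unhide_stats() is working out which stats are missing from the columns currently shown. The hypothetical helper below sketches only that bookkeeping as an ordered set difference; actually toggling the hidden stats on the page is outside its scope. The column names are invented for illustration.

```python
def stats_to_unhide(shown_columns, all_stats):
    """Return the stats from all_stats that are missing from shown_columns, keeping order."""
    shown = set(shown_columns)
    return [stat for stat in all_stats if stat not in shown]


# Hypothetical column names for illustration.
shown = ["Team", "M", "W", "D", "L", "xG"]
available = ["Team", "M", "W", "D", "L", "G", "GA", "PTS", "xG", "NPxG", "PPDA"]
print(stats_to_unhide(shown, available))  # ['G', 'GA', 'PTS', 'NPxG', 'PPDA']
```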