Developer Interface

Matcher Class

class whereabouts.Matcher.Matcher(db_name: str, how: str = 'standard', threshold: float = 0.5)

A class for geocoding and reverse geocoding addresses.

con

A DuckDB database connection.

Type:

duckdb.DuckDBPyConnection

how

The geocoding algorithm to use: one of ‘standard’, ‘trigram’, or ‘skipphrase’. Defaults to ‘standard’.

Type:

str

threshold

The threshold for considering a match valid. Defaults to 0.5.

Type:

float

geocode(addresses: list[str] | str | ndarray | Series, top_n: int = 1, address_ids: list[int] | None = None, how: str | None = None) → list[dict]

Geocode a single address or a collection of addresses.

Parameters:
  • addresses (list of str, str, ndarray, or Series) – A single address string or a collection of address strings.

  • top_n (int, optional) – Max number of matches to return for each input address. Defaults to 1.

  • address_ids (list of int, optional) – A list of integers representing the IDs of the addresses. Defaults to None.

  • how (str, optional) – The geocoding algorithm to use. If not provided, the instance’s how attribute is used.

Returns:

results – A list of dictionaries representing geocoded addresses.

Return type:

list of dict
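The signature above accepts several input types for addresses. As a minimal sketch of how such inputs could be coerced into parallel lists before matching, the helper below is hypothetical and not part of whereabouts: the name normalize_addresses, the 1..n default IDs, and the length check are all assumptions made for illustration.

```python
from typing import Any, Optional


def normalize_addresses(addresses: Any, address_ids: Optional[list] = None):
    """Coerce str / list / array-like input into (address_ids, addresses) lists.

    Hypothetical helper for illustration; not the library's own code.
    """
    if isinstance(addresses, str):
        # A single address string becomes a one-element list.
        addresses = [addresses]
    elif hasattr(addresses, "tolist"):
        # numpy.ndarray and pandas.Series both expose .tolist().
        addresses = addresses.tolist()
    else:
        addresses = list(addresses)

    if address_ids is None:
        # Assumption: IDs default to 1..n when none are supplied.
        address_ids = list(range(1, len(addresses) + 1))
    elif len(address_ids) != len(addresses):
        raise ValueError("address_ids must have one entry per address")

    return address_ids, addresses


ids, addrs = normalize_addresses("1 Main St")
print(ids, addrs)
```

Keeping IDs and addresses as parallel lists makes it straightforward to tie each result dictionary back to the input row that produced it.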

load_tree(tree_path: str) → None

Load a pre-built KDTree and its reference data for reverse geocoding.

Parameters:

tree_path (str) – Path to the pickled KDTree file created by AddressLoader.create_kdtree().

query(query: str) → DataFrame

Execute a generic SQL query using the matcher’s database.

Parameters:

query (str) – The SQL query to execute.

Returns:

results – The results of the query as a DataFrame.

Return type:

pd.DataFrame
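The pattern behind a generic query method (an object that owns a connection and hands back tabular results) can be sketched as follows. To keep the sketch runnable with the standard library alone, it swaps in sqlite3 for DuckDB and returns a list of dictionaries rather than a DataFrame; the QueryRunner class and the addrs table are hypothetical and say nothing about the schema whereabouts actually uses.

```python
import sqlite3


class QueryRunner:
    """Hypothetical stand-in: owns a connection and exposes a generic query()."""

    def __init__(self, database: str = ":memory:"):
        self.con = sqlite3.connect(database)
        # Row objects allow access by column name.
        self.con.row_factory = sqlite3.Row

    def query(self, sql: str) -> list:
        """Run arbitrary SQL and return the rows as a list of dicts."""
        rows = self.con.execute(sql).fetchall()
        return [dict(row) for row in rows]


runner = QueryRunner()
# Made-up table and rows, for illustration only.
runner.con.execute("CREATE TABLE addrs (id INTEGER, address TEXT)")
runner.con.execute("INSERT INTO addrs VALUES (1, '1 Main St'), (2, '2 High St')")
result = runner.query("SELECT COUNT(*) AS n FROM addrs")
print(result)
```

With a real Matcher, the same kind of SQL string would be passed to query() and the result inspected as a DataFrame.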

reverse_geocode(points: list[tuple[float, float]]) → list[dict]

Find the nearest addresses for given latitude and longitude coordinates.

Parameters:

points (list of tuple) – A list of (latitude, longitude) tuples representing coordinates.

Returns:

results – A list of dictionaries representing the nearest addresses.

Return type:

list of dict
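To illustrate what a nearest-address lookup over (latitude, longitude) points computes, here is a minimal sketch. It uses a brute-force linear scan with haversine distance in place of a KDTree, so it shows the result being sought rather than how the library finds it; the function names, the reference records, and the distance_km key are assumptions made for illustration.

```python
import math


def haversine_km(p, q):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def nearest_addresses(points, reference):
    """For each query point, return the closest reference record (linear scan)."""
    results = []
    for point in points:
        best = min(
            reference,
            key=lambda r: haversine_km(point, (r["latitude"], r["longitude"])),
        )
        distance = haversine_km(point, (best["latitude"], best["longitude"]))
        # Copy the record and attach the distance (hypothetical key).
        results.append({**best, "distance_km": distance})
    return results


# Made-up reference records, for illustration only.
reference = [
    {"address": "Melbourne CBD", "latitude": -37.8136, "longitude": 144.9631},
    {"address": "Sydney CBD", "latitude": -33.8688, "longitude": 151.2093},
]
nearest = nearest_addresses([(-37.80, 144.96)], reference)
print(nearest[0]["address"])
```

A linear scan costs one distance computation per reference record for every query point, which is the cost a pre-built spatial index such as a KDTree is meant to avoid on large address sets.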

MatcherPipeline Class

class whereabouts.MatcherPipeline.MatcherPipeline(matchers: list[Matcher])

A pipeline that chains several Matcher objects to improve recall when geocoding addresses.

matchers

A list of Matcher objects used for geocoding addresses.

Type:

list of Matcher

geocode(addresses: list[str], address_ids: list[int] | None = None) → list[dict]

Geocode a list of addresses by passing them through each Matcher object in the pipeline, in sequence.

Parameters:
  • addresses (list of str) – A list of strings representing addresses or place names.

  • address_ids (list of int, optional) – A list of integers representing the IDs of the addresses or place names. Defaults to None.

Returns:

results – A list of dictionaries containing the best match for each input address.

Return type:

list of dict
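One way a sequential pipeline could work is sketched below, under stated assumptions: each matcher is tried in order, any address whose score reaches that matcher's threshold is considered matched, and the remainder is handed to the next matcher. FakeMatcher, pipeline_geocode, and the address_id and similarity keys are hypothetical stand-ins; this is not the library's implementation.

```python
class FakeMatcher:
    """Hypothetical stand-in for Matcher: looks up a fixed score per address."""

    def __init__(self, known, threshold=0.5):
        self.known = known          # address -> similarity score
        self.threshold = threshold

    def geocode(self, addresses, address_ids=None):
        ids = address_ids or list(range(1, len(addresses) + 1))
        return [
            {"address_id": i, "address": a, "similarity": self.known.get(a, 0.0)}
            for i, a in zip(ids, addresses)
        ]


def pipeline_geocode(matchers, addresses, address_ids=None):
    """Try each matcher in turn on the addresses that are still unmatched."""
    ids = address_ids or list(range(1, len(addresses) + 1))
    pending = dict(zip(ids, addresses))   # address_id -> address, still unmatched
    best = {}                             # address_id -> best result seen so far
    for matcher in matchers:
        if not pending:
            break
        results = matcher.geocode(
            list(pending.values()), address_ids=list(pending.keys())
        )
        for r in results:
            i = r["address_id"]
            if i not in best or r["similarity"] > best[i]["similarity"]:
                best[i] = r
            if r["similarity"] >= matcher.threshold:
                # Good enough: do not pass this address to later matchers.
                pending.pop(i, None)
    return [best[i] for i in ids if i in best]


strict = FakeMatcher({"1 main st": 0.9})
fuzzy = FakeMatcher({"1 main st": 0.7, "2 hgh st": 0.6})
out = pipeline_geocode([strict, fuzzy], ["1 main st", "2 hgh st"])
print([r["similarity"] for r in out])
```

Ordering the matchers from strictest to most tolerant means the more forgiving algorithm only sees the addresses the stricter one could not resolve, which is where the recall gain would come from.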

set_matches(matchers: list[Matcher]) → None

Set the list of Matcher objects for the pipeline.

Parameters:

matchers (list of Matcher) – A list of Matcher objects to replace the current matchers.