michaelort33/clusters

Cluster API & Database Management


Overview

1. app.py

  • Description: A Flask API that exposes the endpoint /get_cluster_ids via POST.
  • Functionality:
    • Receives latitude and longitude (with 4 decimal places).
    • Searches an internal database for the closest cluster (based on minimum distance).
    • If within a defined maximum distance, it categorizes the coordinate under that existing cluster.
    • Otherwise, it creates a new cluster ID (sequentially, based on the last created cluster).
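The assignment rule described above can be sketched roughly as follows. This is a minimal illustration, not the code in app.py: the haversine distance, the in-memory `clusters` dictionary (standing in for the database lookup), and the names `haversine_km`, `assign_cluster`, and `max_distance_km` are all assumptions made for the example.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def assign_cluster(lat, lon, clusters, max_distance_km):
    """Return (cluster_id, created) for a coordinate.

    `clusters` maps cluster ID -> (lat, lon) of a representative point.
    It is a hypothetical in-memory stand-in for the database query.
    """
    lat, lon = round(lat, 4), round(lon, 4)

    # Find the closest existing cluster.
    nearest_id, nearest_dist = None, float("inf")
    for cluster_id, (c_lat, c_lon) in clusters.items():
        dist = haversine_km(lat, lon, c_lat, c_lon)
        if dist < nearest_dist:
            nearest_id, nearest_dist = cluster_id, dist

    # Reuse it when it is close enough.
    if nearest_id is not None and nearest_dist <= max_distance_km:
        return nearest_id, False

    # Otherwise create the next sequential ID.
    new_id = max(clusters, default=-1) + 1
    clusters[new_id] = (lat, lon)
    return new_id, True


clusters = {0: (40.7128, -74.0060), 1: (34.0522, -118.2437)}
print(assign_cluster(40.7300, -74.0000, clusters, max_distance_km=30.0))
print(assign_cluster(51.5074, -0.1278, clusters, max_distance_km=30.0))
```

The first call lands a couple of kilometres from cluster 0 and reuses it; the second is far from both clusters, so a new ID is created after the highest existing one.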

2. clustering.py

  • Description: Contains all the mathematical logic and model creation for clustering.

3. data_retrieval.py

  • Description: Holds all the queries to the database. Primarily used for retrieving data needed by the clustering processes.
  • Additional Function:
    • create_cluster_columns(): Creates the columns cluster_0, cluster_30, and cluster_100 to hold cluster IDs for different maximum radii (0 km, 30 km, and 100 km). (The columns created so far use alpha = 0.1.)
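As a rough idea of what adding these columns can look like, here is a sketch against an in-memory SQLite database. The README does not say which database engine or column type the project uses, so SQLite, the INTEGER type, and the `id`/`lat`/`lon` columns are assumptions for illustration only; the real create_cluster_columns() may differ.

```python
import sqlite3

CLUSTER_COLUMNS = ("cluster_0", "cluster_30", "cluster_100")


def create_cluster_columns(conn, table_name):
    """Add any missing cluster columns to `table_name`; return the ones added.

    `table_name` is interpolated into SQL, so it must be a trusted constant.
    """
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table_name})")}
    added = []
    for column in CLUSTER_COLUMNS:
        if column not in existing:
            conn.execute(f"ALTER TABLE {table_name} ADD COLUMN {column} INTEGER")
            added.append(column)
    conn.commit()
    return added


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mongo_listings2 (id INTEGER PRIMARY KEY, lat REAL, lon REAL)")
print(create_cluster_columns(conn, "mongo_listings2"))  # adds all three columns
print(create_cluster_columns(conn, "mongo_listings2"))  # nothing left to add
```

Checking for existing columns first makes the function safe to run more than once.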

4. df_connection.py

  • Description: Provides functions to connect to the database via tokens.

5. db_connection.py

  • Description: Manages direct database operations.
  • Additional Function:
    • duplicate_table(): Creates a copy of the mongo_listings table named mongo_listings2 (I've used this copy for all the tests).
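A table copy of this kind can be sketched with a single CREATE TABLE ... AS SELECT statement. The example below uses an in-memory SQLite database purely as a stand-in, since the README does not name the database engine; the function signature and the sample rows are hypothetical.

```python
import sqlite3


def duplicate_table(conn, source="mongo_listings", target="mongo_listings2"):
    """Create `target` as a copy of every row and column of `source`.

    Raises sqlite3.OperationalError if `target` already exists.
    Table names are interpolated into SQL, so they must be trusted constants.
    """
    conn.execute(f"CREATE TABLE {target} AS SELECT * FROM {source}")
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mongo_listings (id INTEGER, lat REAL, lon REAL)")
conn.executemany(
    "INSERT INTO mongo_listings VALUES (?, ?, ?)",
    [(1, 40.7128, -74.0060), (2, 34.0522, -118.2437)],
)
duplicate_table(conn)
print(conn.execute("SELECT COUNT(*) FROM mongo_listings2").fetchone()[0])
```

Note that a copy made this way carries over the data but not indexes or constraints, which is usually acceptable for a scratch table used in tests.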

6. assign_clusters_to_db.py

  • Description: Script to assign cluster values to the newly created columns (cluster_0, cluster_30, or cluster_100).
  • Parameters:
    • max_diameter_min (float): Minimum cluster diameter in km.
    • max_diameter_max (float): Maximum cluster diameter in km.
    • alpha (float, between 0 and 1): Penalty factor for the number of clusters (higher values create fewer clusters, lower values create more).
    • cluster_column (str): Specifies which column (cluster_0, cluster_30, or cluster_100) you want to populate with cluster IDs.
  • Warning: This script directly edits the database, so use with caution.
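Because the script writes straight to the database, it can help to validate the parameters before anything runs. The following is a sketch under stated assumptions, not part of assign_clusters_to_db.py: the `ClusterRunConfig` class is hypothetical, and the checks that diameters are non-negative and that max_diameter_max is at least max_diameter_min are inferred from the parameter names.

```python
from dataclasses import dataclass

ALLOWED_COLUMNS = ("cluster_0", "cluster_30", "cluster_100")


@dataclass(frozen=True)
class ClusterRunConfig:
    """Parameters for one clustering run, validated on construction."""

    max_diameter_min: float  # km
    max_diameter_max: float  # km
    alpha: float             # penalty factor, between 0 and 1
    cluster_column: str      # column that will receive the cluster IDs

    def __post_init__(self):
        if self.max_diameter_min < 0:
            raise ValueError("max_diameter_min must be non-negative")
        if self.max_diameter_max < self.max_diameter_min:
            raise ValueError("max_diameter_max must be >= max_diameter_min")
        if not 0.0 <= self.alpha <= 1.0:
            raise ValueError("alpha must be between 0 and 1")
        if self.cluster_column not in ALLOWED_COLUMNS:
            raise ValueError(f"cluster_column must be one of {ALLOWED_COLUMNS}")


# Illustrative values only.
config = ClusterRunConfig(
    max_diameter_min=0.0,
    max_diameter_max=30.0,
    alpha=0.1,
    cluster_column="cluster_30",
)
print(config)
```

Failing early on a bad column name or an out-of-range alpha avoids a partial write to the wrong column.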

7. test_request.py

  • Description: Allows you to test the /get_cluster_ids endpoint directly against an already deployed API.
  • Default Test URL: http://100.26.144.244:5157/get_cluster_ids
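A request to the endpoint might be built along these lines. This is a sketch, not the contents of test_request.py: the JSON field names `latitude` and `longitude`, the JSON response format, and the helper names are assumptions, so check app.py for the payload the API actually expects.

```python
import json
import urllib.request

DEFAULT_URL = "http://100.26.144.244:5157/get_cluster_ids"


def build_request(lat, lon, url=DEFAULT_URL):
    """Build a POST request carrying one coordinate as JSON.

    The payload keys are assumed, not taken from the API.
    """
    payload = {"latitude": round(lat, 4), "longitude": round(lon, 4)}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def get_cluster_ids(lat, lon, url=DEFAULT_URL, timeout=10):
    """Send the request and decode the response, assuming it is JSON."""
    with urllib.request.urlopen(build_request(lat, lon, url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Inspect the request without touching the network.
req = build_request(40.71284, -74.00601)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Calling `get_cluster_ids(40.7128, -74.0060)` sends the request to the deployed API; pass `url=` to target a local instance instead.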

TABLE_NAME is the name of the table used by the Flask endpoint queries.

When ready to switch everything back to mongo_listings, remember to update the TABLE_NAME variable.

Build and Run

docker compose up --build -d
