Sunbelt Computer Software

Profile Resolver Project

Overview & Chapters

A multi-stage system for crawling the web, enriching company data, synchronizing it to databases and Elasticsearch, and exposing a REST API for powerful search capabilities.

Chapters

Requirements

Python 3.x
PostgreSQL latest
Java 17
Elasticsearch latest

Tools

IntelliJ IDEA 2025 (Java API)
PyCharm 2025 (Python scripts)
pgAdmin4 (PostgreSQL management)
Terminal (Git, Docker, Curl)

How to Use

1. Scrape, store, analyze & merge data

1.1 Initialize the database using the seed method in the scraper. 1.2 Run the scraper. Higher timeout and depth values yield richer data. 1.3 Use the unifier to merge additional datasets with your findings.

2. Integrate Elasticsearch with your database

2.1 Start the Elasticsearch container.
2.2 Create the required index.
2.3 Sync data using the provided Python synchronizer script.

3. Query ElasticSearch via a REST API

3.1 Start the API:

mvn spring-boot:run

3.2 Hit the endpoint

http://localhost:8080/api/companies/search?query=key_word_about_company

Time tests scraper

Context:

my machine has 12cores, 24 threads.
max depth per domain = 5.
timeout per request = 3s.

Chunck size = how many threads will python access :

18 = 11m 31s
23 = 10m 34s
24 = 10m 13s

Database diagram

Preview

Hitting the API through Postman, it queries ElasticSearch and respondes with a json.

Name		Name	Last commit message	Last commit date
Latest commit History 22 Commits
API/untitled		API/untitled
Analysis		Analysis
Samples		Samples
Scraper		Scraper
db-backup		db-backup
elastic-search		elastic-search
unificator		unificator
.gitignore		.gitignore
README.md		README.md
db-diagram.png		db-diagram.png

Sunbelt Computer Software

PL/B Language Development and Support

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Profile Resolver Project

Overview & Chapters

Chapters

Requirements

Tools

How to Use

1. Scrape, store, analyze & merge data

2. Integrate Elasticsearch with your database

3. Query ElasticSearch via a REST API

Time tests scraper

Database diagram

Preview

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Sunbelt Computer Software

PL/B Language Development and Support

Folders and files

Latest commit

History

Repository files navigation

Profile Resolver Project

Overview & Chapters

Chapters

Requirements

Tools

How to Use

1. Scrape, store, analyze & merge data

2. Integrate Elasticsearch with your database

3. Query ElasticSearch via a REST API

Time tests scraper

Database diagram

Preview

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages