Github

Fullstack Development
LLM Automation
Geospatial Analysis
Real Estate


GeoEstateChat is a research project exploring how LLMs can enhance the interaction between geospatial analysis and real-estate decision-making. The project studies LLMs as analytical intermediaries that translate natural-language questions into structured, multi-scale geospatial queries, rather than as sources of knowledge.

Geospatial analysis tools are characterized by high analytical complexity and flexibility but low usability, making them largely exclusive to domain experts, while most real-estate platforms prioritize usability at the cost of analytical depth, typically offering only basic attributes such as location and price. By integrating an LLM-driven reasoning layer with a deterministic geospatial backend, GeoEstateChat investigates how user intent, spatial scale, data filtering, and analytical logic can be inferred from ambiguous human language, enabling both high complexity and high usability within a single system. The research focuses on reducing the technical barriers of GIS while preserving analytical rigor, transparency, and reproducibility.

GeoEstateChat positions LLMs as a new infrastructural layer for geospatial research, examining their potential to reshape access, workflow design, and decision-support processes in urban and real-estate contexts.

The project aims to answer below questions.

How can NYC real estate be analyzed geospatially with user friendly platform in a way that modern tools cannot offer

How can geospatial data be made more accessible and intuitive for non-experts?

How can experience design and visualization methods enhance users' ability to explore complex geospatial information?

01
LLM as Query Engine
Can an LLM reliably translate ambiguous natural-language intent into valid, multi-step SQL queries over a geospatial database?
  • Structured system prompt encodes full database schema and spatial operation semantics
  • LLM infers spatial scale, filters, groupings, and sort order from conversational input
  • Deterministic SQL execution ensures reproducible, auditable results
02
Accessibility vs. Analytical Depth
How can a single interface serve both casual users and domain experts without sacrificing usability or analytical rigor?
  • Three interaction modes — Analyze, Search, Compare — target distinct user intents
  • No GIS expertise required: queries expressed as plain language
  • Backend preserves full geospatial complexity including multi-scale joins and spatial aggregations
03
Spatial Visualization
How can query results over spatial data be rendered to make geographic patterns immediately legible?
  • Results mapped to building footprints, street blocks, and neighborhood polygons
  • Dynamic choropleth rendering responds to each query result set
  • Statistical summaries accompany visual outputs to provide analytical context

The Problem

Real-estate decision-making is inherently spatial, yet available tools sit at opposite ends of a usability–complexity spectrum. GIS platforms like QGIS or ArcGIS offer deep analytical capability but require specialized domain knowledge that most users do not have. Consumer platforms like Zillow or StreetEasy are accessible but collapse rich geospatial data into a handful of listing attributes — price, bedrooms, photos. GeoEstateChat occupies the gap between these extremes: the LLM converts natural language into structured queries that are executed against a deterministic geospatial backend, so results are always data-driven and reproducible, never hallucinated.
Dimension GIS Tools (QGIS / ArcGIS) Consumer Platforms (Zillow / StreetEasy) GeoEstateChat
Required expertise High — GIS knowledge required None None — plain language interface
Analytical depth Full spatial analysis capabilities Basic: price, beds, location Multi-scale geospatial queries
Query interface Scripting / GUI tools Filter dropdowns Natural language conversation
Spatial data layers Any, user-configured Point listings only Buildings, street blocks, neighborhoods
Result reproducibility High (deterministic) Dependent on live listings High — LLM generates SQL, backend executes
Onboarding time Hours to days Seconds Seconds

Datatable features

Database Buildings & Streetblocks table features

built year
roof height
ground elevatiion
elevator
building value 2025
building value 2024
building gross sqft
residential gross sqft
building story
zoning
building class
average property value 2025
average property value 2024
value per sqft
Land Area
GeoID
borocode
population
building id number
last status
residential area share
...
The database integrates NYC PLUTO building data with street-block-level population and property value statistics. Features are selected to support both spatial (geometry-based joins) and attribute-level queries, enabling the system to reason about proximity, density, land use, and value simultaneously. All tables are spatially indexed with PostGIS for efficient bounding-box and distance queries at building, block, and neighborhood scales.

Project Structure

Data Flowchart

Once the user query is submitted, it is added with the system instruction to acquire appropriate information to be able to generate SQL for query. Eventually, once the data is extracted from database, it is sent back to frontend with statistically analysis summary and explanation.

LLM Query Pipeline

The core technical challenge is converting loosely structured natural language — "find street blocks near Central Park with high-rise buildings built after 2000" — into syntactically valid, semantically correct PostGIS SQL. GeoEstateChat uses a structured system prompt that encodes the full database schema, available spatial operations, column semantics, and expected output format. The LLM acts as a query planner: it selects tables, infers join conditions, and constructs spatial predicates. The resulting SQL is executed unchanged against the PostGIS backend, ensuring all results are deterministic and data-sourced.
1
User Query
Plain language question submitted via the chat interface
2
Prompt Assembly
Query combined with system prompt, column semantics, and spatial constraints
3
LLM Inference
Model generates structured SQL with appropriate spatial joins and filters
4
PostGIS Execution
SQL executed deterministically against the geospatial database
5
Response
Results returned with statistical summary and plain-language explanation

System Demonstration

Mode: Analyze

Mode: Search

Mode: Compare

Project Demonstration

Technical Stack

Frontend Vanilla JavaScript  ·  MapLibre GL  ·  Canvas API
Backend FastAPI  ·  Uvicorn  ·  Python 3
Database PostgreSQL  ·  PostGIS  ·  Spatial indexing (GIST)
LLM Integration OpenAI API  ·  Structured system prompt  ·  SQL generation pipeline
Geospatial GeoPandas  ·  Shapely  ·  PostGIS functions (ST_Within, ST_Distance, ST_DWithin)
Data Sources NYC PLUTO  ·  Street-block statistics  ·  Neighborhood polygon boundaries

Design Principles

Three principles distinguish GeoEstateChat from both GIS tools and consumer real-estate platforms.
LLM as Translator, Not Oracle
The LLM never invents facts about the city. It translates intent into SQL, and the database provides ground truth. This preserves analytical rigor while enabling natural interaction with no GIS expertise required.
Deterministic Backend
All results originate from SQL executed against real data. The same query returns the same answer regardless of LLM non-determinism at the natural-language level. Results are auditable and reproducible.
Spatial Transparency
The generated SQL is surfaced alongside results, allowing users and researchers to inspect the spatial logic applied, understand filter conditions, and verify how conclusions were reached.