Github
LLM Automation
Geospatial Analysis
Real Estate
GeoEstateChat is a research project exploring how LLMs can enhance the interaction between geospatial
analysis and real-estate decision-making. The project studies LLMs as analytical intermediaries that
translate natural-language questions into structured, multi-scale geospatial queries, rather than as
sources of knowledge.
Geospatial analysis tools are characterized by high analytical complexity and flexibility but
low usability, making them largely exclusive to domain experts, while most real-estate platforms
prioritize usability at the cost of analytical depth, typically offering only basic attributes such as
location and price. By integrating an LLM-driven reasoning layer with a deterministic geospatial backend,
GeoEstateChat investigates how user intent, spatial scale, data filtering, and analytical logic can be
inferred from ambiguous human language, enabling both high complexity and high usability within a single
system. The research focuses on reducing the technical barriers of GIS while preserving analytical rigor,
transparency, and reproducibility.
GeoEstateChat positions LLMs as a new infrastructural layer for geospatial research, examining their potential to reshape access, workflow design, and decision-support processes in urban and real-estate contexts.
The project aims to answer below questions.
GeoEstateChat positions LLMs as a new infrastructural layer for geospatial research, examining their potential to reshape access, workflow design, and decision-support processes in urban and real-estate contexts.
The project aims to answer below questions.
How can NYC real estate be analyzed geospatially with user friendly platform in a way that
modern tools cannot offer
How can geospatial data be made more accessible and intuitive for non-experts?
How can experience design and visualization methods enhance users' ability to explore complex
geospatial information?
01
LLM as Query Engine
Can an LLM reliably translate ambiguous natural-language intent into valid, multi-step SQL queries over a geospatial database?
- Structured system prompt encodes full database schema and spatial operation semantics
- LLM infers spatial scale, filters, groupings, and sort order from conversational input
- Deterministic SQL execution ensures reproducible, auditable results
02
Accessibility vs. Analytical Depth
How can a single interface serve both casual users and domain experts without sacrificing usability or analytical rigor?
- Three interaction modes — Analyze, Search, Compare — target distinct user intents
- No GIS expertise required: queries expressed as plain language
- Backend preserves full geospatial complexity including multi-scale joins and spatial aggregations
03
Spatial Visualization
How can query results over spatial data be rendered to make geographic patterns immediately legible?
- Results mapped to building footprints, street blocks, and neighborhood polygons
- Dynamic choropleth rendering responds to each query result set
- Statistical summaries accompany visual outputs to provide analytical context
The Problem
Real-estate decision-making is inherently spatial, yet available tools sit at opposite ends of a usability–complexity spectrum. GIS platforms like QGIS or ArcGIS offer deep analytical capability but require specialized domain knowledge that most users do not have. Consumer platforms like Zillow or StreetEasy are accessible but collapse rich geospatial data into a handful of listing attributes — price, bedrooms, photos. GeoEstateChat occupies the gap between these extremes: the LLM converts natural language into structured queries that are executed against a deterministic geospatial backend, so results are always data-driven and reproducible, never hallucinated.
| Dimension | GIS Tools (QGIS / ArcGIS) | Consumer Platforms (Zillow / StreetEasy) | GeoEstateChat |
|---|---|---|---|
| Required expertise | High — GIS knowledge required | None | None — plain language interface |
| Analytical depth | Full spatial analysis capabilities | Basic: price, beds, location | Multi-scale geospatial queries |
| Query interface | Scripting / GUI tools | Filter dropdowns | Natural language conversation |
| Spatial data layers | Any, user-configured | Point listings only | Buildings, street blocks, neighborhoods |
| Result reproducibility | High (deterministic) | Dependent on live listings | High — LLM generates SQL, backend executes |
| Onboarding time | Hours to days | Seconds | Seconds |
Datatable features
Database Buildings & Streetblocks table features
built year
roof height
ground elevatiion
elevator
building value 2025
building value 2024
building gross sqft
residential gross sqft
building story
zoning
building class
average property value 2025
average property value 2024
value per sqft
Land Area
GeoID
borocode
population
building id number
last status
residential area share
...
built year
roof height
ground elevatiion
elevator
building value 2025
building value 2024
building gross sqft
residential gross sqft
building story
zoning
building class
average property value 2025
average property value 2024
value per sqft
Land Area
GeoID
borocode
population
building id number
last status
residential area share
...
The database integrates NYC PLUTO building data with street-block-level population and property value statistics. Features are selected to support both spatial (geometry-based joins) and attribute-level queries, enabling the system to reason about proximity, density, land use, and value simultaneously. All tables are spatially indexed with PostGIS for efficient bounding-box and distance queries at building, block, and neighborhood scales.
Project Structure
Data Flowchart
Once the user query is submitted, it is added with the system instruction to acquire appropriate information
to be able to generate SQL for query. Eventually, once the data is extracted from database, it is sent back
to frontend with statistically analysis summary and explanation.
LLM Query Pipeline
The core technical challenge is converting loosely structured natural language — "find street blocks near Central Park with high-rise buildings built after 2000" — into syntactically valid, semantically correct PostGIS SQL. GeoEstateChat uses a structured system prompt that encodes the full database schema, available spatial operations, column semantics, and expected output format. The LLM acts as a query planner: it selects tables, infers join conditions, and constructs spatial predicates. The resulting SQL is executed unchanged against the PostGIS backend, ensuring all results are deterministic and data-sourced.
1
User Query
Plain language question submitted via the chat interface
→
2
Prompt Assembly
Query combined with system prompt, column semantics, and spatial constraints
→
3
LLM Inference
Model generates structured SQL with appropriate spatial joins and filters
→
4
PostGIS Execution
SQL executed deterministically against the geospatial database
→
5
Response
Results returned with statistical summary and plain-language explanation
System Demonstration
Mode: Analyze
Mode: Search
Mode: Compare
Project Demonstration
Technical Stack
| Frontend | Vanilla JavaScript · MapLibre GL · Canvas API |
| Backend | FastAPI · Uvicorn · Python 3 |
| Database | PostgreSQL · PostGIS · Spatial indexing (GIST) |
| LLM Integration | OpenAI API · Structured system prompt · SQL generation pipeline |
| Geospatial | GeoPandas · Shapely · PostGIS functions (ST_Within, ST_Distance, ST_DWithin) |
| Data Sources | NYC PLUTO · Street-block statistics · Neighborhood polygon boundaries |
Design Principles
Three principles distinguish GeoEstateChat from both GIS tools and consumer real-estate platforms.
LLM as Translator, Not Oracle
The LLM never invents facts about the city. It translates intent into SQL, and the database provides ground truth. This preserves analytical rigor while enabling natural interaction with no GIS expertise required.
Deterministic Backend
All results originate from SQL executed against real data. The same query returns the same answer regardless of LLM non-determinism at the natural-language level. Results are auditable and reproducible.
Spatial Transparency
The generated SQL is surfaced alongside results, allowing users and researchers to inspect the spatial logic applied, understand filter conditions, and verify how conclusions were reached.