Explorentory
GitHub
AI-Powered Rental Discovery Platform
Machine Learning · LLM Integration · Geospatial Visualization
Stack:
FastAPI · PostgreSQL + PostGIS
scikit-learn · OpenAI API
MapLibre GL · Vanilla JavaScript
Overview
Explorentory is a full-stack, AI-powered NYC rental discovery platform. Rather than
mimicking the filter-first paradigm of traditional real estate search — where users must
commit to a price cap and bedroom count before seeing anything — Explorentory inverts the
process. Users express natural-language preferences, explore a neighborhood on a live map,
rate a curated sample of real properties, and receive personalized recommendations ranked
by an OLS regression model trained on their own feedback. Three LLM-powered endpoints
provide plain-English context at every stage. The platform operates on a PostGIS database
of approximately 3 million synthetic NYC units derived from PLUTO, with MapLibre GL
rendering up to 5,000 results as an interactive choropleth map across 9 data dimensions.
Research Questions
01 AI-Driven Interaction
How can an AI-assisted experience translate natural language preferences into personalized rental recommendations?
- 4-step guided flow converts free-text intent and ranked priorities into structured API calls
- /explain and /explain_result use GPT to narrate property matches and ML insights
- /chat parses natural language ("show me quiet studios under $2,500") into structured filter and sort directives applied client-side
02 Machine Learning Customization
How can machine learning adapt rental recommendations based on user behavior, constraints, and urban data?
- OLS regression trained live on 10 user-rated properties to infer latent preferences
- 11 engineered features — rent, sqft, bedroom/bathroom diff, proximity, noise, building age
- Hybrid final score: (rule_score + ml_score) / 2
- StandardScaler fit on the 3M-record dataset keeps feature scaling stable despite sparse training data
03 Visualization & Experience Design
How can interactive visualizations clarify housing information and trade-offs in rental decision-making?
- Choropleth map with 9 switchable data dimensions
- Histogram showing result distribution vs. user's stated target value
- Radar triangle chart surfacing multi-axis trade-offs across 3–6 dimensions
- Dark / bright mode toggle with fully independent color palettes
Rethinking Rental Search
Traditional real estate platforms — Zillow, StreetEasy, Redfin — are built around a
filter-first paradigm: users must specify price, bedrooms, and neighborhood before seeing
any results. This forces premature decisions and obscures trade-offs. If a user's exact
criteria return zero results, they must manually adjust filters and re-search. Explorentory
replaces filter walls with ranked recommendations. The system learns what users actually
value from their rating behavior and surfaces properties that balance competing preferences,
including ones the user never explicitly stated.
| Feature | Zillow · StreetEasy · Redfin | Explorentory |
|---|---|---|
| Search paradigm | Hard filters — price range, beds, neighborhood set before results appear | Trade-off ranking — no filter walls, results always present |
| Personalization | None, or generic saved search alerts | OLS regression trained live on user's own 10 survey ratings |
| Result explanations | None — user must interpret listings manually | LLM-generated 2–3 sentence narrative per property |
| Neighborhood selection | Dropdown list or typed name search | Click directly on polygon boundaries of a 260-neighborhood live map |
| Data dimensions shown | Listing photos + price, beds, sqft | 9 choropleth modes + histogram distribution + radar triangle |
| Conversational interface | None | /chat — natural language filter and sort over live results |
| Trade-off awareness | User must infer trade-offs manually across listings | Radar triangle makes multi-dimensional trade-offs explicit |
| ML feedback loop | None | Survey ratings feed directly into the recommendation model each session |
| Data scale | Actual live listings (sparse, uneven coverage) | ~3M synthetic NYC units — dense and uniform across all neighborhoods |
User Experience Pipeline
The experience is structured as a four-step guided flow. Each step collects a different
layer of preference signal, from explicit numerical constraints, to geographic intent,
to implicit behavioral feedback, progressively building the data needed to produce a
personalized recommendation.
Step 1 — Preferences Modal
Step 2 — Neighborhood Selection Map
Step 3 — Property Rating Survey
Step 4 — Results Panel & Chat
System Architecture
Explorentory is a three-tier system. A PostGIS spatial database holds approximately
3 million property records and 260 neighborhood polygons. A FastAPI backend exposes
six REST endpoints — three for data retrieval and ML recommendation, three for
OpenAI-powered natural language features. A Vanilla JavaScript frontend renders results
with MapLibre GL and two custom Canvas-based charts, with no front-end framework dependency.
UX Features
01 Preference Modal
The preference modal is the entry point. A rent slider with live $ display
($1,500–$10,000 at $50 steps) sets the budget anchor. Bedroom and bathroom
numeric inputs define base constraints. A priority ranking interface lets the user
click Rent, Location, and Sqft in order of importance — these clicks assign
rule-based scoring weights of 3×, 2×, and 1× respectively. A free-text concern
field (e.g., "quiet street, close to a park") is passed verbatim into all
subsequent LLM prompts, grounding AI explanations in the user's stated intent.
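The click-to-weight mapping described above can be sketched in a few lines. This is a minimal illustration, not the project's code: the function name `priority_weights` and the lowercase keys `rent`, `location`, and `sqft` are assumptions made for the example.

```python
def priority_weights(click_order):
    """Map the order in which priorities were clicked to 3x, 2x, 1x weights.

    `click_order` is a hypothetical list such as ["location", "rent", "sqft"].
    """
    expected = {"rent", "location", "sqft"}
    if len(click_order) != 3 or set(click_order) != expected:
        raise ValueError(f"expected each of {sorted(expected)} exactly once")
    # First click gets weight 3, second gets 2, third gets 1.
    return dict(zip(click_order, (3, 2, 1)))


print(priority_weights(["location", "rent", "sqft"]))
# {'location': 3, 'rent': 2, 'sqft': 1}
```

Rejecting incomplete or repeated clicks keeps the three weights distinct, which the 3×, 2×, 1× scheme relies on.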
Preference Modal Screenshot
02 Neighborhood Selection Map
The neighborhood map renders all 260 NYC neighborhood polygons via the
/neighborhoods endpoint, which reprojects geometry from EPSG:2263
to WGS84 and returns GeoJSON. Hovering highlights a boundary; clicking confirms
the selection with a colored fill. The centroid of the chosen polygon is sent to
/properties, which filters the 3M-unit database to properties
within rent and bedroom tolerances and computes Euclidean distance from the
centroid. Stratified sampling guarantees at least 5 of the 10 returned survey
properties come from the user's chosen borough, preventing geographic bias.
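One way to read the stratified sampling step is sketched below, assuming candidates have already been filtered by rent and bedroom tolerance. The function name, the `borocode` field, and the `seed` parameter are hypothetical; the actual /properties implementation may do this in SQL instead.

```python
import random


def stratified_survey_sample(candidates, chosen_boro, k=10, min_local=5, seed=None):
    """Pick k survey properties, reserving min_local slots for the chosen borough.

    Each candidate is a dict with a hypothetical "borocode" key. If the borough
    has fewer than min_local candidates, all of them are used.
    """
    rng = random.Random(seed)
    local_idx = [i for i, p in enumerate(candidates) if p["borocode"] == chosen_boro]
    # Reserve the guaranteed in-borough picks first.
    guaranteed = rng.sample(local_idx, min(min_local, len(local_idx)))
    taken = set(guaranteed)
    # Fill the remaining slots from everything not already picked.
    rest = [i for i in range(len(candidates)) if i not in taken]
    extra = rng.sample(rest, min(k - len(guaranteed), len(rest)))
    chosen = guaranteed + extra
    rng.shuffle(chosen)  # avoid showing in-borough cards first
    return [candidates[i] for i in chosen]
```

Filling the remaining slots from the whole pool, rather than only from other boroughs, yields "at least 5" local properties instead of exactly 5.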
Neighborhood Map Screenshot
03 Rating Survey
Ten property cards are presented in a scrollable list, each showing
neighborhood, rent, square footage, and bed/bath count. Clicking a card triggers
map.flyTo() and highlights the building footprint polygon on the map in
#63adf2; unselected properties render in #1a6bc0. All 10 polygons and
location pins are simultaneously visible, giving the user spatial
context when assigning ratings. Each card receives a 0–10 score before submission,
after which the ratings are sent to /recommend.
Rating Survey Screenshot
04 Results Panel
The results panel renders up to 5,000 top-scored properties as a MapLibre GL
choropleth. A toolbar switches between 9 data dimensions: overall score, rent,
sqft, year built, building stories, elevator availability, park distance, subway
distance, and noise level. A top-10 ranked card list appears in the sidebar;
clicking a card highlights the corresponding building footprint and exposes an
"Explain" button that calls /explain for a 2–3 sentence GPT match narrative.
An "Explain Result" button at the top calls /explain_result, which translates
the OLS regression coefficients
into a plain-English preference summary such as "Your ratings show you value
proximity to green space more than building age."
Results Panel Screenshot
05 Natural Language Chat
A chat interface at the bottom of the results panel accepts natural language
queries against the live result set. The /chat endpoint forwards the user
message and conversation history to GPT with a structured system prompt that
defines JSON output schemas for filter operations
({"filters": [...], "logic": "AND"}) and sort operations
({"sort": [{...}]}). The prompt embeds mappings for all 260
neighborhoods, building class codes, and all column semantics. The returned JSON
is applied client-side, re-ranking the displayed properties without an additional
server round-trip. A reset button restores the original recommendation order.
Dark and bright modes each have their own complete MapLibre style JSON and chart
color palette.
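To make the filter and sort directives concrete, here is a sketch of how such a JSON payload could be applied to a result set. In Explorentory this step runs client-side in JavaScript; Python is used here only to show the semantics. The shape of each filter item (`field`, `op`, `value`), each sort item (`field`, `order`), and the `noise` column name are assumptions, since the page does not show the schema contents.

```python
import operator

# Hypothetical operator vocabulary; the real schema is defined in the system prompt.
OPS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "==": operator.eq,
}


def apply_directive(properties, directive):
    """Filter, then sort, a list of property dicts according to a directive dict."""
    rows = list(properties)
    filters = directive.get("filters", [])
    if filters:
        # "AND" requires every condition to hold; anything else is treated as "OR".
        combine = all if directive.get("logic", "AND") == "AND" else any
        rows = [
            r for r in rows
            if combine(OPS[f["op"]](r[f["field"]], f["value"]) for f in filters)
        ]
    # Stable sorts applied from the last key to the first give a multi-key ordering.
    for key in reversed(directive.get("sort", [])):
        rows.sort(key=lambda r: r[key["field"]], reverse=key.get("order", "asc") == "desc")
    return rows
```

Because the directive is plain data, the same payload can be re-applied or discarded (the reset button) without contacting the server again.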
Chat Interface & Dark Mode Screenshot
Visualization & Charts
Explorentory includes two custom Canvas-based charts, built without a charting library.
This allows fine-grained pixel control, seamless dark/bright mode switching, and
eliminates a dependency on external rendering engines.
Histogram
Histogram Screenshot
Shows the distribution of the selected metric across all 5,000 recommended
properties. Bin width adapts per field: $10/bin for rent, 50 sqft/bin for
floor area. A red vertical line marks the user's stated target value, making
it immediately visible whether the recommendation set centers on the stated
preference or skews toward trade-offs. Updates live when the choropleth mode
is switched.
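The per-field binning can be illustrated with a short sketch. The bin widths come from the text above; the function names and the choice to label each bin by its lower edge are assumptions.

```python
import math
from collections import Counter

# Bin widths stated in the text: $10 for rent, 50 sqft for floor area.
BIN_WIDTH = {"rent": 10, "sqft": 50}


def bin_start(value, width):
    """Lower edge of the bin that contains `value`."""
    return math.floor(value / width) * width


def histogram(values, width):
    """Return sorted (bin_start, count) pairs for fixed-width bins."""
    counts = Counter(bin_start(v, width) for v in values)
    return sorted(counts.items())


rents = [2495, 2499, 2500, 2510, 2519]
print(histogram(rents, BIN_WIDTH["rent"]))
# [(2490, 2), (2500, 1), (2510, 2)]
print(bin_start(2500, BIN_WIDTH["rent"]))  # bin where the red target line falls
# 2500
```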
Radar Triangle
Radar Triangle Screenshot
A multi-axis spider chart with 3–6 dimensions: Rent, Location Distance, Sqft,
Subway Distance, Green Space, and Noise Level. Each axis is min-max normalized
across the full result set. An inner polygon represents the user's stated
priority weights; an overlay polygon shows the selected property's actual profile,
making multi-dimensional trade-offs legible at a glance rather than buried in
a list of numbers.
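The per-axis min-max normalization can be sketched as follows. The function name and field names are hypothetical, and the handling of a constant axis (returning 0.5) is an assumption; the page also does not say whether "lower is better" axes such as rent are inverted, so this sketch leaves them as-is.

```python
def radar_profile(results, selected, axes):
    """Scale the selected property's value on each axis to [0, 1].

    The minimum and maximum are taken over the full result set, so the
    profile shows where the property sits relative to all recommendations.
    """
    profile = {}
    for axis in axes:
        column = [r[axis] for r in results]
        lo, hi = min(column), max(column)
        # Assumption: a constant axis is drawn at the midpoint.
        profile[axis] = 0.5 if hi == lo else (selected[axis] - lo) / (hi - lo)
    return profile


results = [
    {"rent": 2000, "sqft": 500},
    {"rent": 3000, "sqft": 700},
    {"rent": 4000, "sqft": 900},
]
print(radar_profile(results, results[1], ["rent", "sqft"]))
# {'rent': 0.5, 'sqft': 0.5}
```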
Machine Learning Model
The recommendation engine combines a deterministic rule-based scorer with an OLS
regression model trained in real time on the user's survey ratings. With only 10
training samples and 11 features, the least-squares fit is underdetermined, so careful
feature engineering, global scaling, and the rule-based component are all needed to
limit overfitting.
Feature Engineering
Raw property attributes are converted to user-relative differences before training.
For example, bedroomnum becomes |bedroomnum − target_bedrooms|,
aligning the feature with what the user actually cares about — deviation from their
preference — rather than the raw count. borocode_match is a binary flag
indicating whether the property is in the user's chosen borough. All 11 engineered
features are scaled with a StandardScaler fit on the entire 3M-record
dataset, providing stable mean and variance that do not shift with 10-sample noise.
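A sketch of the user-relative transformation and global scaling, covering three of the 11 features. Only `bedroomnum`, `target_bedrooms`, and `borocode_match` appear in the text; `bathroomnum`, `target_bathrooms`, `borocode`, and the function names are hypothetical, and the means and standard deviations are assumed to be precomputed over the full dataset (which is what fitting a StandardScaler on it provides).

```python
def engineer_features(prop, prefs):
    """Convert raw attributes into deviations from the user's stated targets."""
    return {
        "bedroom_diff": abs(prop["bedroomnum"] - prefs["target_bedrooms"]),
        "bathroom_diff": abs(prop["bathroomnum"] - prefs["target_bathrooms"]),
        # Binary flag: 1 if the property is in the user's chosen borough.
        "borocode_match": 1 if prop["borocode"] == prefs["borocode"] else 0,
    }


def standardize(features, global_mean, global_std):
    """Z-score each feature with statistics from the full dataset,
    not from the 10 rated samples."""
    return {
        name: (value - global_mean[name]) / global_std[name]
        for name, value in features.items()
    }
```

Scaling with global statistics means a given property maps to the same feature vector in every session, regardless of which 10 properties the user happened to rate.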
Rule-Based Score — explicit priorities weighted 3×, 2×, 1×
rule_score = Σ weight[i] × normalize(feature[i], direction[i])
ML Score — OLS prediction rescaled to [0, 1]
ml_score = (OLS_prediction − min) / (max − min)
Final Hybrid Score
final_score = (rule_score + ml_score) / 2
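The three formulas can be combined in a short sketch. All function and parameter names are hypothetical. Dividing the weighted sum by the total weight is an assumption added here so that rule_score lands in [0, 1] and is comparable with ml_score before averaging; the formula as written does not show that step.

```python
def normalize(value, lo, hi, direction):
    """Min-max scale to [0, 1]; direction=-1 means lower raw values score higher."""
    if hi == lo:
        return 0.5  # assumption: a constant feature is neutral
    scaled = (value - lo) / (hi - lo)
    return scaled if direction > 0 else 1.0 - scaled


def rule_score(features, weights, ranges, directions):
    """Weighted sum of normalized features, using the 3x/2x/1x priority weights."""
    raw = sum(
        weights[name] * normalize(features[name], *ranges[name], directions[name])
        for name in weights
    )
    return raw / sum(weights.values())  # assumption: rescale to [0, 1]


def rescale_predictions(predictions):
    """ml_score = (OLS_prediction - min) / (max - min) over all candidates."""
    lo, hi = min(predictions), max(predictions)
    if hi == lo:
        return [0.5] * len(predictions)
    return [(p - lo) / (hi - lo) for p in predictions]


def final_score(rule, ml):
    """Hybrid score: the unweighted mean of the two components."""
    return (rule + ml) / 2
```

For example, a property that is best on every prioritized feature (rule_score of 1.0) but sits mid-range in the OLS predictions (ml_score of 0.5) gets a final score of 0.75.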
Design Rationale
The hybrid approach serves two purposes. The rule-based component ensures predictable,
interpretable behavior that respects user-stated priorities even when the ML signal is
noisy — which is inevitable with only 10 training samples. The ML component captures
implicit preferences that the user never explicitly stated: a consistent preference for
newer buildings, quieter streets, or larger units that emerged from rating behavior.
Averaging the two scores prevents either component from dominating and produces a more
robust, balanced final ranking. OLS coefficients are passed to /explain_result
and translated into natural language, giving users a legible account of what the
model learned from their feedback.
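Before coefficients are handed to an LLM, they can be ranked so the prompt leads with the strongest signals. The sketch below is one plausible preprocessing step, not the project's actual prompt construction; the function name and feature names are hypothetical. Because all features are standardized, coefficient magnitudes are directly comparable.

```python
def describe_coefficients(coefs, top_n=3):
    """Rank coefficients by absolute size and describe each one's direction."""
    ranked = sorted(coefs.items(), key=lambda kv: abs(kv[1]), reverse=True)[:top_n]
    return [
        f"{name}: {'raises' if c > 0 else 'lowers'} predicted rating (coef {c:+.2f})"
        for name, c in ranked
    ]


coefs = {"park_dist": -0.8, "building_age": -0.1, "sqft": 0.4}
for line in describe_coefficients(coefs, top_n=2):
    print(line)
# park_dist: lowers predicted rating (coef -0.80)
# sqft: raises predicted rating (coef +0.40)
```

Here a large negative coefficient on park distance and a small one on building age would support a summary like the example quoted earlier, in which green space matters more than building age.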
Technical Stack
| Layer | Technologies |
|---|---|
| Frontend | Vanilla JavaScript · MapLibre GL 3.6.1 · Canvas API (custom charts) |
| Backend | FastAPI 0.121.2 · Uvicorn · Python 3.12 |
| Database | PostgreSQL 16 · PostGIS (spatial extension) · EPSG:2263 → EPSG:4326 |
| Machine Learning | scikit-learn 1.7.2 — OLS LinearRegression · StandardScaler |
| Geospatial | GeoPandas 1.1.1 · Shapely · Pandas · NumPy |
| LLM Integration | OpenAI SDK — GPT with structured JSON output (3 endpoints) |
| Data | ~3M synthetic NYC units (KNN-imputed from PLUTO) · 260 neighborhood polygons |