GitHub
Python Package
Data Wrangling
Data Pipeline Tooling
The package provides utilities to turn tables into merge-ready structures, measure similarity between tables and columns, and automatically suggest likely merge keys based on patterns in the data. It also includes diagnostic tools that explain why a merge succeeded or failed, highlighting issues such as low key uniqueness, key mismatches, duplicates, or unexpected row expansion, so you can fix problems quickly and merge with confidence.
*mergeprep was created with cookiecutter and the py-pkgs-cookiecutter template.
Details are available on the GitHub page linked above.
Functions
1. calc_match_rate()
Compares every column pair between two tables and quantifies how much their values overlap to identify likely merge keys.
2. convert_style()
Standardizes values from two columns into a common format so that differently written but equivalent entries can be merged reliably.
3. similarity_mapping()
Analyzes two columns to find and rank similar values, producing a mapping that aligns mismatched labels across tables.
4. merge_with_mapping()
Merges two tables using a shared canonical key derived from optional value mappings, while preserving the original merge context.
5. diagnose_merge()
Summarizes a merge by reporting match rates, row counts, and value conversions to explain why the merge succeeded or failed.
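To make the ideas behind calc_match_rate() and diagnose_merge() concrete, here is a minimal, dependency-free sketch. It is not mergeprep's implementation or API: the helper names match_rates and diagnose_keys, their parameters, and the dict-of-lists table format are all assumptions made for illustration, and the package itself may work quite differently.

```python
from collections import Counter
from itertools import product


def match_rates(left, right):
    """Rank column pairs by the share of distinct left values also found on the right.

    `left` and `right` are hypothetical stand-ins for tables: dicts mapping
    a column name to a list of values (not mergeprep's actual input type).
    """
    rates = []
    for (lcol, lvals), (rcol, rvals) in product(left.items(), right.items()):
        lset, rset = set(lvals), set(rvals)
        rate = len(lset & rset) / len(lset) if lset else 0.0
        rates.append((lcol, rcol, rate))
    # Highest overlap first: the top pairs are the likeliest merge keys.
    return sorted(rates, key=lambda r: r[2], reverse=True)


def diagnose_keys(left_keys, right_keys):
    """Summarize what an inner join on these key columns would run into."""
    lcount, rcount = Counter(left_keys), Counter(right_keys)
    shared = lcount.keys() & rcount.keys()
    return {
        "left_rows": len(left_keys),
        "right_rows": len(right_keys),
        # Keys present on one side only (key mismatches).
        "left_unmatched": sorted(lcount.keys() - shared),
        "right_unmatched": sorted(rcount.keys() - shared),
        # Keys that repeat on a side (low key uniqueness).
        "left_duplicates": sorted(k for k, n in lcount.items() if n > 1),
        "right_duplicates": sorted(k for k, n in rcount.items() if n > 1),
        # Each shared key yields (left count x right count) rows in an inner join.
        "inner_join_rows": sum(lcount[k] * rcount[k] for k in shared),
    }


# Illustrative data only.
customers = {"cust_id": [1, 2, 3, 4], "city": ["Oslo", "Lima", "Pune", "Oslo"]}
orders = {"customer": [2, 3, 3, 5], "amount": [10, 25, 40, 15]}

ranked = match_rates(customers, orders)
report = diagnose_keys(customers["cust_id"], orders["customer"])

print(ranked[0])
print(report)
```

With this sample data, the top-ranked pair is cust_id and customer, with half of the distinct customer IDs found in the orders table. The report shows which keys have no partner on the other side, which keys repeat, and how many rows an inner join would produce. A real tool would also need to handle missing values and type mismatches, which this sketch ignores.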