Multi-Source Metadata Enrichment
Missing abstracts, DOIs, ORCIDs or categories weaken every downstream analysis. BibexPy fills them with a fetch-once-fill-all strategy across seven authoritative sources — writing only verifiable values, never machine-learning inference.

The sources
| Source | Notes | | --- | --- | | CrossRef | Free; email recommended (polite pool) | | OpenAlex | Free | | Scopus | Requires an API key (institutional) | | DataCite | Free | | Unpaywall | Free; email required | | Europe PMC | Free | | Semantic Scholar | Free; optional key raises rate limits |
Keys/emails are configured on the in-app Settings page — all optional except where noted; sources without credentials are skipped gracefully.
Fetch once, fill all
For each DOI, metadata is retrieved a single time from each source, then every missing field is populated under a predefined source-priority hierarchy. This replaces v1's sequential field-level strategy and reduces redundant API requests dramatically.
Reverse-DOI recovery
Records lacking a DOI go through reverse resolution: candidate DOIs are recovered from title, year and author information, then validated by Jaro–Winkler title similarity — reliability over recall. Unvalidated candidates are discarded, not written.
Identity & category fields
- Identity integration: ORCID/OI, ResearcherID/RI, ROR and country codes/CC are pulled in for downstream disambiguation and aggregation.
- Category cross-fill: WC ↔ SC subject categories are harmonized through deterministic mapping.
No hallucinated metadata
BibexPy writes only verifiable external values or deterministic rule outputs. Nothing is inferred by a generative model — your dataset never contains made-up DOIs or abstracts.
