BibexPy — V2 Helium

Multi-Source Metadata Enrichment

Missing abstracts, DOIs, ORCIDs or categories weaken every downstream analysis. BibexPy fills them with a fetch-once-fill-all strategy across seven authoritative sources — writing only verifiable values, never machine-learning inference.

Enrichment diagram
Reverse-DOI recovery → single fetch per DOI → fixed source-priority fill — verifiable values only.

The sources

| Source | Notes | | --- | --- | | CrossRef | Free; email recommended (polite pool) | | OpenAlex | Free | | Scopus | Requires an API key (institutional) | | DataCite | Free | | Unpaywall | Free; email required | | Europe PMC | Free | | Semantic Scholar | Free; optional key raises rate limits |

Keys/emails are configured on the in-app Settings page — all optional except where noted; sources without credentials are skipped gracefully.

Fetch once, fill all

For each DOI, metadata is retrieved a single time from each source, then every missing field is populated under a predefined source-priority hierarchy. This replaces v1's sequential field-level strategy and reduces redundant API requests dramatically.

Reverse-DOI recovery

Records lacking a DOI go through reverse resolution: candidate DOIs are recovered from title, year and author information, then validated by Jaro–Winkler title similarity — reliability over recall. Unvalidated candidates are discarded, not written.

Identity & category fields

  • Identity integration: ORCID/OI, ResearcherID/RI, ROR and country codes/CC are pulled in for downstream disambiguation and aggregation.
  • Category cross-fill: WC ↔ SC subject categories are harmonized through deterministic mapping.

No hallucinated metadata

BibexPy writes only verifiable external values or deterministic rule outputs. Nothing is inferred by a generative model — your dataset never contains made-up DOIs or abstracts.