Patent structure extraction

Turn patent PDFs into usable molecular data.

Structured rows. Molecule fields. Review-ready outputs.

456 structured rows in the anonymized chemistry benchmark
433 of those rows included SMILES in that benchmark run
57.2 minutes total wall time in the public benchmark snapshot

What you get

1. PDF intake

Send a single patent PDF or a small set of representative documents.

2. Structure pass

Extract chemistry-heavy content as structured rows instead of leaving it locked in page images.

3. Review output

Return files your team can inspect, filter, and QC.

Useful outputs

  • SMILES and structure-heavy rows
  • IUPAC names and chemistry mentions in the text
  • Bioactivity context, such as IC50 values, where the document provides evidence

Best for

  • Chemistry-heavy patents with images, schemes, and dense tables
  • Private pilot work after a first-pass target search has narrowed the set
  • Row-level outputs for medicinal chemistry or review workflows
  • Teams that want evidence-linked outputs, not only OCR text

FAQ

Do you need a whole patent set, or can I send a single PDF?

One PDF is enough to start. A single chemistry-heavy patent can be a good trial case.

What kinds of fields can PatWinnow return?

Depending on the document, useful fields can include structure rows, SMILES, IUPAC names, and evidence-linked IC50 or other bioactivity context.

Is structure extraction always the first step?

Usually not. It is often better to search first, filter the set, and then run deeper extraction only on the strongest documents.

What should I send for a structure extraction trial?

Send one or more representative PDFs, the fields that matter most, and any review deadline.

Request a trial

Send a few representative PDFs and the output fields you care about.

If you already know which patents are high-value, start there. If not, we can scope the search step first and then move into deeper extraction.