A basic test of OpenAI’s Structured Output feature against financial disclosure reports and a newspaper’s police blotter. Code examples use the Python SDK and pydantic for the schema definition…
A basic test of OpenAI's Structured Output feature against financial disclosure reports and a newspaper's police blotter. Code examples use the Python SDK and pydantic for the schema defini…
Read in full here:
README.openai-structured-output-demo.md
# Extracting financial disclosure reports and police blotter narratives using OpenAI's Structured Output
> **tl;dr** this demo shows how to call OpenAI's [gpt-4o-mini model](https://openai.com/index/gpt-4o-mini-advancing-cost-efficient-intelligence/), provide it with URL of a screenshot of a document, and extract data that follows a schema you define. The results are pretty solid even with little effort in defining the data — and no effort doing data prep. OpenAI's API could be a cost-efficient tool for large scale data gathering projects involving public documents.
OpenAI announced [Structured Outputs for its API](https://openai.com/index/introducing-structured-outputs-in-the-api/), a feature that allows users to specify the fields and schema of extracted data, and guarantees that the JSON output will follow that specification.
For example, given a Congressional financial disclosure report, with assets defined in a table like this:
<img width="859" alt="image" src="https://gist.github.com/user-attachments/assets/e64c7ad1-d7af-4e51-a3f2-5961fde4fac3">
This file has been truncated. show original
extract-basic-financial-disclosure.py
#!/usr/bin/env python3
"""
extract-basic-financial-disclosure.py
Parses and extracts structured data — and lets the model infer the structure by itself —
from the screenshot at the given URL:
https://gist.github.com/user-attachments/assets/9c35e7a4-e6b7-4d5b-a4a2-a62b6ec28504
Full financial disclosure report:
This file has been truncated. show original
extract-financial-disclosure.py
#!/usr/bin/env python3
"""
extract-financial-disclosure.py
Parses and extracts structured data from the screenshot at the given URL:
https://gist.github.com/user-attachments/assets/9c35e7a4-e6b7-4d5b-a4a2-a62b6ec28504
Full financial disclosure report:
This file has been truncated. show original
There are more than three files. show original
This thread was posted by one of our members via one of our news source trackers.