In [11]:
!guardrails hub install hub://guardrails/valid_length --quiet
!guardrails hub install hub://guardrails/two_words --quiet
!guardrails hub install hub://guardrails/valid_range --quiet
!guardrails hub install hub://guardrails/lowercase --quiet
!guardrails hub install hub://guardrails/one_line --quiet

%pip install pypdfium2



# Extracting entities from a Terms of Service document

!!! note
    To download this example as a Jupyter notebook, click [here](https://github.com/ShreyaR/guardrails/blob/main/docs/examples/extracting_entities.ipynb).

In this example, we will use Guardrails to extract key information from a Terms-of-Service document.

## Objective

We want to extract structured information about all fees and interest rates associated with the Chase credit card.

## Step 0: Download PDF and load it as string

To get started, download the document from [here](https://github.com/ShreyaR/guardrails/blob/main/docs/examples/data/chase_card_agreement.pdf) and save it in `data/chase_card_agreement.pdf`.

Guardrails has some built-in functions to help with common tasks. Here, we will use the `read_pdf` function to load the PDF as a string.

In [2]:
import guardrails as gd

from rich import print

content = gd.docs_utils.read_pdf("data/chase_card_agreement.pdf")

print(f"Chase Credit Card Document:\n\n{content[:275]}\n...")



## Step 1: Create the Pydantic RAIL model

Here, we request:

1. A list of the fees associated with the card. For each fee we ask for several fields, each with its own quality criteria and corrective action.
2. A list of objects (i.e. key-value pairs) for the interest rates, one per account type.

In [3]:
from guardrails.hub import LowerCase, TwoWords, OneLine
from pydantic import BaseModel, Field

prompt = """
Given the following document, answer the following questions. If the answer doesn't exist in the document, enter 'None'.

${document}

${gr.complete_xml_suffix_v2}
"""


class Fee(BaseModel):
    name: str = Field(validators=[LowerCase(on_fail="fix"), TwoWords(on_fail="reask")])
    explanation: str = Field(validators=[OneLine(on_fail="noop")])
    value: float = Field(description="The fee amount in USD or as a percentage.")


class AccountFee(BaseModel):
    account_type: str = Field(validators=[LowerCase(on_fail="fix")])
    rate: float = Field(
        description="The annual percentage rate (APR) for the account type."
    )


class CreditCardAgreement(BaseModel):
    fees: list[Fee] = Field(
        description="What fees and charges are associated with my account?"
    )
    interest_rates: list[AccountFee] = Field(
        description="What are the interest rates offered by the bank on different kinds of accounts and products?"
    )
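
To make the quality criteria above more concrete, here is a minimal plain-Python sketch of what the three checks are meant to enforce. The helper names (`is_lowercase`, `is_two_words`, `is_one_line`, `check_fee_name`) are hypothetical stand-ins written for illustration; they are not the implementation of the `LowerCase`, `TwoWords`, or `OneLine` validators from `guardrails.hub`, which may differ in detail.

```python
# Illustrative stand-ins only; the real validators come from guardrails.hub.


def is_lowercase(value: str) -> bool:
    """Intent of LowerCase: the string contains no uppercase letters."""
    return value == value.lower()


def is_two_words(value: str) -> bool:
    """Intent of TwoWords: exactly two whitespace-separated words."""
    return len(value.split()) == 2


def is_one_line(value: str) -> bool:
    """Intent of OneLine: the string contains no line breaks."""
    return len(value.splitlines()) <= 1


def check_fee_name(name: str) -> tuple[str, bool]:
    """Imitate the `name` field: fix the casing, then flag a possible reask."""
    fixed = name.lower()  # on_fail="fix": repair the value programmatically
    needs_reask = not is_two_words(fixed)  # on_fail="reask": ask the LLM again
    return fixed, needs_reask


print(check_fee_name("Annual Fee"))
print(check_fee_name("Late Payment Fee"))
```

In this sketch a casing problem can be repaired locally, while a name with the wrong number of words cannot be repaired without guessing, which is roughly why the model above pairs `LowerCase` with `"fix"` and `TwoWords` with `"reask"`.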

## Step 2: Create a `Guard` object with the RAIL Spec

We create a `gd.Guard` object that will check, validate and correct the output of the LLM. This object:

1. Enforces the quality criteria specified in the Pydantic RAIL spec.
2. Takes corrective action when the quality criteria are not met.
3. Compiles the schema and type info from the RAIL spec and adds it to the prompt.

In [4]:
guard = gd.Guard.for_pydantic(output_class=CreditCardAgreement)

As we can see, a few formatters weren't supported. These formatters won't be enforced in the output, but this information can still be used to generate a prompt.

We see the prompt that will be sent to the LLM. The `${document}` placeholder is substituted with the user-provided value at runtime.
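
As an analogy for how the `${document}` placeholder gets filled in, the standard library's `string.Template` uses the same `${name}` syntax. The sketch below only illustrates the idea; it is not Guardrails' own substitution code, and the shortened prompt omits `${gr.complete_xml_suffix_v2}`, which Guardrails expands itself.

```python
from string import Template

# Hypothetical, shortened prompt used only to show the substitution step.
template = Template(
    "Given the following document, answer the following questions.\n\n${document}\n"
)

# Stand-in for the text extracted from the PDF.
rendered = template.substitute(document="<text of the PDF>")
print(rendered)
```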

## Step 3: Wrap the LLM API call with `Guard`

In [8]:
# Add your OPENAI_API_KEY as an environment variable if it's not already set
# import os
# os.environ["OPENAI_API_KEY"] = "YOUR_API_KEY"

raw_llm_response, validated_response, *rest = guard(
    messages=[{"role": "user", "content": prompt}],
    prompt_params={"document": content[:6000]},
    model="gpt-4o-mini",
    max_tokens=2048,
    temperature=0,
)



The `guard` wrapper returns the `raw_llm_response` (a plain string) and the validated and corrected output (a dictionary).

We can see that the output is a dictionary with the correct schema and types.

In [9]:
print(validated_response)
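
Because the validated output is a plain dictionary that follows the `CreditCardAgreement` schema, it can be consumed with ordinary dictionary access. The sketch below uses a placeholder dictionary with made-up values (not figures from the Chase document) and a hypothetical `summarize` helper.

```python
# Placeholder data in the shape of CreditCardAgreement.
# The values are invented for illustration and do not come from the PDF.
sample = {
    "fees": [
        {"name": "example fee", "explanation": "Placeholder explanation.", "value": 0.0}
    ],
    "interest_rates": [{"account_type": "example account", "rate": 0.0}],
}


def summarize(agreement: dict) -> list[str]:
    """Flatten the fees and interest rates into printable lines."""
    lines = [f"fee: {fee['name']} = {fee['value']}" for fee in agreement.get("fees", [])]
    lines += [
        f"rate: {item['account_type']} = {item['rate']}%"
        for item in agreement.get("interest_rates", [])
    ]
    return lines


for line in summarize(sample):
    print(line)
```

With the real result, you would pass `validated_response` instead of `sample`, after checking that it is not `None`.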

In [10]:
guard.history.last.tree