Skip to main content
guardrails hub install hub://guardrails/valid_sql --quiet
    Installing hub://guardrails/valid_sql...
✅Successfully installed guardrails/valid_sql!


Natural Language to Bug Free SQL

!!! note To download this example as a Jupyter notebook, click here.

In this example, we will use Guardrails to generate SQL queries from natural language. We will check that the SQL is free of any syntax errors.

Objective

We want to generate SQL queries from natural language, and check that the SQL is free of any syntax errors.

Step 0: Setup

In order to run this example, you will need to install the sqlvalidator package. You can do so by running the following command:

pip install sqlvalidator -q

Step 1: Create the RAIL Spec

Ordinarily, we would create an RAIL spec in a separate file. For the purposes of this example, we will create the spec in this notebook as a string following the RAIL syntax. For more information on RAIL, see the RAIL documentation. We will also show the same RAIL spec in a code-first format using a Pydantic model.

In this RAIL spec, we:

  1. Create an output schema that returns a single key-value pair. The key should be 'generated_sql' and the value should be the SQL query generated from the natural language, which is syntactically correct.

XML option:

rail_str = """
<rail version="0.1">

<output>
<string
name="generated_sql"
description="Generate SQL for the given natural language instruction."
format="bug-free-sql"
on-fail-bug-free-sql="reask"
/>
</output>

<messages>
<message role="user">

Generate a valid SQL query for the following natural language instruction:

${nl_instruction}

${gr.complete_xml_suffix}
</message>
</messages>


</rail>
"""

Pydantic model option:

from guardrails.hub import ValidSQL
from pydantic import BaseModel, Field

prompt = """

Generate a valid SQL query for the following natural language instruction:

${nl_instruction}

${gr.complete_xml_suffix}
"""

class ValidSql(BaseModel):
generated_sql: str = Field(description="Generate SQL for the given natural language instruction.", validators=[ValidSQL(on_fail="reask")])
    /Users/dtam/.pyenv/versions/3.12.3/envs/litellm/lib/python3.12/site-packages/sentence_transformers/cross_encoder/CrossEncoder.py:13: TqdmExperimentalWarning: Using `tqdm.autonotebook.tqdm` in notebook mode. Use `tqdm.tqdm` instead to force console mode (e.g. in jupyter console)
from tqdm.autonotebook import tqdm, trange

!!! note

In order to ensure that the SQL is syntactically correct, we use bug-free-sql as the formatter. This is a light-weight formatter that uses the sqlvalidator package to check that the SQL is free of any syntax errors. For your use case, you can create a custom SQL validator that connects to your database and checks that the SQL is valid.

Step 2: Create a Guard object with the RAIL Spec

We create a gd.Guard object that will check, validate and correct the output of the LLM. This object:

  1. Enforces the quality criteria specified in the RAIL spec.
  2. Takes corrective action when the quality criteria are not met.
  3. Compiles the schema and type info from the RAIL spec and adds it to the prompt.
import guardrails as gd

from rich import print

Create the guard from the XML RAIL spec string:

guard = gd.Guard.for_rail_string(rail_str)

Or from the Pydantic model:

guard = gd.Guard.for_pydantic(output_class=ValidSql)

Here, nl_language is the natural language instruction and will be provided by the user at runtime.

Step 3: Wrap the LLM API call with Guard

# Set your OPENAI_API_KEY as an environment variable
# import os
# os.environ["OPENAI_API_KEY"] = "YOUR_API_KEY"

raw_llm_response, validated_response, *rest = guard(
model="gpt-4o-mini",
messages=[{"role":"user", "content": prompt}],
prompt_params={
"nl_instruction": "Select the name of the employee who has the highest salary."
},
max_tokens=2048,
temperature=0,
)
    /Users/dtam/dev/guardrails/guardrails/validator_service/__init__.py:85: UserWarning: Could not obtain an event loop. Falling back to synchronous validation.
warnings.warn(

We can see the prompt that was sent to the LLM:

print(guard.history.last.iterations.last.inputs.messages[0]["content"])


Generate a valid SQL query for the following natural language instruction:

Select the name of the employee who has the highest salary.


Given below is XML that describes the information to extract from this document and the tags to extract it into.

<output>
<string description="Generate SQL for the given natural language instruction." format="guardrails/valid_sql: None
None" name="generated_sql" required="true"></string>
</output>

ONLY return a valid JSON object (no other text is necessary), where the key of the field in JSON is the `name`
attribute of the corresponding XML, and the value is of the type specified by the corresponding XML's tag. The JSON
MUST conform to the XML format, including any types and format requests e.g. requests for lists, objects and
specific types. Be correct and concise. If you are unsure anywhere, enter `null`.

Here are examples of simple (XML, JSON) pairs that show the expected behavior:
- `<string name='foo' format='two-words lower-case' />` => `{'foo': 'example one'}`
- `<list name='bar'><string format='upper-case' /></list>` => `{"bar": ['STRING ONE', 'STRING TWO', etc.]}`
- `<object name='baz'><string name="foo" format="capitalize two-words" /><integer name="index" format="1-indexed"
/></object>` => `{'baz': {'foo': 'Some String', 'index': 1}}`


The guard wrapper returns the raw_llm_respose (which is a simple string), and the validated and corrected output (which is a dictionary).

We can see that the output is a dictionary with the correct schema and types.

print(validated_response)
{'generated_sql': 'SELECT name FROM employees ORDER BY salary DESC LIMIT 1;'}
print(guard.history.last.tree)
Logs
└── ╭────────────────────────────────────────────────── Step 0 ───────────────────────────────────────────────────╮
╭─────────────────────────────────────────────── Messages ────────────────────────────────────────────────╮
│ ┏━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ │
│ ┃ Role Content ┃ │
│ ┡━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩ │
│ │ user │ │ │
│ │ │ │ │
│ │ │ Generate a valid SQL query for the following natural language instruction: │ │
│ │ │ │ │
│ │ │ Select the name of the employee who has the highest salary. │ │
│ │ │ │ │
│ │ │ │ │
│ │ │ Given below is XML that describes the information to extract from this document and the tags │ │
│ │ │ to extract it into. │ │
│ │ │ │ │
│ │ │ <output> │ │
│ │ │ <string description="Generate SQL for the given natural language instruction." │ │
│ │ │ format="guardrails/valid_sql: None None" name="generated_sql" required="true"></string> │ │
│ │ │ </output> │ │
│ │ │ │ │
│ │ │ ONLY return a valid JSON object (no other text is necessary), where the key of the field in │ │
│ │ │ JSON is the `name` attribute of the corresponding XML, and the value is of the type │ │
│ │ │ specified by the corresponding XML's tag. The JSON MUST conform to the XML format, including │ │
│ │ │ any types and format requests e.g. requests for lists, objects and specific types. Be │ │
│ │ │ correct and concise. If you are unsure anywhere, enter `null`. │ │
│ │ │ │ │
│ │ │ Here are examples of simple (XML, JSON) pairs that show the expected behavior: │ │
│ │ │ - `<string name='foo' format='two-words lower-case' />` => `{'foo': 'example one'}` │ │
│ │ │ - `<list name='bar'><string format='upper-case' /></list>` => `{"bar": ['STRING ONE', │ │
│ │ │ 'STRING TWO', etc.]}` │ │
│ │ │ - `<object name='baz'><string name="foo" format="capitalize two-words" /><integer │ │
│ │ │ name="index" format="1-indexed" /></object>` => `{'baz': {'foo': 'Some String', 'index': │ │
│ │ │ 1}}` │ │
│ │ │ │ │
│ │ │ │ │
│ └──────┴──────────────────────────────────────────────────────────────────────────────────────────────┘ │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭──────────────────────────────────────────── Raw LLM Output ─────────────────────────────────────────────╮
│ ```json │
│ {"generated_sql":"SELECT name FROM employees ORDER BY salary DESC LIMIT 1;"} │
│ ``` │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─────────────────────────────────────────── Validated Output ────────────────────────────────────────────╮
│ {'generated_sql': 'SELECT name FROM employees ORDER BY salary DESC LIMIT 1;'} │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────╯