Introducing Guardrails Server
July 18, 2024
We're beyond excited to announce the latest release of Guardrails with an industry-leading Guardrails Server and many more new features.
A Focus on Deployment Ready Guardrails
As more teams deploy Guardrails in production, we wanted to make it easier than ever to use Guardrails for safeguarding LLMs. Key highlights of this release that enable deployment-ready Guardrails are:
- Guardrails Server to provide API-access to guarded LLMs
- OpenAI SDK compatible endpoint for accessing Guardrails
- Cross-language support for running Guards
- Guardrails watch for cli-based monitoring or guardrail execution
- JSON generation for open source Huggingface models via constrained decoding
- (In Preview) Hosted models for ML-based guardrails
Guardrails Server
This is by far the most asked for feature by all our users. Guardrails Server has a host of benefits, including:
- Easy Cloud Deployment: With the new client-server model, you can take the Guards you're running on your local server and dockerize/deploy them on the cloud. We have docs, a sample repo for dockerization and a cookbook for deploying on AWS. Cookbooks on deploying to GCP and Azure coming soon!
- OpenAI SDK Compatible Endpoint: Guardrails Server is available via an OpenAI SDK compatible endpoint. If you're using OpenAI or popular LLM routers such as litellm, portkey, etc., then you can access a Guard-ed LLM endpoint via a single line substitution. More docs on how to use this are available here.
- Cross Language Compatibility: Since the Guards now run on their own servers, the OpenAI-compatible endpoint can be used on the client in any language where the OpenAI SDK is available.
You can run guardrails create followed by guardrails start to start running a guardrails server on localhost that you can talk to from any client. Docs on how to spin up Guardrails Server are available here.
Guardrails Watch and Telemetry Updates
Running guardrails watch on the command line allows you to observe your guardrails in real time and get detailed information about the latency, span and validation outcome of any guardrails running on a guard. Read more about how to use the new watch functionality here.
Additionally, we've introduced API-level metrics that can be toggled to talk to your OpenTelemetry OTLP collector (arize, grafana, splunk, new relic, datadog, etc all have endpoints for this). To get more information on what metrics are collected and how to configure OTLP export, check out docs here.
JSON Generation for Open-Source LLMs
The latest Guardrails release offers support for getting JSON from open source Huggingface models. This is a major step forward in enabling Guardrails to be used with any LLM, not just closed-source models. The JSON generation is done via constrained decoding, which we implement using jsonformer. More information on how to use this feature is available here.
import json from guardrails import Guard from pydantic import BaseModel from transformers import pipeline class LineItem(BaseModel): quantity: int price: float name: str class Receipt(BaseModel): total: float raw_text: str items: list[LineItem] r = """Albert's Emporium for the Distinguished Adventurer\nCloak of Shadows 1x $150\nBoots of Water Walking 2x $100\nPotion Bandolier $50\nTotal: $300\nThank you for shopping with us.""" g = Guard.from_pydantic(Receipt, output_formatter="jsonformer") pipe = pipeline("text-generation", "TinyLlama/TinyLlama-1.1B-Chat-v1.0") out = g(pipe, prompt=f"Please convert this receipt: {r}") print(json.dumps(out.validated_output, indent=2))
(In Preview) Hosted Models for Model-Based Guardrails
Guardrails now has preview inference endpoints for our most popular validators. These endpoints have sub-second latency, and help you do things like check for profanity, PII, toxicity, gibberish, and more for free. Setup only requires a single opt in during configuration or hub installation. To read more about how to use hosted models, read the documentation here.
Instructions on how to self-host these models so that they're compatible with validators is coming soon!
Support Our Work
You can start using the latest Guardrails release today by installing Guardrails:
pip install guardrails-ai
If you enjoy the work we do, you can leave
- Join our Discord community: https://discord.gg/U9RKkZSBgx
- Star the Github Repo: https://github.com/guardrails-ai/guardrails
- Sign up for Guardrails Hub: https://hub.guardrailsai.com/
- We're always looking for motivated contributors. If you're interested in contributing to the project, check out the list of open issues: https://github.com/guardrails-ai/guardrails/issues
Tags:
Similar ones you might find interesting
Handling fix results for streaming
How we handle fix results for streaming in Guardrails.
How we rewrote LLM Streaming to deal with validation failures
The new pipeline for LLM Streaming now includes ways to merge fixes across chunks after validating.
Latency and usability upgrades for ML-based validators
The numbers behind our validators