guardrails hub install hub://guardrails/similar_to_previous_values

Check whether a value is similar to a set of other values

Using the SimilarToPreviousValues validator

This validator checks whether a new value is similar to a set of previously known values, that is, whether it falls within the distribution of those values. It supports both integer and string values.

For integer values, this validator checks whether the value lies within the specified number of standard deviations of the mean of the previous values. This check assumes that the previous values are normally distributed.
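
The integer check described above can be sketched in a few lines. This is a minimal illustration, not the validator's actual code: the helper name `within_std_devs` is hypothetical, and the use of the sample standard deviation is an assumption.

```python
from statistics import mean, stdev
from typing import Sequence


def within_std_devs(
    value: float, prev_values: Sequence[float], standard_deviations: float = 2
) -> bool:
    """Return True if value is within k standard deviations of the mean."""
    mu = mean(prev_values)
    # Assumption: sample standard deviation; the validator may use another estimator.
    sigma = stdev(prev_values)
    return abs(value - mu) <= standard_deviations * sigma


prev_values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(within_std_devs(3, prev_values))    # True: |3 - 5.5| is about 0.8 sigma
print(within_std_devs(300, prev_values))  # False: far outside 2 sigma
```

With these values the mean is 5.5 and the sample standard deviation is about 3.03, so a two-sigma band covers roughly -0.55 to 11.55.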

For string values, this validator checks whether the average semantic similarity between the generated value and the previous values is less than a threshold.
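
The string check can be sketched offline as follows. Because the check passes when the average score is less than the threshold, this sketch assumes the score behaves like a cosine distance (1 minus cosine similarity); that reading is an assumption, not something the text above confirms. The names `cosine_distance`, `toy_embed`, and `is_similar` are hypothetical, and `toy_embed` is a letter-frequency stand-in for a real embedding model.

```python
import math
from typing import Callable, List, Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity. Both vectors must be non-zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def toy_embed(text: str) -> List[float]:
    """Hypothetical stand-in embedding: counts of the letters a-z."""
    counts = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            counts[ord(ch) - ord("a")] += 1.0
    return counts


def is_similar(
    value: str,
    prev_values: Sequence[str],
    embed: Callable[[str], Sequence[float]],
    threshold: float,
) -> bool:
    """Pass when the average distance to the previous values is below threshold."""
    v = embed(value)
    distances = [cosine_distance(v, embed(p)) for p in prev_values]
    return sum(distances) / len(distances) < threshold


print(is_similar("paypal", ["paypal"], toy_embed, threshold=0.2))  # True
print(is_similar("xyz", ["paypal"], toy_embed, threshold=0.2))     # False
```

A real embedding function, such as the OpenAI-based one defined later on this page, would replace `toy_embed`; letter counts capture spelling, not meaning.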

import guardrails as gd
from guardrails.hub import SimilarToPreviousValues

# Create the Guard with the SimilarToPreviousValues validator
guard = gd.Guard.from_string(
    validators=[SimilarToPreviousValues(standard_deviations=2, threshold=0.2, on_fail="fix")],
    description="testmeout",
)

Test with integer values

# Test with a value that is within the distribution
output = guard.parse(
    llm_output='3',
    metadata={"prev_values": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]},
)

print(output.validated_output)
    3

Because 3 is within the distribution of the given prev_values, the validator returns 3 unchanged.

# Test with a value that is outside the distribution
output = guard.parse(
    llm_output='300',
    metadata={"prev_values": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]},
)

print(output.validated_output)
    None

Because 300 is not within the distribution of the given prev_values, validation fails and the validated output is None.

Test with string values

# Create an embedding function that uses OpenAI Ada to embed the text.
import numpy as np
import openai


def embed_function(text: str) -> np.ndarray:
    """Embed the text using OpenAI Ada."""
    response = openai.embeddings.create(
        model="text-embedding-ada-002",
        input=text,
    )
    embeddings = response.data[0].embedding
    # response["data"][0]["embedding"]
    return np.array(embeddings)


# Test with a value that is similar to the previous values.
# Because it is similar, the guard returns the value unchanged.
output = guard.parse(
    llm_output="cisco",
    metadata={
        "prev_values": ["broadcom", "paypal"],
        "embed_function": embed_function,
    },
)

print(output.validated_output)
    HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
    HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
    HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
    HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"

    cisco

# Test with a value that is not similar to the previous values.
# Because it is not similar, the guard returns None.
output = guard.parse(
    llm_output="taj mahal",
    metadata={
        "prev_values": ["broadcom", "paypal"],
        "embed_function": embed_function,
    },
)

print(output.validated_output)
    HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
    HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
    HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
    HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"

    None