Features

Core Capabilities

EdgeMask provides two core enterprise capabilities: real-time surgical PII redaction for SOC2/HIPAA compliance, and hybrid pgvector + Redis semantic caching to cut LLM costs and latency.

Surgical PII Redaction Engine

Security

Our engine intercepts the data stream to LLMs and redacts sensitive values in microseconds, so your internal data does not reach OpenAI or Anthropic servers. This supports SOC2, GDPR, and HIPAA compliance.

Zero Retention Note: Raw unmasked prompts are NEVER written to disk on our system. Only anonymized metadata (e.g. pii_detected: true) makes it into the analytics logs.
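As a rough illustration of the note above, an analytics record could be built by a function that never receives the prompt text at all, so there is nothing raw to persist. This is a minimal sketch, not EdgeMask's actual logging code: only `pii_detected` comes from this page, while `build_analytics_record`, `request_id`, and `redaction_count` are hypothetical names chosen for the example.

```python
import json


def build_analytics_record(request_id: str, pii_detected: bool, redaction_count: int) -> str:
    """Serialize anonymized metadata only.

    The prompt text is deliberately not a parameter, so it cannot end up
    in the serialized record by accident.
    """
    record = {
        "request_id": request_id,            # hypothetical field
        "pii_detected": pii_detected,        # field named in the note above
        "redaction_count": redaction_count,  # hypothetical field
    }
    return json.dumps(record, sort_keys=True)


if __name__ == "__main__":
    print(build_analytics_record("req-001", True, 2))
```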

Surgical JSON Redaction Example

Your App Sends (Raw):

request.json

json

{
  "model": "gpt-4-turbo",
  "messages": [
    {
      "role": "user",
      "content": "Customer's AWS access key is AKIAIOSFODNN7EXAMPLE and SSN is 123-45-6789. Initialize the system."
    }
  ]
}

EdgeMask Forwards (Safe):

forwarded.json

json

{
  "model": "gpt-4-turbo",
  "messages": [
    {
      "role": "user",
      "content": "Customer's AWS access key is [REDACTED_AWS_KEY] and SSN is [REDACTED_SSN]. Initialize the system."
    }
  ]
}
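The transformation above can be approximated with a few regular expressions. The sketch below is illustrative only and is not EdgeMask's detection engine: it assumes two simple patterns (an `AKIA`-prefixed 20-character AWS key and a `ddd-dd-dddd` SSN), and the names `PATTERNS`, `redact_text`, and `redact_request` are hypothetical. A production detector would need far more rules and validation.

```python
import re

# Illustrative patterns only; real PII detection needs many more rules.
PATTERNS = [
    ("AWS_KEY", re.compile(r"\bAKIA[0-9A-Z]{16}\b")),
    ("SSN", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
]


def redact_text(text: str) -> str:
    """Replace every match of each pattern with a [REDACTED_<LABEL>] tag."""
    for label, pattern in PATTERNS:
        text = pattern.sub(f"[REDACTED_{label}]", text)
    return text


def redact_request(payload: dict) -> dict:
    """Return a copy of a chat payload with string message contents redacted.

    All other fields (model, role, and so on) are passed through unchanged.
    """
    redacted = dict(payload)
    redacted["messages"] = [
        {**msg, "content": redact_text(msg["content"])}
        if isinstance(msg.get("content"), str)
        else msg
        for msg in payload.get("messages", [])
    ]
    return redacted


if __name__ == "__main__":
    request = {
        "model": "gpt-4-turbo",
        "messages": [
            {"role": "user", "content": "Key AKIAIOSFODNN7EXAMPLE, SSN 123-45-6789."}
        ],
    }
    print(redact_request(request))
```

Only the matched values are replaced, which is what keeps the redaction "surgical": the rest of the prompt reaches the model intact.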

Semantic Cache & Cost Optimization

FinOps

EdgeMask is more than a firewall; it also cuts operational LLM costs through a hybrid pgvector + Redis caching layer.

Many LLM queries in production workflows are semantically near-identical. EdgeMask converts each incoming request into a vector embedding and compares it against previously answered requests, so "I forgot my password" and "I don't remember my passcode" are matched semantically in milliseconds.
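The matching idea can be sketched as a nearest-neighbor lookup with a similarity threshold. This is a simplified in-memory illustration, not EdgeMask's pgvector + Redis implementation: it uses a linear scan instead of an index, the embeddings are hand-written toy vectors rather than output from an embedding model, and `SemanticCache`, `put`, `get`, and the 0.9 threshold are assumptions made for the example.

```python
import math
from typing import List, Optional, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class SemanticCache:
    """Toy semantic cache: linear scan over stored (embedding, response) pairs."""

    def __init__(self, threshold: float = 0.9) -> None:
        self.threshold = threshold  # assumed cutoff; tune per workload
        self._entries: List[Tuple[List[float], str]] = []

    def put(self, embedding: Sequence[float], response: str) -> None:
        self._entries.append((list(embedding), response))

    def get(self, embedding: Sequence[float]) -> Optional[str]:
        """Return the response of the most similar entry, or None on a miss."""
        best_score, best_response = 0.0, None
        for stored, response in self._entries:
            score = cosine_similarity(embedding, stored)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None


if __name__ == "__main__":
    cache = SemanticCache(threshold=0.9)
    # Toy vector standing in for the embedding of "I forgot my password".
    cache.put([0.9, 0.1, 0.0], "Use the password reset link.")
    # Toy vector standing in for "I don't remember my passcode" (close by).
    print(cache.get([0.85, 0.15, 0.05]))
    # An unrelated query points in a different direction and misses.
    print(cache.get([0.0, 0.1, 0.9]))
```

The threshold is the main design choice: set too low, unrelated questions receive each other's answers; set too high, paraphrases miss the cache and go to the provider anyway.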

Enterprise Impact for CTOs

Up to 40% Cost Reduction

On a cache hit, the request is never sent to an external provider, so you pay no token inference cost for it. This lowers the recurring bill for B2B applications.

~8ms Ultra-Low Latency

On a cache hit, LLM inference is bypassed completely. Operations that typically take 2-5 seconds become ~8ms cache retrievals, improving the end-user experience.

Vendor Rate Limit Defense

Cached responses reduce RPM/TPM pressure on your provider's API. Under high traffic, EdgeMask absorbs part of the load, helping prevent "429 Too Many Requests" errors.

Dashboard Integration: You can track total_semantic_cache_hits and savings in real time from the Dashboard analytics screens, giving your executive team full financial transparency.
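For a back-of-the-envelope view of how cache hits translate into savings and reduced provider traffic, the arithmetic can be sketched as below. This is an assumption-laden estimate, not the Dashboard's actual formula: `estimate_cache_impact` and `avg_cost_per_request_usd` are hypothetical names, the cost per request is treated as a flat average, and the cost of computing embeddings or running the cache itself is ignored.

```python
def estimate_cache_impact(
    total_requests: int,
    total_semantic_cache_hits: int,
    avg_cost_per_request_usd: float,
) -> dict:
    """Estimate hit rate, provider-bound requests, and avoided inference spend."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    if not 0 <= total_semantic_cache_hits <= total_requests:
        raise ValueError("cache hits must be between 0 and total_requests")

    return {
        # Fraction of requests answered from the cache.
        "hit_rate": total_semantic_cache_hits / total_requests,
        # Requests that still count against provider RPM/TPM limits.
        "upstream_requests": total_requests - total_semantic_cache_hits,
        # Inference spend avoided, under the flat average-cost assumption.
        "estimated_savings_usd": total_semantic_cache_hits * avg_cost_per_request_usd,
    }


if __name__ == "__main__":
    # Hypothetical numbers: 10,000 requests, 4,000 cache hits, $0.002 per request.
    print(estimate_cache_impact(10_000, 4_000, 0.002))
```

With these hypothetical inputs the hit rate is 40%, which corresponds to the upper bound quoted above; actual savings depend on how repetitive the workload is.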