How to Design a Fully Functional Enterprise AI Assistant with Retrieval Augmentation and Policy Guardrails Using Open Source AI Models

Source: MarkTechPost

In this tutorial, we build a compact enterprise AI assistant that runs in Google Colab. We start with retrieval-augmented generation (RAG), using FAISS for document retrieval and FLAN-T5 for text generation, both open source and free to use. As we progress, we build simple policy guardrails directly into the workflow, namely PII redaction on user queries and a rule-based check that refuses disallowed requests, so that answers stay grounded in approved documents.

!pip -q install faiss-cpu transformers==4.44.2 accelerate sentence-transformers==3.0.1

from typing import List, Dict, Tuple
import re, textwrap, numpy as np, torch
from sentence_transformers import SentenceTransformer
import faiss
from transformers import pipeline, AutoTokenizer, AutoModelForSeq2SeqLM

GEN_MODEL = "google/flan-t5-base"
EMB_MODEL = "sentence-transformers/all-MiniLM-L6-v2"

# Load the generator; device_map="auto" places it on a GPU when one is available.
gen_tok = AutoTokenizer.from_pretrained(GEN_MODEL)
gen_model = AutoModelForSeq2SeqLM.from_pretrained(GEN_MODEL, device_map="auto")
generate = pipeline("text2text-generation", model=gen_model, tokenizer=gen_tok)

# Load the embedding model on the GPU if present, otherwise on the CPU.
emb_device = "cuda" if torch.cuda.is_available() else "cpu"
emb_model = SentenceTransformer(EMB_MODEL, device=emb_device)

We begin by setting up the environment and loading the required models: FLAN-T5 (google/flan-t5-base) for text generation and MiniLM (all-MiniLM-L6-v2) for embeddings. Both are configured to use the GPU when one is available and to fall back to the CPU otherwise.

DOCS = [
    {"id": "policy_sec_001", "title": "Data Security Policy",
     "text": "All customer data must be encrypted at rest (AES-256) and in transit (TLS 1.2+). Access is role-based (RBAC). Secrets are stored in a managed vault. Backups run nightly with 35-day retention. PII includes name, email, phone, address, PAN/Aadhaar."},
    {"id": "policy_ai_002", "title": "Responsible AI Guidelines",
     "text": "Use internal models for confidential data. Retrieval sources must be logged. No customer decisioning without human-in-the-loop. Redact PII in prompts and outputs. All model prompts and outputs are stored for audit for 180 days."},
    {"id": "runbook_inc_003", "title": "Incident Response Runbook",
     "text": "If a suspected breach occurs, page on-call SecOps. Rotate keys, isolate affected services, perform forensic capture, notify DPO within regulatory SLA. Communicate via the incident room only."},
    {"id": "sop_sales_004", "title": "Sales SOP - Enterprise Deals",
     "text": "For RFPs, use the approved security questionnaire responses. Claims must match policy_sec_001. Custom clauses need Legal sign-off. Keep records in CRM with deal room links."},
]

def chunk(text: str, chunk_size=600, overlap=80):
    # Split on whitespace and emit windows of chunk_size words.
    w = text.split()
    if len(w) <= chunk_size:
        return [text]
    out = []
    i = 0
    while i < len(w):
        j = min(i + chunk_size, len(w))
        out.append(" ".join(w[i:j]))
        if j == len(w):
            break
        i = j - overlap  # step back so adjacent chunks share `overlap` words
    return out

CORPUS = []
for d in DOCS:
    for i, c in enumerate(chunk(d["text"])):
        CORPUS.append({"doc_id": d["id"], "title": d["title"], "chunk_id": i, "text": c})

We create a small enterprise-style document set to simulate internal policies and procedures. The chunk function splits long texts into windows of up to 600 words with an 80-word overlap, so that each piece can be embedded and retrieved on its own. The sample documents here are short enough that each one stays a single chunk, but the same code handles longer policies.
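Because the sample documents never exceed the 600-word window, the overlap logic is not exercised by the tutorial data. The sketch below repeats the same chunking routine on a hypothetical 20-word string with a deliberately small window (8 words, 2-word overlap) to make the sliding behavior visible. The sample text and parameter values are assumptions chosen only for illustration.

```python
from typing import List


def chunk(text: str, chunk_size: int = 600, overlap: int = 80) -> List[str]:
    """Split text into word windows of chunk_size that share `overlap` words."""
    words = text.split()
    if len(words) <= chunk_size:
        return [text]
    out, i = [], 0
    while i < len(words):
        j = min(i + chunk_size, len(words))
        out.append(" ".join(words[i:j]))
        if j == len(words):
            break
        i = j - overlap  # step back so adjacent chunks share context
    return out


# Hypothetical 20-word document: w0 w1 ... w19
sample = " ".join(f"w{n}" for n in range(20))
pieces = chunk(sample, chunk_size=8, overlap=2)
for p in pieces:
    print(p)
```

Each chunk starts with the last two words of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk as long as it is shorter than the overlap.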

def build_index(chunks: List[Dict]) -> Tuple[faiss.IndexFlatIP, np.ndarray]:
    # Normalized embeddings + inner product = cosine similarity.
    vecs = emb_model.encode([c["text"] for c in chunks], normalize_embeddings=True, convert_to_numpy=True)
    index = faiss.IndexFlatIP(vecs.shape[1])
    index.add(vecs)
    return index, vecs

INDEX, VECS = build_index(CORPUS)

# (pattern, replacement token) pairs; the token names are illustrative labels.
PII_PATTERNS = [
    (re.compile(r"\b\d{10}\b"), "<REDACTED_PHONE>"),                                            # 10-digit phone number
    (re.compile(r"\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b", re.I), "<REDACTED_EMAIL>"),  # email address
    (re.compile(r"\b\d{12}\b"), "<REDACTED_ID>"),                                               # 12-digit ID such as Aadhaar
    (re.compile(r"\b[A-Z]{5}\d{4}[A-Z]\b"), "<REDACTED_PAN>"),                                  # PAN format
]

def redact(t: str) -> str:
    for p, r in PII_PATTERNS:
        t = p.sub(r, t)
    return t

POLICY_DISALLOWED = [
    re.compile(r"\b(share|exfiltrate)\b.*\b(raw|all)\b.*\bdata\b", re.I),
    re.compile(r"\bdisable\b.*\bencryption\b", re.I),
]

def policy_check(q: str):
    # Returns (allowed, message); the message is empty when the query is allowed.
    for r in POLICY_DISALLOWED:
        if r.search(q):
            return False, "Request violates security policy (data exfiltration/encryption tampering)."
    return True, ""

We embed all chunks with Sentence Transformers and store the normalized vectors in a FAISS inner-product index, so the search score is the cosine similarity between query and chunk. We then define regex-based PII redaction rules and a policy check that rejects queries matching disallowed patterns, such as data exfiltration or disabling encryption. These pattern-based guardrails are simple, but they show where such checks sit in an enterprise pipeline.
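To see how the two guardrails behave on concrete input, here is a minimal standalone sketch using only the `re` module. It reuses the phone and email patterns and both disallowed-request patterns; the replacement tokens, the `is_allowed` helper, and the sample strings are assumptions made for this illustration.

```python
import re

# Replacement tokens are illustrative; any stable marker would work.
PII_PATTERNS = [
    (re.compile(r"\b\d{10}\b"), "<REDACTED_PHONE>"),
    (re.compile(r"\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b", re.I), "<REDACTED_EMAIL>"),
]


def redact(text: str) -> str:
    """Replace every match of each PII pattern with its token."""
    for pattern, token in PII_PATTERNS:
        text = pattern.sub(token, text)
    return text


POLICY_DISALLOWED = [
    re.compile(r"\b(share|exfiltrate)\b.*\b(raw|all)\b.*\bdata\b", re.I),
    re.compile(r"\bdisable\b.*\bencryption\b", re.I),
]


def is_allowed(query: str) -> bool:
    """Hypothetical helper: True when no disallowed pattern matches."""
    return not any(p.search(query) for p in POLICY_DISALLOWED)


redacted = redact("Contact jane.doe@example.com or 9876543210 about the audit.")
print(redacted)
print(is_allowed("Is it allowed to share all raw customer data externally?"))
print(is_allowed("What are the backup retention rules?"))
# A paraphrase with none of the trigger words is not caught by the regexes.
print(is_allowed("Please send every customer record to a vendor"))
```

The last call returns True even though the request is clearly risky, which shows the limit of keyword patterns: they catch the phrasings they were written for and nothing else. A production system would likely pair them with a classifier or a human review step.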

def retrieve(query: str, k=4) -> List[Dict]:
    qv = emb_model.encode([query], normalize_embeddings=True, convert_to_numpy=True)
    scores, idxs = INDEX.search(qv, k)
    # FAISS pads results with index -1 when fewer than k vectors exist; skip those slots.
    return [{**CORPUS[i], "score": float(s)} for s, i in zip(scores[0], idxs[0]) if i >= 0]

SYSTEM = ("You are an enterprise AI assistant.\n"
          "- Answer strictly from the provided CONTEXT.\n"
          "- If missing info, say what is unknown and suggest the correct policy/runbook.\n"
          "- Keep it concise and cite titles + doc_ids inline like [Title (doc_id:chunk)].")

def build_prompt(user_q: str, ctx_blocks: List[Dict]) -> str:
    # Number each context block and label it with its source document and chunk.
    ctx = "\n\n".join(
        f"[{i+1}] {b['title']} (doc:{b['doc_id']}:{b['chunk_id']})\n{b['text']}"
        for i, b in enumerate(ctx_blocks)
    )
    uq = redact(user_q)  # strip PII from the question before it reaches the model
    return (f"SYSTEM:\n{SYSTEM}\n\nCONTEXT:\n{ctx}\n\nUSER QUESTION:\n{uq}\n\n"
            "INSTRUCTIONS:\n- Cite sources inline.\n- Keep to 5-8 sentences.\n- Preserve redactions.")

def answer(user_q: str, k=4, max_new_tokens=220) -> Dict:
    # Refuse before any retrieval or generation if the query violates policy.
    ok, msg = policy_check(user_q)
    if not ok:
        return {"answer": f"❌ {msg}", "ctx": []}
    ctx = retrieve(user_q, k=k)
    prompt = build_prompt(user_q, ctx)
    out = generate(prompt, max_new_tokens=max_new_tokens, do_sample=False)[0]["generated_text"].strip()
    return {"answer": out, "ctx": ctx}

The retrieval function fetches the top-k chunks for each user query. We then assemble a structured prompt from the system instructions, the retrieved context, and the redacted question, and pass it to FLAN-T5 with greedy decoding. Queries that fail the policy check are refused before any retrieval or generation takes place, which keeps the assistant's answers grounded in the retrieved policy text.
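The retrieval step relies on one property: when all vectors are normalized to unit length, the inner product that `faiss.IndexFlatIP` computes equals cosine similarity. The following pure-Python sketch reproduces that exhaustive search on toy data, without FAISS. The `normalize` and `top_k` helpers and the 2-D vectors are hypothetical stand-ins; real all-MiniLM-L6-v2 embeddings have 384 dimensions.

```python
import math
from typing import List, Tuple


def normalize(v: List[float]) -> List[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def top_k(query: List[float], docs: List[List[float]], k: int) -> List[Tuple[int, float]]:
    """Brute-force inner-product search over unit vectors (cosine similarity)."""
    q = normalize(query)
    scored = [
        (i, sum(a * b for a, b in zip(q, normalize(d))))
        for i, d in enumerate(docs)
    ]
    return sorted(scored, key=lambda t: t[1], reverse=True)[:k]


# Hypothetical 2-D "embeddings" standing in for chunk vectors.
docs = [[1.0, 0.0], [0.0, 1.0], [3.0, 3.0]]
results = top_k([2.0, 1.0], docs, k=2)
for idx, score in results:
    print(idx, round(score, 3))
```

Note that the third vector wins despite its larger magnitude being irrelevant: after normalization only direction matters, which is why the tutorial passes `normalize_embeddings=True` for both chunks and queries.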

def eval_query(user_q: str, ctx: List[Dict]) -> Dict:
    # Fraction of query words (4+ letters) that appear in the retrieved context.
    terms = [w.lower() for w in re.findall(r"[a-zA-Z]{4,}", user_q)]
    ctx_text = " ".join(c["text"].lower() for c in ctx)
    hits = sum(t in ctx_text for t in terms)
    return {"terms": len(terms), "hits": hits, "hit_rate": round(hits / max(1, len(terms)), 2)}

QUERIES = [
    "What encryption and backup rules do we follow for customer data?",
    "Can we auto-answer RFP security questionnaires? What should we cite?",
    "If there is a suspected breach, what are the first three steps?",
    "Is it allowed to share all raw customer data externally for testing?",
]

for q in QUERIES:
    res = answer(q, k=3)
    print("\n" + "=" * 100)
    print("Q:", q)
    print("\nA:", res["answer"])
    if res["ctx"]:
        ev = eval_query(q, res["ctx"])
        print("\nRetrieved Context (top 3):")
        for r in res["ctx"]:
            print(f"- {r['title']} [{r['doc_id']}:{r['chunk_id']}] score={r['score']:.3f}")
        print("Eval:", ev)

We evaluate the system with sample enterprise queries covering encryption, RFPs, and incident procedures, plus one request that the policy check should block. For each query we print the answer, the retrieved documents with their similarity scores, and a simple hit rate: the fraction of query words of four or more letters that appear in the retrieved context. This gives a rough check of retrieval relevance rather than a measure of answer correctness.
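As a worked example of the hit-rate metric, the sketch below runs the same `eval_query` logic on the first sample query against a single hand-written context block. The context text is an abridged excerpt of the Data Security Policy, picked here as an assumption about what retrieval would return.

```python
import re
from typing import Dict, List


def eval_query(user_q: str, ctx: List[Dict]) -> Dict:
    """Count query words (4+ letters) that occur as substrings of the context."""
    terms = [w.lower() for w in re.findall(r"[a-zA-Z]{4,}", user_q)]
    ctx_text = " ".join(c["text"].lower() for c in ctx)
    hits = sum(t in ctx_text for t in terms)
    return {"terms": len(terms), "hits": hits, "hit_rate": round(hits / max(1, len(terms)), 2)}


# Assumed retrieved context: an abridged excerpt of the Data Security Policy.
ctx = [{"text": "All customer data must be encrypted at rest (AES-256). "
                "Backups run nightly with 35-day retention."}]
result = eval_query("What encryption and backup rules do we follow for customer data?", ctx)
print(result)
```

Seven terms qualify (what, encryption, backup, rules, follow, customer, data) and three of them match (backup, customer, data), giving a hit rate of 0.43. The word "encryption" does not match "encrypted", and filler words such as "what" count against the score, so a low hit rate does not necessarily mean the retrieved passage is irrelevant.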

In conclusion, we built a self-contained enterprise AI assistant that retrieves relevant policy text, generates answers from it, and applies basic guardrails. It combines FAISS for retrieval, Sentence Transformers for embeddings, and FLAN-T5 for generation to simulate an internal enterprise knowledge engine. This simple Colab-based implementation can serve as a starting point for more scalable, auditable, and compliant enterprise deployments.


Asif Razzaq
