A Coding Implementation to Build Agent-Native Memory Infrastructure with Memori for Persistent Multi-User and Multi-Session LLM Applications


Source: MarkTechPost

In this tutorial, we explore how Memori serves as an agent-native memory infrastructure layer for building more persistent, context-aware LLM applications. We start by setting up Memori in a Google Colab environment and connecting it to both synchronous and asynchronous OpenAI clients, so that every model call automatically passes through the memory layer. We then move on to practical examples that show how user data is stored, retrieved, and separated across different identities, agent roles, and sessions. We also test streaming responses, async calls, and a small customer-support agent workflow to understand how memory behaves in realistic multi-turn applications. By the end of the tutorial, we gain a clear understanding of how Memori helps us build AI agents that do not treat each conversation in isolation but instead retain useful context across interactions.

import subprocess, sys

def _pip(*pkgs):
    subprocess.check_call([sys.executable, "-m", "pip", "install", "-q", *pkgs])

_pip("memori>=3.3.0", "openai>=1.40.0", "nest_asyncio")

import os, getpass, time, uuid, asyncio
import nest_asyncio; nest_asyncio.apply()

if not os.getenv("OPENAI_API_KEY"):
    os.environ["OPENAI_API_KEY"] = getpass.getpass("OPENAI_API_KEY: ")

if not os.getenv("MEMORI_API_KEY"):
    v = getpass.getpass("MEMORI_API_KEY (leave blank for rate-limited tier): ")
    if v.strip():
        os.environ["MEMORI_API_KEY"] = v.strip()
    else:
        print("→ No MEMORI_API_KEY set. Continuing with rate-limited tier.")

We install Memori, OpenAI, and Nest AsyncIO so the tutorial runs smoothly inside Google Colab. We load the required Python modules and prepare the notebook to handle async execution without runtime issues. We also collect the OpenAI API key and optional Memori API key, allowing the workflow to run either with authenticated Memori access or the rate-limited tier.

from memori import Memori
from openai import OpenAI, AsyncOpenAI

client       = OpenAI()
async_client = AsyncOpenAI()

mem = Memori()
mem.llm.register(client)
mem.llm.register(async_client)

MODEL        = "gpt-4o-mini"
WRITE_DELAY  = 6

def ask(prompt, system=None):
    msgs = []
    if system: msgs.append({"role": "system", "content": system})
    msgs.append({"role": "user", "content": prompt})
    r = client.chat.completions.create(model=MODEL, messages=msgs)
    return r.choices[0].message.content

def banner(t): print("\n" + "="*78 + f"\n {t}\n" + "="*78)

We import Memori and create both synchronous and asynchronous OpenAI clients for different LLM interaction patterns. We register both clients with Memori so that memory can automatically intercept and enrich chat completion calls. We also define a reusable ask() helper and a banner() function to keep the tutorial output clean and organized.
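The heart of ask() is plain message assembly: an optional system turn followed by the user turn. Isolated for clarity (the helper name build_messages is ours, not part of the tutorial code):

```python
def build_messages(prompt, system=None):
    # Same construction as inside ask(): system turn first (if given), then user turn.
    msgs = []
    if system:
        msgs.append({"role": "system", "content": system})
    msgs.append({"role": "user", "content": prompt})
    return msgs

print(build_messages("hi", system="be brief"))
```

Because Memori intercepts the registered client, this list can be passed straight to client.chat.completions.create() without any memory-specific fields.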

banner("Part 1 — Basic memory: facts persist across turns")

mem.attribution(entity_id="[email protected]", process_id="personal-assistant")
ask("My name is Alice. I love hiking, Italian food, and I'm allergic to peanuts.")
time.sleep(WRITE_DELAY)
print("[Alice]", ask("What do you know about me? Be specific."))

banner("Part 2 — Multi-tenant memory: Bob's facts don't leak into Alice's recall")

mem.attribution(entity_id="[email protected]", process_id="personal-assistant")
ask("I'm Bob. Vegetarian, write Rust for a living, live in Berlin.")
time.sleep(WRITE_DELAY)

mem.attribution(entity_id="[email protected]", process_id="personal-assistant")
print("[Alice]", ask("What's my favorite cuisine and any dietary issues?"))

mem.attribution(entity_id="[email protected]", process_id="personal-assistant")
print("[Bob]  ", ask("Which programming language do I write professionally?"))

We begin by testing basic memory persistence: Alice shares personal facts, and the model later recalls them. We then switch to Bob and store a separate set of details to demonstrate multi-tenant memory isolation. We return to Alice and Bob separately to confirm that each user’s facts remain scoped to the correct entity.
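Because mem.attribution() sets shared state on the Memori instance, forgetting one switch can leak a user's context into another user's call. A small context-manager wrapper makes the scope explicit; the wrapper (scoped) and the _Recorder stand-in below are our own sketch rather than Memori API, and the example emails are placeholders, so the pattern runs without any API keys:

```python
from contextlib import contextmanager

@contextmanager
def scoped(mem, entity_id, process_id):
    # Set attribution on entry so every call inside the block targets one
    # user's memory scope. Assumes `mem` exposes
    # attribution(entity_id=..., process_id=...) as shown in the tutorial.
    mem.attribution(entity_id=entity_id, process_id=process_id)
    yield mem

class _Recorder:
    """Stand-in for a Memori instance: just records attribution calls."""
    def __init__(self):
        self.calls = []
    def attribution(self, **kwargs):
        self.calls.append(kwargs)

rec = _Recorder()
with scoped(rec, "alice@example.com", "personal-assistant"):
    pass  # ask(...) calls for Alice would go here
with scoped(rec, "bob@example.com", "personal-assistant"):
    pass  # ask(...) calls for Bob would go here

print([c["entity_id"] for c in rec.calls])
```

With the real Memori object in place of _Recorder, each with-block cleanly delimits whose memory the enclosed ask() calls read and write.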

banner("Part 3 — Same user, different agent personas via process_id")

mem.attribution(entity_id="[email protected]", process_id="fitness-coach")
ask("Goal: sub-25-minute 5K by June. Currently I run 30 minutes flat.")
time.sleep(WRITE_DELAY)

mem.attribution(entity_id="[email protected]", process_id="meal-planner")
ask("Prefer low-carb dinners on weekdays.")
time.sleep(WRITE_DELAY)

mem.attribution(entity_id="[email protected]", process_id="fitness-coach")
print("[fitness-coach]", ask("Remind me of my running goal."))

mem.attribution(entity_id="[email protected]", process_id="meal-planner")
print("[meal-planner] ", ask("Suggest tonight's dinner."))

banner("Part 4 — Sessions group related turns")

mem.attribution(entity_id="[email protected]", process_id="personal-assistant")
project_session = f"project-fastapi-{uuid.uuid4().hex[:8]}"
mem.set_session(project_session)

ask("Notes: building a FastAPI app called 'Lighthouse', Python 3.12, "
    "deploying to Fly.io.")
time.sleep(WRITE_DELAY)
ask("Decision: SQLAlchemy + Alembic for the data layer.")
time.sleep(WRITE_DELAY)

mem.new_session()
ask("Random aside: I just adopted a puppy named Mochi.")
time.sleep(WRITE_DELAY)

mem.set_session(project_session)
print("[project session]",
      ask("Summarize what we've decided about Lighthouse so far."))

We show how the same user can have different memories across different agent personas using separate process_id values. We store Alice’s fitness goal under a fitness coach and her dinner preference under a meal planner, then verify that each agent recalls only its relevant context. We also create a project-specific session for a FastAPI app and show how session management keeps related project decisions separate from unrelated personal details.
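Rejoining a session later with mem.set_session() only works if you keep the id around, so it pays to generate it once and store it. A tiny helper that mirrors the tutorial's "project-fastapi-<hex>" naming (the helper name project_session_id is ours, not Memori's):

```python
import uuid

def project_session_id(name):
    # Mirror the tutorial's scheme: "project-<name>-" plus 8 random hex chars.
    # Persist the returned id so mem.set_session(...) can rejoin it later.
    return f"project-{name}-{uuid.uuid4().hex[:8]}"

sid = project_session_id("fastapi")
print(sid)
```

The random suffix keeps ids unique across runs, while the readable prefix makes the session easy to spot when inspecting memories in the Memori dashboard.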

banner("Part 5 — Streaming")

mem.attribution(entity_id="[email protected]", process_id="personal-assistant")
stream = client.chat.completions.create(
    model=MODEL,
    messages=[{"role": "user",
               "content": "In two sentences, what do you remember about me?"}],
    stream=True,
)
print("[stream] ", end="")
for chunk in stream:
    d = chunk.choices[0].delta.content
    if d: print(d, end="", flush=True)
print(); time.sleep(WRITE_DELAY)

banner("Part 6 — Async LLM calls")

async def async_demo():
    r = await async_client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user",
                   "content": "What dietary restriction do I have? (asked async)"}],
    )
    return r.choices[0].message.content

print("[async]", asyncio.run(async_demo()))

banner("Part 7 — Mini support agent across multiple sessions")

def support(user_id, prompt):
    mem.attribution(entity_id=user_id, process_id="support-bot")
    return ask(prompt, system=(
        "You are a calm, helpful customer support agent. "
        "Use what you remember about the user. If you don't know, say so."
    ))

USER = "[email protected]"
mem.attribution(entity_id=USER, process_id="support-bot")
mem.new_session()
print("[support T1]", support(USER,
    "Hi! I'm Charlie, on the Pro plan. Email: [email protected]. "
    "Billing question for next month."))
time.sleep(WRITE_DELAY)

mem.new_session()
print("[support T2]", support(USER,
    "Hey, me again. What plan am I on and what's my email of record?"))

banner("Done. Open https://app.memorilabs.ai to inspect memories, "
       "or use Memori BYODB to point at your own Postgres.")

We test Memori with streaming responses to confirm that memory continues working when tokens arrive incrementally. We then run an asynchronous OpenAI call and verify that the async client can also access stored user context. Finally, we build a mini support-agent flow that remembers Charlie's plan and email across separate sessions, demonstrating how Memori supports realistic, long-term customer interactions.
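The single async call above generalizes naturally to several concurrent calls via asyncio.gather. This sketch substitutes a stand-in coroutine (fake_completion, our own placeholder) for async_client so the pattern runs without API keys; with a registered AsyncOpenAI client, each awaited call would pass through Memori in the same way:

```python
import asyncio

async def fake_completion(question):
    # Stand-in for `await async_client.chat.completions.create(...)`:
    # yields control once, then returns a canned answer.
    await asyncio.sleep(0)
    return f"answer to: {question}"

async def gather_demo():
    # gather() schedules the coroutines concurrently and returns
    # their results in the order the awaitables were passed in.
    return await asyncio.gather(
        fake_completion("dietary restriction?"),
        fake_completion("favorite cuisine?"),
    )

results = asyncio.run(gather_demo())
print(results)
```

Order preservation matters here: even if the second request finishes first, gather() keeps results aligned with the questions.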

In conclusion, we built and tested a complete Memori-powered memory workflow for LLM agents. We saw how Memori stores basic user preferences, keeps Alice’s and Bob’s memories isolated, and allows the same user to maintain different memories across separate agent personas, such as a fitness coach and a meal planner. We also explored how sessions help us group project-specific conversations, while unrelated details stay outside the active session context. Beyond basic recall, we verified that Memori continues to work with streaming outputs, asynchronous OpenAI calls, and a mini support-agent scenario where a user’s plan and email are remembered across new conversations. In doing so, we laid a practical foundation for building personalized AI assistants, support bots, workflow agents, and multi-agent systems that remember important context while keeping memory organized, scoped, and reusable.


Check out the Full Codes with Notebook here.


Sana Hassan, a consulting intern at Marktechpost and dual-degree student at IIT Madras, is passionate about applying technology and AI to address real-world challenges. With a keen interest in solving practical problems, he brings a fresh perspective to the intersection of AI and real-life solutions.