How to Build a Self-Designing Meta-Agent That Automatically Constructs, Instantiates, and Refines Task-Specific AI Agents


Source: MarkTechPost

In this tutorial, we build a Meta-Agent that designs other agents automatically from a simple task description. We implement a system that analyzes the task, selects tools, chooses a memory architecture, configures a planner, and then instantiates a fully working agent runtime. We go beyond static agent templates and instead build a dynamic, self-configuring architecture that can evaluate its own performance and refine itself as needed. We also demonstrate how agent design automation, tool selection, memory strategy, and iterative self-improvement can be unified into a cohesive, Colab-ready framework.

import os, re, json, math, time, textwrap, traceback, random
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Callable, Tuple


def _pip_install():
    try:
        import pydantic
        import transformers
        return
    except Exception:
        pass
    import sys, subprocess
    pkgs = [
        "pydantic>=2.6.0",
        "transformers>=4.41.0",
        "accelerate>=0.30.0",
        "sentencepiece",
        "torch",
        "numpy",
        "scikit-learn",
        "pandas",
    ]
    subprocess.check_call([sys.executable, "-m", "pip", "install", "-q"] + pkgs)


_pip_install()

import numpy as np
import pandas as pd
from pydantic import BaseModel, Field
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import NearestNeighbors

try:
    from transformers import pipeline, AutoTokenizer, AutoModelForSeq2SeqLM
    _HAS_TRANSFORMERS = True
except Exception:
    _HAS_TRANSFORMERS = False


class ToolSpec(BaseModel):
    name: str
    description: str
    inputs_schema: Dict[str, Any] = Field(default_factory=dict)


class MemorySpec(BaseModel):
    kind: str = Field(default="scratchpad", description="scratchpad | retrieval_tfidf")
    max_items: int = 200
    retrieval_k: int = 5


class PlannerSpec(BaseModel):
    kind: str = Field(default="react", description="react | plan_execute")
    max_steps: int = 10
    temperature: float = 0.2


class AgentConfig(BaseModel):
    agent_name: str = "DesignedAgent"
    objective: str
    planner: PlannerSpec
    memory: MemorySpec
    tools: List[ToolSpec]
    output_style: str = "concise"
    safety_rules: List[str] = Field(default_factory=lambda: [
        "Do not execute arbitrary OS commands.",
        "Refuse harmful/illegal instructions; suggest safe alternatives.",
        "If uncertain, ask for missing inputs or state assumptions.",
    ])

We set up the complete foundational environment for our meta-agent system. We install required dependencies, import all necessary libraries, and define the core configuration schemas using Pydantic. We formalize structured specifications for tools, memory, planner, and the overall agent configuration to enable typed, automated agent construction.
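To see the schemas in action, here is a minimal sketch (the objective, planner values, and tool entry below are illustrative, not part of the tutorial's pipeline) that builds a configuration by hand and serializes it, confirming the models validate and round-trip cleanly:

# Quick sanity check for the Pydantic schemas defined above.
manual_cfg = AgentConfig(
    objective="Answer arithmetic questions with the calc tool.",
    planner=PlannerSpec(kind="react", max_steps=6, temperature=0.1),
    memory=MemorySpec(kind="scratchpad", max_items=50),
    tools=[ToolSpec(
        name="calc",
        description="Evaluate a safe mathematical expression.",
        inputs_schema={"type": "object",
                       "properties": {"expression": {"type": "string"}},
                       "required": ["expression"]},
    )],
)
print(manual_cfg.model_dump_json(indent=2))  # typed, validated config as JSON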

class LocalLLM:
    def __init__(self, model_name: str = "google/flan-t5-small", device: int = -1):
        self.model_name = model_name
        self.device = device
        self._pipe = None
        self._fallback = False

        if not _HAS_TRANSFORMERS:
            self._fallback = True
            return

        try:
            tok = AutoTokenizer.from_pretrained(model_name)
            mdl = AutoModelForSeq2SeqLM.from_pretrained(model_name)
            self._pipe = pipeline(
                "text2text-generation",
                model=mdl,
                tokenizer=tok,
                device=device,
            )
        except Exception:
            self._fallback = True

    def generate(self, prompt: str, max_new_tokens: int = 256, temperature: float = 0.2) -> str:
        if self._fallback or self._pipe is None:
            return self._heuristic(prompt)

        try:
            out = self._pipe(
                prompt,
                max_new_tokens=max_new_tokens,
                do_sample=temperature > 0,
                temperature=max(temperature, 1e-6),
                num_return_sequences=1,
            )[0]["generated_text"]
            return out.strip()
        except Exception:
            return self._heuristic(prompt)

    def _heuristic(self, prompt: str) -> str:
        p = prompt.lower()
        if "tool" in p and "json" in p:
            return '{"action":"final","final":"(fallback) I can\'t load the model. Provide more details or enable internet in Colab to download the model."}'
        return "(fallback) I can't load the model. Please ensure Colab has internet access and retry."


class ScratchpadMemory:
    def __init__(self, max_items: int = 200):
        self.max_items = max_items
        self.items: List[Dict[str, str]] = []

    def add(self, role: str, content: str):
        self.items.append({"role": role, "content": content})
        if len(self.items) > self.max_items:
            self.items = self.items[-self.max_items:]

    def recent(self, n: int = 12) -> List[Dict[str, str]]:
        return self.items[-n:]

    def retrieve(self, query: str, k: int = 5) -> List[Dict[str, str]]:
        return self.recent(k)


class TfidfRetrievalMemory:
    def __init__(self, max_items: int = 200, retrieval_k: int = 5):
        self.max_items = max_items
        self.retrieval_k = retrieval_k
        self.items: List[Dict[str, str]] = []
        self._vectorizer = TfidfVectorizer(stop_words="english")
        self._nn = None
        self._X = None

    def add(self, role: str, content: str):
        self.items.append({"role": role, "content": content})
        if len(self.items) > self.max_items:
            self.items = self.items[-self.max_items:]
        self._rebuild_index()

    def _rebuild_index(self):
        docs = [it["content"] for it in self.items] or [""]
        self._X = self._vectorizer.fit_transform(docs)
        n_neighbors = min(self.retrieval_k, self._X.shape[0])
        self._nn = NearestNeighbors(n_neighbors=n_neighbors, metric="cosine")
        self._nn.fit(self._X)

    def recent(self, n: int = 12) -> List[Dict[str, str]]:
        return self.items[-n:]

    def retrieve(self, query: str, k: Optional[int] = None) -> List[Dict[str, str]]:
        if not self.items:
            return []
        if self._nn is None:
            self._rebuild_index()
        k = k or self.retrieval_k
        q = self._vectorizer.transform([query])
        n_neighbors = min(k, self._X.shape[0])
        dists, idx = self._nn.kneighbors(q, n_neighbors=n_neighbors)
        hits = [self.items[i] for i in idx[0].tolist()]
        return hits

We implement the LocalLLM wrapper that powers reasoning and tool-selection behavior. We configure a lightweight open-source model with a safe fallback mechanism to ensure robustness in Colab. We also define both scratchpad and retrieval-based memory systems to support contextual and semantic recall.
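As a standalone illustration (the sample notes and query below are our own, not from the tutorial), the retrieval memory can be exercised directly to confirm that TF-IDF recall surfaces semantically related items rather than just the most recent ones:

# Smoke test for TfidfRetrievalMemory: add a few notes, then query by topic.
mem = TfidfRetrievalMemory(max_items=50, retrieval_k=2)
mem.add("user", "The quarterly revenue grew 12% year over year.")
mem.add("user", "Remember to refactor the CSV loading pipeline.")
mem.add("user", "Loan amortization uses the monthly interest rate.")

for hit in mem.retrieve("how do I compute a loan payment?"):
    print(hit["role"], "->", hit["content"])  # nearest items by TF-IDF cosine distance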

class ToolResult(BaseModel):
    ok: bool
    output: str
    data: Optional[Any] = None


class Tool:
    def __init__(self, name: str, description: str, fn: Callable[..., ToolResult], inputs_schema: Dict[str, Any]):
        self.name = name
        self.description = description
        self.fn = fn
        self.inputs_schema = inputs_schema

    def call(self, **kwargs) -> ToolResult:
        try:
            return self.fn(**kwargs)
        except Exception as e:
            return ToolResult(ok=False, output=f"Tool error: {e}\n{traceback.format_exc()}")


class ToolRegistry:
    def __init__(self):
        self._tools: Dict[str, Tool] = {}

    def register(self, tool: Tool):
        self._tools[tool.name] = tool

    def has(self, name: str) -> bool:
        return name in self._tools

    def specs(self) -> List[ToolSpec]:
        return [
            ToolSpec(name=t.name, description=t.description, inputs_schema=t.inputs_schema)
            for t in self._tools.values()
        ]

    def call(self, name: str, args: Dict[str, Any]) -> ToolResult:
        if name not in self._tools:
            return ToolResult(ok=False, output=f"Unknown tool: {name}")
        return self._tools[name].call(**args)


_ALLOWED_MATH = {
    "abs": abs, "round": round, "min": min, "max": max,
    "sqrt": math.sqrt, "log": math.log, "exp": math.exp,
    "sin": math.sin, "cos": math.cos, "tan": math.tan,
    "pi": math.pi, "e": math.e
}

def tool_calc(expression: str) -> ToolResult:
    expr = expression.strip()
    if not expr:
        return ToolResult(ok=False, output="Empty expression.")
    if re.search(r"[A-Za-z_]\w*", expr):
        names = set(re.findall(r"[A-Za-z_]\w*", expr))
        bad = [n for n in names if n not in _ALLOWED_MATH]
        if bad:
            return ToolResult(ok=False, output=f"Disallowed names in expression: {bad}")
    if re.search(r"__|import|exec|eval|open|os\.|sys\.", expr):
        return ToolResult(ok=False, output="Disallowed tokens in expression.")
    try:
        val = eval(expr, {"__builtins__": {}}, dict(_ALLOWED_MATH))
        return ToolResult(ok=True, output=str(val), data=val)
    except Exception as e:
        return ToolResult(ok=False, output=f"Failed to evaluate: {e}")


def tool_text_stats(text: str) -> ToolResult:
    s = text or ""
    words = re.findall(r"\w+", s)
    lines = s.splitlines() if s else []
    out = {
        "chars": len(s),
        "words": len(words),
        "lines": len(lines),
        "unique_words": len(set(w.lower() for w in words)),
    }
    return ToolResult(ok=True, output=json.dumps(out, indent=2), data=out)


def tool_csv_profile(path: str, n_rows: int = 5) -> ToolResult:
    try:
        df = pd.read_csv(path)
    except Exception as e:
        return ToolResult(ok=False, output=f"Could not read CSV: {e}")
    head = df.head(n_rows)
    desc = df.describe(include="all").transpose().head(30)
    out = (
        f"Shape: {df.shape}\n\n"
        f"Columns: {list(df.columns)}\n\n"
        f"Head({n_rows}):\n{head}\n\n"
        f"Describe(top 30 cols):\n{desc}\n"
    )
    return ToolResult(ok=True, output=out, data={"shape": df.shape, "columns": list(df.columns)})


def default_tool_registry() -> ToolRegistry:
    reg = ToolRegistry()
    reg.register(Tool(
        name="calc",
        description="Evaluate a safe mathematical expression (no arbitrary code).",
        fn=lambda expression: tool_calc(expression),
        inputs_schema={"type": "object", "properties": {"expression": {"type": "string"}}, "required": ["expression"]}
    ))
    reg.register(Tool(
        name="text_stats",
        description="Compute basic statistics about a text blob (words, lines, unique words).",
        fn=lambda text: tool_text_stats(text),
        inputs_schema={"type": "object", "properties": {"text": {"type": "string"}}, "required": ["text"]}
    ))
    reg.register(Tool(
        name="csv_profile",
        description="Load a CSV from a local path and print a quick profile (head, describe).",
        fn=lambda path, n_rows=5: tool_csv_profile(path, n_rows),
        inputs_schema={"type": "object", "properties": {"path": {"type": "string"}, "n_rows": {"type": "integer"}}, "required": ["path"]}
    ))
    return reg

We build the full tool infrastructure including tool registration, safe execution, and structured outputs. We implement secure mathematical evaluation, text statistics analysis, and CSV profiling capabilities. We design the ToolRegistry abstraction to allow the meta-agent to dynamically select and invoke tools during runtime.
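A short usage sketch (our own illustrative calls, using the registry defined above) shows how every invocation flows through the registry and returns a structured ToolResult, including graceful failure for unknown tools:

# Exercise the registry directly, outside of any agent loop.
reg = default_tool_registry()
print(reg.call("calc", {"expression": "sqrt(2) * 10"}).output)       # 14.142...
print(reg.call("text_stats", {"text": "one two two three"}).output)  # word counts as JSON
print(reg.call("does_not_exist", {}).output)                         # graceful "Unknown tool" result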

class AgentRuntime:
    def __init__(self, config: AgentConfig, llm: LocalLLM, tools: ToolRegistry, memory):
        self.config = config
        self.llm = llm
        self.tools = tools
        self.memory = memory

    def _tool_prompt(self) -> str:
        specs = self.config.tools
        lines = []
        for t in specs:
            lines.append(f"- {t.name}: {t.description} | inputs_schema={json.dumps(t.inputs_schema)}")
        return "\n".join(lines)

    def _format_context(self, task: str) -> str:
        retrieved = self.memory.retrieve(task, k=getattr(self.config.memory, "retrieval_k", 5))
        recent = self.memory.recent(8)

        def pack(items):
            return "\n".join([f"[{it['role']}] {it['content']}" for it in items])

        return (
            f"OBJECTIVE:\n{self.config.objective}\n\n"
            f"TASK:\n{task}\n\n"
            f"SAFETY RULES:\n- " + "\n- ".join(self.config.safety_rules) + "\n\n"
            f"AVAILABLE TOOLS:\n{self._tool_prompt()}\n\n"
            f"RETRIEVED MEMORY (may be relevant):\n{pack(retrieved) if retrieved else '(none)'}\n\n"
            f"RECENT CONTEXT:\n{pack(recent) if recent else '(none)'}\n"
        )

    def _react_step_prompt(self, task: str, scratch: str) -> str:
        ctx = self._format_context(task)
        return textwrap.dedent(f"""
        You are an expert tool-using agent.
        Use the following JSON-only protocol (no extra text):
        {{
          "action": "tool" | "final",
          "tool_name": "name" (if action=tool),
          "tool_args": {{...}} (if action=tool),
          "final": "answer" (if action=final)
        }}

        Rules:
        - If a tool is needed, pick ONE tool call per step.
        - Keep args strictly matching the tool schema.
        - If you can answer directly, output action="final".
        - Output valid JSON only.

        {ctx}

        SCRATCHPAD (internal notes, may be incomplete):
        {scratch}
        """).strip()

    def run(self, task: str, verbose: bool = True) -> str:
        scratch = ""
        self.memory.add("user", task)

        for step in range(1, self.config.planner.max_steps + 1):
            prompt = self._react_step_prompt(task, scratch)
            raw = self.llm.generate(prompt, max_new_tokens=256, temperature=self.config.planner.temperature)

            m = re.search(r"\{.*\}", raw, re.DOTALL)
            raw_json = m.group(0).strip() if m else raw.strip()

            try:
                action = json.loads(raw_json)
            except Exception:
                final = (
                    "(Parser fallback) I couldn't parse a tool plan. Here is what I can do:\n"
                    "- Clarify your goal\n"
                    f"- Use available tools: {[t.name for t in self.config.tools]}\n"
                    f"Raw model output:\n{raw}"
                )
                self.memory.add("assistant", final)
                return final

            if verbose:
                print(f"\n--- Step {step}/{self.config.planner.max_steps} ---")
                print("Model JSON:", json.dumps(action, indent=2))

            if action.get("action") == "tool":
                name = action.get("tool_name", "")
                args = action.get("tool_args", {}) or {}
                res = self.tools.call(name, args)
                if verbose:
                    print(f"Tool call: {name}({args})")
                    print("Tool ok:", res.ok)
                    print("Tool output:\n", res.output[:2000])

                scratch += f"\n[tool:{name}] args={args}\nresult_ok={res.ok}\nresult={res.output}\n"
                self.memory.add("tool", f"{name} args={args}\n{res.output}")

                if not res.ok:
                    scratch += "\nNOTE: tool failed; consider alternative approach or ask for missing input.\n"

            elif action.get("action") == "final":
                final = action.get("final", "").strip()
                if not final:
                    final = "I'm missing the final answer text. Please restate the task or provide more details."
                self.memory.add("assistant", final)
                return final
            else:
                final = f"Unknown action type in model output: {action}"
                self.memory.add("assistant", final)
                return final

        final = "Reached max steps without a final answer. Provide missing inputs or simplify the request."
        self.memory.add("assistant", final)
        return final

We implement the core AgentRuntime that executes the designed agent configuration. We construct the structured ReAct-style prompting loop, enforce a strict JSON-based tool-calling protocol, and integrate memory retrieval into reasoning. We manage iterative use of tools, scratchpad updates, and controlled final answer generation.
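To make the protocol concrete, the following sketch (a hand-written sample model output, not a real generation) replays one step of the loop by hand: extract the JSON, dispatch the tool through the registry, and capture the result that would feed the scratchpad on the next step:

# One illustrative step of the JSON-only protocol.
sample_model_output = '{"action": "tool", "tool_name": "calc", "tool_args": {"expression": "12000 * 0.08 / 12"}}'
action = json.loads(re.search(r"\{.*\}", sample_model_output, re.DOTALL).group(0))
if action["action"] == "tool":
    res = default_tool_registry().call(action["tool_name"], action["tool_args"])
    print("tool result:", res.output)  # 80.0 -- appended to the scratchpad in the real loop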

class MetaAgent:
    def __init__(self, llm: Optional[LocalLLM] = None):
        self.llm = llm or LocalLLM()

    def _capability_heuristics(self, task: str) -> Dict[str, Any]:
        t = task.lower()

        needs_data = any(k in t for k in ["csv", "dataframe", "pandas", "dataset", "table", "excel"])
        needs_math = any(k in t for k in ["calculate", "compute", "probability", "equation", "optimize", "derivative", "integral"])
        needs_writing = any(k in t for k in ["write", "draft", "email", "cover letter", "proposal", "summarize", "rewrite"])
        needs_analysis = any(k in t for k in ["analyze", "insights", "trend", "compare", "benchmark"])
        needs_memory = any(k in t for k in ["long", "multi-step", "remember", "plan", "workflow", "pipeline"])

        return {
            "needs_data": needs_data,
            "needs_math": needs_math,
            "needs_writing": needs_writing,
            "needs_analysis": needs_analysis,
            "needs_memory": needs_memory,
        }

    def design(self, task_description: str) -> AgentConfig:
        caps = self._capability_heuristics(task_description)
        tools = default_tool_registry()

        selected: List[ToolSpec] = []
        selected.append(ToolSpec(
            name="calc",
            description="Evaluate a safe mathematical expression (no arbitrary code).",
            inputs_schema={"type": "object", "properties": {"expression": {"type": "string"}}, "required": ["expression"]}
        ))
        selected.append(ToolSpec(
            name="text_stats",
            description="Compute basic statistics about a text blob (words, lines, unique words).",
            inputs_schema={"type": "object", "properties": {"text": {"type": "string"}}, "required": ["text"]}
        ))
        if caps["needs_data"]:
            selected.append(ToolSpec(
                name="csv_profile",
                description="Load a CSV from a local path and print a quick profile (head, describe).",
                inputs_schema={"type": "object", "properties": {"path": {"type": "string"}, "n_rows": {"type": "integer"}}, "required": ["path"]}
            ))

        if caps["needs_memory"] or caps["needs_analysis"] or caps["needs_data"]:
            mem = MemorySpec(kind="retrieval_tfidf", max_items=250, retrieval_k=6)
        else:
            mem = MemorySpec(kind="scratchpad", max_items=120, retrieval_k=5)

        if caps["needs_analysis"] or caps["needs_data"] or caps["needs_memory"]:
            planner = PlannerSpec(kind="react", max_steps=12, temperature=0.2)
        else:
            planner = PlannerSpec(kind="react", max_steps=8, temperature=0.2)

        objective = "Solve the user task with tool use when helpful; produce a clean final response."

        cfg = AgentConfig(
            agent_name="AutoDesignedAgent",
            objective=objective,
            planner=planner,
            memory=mem,
            tools=selected,
            output_style="concise",
        )

        for ts in selected:
            if not tools.has(ts.name):
                raise RuntimeError(f"Tool selected but not registered: {ts.name}")

        return cfg

    def instantiate(self, cfg: AgentConfig) -> AgentRuntime:
        tools = default_tool_registry()
        if cfg.memory.kind == "retrieval_tfidf":
            mem = TfidfRetrievalMemory(max_items=cfg.memory.max_items, retrieval_k=cfg.memory.retrieval_k)
        else:
            mem = ScratchpadMemory(max_items=cfg.memory.max_items)
        return AgentRuntime(config=cfg, llm=self.llm, tools=tools, memory=mem)

    def evaluate(self, task: str, answer: str) -> Dict[str, Any]:
        a = (answer or "").strip().lower()
        flags = {
            "empty": len(a) == 0,
            # Match both straight and curly apostrophes so fallback messages are caught.
            "generic": any(p in a for p in ["i can't", "i can’t", "cannot", "missing", "provide more details", "parser fallback"]),
            "mentions_max_steps": "max steps" in a,
        }
        score = 1.0
        if flags["empty"]: score -= 0.6
        if flags["generic"]: score -= 0.25
        if flags["mentions_max_steps"]: score -= 0.2
        score = max(0.0, min(1.0, score))
        return {"score": score, "flags": flags}

    def refine(self, cfg: AgentConfig, eval_report: Dict[str, Any], task: str) -> AgentConfig:
        new_cfg = cfg.model_copy(deep=True)

        if eval_report["flags"]["generic"] or eval_report["flags"]["mentions_max_steps"]:
            new_cfg.planner.max_steps = min(18, new_cfg.planner.max_steps + 6)
            new_cfg.planner.temperature = min(0.35, new_cfg.planner.temperature + 0.05)
            if new_cfg.memory.kind != "retrieval_tfidf":
                new_cfg.memory.kind = "retrieval_tfidf"
                new_cfg.memory.max_items = max(new_cfg.memory.max_items, 200)
                new_cfg.memory.retrieval_k = max(new_cfg.memory.retrieval_k, 6)

        t = task.lower()
        if any(k in t for k in ["csv", "dataframe", "pandas", "dataset", "table"]):
            if not any(ts.name == "csv_profile" for ts in new_cfg.tools):
                new_cfg.tools.append(ToolSpec(
                    name="csv_profile",
                    description="Load a CSV from a local path and print a quick profile (head, describe).",
                    inputs_schema={"type": "object", "properties": {"path": {"type": "string"}, "n_rows": {"type": "integer"}}, "required": ["path"]}
                ))

        return new_cfg

    def build_and_run(self, task: str, improve_rounds: int = 1, verbose: bool = True) -> Tuple[str, AgentConfig]:
        cfg = self.design(task)
        agent = self.instantiate(cfg)

        if verbose:
            print("\n==============================")
            print("META-AGENT: DESIGNED CONFIG")
            print("==============================")
            print(cfg.model_dump_json(indent=2))

        ans = agent.run(task, verbose=verbose)
        report = self.evaluate(task, ans)

        if verbose:
            print("\n==============================")
            print("EVALUATION REPORT")
            print("==============================")
            print(json.dumps(report, indent=2))
            print("\n==============================")
            print("FINAL ANSWER")
            print("==============================")
            print(ans)

        for r in range(improve_rounds):
            if report["score"] >= 0.85:
                break

            cfg = self.refine(cfg, report, task)
            agent = self.instantiate(cfg)
            if verbose:
                print("\n\n==============================")
                print(f"SELF-IMPROVEMENT ROUND {r+1}: UPDATED CONFIG")
                print("==============================")
                print(cfg.model_dump_json(indent=2))
            ans = agent.run(task, verbose=verbose)
            report = self.evaluate(task, ans)
            if verbose:
                print("\nEVAL:", json.dumps(report, indent=2))
                print("\nANSWER:\n", ans)

        return ans, cfg


meta = MetaAgent()

examples = [
    "Design an agent workflow to summarize a long meeting transcript and extract action items. Keep it concise.",
    "I have a local CSV at /content/sample.csv. Profile it and tell me the top 3 insights.",
    "Compute the monthly payment for a $12,000 loan at 8% APR over 36 months. Show the formula briefly.",
]

print("\n==============================")
print("RUNNING A QUICK DEMO TASK")
print("==============================")
demo_task = examples[2]
_ = meta.build_and_run(demo_task, improve_rounds=1, verbose=True)

We implement MetaAgent, which analyzes tasks, designs agent configurations, instantiates runtimes, evaluates performance, and refines the architecture as needed. We apply capability heuristics to dynamically choose tools, memory strategy, and planner depth. We then demonstrate the full build-and-run pipeline, including optional self-improvement, to complete the automated agent design lifecycle.
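Because the demo task asks for a loan payment, we can sanity-check the number the agent should converge on using the standard amortization formula M = P·r / (1 − (1 + r)^(−n)), with monthly rate r = APR/12 and n monthly payments (an independent check of ours, outside the agent loop):

# Reference computation for the demo task, independent of the agent.
P, annual_rate, n = 12000, 0.08, 36
r = annual_rate / 12                          # monthly interest rate
M = P * r / (1 - (1 + r) ** (-n))             # standard amortization formula
print(f"Expected monthly payment: ${M:.2f}")  # approximately $376.04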

In conclusion, we demonstrated how a Meta-Agent can move from passive task execution to active architecture construction. We designed agents programmatically, instantiated them automatically, evaluated their outputs, and refined their configurations through a self-improvement loop. We showed that agentic systems can reason not only about tasks but also about their own structure, capabilities, and limitations. This approach pushes us toward self-evolving AI systems in which the architecture becomes adaptive, automated, and increasingly autonomous, bringing us closer to fully self-designing agent ecosystems.



Michal Sutter is a data science professional with a Master of Science in Data Science from the University of Padova. With a solid foundation in statistical analysis, machine learning, and data engineering, Michal excels at transforming complex datasets into actionable insights.