Source: MarkTechPost
In this tutorial, we build a Meta-Agent that designs other agents automatically from a simple task description. We implement a system that analyzes the task, selects tools, chooses a memory architecture, configures a planner, and then instantiates a fully working agent runtime. We go beyond static agent templates and instead build a dynamic, self-configuring architecture that can evaluate its own performance and refine itself as needed. We also demonstrate how agent design automation, tool selection, memory strategy, and iterative self-improvement can be unified into a cohesive, Colab-ready framework.
import os, re, json, math, time, textwrap, traceback, random
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Callable, Tuple

def _pip_install():
    try:
        import pydantic
        import transformers
        return
    except Exception:
        pass
    import sys, subprocess
    pkgs = [
        "pydantic>=2.6.0",
        "transformers>=4.41.0",
        "accelerate>=0.30.0",
        "sentencepiece",
        "torch",
        "numpy",
        "scikit-learn",
        "pandas",
    ]
    subprocess.check_call([sys.executable, "-m", "pip", "install", "-q"] + pkgs)

_pip_install()

import numpy as np
import pandas as pd
from pydantic import BaseModel, Field
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import NearestNeighbors

try:
    from transformers import pipeline, AutoTokenizer, AutoModelForSeq2SeqLM
    _HAS_TRANSFORMERS = True
except Exception:
    _HAS_TRANSFORMERS = False

class ToolSpec(BaseModel):
    name: str
    description: str
    inputs_schema: Dict[str, Any] = Field(default_factory=dict)

class MemorySpec(BaseModel):
    kind: str = Field(default="scratchpad", description="scratchpad | retrieval_tfidf")
    max_items: int = 200
    retrieval_k: int = 5

class PlannerSpec(BaseModel):
    kind: str = Field(default="react", description="react | plan_execute")
    max_steps: int = 10
    temperature: float = 0.2

class AgentConfig(BaseModel):
    agent_name: str = "DesignedAgent"
    objective: str
    planner: PlannerSpec
    memory: MemorySpec
    tools: List[ToolSpec]
    output_style: str = "concise"
    safety_rules: List[str] = Field(default_factory=lambda: [
        "Do not execute arbitrary OS commands.",
        "Refuse harmful/illegal instructions; suggest safe alternatives.",
        "If uncertain, ask for missing inputs or state assumptions.",
    ])
We set up the complete foundational environment for our meta-agent system. We install required dependencies, import all necessary libraries, and define the core configuration schemas using Pydantic. We formalize structured specifications for tools, memory, planner, and the overall agent configuration to enable typed, automated agent construction.
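To see how these typed specs compose, here is a minimal, self-contained sketch. It re-declares trimmed versions of the tutorial's schemas for illustration (rather than importing the full definitions above) and assumes Pydantic v2:

```python
# Minimal sketch (assumes pydantic v2): trimmed re-declarations of the
# tutorial's spec models, showing how nested specs validate and serialize.
from typing import Any, Dict, List
from pydantic import BaseModel, Field

class ToolSpec(BaseModel):
    name: str
    description: str
    inputs_schema: Dict[str, Any] = Field(default_factory=dict)

class PlannerSpec(BaseModel):
    kind: str = "react"
    max_steps: int = 10

class AgentConfig(BaseModel):
    agent_name: str = "DesignedAgent"
    objective: str
    planner: PlannerSpec
    tools: List[ToolSpec]

cfg = AgentConfig(
    objective="Summarize a transcript",
    planner=PlannerSpec(max_steps=8),
    tools=[ToolSpec(name="text_stats", description="Basic text statistics")],
)
# The config is validated on construction and round-trips through JSON.
print(cfg.model_dump_json(indent=2))
```

Because every field is typed, a malformed design (say, a missing `objective`) fails loudly at construction time instead of surfacing later inside the agent loop.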
class LocalLLM:
    def __init__(self, model_name: str = "google/flan-t5-small", device: int = -1):
        self.model_name = model_name
        self.device = device
        self._pipe = None
        self._fallback = False
        if not _HAS_TRANSFORMERS:
            self._fallback = True
            return
        try:
            tok = AutoTokenizer.from_pretrained(model_name)
            mdl = AutoModelForSeq2SeqLM.from_pretrained(model_name)
            self._pipe = pipeline(
                "text2text-generation",
                model=mdl,
                tokenizer=tok,
                device=device,
            )
        except Exception:
            self._fallback = True

    def generate(self, prompt: str, max_new_tokens: int = 256, temperature: float = 0.2) -> str:
        if self._fallback or self._pipe is None:
            return self._heuristic(prompt)
        try:
            out = self._pipe(
                prompt,
                max_new_tokens=max_new_tokens,
                do_sample=temperature > 0,
                temperature=max(temperature, 1e-6),
                num_return_sequences=1,
            )[0]["generated_text"]
            return out.strip()
        except Exception:
            return self._heuristic(prompt)

    def _heuristic(self, prompt: str) -> str:
        p = prompt.lower()
        if "tool" in p and "json" in p:
            return '{"action":"final","final":"(fallback) I can’t load the model. Provide more details or enable internet in Colab to download the model."}'
        return "(fallback) I can’t load the model. Please ensure Colab has internet access and retry."
class ScratchpadMemory:
    def __init__(self, max_items: int = 200):
        self.max_items = max_items
        self.items: List[Dict[str, str]] = []

    def add(self, role: str, content: str):
        self.items.append({"role": role, "content": content})
        if len(self.items) > self.max_items:
            self.items = self.items[-self.max_items:]

    def recent(self, n: int = 12) -> List[Dict[str, str]]:
        return self.items[-n:]

    def retrieve(self, query: str, k: int = 5) -> List[Dict[str, str]]:
        return self.recent(k)

class TfidfRetrievalMemory:
    def __init__(self, max_items: int = 200, retrieval_k: int = 5):
        self.max_items = max_items
        self.retrieval_k = retrieval_k
        self.items: List[Dict[str, str]] = []
        self._vectorizer = TfidfVectorizer(stop_words="english")
        self._nn = None
        self._X = None

    def add(self, role: str, content: str):
        self.items.append({"role": role, "content": content})
        if len(self.items) > self.max_items:
            self.items = self.items[-self.max_items:]
        self._rebuild_index()

    def _rebuild_index(self):
        docs = [it["content"] for it in self.items] or [""]
        self._X = self._vectorizer.fit_transform(docs)
        n_neighbors = min(self.retrieval_k, self._X.shape[0])
        self._nn = NearestNeighbors(n_neighbors=n_neighbors, metric="cosine")
        self._nn.fit(self._X)

    def recent(self, n: int = 12) -> List[Dict[str, str]]:
        return self.items[-n:]

    def retrieve(self, query: str, k: Optional[int] = None) -> List[Dict[str, str]]:
        if not self.items:
            return []
        if self._nn is None:
            self._rebuild_index()
        k = k or self.retrieval_k
        q = self._vectorizer.transform([query])
        n_neighbors = min(k, self._X.shape[0])
        dists, idx = self._nn.kneighbors(q, n_neighbors=n_neighbors)
        hits = [self.items[i] for i in idx[0].tolist()]
        return hits
We implement the LocalLLM wrapper that powers reasoning and tool-selection behavior. We configure a lightweight open-source model with a safe fallback mechanism to ensure robustness in Colab. We also define both scratchpad and retrieval-based memory systems to support contextual and semantic recall.
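The retrieval memory's core mechanic, TF-IDF vectors plus cosine nearest neighbors, can be exercised in isolation. This small sketch mirrors what `TfidfRetrievalMemory.retrieve` does under the hood (the note strings are made up for the demo):

```python
# Sketch of the retrieval step: vectorize stored notes with TF-IDF,
# then find the nearest note to a query under cosine distance.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import NearestNeighbors

notes = [
    "The loan payment uses the amortization formula.",
    "CSV profiling prints head and describe output.",
    "ReAct agents alternate reasoning and tool calls.",
]
vec = TfidfVectorizer(stop_words="english")
X = vec.fit_transform(notes)                      # one TF-IDF row per note
nn = NearestNeighbors(n_neighbors=1, metric="cosine").fit(X)

query = vec.transform(["profile a csv file"])     # reuse the fitted vocabulary
_, idx = nn.kneighbors(query)
print(notes[idx[0][0]])                           # the CSV-related note
```

Because the vectorizer is refit on every `add`, the memory stays small and exact; for larger stores one would typically refit less often or switch to an incremental index.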
class ToolResult(BaseModel):
    ok: bool
    output: str
    data: Optional[Any] = None

class Tool:
    def __init__(self, name: str, description: str, fn: Callable[..., ToolResult], inputs_schema: Dict[str, Any]):
        self.name = name
        self.description = description
        self.fn = fn
        self.inputs_schema = inputs_schema

    def call(self, **kwargs) -> ToolResult:
        try:
            return self.fn(**kwargs)
        except Exception as e:
            return ToolResult(ok=False, output=f"Tool error: {e}\n{traceback.format_exc()}")

class ToolRegistry:
    def __init__(self):
        self._tools: Dict[str, Tool] = {}

    def register(self, tool: Tool):
        self._tools[tool.name] = tool

    def has(self, name: str) -> bool:
        return name in self._tools

    def specs(self) -> List[ToolSpec]:
        return [
            ToolSpec(name=t.name, description=t.description, inputs_schema=t.inputs_schema)
            for t in self._tools.values()
        ]

    def call(self, name: str, args: Dict[str, Any]) -> ToolResult:
        if name not in self._tools:
            return ToolResult(ok=False, output=f"Unknown tool: {name}")
        return self._tools[name].call(**args)

_ALLOWED_MATH = {
    "abs": abs, "round": round, "min": min, "max": max,
    "sqrt": math.sqrt, "log": math.log, "exp": math.exp,
    "sin": math.sin, "cos": math.cos, "tan": math.tan,
    "pi": math.pi, "e": math.e,
}

def tool_calc(expression: str) -> ToolResult:
    expr = expression.strip()
    if not expr:
        return ToolResult(ok=False, output="Empty expression.")
    if re.search(r"[A-Za-z_]\w*", expr):
        names = set(re.findall(r"[A-Za-z_]\w*", expr))
        bad = [n for n in names if n not in _ALLOWED_MATH]
        if bad:
            return ToolResult(ok=False, output=f"Disallowed names in expression: {bad}")
    if re.search(r"__|import|exec|eval|open|os\.|sys\.", expr):
        return ToolResult(ok=False, output="Disallowed tokens in expression.")
    try:
        val = eval(expr, {"__builtins__": {}}, dict(_ALLOWED_MATH))
        return ToolResult(ok=True, output=str(val), data=val)
    except Exception as e:
        return ToolResult(ok=False, output=f"Failed to evaluate: {e}")

def tool_text_stats(text: str) -> ToolResult:
    s = text or ""
    words = re.findall(r"\w+", s)
    lines = s.splitlines() if s else []
    out = {
        "chars": len(s),
        "words": len(words),
        "lines": len(lines),
        "unique_words": len(set(w.lower() for w in words)),
    }
    return ToolResult(ok=True, output=json.dumps(out, indent=2), data=out)

def tool_csv_profile(path: str, n_rows: int = 5) -> ToolResult:
    try:
        df = pd.read_csv(path)
    except Exception as e:
        return ToolResult(ok=False, output=f"Could not read CSV: {e}")
    head = df.head(n_rows)
    desc = df.describe(include="all").transpose().head(30)
    out = (
        f"Shape: {df.shape}\n\n"
        f"Columns: {list(df.columns)}\n\n"
        f"Head({n_rows}):\n{head}\n\n"
        f"Describe(top 30 cols):\n{desc}\n"
    )
    return ToolResult(ok=True, output=out, data={"shape": df.shape, "columns": list(df.columns)})

def default_tool_registry() -> ToolRegistry:
    reg = ToolRegistry()
    reg.register(Tool(
        name="calc",
        description="Evaluate a safe mathematical expression (no arbitrary code).",
        fn=lambda expression: tool_calc(expression),
        inputs_schema={"type": "object", "properties": {"expression": {"type": "string"}}, "required": ["expression"]},
    ))
    reg.register(Tool(
        name="text_stats",
        description="Compute basic statistics about a text blob (words, lines, unique words).",
        fn=lambda text: tool_text_stats(text),
        inputs_schema={"type": "object", "properties": {"text": {"type": "string"}}, "required": ["text"]},
    ))
    reg.register(Tool(
        name="csv_profile",
        description="Load a CSV from a local path and print a quick profile (head, describe).",
        fn=lambda path, n_rows=5: tool_csv_profile(path, n_rows),
        inputs_schema={"type": "object", "properties": {"path": {"type": "string"}, "n_rows": {"type": "integer"}}, "required": ["path"]},
    ))
    return reg
We build the full tool infrastructure including tool registration, safe execution, and structured outputs. We implement secure mathematical evaluation, text statistics analysis, and CSV profiling capabilities. We design the ToolRegistry abstraction to allow the meta-agent to dynamically select and invoke tools during runtime.
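The key safety idea in the `calc` tool is allowlisted evaluation: reject any identifier that is not an approved math name, then evaluate with builtins stripped. Here is a condensed, standalone sketch of that pattern (the `safe_calc` helper name is ours, for illustration):

```python
import math
import re

# Allowlist of names an expression may reference; everything else is rejected.
_ALLOWED = {"sqrt": math.sqrt, "pi": math.pi, "min": min, "max": max}

def safe_calc(expr: str) -> float:
    # Find every identifier-like token and refuse anything off the allowlist,
    # *before* eval ever runs.
    names = set(re.findall(r"[A-Za-z_]\w*", expr))
    bad = names - set(_ALLOWED)
    if bad:
        raise ValueError(f"disallowed names: {sorted(bad)}")
    # Evaluate with builtins removed, so only allowlisted names resolve.
    return eval(expr, {"__builtins__": {}}, dict(_ALLOWED))

print(safe_calc("sqrt(16) + max(2, 3)"))  # 7.0
```

The identifier scan is the load-bearing check: an input like `__import__('os')` fails the allowlist before `eval` is reached, and the empty `__builtins__` dict is a second line of defense rather than the only one.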
class AgentRuntime:
    def __init__(self, config: AgentConfig, llm: LocalLLM, tools: ToolRegistry, memory):
        self.config = config
        self.llm = llm
        self.tools = tools
        self.memory = memory

    def _tool_prompt(self) -> str:
        specs = self.config.tools
        lines = []
        for t in specs:
            lines.append(f"- {t.name}: {t.description} | inputs_schema={json.dumps(t.inputs_schema)}")
        return "\n".join(lines)

    def _format_context(self, task: str) -> str:
        retrieved = self.memory.retrieve(task, k=getattr(self.config.memory, "retrieval_k", 5))
        recent = self.memory.recent(8)

        def pack(items):
            return "\n".join([f"[{it['role']}] {it['content']}" for it in items])

        return (
            f"OBJECTIVE:\n{self.config.objective}\n\n"
            f"TASK:\n{task}\n\n"
            f"SAFETY RULES:\n- " + "\n- ".join(self.config.safety_rules) + "\n\n"
            f"AVAILABLE TOOLS:\n{self._tool_prompt()}\n\n"
            f"RETRIEVED MEMORY (may be relevant):\n{pack(retrieved) if retrieved else '(none)'}\n\n"
            f"RECENT CONTEXT:\n{pack(recent) if recent else '(none)'}\n"
        )

    def _react_step_prompt(self, task: str, scratch: str) -> str:
        ctx = self._format_context(task)
        return textwrap.dedent(f"""
        You are an expert tool-using agent.
        Use the following JSON-only protocol (no extra text):
        {{
          "action": "tool" | "final",
          "tool_name": "name" (if action=tool),
          "tool_args": {{...}} (if action=tool),
          "final": "answer" (if action=final)
        }}
        Rules:
        - If a tool is needed, pick ONE tool call per step.
        - Keep args strictly matching the tool schema.
        - If you can answer directly, output action="final".
        - Output valid JSON only.

        {ctx}

        SCRATCHPAD (internal notes, may be incomplete):
        {scratch}
        """).strip()

    def run(self, task: str, verbose: bool = True) -> str:
        scratch = ""
        self.memory.add("user", task)
        for step in range(1, self.config.planner.max_steps + 1):
            prompt = self._react_step_prompt(task, scratch)
            raw = self.llm.generate(prompt, max_new_tokens=256, temperature=self.config.planner.temperature)
            m = re.search(r"\{.*\}", raw, re.DOTALL)
            raw_json = m.group(0).strip() if m else raw.strip()
            try:
                action = json.loads(raw_json)
            except Exception:
                final = (
                    f"(Parser fallback) I couldn't parse a tool plan. Here is what I can do:\n"
                    f"- Clarify your goal\n"
                    f"- Use available tools: {[t.name for t in self.config.tools]}\n"
                    f"Raw model output:\n{raw}"
                )
                self.memory.add("assistant", final)
                return final
            if verbose:
                print(f"\n--- Step {step}/{self.config.planner.max_steps} ---")
                print("Model JSON:", json.dumps(action, indent=2))
            if action.get("action") == "tool":
                name = action.get("tool_name", "")
                args = action.get("tool_args", {}) or {}
                res = self.tools.call(name, args)
                if verbose:
                    print(f"Tool call: {name}({args})")
                    print("Tool ok:", res.ok)
                    print("Tool output:\n", res.output[:2000])
                scratch += f"\n[tool:{name}] args={args}\nresult_ok={res.ok}\nresult={res.output}\n"
                self.memory.add("tool", f"{name} args={args}\n{res.output}")
                if not res.ok:
                    scratch += "\nNOTE: tool failed; consider alternative approach or ask for missing input.\n"
            elif action.get("action") == "final":
                final = action.get("final", "").strip()
                if not final:
                    final = "I’m missing the final answer text. Please restate the task or provide more details."
                self.memory.add("assistant", final)
                return final
            else:
                final = f"Unknown action type in model output: {action}"
                self.memory.add("assistant", final)
                return final
        final = "Reached max steps without a final answer. Provide missing inputs or simplify the request."
        self.memory.add("assistant", final)
        return final
We implement the core AgentRuntime that executes the designed agent configuration. We construct the structured ReAct-style prompting loop, enforce a strict JSON-based tool-calling protocol, and integrate memory retrieval into reasoning. We manage iterative use of tools, scratchpad updates, and controlled final answer generation.
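To make the JSON-only protocol concrete, here is the parsing step in isolation: the runtime grabs the span from the first `{` to the last `}` in the raw model text (a greedy match under `re.DOTALL`), then loads it as JSON. The raw output below is a hypothetical example of what the model might emit:

```python
import json
import re

# Hypothetical raw model output: chatter surrounding a protocol-conformant
# JSON object with a nested tool_args dict.
raw = (
    "Thought: I should use the calculator.\n"
    '{"action": "tool", "tool_name": "calc", "tool_args": {"expression": "2+2"}}'
)

# Greedy match from the first "{" to the last "}" tolerates nested braces
# and leading/trailing prose around the JSON.
m = re.search(r"\{.*\}", raw, re.DOTALL)
action = json.loads(m.group(0))
print(action["action"], action["tool_name"])  # tool calc
```

The greedy match is deliberate: a non-greedy `\{.*?\}` would stop at the first closing brace and truncate any payload containing a nested object like `tool_args`.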
class MetaAgent:
    def __init__(self, llm: Optional[LocalLLM] = None):
        self.llm = llm or LocalLLM()

    def _capability_heuristics(self, task: str) -> Dict[str, Any]:
        t = task.lower()
        needs_data = any(k in t for k in ["csv", "dataframe", "pandas", "dataset", "table", "excel"])
        needs_math = any(k in t for k in ["calculate", "compute", "probability", "equation", "optimize", "derivative", "integral"])
        needs_writing = any(k in t for k in ["write", "draft", "email", "cover letter", "proposal", "summarize", "rewrite"])
        needs_analysis = any(k in t for k in ["analyze", "insights", "trend", "compare", "benchmark"])
        needs_memory = any(k in t for k in ["long", "multi-step", "remember", "plan", "workflow", "pipeline"])
        return {
            "needs_data": needs_data,
            "needs_math": needs_math,
            "needs_writing": needs_writing,
            "needs_analysis": needs_analysis,
            "needs_memory": needs_memory,
        }

    def design(self, task_description: str) -> AgentConfig:
        caps = self._capability_heuristics(task_description)
        tools = default_tool_registry()
        selected: List[ToolSpec] = []
        selected.append(ToolSpec(
            name="calc",
            description="Evaluate a safe mathematical expression (no arbitrary code).",
            inputs_schema={"type": "object", "properties": {"expression": {"type": "string"}}, "required": ["expression"]},
        ))
        selected.append(ToolSpec(
            name="text_stats",
            description="Compute basic statistics about a text blob (words, lines, unique words).",
            inputs_schema={"type": "object", "properties": {"text": {"type": "string"}}, "required": ["text"]},
        ))
        if caps["needs_data"]:
            selected.append(ToolSpec(
                name="csv_profile",
                description="Load a CSV from a local path and print a quick profile (head, describe).",
                inputs_schema={"type": "object", "properties": {"path": {"type": "string"}, "n_rows": {"type": "integer"}}, "required": ["path"]},
            ))
        if caps["needs_memory"] or caps["needs_analysis"] or caps["needs_data"]:
            mem = MemorySpec(kind="retrieval_tfidf", max_items=250, retrieval_k=6)
        else:
            mem = MemorySpec(kind="scratchpad", max_items=120, retrieval_k=5)
        if caps["needs_analysis"] or caps["needs_data"] or caps["needs_memory"]:
            planner = PlannerSpec(kind="react", max_steps=12, temperature=0.2)
        else:
            planner = PlannerSpec(kind="react", max_steps=8, temperature=0.2)
        objective = "Solve the user task with tool use when helpful; produce a clean final response."
        cfg = AgentConfig(
            agent_name="AutoDesignedAgent",
            objective=objective,
            planner=planner,
            memory=mem,
            tools=selected,
            output_style="concise",
        )
        for ts in selected:
            if not tools.has(ts.name):
                raise RuntimeError(f"Tool selected but not registered: {ts.name}")
        return cfg

    def instantiate(self, cfg: AgentConfig) -> AgentRuntime:
        tools = default_tool_registry()
        if cfg.memory.kind == "retrieval_tfidf":
            mem = TfidfRetrievalMemory(max_items=cfg.memory.max_items, retrieval_k=cfg.memory.retrieval_k)
        else:
            mem = ScratchpadMemory(max_items=cfg.memory.max_items)
        return AgentRuntime(config=cfg, llm=self.llm, tools=tools, memory=mem)

    def evaluate(self, task: str, answer: str) -> Dict[str, Any]:
        a = (answer or "").strip().lower()
        flags = {
            "empty": len(a) == 0,
            "generic": any(p in a for p in ["i can't", "cannot", "missing", "provide more details", "parser fallback"]),
            "mentions_max_steps": "max steps" in a,
        }
        score = 1.0
        if flags["empty"]:
            score -= 0.6
        if flags["generic"]:
            score -= 0.25
        if flags["mentions_max_steps"]:
            score -= 0.2
        score = max(0.0, min(1.0, score))
        return {"score": score, "flags": flags}

    def refine(self, cfg: AgentConfig, eval_report: Dict[str, Any], task: str) -> AgentConfig:
        new_cfg = cfg.model_copy(deep=True)
        if eval_report["flags"]["generic"] or eval_report["flags"]["mentions_max_steps"]:
            new_cfg.planner.max_steps = min(18, new_cfg.planner.max_steps + 6)
            new_cfg.planner.temperature = min(0.35, new_cfg.planner.temperature + 0.05)
            if new_cfg.memory.kind != "retrieval_tfidf":
                new_cfg.memory.kind = "retrieval_tfidf"
                new_cfg.memory.max_items = max(new_cfg.memory.max_items, 200)
                new_cfg.memory.retrieval_k = max(new_cfg.memory.retrieval_k, 6)
        t = task.lower()
        if any(k in t for k in ["csv", "dataframe", "pandas", "dataset", "table"]):
            if not any(ts.name == "csv_profile" for ts in new_cfg.tools):
                new_cfg.tools.append(ToolSpec(
                    name="csv_profile",
                    description="Load a CSV from a local path and print a quick profile (head, describe).",
                    inputs_schema={"type": "object", "properties": {"path": {"type": "string"}, "n_rows": {"type": "integer"}}, "required": ["path"]},
                ))
        return new_cfg

    def build_and_run(self, task: str, improve_rounds: int = 1, verbose: bool = True) -> Tuple[str, AgentConfig]:
        cfg = self.design(task)
        agent = self.instantiate(cfg)
        if verbose:
            print("\n==============================")
            print("META-AGENT: DESIGNED CONFIG")
            print("==============================")
            print(cfg.model_dump_json(indent=2))
        ans = agent.run(task, verbose=verbose)
        report = self.evaluate(task, ans)
        if verbose:
            print("\n==============================")
            print("EVALUATION REPORT")
            print("==============================")
            print(json.dumps(report, indent=2))
            print("\n==============================")
            print("FINAL ANSWER")
            print("==============================")
            print(ans)
        for r in range(improve_rounds):
            if report["score"] >= 0.85:
                break
            cfg = self.refine(cfg, report, task)
            agent = self.instantiate(cfg)
            if verbose:
                print("\n\n==============================")
                print(f"SELF-IMPROVEMENT ROUND {r+1}: UPDATED CONFIG")
                print("==============================")
                print(cfg.model_dump_json(indent=2))
            ans = agent.run(task, verbose=verbose)
            report = self.evaluate(task, ans)
            if verbose:
                print("\nEVAL:", json.dumps(report, indent=2))
                print("\nANSWER:\n", ans)
        return ans, cfg

meta = MetaAgent()

examples = [
    "Design an agent workflow to summarize a long meeting transcript and extract action items. Keep it concise.",
    "I have a local CSV at /content/sample.csv. Profile it and tell me the top 3 insights.",
    "Compute the monthly payment for a $12,000 loan at 8% APR over 36 months. Show the formula briefly.",
]

print("\n==============================")
print("RUNNING A QUICK DEMO TASK")
print("==============================")
demo_task = examples[2]
_ = meta.build_and_run(demo_task, improve_rounds=1, verbose=True)
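The demo task has a closed-form check we can run by hand. The standard amortization formula gives the monthly payment as M = P·r / (1 − (1 + r)^−n), where r is the monthly rate (APR/12) and n the number of payments; for the $12,000, 8% APR, 36-month example:

```python
# Sanity check for the demo task using the standard amortization formula:
#   M = P * r / (1 - (1 + r) ** -n)
P, apr, n = 12_000.0, 0.08, 36
r = apr / 12                      # monthly interest rate
M = P * r / (1 - (1 + r) ** -n)  # fixed monthly payment
print(round(M, 2))                # roughly 376 per month
```

A well-behaved agent run on this task should land on approximately this figure via the `calc` tool.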
We implement MetaAgent, which analyzes tasks, designs agent configurations, instantiates runtimes, evaluates performance, and refines the architecture as needed. We apply capability heuristics to dynamically choose tools, memory strategy, and planner depth. We then demonstrate the full build-and-run pipeline, including optional self-improvement, to complete the automated agent design lifecycle.
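For reference, the self-evaluation signal is deliberately simple. This condensed, standalone sketch mirrors the scoring heuristic in `MetaAgent.evaluate` above (the `evaluate_answer` function name is ours):

```python
def evaluate_answer(answer: str) -> float:
    # Condensed mirror of the tutorial's scorer: start at 1.0 and subtract
    # penalties for empty, generic-refusal, and max-steps answers,
    # clamping the result to [0, 1].
    a = (answer or "").strip().lower()
    score = 1.0
    if not a:
        score -= 0.6
    if any(p in a for p in ["i can't", "cannot", "missing", "provide more details", "parser fallback"]):
        score -= 0.25
    if "max steps" in a:
        score -= 0.2
    return max(0.0, min(1.0, score))

print(evaluate_answer("The monthly payment is about $376."))  # 1.0
print(evaluate_answer(""))                                    # 0.4
```

Since `build_and_run` only triggers refinement when the score drops below 0.85, any single penalty is enough to start a self-improvement round.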
In conclusion, we demonstrated how a Meta-Agent can move from passive task execution to active architecture construction. We designed agents programmatically, instantiated them automatically, evaluated their outputs, and refined their configurations through a self-improvement loop. We showed that agentic systems can reason not only about tasks but also about their own structure, capabilities, and limitations. This approach pushes us toward self-evolving AI systems in which the architecture becomes adaptive, automated, and increasingly autonomous, bringing us closer to fully self-designing agent ecosystems.
Michal Sutter
Michal Sutter is a data science professional with a Master of Science in Data Science from the University of Padova. With a solid foundation in statistical analysis, machine learning, and data engineering, Michal excels at transforming complex datasets into actionable insights.

