Maestro: MCP Separation of Concerns
Explore Maestro's architecture, using MCP for separation of concerns. See live demos of Pandini, HQIQ Avatar Generator, and Locales, and learn lessons in building modular AI.
This talk covers the independent library and MCP-exposed packages, and the orchestration layer that composes them into Maestro, the voice-first generative UI featured in last week’s AI Tinkerers Post-Training newsletter.
I’ll walk through each package live, with the codebase open:
- Pandini: a browser-API-only screenshot compressor I built to stop blowing up my Claude Code context windows. 60 MB screenshots collapse to under 400 KB with no quality loss, no install, no API. The “why this exists” alone is a useful builder takeaway about working with coding agents at scale.
- HQIQ Avatar Generator: the full avatar lifecycle as a package, covering text-to-image via Replicate, character sheet generation, original voice, wardrobe, and YAML manifest output. Exposed via Fast MCP so Maestro can invoke avatar creation as a tool at build time or runtime.
- HQIQ Avatar Locales: paired demo with the Avatar Generator. Same separation-of-concerns pattern: this package’s only job is making any avatar speak any language. Skill-MD-driven, MCP-exposed, fires on demand. I’ll add a new Arabic locale live during the talk and have Fatima speak it within minutes - no code change, no redeploy.
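One way to read the "no code change, no redeploy" claim for the Locales package is that locales are data discovered at call time rather than code. The sketch below illustrates that idea only; it is not the HQIQ implementation. The JSON manifest format, the `code` and `greeting` fields, and the `load_locales`, `greet`, and `demo` helpers are all hypothetical stand-ins (the talk describes the real package as Skill-MD-driven).

```python
import json
import tempfile
from pathlib import Path


def load_locales(locale_dir: Path) -> dict:
    """Read every *.json manifest in locale_dir, keyed by its language code.

    The manifest layout is an assumption made for this sketch.
    """
    locales = {}
    for path in sorted(locale_dir.glob("*.json")):
        manifest = json.loads(path.read_text(encoding="utf-8"))
        locales[manifest["code"]] = manifest
    return locales


def greet(locale_dir: Path, code: str) -> str:
    """Return the greeting for a locale, re-reading manifests on every call."""
    locales = load_locales(locale_dir)
    if code not in locales:
        raise KeyError(f"no locale manifest for {code!r}")
    return locales[code]["greeting"]


def demo() -> tuple:
    """Add an Arabic locale by dropping in a file, with no code edits."""
    with tempfile.TemporaryDirectory() as tmp:
        locale_dir = Path(tmp)
        en = {"code": "en", "greeting": "Welcome"}
        (locale_dir / "en.json").write_text(json.dumps(en), encoding="utf-8")
        before = sorted(load_locales(locale_dir))

        # The new locale is pure data: one more manifest in the directory.
        ar = {"code": "ar", "greeting": "أهلاً وسهلاً"}
        (locale_dir / "ar.json").write_text(
            json.dumps(ar, ensure_ascii=False), encoding="utf-8"
        )
        after = sorted(load_locales(locale_dir))
        return before, after, greet(locale_dir, "ar")
```

Because manifests are re-read on each call, a newly added file is picked up by the next request. A real package would likely cache and validate manifests instead of re-reading them every time.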
HQIQ deploys multilingual, 24/7 AI concierges across diverse industries.
- MCP: MCP is the open-source standard for securely connecting AI agents (like LLMs) to external tools, data, and enterprise workflows. The Model Context Protocol (MCP) functions as a standardized integration layer: think of it as a USB-C port for AI applications. Developed and open-sourced by Anthropic, this protocol allows large language models (LLMs) to access real-time context and execute actions via external tools like GitHub, Jira, or proprietary databases. It uses a simple JSON-RPC interface to define tools, schemas, and endpoints, which enables AI agents to perform complex, state-changing tasks (such as creating a GitHub issue or running a test script) rather than just generating text. MCP is essential for building agentic AI systems that can autonomously pursue goals and operate within defined safety and permission boundaries.
- Fast MCP: FastMCP is a high-performance Python framework designed to build Model Context Protocol (MCP) servers and clients with minimal boilerplate. FastMCP streamlines how developers connect LLMs to external tools and data sources. Developed by Prefect, this framework abstracts the low-level complexities of the Model Context Protocol (MCP) by auto-generating schemas, validation, and documentation directly from standard Python functions (using simple decorators like @mcp.tool). It manages transport negotiation, client sessions, and protocol lifecycles out of the box, allowing teams to focus entirely on core application logic. Whether you are building local developer tools or deploying production-grade agentic workflows, FastMCP provides the clean, Pythonic architecture needed to scale LLM capabilities safely and quickly.
- LiveKit Agents: Build low-latency, conversational AI agents that see, hear, and speak using a unified framework for multimodal interaction. LiveKit Agents provides the infrastructure to deploy real-time AI with sub-250ms latency. It integrates directly with OpenAI (GPT-4o), Deepgram, and ElevenLabs, handling the complex orchestration of STT, LLM, and TTS pipelines through a single Python or Node.js SDK. The framework manages room-based signaling and media transport automatically, allowing developers to focus on logic rather than WebRTC internals. Whether you are building a voice-driven customer assistant or a vision-capable tutor, the system scales horizontally to support thousands of concurrent sessions across global edge nodes.
- OpenAI Realtime: Low-latency multimodal API for building seamless speech-to-speech agent experiences. OpenAI Realtime enables developers to bypass fragile text-to-speech pipelines by streaming audio directly through the GPT-4o model. It maintains sub-second response times (typically under 500ms) and preserves emotional inflection that traditional cascaded speech-to-text and text-to-speech pipelines lose. By utilizing the WebSocket protocol, the API handles simultaneous audio input and output, allowing for natural interruptions and fluid human-like pacing in voice assistants and customer support bots.
- Replicate API: Run open-source machine learning models with a single cloud API call, bypassing all infrastructure management. Replicate API lets you execute open-source AI models (like FLUX, SDXL, and Llama) using a clean HTTP interface. You authenticate with a standard bearer token, pass your inputs as a JSON payload, and receive outputs directly or via webhooks. By handling the underlying GPU provisioning, scaling, and cold starts automatically, it lets you integrate advanced generation and analysis capabilities into your codebase with just a few lines of Python, JavaScript, or cURL.
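As a rough illustration of the MCP and Fast MCP entries above, the following standard-library sketch shows the general shape of the idea: a decorator registers a plain Python function as a tool, a JSON Schema is derived from its type hints, and two JSON-RPC 2.0 methods (`tools/list` and `tools/call`) expose it. It does not use FastMCP and is not how FastMCP is implemented. The `tool`, `input_schema`, and `handle` helpers and the `create_issue` example tool are hypothetical, and the initialization handshake and transports that a real MCP server needs are omitted.

```python
import inspect
from typing import Any, Callable

TOOLS: dict = {}  # tool name -> Python callable
JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def tool(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Register fn as a tool (hypothetical stand-in for a decorator like @mcp.tool)."""
    TOOLS[fn.__name__] = fn
    return fn


def input_schema(fn: Callable[..., Any]) -> dict:
    """Derive a JSON Schema for fn's arguments from its type hints."""
    params = inspect.signature(fn).parameters
    properties = {
        name: {"type": JSON_TYPES.get(p.annotation, "string")}
        for name, p in params.items()
    }
    # Parameters without a default value are treated as required.
    required = [
        name for name, p in params.items()
        if p.default is inspect.Parameter.empty
    ]
    return {"type": "object", "properties": properties, "required": required}


def handle(request: dict) -> dict:
    """Answer one JSON-RPC 2.0 request for the two tool methods sketched here."""
    method = request["method"]
    params = request.get("params", {})
    if method == "tools/list":
        result = {"tools": [
            {"name": name,
             "description": inspect.getdoc(fn) or "",
             "inputSchema": input_schema(fn)}
            for name, fn in TOOLS.items()
        ]}
    elif method == "tools/call":
        output = TOOLS[params["name"]](**params.get("arguments", {}))
        result = {"content": [{"type": "text", "text": str(output)}]}
    else:
        # -32601 is the JSON-RPC 2.0 "Method not found" error code.
        return {"jsonrpc": "2.0", "id": request.get("id"),
                "error": {"code": -32601, "message": "Method not found"}}
    return {"jsonrpc": "2.0", "id": request["id"], "result": result}


@tool
def create_issue(title: str, priority: int = 3) -> str:
    """Pretend to file an issue and return a confirmation string."""
    return f"created issue {title!r} with priority {priority}"
```

A client would first send `{"jsonrpc": "2.0", "id": 1, "method": "tools/list"}` to discover `create_issue` and its schema, then send a `tools/call` request with `{"name": "create_issue", "arguments": {"title": "Fix login"}}` as its `params`. Keeping the tool function free of any protocol code is the separation-of-concerns point: the same function remains usable as an ordinary library call.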