LIFEHUBBER

AI Resources

AI resources for curious browsing.

A selective list of notable AI models, tools, datasets, and experiments organized for useful browsing.

LifeHubber AI Resources is selective, not exhaustive. Use it as a starting point for browsing, not as an endorsement. Availability, access, usage limits, and terms can change, so check the original project materials before relying on a resource.

What this is

A selective reference

A selective collection of AI projects, tools, and resources organized for easier browsing.

How to use it

Browse by category

Each entry includes a short description, source label, and direct link to make scanning simpler.

Editorial approach

Selective, not exhaustive

The list leans toward open-ish, inspectable, or publicly documented projects, without treating every entry as open-source.

Browse the list

Explore by section.

Showing 137 of 137 resources


AI Models

Models and experiments

27

Arnis

louis-e/arnis

GitHub

Generates real-world locations inside Minecraft with a surprisingly high level of detail.

World generation, mapping

Command A+ W4A4

CohereLabs/command-a-plus-05-2026-w4a4

Hugging Face

A Cohere Labs Command A+ model variant with W4A4 quantization, positioned around agentic tool use, multimodal inputs, multilingual work, long context, and Cohere-reported lower hardware requirements.

Agentic, multimodal language model

DeepSeek-OCR-2

deepseek-ai/DeepSeek-OCR-2

GitHub

A newer DeepSeek OCR model release for image/PDF OCR, document-to-Markdown workflows, dynamic resolution, vLLM/Transformers inference, and visual causal flow research.

OCR, document understanding

DeepSeek-V4

deepseek-ai/deepseek-v4

Hugging Face

A DeepSeek model family release positioned around long-context intelligence, reasoning modes, coding benchmarks, and agentic task evaluation.

Reasoning models, long context

Gemma 4

google/gemma-4

Kaggle

A family of multimodal models from Google DeepMind that handle text and image input and generate text output.

Multimodal models

GLM-5.1

zai-org/GLM-5.1

Hugging Face

A flagship text-generation model positioned around agentic engineering, stronger coding performance, and longer-horizon tool use.

Agentic coding models

GLM-OCR

zai-org/GLM-OCR

GitHub

A multimodal OCR model for complex document understanding, positioned around strong real-world document parsing and efficient deployment.

OCR models, document understanding

Hy-MT1.5-1.8B-1.25bit

AngelSlim/Hy-MT1.5-1.8B-1.25bit

Hugging Face

A low-bit on-device translation model from AngelSlim, positioned around 33-language offline translation, GGUF access, Android demo use, and 1.25-bit compression.

On-device translation, model compression

Hy-MT2

Tencent-Hunyuan/Hy-MT2

GitHub

A Tencent-Hunyuan multilingual translation model family with 1.8B, 7B, and 30B-A3B variants, 33-language support, GGUF and FP8 options, IFMTBench, training notes, deployment guidance, and Tencent-reported translation results.

Multilingual translation models

Hy3 preview

tencent/Hy3-preview

Hugging Face

A Tencent Hy Team MoE model positioned around long-context reasoning, instruction following, coding, and agent task evaluation.

Reasoning models, coding agents

Kimi-K2.6

moonshotai/Kimi-K2.6

Hugging Face

A multimodal agentic model positioned around long-horizon coding, tool use, autonomous execution, and broader software workflows.

Agentic coding models

Lance

bytedance-research/Lance

Hugging Face

A ByteDance Research unified multimodal model for image and video understanding, generation, and editing, with model files, demos, inference scripts, Gradio setup, benchmark scripts, and a stated 40GB VRAM inference requirement.

Unified image and video model

LFM2.5-350M

LiquidAI/LFM2.5-350M

Hugging Face

A hybrid model in the LFM2.5 family built for on-device deployment, with extended pre-training and reinforcement learning.

On-device models

Ling-2.6-flash

inclusionAI/Ling-2.6-flash

Hugging Face

An inclusionAI instruct model positioned around faster responses, token efficiency, tool use, multi-step planning, and agent-oriented workloads.

Efficient agent model

LingBot-Map

robbyant/lingbot-map

GitHub

A feed-forward 3D foundation model for streaming scene reconstruction, positioned around geometric consistency, long sequences, and efficient real-time inference.

Streaming 3D reconstruction

Lyra

nv-tlabs/lyra

GitHub

A series of generative 3D world models from NVIDIA, positioned around explorable scenes, 3D consistency, and world-scale generation workflows.

3D world models

MiMo-V2.5

XiaomiMiMo/mimo-v25

Hugging Face

A Xiaomi MiMo model family positioned around multimodal understanding, agentic workflows, long-context use, and Pro variants for harder software and tool-heavy tasks.

Multimodal models, agentic workflows

MiniMax-M2.7

MiniMaxAI/MiniMax-M2.7

Hugging Face

A large MiniMax model focused on agentic work, software engineering, tool use, and complex productivity workflows.

Agentic models

MOSS-VL

OpenMOSS-Team/moss-vl

Hugging Face

An OpenMOSS vision-language family with Base and Instruct releases for image, video, OCR, and document understanding work.

Vision-language models, multimodal understanding

Nemotron-Labs-Diffusion-14B

nvidia/Nemotron-Labs-Diffusion-14B

Hugging Face

An NVIDIA 14B text-generation model from the Nemotron-Labs-Diffusion family, focused on switching between autoregressive decoding, diffusion-style parallel decoding, and self-speculation for project-reported decoding efficiency gains.

Text generation, decoding efficiency

OLMoEarth

allenai/olmoearth

Hugging Face

An Ai2 remote-sensing foundation model family for satellite imagery and planetary-scale mapping, with v1.1 Base and BandExtractor models, model weights, training code, a technical report, and Ai2-reported lower compute cost.

Remote sensing, Earth observation

Qwen3.6-35B-A3B

Qwen/Qwen3.6-35B-A3B

Hugging Face

An open-weight multimodal model positioned around agentic coding, tool use, long-context work, and real-world software workflows.

Agentic coding models, long context

TIPS / TIPSv2

google-deepmind/tips

GitHub

A family of vision-language encoders from Google DeepMind, positioned around image-text pretraining, spatial awareness, and general-purpose multimodal applications.

Vision-language encoders, spatial understanding

TRELLIS.2

microsoft/TRELLIS.2

GitHub

A Microsoft 3D generation model for high-fidelity image-to-3D asset creation, using O-Voxel structured latents, PBR materials, inference code, and training tools.

3D generation, image-to-3D

Trinity-Large-Thinking

arcee-ai/trinity-large-thinking

Hugging Face

A model designed for coherent multi-turn behavior, clean tool use, constrained instruction following, and efficient serving at scale.

Reasoning models

WildDet3D

allenai/WildDet3D

GitHub

A promptable 3D detection system for real-world scenes, positioned around text, point, and box prompts for spatial perception workflows.

3D perception models

ZAYA1-8B

Zyphra/ZAYA1-8B

Hugging Face

A small Zyphra mixture-of-experts reasoning model with public weights, 760M active parameters, 8.4B total parameters, deployment notes, and project-reported math and coding evaluations.

Small MoE reasoning model

Speech Models

Speech input, output, and interaction

12

Cohere Transcribe

CohereLabs/cohere-transcribe-03-2026

Hugging Face

A 2B parameter automatic speech recognition model for audio-in, text-out transcription across 14 languages.

STT, ASR

Fish Audio S2 Pro

fishaudio/s2-pro

Hugging Face

A text-to-speech model with detailed control over prosody and emotional delivery.

TTS, expressive speech

KittenTTS

KittenML/KittenTTS

GitHub

A very small text-to-speech model designed to stay lightweight without feeling toy-like.

Compact TTS

Mega-ASR

xzf-thu/Mega-ASR

GitHub

A robust automatic speech recognition project for messy real-world audio, with code, inference and training paths, Hugging Face weights, a technical report, Voices-in-the-Wild-2M, and project-reported results across difficult acoustic scenarios.

Robust ASR, real-world audio

MiMo-V2.5-ASR

XiaomiMiMo/MiMo-V2.5-ASR

GitHub

A Xiaomi MiMo speech-recognition model focused on Mandarin, English, Chinese dialects, code-switched speech, noisy audio, songs, and multi-speaker transcription.

ASR, dialects, code-switching

MOSS-Audio

OpenMOSS/MOSS-Audio

GitHub

An audio-understanding model family for speech, sound, music, captioning, time-aware QA, ASR, and reasoning over real-world audio.

Audio understanding, ASR, reasoning

MOSS-TTS-Nano

OpenMOSS/MOSS-TTS-Nano

GitHub

A tiny multilingual speech generation model positioned for real-time TTS, CPU-friendly local use, and lightweight deployment.

TTS, realtime speech

PersonaPlex

NVIDIA/personaplex

GitHub

A real-time full-duplex speech-to-speech conversational model with persona control through role prompts and voice conditioning.

STS, conversational speech

sarashina2.2-tts

sbintuitions/sarashina2.2-tts

Hugging Face

A Japanese-centric text-to-speech system from SB Intuitions, with Japanese and English generation, style transfer, and zero-shot voice generation support.

Japanese TTS, voice generation

Supertonic

supertone-inc/supertonic

GitHub

An on-device multilingual text-to-speech system built around ONNX Runtime, with local inference, 31-language support, expression tags, model assets, and examples across browser, mobile, desktop, and edge runtimes.

On-device multilingual TTS

TADA

HumeAI/tada

Hugging Face

A speech-language model that aligns speech and text into a single synchronized stream.

Speech-language modeling

VoxCPM2

openbmb/VoxCPM2

Hugging Face

A multilingual text-to-speech model with voice design, controllable voice cloning, and streaming support.

TTS, voice cloning

Music / Image Gen Models

Generative media models

9

ACE-Step 1.5

ace-step/ACE-Step-1.5

GitHub

A local music generation model aimed at fast song creation on consumer hardware, with support across CUDA, AMD, Intel, Mac, and CPU setups.

Music generation

AniGen

VAST-AI-Research/AniGen

GitHub

A framework for generating animatable 3D assets from a single image, with mesh, skeleton, and skinning outputs for downstream animation and simulation workflows.

Animatable 3D asset generation

CoMoVi

IGL-HKUST/CoMoVi

GitHub

A framework for co-generating 3D human motion and realistic videos, with a focus on motion-conditioned video generation and training workflows.

Human motion and video generation

Fooocus

lllyasviel/Fooocus

GitHub

A local image-generation interface built around prompt-focused SDXL workflows, with Windows downloads, Colab access, inpainting, outpainting, image prompts, and presets.

Image generation UI

LongCat-Video-Avatar 1.5

meituan-longcat/LongCat-Video-Avatar-1.5

Hugging Face

A Meituan LongCat audio-driven avatar video model for single- and multi-person generation, with audio-text-to-video, audio-image-text-to-video, video continuation, model weights, GitHub quickstart, and project-reported evaluation materials.

Avatar video generation

LongLive

NVlabs/LongLive

GitHub

An NVIDIA Labs infrastructure codebase for long video generation, with LongLive 2.0 NVFP4 and parallel training/inference support, multi-shot generation, async decoding, model links, docs, configs, and project-reported FPS and VBench results.

Long video generation infrastructure

PersonaLive

GVCLab/PersonaLive

GitHub

A portrait image-animation framework for live-streaming-style video generation research, with offline and online inference, pretrained weights, a Web UI, and acceleration notes.

Portrait animation, video generation

Sana

NVlabs/Sana

GitHub

An NVIDIA Labs codebase for efficient high-resolution image and video generation, with Sana, Sana-1.5, Sana-Sprint, Sana-Video, training and inference pipelines, model zoo links, ComfyUI and diffusers paths, and newer world-model work.

Efficient image and video generation

ViMax

HKUDS/ViMax

GitHub

An agentic video-generation framework for turning ideas, scripts, or longer narratives into planned video workflows, with script generation, storyboards, shot planning, reference selection, consistency checks, and configurable chat, image, and video model providers.

Agentic video generation workflow

AI Agents

Agents and interfaces

50

AG2

ag2ai/ag2

GitHub

A Python framework for AI agents and multi-agent workflows, with conversable agents, orchestration patterns, tools, human-in-the-loop flows, code execution options, structured outputs, and an active v1.0 transition note.

Multi-agent framework

Agent-Reach

Panniantong/Agent-Reach

GitHub

A CLI that gives AI agents broader web reach across platforms like Twitter, Reddit, YouTube, GitHub, Bilibili, and XiaoHongShu without paid API usage.

Agent tooling, web access

AgentScope

agentscope-ai/agentscope

GitHub

An agent framework with core abstractions, visibility tooling, and built-in support for fine-tuning workflows.

Agent frameworks

AIPOCH Medical Research Skills

aipoch/medical-research-skills

GitHub

A curated library of medical research agent skills designed to support evidence review, protocol design, data analysis, and academic writing workflows.

Agent skills, medical research

Browser Use

browser-use/browser-use

GitHub

A browser automation framework for AI agents that can navigate websites, click elements, type into pages, use custom tools, and run browser tasks through code or CLI workflows.

Browser agents, website automation

cabinet

hilash/cabinet

GitHub

An AI-first knowledge base and workspace system with agents, memory, scheduled jobs, and local file-based storage.

Agent workspaces

CatchMe

HKUDS/CatchMe

GitHub

A lightweight, vectorless system for capturing a broader digital footprint as usable context.

Agent memory, context capture

Claude Code Game Studios

Donchitos/Claude-Code-Game-Studios

GitHub

A multi-agent game-development studio system for Claude Code, organized around specialized agents, workflow skills, hooks, rules, and templates.

Agent systems, game development

CodeGraph

colbymchenry/codegraph

GitHub

A local codebase knowledge graph for coding agents, with MCP-style tools, symbol relationships, call graphs, framework-aware routes, auto-sync, multi-agent setup paths, and project-reported savings in token use and tool calls.

Coding-agent codebase context

Composio

ComposioHQ/composio

GitHub

An agent tool-integration layer with Python and TypeScript SDKs, toolkits, authentication, sessions, triggers, tool search, and workbench features for connecting agents to external services.

Agent tools, authentication, integrations

CopilotKit

CopilotKit/CopilotKit

GitHub

A frontend stack for building agent-native applications with chat UI, generative UI, shared state, backend tool rendering, human-in-the-loop workflows, and React or Angular app paths.

Agent-native apps, generative UI

Cua

trycua/cua

GitHub

Infrastructure for computer-use agents, with sandboxes, SDKs, benchmarks, and model integrations for agents working across desktop environments.

Computer-use agents

DeepTutor

HKUDS/DeepTutor

GitHub

An agent-native personalized tutoring system with tutoring workflows, persistent memory, a web app, CLI access, and a broader learning-support architecture.

Education agents

DeerFlow

bytedance/deer-flow

GitHub

A ByteDance long-horizon agent harness for deep research, coding, file work, report generation, skills, sub-agents, memory, sandboxed execution, and message gateways.

Long-horizon agents, research workflows

Dify

langgenius/dify

GitHub

A visual platform for building agentic workflows and AI applications with workflow and chatflow builders, model-provider connections, RAG pipelines, tools, APIs, logs, and cloud or self-hosted paths.

Visual agentic workflow platform

Gemma 4 Browser Extension

nico-martin/gemma4-browser-extension

GitHub

An independent Chrome extension experiment for running an on-device browser agent with Transformers.js, WebGPU, Gemma 4, page RAG, tab tools, and semantic history search.

Browser agents, on-device AI

gitagent

open-gitagent/gitagent

GitHub

A framework-agnostic, git-native standard for defining and sharing AI agents.

Agent standards

Google Skills

google/skills

GitHub

A public Agent Skills repository for Google products and technologies, including Google Cloud, with installable skills for Gemini API in Agent Platform, cloud basics, onboarding, authentication, observability, and well-architected guidance.

Agent skills, Google Cloud workflows

goose

block/goose

GitHub

An on-machine AI agent for complex development work, including coding, execution, debugging, workflow orchestration, and API interaction.

Coding agents

Hindsight

vectorize-io/hindsight

GitHub

An agent memory system designed to help agents learn over time rather than only recall conversation history.

Agent memory, learning

holaOS

holaboss-ai/holaOS

GitHub

A beta agent-workspace environment for recurring AI work-streams, with living workspaces, memory, history, files, apps, dashboards, runtime state, and sub-agent coordination.

Agent workspaces, recurring AI work-streams

II-Agent

Intelligent-Internet/ii-agent

GitHub

An AI agent for practical work, built to be run, forked, and extended across solo, team, and internal-tooling use cases.

General-purpose agents

LiveKit Agents

livekit/agents

GitHub

A realtime framework for voice, video, and physical AI agents, with Python and Node.js paths, LiveKit room participants, WebRTC clients, telephony support, tools, testing, and deployment options.

Realtime voice and multimodal agents

LLM Wiki

nashsu/llm_wiki

GitHub

A cross-platform desktop app for turning documents into an LLM-maintained wiki, with source traceability, graph search, optional vector retrieval, local API access, and a companion agent skill.

Personal knowledge base, agent context

Mastra

mastra-ai/mastra

GitHub

A TypeScript framework for building AI agents and applications with model routing, workflows, human-in-the-loop steps, memory, tools, MCP servers, evals, and observability.

TypeScript agent framework

Mem0

mem0ai/mem0

GitHub

A memory layer for AI agents and assistants, with library, self-hosted, platform, SDK, CLI, cookbook, evaluation, and integration paths for persistent context.

Agent memory, persistent context

MiniMax Skills

MiniMax-AI/skills

GitHub

A development skills library for AI coding agents, with structured guidance across frontend, fullstack, Android, iOS, and shader work.

Agent skills, coding

MolmoWeb

allenai/molmoweb

GitHub

A multimodal web agent from Ai2 that can navigate browser tasks from natural-language instructions.

Web agents

nanobot

HKUDS/nanobot

GitHub

A lightweight personal AI agent project with packaged WebUI, goal tracking for longer tasks, image generation, provider presets, fallback models, plugin-style tools, channel integrations, and security hardening noted in its v0.2.0 release.

Personal agents, packaged WebUI

NanoClaw

qwibitai/nanoclaw

GitHub

A lightweight personal agent system that runs agents in isolated containers and connects them to messaging channels, memory, and scheduled jobs.

Personal agents, container isolation

NVIDIA AI-Q Blueprint

NVIDIA-AI-Blueprints/aiq

GitHub

An NVIDIA AI Blueprint for agentic research workflows, with shallow and deep research modes, citation-backed answers, YAML-configured tools and agents, CLI/web/async job paths, evaluation materials, and deployment assets.

Research agents, NVIDIA blueprint

NVIDIA Skills

NVIDIA/skills

GitHub

NVIDIA's public catalog of agent skills, framed around NVIDIA-verified skills, skill cards, scanning, signing, product-owned source repositories, and compatibility with the Agent Skills specification.

Agent skills, trust controls

Onyx

onyx-dot-app/onyx

GitHub

An application layer for LLMs with a self-hostable interface and capabilities like RAG, web search, code execution, file creation, and deep research.

Agent interfaces

OpenAgents

openagents-org/openagents

GitHub

A collaboration project centered on AI agent networks designed to work together across shared workflows.

Agent networks, collaboration

OpenAI Agents SDK

openai/openai-agents-python

GitHub

A lightweight framework for multi-agent workflows, with tools, handoffs, guardrails, sessions, tracing, sandbox agents, and realtime voice support.

Agent frameworks

OpenMAIC

THU-MAIC/OpenMAIC

GitHub

A multi-agent interactive classroom designed to offer an immersive learning experience with one-click setup.

Education agents, multi-agent learning

OpenSeeker

rui-ye/OpenSeeker

GitHub

A search agent system built around released training data, released models, and tool-based web information seeking.

Search agents, information seeking

OpenSpace

HKUDS/OpenSpace

GitHub

A framework focused on building agents that are smarter, lower-cost, and able to improve through self-evolving workflows.

Agent systems, self-evolving

Page Agent

alibaba/page-agent

GitHub

A JavaScript in-page GUI agent for controlling web interfaces with natural language, aimed at browser-based workflows and interface automation.

GUI agents, browser control

PageIndex

VectifyAI/PageIndex

GitHub

A vectorless, reasoning-based RAG framework for long-document retrieval, tree-structured indexing, traceable document search, and agent context workflows.

Vectorless RAG, agent context

Paperclip

paperclipai/paperclip

GitHub

A Node.js server and React UI for orchestrating teams of AI agents, assigning goals, and tracking work and costs from one dashboard.

Agent orchestration, dashboards

Pipecat

pipecat-ai/pipecat

GitHub

A Python framework and ecosystem for real-time voice and multimodal AI agents, with audio/video pipelines, transports, client SDKs, structured flows, and subagent support.

Voice agents, multimodal pipelines

RAGFlow

infiniflow/ragflow

GitHub

A practical RAG and agent-context platform for document ingestion, chunking, retrieval, citations, knowledge workflows, and self-hosted AI applications.

RAG, agent context

ReMe

agentscope-ai/ReMe

GitHub

A memory management framework for AI agents, with file-based and vector-based systems for long-term memory and cross-session recall.

Agent memory

Symphony

openai/symphony

GitHub

An OpenAI engineering preview and specification for orchestrating coding agents from project work queues into isolated autonomous implementation runs.

Agent orchestration, coding agents

TencentDB Agent Memory

Tencent/TencentDB-Agent-Memory

GitHub

A local memory plugin for AI agents, with symbolic short-term memory, layered long-term memory, SQLite defaults, OpenClaw integration, Hermes support, and project-reported benchmark results.

Agent memory, layered recall

TinyFish Skills

tinyfish-io/skills

GitHub

A public skills repo for TinyFish agent workflows, including web-agent automation and related utility skills.

Agent skills, web automation

UI-TARS Desktop

bytedance/UI-TARS-desktop

GitHub

A ByteDance GUI-agent desktop app and multimodal agent stack for local or remote computer and browser operation, with vision-language model control, CLI/Web UI paths, and MCP-oriented tooling.

GUI agents, computer-use automation

Understand Anything

Lum1104/Understand-Anything

GitHub

A codebase and knowledge-base graph tool for AI coding environments, with plugin paths for Claude Code, Codex, Cursor, Copilot, Gemini CLI, and others, plus search, chat, tours, diff impact views, and an interactive dashboard.

Coding-agent context, knowledge graphs

Vane

ItzCrazyKns/Vane

GitHub

A self-hostable AI answering engine for private search-style workflows, with local and cloud model providers, SearxNG-backed web search, cited sources, file uploads, and Docker setup paths.

Private AI answering engine

Embodied / Physical AI

Robotics and physical systems

5

dimos

dimensionalOS/dimos

GitHub

An operating system layer for controlling robots and other hardware platforms with natural-language workflows.

Agentic physical systems

elrobot

norma-core/hardware/elrobot

GitHub

A low-cost 3D-printed robotic arm intended for physical AI research and imitation learning.

Robotics hardware

FreeMoCap

freemocap/freemocap

GitHub

A research-grade motion capture system designed to stay low-cost, hardware-agnostic, and accessible for scientific, educational, and training use.

Motion capture, embodied AI

LabClaw

wu-yc/LabClaw

GitHub

A large package of workflow skills for biomedical and scientific AI work across multiple lab-heavy domains.

Scientific workflows

UnifoLM-WBT-Dataset

unitreerobotics/unifolm-wbt-dataset

Hugging Face

A real-world humanoid robot whole-body teleoperation dataset for open environments.

Robotics dataset

Productivity

Useful daily tools

5

Open Design

nexu-io/open-design

GitHub

A local-first AI design workspace that connects coding-agent CLIs to prototypes, decks, media outputs, design systems, sandboxed previews, and export workflows.

AI design workspace, agent-assisted prototypes

open-slide

1weiho/open-slide

GitHub

An agent-native slide framework for building React-based decks with coding agents, browser preview, comments, assets, present mode, and HTML/PDF export.

Agent-authored slides

OpenOats

yazinsai/OpenOats

GitHub

A meeting note-taking assistant designed to be more conversational and responsive than passive transcription.

Meetings, note-taking

Terax

crynta/terax-ai

GitHub

A lightweight AI-native terminal and developer environment built with Tauri, Rust, and React, with a terminal, code editor, file explorer, web preview, AI side panel, BYOK providers, local model support, and approval-style file tools.

AI terminal, developer workflow

Warp

warpdotdev/warp

GitHub

An agentic development environment born out of the terminal, with built-in coding-agent workflows and support for bringing external CLI agents into developer work.

Agentic developer environments

Ecosystem

Tools around the stack

22

Awesome DESIGN.md

VoltAgent/awesome-design-md

GitHub

A curated collection of DESIGN.md example files inspired by public websites, intended to help AI coding agents understand visual systems, design tokens, layout rules, and UI guardrails.

Design examples for AI coding agents

CocoIndex

cocoindex-io/cocoindex

GitHub

An incremental data engine for keeping AI-agent and LLM-app context fresh, with Python-native pipelines, delta-only processing, lineage, connectors, and targets for vector, graph, relational, and warehouse stores.

Incremental indexing, agent context

CubeSandbox

TencentCloud/CubeSandbox

GitHub

Sandbox infrastructure for AI agents, positioned around fast startup, isolation, high concurrency, and self-hosted code-execution workflows.

Agent execution infrastructure

DataDesigner

NVIDIA-NeMo/DataDesigner

GitHub

A synthetic data generation framework for creating structured datasets from scratch or seed data, with dependency-aware generation, validation, and quality scoring.

Synthetic data tooling

DESIGN.md

google-labs-code/design.md

GitHub

A format specification and CLI toolkit for describing a design system to coding agents, positioned around persistent visual guidance, linting, and token-level design workflows.

Design-system tooling for coding agents

Google Workspace CLI

googleworkspace/cli

GitHub

A command-line interface for Workspace services, with repository notes that it is not officially supported by Google and is still changing.

Workspace automation, project caveat

HyperFrames

heygen-com/hyperframes

GitHub

A video rendering framework for HTML-based compositions, positioned around agent-friendly workflows, previewing, and MP4 rendering.

Agent-friendly media tooling

insanely-fast-whisper

Vaibhavs10/insanely-fast-whisper

GitHub

An opinionated CLI for very fast on-device transcription with Whisper.

Transcription, local inference

k-dense-byok

K-Dense-AI/k-dense-byok

GitHub

A desktop co-scientist setup built around scientific skills and bring-your-own-key workflows.

Scientific assistants

LEANN

yichuan-w/LEANN

GitHub

A lightweight vector database for personal RAG and semantic search, designed to run locally with much lower storage overhead than a conventional vector index.

RAG infrastructure, vector databases

Lightpanda Browser

lightpanda-io/browser

GitHub

A headless browser designed with AI automation use cases in mind.

Automation infrastructure

liteparse

run-llama/liteparse

GitHub

A local PDF parsing tool focused on fast, lightweight parsing, bounding boxes, OCR flexibility, and screenshots for agent workflows.

Document parsing tools

LLaMA Factory

hiyouga/LlamaFactory

GitHub

A unified fine-tuning and deployment platform for 100+ LLMs and VLMs, with a zero-code CLI, web UI, and support for many training approaches.

Fine-tuning tooling

Manifest

mnfst/manifest

GitHub

A smart model router for personal AI agents, positioned around cost-aware request routing, fallbacks, provider control, and self-hosted agent workflows.

Model routing, agent infrastructure

MiniMax CLI

MiniMax-AI/cli

GitHub

The official MiniMax CLI for terminal and agent workflows, with commands for text, image, video, speech, music, vision, and search.

Multimodal CLI

OpenAI Plugins

openai/plugins

GitHub

A curated collection of Codex plugin examples, manifests, and supporting files for extending Codex-based workflows.

Codex plugins, developer tooling

OpenAI Privacy Filter

openai/privacy-filter

GitHub

A privacy-filtering model and local toolkit for detecting and masking personally identifiable information in text, positioned around high-throughput sanitization workflows.

Privacy tooling, PII filtering

PaddleOCR

PaddlePaddle/PaddleOCR

GitHub

A document AI toolkit for turning PDFs and images into structured, LLM-ready data, positioned around multilingual OCR, document parsing, and agent-ready extraction workflows.

OCR and document AI

Skill Seekers

yusufkaraaslan/Skill_Seekers

GitHub

A preprocessing layer for turning raw documentation into reusable inputs for skills, RAG pipelines, and AI coding tools.

Knowledge tooling

Spec Kit

github/spec-kit

GitHub

A toolkit for spec-driven development, positioned around structured workflows, predictable implementation paths, and AI coding agent integrations.

AI coding workflows

vllm-omni

vllm-project/vllm-omni

GitHub

A framework for serving and running omni-modality models more efficiently.

Inference infrastructure

Voicebox

jamiepine/voicebox

GitHub

A local-first voice synthesis studio for voice cloning, speech generation, effects, and voice-powered app workflows.

Voice tooling

Datasets

Benchmarks and corpora

7

ClawMark

evolvent-ai/ClawMark

GitHub

A living-world benchmark for multi-day, multimodal coworker agents, spanning 100 tasks across professional domains and real tool environments.

Agent benchmark, multimodal evaluation

General365

meituan-longcat/General365

GitHub

A manually curated benchmark for general reasoning in LLMs, designed around high difficulty, broad task diversity, K-12-scope knowledge, and hybrid scoring.

Reasoning benchmark

LARYBench

meituan-longcat/LARYBench

GitHub

A benchmark for evaluating latent action representations, with pipelines for action semantics, robotic control regression, and broader vision-to-action alignment.

Vision-to-action benchmark

Monitorability Evals

openai/monitorability-evals

GitHub

An OpenAI evaluation-data release for studying monitorability, with public eval splits, prompt templates, dataset mappings, and metric code from the Monitoring Monitorability paper.

Model monitoring, eval datasets

olmOCR-bench

allenai/olmOCR-bench

Hugging Face

A benchmark for evaluating how well OCR systems convert PDFs into useful markdown while preserving structure.

OCR benchmark

ParseBench

run-llama/ParseBench

GitHub

A document parsing benchmark for AI-agent workflows, focused on whether parsed PDFs preserve enough structure and meaning for reliable downstream use.

Document parsing benchmark

WaxalNLP

google/WaxalNLP

Hugging Face

A large multilingual speech corpus for African languages introduced through the WAXAL paper.

Speech dataset

