Install, Run & Control Everything on Your Computer with 1 Click.

Pinokio is a browser that lets you install, run, and manage ANY server application, locally.

Verified

Scripts from Verified Publishers

Dia is a 1.6B parameter text-to-speech model created by Nari Labs. Dia directly generates highly realistic dialogue from a transcript. You can condition the output on audio, enabling emotion and tone control. The model can also produce nonverbal sounds such as laughter, coughing, and throat clearing. https://github.com/nari-labs/dia

[NVIDIA ONLY] Generate Video Progressively. FramePack is a next-frame (next-frame-section) prediction neural network structure that generates videos progressively. https://github.com/lllyasviel/FramePack

[NVIDIA ONLY] Super Optimized Gradio UI for Wan2.1 video on GPU-poor machines (5GB+ VRAM). Generate videos up to 12 seconds long https://github.com/deepbeepmeep/Wan2GP

[NVIDIA ONLY] Generate an image from multiple images https://github.com/bytedance/UNO

Orpheus-TTS-FastAPI

Orpheus TTS is an open-source text-to-speech system built on the Llama-3b backbone. Orpheus demonstrates the emergent capabilities of using LLMs for speech synthesis https://github.com/canopyai/Orpheus-TTS

[NVIDIA ONLY] Super Optimized Gradio UI for Hunyuan Video Generator that works on GPU-poor machines. Generate videos of roughly 10 to 14 seconds https://github.com/deepbeepmeep/HunyuanVideoGP

Hunyuan3D-2-LowVRAM

Text/Image to 3D (Cross Platform: Mac + Windows + Linux): High-Resolution 3D Assets Generation with Large Scale Hunyuan3D Diffusion Models. https://github.com/deepbeepmeep/Hunyuan3D-2GP

Roblox Foundation Model for 3D Intelligence --- Cross Platform (Mac, Windows, Linux): Requires 16GB+ VRAM PC or 18GB+ Memory Macs https://github.com/Roblox/cube

Generate songs with AI (up to 4 min 45 sec), either with lyrics or instrumental https://github.com/ASLP-lab/DiffRhythm

The most powerful and modular diffusion model GUI, API, and backend with a graph/nodes interface. https://github.com/comfyanonymous/ComfyUI

MatAnyone is an AI tool for editing videos by separating objects from their backgrounds, so that video backgrounds can be removed effectively. Stable Video Matting with Consistent Memory Propagation: https://github.com/pq-yang/MatAnyone.git

[Mac Only] We make AI agents that control Mac apps: https://github.com/browser-use/macOS-use

An intelligent, interactive Image Editing System. Easily erase and add objects on a user-friendly interface.

Zonos-v0.1 is a leading open-weight text-to-speech model trained on more than 200k hours of varied multilingual speech, delivering expressiveness and quality on par with—or even surpassing—top TTS providers. https://github.com/Zyphra/Zonos

deep hermes, but without the need for a system prompt. Autonomously responds based on its OWN judgment https://github.com/cocktailpeanut/deeperhermes

Run AI Agent in your browser. https://github.com/browser-use/web-ui

[NVIDIA ONLY] YuEGP--A Web UI for YuE, an Open Full-song Generation Foundation Model (10G VRAM required), via https://github.com/deepbeepmeep/YuEGP

User-friendly WebUI for LLMs, supported LLM runners include Ollama and OpenAI-compatible APIs https://github.com/open-webui/open-webui

Prompt, run, edit, and deploy full-stack web apps. https://github.com/stackblitz-labs/bolt.diy

StyleTTS2 Studio

Build your own voice for StyleTTS2

FaceFusion 3.1.2

Industry leading face manipulation platform

Generate synchronized audio from video and/or text inputs https://github.com/hkchengrex/MMAudio

Pinokio System Programming: Make your own custom Pinokio

ai-video-composer

The ultimate video editor powered by natural language and FFMPEG https://huggingface.co/spaces/huggingface-projects/ai-video-composer

[NVIDIA ONLY] Make virtual avatars say whatever you want with an image and an audio clip https://github.com/antgroup/echomimic_v2

Clarity Refiners UI

An enhanced local port of finegrain-image-enhancer powered by Refiners (https://huggingface.co/spaces/finegrain/finegrain-image-enhancer), which was adapted from philz1337x's Clarity Upscaler (https://github.com/philz1337x/clarity-upscaler)

Pyramid Flow Video Generation AI (text-to-video & image-to-video) https://github.com/jy0205/Pyramid-Flow

Enhanced background remove and replace app built around BRIA-RMBG-2.0 https://huggingface.co/briaai/RMBG-2.0

Restore low-res images, repair broken images, or recreate a new version of an image with a prompt https://huggingface.co/spaces/fffiloni/InstantIR

[NVIDIA ONLY] Autocomplete any voice(s), powered by Hertz AI (Standard Intelligence)

Multilingual Text-to-Speech with Voice Cloning (Supports: English, Japanese, Korean, Chinese, French, German, Arabic, and Spanish) https://github.com/fishaudio/fish-speech

[MAC ONLY] A powerful and user-friendly web interface for FLUX, powered by MLX and Gradio via MFLUX

Allegro-txt2vid

[NVIDIA ONLY] Generate videos with Allegro txt2vid model https://github.com/rhymes-ai/Allegro

A unified image generation model that you can use to perform various tasks, including but not limited to text-to-image generation, subject-driven generation, Identity-Preserving Generation, and image-conditioned generation. https://huggingface.co/spaces/Shitao/OmniGen

the simplest self-building coding agent https://github.com/yoheinakajima/ditto

F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching https://huggingface.co/spaces/mrfakename/E2-F5-TTS

Diffusion for World Modeling https://diamond-wm.github.io/

[NVIDIA Only] Select a portrait, click to move the head around https://github.com/jbilcke-hf/FacePoke

MLX-Video-Transcription

[Mac Only] Super Fast MLX Powered Video Transcription https://github.com/RayFernando1337/MLX-Auto-Subtitled-Video-Generator/ by https://x.com/RayFernando1337

The Gen AI Platform for Pro Studios https://github.com/invoke-ai/InvokeAI

diffusers-image-fill

Remove objects from an image https://huggingface.co/spaces/OzzyGT/diffusers-image-fill

A Web UI for easy subtitle generation using the Whisper model.

[NVIDIA ONLY] Advanced Web UI for CogVideo (text to video, image to video, video to video, extend video, etc) -- Generate videos with less than 10GB VRAM

[Mac only] a speech-text foundation model for real time dialogue https://github.com/kyutai-labs/moshi

A simple, high-quality voice conversion tool focused on ease of use and performance. https://github.com/IAHispano/Applio

[NVIDIA Only] Dead simple web UI for training FLUX LoRA with LOW VRAM support (From 12GB)

[NVIDIA ONLY] Generate videos with less than 10GB VRAM https://github.com/THUDM/CogVideo

[NVIDIA ONLY] The most efficient way to run FLUX (Optimized to run even on low memory machines, as low as 3GB VRAM with 512x512 resolution) https://github.com/lllyasviel/stable-diffusion-webui-forge

Bring portraits to life! https://github.com/KwaiVGI/LivePortrait

Minimal Flux Web UI powered by Gradio & Diffusers (Flux Schnell + Flux Merged)

aura-sr-upscaler

AuraSR-v2 - An open reproduction of the GigaGAN Upscaler from fal.ai https://huggingface.co/spaces/gokaygokay/AuraSR-v2

audiocraft_plus

AudioCraft Plus is an all-in-one WebUI for the original AudioCraft, adding many quality features on top https://github.com/GrandaddyShmax/audiocraft_plus

Artist is a training-free text-driven image stylization method. You provide an image and a prompt describing the desired style, and Artist gives you the image rendered in that style. The details of the original image and the style you provide are harmoniously integrated https://huggingface.co/spaces/fffiloni/Artist

RC Stable Audio Tools

Advanced Gradio UI for Stable Audio https://github.com/RoyalCities/RC-stable-audio-tools

Customizing Realistic Human Photos via Stacked ID Embedding https://huggingface.co/spaces/TencentARC/PhotoMaker-V2

Minimal Stable Diffusion UI

AutoGPT is a powerful tool that lets you create and run intelligent agents https://github.com/Significant-Gravitas/AutoGPT

Generate Pinokio Launchers, Instantly. https://gepeto.pinokio.computer

An advanced vision foundation model from Microsoft https://huggingface.co/spaces/gokaygokay/Florence-2

[NVIDIA Only] Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation https://github.com/fudan-generative-vision/hallo

[Mac Only] An all-in-one LLM chat UI for Apple Silicon Macs using the MLX framework. https://github.com/qnguyen3/chat-with-mlx

Accelerating any conditional diffusion model for few-step image generation https://gojasper.github.io/flash-diffusion-project/

An Open Source Model for Audio Samples and Sound Design https://github.com/Stability-AI/stable-audio-tools

Phased Consistency Model - generate high quality images with 2 steps https://huggingface.co/spaces/radames/Phased-Consistency-Model-PCM

a local-install interface that allows you to interact with text generation AIs (LLMs) to chat and roleplay with custom characters. https://docs.sillytavern.app/

Build and customize your own version of AI town - a virtual town where AI characters live, chat and socialize https://github.com/a16z-infra/ai-town

Unify Efficient Fine-Tuning of 100+ LLMs https://github.com/hiyouga/LLaMA-Factory

StoryDiffusion Comics

create a story by generating consistent images https://github.com/HVision-NKU/StoryDiffusion

ZeST: Zero-Shot Material Transfer from a Single Image. Local port of https://huggingface.co/spaces/fffiloni/ZeST (Project: https://ttchengab.github.io/zest/)

Openvoice 2 Web UI - A local web UI for Openvoice2, a multilingual voice cloning TTS https://x.com/myshell_ai/status/1783161876052066793

An open-source, modern-design ChatGPT/LLMs UI/Framework. Supports speech-synthesis, multi-modal, and extensible (function call) plugin system. https://github.com/lobehub/lobe-chat

Improving Diffusion Models for Authentic Virtual Try-on in the Wild https://huggingface.co/spaces/yisol/IDM-VTON

Agentic AI Software Engineer https://github.com/stitionai/devika

a lightweight text-to-speech (TTS) model that can generate high-quality speech with features that can be controlled using a simple text prompt (e.g. gender, background noise, speaking rate, pitch and reverberation). https://huggingface.co/spaces/parler-tts/parler_tts_mini

Upload an image and generate new images in that image's style. Instant generation with no LoRA required https://huggingface.co/spaces/InstantX/InstantStyle

diffusers InstantID + ControlNet inspired by face-to-many from fofr (https://x.com/fofrAI) - a local version of https://huggingface.co/spaces/multimodalart/face-to-all

A unified encoder-based framework for object customization in text-to-image diffusion models https://huggingface.co/spaces/TencentARC/CustomNet

Generate images with spatial accuracy https://huggingface.co/spaces/SPRIGHT-T2I/SPRIGHT-T2I

A Plug-and-Play Image Inpainting Model with Decomposed Dual-Branch Diffusion https://huggingface.co/spaces/TencentARC/BrushNet

A Foundation Model of Human Faces https://huggingface.co/spaces/FoivosPar/Arc2Face

[NVIDIA ONLY] Text-driven, intelligent restoration, blending AI technology with creativity to give every image a brand new life https://supir.xpixel.group

a tiny vision language model that kicks ass and runs anywhere https://github.com/vikhyat/moondream

Zero-Shot Text-Based Audio Editing Using DDPM Inversion https://huggingface.co/spaces/hilamanor/audioEditing

differential-diffusion-ui

Differential Diffusion modifies an image according to a text prompt, and according to a map that specifies the amount of change in each region https://differential-diffusion.github.io/

Geometric 3D Vision Made Easy https://dust3r.europe.naverlabs.com/

open source chat UI for Ollama https://github.com/ivanfioravanti/chatbot-ollama

remove-video-bg

Video background removal tool https://huggingface.co/spaces/amirgame197/Remove-Video-Background

High-quality multilingual text-to-speech library by MyShell.ai. Supports English, Spanish, French, Chinese, Japanese, and Korean https://github.com/myshell-ai/MeloTTS

An intuitive GUI for GLIGEN that uses ComfyUI in the backend https://github.com/mut-ex/gligen-gui

Stable Cascade from StabilityAI

Bark Voice Cloning

Upload a clean 20-second WAV file of the vocal persona you want to mimic, type your text-to-speech prompt and hit submit! A local version of https://huggingface.co/spaces/fffiloni/instant-TTS-Bark-cloning

[NVIDIA GPU ONLY] LGM

LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation https://huggingface.co/spaces/ashawkey/LGM

Background removal model developed by BRIA.AI, trained on a carefully selected dataset and available as an open-source model for non-commercial use https://huggingface.co/spaces/briaai/BRIA-RMBG-1.4

[Runs fast on NVIDIA GPUs. Works on M1/M2/M3 Macs but slow] VideoCrafter is an open-source video generation and editing toolbox for crafting video content. It currently includes the Text2Video and Image2Video models https://github.com/AILab-CVC/VideoCrafter

A state-of-the-art tuning-free method for ID-preserving generation from only a single image, supporting various downstream tasks. https://instantid.github.io/

Customizing Realistic Human Photos via Stacked ID Embedding https://github.com/TencentARC/PhotoMaker

MAGNeT is a text-to-music and text-to-sound model capable of generating high-quality audio samples conditioned on text descriptions https://github.com/facebookresearch/audiocraft/blob/main/docs/MAGNET.md

Video to Openpose & DWPose (All OS supported) https://github.com/sdbds/vid2pose

Moore-AnimateAnyone-Mini

[NVIDIA ONLY] Efficient Implementation of Animate Anyone (13G VRAM + 2G model size) https://github.com/sdbds/Moore-AnimateAnyone-for-windows

Moore-AnimateAnyone

[NVIDIA GPU ONLY] Unofficial Implementation of Animate Anyone https://github.com/MooreThreads/Moore-AnimateAnyone

Instantly clone any voice from any text to any speech, in any language https://huggingface.co/spaces/myshell-ai/OpenVoice

IP-Adapter-FaceID

Enter a face image and transform it to any other image. Demo for the h94/IP-Adapter-FaceID model https://huggingface.co/spaces/multimodalart/Ip-Adapter-FaceID

When Expressive Talking Head Generation Meets Diffusion Probabilistic Models (https://github.com/ali-vilab/dreamtalk)

Stable Diffusion web UI

One-click launcher for Stable Diffusion web UI (AUTOMATIC1111/stable-diffusion-webui)

Turn any video into Openpose video https://huggingface.co/spaces/fffiloni/video2openpose2

Style Aligned Image Generation via Shared Attention https://style-aligned-gen.github.io/

MagicAnimate Mini

[NVIDIA GPU Only] An optimized version of MagicAnimate https://github.com/sdbds/magic-animate-for-windows

Convert your videos to densepose and use it on MagicAnimate https://github.com/Flode-Labs/vid2densepose

1 Click Installer for kohya_ss, a Stable Diffusion LoRA & Dreambooth WebUI (https://github.com/bmaltais/kohya_ss)

Separate Anything You Describe (https://huggingface.co/spaces/Audio-AGI/AudioSep)

Clone voices into different languages using just a quick 3-second audio clip. (a local version of https://huggingface.co/spaces/coqui/xtts)

1 Click Installer for Retrieval-based-Voice-Conversion-WebUI (https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI)

Install AnimateDiff Automatic1111 Extension and the models with one click

Text-to-Video (T2V) generation framework from Vchitect https://github.com/Vchitect/LaVie

Limitless Image Editing using Text-to-Image Models

Text Generation WebUI

A Gradio web UI for Large Language Models https://github.com/oobabooga/text-generation-webui

Latest

Latest Pinokio scripts from the community (tagged as 'pinokio' on GitHub)