Install, Run & Control Everything on Your Computer with 1 Click.

Pinokio is a browser that lets you install, run, and programmatically control ANY application, automatically.

Explore

Browse the Pinokio scripts shared by the community.

Verified

Scripts from Verified Publishers

script version 2.0

the simplest self-building coding agent https://github.com/yoheinakajima/ditto

script version 2.0

F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching https://huggingface.co/spaces/mrfakename/E2-F5-TTS

script version 2.0

Diffusion for World Modeling https://diamond-wm.github.io/

script version 2.0

User-friendly WebUI for LLMs, supported LLM runners include Ollama and OpenAI-compatible APIs https://github.com/open-webui/open-webui

script version 2.0

[NVIDIA Only] Select a portrait, click to move the head around https://github.com/jbilcke-hf/FacePoke

script version 2.0

MLX-Video-Transcription

[Mac Only] Super Fast MLX Powered Video Transcription https://github.com/RayFernando1337/MLX-Auto-Subtitled-Video-Generator/ by https://x.com/RayFernando1337

script version 1.5

The Gen AI Platform for Pro Studios https://github.com/invoke-ai/InvokeAI

script version 2.0

diffusers-image-fill

Remove objects from an image https://huggingface.co/spaces/OzzyGT/diffusers-image-fill

script version 1.5

FaceFusion 3.0.0

Industry leading face manipulation platform

script version 2.0

A Web UI for easy subtitling using the Whisper model.

script version 2.1

[NVIDIA ONLY] Advanced Web UI for CogVideo (text to video, image to video, video to video, extend video, etc) -- Generate videos with less than 10GB VRAM

script version 2.0

[Mac only] a speech-text foundation model for real time dialogue https://github.com/kyutai-labs/moshi

script version 2.1.0

A simple, high-quality voice conversion tool focused on ease of use and performance. https://github.com/IAHispano/Applio

script version 2.1

[NVIDIA Only] Dead simple web UI for training FLUX LoRA with LOW VRAM support (From 12GB)

script version 2.0

The most powerful and modular diffusion model GUI, API, and backend with a graph/nodes interface. https://github.com/comfyanonymous/ComfyUI

script version 2.0

[NVIDIA ONLY] The most efficient way to run FLUX (Optimized to run even on low memory machines, as low as 3GB VRAM with 512x512 resolution) https://github.com/lllyasviel/stable-diffusion-webui-forge

script version 2.0

Bring portraits to life! https://github.com/KwaiVGI/LivePortrait

script version 2.0

Minimal Flux Web UI powered by Gradio & Diffusers (Flux Schnell + Flux Merged)

script version 2.0

aura-sr-upscaler

AuraSR-v2 - An open reproduction of the GigaGAN Upscaler from fal.ai https://huggingface.co/spaces/gokaygokay/AuraSR-v2

script version 2.0

audiocraft_plus

AudioCraft Plus is an all-in-one WebUI for the original AudioCraft, adding many quality features on top https://github.com/GrandaddyShmax/audiocraft_plus

script version 2.0

Artist is a training-free text-driven image stylization method. You provide an image and a prompt describing the desired style, and Artist gives you the image stylized in that style. The details of the original image and the style you provide are harmoniously integrated https://huggingface.co/spaces/fffiloni/Artist

script version 2.0

RC Stable Audio Tools

Advanced Gradio UI for Stable Audio https://github.com/RoyalCities/RC-stable-audio-tools

script version 2.0

Customizing Realistic Human Photos via Stacked ID Embedding https://huggingface.co/spaces/TencentARC/PhotoMaker-V2

script version 2.0

Minimal Stable Diffusion UI

script version 2.0

AutoGPT is a powerful tool that lets you create and run intelligent agents https://github.com/Significant-Gravitas/AutoGPT

script version 2.0

Generate Pinokio Launchers, Instantly. https://gepeto.pinokio.computer

script version 1.5

An advanced vision foundation model from Microsoft https://huggingface.co/spaces/gokaygokay/Florence-2

script version 1.5

[NVIDIA Only] Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation https://github.com/fudan-generative-vision/hallo

script version 1.5

[Mac Only] An all-in-one LLMs Chat UI for Apple Silicon Mac using MLX Framework. https://github.com/qnguyen3/chat-with-mlx

script version 1.5

Accelerating any conditional diffusion model for few steps image generation https://gojasper.github.io/flash-diffusion-project/

script version 1.5

An Open Source Model for Audio Samples and Sound Design https://github.com/Stability-AI/stable-audio-tools

script version 1.5

Phased Consistency Model - generate high quality images with 2 steps https://huggingface.co/spaces/radames/Phased-Consistency-Model-PCM

script version 1.5

a local-install interface that allows you to interact with text generation AIs (LLMs) to chat and roleplay with custom characters. https://docs.sillytavern.app/

script version 1.5

Build and customize your own version of AI town - a virtual town where AI characters live, chat and socialize https://github.com/a16z-infra/ai-town

script version 1.5

Unify Efficient Fine-Tuning of 100+ LLMs https://github.com/hiyouga/LLaMA-Factory

script version 1.5

Describe UI and see it rendered live. Ask for changes and convert HTML to React, Svelte, Web Components, etc. Like Vercel v0, but open source https://github.com/wandb/openui

script version 1.5

StoryDiffusion Comics

create a story by generating consistent images https://github.com/HVision-NKU/StoryDiffusion

script version 1.5

ZeST: Zero-Shot Material Transfer from a Single Image. Local port of https://huggingface.co/spaces/fffiloni/ZeST (Project: https://ttchengab.github.io/zest/)

script version 1.5

Openvoice 2 Web UI - A local web UI for Openvoice2, a multilingual voice cloning TTS https://x.com/myshell_ai/status/1783161876052066793

script version 1.2

An open-source, modern-design ChatGPT/LLMs UI/Framework. Supports speech-synthesis, multi-modal, and extensible (function call) plugin system. https://github.com/lobehub/lobe-chat

script version 1.5

Improving Diffusion Models for Authentic Virtual Try-on in the Wild https://huggingface.co/spaces/yisol/IDM-VTON

script version 1.5

Agentic AI Software Engineer https://github.com/stitionai/devika

script version 1.5

Edit images with just prompt, an unofficial demo for CosXL and CosXL Edit from Stability AI, https://huggingface.co/spaces/multimodalart/cosxl

script version 1.5

a lightweight text-to-speech (TTS) model that can generate high-quality speech with features that can be controlled using a simple text prompt (e.g. gender, background noise, speaking rate, pitch and reverberation). https://huggingface.co/spaces/parler-tts/parler_tts_mini

script version 1.5

Upload an image and generate new images in that image's style. Instant generation with no LoRA required https://huggingface.co/spaces/InstantX/InstantStyle

script version 1.5

diffusers InstantID + ControlNet inspired by face-to-many from fofr (https://x.com/fofrAI) - a localized version of https://huggingface.co/spaces/multimodalart/face-to-all

script version 1.5

A unified encoder-based framework for object customization in text-to-image diffusion models https://huggingface.co/spaces/TencentARC/CustomNet

script version 1.5

Generate images with spatial accuracy https://huggingface.co/spaces/SPRIGHT-T2I/SPRIGHT-T2I

script version 1.5

A Plug-and-Play Image Inpainting Model with Decomposed Dual-Branch Diffusion https://huggingface.co/spaces/TencentARC/BrushNet

script version 1.5

A Foundation Model of Human Faces https://huggingface.co/spaces/FoivosPar/Arc2Face

script version 1.2

[NVIDIA ONLY] Text-driven, intelligent restoration, blending AI technology with creativity to give every image a brand new life https://supir.xpixel.group

script version 1.2

a tiny vision language model that kicks ass and runs anywhere https://github.com/vikhyat/moondream

script version 1.2

a state-of-the-art open-source model for fast feedforward 3D reconstruction from a single image, developed in collaboration between Tripo AI and Stability AI. https://huggingface.co/spaces/stabilityai/TripoSR

script version 1.2

Zero-Shot Text-Based Audio Editing Using DDPM Inversion https://huggingface.co/spaces/hilamanor/audioEditing

script version 1.2

differential-diffusion-ui

Differential Diffusion modifies an image according to a text prompt, and according to a map that specifies the amount of change in each region https://differential-diffusion.github.io/

script version 1.3

Geometric 3D Vision Made Easy https://dust3r.europe.naverlabs.com/

script version 1.2

open source chat UI for Ollama https://github.com/ivanfioravanti/chatbot-ollama

script version 1.2

remove-video-bg

Video background removal tool https://huggingface.co/spaces/amirgame197/Remove-Video-Background

script version 1.2

High-quality multi-lingual text-to-speech library by MyShell.ai. Support English, Spanish, French, Chinese, Japanese and Korean https://github.com/myshell-ai/MeloTTS

script version 1.2

An intuitive GUI for GLIGEN that uses ComfyUI in the backend https://github.com/mut-ex/gligen-gui

script version 1.3

Stable Cascade from StabilityAI

script version 1.1

Bark Voice Cloning

Upload a clean 20-second WAV file of the vocal persona you want to mimic, type your text-to-speech prompt and hit submit! A local version of https://huggingface.co/spaces/fffiloni/instant-TTS-Bark-cloning

script version 1.1

[NVIDIA GPU ONLY] LGM

LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation https://huggingface.co/spaces/ashawkey/LGM

script version 1.1

Background removal model developed by BRIA.AI, trained on a carefully selected dataset and available as an open-source model for non-commercial use https://huggingface.co/spaces/briaai/BRIA-RMBG-1.4

script version 1

[Runs fast on NVIDIA GPUs. Works on M1/M2/M3 Macs but slow] VideoCrafter is an open-source video generation and editing toolbox for crafting video content. It currently includes the Text2Video and Image2Video models https://github.com/AILab-CVC/VideoCrafter

script version 1.1

moondream1 is a tiny (1.6B parameter) vision language model trained by @vikhyatk that performs on par with models twice its size. It is trained on the LLaVA training dataset, and initialized with SigLIP as the vision tower and Phi-1.5 as the text encoder. https://huggingface.co/spaces/vikhyatk/moondream1

script version 1

state-of-the-art tuning-free method to achieve ID-Preserving generation with only single image, supporting various downstream tasks. https://instantid.github.io/

script version 1

Customizing Realistic Human Photos via Stacked ID Embedding https://github.com/TencentARC/PhotoMaker

script version 1

MAGNeT is a text-to-music and text-to-sound model capable of generating high-quality audio samples conditioned on text descriptions https://github.com/facebookresearch/audiocraft/blob/main/docs/MAGNET.md

script version 1

Video to Openpose & DWPose (All OS supported) https://github.com/sdbds/vid2pose

script version 1

Moore-AnimateAnyone-Mini

[NVIDIA ONLY] Efficient Implementation of Animate Anyone (13G VRAM + 2G model size) https://github.com/sdbds/Moore-AnimateAnyone-for-windows

script version 1

Moore-AnimateAnyone

[NVIDIA GPU ONLY] Unofficial Implementation of Animate Anyone https://github.com/MooreThreads/Moore-AnimateAnyone

script version 1

Instantly clone any voice from any text to any speech, in any language https://huggingface.co/spaces/myshell-ai/OpenVoice

script version 1

IP-Adapter-FaceID

Enter a face image and transform it to any other image. Demo for the h94/IP-Adapter-FaceID model https://huggingface.co/spaces/multimodalart/Ip-Adapter-FaceID

script version 1

StreamDiffusion

[NVIDIA ONLY] A Pipeline-Level Solution for Real-Time Interactive Generation https://github.com/cumulo-autumn/StreamDiffusion

script version 1

When Expressive Talking Head Generation Meets Diffusion Probabilistic Models (https://github.com/ali-vilab/dreamtalk)

script version 1.1

Stable Diffusion web UI

One-click launcher for Stable Diffusion web UI (AUTOMATIC1111/stable-diffusion-webui)

script version 1

Turn any video into Openpose video https://huggingface.co/spaces/fffiloni/video2openpose2

Style Aligned Image Generation via Shared Attention https://style-aligned-gen.github.io/

MagicAnimate Mini

[NVIDIA GPU Only] An optimized version of MagicAnimate https://github.com/sdbds/magic-animate-for-windows

Convert your videos to densepose and use it on MagicAnimate https://github.com/Flode-Labs/vid2densepose

[NVIDIA GPU Only] Temporally Consistent Human Image Animation using Diffusion Model https://showlab.github.io/magicanimate/

Realtime StableDiffusion

Demo showcasing ~real-time Latent Consistency Model pipeline with Diffusers and a MJPEG stream server (https://github.com/radames/Real-Time-Latent-Consistency-Model)

Text-to-Video (T2V) generation framework from Vchitect https://github.com/Vchitect/LaVie

Limitless Image Editing using Text-to-Image Models

Diffusers SDXL Turbo

Demo showcasing ~real-time Latent Consistency Model pipeline with Diffusers and a MJPEG stream server (https://github.com/radames/Real-Time-Latent-Consistency-Model)

A Real-Time Text-to-Image Generation Model

Stable Video Diffusion

[NVIDIA ONLY] Stable Video Diffusion Streamlit App. Currently supports Nvidia GPU machines only.

A Realtime Creation Engine

An AI powered mirror

Realtime BakLLaVA

llama.cpp with the BakLLaVA model describes what it sees (https://github.com/Fuzzy-Search/realtime-bakllava)

LLM-Based Pseudo Music Captioning

Separate Anything You Describe (https://huggingface.co/spaces/Audio-AGI/AudioSep)

Fast Image generator using Latent consistency models https://replicate.com/blog/run-latent-consistency-model-on-mac

Text Generation WebUI

A Gradio web UI for Large Language Models https://github.com/oobabooga/text-generation-webui

IllusionDiffusion

Generate stunning illusion artwork with StableDiffusion (A space by @angrypenguinPNGAP - created with Monster Labs QR ControlNet)

clone voices into different languages by using just a quick 3-second audio clip. (a local version of https://huggingface.co/spaces/coqui/xtts)

1 Click Installer for Retrieval-based-Voice-Conversion-WebUI (https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI)

1 Click Installer for kohya_ss, a Stable Diffusion LoRA & Dreambooth WebUI (https://github.com/bmaltais/kohya_ss)

Temporally consistent video editing. A local version of https://huggingface.co/spaces/weizmannscience/tokenflow

ModelScope Image2Video (Nvidia GPU only)

Turn any image into a video! (Web UI created by fffiloni: https://huggingface.co/spaces/fffiloni/MS-Image2Video)

An open source implementation of Microsoft's VALL-E X zero-shot TTS model

Dense Text-to-Image Generation with Attention Modulation

LoRA the Explorer

Stable Diffusion LoRA Playground (HuggingFace: https://huggingface.co/spaces/multimodalart/LoraTheExplorer)

1 Click Control-Lora for ComfyUI

Install Control-Lora Models and Workflows to ComfyUI with 1 click

[NVIDIA GPU ONLY] One click installer for Intel's ldm3d

A webui for different audio related Neural Networks

[Nvidia GPU only] One click installer for AudioLDM 2 Gradio UI

One click installer for AudioCraft MusicGen and AudioGen Gradio UI (Requires at least Pinokio v0.0.56)

Install AnimateDiff Automatic1111 Extension and the models with one click

Xorbits Inference

LLM Web UI and API

Port of Facebook's LLaMA model in C/C++

Pinokio Tutorial

Simple script examples that highlight all the Pinokio APIs

Latest

Latest Pinokio scripts from the community (tagged as 'pinokio' on GitHub)