Video-use: AI Video Editor with Claude Code and Transcription

The browser-use team, creators of a popular open-source browser automation agent, have released video-use — an AI tool for automated video editing. The project uses Claude Code for cutting, color grading, subtitle generation, and animations, all operating in the terminal without a graphical interface.

TL;DR: Video-use is an open-source tool from the creators of browser-use that automates video editing through Claude Code. The project offers filler word cutting, color grading, subtitle generation, and Manim/Remotion animations. Everything works as a terminal-based pipeline with self-evaluation and session memory, enabling iterative improvement of results without manual intervention.

How does video-use work and how does it differ from traditional editing tools?

Video-use is an open-source project hosted in the video-use repository on GitHub and designed as an extension of the browser-use philosophy into the domain of video editing. The tool has no graphical interface: all editing is done through the terminal using text commands. Claude Code serves as the “brain” of the operation, analyzing video files, transcribing audio, making editing decisions, and applying effects.

The fundamental difference from classic editing software is that the user does not manually manipulate a timeline. Instead, they describe the desired outcome in natural language, and Claude Code generates executable scripts. For example, you can request the removal of all pauses in speech longer than 2 seconds, adding Polish-language subtitles, and applying a “warm sunset” color grade.
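To make the pause example concrete, here is a minimal Python sketch of how a generated script could derive cut points, assuming word-level timestamps are available from the transcription step. The `Word` structure and the `keep_segments` helper are hypothetical names for illustration and are not taken from the video-use codebase.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds from the beginning of the recording
    end: float    # seconds from the beginning of the recording


def keep_segments(words: list[Word], max_pause: float = 2.0) -> list[tuple[float, float]]:
    """Return (start, end) spans to keep, dropping silences longer than max_pause."""
    segments: list[tuple[float, float]] = []
    if not words:
        return segments
    seg_start = words[0].start
    prev_end = words[0].end
    for word in words[1:]:
        # A gap longer than the threshold closes the current span.
        if word.start - prev_end > max_pause:
            segments.append((seg_start, prev_end))
            seg_start = word.start
        prev_end = word.end
    segments.append((seg_start, prev_end))
    return segments
```

The resulting spans could then be handed to an FFmpeg trim-and-concatenate step; how video-use actually does this is not specified here.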

However, this architecture has its limitations. The tool requires a stable connection to the Anthropic API, and complex projects with many layers may exceed the model’s context limits. Nevertheless, it is worth trying this solution with simpler materials, where automation brings the greatest time savings.

What features does video-use offer for automated editing?

The video-use repository documents several main capabilities of the tool. Below is an overview of the key features based on the project documentation:

Audio transcription — automatic speech recognition from a video file with the option of manual correction
Filler word cutting — removal of filler words (“um,” “uh,” “like”) based on transcript analysis
Color grading — applying color filters described in natural language
Subtitle generation — creating SRT/VTT files with synchronization to audio
Manim animations — generating mathematical and technical animations through the Manim library
Remotion animations — creating React-based video sequences using Remotion
Self-evaluation pipeline — automatic quality assessment of results by the model with iterative improvement
Session memory — remembering editing context between sessions

The list above shows the scope of automation. Moreover, each of these features is available as a separate pipeline step, allowing selective application — for example, filler word cutting alone without color grading.

Feature	Input Format	Output Format	Requires Anthropic API
Transcription	MP4, MOV, WebM	SRT, VTT, TXT	Yes
Filler word cutting	MP4 + transcript	MP4	Yes
Color grading	MP4, MOV	MP4	Yes
Subtitles	MP4 + transcript	MP4 with burned-in subtitles	Yes
Manim animations	Text description	MP4	Yes
Remotion animations	Description + data	MP4	Yes
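As an illustration of the filler word step, the sketch below filters a timestamped transcript against a small filler list. It assumes the transcript is a list of `(text, start, end)` tuples; the `FILLERS` set and the `drop_fillers` function are hypothetical and only mirror the three examples named in the feature list.

```python
import string

# Only the examples from the feature list; a real list would be longer.
FILLERS = {"um", "uh", "like"}


def drop_fillers(words):
    """words: list of (text, start, end). Return the entries that are not fillers."""
    kept = []
    for text, start, end in words:
        # Compare case-insensitively and ignore attached punctuation ("Um," -> "um").
        normalized = text.lower().strip(string.punctuation)
        if normalized not in FILLERS:
            kept.append((text, start, end))
    return kept
```

Plain matching like this would also remove a legitimate “like” (as in “I like it”), which is one reason the decision is left to a model that sees the surrounding transcript rather than to a fixed word list.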

How to install and run video-use with Claude Code?

Installing video-use requires a Python environment, Claude Code installed, and an Anthropic API key. The configuration process is described in the video-use repository documentation. The basic steps are cloning the repository, installing dependencies from requirements.txt, and configuring environment variables with the API key.
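Before a first run, a small preflight check can confirm that the prerequisites are in place. The sketch below assumes the API key is read from the `ANTHROPIC_API_KEY` environment variable (the name used by Anthropic's SDKs) and that the `ffmpeg` and `claude` executables are expected on the PATH; whether video-use looks for exactly these is an assumption, so check the repository documentation.

```python
import os
import shutil


def preflight(env=None, which=shutil.which):
    """Return a list of human-readable problems; an empty list means ready."""
    env = os.environ if env is None else env
    problems = []
    # Assumed variable name; the repository docs are the authority here.
    if not env.get("ANTHROPIC_API_KEY"):
        problems.append("ANTHROPIC_API_KEY is not set")
    for tool in ("ffmpeg", "claude"):
        if which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    return problems
```

Calling `preflight()` with no arguments inspects the real environment; the `env` and `which` parameters exist only to make the check easy to test.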

Claude Code itself can be configured by following the Claude Code Overview page in the Claude Code Docs. Video-use uses Claude Code as the backend for making editorial decisions: the model analyzes the material, generates FFmpeg scripts for cuts and filters, and then verifies the result through self-evaluation.

It is worth paying attention to costs. A single editing pass consumes from several thousand to tens of thousands of tokens, and each self-evaluation round adds more; at Opus pricing this translates to costs on the order of several dollars per piece of content. Microsoft, as reported by iTHardware, burned through its annual budget for Claude Code in five months, a sign that costs scale quickly with mass usage.

What does the self-evaluation pipeline look like in video-use?

Self-evaluation is a mechanism where Claude Code, after generating a result, automatically assesses its quality and decides whether to retry. The pipeline operates in a loop: the model generates a version, analyzes it against specified criteria, identifies issues, and generates an improved version. This iterative process continues until a satisfactory result is achieved or the iteration limit is exhausted.

In practice, this means the user describes the expected outcome, and the tool independently strives to achieve it. For example, when removing filler words, the model may determine that a cut was too aggressive and left unnatural jumps — it will then automatically adjust parameters and try again.

However, self-evaluation is not perfect. The model can get stuck in a loop, judging the result as unsatisfactory despite objectively good quality. For this reason, the project allows configuring a maximum number of iterations as well as manual approval of changes after each round. I recommend setting a limit of 3 iterations as a starting point — this is sufficient for most standard editing tasks.
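The loop described above can be sketched in a few lines. This is a generic generate, evaluate, retry pattern under stated assumptions, not video-use's actual implementation: `generate`, `evaluate`, and `approve` are hypothetical callables standing in for the model call, the quality check, and the optional manual approval.

```python
from typing import Callable, Optional


def refine(
    generate: Callable[[Optional[str]], str],
    evaluate: Callable[[str], tuple[bool, str]],
    max_iterations: int = 3,
    approve: Callable[[str], bool] = lambda result: True,
) -> tuple[str, int]:
    """Generate, evaluate, and retry until accepted or the limit is reached.

    generate(feedback) returns a candidate; evaluate(candidate) returns
    (ok, feedback). approve() models optional manual sign-off per round.
    Returns the last candidate and the number of rounds used.
    """
    if max_iterations < 1:
        raise ValueError("max_iterations must be at least 1")
    feedback = None
    candidate = ""
    for round_no in range(1, max_iterations + 1):
        candidate = generate(feedback)
        ok, feedback = evaluate(candidate)
        if ok and approve(candidate):
            return candidate, round_no
    # Limit exhausted: hand back the last attempt instead of looping forever.
    return candidate, max_iterations
```

The hard cap is what prevents the stuck-in-a-loop case: even if the evaluator never accepts a result, the function returns after `max_iterations` rounds.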

What are the realistic use cases for video-use?

Video-use will perform best with repetitive editing tasks that do not require creative human decisions. Examples from the repository documentation include: automatically removing pauses from podcast recordings, generating subtitles for educational materials, applying consistent color grading across an entire video series, and creating technical animations for presentations.

However, the tool’s application has clear boundaries. Creative editing, working with narrative pacing, and selecting shots for emotional impact — these tasks still require human intervention. Claude Code can remove filler words, but it cannot judge whether a particular pause builds tension and should be preserved.

Similarly to Claude Code /ultraplan, where the model plans programming tasks, video-use plans editing operations. In both cases, effectiveness depends on the quality of the initial instructions: the more precise the description, the better the result of automated editing.

How does video-use handle color grading and subtitles?

Color grading in video-use relies on generating FFmpeg scripts based on natural language descriptions. The user specifies the desired color style, and Claude Code creates the appropriate filters. This mechanism allows applying a consistent look across an entire series of materials without manually configuring color curves in editing software.

Additionally, subtitle generation uses transcription as input data. The tool creates SRT and VTT files synchronized to audio, and can then burn them into the image via FFmpeg. While automatic synchronization works well with clean audio, recordings with background noise may require correction.
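For reference, turning timestamped transcript cues into an SRT file is mostly a formatting exercise. The sketch below assumes cues arrive as `(start, end, text)` tuples in seconds; the function names are illustrative and not taken from video-use.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS,mmm, the timestamp form SRT uses."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(cues: list[tuple[float, float, str]]) -> str:
    """Render (start, end, text) cues as an SRT document."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        # Each cue: running number, time range, text, then a blank separator line.
        blocks.append(f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n")
    return "\n".join(blocks)
```

WebVTT differs mainly in starting with a `WEBVTT` header line and using a period instead of a comma before the milliseconds.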

For subtitles, the process looks like this: transcription, text correction, timestamp generation, subtitle rendering on video. Each step is a separate pipeline stage, giving control over quality.

Color grading follows a similar staged flow:

Style description — the user types, for example, “warm sunset” or “cool, blue tone”
Mapping to FFmpeg filters — Claude Code generates appropriate eq, curves, colorbalance parameters
Result preview — the script creates a short test clip before processing the entire file
Iteration — self-evaluation checks whether the result matches the description
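The mapping and preview steps can be sketched as follows. The filter values in `PRESETS` are hypothetical choices made for illustration; in the described workflow Claude Code would generate them per request. The `eq` and `colorbalance` filters and the `-t`, `-vf`, and `-c:a copy` options are standard FFmpeg, while the `preview_command` helper is an assumed name.

```python
PRESETS = {
    # Hypothetical parameter choices; a real run would tune these per clip.
    "warm sunset": "eq=saturation=1.2:contrast=1.05,colorbalance=rs=0.10:bs=-0.10",
    "cool, blue tone": "eq=saturation=0.9,colorbalance=rs=-0.08:bs=0.12",
}


def preview_command(src: str, dst: str, style: str, seconds: int = 5) -> list[str]:
    """Build an ffmpeg argv that grades only the first `seconds` of src."""
    filters = PRESETS[style]
    return [
        "ffmpeg", "-y",
        "-i", src,
        "-t", str(seconds),   # limit output duration: short test clip only
        "-vf", filters,       # video filter chain for the grade
        "-c:a", "copy",       # leave the audio stream untouched
        dst,
    ]
```

The list can be passed to `subprocess.run(cmd, check=True)`; rendering a few seconds first keeps a bad grade from costing a full-length render.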

What are the limitations of video-use and when is traditional editing better?

Video-use has clear application boundaries documented in the repository. The project does not support multi-track editing with composition layers, does not offer live preview, and requires a stable connection to the Anthropic API. Complex projects with multiple video sources, transitions, and effects can exceed the model’s context limits.

However, for simple tasks, the tool performs well. Removing filler words from a podcast, generating subtitles for educational material, applying a single color grade across a video series — these are scenarios where automation makes economic sense.

While traditional programs like DaVinci Resolve or Premiere Pro offer full creative control, they require hours of work on repetitive tasks. Video-use automates exactly those repetitive elements, leaving artistic decisions to humans.

Project limitations:

No GUI — all interaction through the terminal, precluding visual editing
Context limits — long materials over 30 minutes may exceed the context window
API costs — each iteration consumes tokens, and self-evaluation can generate multiple attempts
FFmpeg dependency — errors in generated scripts may require manual fixes
No multi-track editing — the tool does not support compositing from multiple video sources
Transcription quality — depends on audio clarity; background noise reduces accuracy
No live preview — the user sees the result only after rendering
English by default — transcription and subtitles in other languages may require additional configuration

What are the costs of using video-use?

The costs of video-use consist of two elements: Anthropic API fees and local rendering resources. Self-evaluation with multiple iterations can multiply these costs.

The Microsoft example mentioned earlier shows that with mass usage of tools based on Claude Code, costs scale quickly and unpredictably.

Local rendering through FFmpeg does not generate API costs but requires adequately powerful hardware. 4K materials can take tens of minutes on a standard laptop.

Cost Element	Estimated Cost	Notes
Anthropic API (Opus)	15 USD (~60 PLN) per 1M input tokens, 75 USD (~300 PLN) per 1M output tokens	Depends on material length and number of iterations
Anthropic API (Sonnet)	3 USD (~12 PLN) per 1M input tokens, 15 USD (~60 PLN) per 1M output tokens	Cheaper alternative for simpler tasks
Local rendering	0 USD	Requires own hardware
Electricity (4K rendering)	Depends on tariff	From several to over a dozen PLN per session
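A rough estimate of the API part can be computed from the table above. The sketch assumes the two figures in each price range are the input and output prices per million tokens, and that every self-evaluation round uses the same token counts; the token numbers in the usage note are made up for illustration.

```python
PRICES_USD_PER_MTOK = {
    # (input, output) prices per 1M tokens, read from the table's ranges
    "opus": (15.0, 75.0),
    "sonnet": (3.0, 15.0),
}


def session_cost(model: str, input_tokens: int, output_tokens: int, iterations: int = 1) -> float:
    """Estimate API cost in USD, assuming each iteration uses the same token counts."""
    price_in, price_out = PRICES_USD_PER_MTOK[model]
    per_round = (input_tokens * price_in + output_tokens * price_out) / 1_000_000
    return round(per_round * iterations, 4)
```

For a hypothetical session with 40,000 input and 10,000 output tokens per round and three rounds, this gives about 4.05 USD on Opus and 0.81 USD on Sonnet, which shows how the iteration count multiplies the bill.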

How does video-use leverage Manim and Remotion animations?

Video-use integrates two animation libraries: Manim for generating mathematical animations and Remotion for creating React-based video sequences. Claude Code generates animation code from a text description and then renders a finished MP4 file. This enables creating technical visualizations without manual coding.

For example, a user can describe “an animation showing the rotation of a cube along the Z axis with vertex labels” — Claude Code will generate a Manim script to accomplish this. Similarly with Remotion: “a slide with a bar chart of sales data” will result in generating a React component with animation.
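To give a sense of what such generated code might look like, the sketch below holds a small Manim Community scene as source text, checks that it parses, and returns the command that would render it. The scene is a hand-written illustration of the cube request, not output captured from video-use, and `write_scene` is a hypothetical helper.

```python
import ast
from pathlib import Path

# Hypothetical output of the generation step: a Manim Community scene as source text.
SCENE_SOURCE = '''\
from itertools import product

import numpy as np
from manim import DEGREES, ORIGIN, OUT, TAU, Cube, Rotate, Text, ThreeDScene, VGroup


class RotatingCube(ThreeDScene):
    def construct(self):
        self.set_camera_orientation(phi=70 * DEGREES, theta=30 * DEGREES)
        cube = Cube(side_length=2, fill_opacity=0.3)
        # One label per vertex, pushed slightly outward so it clears the faces.
        labels = VGroup(*[
            Text(f"V{i}").scale(0.4).move_to(1.2 * np.array([x, y, z]))
            for i, (x, y, z) in enumerate(product((-1, 1), repeat=3))
        ])
        # Rotating around OUT (the Z axis) by TAU gives one full turn.
        self.play(Rotate(VGroup(cube, labels), angle=TAU, axis=OUT,
                         about_point=ORIGIN, run_time=4))
'''


def write_scene(path: str) -> list[str]:
    """Check the script parses, save it, and return the manim render command."""
    ast.parse(SCENE_SOURCE)  # fail early on syntax errors in generated code
    Path(path).write_text(SCENE_SOURCE, encoding="utf-8")
    return ["manim", "-ql", path, "RotatingCube"]
```

Running the returned command requires Manim to be installed locally; `-ql` selects low quality, which suits a quick preview before a full render.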

Moreover, animations can be combined with other pipeline features. An MP4 file generated from Manim can undergo color grading, have subtitles added, and be composited into the main material. This provides a consistent workflow from start to finish.

Manim animations work well for educational and technical content. Remotion is better suited for business presentations and marketing materials with dynamic graphic elements.

Frequently Asked Questions

Does video-use work with models other than Claude?

No, video-use is designed exclusively for Claude Code as the decision-making backend. The project uses the Anthropic API to generate FFmpeg scripts, Manim code, and Remotion components. The video-use repository documentation does not list alternative models.

How long of a video can be processed in a single session?

Material length limits stem from Claude’s context window. Materials over 30 minutes may exceed token limits, especially with the self-evaluation feature enabled and multiple iterations. The documentation recommends dividing long materials into segments and processing them separately.
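Splitting by duration can be sketched as below, using the 30-minute figure from the answer above as the default segment length. The `split_points` helper is a hypothetical name, and fixed-time boundaries are a simplification: a real workflow would probably move each boundary to the nearest pause so that no sentence is cut in half.

```python
def split_points(duration_s: float, max_segment_s: float = 30 * 60) -> list[tuple[float, float]]:
    """Split [0, duration_s] into consecutive spans no longer than max_segment_s."""
    if duration_s <= 0 or max_segment_s <= 0:
        raise ValueError("duration and segment length must be positive")
    spans = []
    start = 0.0
    while start < duration_s:
        # The last span is shortened so it ends exactly at the total duration.
        end = min(start + max_segment_s, duration_s)
        spans.append((start, end))
        start = end
    return spans
```

Each span can then be processed as its own session, which keeps the transcript for any single run well inside the context window.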

Does video-use support Polish language transcription?

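Polish transcription is possible, but it depends on the speech recognition engine used in the transcription step. Claude models accept text and images rather than raw audio, so the language support of that engine is the deciding factor, and as noted in the limitations, languages other than English may require additional configuration. Recognition quality depends on audio clarity: with a clean recording, accuracy is sufficient for subtitle generation. Background noise and fast speech reduce quality.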

What are the hardware requirements for video-use?

Video-use itself needs a Python environment, FFmpeg, and Claude Code. Local rendering of Manim and Remotion animations additionally requires the dependencies of these libraries. For 4K materials, a multi-core processor and a minimum of 16 GB RAM are recommended.

Summary

Video-use fills a specific niche: automating repetitive editing tasks through Claude Code. The project does not replace a human editor for creative decisions, but eliminates tedious work involved in removing filler words, generating subtitles, and applying color grading.

Key takeaways:

Terminal-based pipeline — no GUI, all editing through text commands and natural language descriptions
Self-evaluation — automatic iterative improvement of results, with a configurable attempt limit
Costs scale with usage — each iteration consumes API tokens; budgets grow quickly with mass adoption
Manim and Remotion animations — generating technical and presentation visualizations from text descriptions
Context limitations — long materials and complex projects exceed model limits

If you regularly create video content with a repetitive pattern (podcasts, educational materials, presentations), check out video-use on GitHub. For one or two videos per month, traditional editing will be simpler. For ten or more, automation will start saving real time. More about configuring Claude Code can be found in the Claude Code Overview and Changelog pages of the Claude Code documentation.