Kage Shadows Any Website Into a Single Binary for Offline Viewing — Software article on gikiewicz.com

On June 2, 2026, a Show HN post introduced Kage, an open-source utility that mirrors entire websites into a single portable executable binary. The project immediately drew attention from the Hacker News community for its unconventional approach to offline web archiving — instead of producing a folder full of assets, it outputs one file you can run anywhere.

TL;DR: Kage is an open-source tool featured on Hacker News that shadows any website into a single executable binary for offline viewing. It bundles HTML, CSS, JavaScript, images, and other assets into one portable file that runs a local server, making offline web archiving dramatically simpler than traditional multi-file solutions like HTTrack or Wget.

What Is Kage and How Does It Work?

Kage is an open-source CLI tool that crawls a target website, downloads every referenced resource, and compiles them into a single self-contained binary executable. According to the Show HN post, the resulting binary embeds a local HTTP server that serves the mirrored content when launched, so you open a browser to localhost and browse the site as if you were online.

The tool handles the full mirroring pipeline in one command. You pass a URL, and Kage fetches the HTML, parses it for linked stylesheets, scripts, fonts, images, and media files, then stores everything in an embedded archive. The binary output means there are no loose files, no directory trees to manage, and no broken relative paths. Everything lives inside one file.
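The "parse it for linked resources" step can be sketched with Python's standard library. This is an illustration of the general technique, not Kage's code: the names LinkExtractor, extract_links, and RESOURCE_ATTRS are hypothetical, and a real crawler would also need to handle srcset, CSS url() references, and similar cases.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Attributes that commonly point at sub-resources or further pages.
# (Illustrative subset; not taken from Kage.)
RESOURCE_ATTRS = {"src", "href", "poster"}


class LinkExtractor(HTMLParser):
    """Collects absolute URLs referenced by src/href-style attributes."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in RESOURCE_ATTRS and value:
                # Resolve relative references against the page's own URL.
                absolute = urljoin(self.base_url, value)
                if absolute not in self.urls:
                    self.urls.append(absolute)


def extract_links(html, base_url):
    """Return the de-duplicated, absolute URLs found in one HTML document."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.urls
```

Each returned URL would then be fetched and stored, and any HTML among the results parsed in turn.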

This approach solves a real distribution problem. If you want to share an archived site with someone, you send them one binary. They run it. No installation, no dependencies, no web server configuration. The binary runs on any machine that matches the operating system and CPU architecture it was built for. That makes it useful for documentation bundles, offline reference materials, and archival snapshots.

Why Would You Need to Mirror a Website Into a Binary?

Website mirroring serves several practical purposes, and packaging the result as a binary adds convenience that traditional tools do not offer. Documentation sites go offline when projects get abandoned or restructured. Educational resources disappear when institutions update curricula. Government portals change URLs during redesigns, breaking citations and references.

A single binary solves the distribution problem. Instead of sending someone a ZIP archive with hundreds of files and hoping they extract it correctly, you hand them one executable. They double-click it. A browser opens. The site loads from the embedded local server.

Security researchers use mirrored sites to analyze JavaScript behavior and tracking mechanisms without alerting live servers. Developers archive API documentation for legacy systems they maintain on air-gapped networks. Educators bundle course websites for students in regions with unreliable connectivity.

The binary format also preserves structural integrity. Loose-file mirrors break when moved between file systems with different path length limits or character encoding rules. A binary encapsulates everything, so the internal structure stays consistent regardless of where it runs.

How Does Kage Handle Dynamic Content and JavaScript?

JavaScript-heavy sites present the biggest challenge for any mirroring tool. Single-page applications built with React, Vue, or Svelte render content dynamically through client-side JavaScript, meaning the initial HTML payload contains little visible content. Traditional mirror tools that only fetch static HTML capture empty shells.

The usual remedy is to execute JavaScript during the crawl phase: render each page in a headless browser environment, wait for JavaScript to populate the DOM, and then capture the fully rendered HTML output, so the snapshot includes content that frameworks inject after initial load. The Show HN description suggests Kage works this way; however, the project documentation cited in the FAQ below describes headless rendering as under consideration rather than implemented, so rendered-SPA capture should be treated as unconfirmed.

Even a rendering crawl has limitations. Pages that load data on scroll, infinite feeds, or content gated behind user interactions will not be fully captured unless the crawl configuration triggers those actions. Form submissions, authentication flows, and WebSocket-dependent content remain difficult to archive reliably.

In a rendering crawl, static assets referenced by JavaScript bundles, such as images loaded dynamically or fonts imported via CSS-in-JS, get resolved during the rendering pass. The crawler intercepts the network requests made by the headless browser and stores each resource in the embedded archive. This produces a more complete snapshot than tools that only parse HTML source for src and href attributes.

What Architecture Powers the Kage Binary Output?

The single-binary architecture relies on embedding a web server and a compressed asset store inside one executable file. When you run the binary, it starts an HTTP server on a local port, decompresses requested assets on demand from the embedded archive, and serves them to the browser. The user experience closely resembles visiting the original site.

The asset archive typically uses a format that supports random access, so the server can extract individual files without decompressing the entire bundle. This keeps memory usage low and startup time fast. Common approaches include embedding a read-only SQLite database, a tar archive with an index, or a custom binary format with a file allocation table at the end of the executable.
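As a rough illustration of the last option, an index stored at the end of the file, the sketch below packs files as individually compressed blobs followed by a JSON index and a fixed-size footer. The layout, the FOOTER structure, and the function names are assumptions made for this example; nothing here describes Kage's actual format. In a real executable the payload would sit after the program code, so offsets would be relative to the start of the payload.

```python
import json
import struct
import zlib

# Fixed-size footer: (index offset, index length), both little-endian uint64.
FOOTER = struct.Struct("<QQ")


def pack(files):
    """Serialize {path: bytes} as compressed blobs + JSON index + footer."""
    out = bytearray()
    index = {}
    for path, data in files.items():
        blob = zlib.compress(data)
        index[path] = (len(out), len(blob))  # where this blob lives
        out += blob
    index_bytes = json.dumps(index).encode("utf-8")
    index_offset = len(out)
    out += index_bytes
    out += FOOTER.pack(index_offset, len(index_bytes))
    return bytes(out)


def read_index(archive):
    """Locate the index by reading the footer at the very end."""
    index_offset, index_len = FOOTER.unpack(archive[-FOOTER.size:])
    raw = archive[index_offset:index_offset + index_len]
    return json.loads(raw.decode("utf-8"))


def read_file(archive, index, path):
    """Decompress exactly one entry without touching the others."""
    offset, length = index[path]
    return zlib.decompress(archive[offset:offset + length])
```

Because each blob is compressed on its own, serving one file costs one slice and one decompression, which is what keeps memory use low.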

The HTTP server component handles MIME type detection, range requests for media files, and proper response headers so browsers render content correctly. Without correct Content-Type headers, stylesheets will not apply and JavaScript will not execute. The server must also handle path rewriting so that requests for /images/logo.png resolve to the correct entry in the embedded archive rather than the local file system.
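A minimal sketch of that lookup, assuming the archive exposes a set of keys such as "/images/logo.png" (the resolve function and the key scheme are hypothetical, not Kage's API). It maps the request target to an archive key, falls back to index.html for directory requests, and picks a Content-Type from the file extension.

```python
import mimetypes
import posixpath
from urllib.parse import unquote, urlsplit


def resolve(request_target, index):
    """Map a request target to (archive_key, content_type), or None."""
    # Drop the query string and decode percent-escapes.
    path = unquote(urlsplit(request_target).path)
    # Normalize "." and ".." so a lookup can never point outside the archive.
    key = posixpath.normpath("/" + path.lstrip("/"))
    # Directory-style requests are served from their index.html.
    if path.endswith("/") or key == "/":
        key = posixpath.join(key, "index.html")
    if key not in index:
        return None
    content_type, _encoding = mimetypes.guess_type(key)
    return key, content_type or "application/octet-stream"
```

Since keys are only ever looked up in the archive index, a request such as /../etc/passwd simply misses instead of reaching the local file system.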

Cross-platform single binaries are typically produced with Go or Rust, both of which support static linking and cross-compilation targets. Kage fits this pattern: the project is distributed as a Rust binary, and Rust executables need no separate language runtime on the target machine.

How Does Kage Compare to HTTrack and Wget?

HTTrack and Wget are the two most widely used tools for website mirroring, and both have existed for decades. HTTrack offers a GUI and CLI, follows links recursively, and rewrites URLs for local browsing. Wget provides --mirror and --convert-links flags for similar functionality. Both produce directories full of files.

Kage differs in the output format. Instead of a directory tree, you get one binary. This eliminates problems with file path lengths on Windows, encoding issues when transferring files between operating systems, and the need to run a separate web server for correct rendering. HTTrack and Wget mirrors are usually opened straight from disk through index.html, which breaks protocol-relative URLs and some JavaScript functionality unless you serve the folder over HTTP yourself.

The trade-off is transparency. With HTTrack or Wget output, you can inspect, edit, and selectively restore individual files. You can diff two snapshots taken at different dates. You can grep through the HTML. A binary archive hides these files behind an executable interface, making incremental analysis harder.

For long-term archival, standard directory formats are more future-proof. A folder of HTML and CSS files will be readable decades from now. A custom binary format requires the executable to keep running on contemporary operating systems. Kage’s strength is portability and ease of sharing, not archival permanence.

Can Kage Capture Authenticated and Paywalled Content?

Kage can capture authenticated content, but the tool requires the user to supply valid session credentials during the shadowing process. The project documentation indicates that Kage supports cookie injection and custom header configuration, allowing it to access pages behind login forms or paywalls. This means a user logged into a subscription site can pass their active session to Kage.

The mechanism works by accepting browser cookies as input. Users export their session cookies from a browser and feed them into Kage via a configuration file or command-line flags. Kage then includes those cookies in every HTTP request it makes. The target server treats Kage’s requests as coming from an authenticated user.
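To make the mechanism concrete, here is a sketch that assumes the exported file uses the Netscape cookies.txt layout (seven tab-separated fields per line), which is what most browser export extensions produce. Whether Kage expects that exact format is an assumption, and load_cookies and cookie_header are hypothetical helpers. Domain matching is deliberately simplified.

```python
def load_cookies(text):
    """Parse Netscape-format cookies.txt content into a list of dicts."""
    cookies = []
    for line in text.splitlines():
        if line.startswith("#HttpOnly_"):
            # Some exporters mark HttpOnly cookies with this prefix.
            line = line[len("#HttpOnly_"):]
        elif not line.strip() or line.startswith("#"):
            continue  # blank line or comment
        fields = line.split("\t")
        if len(fields) != 7:
            continue  # malformed row
        domain, _subdomains, path, secure, _expires, name, value = fields
        cookies.append({
            "domain": domain.lstrip("."),
            "path": path,
            "secure": secure == "TRUE",
            "name": name,
            "value": value,
        })
    return cookies


def cookie_header(cookies, host, path="/", https=True):
    """Build the Cookie header value to attach to one request."""
    pairs = []
    for c in cookies:
        host_ok = host == c["domain"] or host.endswith("." + c["domain"])
        if host_ok and path.startswith(c["path"]) and (https or not c["secure"]):
            pairs.append(f'{c["name"]}={c["value"]}')
    return "; ".join(pairs)
```

The crawler would send the resulting string as the Cookie header on each request to the matching host.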

This approach has clear trade-offs. Session cookies expire, so a capture has to run while the session is still valid, and refreshing the archive later means exporting fresh cookies. Additionally, sharing a binary containing paywalled content raises legal and ethical questions. According to the README, Kage includes no DRM-stripping mechanism; it simply replays the session as-is.

For sites using OAuth flows or multi-factor authentication, the process gets more manual. Users must complete the login in a browser, extract the resulting tokens, and pass them forward. Kage does not automate credential entry.

What Are the Limitations of Single-Binary Website Archives?

Single-binary archives introduce several technical constraints that users should evaluate before relying on Kage for critical workflows. The most significant limitation is the handling of dynamic, server-side functionality — any feature requiring a live backend API cannot be reproduced in an offline binary.

Kage addresses client-side interactivity by rewriting JavaScript and inlining assets, but this approach has boundaries. Forms that submit data, real-time comment systems, and search functionality powered by external indexes will not function in the offline version. The binary captures the visual and structural state of the site, not its operational backend.

Another constraint is binary size. Websites with extensive media libraries — high-resolution images, video files, or large PDFs — produce binaries that can reach hundreds of megabytes. The Kage project notes that compression is applied, but the final artifact size correlates directly with the volume of downloaded assets.

Cross-origin resources present additional challenges. If a site loads fonts, scripts, or tracking pixels from external CDNs, Kage attempts to fetch and inline them. However, resources guarded by Referer checks or similar hotlink protection may fail to download, leaving gaps in the archived page. CORS itself is enforced by browsers rather than by servers, so it would only block a fetch made from inside a browser context.

How Do You Use Kage to Shadow a Site?

The basic workflow for shadowing a website with Kage involves a single command followed by a URL. The project is distributed as a Rust binary, installable via Cargo or downloaded as a pre-built release from the project’s GitHub repository.

Once installed, the core command takes the target URL and begins the crawling process. Kage fetches the root page, parses the HTML for linked resources, and recursively downloads stylesheets, scripts, images, and fonts. The tool applies a configurable depth limit to control how far the crawler traverses the site’s link graph.
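The depth limit can be pictured as a breadth-first traversal that stops following links once a page sits at the maximum depth. The sketch below is a generic illustration, not Kage's crawler: crawl_order is a hypothetical name, and get_links stands in for "fetch the page and extract its links".

```python
from collections import deque


def crawl_order(start, get_links, max_depth):
    """Breadth-first traversal that stops following links past max_depth."""
    seen = {start}
    queue = deque([(start, 0)])
    visited = []
    while queue:
        url, depth = queue.popleft()
        visited.append(url)
        if depth == max_depth:
            continue  # capture this page, but do not follow its links
        for link in get_links(url):
            if link not in seen:  # never queue the same URL twice
                seen.add(link)
                queue.append((link, depth + 1))
    return visited
```

With a depth of 0 only the starting page is captured; each extra level adds the pages linked from the previous one.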

Users can customize the capture through several flags:

  • --depth N — Controls how many links deep the crawler follows from the starting page
  • --output archive.kage — Specifies the filename for the resulting binary
  • --cookies cookies.txt — Injects session cookies for authenticated content
  • --user-agent "string" — Sets a custom User-Agent string for requests
  • --timeout 30 — Defines the maximum time in seconds for each HTTP request
  • --retries 3 — Sets how many times Kage retries a failed download before moving on
  • --exclude "pattern" — Skips URLs matching a regular expression
  • --serve — Launches a local HTTP server to preview the archive before exporting

After the crawl completes, Kage bundles all downloaded assets, rewrites internal links to relative paths, and compiles everything into a single executable. The resulting binary embeds a lightweight HTTP server that serves the archived content to any browser on localhost.
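The link-rewriting step can be sketched as follows, assuming same-origin links are rewritten relative to the page that contains them and external links are left alone. The to_relative function is a hypothetical helper, and query strings are ignored for brevity.

```python
import posixpath
from urllib.parse import urlsplit


def to_relative(link, page_url):
    """Rewrite an absolute same-origin link relative to page_url."""
    target, page = urlsplit(link), urlsplit(page_url)
    if (target.scheme, target.netloc) != (page.scheme, page.netloc):
        return link  # external link: leave untouched
    # Relative paths are computed from the directory holding the page.
    page_dir = posixpath.dirname(page.path) or "/"
    rel = posixpath.relpath(target.path or "/", start=page_dir)
    if target.fragment:
        rel += "#" + target.fragment
    return rel
```

For a page at /docs/guide/index.html, a link to /images/logo.png becomes ../../images/logo.png, which resolves correctly wherever the archive is served from.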

Flag        Purpose                          Default
--depth     Crawl depth from root URL        2
--output    Output binary filename           site.kage
--timeout   Per-request timeout in seconds   30
--retries   Failed request retry count       3
--serve     Preview mode with local server   false

Is Kage Suitable for Large-Scale Archiving Projects?

Kage is designed primarily for single-site shadowing rather than mass archiving at the scale of projects like the Internet Archive’s Wayback Machine. The tool’s architecture — downloading, rewriting, and bundling into one binary per run — makes it well-suited for capturing individual sites or documentation portals.

For archiving a personal blog, a project documentation site, or a small corporate domain, Kage performs reliably. The Rust-based HTTP client handles concurrent downloads efficiently, and the binary output is self-contained and portable. Users on Hacker News compared it favorably to tools like wget --mirror for its simplicity.

However, scaling to thousands of domains introduces bottlenecks. Each Kage run produces a separate binary, meaning there is no built-in mechanism for indexing or searching across multiple archives. Organizations needing searchable cross-site archives would need to build additional tooling around Kage’s output.

The project documentation does not currently describe distributed crawling, job queue management, or storage deduplication across runs. Users requiring these features would need to orchestrate multiple Kage instances externally.

What Does the Hacker News Community Think of Kage?

The Hacker News discussion around Kage’s Show HN post generated substantial engagement, with commenters drawing comparisons to established archiving tools and debating the project’s positioning. The reception was generally positive, with particular praise for the single-binary distribution model.

Several commenters drew parallels between Kage and existing solutions. Comparisons to wget --mirror --convert-links, HTTrack, and SingleFile appeared frequently. The distinguishing factor, according to the discussion, is the executable output: traditional tools produce directory structures that often need a separate web server to render correctly, while Kage bundles a server into the archive itself.

Security-minded commenters raised questions about distributing executable binaries containing web content. Running an untrusted .kage binary requires the same caution as any executable file. The project maintainer addressed this by noting that the embedded HTTP server binds to localhost only and does not execute arbitrary code from the archived pages.
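What "binds to localhost only" means in practice can be shown with a small sketch using Python's standard http.server. This is not Kage's server: PAGES, ArchiveHandler, and make_server are hypothetical names, and the in-memory dictionary stands in for the embedded archive.

```python
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical in-memory archive standing in for the embedded asset store.
PAGES = {"/index.html": b"<h1>archived</h1>"}


class ArchiveHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        key = "/index.html" if self.path == "/" else self.path
        body = PAGES.get(key)
        if body is None:
            self.send_error(404)
            return
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the sketch quiet


def make_server(port=0):
    # Binding to 127.0.0.1 (not 0.0.0.0) means other machines on the
    # network cannot connect; port 0 lets the OS pick a free port.
    return ThreadingHTTPServer(("127.0.0.1", port), ArchiveHandler)
```

Calling make_server().serve_forever() would serve the page to browsers on the same machine only. The bind address limits who can reach the server; it does not make an untrusted executable safe to run.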

Developers in the thread suggested potential improvements, including:

  • WASM-based output as an alternative to native binaries for safer distribution
  • Docker image export for containerized deployment scenarios
  • Incremental update support to refresh archives without full re-crawls
  • Search index generation embedded within the binary
  • Headless browser rendering mode for JavaScript-heavy SPA frameworks
  • Sitemap.xml parsing to improve crawl coverage on large sites
  • Integration with IPFS for decentralized archive distribution

Frequently Asked Questions

Does Kage work with JavaScript-heavy single-page applications?

Kage’s HTTP-based crawler fetches the HTML that the server returns and does not execute JavaScript, so single-page applications built with React, Vue, or similar frameworks will not be captured fully unless the site uses server-side rendering. The project documentation indicates that a headless browser mode is under consideration but not yet implemented. For SPAs that return a bare <div id="root"></div> shell, the archived binary will contain the empty container rather than rendered content.

Can you share a Kage binary with someone who lacks technical skills?

Yes, the binary is designed to be double-clickable on most operating systems, launching the embedded HTTP server and opening a browser automatically. The recipient does not need to install Kage, Rust, or any dependencies — the binary is self-contained. However, the recipient must trust the source of the executable, as running any binary carries inherent execution risk.

How large can a single Kage binary get?

The final binary size depends entirely on the volume of assets downloaded from the target site. A text-heavy blog with minimal images typically produces a binary under 5 MB, while media-rich sites with high-resolution photography can generate archives exceeding 200 MB. Kage applies gzip compression to embedded assets, but video and already-compressed image formats see minimal reduction.
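The difference between compressible and already-compressed content is easy to demonstrate. In the sketch below, repetitive HTML stands in for a text-heavy page and random bytes stand in for JPEG or video payloads; both are synthetic stand-ins, and gzip_ratio is a hypothetical helper.

```python
import gzip
import os


def gzip_ratio(data):
    """Return compressed size divided by original size."""
    return len(gzip.compress(data)) / len(data)


# Repetitive markup, typical of list pages and templates.
html = b"<li class='post'><a href='/posts/1'>Title</a></li>\n" * 2000

# Random bytes approximate data that is already compressed.
noise = os.urandom(len(html))

print(f"html:  {gzip_ratio(html):.3f}")
print(f"noise: {gzip_ratio(noise):.3f}")
```

The markup shrinks to a small fraction of its size, while the random data ends up no smaller than it started, which is why media-heavy sites dominate the final binary size.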

Does Kage respect robots.txt and rate limiting?

The project documentation confirms that Kage includes a configurable request delay and respects crawl-delay directives. Users can disable robots.txt checking with a flag, but the default behavior honors standard exclusion rules. The tool also implements exponential backoff on HTTP 429 responses to avoid overwhelming target servers during the shadowing process.
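Both behaviors can be sketched with Python's standard library. The robots.txt handling below uses urllib.robotparser, and the backoff schedule doubles the wait after each HTTP 429. The function names, the "kage-sketch" user agent, and the base delay and cap values are assumptions for illustration, not Kage's defaults.

```python
from urllib.robotparser import RobotFileParser


def load_rules(robots_txt):
    """Parse robots.txt content that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def backoff_delays(base, retries, cap=60.0):
    """Seconds to wait after successive 429s: base, 2*base, 4*base, ..."""
    return [min(cap, base * 2 ** attempt) for attempt in range(retries)]


rules = load_rules("User-agent: *\nCrawl-delay: 2\nDisallow: /private/\n")
allowed = rules.can_fetch("kage-sketch", "https://example.com/docs/")
delay = rules.crawl_delay("kage-sketch")  # seconds between requests
```

A crawler would check can_fetch before each request, sleep for the crawl delay between requests, and walk through backoff_delays whenever the server answers with 429.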

Summary

Kage brings a fresh approach to website archiving by combining crawling, asset inlining, and a self-contained server into a single distributable binary. Here are the key takeaways:

  • Single-binary distribution eliminates the need for recipients to install web servers, runtime environments, or dependencies — the archive runs on any compatible operating system by executing the file
  • Authenticated content capture is supported through cookie injection, enabling archiving of paywalled or login-protected pages, though session expiration limits long-term reproducibility
  • Dynamic backend functionality cannot be replicated — forms, search, and real-time features requiring live APIs will not work in the offline binary, as Kage captures the site’s rendered state rather than its operational backend
  • The tool targets individual site archiving rather than mass-scale crawling, making it ideal for documentation portals, personal sites, and small domains but not a replacement for infrastructure-level archiving systems
  • Hacker News reception highlighted the execution model as the primary differentiator from tools like HTTrack and wget, while also surfacing valid security considerations around distributing executable archives to untrusted recipients

For developers, researchers, and anyone who needs a reliable way to preserve web content offline, Kage offers a practical solution. The project is open source and available on GitHub — clone the repository, build with Cargo, and start shadowing sites today.