Talkshot
Native Swift app + Python CLI, both MIT licensed

You see it. You say it.
Your AI acts on it.

Talkshot lives in your macOS menu bar. Screenshot what you're looking at (cursor circled in red), narrate the issue, and get structured, timestamped Markdown your AI coding agent can act on immediately. All on-device. Nothing leaves your Mac.

Start with Ctrl+Option+N

How it works

Three steps from bug report to AI fix.

Capture your screen and voice in one motion. Talkshot handles the rest.

You
Screenshot + record
Hit the hotkey once: screenshot taken, cursor circled, mic recording started, all in one motion.
Talkshot
Transcribe + structure
On-device transcription via Apple's Speech framework. Screenshot, crop, cursor position, transcript, all aligned by timestamp.
You
Paste into your agent
Open the session folder, drag notes.md into Claude Code or Cursor. Your agent sees what you saw, reads what you said.
Ctrl+Opt+N (start) → talk (narrate) → Ctrl+Opt+N (stop) → Ctrl+Opt+E (finish session) → ⌘V (paste into agent)

Why it matters

Feedback without the friction.

Stop retyping what you already see.

Screenshots alone lose intent ("what's wrong with this?"). Voice notes alone lose visual context. Talkshot captures both (cursor position, cropped region, and your spoken note), so your AI agent gets the same context a human coworker would.

Faster than typing out a bug report.

You talk at ~150 words per minute. You type at ~60. At those rates, a two-minute narration (~300 words) would take about five minutes to type. Hit the hotkey, say what's wrong, move on.

Everything stays on your Mac.

Apple's Speech framework runs entirely on-device. No cloud, no API keys, no telemetry. The screenshots never leave your disk. Talkshot is a native macOS app: no Electron, no browser, no network calls.

Under the hood

Built for developers who read source.

  • ScreenCaptureKit (macOS 14+): High-performance screen capture with proper permission handling. No shelling out to the screencapture command; frames come straight from the system capture API.
  • On-device transcription: Apple's Speech framework for local, private transcription. The Python CLI also supports mlx-whisper for larger model options.
  • Cursor-aware captures: Every screenshot draws a red circle at your cursor position. A cropped, zoomed region around the cursor is saved alongside the full screen, so your agent knows exactly where you were pointing.
  • Structured Markdown output: Each note is a self-contained record with a timestamp, cursor position (points + pixels), full screenshot, cropped region, and transcribed text. Ready for any LLM to consume.
  • Menu bar native: A SwiftUI MenuBarExtra app. No dock icon, no browser tab. Respects macOS dark/light mode automatically.
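To make the "points + pixels" and cursor-crop bookkeeping concrete, here is a minimal Python sketch. It is not Talkshot's code: the function names, the 400×400-pixel crop size, and the 2.0 backing scale factor (typical for Retina displays) are all illustrative assumptions. It also assumes a top-left coordinate origin, as in CoreGraphics event coordinates; Cocoa's NSEvent.mouseLocation uses a bottom-left origin and would need flipping first.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    left: int
    top: int
    right: int
    bottom: int


def points_to_pixels(x_pt: float, y_pt: float, scale: float) -> tuple[int, int]:
    """Convert a cursor position in screen points to screenshot pixels.

    `scale` is the display's backing scale factor (assumed 2.0 on Retina).
    """
    return round(x_pt * scale), round(y_pt * scale)


def crop_box(cx: int, cy: int, img_w: int, img_h: int, size: int = 400) -> Box:
    """Return a size×size box centred on the cursor, shifted to stay on-image.

    Hypothetical crop size; assumes the image is at least `size` px each way.
    """
    half = size // 2
    # Clamp so the box never starts before 0 or runs past the image edge.
    left = min(max(cx - half, 0), img_w - size)
    top = min(max(cy - half, 0), img_h - size)
    return Box(left, top, left + size, top + size)
```

For the first note in the sample below, a cursor at [847, 392] points on a 2880×1800-pixel capture maps to pixel (1694, 784), and the crop becomes Box(1494, 584, 1894, 984). Shifting the box instead of shrinking it keeps every crop the same size, even when the cursor sits in a corner.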
notes.md
# Session 20260704-142141

## Note 1 (2026-07-04T14:22:03Z)
Cursor at [847, 392] (screen points)

![full](shot_001.png)

![zoom](crop_001.png)

> The login button on mobile is way too small.
> The tap target overlaps with the "Forgot
> password" link, users keep hitting the wrong
> thing. Should have at least 44px height.

## Note 2 (2026-07-04T14:23:15Z)
Cursor at [120, 560] (screen points)

![full](shot_002.png)

![zoom](crop_002.png)

> The nav highlight is stuck on Dashboard
> even though I'm on Settings. Active state
> isn't updating on route change.

Get started

Build from your terminal.

Talkshot is a native Swift app that builds cleanly with XcodeGen. The default build produces a locally signed app, fine for running on your own Mac. For distribution, a separate release build pipeline handles Developer ID signing and notarization.

Also included: a Python CLI (talkshot.py) that does the same thing without the native menu bar UI. Useful for scripting, or if you want to swap in a different transcription model like mlx-whisper.

Terminal, zsh
~ $ git clone https://github.com/flowsxr/talkshot.git && cd talkshot
~/talkshot $ brew install xcodegen
~/talkshot $ cd native && ./build.sh
# Building Talkshot.app...
# Done → native/dist/Talkshot.app
# Grant Screen Recording +
# Microphone + Accessibility
# when prompted. Then:
~/talkshot/native $ open dist/Talkshot.app

Session output

Structured, timestamped, agent-ready.

Every session is a self-contained folder on your Desktop. Drag any file into your AI coding agent.

Session folder structure
talkshot-session-20260704-142141/
├── shot_001.png      # full screen, cursor circled
├── crop_001.png      # zoomed region around cursor
├── shot_002.png      # next note
├── crop_002.png
├── notes.json        # structured entries
└── notes.md          # paste this into your agent
notes.json: Machine-readable. Each entry has a timestamp, cursor position (points + pixels), full screenshot path, cropped region path, and transcribed text. Integrate with anything that reads JSON.
notes.md: Human-readable Markdown. Section headings per note, inline cursor coordinates, embedded screenshot references. Built for AI coding agents: paste it directly into Claude Code, Cursor, Windsurf, or Copilot.
shot_N.png / crop_N.png: Full-screen capture with a red circle at your cursor, plus a zoomed crop of the area around it. Your agent sees both the big picture and the detail.
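The session layout follows a predictable naming scheme: a timestamped folder and zero-padded per-note file names. The Python sketch below builds those names and writes a notes.json file. The folder and file patterns are taken from the example tree. The JSON key names ("cursor_points", "screenshot", and so on) and all function names are hypothetical, since the real notes.json schema isn't shown here.

```python
import json
from datetime import datetime
from pathlib import Path


def session_dir(base: Path, started: datetime) -> Path:
    """Session folder named like talkshot-session-20260704-142141."""
    return base / f"talkshot-session-{started:%Y%m%d-%H%M%S}"


def note_files(index: int) -> tuple[str, str]:
    """Zero-padded screenshot and crop file names for 1-based note `index`."""
    return f"shot_{index:03d}.png", f"crop_{index:03d}.png"


def json_entry(index: int, timestamp: str, cursor_pt: tuple[int, int],
               cursor_px: tuple[int, int], text: str) -> dict:
    """One notes.json entry. Key names are hypothetical, not Talkshot's schema."""
    shot, crop = note_files(index)
    return {
        "timestamp": timestamp,
        "cursor_points": list(cursor_pt),
        "cursor_pixels": list(cursor_px),
        "screenshot": shot,
        "crop": crop,
        "text": text,
    }


def write_notes_json(folder: Path, entries: list[dict]) -> Path:
    """Create the session folder if needed and write notes.json into it."""
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / "notes.json"
    path.write_text(json.dumps(entries, indent=2), encoding="utf-8")
    return path
```

Pointing `session_dir` at `Path.home() / "Desktop"` with `datetime.now()` gives the Desktop location described above. Zero-padding keeps `shot_010.png` sorted after `shot_009.png` in Finder and `ls`.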

Integrations

Works where your agents work.

Claude Code

Paste notes.md straight into a Claude Code session. Claude reads the timestamped screenshots and transcripts as structured context: no more "the thing next to the other thing."

Cursor & Windsurf

Drop the session folder into your project. Cursor and Windsurf read Markdown natively, so they see your screenshots inline with your notes.

GitHub Copilot

Copy notes.md content into a Copilot Chat prompt. Screenshot references and cursor positions give Copilot the visual context it needs to suggest accurate fixes.

GitHub Issues & Linear

The structured Markdown works as a bug report out of the box. Paste it into GitHub Issues or Linear: screenshot references, cursor positions, and transcripts are all there.

Features

Everything you need. Nothing you don't.

Menu bar native

Lives in your menu bar. No dock icon, no browser tab, no Electron. Pure SwiftUI MenuBarExtra, lightweight and always available.

Cursor-aware screenshots

Red circle drawn at your cursor position. Cropped, zoomed region around the cursor saved alongside the full screen. Your agent knows exactly where you pointed.

On-device transcription

Apple's Speech framework runs entirely locally. No cloud, no API keys, no data leaving your Mac. Python CLI also supports mlx-whisper for larger models.

Session management

Multiple notes per session. Each note is a complete record: screenshot, crop, cursor position, transcript. Finish the session and Talkshot saves everything, opens the folder, and starts fresh.

Global hotkeys

Ctrl+Option+N to take a note, Ctrl+Option+E to finish the session. Works from any app. Also available from the menu bar if you prefer clicking.

MIT licensed

Free as in freedom, free as in beer. No telemetry, no tracking, no analytics. The full source is on GitHub. Use it, fork it, ship it.
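The session-management and global-hotkey behavior described in the features above can be modeled as a two-state machine. The first Ctrl+Option+N starts a note, the second ends it, and Ctrl+Option+E finishes the session. This Python sketch is a hypothetical model, not Talkshot's code. In particular, the choice to close out an in-progress note when the session is finished mid-recording is an assumption.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()       # session open, not recording
    RECORDING = auto()  # screenshot taken, mic live


class Session:
    """Hypothetical model of the Ctrl+Option+N / Ctrl+Option+E flow."""

    def __init__(self) -> None:
        self.state = State.IDLE
        self.notes: list[str] = []

    def on_take_note(self) -> None:  # Ctrl+Option+N
        if self.state is State.IDLE:
            # First press: capture screenshot + crop, start the mic.
            self.state = State.RECORDING
        else:
            # Second press: stop the mic, transcribe, append the note.
            self.notes.append(f"note {len(self.notes) + 1}")
            self.state = State.IDLE

    def on_finish(self) -> list[str]:  # Ctrl+Option+E
        if self.state is State.RECORDING:
            self.on_take_note()  # assumption: close out the open note first
        # Hand back everything recorded and start a fresh session.
        saved, self.notes = self.notes, []
        return saved
```

Keeping the model to two states means every hotkey press has exactly one meaning, whatever app is in front. The same handlers can back the menu bar items.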

FAQ

Common questions.

What is Talkshot?
Talkshot is an open-source macOS tool that lives in your menu bar. Hit a hotkey and it screenshots your main display (cursor circled in red), crops the area around your cursor, starts recording your mic, and transcribes what you say. The result is a structured folder of screenshots, transcripts, and Markdown, built to be pasted straight into AI coding agents like Claude Code, Cursor, or Copilot.
Does it need an internet connection?
No. Everything runs on-device. Screenshots are captured via ScreenCaptureKit. Transcription uses Apple's Speech framework, which runs locally on your Mac. Nothing uploads to the cloud, no API keys required, no telemetry. The Python CLI optionally supports mlx-whisper (Apple Silicon only) if you want a larger transcription model. It runs offline once the model has been downloaded.
How is this different from just taking a screenshot and typing a note?
Screenshots alone lose intent: "what's wrong with this?" Typed notes alone lose visual context: "this thing, right here." Talkshot captures both, aligned by timestamp, with your cursor position baked into every frame. Your AI agent gets the same context a human looking over your shoulder would. Plus, you talk at ~150 wpm and type at ~60, so it's faster.
What AI coding agents work with Talkshot?
Any agent that can read Markdown and view images. Confirmed to work well with: Claude Code, Cursor, Windsurf, and GitHub Copilot. The structured notes.md output with inline screenshot references is designed to be pasted directly into any LLM chat interface. Also works great as a bug report format for GitHub Issues and Linear.
What are the system requirements?
macOS 14 (Sonoma) or later, Apple Silicon or Intel. To build from source you'll need Xcode 15+ and XcodeGen (brew install xcodegen). The Python CLI works on Python 3.9+ with minimal dependencies.
Is Talkshot free?
Yes. Talkshot is MIT-licensed open source. No paid tiers, no premium features, no accounts. Build it yourself or grab the source from GitHub. If you find it useful, a star on the repo goes a long way.
Does it support multiple displays?
Currently Talkshot captures the main display (the one with the menu bar). Multi-display support, capturing whichever display the cursor is on, is on the roadmap. The cursor position is always recorded in absolute screen coordinates, so your agent still knows exactly where you were pointing regardless of which display you're on.
Can I change the hotkeys?
Yes. In the Swift app, hotkeys are configured in HotkeyService.swift under HotkeyConfig. In the Python CLI, they're at the top of talkshot.py under HOTKEYS. The default is Ctrl+Option+N for taking notes and Ctrl+Option+E for finishing sessions, chosen to avoid conflicts with Xcode and terminal shortcuts.
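The roadmap item in the multi-display answer (capturing whichever display the cursor is on) comes down to a point-in-rectangle test in global coordinates. Below is a minimal Python sketch of that test. The Display type, the helper names, and the example frames are hypothetical. It assumes display frames in one global space with a top-left origin, like the bounds CoreGraphics' CGDisplayBounds reports. NSScreen frames use a bottom-left origin and would need converting first.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Display:
    name: str
    x: float       # origin in global screen points (top-left convention assumed)
    y: float
    width: float
    height: float

    def contains(self, px: float, py: float) -> bool:
        # Half-open bounds: a point on a shared edge belongs to exactly one display.
        return self.x <= px < self.x + self.width and self.y <= py < self.y + self.height


def display_for_cursor(displays: list[Display], px: float, py: float) -> Optional[Display]:
    """Return the display under an absolute cursor position, or None."""
    return next((d for d in displays if d.contains(px, py)), None)


def to_local(d: Display, px: float, py: float) -> tuple[float, float]:
    """Convert an absolute screen point to a point relative to display `d`."""
    return px - d.x, py - d.y
```

With a 1440×900 main display at the origin and a 1920×1080 display to its right, an absolute point of (2000, 500) resolves to the second display at local (560, 500). Because notes already store absolute coordinates, this lookup could be applied to them after the fact.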