Chrome DevTools MCP

(developer.chrome.com)

259 points | by xnx 4 hours ago

34 comments

dataviz1000 2 hours ago
I use Playwright to intercept all requests and responses and have Claude Code navigate to a website like YouTube and click and interact with all the elements and inputs while recording all the requests and responses associated with each interaction. Then it creates a detailed strongly typed API to interact with any website using the underlying API.
Yes, I know it likely breaks everybody's terms of service but at the same time I'm not loading gigabytes of ads, images, markup, to accomplish things.
If anyone is interested I can take some time and publish it this week.
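A minimal sketch of the typing step, assuming the request/response pairs have already been recorded. This is not the tool described above: the record format, `type_name`, `infer_endpoints`, and the example.test URLs are all made up for illustration, and only top-level JSON fields are typed.

```python
import json
from collections import defaultdict
from urllib.parse import urlsplit


def type_name(value):
    """Map a decoded JSON value to a coarse type label."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        inner = sorted({type_name(v) for v in value}) or ["unknown"]
        return "Array<" + " | ".join(inner) + ">"
    return "object"


def infer_endpoints(records):
    """Group recorded exchanges by (method, path) and merge top-level field types."""
    endpoints = defaultdict(lambda: defaultdict(set))
    for rec in records:
        key = (rec["method"].upper(), urlsplit(rec["url"]).path)
        body = json.loads(rec["response_body"])
        if isinstance(body, dict):
            for field, value in body.items():
                endpoints[key][field].add(type_name(value))
    return {
        key: {field: " | ".join(sorted(types)) for field, types in fields.items()}
        for key, fields in endpoints.items()
    }


# Hypothetical recorded traffic; a real capture would come from the interception step.
RECORDS = [
    {"method": "get", "url": "https://example.test/api/items?page=1",
     "response_body": '{"id": 1, "title": "a", "tags": ["x"], "owner": null}'},
    {"method": "GET", "url": "https://example.test/api/items?page=2",
     "response_body": '{"id": 2, "title": "b", "tags": [], "owner": "sam"}'},
]

if __name__ == "__main__":
    for (method, path), fields in infer_endpoints(RECORDS).items():
        print(method, path, fields)
```

Merging types across several recordings is what surfaces optional or nullable fields; a single capture would make `owner` look like it is always null.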
- bredren 54 minutes ago
  I also do this. My primary use case is reproducing page layout and styling at any given tree in the DOM. So, capturing various states of a component etc.
  I also use it to automatically capture responsive behavior in complex web apps. It uses Playwright to adjust the viewport width and monitor entire trees for exact changes, then writes structured data that includes the complete cascade of relevant styles, with screenshots to support the snapshots.
  There are tools you can buy that let you do this kind of inspection manually, but they are designed for humans. So, lots of clickety-clackety and human speed results.
  ---
  My first reaction to seeing this FP was why are people still releasing MCPs? So far I've managed to completely avoid that hype loop and went straight to building custom CLIs even before skills were a thing.
  I think people are still not realizing the power and efficiency of direct access to things you want and skills to guide the AI in using the access effectively.
  Maybe I'm missing something in this particular use case?
- halJordan 21 minutes ago
  I love how HN is loving this idea when it's the exact same thing Anthropic and OpenAI (and every other LLM maker) did.
  It's God's gift to them when it lets them bypass ads and dl copyrighted material. But it's Satan's curse on humanity when the Zuck does it to train his llm and dl copyrighted material.
- Axsuul 2 hours ago
  Why even use Playwright for this? I feel like Claude just needs agent-browser and it can generate deterministic code from it.
  - dsrtslnd23 1 hour ago
    you mean this one? https://github.com/vercel-labs/agent-browser
    - dataviz1000 1 hour ago
      It is 2 months old!
      My excuse for not keeping up is that I'm in so deep that Claude Code can predict the stock market.
      I'll still publish mine and see if it has any value, but agent-browser looks very complete.
      Thank you for sharing!
      - bartek_gdn 50 minutes ago
        Yes please, maybe there will be some solution that will fit the problem better! I recently released something similar, and because of the small API, I'm more comfortable using it.
        https://news.ycombinator.com/item?id=47207790
- miohtama 19 minutes ago
  I just ask Claude to reverse engineer the site with Chrome MCP. It goes to work by itself, uses your Chrome logged in session cookies, etc.
- Johnny_Bonk 1 hour ago
  Yes, please do and ping me when it's done lol. Did you make it into an agent skill?
  - dataviz1000 1 hour ago
    Exactly, it is an agent skill that interacts with a webpage (pressing buttons and so on) while capturing and documenting all the API requests the page makes, using Playwright's request / response interception methods. It creates a strongly typed, well-documented API at the end.
    - bengt 1 hour ago
      Sounds awesome. I've been using mitmproxy's --mode local to intercept with a separate skill to read flow files dumped from it, but interactive is even better.
- defen 2 hours ago
  Would this hypothetically be able to download arbitrary videos from youtube without the constant yt-dlp arms race?
  - dawnerd 2 hours ago
    Don’t know how this could be more stable than yt-dlp. When issues come up they’re fixed really quickly.
    - varenc 1 hour ago
      yt-dlp was very recently broken for ~2 days for any Youtube videos that required cookies: https://github.com/yt-dlp/yt-dlp/issues/16212
      Here is what actually fixed it: https://github.com/yt-dlp/ejs/pull/53/changes
      yt-dlp is relatively stable, but still occasionally breaks for long periods. I get the sense YouTube is becoming increasingly adversarial to yt-dlp as well.
      I don't know the details, but it doesn't seem like yt-dlp is running the entire YouTube JS+DOM environment. Something like a real headless browser seems like it would break less often, but be much heavier weight. And Youtube might have all sorts of other mitigations against this approach.
      - toomuchtodo 1 hour ago
        I think having a hook to an LLM endpoint to enable yt-dlp to attempt to self resolve until an official fix is available would be a useful enhancement.
  - dataviz1000 2 hours ago
    > yt-dlp arms race
    I don't know anything about yt-dlp.
    It would probably help people who want to go to a concert and have a chance to beat the scalpers cornering the market on an event in 30 seconds hitting the marketplace services with 20,000 requests.
    I can try to see if it can bypass yt-dlp. But that is always a cat and mouse game.
    - defen 2 hours ago
      To clarify - yt-dlp is a command line tool for downloading youtube videos, but it's in a constant arms race with the youtube website because they are constantly changing things in a way that blocks yt-dlp.
- heystefan 3 minutes ago
  Commenting to follow up.
- mikrl 1 hour ago
  I was doing similar by capturing XHR requests while clicking through manually, then asking codex to reverse engineer the API from the export.
  Never tried that level of autonomy though. How long is your iteration cycle?
  If I had to guess, mine was maybe 10-20 minutes over a few prompts.
- schainks 1 hour ago
  Very interested. Would even pay for an api for this. I am doing something similar with vibium and need something more token efficient.
- liamdgray 40 minutes ago
  Please do!
- xrd 2 hours ago
  Yes, please do!
  - dataviz1000 2 hours ago
    100%. I'll respond to this by Friday with a link to GitHub.
    I use Patchright + Ghostery, and I have a clever tool that uses web sockets to pass 1-second-interval screenshots to a dashboard and pointer / keyboard events to the server. That allows interacting with websites so that a user can create authentication that is stored in the Chrome user profile, with all the cookies, history, local storage, etc., in the cloud on a server.
    Can you list some websites that don't require a subscription that you would like me to test against? I used this for Robinhood, and I think LinkedIn would be a good example for people to use.
    - zzleeper 1 hour ago
      Another +1, it would be incredibly useful to play with this approach! (and fun)
- toomuchtodo 1 hour ago
  Please publish!
- retinaros 1 hour ago
  Isn't this what everyone who needs web validation does?
- lizhang 1 hour ago
  [dead]
RALaBarge 2 minutes ago
I made a websocket proxy + chrome extension to give control of the DOM to agents for my middleware app: https://github.com/RALaBarge/browserbox
What I am working on at the moment is improving agentic tool-usage success rates for my research, and I use this as a proxy to access everything with the cookies I allow in the session.
mmaunder 2 hours ago
Google is so far behind agentic cli coding. Gemini CLI is awful. So bad in fact that it’s clear none of their team use it. Also MCP is very obviously dead, as any of us doing heavy agentic coding know. Why permanently sacrifice that chunk of your context window when you can just use CLI tools which are also faster and more flexible and many are already trained in. Playwright with headless Chromium or headed chrome is what anyone serious is using and we get all the dev and inspection tools already. And it works perfectly. This only has appeal to those starting out and confused into thinking this is the way. The answer is almost never MCP.
- zeroxfe 1 hour ago
  > Also MCP is very obviously dead, as any of us doing heavy agentic coding know.
  As someone that does heavy agentic coding (using basically all the tools), this is so far from the truth. People claiming this have probably never worked in large enterprise environments where things like authentication, RBAC, rate limiting, abuse detection, centralized management/updates/ops, etc. are a huge part of the development and deployment workflow.
  In these situations you can't just use skills and CLI tools without a gigantic amount of retooling and increased operational and security complexity. MCP is really useful here, and allows centralized eng and ops teams to manage their services in a way that aligns with the organization's overall posture, policies, and infrastructure.
  > Google is so far behind agentic cli coding. Gemini CLI is awful.
  This part I totally agree with. It's really hard to express how bad it is (and it's really disappointing).
  - moritonal 8 minutes ago
    Given MCP is supposed to just be a standardised format for self-describing APIs, why are all the features you listed MCP related things? It sounds more like it's forced the enterprise to build such features which cli tooling didn't have?
    - rsalus 0 minutes ago
      mostly by virtue of being a common standard. MCP servers are primarily useful in a remote environment, where centralized management of cross-cutting concerns matters. also it's really useful for integrating existing distributed services, e.g., internal data lakes.
      I think it's clear a self-describing CLI is optimal for local-first tooling and portability. I personally view remote MCP servers as complementary in the space.
- rsalus 2 hours ago
  MCP is very much not dead. centralized remote MCP servers are incredibly useful. also bespoke CLIs still require guidance for models to use effectively, so it's clear that token efficiency is still an issue regardless.
  - abhis3798 1 hour ago
    I see remote MCP servers as a great interface to consume api responses. The idea that you essentially make your apis easily available to agents to bring in relevant context is a powerful one.
    When folks say MCP is dead, I don't get it. What other alternatives exist in place of MCP? Arbitrary code via curl/sdks to call a remote endpoint?
    - attentive 45 minutes ago
      > What other alternatives exist in place of MCP? Arbitrary code via curl/sdks to call a remote endpoint?
      cli?
      for example aws cli. It's a full interface to aws API. Why would you need mcp for that?
      and if you have any doubts, agents use it to great effect even without any relevant skill. "aws help" is fully discoverable.
      - rsalus 29 minutes ago
        yes, but clis thus need self-service commands to provide guidance, and their responses need to be optimized for consumption by agents. in a sense, this is the same sort of context tax that MCP servers incur. so in my view cli and MCP are complementary tools; one is not strictly superior over the other.
  - Torn 1 hour ago
    Tbh I find self-documenting CLIs (e.g. with a `--help` flag, and printing correct usage examples when LLMs make things up) plus a skill that's auto invoked to be pretty reliable. CLIs can do OAuth dances too just fine.
    MCP's remaining moats I think are:
    - No-install product integrations (just paste in mcp config into app)
    - Non-developer end users / no shell needed (no terminal)
    - Multi-tenant auth (many users, dynamic OAuth)
    - Security sandboxing (restrict what agents can do), credential sandboxing (agents never see secrets)
    - Compliance/audit (structured logs, schema enforcement)?
    If you're a developer building for developers though, CLI seems to be a clear winner right
    - quotemstr 57 minutes ago
      Imagine if, in addition to local MCP "servers", the MCP people had nurtured a structured CLI-based --help-equivalent consumable by LLMs and shell completion engines alike. Doing so, you unify "CLI" (trivial deployment; human accessibility) and MCP-style (structured and discoverable tool calling) in a single DWIM artifact.
      But since when has this industry done the right thing informed by wisdom and hindsight?
      - rsalus 25 minutes ago
        that's a pretty interesting idea. It would be nice if there was such a standard. the approach I'm taking right now: a CLI that accepts structured JSON as input, with an 'mcp' subcommand that starts a stdio server. I bundle a 'help' command with a 'describe' action for self-service guidance scoped to a particular feature/tool.
  - mattnewton 1 hour ago
    I think CLIs are more token efficient: the help menu is loaded only when needed, and the output is trivially pipeable to grep or jq to filter out what the model actually wants
  - nojito 2 hours ago
    all you need is a simple skills.md and maybe a couple examples and codex picks up my custom toolkit and uses it.
    - dominotw 1 hour ago
      whats your custom toolkit
- danpalmer 42 minutes ago
  > So bad in fact that it’s clear none of their team use it.
  I use it extensively, many of my colleagues do. I get a ton of value out of it. Some prefer Antigravity, but I prefer Gemini CLI. I get fairly long trajectories out of it, and some of my colleagues are getting day-long trajectories out of it. It has improved massively since I started using it when it first came out.
- cheema33 1 hour ago
  > Also MCP is very obviously dead
  Some people will push back on this. They are holding out hope that the recent improvements Anthropic has made in this regard have improved the context rot problem with MCP. Anthropic's changes improve things a little. But it is akin to putting lipstick on a pig. It helps, but not much.
  The reason MCP is dying/dead is because MCP servers, once configured, bloat up context even when they are not being used. Why would anybody want that?
  Use agent skills. And say goodbye to MCP. We need to move on from MCP.
  - Rapzid 20 minutes ago
    The bloat problem is already outdated though. People are having the LLM pick the MCP servers it needs for a particular task up front, or picking them out-of-band, so the full list doesn't exist in the context on every call.
  - dominotw 1 hour ago
    i am using notion mcp. is there a corresponding skill. also wtf is a plugin.
- sega_sai 1 hour ago
  I don't know if this is just an anecdotal, random impression, but in the last week or two I have had a mostly good experience with Google's CLI, while previously I constantly complained about it. I have been using it together with codex, and I would not say that one is much better than the other.
  It is hard to say nowadays, when things change so quickly
- girvo 2 hours ago
  I know it’s a bit of a tangent but man you’re right re. Gemini CLI. It’s woefully bad, barely works. Maybe because I was a “free” user trying it out at the time, but it was such a bad experience it turned me off subscribing to whatever their coding plan is called today.
  - ElCapitanMarkla 39 minutes ago
    I had this exp too, but I trialed the pro sub a few weeks back and it has been great. I have no complaints this time
  - luckydata 2 hours ago
    it's not the CLI, it's the model. The model wasn't trained to do that kind of work; it was trained to do one-shot coding, not sustained back and forth until it gets it right like Claude and ChatGPT.
- quotemstr 1 hour ago
  > Why permanently sacrifice that chunk of your context window when you can just use CLI tools which are also faster and more flexible and many are already trained in
  What about all the CLI tools not baked into the model's priors?
  Every time someone says "extensibility mechanism X is dead!", I think "Well, I guess that guy isn't doing anything that needs to extend the statistical average of 2010s-era Reddit"
- spiderfarmer 2 hours ago
  MCP is not just used for coding.
aadishv 4 hours ago
Someone already made a great agent skill for this, which I'm using daily, and it's been very cool!
https://github.com/pasky/chrome-cdp-skill
For example, I use codex to manage a local music library, and it was able to use the skill to open a YT Music tab in my browser, search for each album, and get the URL to pass to yt-dlp.
Do note that it only works for Chrome browsers rn, so you have to edit the script to point to a different Chromium browser's binary (e.g. I use Helium) but it's simple enough
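One cautious way to do the hand-off to yt-dlp is to check the URL the agent returns and pass it as an argument list rather than through a shell. A rough sketch, assuming yt-dlp is on PATH; `ALLOWED_HOSTS` and `build_download_command` are hypothetical names, and this only narrows the risk, it does not remove it.

```python
from urllib.parse import urlsplit

# Assumption: the only hosts this workflow should ever fetch from.
ALLOWED_HOSTS = {"music.youtube.com", "www.youtube.com", "youtube.com"}


def build_download_command(url):
    """Validate an agent-supplied URL and return an argv list for yt-dlp."""
    if any(ch.isspace() for ch in url):
        raise ValueError("URL contains whitespace")
    parts = urlsplit(url)
    if parts.scheme != "https":
        raise ValueError("only https URLs are accepted")
    if parts.username or parts.password:
        raise ValueError("credentials in URL are not accepted")
    if parts.hostname not in ALLOWED_HOSTS:
        raise ValueError("host is not on the allow-list")
    # "--" ends option parsing, so the URL cannot be read as a flag.
    return ["yt-dlp", "--", url]


if __name__ == "__main__":
    print(build_download_command("https://music.youtube.com/watch?v=XjvkxXblpz8"))
```

The returned list would then go to something like `subprocess.run(cmd, check=True)` with no `shell=True`, so nothing in the URL is interpreted by a shell.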
- Etheryte 4 hours ago
  On one hand, cool demo, on the other, this is horrifying in more ways than I can begin to describe. You're literally one prompt injection away from someone having unlimited access to all of your everything.
  - mh- 3 hours ago
    Not the person you're replying to, but: I just use a separate, dedicated Chrome profile that isn't logged into anything except what I'm working on. Then I keep the persistence, but without commingling in a way that dramatically increases the risk.
    edit: upon rereading, I now realize the (different) prompt injection risk you were calling out re: the handoff to yt-dlp. Separate profiles won't save you from that, though there are other approaches.
    - bartek_gdn 57 minutes ago
      That's also my approach. I quickly built a CLI for this with lightweight session management
      https://news.ycombinator.com/item?id=47207790
    - sofixa 2 hours ago
      Even without the bash escape risk (which can be mitigated with the various ways of only allowing yt-dlp to be executed), YT Music is a paid service gated behind a Google account, with associated payment method. Even just stealing the auth cookie is pretty serious in terms of damage it could do.
      - mh- 2 hours ago
        Agreed. I wouldn't cut loose an agent that's at risk of prompt injection w/ unscoped access to my primary Google account.
        But if I understood the original commenter's use case, they're just searching YT Music to get the URL to a given song. This appears[0] to work fine without being logged in. So you could parameterize or wrap the call to yt-dlp and only have your cookie jar usable there.
        [0]: https://music.youtube.com/search?q=sandstorm
        [1]: https://music.youtube.com/watch?v=XjvkxXblpz8
        - sofixa 2 hours ago
          Oh, that's true, even allows you to play without an account. I can swear that at some point it flat out refused any use unless you're logged in with an account that has YT Music (I remember having to go to regular YouTube to get the same song to send it to someone who didn't have it).
  - sheepscreek 3 hours ago
    As long as it’s gated and not turned on by default, it’s all good. They could also add a warning/sanity check similar to “allow pasting” in the console.
    - hrmtst93837 2 hours ago
      Relying on warnings or opt-ins for something with this blast radius is security theater more than protection. The cleverest malware barely waits for you to click OK before making itself at home, so that checkbox is a speed bump on a highway.
      Chrome's 'allow pasting' gets ignored reflexively by most users anyway. If this agent can touch DevTools the attack surface expands far faster than most people realize or will ever audit.
  - aadishv 4 hours ago
    Of course I still watch it and have my finger on the escape key at all times :)
    - glenpierce 3 hours ago
      I am in awe of the confidence you have in your reflexes.
      - aadishv 2 hours ago
        You get used to it :) And especially once you get used to the YOLO lifestyle, you end up realizing that practically any form of security is entirely worthless when you're dealing with a 200 IQ brainwashed robot hacker.
        I think using the Pi coding agent really got me used to this way of thinking: https://mariozechner.at/posts/2025-11-30-pi-coding-agent/#to...
    - bergheim 3 hours ago
      For now you are. All these things fall with time, of course. You will stop caring once you start feeling safe, we all do.
      Also. AAarrgh, my new thing to be annoyed at is AI drivel written slop.
      "No browser automation framework, no separate browser instance, no re-login."
      Oh really, nice. No separate computer either? No separate power station, no house, no star wars? No something else we didn't ask for? Just one toggle and you go? Whoaaaaaa.
      Edit: lol even the skill itself is vibe coded:
      Lightweight Chrome DevTools Protocol CLI. Connects directly via WebSocket — no Puppeteer, works with 100+ tabs, instant connection.
      I feel like there's nothing fucking left on the internet anymore that is not some mean of whatever the LLM is trained to talk like now.
      - tacitusarc 3 hours ago
        What can you do? I mentioned the use of AI on another thread, asking essentially the same question. The comment was flagged, presumably as off topic. Fair enough, I guess. But about 80% (maybe more) of posted blogs etc that I see on HN now have very obvious signs of AI. Comments do too. I hate it. If I want to see what Claude thinks I can ask it.
        HN is becoming close to unusable, and this isn’t like the previous times where people say it’s like reddit or something. It is inundated with bot spam, it just happens the bot spam is sufficiently engaging and well-written that it is really hard to address.
        - bergheim 2 hours ago
          I hear you and I agree. I don't know. Gated communities?
- paulirish 2 hours ago
  To be clear, this isn't a skill for the devtools mcp, but an independent project. It doesn't look bad, but obviously browser automation + agents is a very busy space with lots of parallel efforts.
  DevTools MCP and its new CLI are maintained by the team behind Chrome DevTools & Puppeteer and it certainly has a more comprehensive feature set. I'd expect it to be more reliable, but.. hey open source competition breeds innovation and I love that. :)
  (I used to work on the DevTools team. And I still do, too)
- xmorse 1 hour ago
  Does anyone really use these hacked up with duct tape skills? why not use something more reliable like playwriter.dev?
paulirish 2 hours ago
The DevTools MCP project just recently landed a standalone CLI: https://github.com/ChromeDevTools/chrome-devtools-mcp/blob/m...
Great news to all of us keenly aware of MCP's wild token costs. ;)
The CLI hasn't been announced yet (sorry guys!), but it is shipping in the latest v0.20.0 release. (Disclaimer: I used to work on the DevTools team. And I still do, too)
- commanderkeen08 2 hours ago
  MCPs cost nothing in CC now with Tool Search.
  - cheema33 1 hour ago
    > MCPs cost nothing in CC now with Tool Search.
    This is incorrect. Plenty of people have run the numbers. Tool search does not fix all problems with MCP.
    - ehsanu1 1 hour ago
      What are the numbers? Are there problems other than context usage you refer to?
  - wahnfrieden 1 hour ago
    Codex also has this…
paseante 1 hour ago
The real problem this thread exposes is that we're duct-taping browser automation (Playwright, CDP, MCP wrappers) onto an interface designed for humans — the DOM. Every approach discussed here is fighting the same battle: too many tokens to represent page state, flaky selectors, hallucinated DOM structures, massive context cost.
What we actually need is a standard for websites to expose a machine-readable interaction layer alongside the human one. Something like robots.txt but for agent capabilities — declaring available actions, their parameters, authentication requirements, and response schemas. Not scraping the DOM and hoping the AI figures out which button to click.
The web already went through this evolution once: we went from screen-scraping HTML to structured APIs. Now we're regressing back to scraping because agents need to interact with sites that only have human interfaces. A lightweight standard — call it agents.json or whatever — where sites declare "here are the actions you can take, here are the endpoints, here's the auth flow" would eliminate 90% of the token waste, security concerns, and fragility discussed in this thread.
Until that exists, we'll keep building increasingly clever hacks on top of a 30-year-old document format that was never designed for machine consumption.
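To make the proposal concrete, here is a rough sketch of what such a declaration and a client-side check could look like. No such standard exists: the manifest layout, the action names, and `validate_call` are entirely hypothetical.

```python
# Hypothetical manifest a site might publish; every key here is invented.
MANIFEST = {
    "actions": {
        "search_menu": {
            "method": "GET", "path": "/api/menu", "auth": "none",
            "params": {"query": "string"},
        },
        "place_order": {
            "method": "POST", "path": "/api/orders", "auth": "oauth2",
            "params": {"item_id": "string", "quantity": "number"},
        },
    }
}

PY_TYPES = {"string": str, "number": (int, float)}


def matches(value, kind):
    """Check a Python value against a declared parameter type."""
    if kind == "boolean":
        return isinstance(value, bool)
    if isinstance(value, bool):
        return False  # bool is an int subclass; do not accept it as a number
    return isinstance(value, PY_TYPES[kind])


def validate_call(manifest, action, args):
    """Return a list of problems; an empty list means the call fits the declaration."""
    spec = manifest["actions"].get(action)
    if spec is None:
        return [f"unknown action: {action}"]
    problems = []
    for name, kind in spec["params"].items():
        if name not in args:
            problems.append(f"missing parameter: {name}")
        elif not matches(args[name], kind):
            problems.append(f"wrong type for {name}: expected {kind}")
    for name in args:
        if name not in spec["params"]:
            problems.append(f"unexpected parameter: {name}")
    return problems


if __name__ == "__main__":
    print(validate_call(MANIFEST, "place_order", {"item_id": "margherita", "quantity": 2}))
```

An agent would check its intended call against the declaration before sending anything, instead of guessing at buttons and selectors.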
- raincole 0 minutes ago
  The ultimate conflict of interest here is that the sites people want to crawl the most are the ones that want to be crawled the least (e.g. Youtube). So people will end up emulating genuine human users one way or another.
- Lucasoato 3 minutes ago
  They’re trying to solve it by making it easier to get Markdown versions of websites.
  For example, you can get a markdown out of most OpenAI documentation by appending .md like this: https://developers.openai.com/api/docs/libraries.md
  Not definitive, but still useful.
- maxaw 13 minutes ago
  Fully agree. Will take some time though as immediate incentive not clear for consumer facing companies to do extra work to help ppl bypass website layer. But I think consumers will begin to demand it, once they experience it through their agent. Eg pizza company A exposes an api alongside website and pizza company B doesn’t, and consumer notices their agent is 10x+ faster interacting with company A and begins to question why.
- codybontecou 1 hour ago
  Is this just a well-documented API?
- ElectricalUnion 56 minutes ago
  > interface designed for humans — the DOM.
  Citation needed.
  > The web already went through this evolution once: we went from screen-scraping HTML to structured APIs. Now we're regressing back to scraping because agents need to interact with sites that only have human interfaces.
  To me, sites that "only have human interfaces" are more likely than not to be that way totally on purpose, attempting to maximize human retention/engagement, and are more likely to require strict anti-bot measures like Proof-of-Work to be usable at all.
- imiric 26 minutes ago
  > What we actually need is a standard for websites to expose a machine-readable interaction layer alongside the human one.
  We had this 20 years ago with the Semantic Web movement, XHTML, and microformats. Sadly, it didn't pan out for various reasons, most of them non-technical. There are remnants of it today in RSS feeds, which are either unsupported or badly supported by most web sites.
  Once advertising became the dominant business model on the web, it wasn't in publishers' interest to provide a machine-readable format of their content. Adtech corporations took control of the web, and here we are. Nowadays even API access is tightly controlled (see Reddit, Twitter, etc.).
  So your idea will never pan out in practice. We'll have to continue to rely on hacks and scraping will continue to be a gray area. These new tools make automated scraping easier, for better or worse, but publishers will find new ways to mitigate it. And so it goes.
  Besides, if these new tools are "superintelligent", surely they're able to navigate a web site. Captchas are broken and bot detection algorithms (or "AI" themselves) are unreliable. So I'd say the leverage is on the consumer side, for now.
- quotemstr 1 hour ago
  > expose a machine-readable interaction layer alongside the human one
  Which is called ARIA and has been a thing forever.
boomskats 3 hours ago
Been using this one for a while, mostly with codex on opencode. It's more reliable and token efficient than other devtools protocol MCPs i've tried.
Favourite unexpected use case for me was telling gemini to use it as a SVG editing repl, where it was able to produce some fantastic looking custom icons for me after 3-4 generate/refresh/screenshot iterations.
Also works very nicely with electron apps, both reverse engineering and extending.
zxspectrumk48 3 hours ago
I found this one working amazingly well (same idea - connect to existing session): https://github.com/remorses/playwriter
tonyhschu 2 hours ago
Very cool. I do something like this but with Playwright. It used to be a real token hog though, and got expensive fast. So much so that I built a wrapper to dump results to disk first then let the agent query instead. https://uisnap.dev/
Will check this out to see if they’ve solved the token burn problem.
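The dump-to-disk idea can be sketched in a few lines. This is not how uisnap works internally; the snapshot format and the `save_snapshot` / `query_snapshot` helpers are hypothetical and only illustrate the pattern of handing the agent a small handle instead of the full page state.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical page snapshot; a real one would come from the browser tool.
SNAPSHOT = {
    "nodes": [
        {"role": "button", "name": "Sign in"},
        {"role": "link", "name": "Pricing"},
        {"role": "button", "name": "Sign up"},
        {"role": "textbox", "name": "Search"},
    ]
}


def save_snapshot(snapshot, directory):
    """Write the full snapshot to disk and return only a small handle."""
    path = Path(directory) / "snapshot.json"
    path.write_text(json.dumps(snapshot), encoding="utf-8")
    return {"path": str(path), "node_count": len(snapshot["nodes"])}


def query_snapshot(path, role=None, text=None, limit=5):
    """Return at most `limit` nodes matching the role and/or name substring."""
    nodes = json.loads(Path(path).read_text(encoding="utf-8"))["nodes"]
    hits = [
        node for node in nodes
        if (role is None or node["role"] == role)
        and (text is None or text.lower() in node["name"].lower())
    ]
    return hits[:limit]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        handle = save_snapshot(SNAPSHOT, tmp)
        print(handle["node_count"], query_snapshot(handle["path"], role="button"))
```

Only the handle and the filtered hits ever reach the model's context; the full snapshot stays on disk.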
- mambodog 1 hour ago
  my workaround for this was to make a wrapper mcp server which uses claude haiku to summarize the page snapshot returned in the response of each playwright mcp call, and that has worked pretty well for me: https://github.com/jsdf/playwright-slim-mcp
- Torn 1 hour ago
  Mostly, yes: https://github.com/microsoft/playwright-cli
netdur 1 hour ago
I wrote an AI agent that does Chrome testing, and yes, Chrome MCP does work: https://github.com/netdur/hugind/tree/main/agent/chrome_test...
cheema33 1 hour ago
How does this compare with playwright CLI?
https://github.com/microsoft/playwright-cli
- Torn 1 hour ago
  I personally found playwright-cli, and agent-browser which wraps playwright, both more token-efficient than using the raw mcp.
  Odd that this article from Dec 2025 has been posted to the top of HN though
- EGreg 1 hour ago
  It’s made by Google and comes with Chrome
bartek_gdn 52 minutes ago
My approach is a thin cli wrapper instead.
https://news.ycombinator.com/item?id=47207790
anesxvito 1 hour ago
Been using MCP tooling heavily for a few months and browser debugging integration is one of those things that sounds gimmicky until you actually try it. The real question is whether it handles flaky async state reliably or just hallucinates what it thinks the DOM looks like?
rossvc 2 hours ago
I've been using the DevTools MCP for months now, but it's extremely token heavy. Is there an alternative that provides the same amount of detail when it comes to reading back network requests?
- nerdsniper 2 hours ago
  It's probably not fully optimized and could be compacted more with just some effort, and further with clever techniques, but browser state/session data will always use up a ton of tokens because it's a ton of data. There's not really a way around that. AIs have a surprising "intuition" about problems that often helps them guess at solutions based on insufficient information (and they guess correctly more often than I expect they should). But when their intuition isn't enough and you need to feed them the real logs/data...it's always gonna use a bunch of tokens.
  This is one place where human intuition helps a ton today. If you can find the most relevant snippets and give the AI just the right context, it does a much better job.
- Torn 1 hour ago
  https://github.com/microsoft/playwright-cli and https://agent-browser.dev/
- DimitriBouriez 1 hour ago
  i'm experimenting with a different approach (no CDP/ARIA trees, just Chrome extension messaging that returns a numbered list of interactive elements). Way lighter on tokens and undetectable but still very experimental : https://github.com/DimitriBouriez/navagent-mcp
- mmaunder 2 hours ago
  Yes. CLI. Always CLI. Never MCP. Ever. You’re welcome.
  - nerdsniper 1 hour ago
    That doesn't solve the issue here because the amount of data in the browser state dwarfs the MCP overhead.
    - bartek_gdn 42 minutes ago
      Can't we just iteratively inspect the network traces then? We don't need to consume the whole 2mb of data, maybe just dump the network trace and use jq to get the fields to keep the context minimal. I haven't added this in https://news.ycombinator.com/item?id=47207790 , but I feel it would be a good addition. Then prompt it with instructions to gradually discover the necessary data.
      But then I wonder, where the balance is between a bunch of small tool calls, vs one larger one.
      I recall some recent discussion here on hn on big data analysis
    - cheema33 1 hour ago
      > That doesn't solve the issue here because the amount of data in the browser state dwarfs the MCP overhead.
      The problem with MCP is that you are paying the price in token usage, even if you are not using the MCP server. Why would anybody want that?
      And no, the tool search function recently introduced by Anthropic does not completely solve this problem.
senand 2 hours ago
I suggest using https://github.com/simonw/rodney instead
[-]
- meowface 2 hours ago
  Unfortunately there are like a billion competitors to this right now (including Playwright MCP, Playwright CLI, the new baked-in Playwright feature in Codex /experimental, Claude Code for Chrome...) and I can never quite decide if or when I should try to switch. I'm still just using the ordinary Playwright MCP server in both Codex and Claude Code, for the time being.
  [-]
  - bartek_gdn 37 minutes ago
    I would use whatever you are comfortable with. I wanted a similar tool, so I coded my own. It has a smaller API, so I understand what is going on and it is easy not to get lost.
    https://news.ycombinator.com/item?id=47207790
raw_anon_1111 3 hours ago
I don’t do any serious web development and haven’t for 25 years aside from recently vibe coding internal web admin portals for back end cloud + app dev projects. But I did recently have to implement a web crawler for a customer’s site for a RAG project using Chromium + Playwright in a Docker container deployed to Lambda.
I ran the Docker container locally for testing. Could a web developer test using Claude + Chromium in a Docker container without using their real Chrome instance?
[-]
- vesselapi 8 minutes ago
  Yes, running Chromium in a Docker container works well for this. There are prebuilt images like https://hub.docker.com/r/browserless/chrome that give you a headless instance you can connect to via CDP (Playwright, Puppeteer). Keeps everything isolated from your actual browser profile and credentials.
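  As a rough sketch of the "connect via CDP" step: a Chrome or Chromium instance started with remote debugging enabled serves a `/json/version` endpoint whose JSON includes a `webSocketDebuggerUrl` that CDP clients connect to. The host and port below (`localhost:9222`) are assumptions; a particular image may publish a different port, so check its documentation.

```python
import json
from urllib.request import urlopen


def parse_debugger_url(version_payload: str) -> str:
    """Extract the CDP WebSocket URL from a /json/version response body."""
    info = json.loads(version_payload)
    return info["webSocketDebuggerUrl"]


def discover_debugger_url(host: str = "localhost", port: int = 9222,
                          timeout: float = 5.0) -> str:
    """Ask a running browser for its CDP WebSocket endpoint.

    host and port are assumptions for this sketch; use whatever your
    container actually publishes.
    """
    with urlopen(f"http://{host}:{port}/json/version", timeout=timeout) as resp:
        return parse_debugger_url(resp.read().decode("utf-8"))
```

  The returned URL is what a CDP client (Playwright, Puppeteer) would be pointed at, so the agent drives the containerized browser instead of your own profile.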
- bartek_gdn 41 minutes ago
  Take a look at https://news.ycombinator.com/item?id=47207790
NiekvdMaas 3 hours ago
Also works nicely together with agent-browser (https://github.com/vercel-labs/agent-browser) using --auto-connect
silverwind 2 hours ago
I found Firefox with https://github.com/padenot/firefox-devtools-mcp to work better than the default Chrome MCP; it seems much faster.
speedgoose 3 hours ago
Interesting. MCP APIs can be useful for humans too.
Chrome's dev tools already had an API [1], but perhaps the new MCP one is more user friendly, as one main requirement of MCP APIs is to be understood and used correctly by current gen AI agents.
[1]: https://chromedevtools.github.io/devtools-protocol/
teaearlgraycold 31 minutes ago
I love how in their demo video where they center an element it ends up off-center.
glerk 2 hours ago
Note that this is a mega token guzzler in case you’re paying for your own tokens!
oldeucryptoboi 2 hours ago
I tell Claude to use playwright so I don't even need to do the setup myself.
[-]
- nomilk 2 hours ago
  Similarly, Cursor has a built-in browser and can visit localhost to see the results in the browser. Although I don't use it much (I probably should).
pritesh1908 2 hours ago
I have been using Playwright for a fairly long time now. Do checkout
holoduke 51 minutes ago
One tip for the illegal scrapers or automators out there: CasperJS and PhantomJS still work very well against anti-bot detection. These are very old libs that are no longer maintained, but I can even scrape and authenticate at my banks.
slrainka 2 hours ago
chrome-cli with remote developer port has been working fine this entire time.
JKolios 2 hours ago
Now that there's widespread direct connectivity between agents and browser sessions, are CAPTCHAs even relevant anymore?
Yokohiii 3 hours ago
Was already eye rolling about the headline. Then I realized it's from chrome.
Hoping for some good stories from OpenClaw users that permanently run debug sessions.
robutsume 1 hour ago
[dead]
aplomb1026 1 hour ago
[dead]
ptak_dev 2 hours ago
[dead]
myrak 3 hours ago
[dead]
AlexDunit 3 hours ago
[flagged]
[-]
- David-Brug-Ai 3 hours ago
  This is the exact problem that pushed me to build a security proxy for MCP tool calls. The permission model in most MCP setups is basically binary: either the agent can use the tool or it can't. There's nothing watching what it does with that access once it's granted.
  The approach I landed on was a deterministic enforcement pipeline that sits between the agent and the MCP server, so every tool call gets checked for things like SSRF (DNS resolve + private IP blocking), credential leakage in outbound params, and path traversal, before the call hits the real server. No LLM in that path, just pattern matching and policy rules, so it adds single-digit ms overhead.
  The DevTools case is interesting because the attack surface is the page content itself. A crafted page could inject tool calls via prompt injection. Having the proxy there means even if the agent gets tricked, the exfiltration attempt gets caught at the egress layer.
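  For readers wondering what such deterministic checks might look like, here is a minimal sketch using only the Python standard library. It is not the commenter's proxy: the function names, the secret patterns, and the flat `params` dict are all assumptions made for illustration.

```python
import ipaddress
import re
import socket
from urllib.parse import urlparse

# Illustrative patterns only; a real deployment would need a broader, tuned set.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                      # AWS access key ID shape
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),    # PEM private key header
    re.compile(r"(?i)bearer\s+[a-z0-9._\-]{20,}"),        # bearer-style token
]


def resolves_to_private(host: str) -> bool:
    """Resolve host and report whether any address is non-public."""
    try:
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror:
        return True  # fail closed when the name cannot be resolved
    for info in infos:
        ip = ipaddress.ip_address(info[4][0].split("%")[0])  # drop IPv6 scope id
        if ip.is_private or ip.is_loopback or ip.is_link_local or ip.is_reserved:
            return True
    return False


def check_tool_call(params: dict) -> list:
    """Return a list of policy violations for one tool call's string parameters."""
    violations = []
    for key, value in params.items():
        if not isinstance(value, str):
            continue
        if any(p.search(value) for p in SECRET_PATTERNS):
            violations.append(f"{key}: possible credential")
        try:
            parsed = urlparse(value)
        except ValueError:
            violations.append(f"{key}: unparseable URL")
            continue
        if parsed.scheme in ("http", "https") and parsed.hostname:
            if resolves_to_private(parsed.hostname):
                violations.append(f"{key}: URL resolves to a private address")
        elif ".." in value.replace("\\", "/").split("/"):
            violations.append(f"{key}: path traversal")
    return violations
```

  A proxy would forward the call only when `check_tool_call` returns an empty list. Note that a resolve-then-check step like this does not by itself prevent DNS rebinding, since the real request may resolve the name again.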
- rob 3 hours ago
  Someone left their bot on default settings.
  [-]
  - Bengalilol 1 hour ago
    The other reply to this 'bot' looks like another default thing: <https://news.ycombinator.com/threads?id=David-Brug-Ai>
Sonofg0tham 3 hours ago
[flagged]
[-]
- simianwords 3 hours ago
  AI
  [-]
  - rzmmm 3 hours ago
    Yes. Can someone tell me why even HN has bots? Is it for selling upvotes for advertising purposes?
    [-]
    - Sonofg0tham 1 hour ago
      I'm not a bot and definitely not advertising - I'm new on HN and trying to contribute with a few comments where I can.