13 comments

  • p0w3n3d 1 hour ago

      1. LOL I've just downloaded literally whole internet and copyrighted books and put them through a neural network. Now I have this whole knowledge in my LLM.
    
      2. Hey? Are you using my NN for training your NN? you're a thief!
    • matheusmoreira 32 minutes ago
      Remember how Kim Dotcom got destroyed for criminal copyright infringement? One would think the big tech CEOs would face the same fate, that police officers would rappel down helicopters, storm their mansions and bring them out in cuffs.

      Instead the AI companies reached these absurd settlements with publishers that made a mockery out of all the previous copyright enforcement victims.

      • xienze 4 minutes ago
        Remember how people used to justify their own personal software piracy with arguments like "information wants to be free", "no one stole anything, you still have the data", "I was never going to buy it anyway", and "copyright should be abolished?"

        > Instead the AI companies reached these absurd settlements with publishers that made a mockery out of all the previous copyright enforcement victims.

        Isn't that at least something? How many people pirating software ever settled with the companies they "victimized?"

    • yubblegum 2 minutes ago
      Whatever happened to honor among thieves? What is this world coming to..
    • short_sells_poo 54 minutes ago
      The corollary is that there are no morals once the stakes are in the $ billions, let alone hundreds of billions.

      This isn't even about a single person or personality. Very few people in such position could stand fast by their moral code. In any case, an environment that favors profit above everything will naturally select for individuals who are unencumbered by such hindrances.

      There might've been 100s of Altmans and Amodeis who had a strong moral code but we don't know about them because they dropped out of the "race" because of said moral hurdles.

      • spinningslate 4 minutes ago
        > an environment that favors profit above everything will naturally select for individuals who are unencumbered by such hindrances.

        Exactly. Dairy farms optimise for milk production so favour cows that produce the most milk.

        The market economy optimises for profit so favours those most willing/able to generate it. Zuckerberg, Musk, Thiel, Andreessen and co are products of the system.

      • rlpb 47 minutes ago
        Copyright law is an artificial legal construct, not a moral code.

        I think appropriate attribution is a moral code, but I am not able to attribute every idea I have to all those who helped me develop the general intelligence that I use to develop such ideas.

        • raxxorraxor 21 minutes ago
          I think this behaviour has shown that there are no morals involved. Pirate if you want to, just don't get caught if you don't have a giant backing.
    • TZubiri 2 minutes ago
      I never get tired of posting this answer because everyone on the internet is adopting this hot take:

      If you look at it with your eyes crossed, Anthropic and the Chinese are doing the same thing.

      If you look at it with nuance, (1) the Chinese are doing way worse stuff, and (2) stealing from a thief would still be stealing.

      1. The Chinese are making multiple accounts (at least 49,000)[1][2], using proxies/VPNs, possibly using residential computers and infected computers (unless you think the Chinese are doing due diligence to ensure their purchased IPs are kosher). All accounts need to be created with a real name, and especially so if the paid models need to be accessed and paid for with a credit card. So this is beyond IP theft and getting closer to fraud. These are all techniques that are well studied because they are used by criminals and cybercriminals, textbook stuff. Consider, if that was not sufficient, that China is banned from using the product, so they need to use identities and locations not just to avoid relating the accounts to one another, but merely to allow account creation. What identities are they using to create accounts?

      Compare this to Anthropic, which (reads notes) made a deal in an IP theft case, paying billions because they bought books and scanned them, but buying the books wasn't sufficient retribution for the authors. Or that they (gasp) scanned the internet, like Google.

      Not having nuance to see the difference between the two companies is something I expect of the twitter echo chamber copying hot takes for upvotes, not hacker news.

      [1] https://arstechnica.com/tech-policy/2026/06/anthropic-claims...
      [2] https://www.anthropic.com/news/detecting-and-preventing-dist...

  • bhouston 59 minutes ago
    All remote AI are a massive security risk for individuals/companies/governments that may be targeted by the US government.

    It is likely that the US will get a live feed from each AI provider that they are inspecting in real time to identify things of interest: terrorist attacks, foreign government planning, or even foreign companies competitive with key US companies.

    It will give them access to the thought process in those companies as well as much of their text-based IP (source code, docs, meeting transcripts, etc).

    Also if you are using local AI that you didn’t train yourself you can never be sure it doesn’t have purposeful biases in its reasoning that may disadvantage you - such as directing you away from certain plans or ideas or patents etc.

    • londons_explore 18 minutes ago
      It is worth thinking about the fact the total throughput of even a big LLM provider isn't many megabits.

      If a token compresses to around a byte, worldwide AI input and output is around 1 gigabyte per second.

      For any intelligence agency, they can afford to keep and store all of that forever, and later do analysis on it.
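
      A quick sketch of what that estimate implies for storage. The 1 gigabyte per second rate is the figure assumed in this comment, not a measured value, and the function and constant names are made up for illustration:

```python
# Back-of-envelope storage estimate.
# ASSUMPTION: worldwide LLM input + output is about 1 GB/s, i.e. roughly
# one byte per token after compression, as guessed in the comment above.
ASSUMED_BYTES_PER_SECOND = 1_000_000_000
SECONDS_PER_DAY = 24 * 60 * 60


def storage_bytes(bytes_per_second: int, days: int) -> int:
    """Total bytes accumulated at a constant rate over a number of days."""
    return bytes_per_second * SECONDS_PER_DAY * days


per_day = storage_bytes(ASSUMED_BYTES_PER_SECOND, 1)
per_year = storage_bytes(ASSUMED_BYTES_PER_SECOND, 365)

print(f"per day:  {per_day / 1e12:.1f} TB")   # 86.4 TB
print(f"per year: {per_year / 1e15:.1f} PB")  # 31.5 PB
```

      Under that assumption it comes to roughly 86 TB per day, or about 32 PB per year, before any replication or indexing overhead.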

    • general1465 22 minutes ago
      Leakage of IP and training on your data is something I keep pointing out too, but people will turn around and try to talk me down, saying that the TOS does not allow that if you are an enterprise client. Are you really going to believe that AI companies won't ignore the TOS, when they were ignoring literal laws which sent others to jail in the past? Especially when more data = better model?
  • eunos 2 hours ago
    What Claude Code did is absolutely mindboggling tho, if Chinese harness did that probably POTUS would lose sleep.
    • usef- 1 hour ago
      It seemed pretty mild compared to what's collected by modern websites and apps, though? How many don't know your Timezone?
      • dijit 56 minutes ago
        > How many don't know your Timezone?

        The timezone fetch was to alter program behaviour at runtime, not to send arbitrary timezones for tracking reasons.

        It was one way of detecting if it was a Chinese person using the program and then behaving differently.

        Malware behaves this way. STUXNET for example was wired to do nothing except propagate unless the environment had the right conditions.

        • usef- 33 minutes ago
          The article on HN only said that they seemed to be collecting this to detect resellers. How else did the behavior change?

          Most services I know that are trying to block abuse do collect device info

          • dijit 31 minutes ago
            regardless of anything else, whether what you said is true or not: blocking program execution based on the detected environment is a runtime behaviour change.
            • usef- 27 minutes ago
              Agreed. And it also applies to the "I'm not a bot" checkbox on most websites. And hundreds of other things people use every day.
    • yard2010 1 hour ago
      Wait what do you mean "if"?
    • ironbound 1 hour ago
      And I'm the king of France
    • cognitiveinline 2 hours ago
      Exaggerate much? If you think POTUS would lose sleep about a date format timezone marker, I don't know what to tell you.
    • youre-wrong3 1 hour ago
      Maybe if they didn’t farm all the data from Claude to train their own trash models, Anthropic wouldn’t feel the need to do it.
      • InsideOutSanta 1 hour ago
        Who is "they", and which Chinese models are trash?
      • vrganj 1 hour ago
        Anthropic stole the entire internet. Excuse my language, but they can fuck right off.
        • breppp 57 minutes ago
          The issue here is not whether Anthropic used Common Crawl, Alibaba also does that.

          The issue is that by distilling Claude, Alibaba reuses the IP anthropic used to train the model that's more akin to historical Chinese reverse engineering methods and disrespect of IP

          • snovv_crash 27 minutes ago
            Alibaba paid for that data though, right? They didn't hack Anthropic, they bought accounts and ran them normally.

            Also, you can't copyright AI outputs. So worst case they violated the ToS.

          • blackoil 35 minutes ago
            'Issue' for who?
          • matheusmoreira 38 minutes ago
            > reuses the IP anthropic used to train the model

            > disrespect of IP

            Nobody other than Anthropic cares.

          • messe 50 minutes ago
            > Alibaba reuses the IP anthropic used to train the model that's more akin to historical Chinese reverse engineering methods and disrespect of IP

            Why is this any worse than Anthropic's disrespect of IP? You've apparently drawn a distinction between the two here, but I'm failing to see what it actually is.

          • vrganj 55 minutes ago
            Anthropic clearly doesn't respect other people's IP, it's real rich that they now insist on theirs being worthy of protection.

            Fwiw, I think the concept of IP in general is counter to human progress.

            • kataklasm 44 minutes ago
              The practical implementation of IP? Sure, that's debatable. But the concept of IP is rooted in favoring progress. The thought process being that if one's intellectual work can be copied, reused and modified without consequence, why should anyone invent things anymore? Just wait for the next person to do it and then copy their work; that's way less effort than inventing things yourself. IP aims to protect progress by making sure inventors have an actual incentive to invent stuff. The way it's implemented is fundamentally flawed, I agree, but the concept itself? I'm not so clear on that.
            • breppp 50 minutes ago
              It's more complicated than that because Google has been legally displaying other people's copyrighted material for years.

              In any case there's still a difference between publicly available copyrighted data and whether you can use it for model training, and the innovation around model training, RLHF, etc which you presumably have some interest as a country to allow companies to invest in with some legal protections (like the diff between patent law vs copyright law)

              • platinumrad 6 minutes ago
                So you're saying it's more important to safeguard slop outputs than the original work of human beings.
  • johnathan101 2 hours ago
    Regardless of whether this specific claim is true, enterprises are becoming much more cautious about developer tools that can read large portions of proprietary codebases.
    • soraminazuki 1 hour ago
      It's insane that it's becoming a concern now. It should've ended the discussion from the very beginning.
      • yurish 44 minutes ago
        Enterprises host their entire infrastructure on US-based clouds. And for many, it still is not a problem.
    • saidnooneever 1 hour ago
      not to mention they are kind of capable of executing code and susceptible to injections, which also amounts to being practically backdoors if you're not super careful about how you use the tooling
    • spwa4 1 hour ago
      Wasn't one of the big promises the AI labs made "uncopyrighting"? Ie. the ability to reconstruct large works, including source code, without actual access to the source code? Everything from movies to operating systems.
      • silon42 57 minutes ago
        Cleverly compressing and decompressing doesn't de-copyright it. ... and if it's not the same who'd trust it.
    • llm_nerd 1 hour ago
      Becoming? We've moved entirely in the opposite direction.

      When these tools first appeared the overwhelming conversation was about the risk of letting a remote tool siphon your code and intellectual property (where eventually they're going to add that to their training). Now everyone is using them, and that fear seems to have dissolved. Every corporation is sprinkled with Claude Code, Antigravity, Copilot, Codex, and so on. Even the long fear-mongered Chinese providers are being heavily used in many spaces.

      In this case this is a PR battle between two firms, and it isn't much more. And Alibaba isn't worried about the "proprietary code" (the truth is that there is incredibly little interest in most orgs code), but that the tool is a backdoor, or at least that is the claim.

      • DanielHB 1 hour ago
        > there is incredibly little interest in most orgs code

        I think from a commercial perspective yes, but access to source code is very good for finding exploits which could be very valuable for governments. I could also see a future where companies are directly cyber-attacking competitors in hostile markets too...

      • otabdeveloper4 1 hour ago
        > and that fear seems to have dissolved

        Until the first big incident, yes.

  • jdw64 1 hour ago
    I got curious and asked my Chinese friends, and they gave me a Reddit link[1]. It looks like it's about location data collection, and they suggested that might be the reason for the issue.

    [1] https://www.reddit.com/r/ClaudeAI/comments/1ujila1/anthropic...

  • Jeff9James 18 minutes ago
    Story of Z.ai:

    use claude-code
    see how good it is
    send 100k bots to distill fable 5 (GLM 5.2 is the result of this)
    release Zcode
    ditch claude-code
    ban claude-code

  • bushido 35 minutes ago
    What's very interesting to me is these moves will introduce a good amount of doubt in future claims by Claude etc, that the open source and non-US models are only getting better because they're distilling from frontier labs.
  • ravenstine 47 minutes ago
    Employers in 2022:

    > No! Don't install that lodash thing without explicit approval from IT. Oh, you want a license for Charles Proxy? Gee, I dunno... we've got a budget to maintain.

    Employers in 2023:

    > No! You can't use ChatGPT at work – it's a security risk.

    Employers in 2024:

    > Okay, you can use Github Copilot I guess, but you'll have to endure boring corporate training on what you're allowed to do with it.

    Employers with dollar signs in their eyes in 2025:

    > We attended a seminar about vibe coding. Why aren't you dumbasses keeping up with the times? Use Claude Code for everything! Don't write any of your own code anymore. We don't even really care if you use yolo mode. Just review code and push 10x more features! Use unlimited tokens! Money printer go brrrrr.

    Employers in 2026:

    > You mean giving one or two companies full autonomous access to our workstations while stupefying our engineers wasn't a sound business plan?

  • yanhangyhy 3 hours ago
    i gonna ask: how can they still use claude? i thought all users in china are banned
    • dgellow 2 hours ago
      Alibaba has engineers in Hongkong, Singapore, North America. It’s a global corporation
      • itake 2 hours ago
        when i was in hongkong, chatgpt and gemini were disabled. Maybe this has changed though. When I was in China, the corporate vpn (zscaler) routed traffic through hk
    • xyzsparetimexyz 1 hour ago
    • bravetraveler 2 hours ago
      Same way every ban is evaded, smurfing
    • playnuu9 2 hours ago
      There is a reason Singapore tops the rank on Claude usage
      • chinathrow 4 minutes ago
        Source?
      • byzantinegene 2 hours ago
        the government also actively promotes AI usage in work environments
    • _flux 2 hours ago
      Does Alibaba only have developers in China?
    • dist-epoch 2 hours ago
      The same way they buy "banned" and "sanctioned" NVIDIA GPUs.
    • josh-wrale 3 hours ago
      Cc can be used with non Anthropic models.
    • re-thc 3 hours ago
      > how can they still use claude?

      Workarounds aside, it says Claude Code not Claude.

      i.e. they are using the CLI running any model. You can for instance run GLM with it.

  • rvnx 2 hours ago
    Can't say they are wrong, after the latest backdoor, or let's say, undocumented functionality that leaks some data, that was pushed in Claude Code a few days ago

    https://news.ycombinator.com/item?id=48759754

    • dgellow 2 hours ago
      That’s not what a backdoor is…
      • tpoacher 2 hours ago
        Rear entrance then
      • rvnx 2 hours ago
        When a company can remotely push code without explicit user approval, and code that was hostile / almost malicious, it is a backdoor
        • jitl 43 minutes ago
          so like… any website
  • rvz 2 hours ago
    Another reason to use open source coding agents and local language models.

    Claude Code is neither and it is literally info stealing malware.

  • HlessClaudesman 1 hour ago
    Translation: Alibaba will continue distillation attacks using accounts that aren't directly attributable to its own corporate infrastructure.
    • ampersandwhich 1 hour ago
      I think we should start calling it "distillation terrorism" just to make it sound even more absurd.
      • InsideOutSanta 1 hour ago
        It's pure model murder, and if you call it anything else, you're an anti-American communist.
    • lelanthran 1 hour ago
      > Translation: Alibaba will continue distillation attacks using accounts that aren't directly attributable to its own corporate infrastructure.

      What's a "distillation attack"? How is it different from simply distillation?

      • kouteiheika 1 hour ago
        It's pretty much the same as when "installing programs on your computer" is called "sideloading". Deliberately deceptive, weaponized language to make it seem like a bad thing.
      • dizhn 1 hour ago
        The target doesn't want to be distilled.
        • julianlam 3 minutes ago
          You wouldn't distill a car.
    • RobotToaster 1 hour ago
      (Mis)anthropic already performed "distillation attacks" on the internet.
    • vorticalbox 1 hour ago
      i can see why they want to stop it but 1. you have to pay for the "attack" 2. these AI companies trained on copyrighted content without permission or attribution to anyone whose data was used to train.
    • exe34 1 hour ago
      As long as they're paying for the tokens, there's no attack. Otherwise you have to call training on copyrighted material theft.
      • feverzsj 1 hour ago
        They are not paying for most tokens. The actual users in China do. All they need is the logs.
        • InsideOutSanta 1 hour ago
          Anthropic still gets paid.

          Unlike the vast majority of people Anthropic stole from.

        • dizhn 1 hour ago
          In that case it's already bought and paid for by the users, is it not?
    • surgical_fire 1 hour ago
      How exactly does the word "attack" fit in that phrase?
    • vrganj 1 hour ago
      Did Anthropic perform "distillation attacks" when they hoovered up the entire internet?
  • feverzsj 2 hours ago
    Considering their massive distillation, if US companies stop publishing new models to the public, would China still be able to develop new open weight models?
    • bel8 2 hours ago
      I don't think China would struggle to scrape the internet for fresh data.

      And they constantly publish state of the art LLM research (see DS4 context compaction and cache tech).

      They have very capable tech giants. So while not being able to distill western models would probably have some impact, it's probably becoming lesser as time passes.

      We might even see Western LLMs distilling Chinese models soon. If they aren't already to some extent.

    • margorczynski 2 hours ago
      China has most probably already achieved "escape velocity" on the software side. Now if they achieve parity, to some degree at least, on the hardware side with Nvidia it is very possible they'll overtake the US.
    • pjmlp 39 minutes ago
      Of course, it is like any other kind of weapon system, eventually the knowledge gets acquired.
    • tristanj 2 hours ago
      Yes, 100%. GLM 5.2 is capable of RSI. It's too late to stop.
    • surgical_fire 1 hour ago
      Probably yes.

      More than a year ago, when Anthropic and OpenAI started to hide the reasoning bits from the output, a lot of people here on HN predicted that Chinese models' days were numbered.

      Fast forward to today, and models such as DeepSeek and MiMo are nothing short of excellent. I haven't used GLM or Qwen but heard very good things about them as well.

      This "massive distillation" sounds a lot like anxiety about how companies from outside the US can develop very good models themselves.