Cake day: February 5th, 2025


  • My last confirmed hardware-fault crash (one triggered by user operation, as opposed to an outright failure to boot, or a random crash “for no particular reason” beyond, say, a program touching a failing SSD) came in the late 90s: a GPU card that would drag down the system bus voltage during certain CAD operations, repeatably. Do this rotation, watch the CPU hard-reboot, every time. Stay away from the GPU-heavy operations, and no problems.

    These days the browser is the OS for over half of what happens on my work machines. And they’re almost, but not quite, 100% reliable, until they’re not. Working out those rare problems takes a long time, and with “progress” it feels like they’ve reached a kind of equilibrium where the rate of new problem introduction is about the same as the rate of known problem fixes.



  • One justification that I have to copy-paste over and over covers vulnerabilities in the CUPS printer driver chain that don’t apply because we don’t print arbitrary things; we only print things that we create. Yes, there’s a vulnerability here in ImageMagick if you throw it such and such maliciously crafted… well, we only allow it to process our internally generated reports, and there’s no pathway for maliciously crafted input to reach it, so…
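
    The “no pathway for maliciously crafted input” argument can also be enforced in code rather than only asserted in a justification. Here is a minimal sketch in Python (every name here — `REPORT_DIR`, `register_report`, `safe_to_convert` — is a hypothetical illustration, not any real deployment): an allowlist guard so that only files our own report generator produced can reach the image-conversion step.

```python
import os

# Hypothetical allowlist guard: only reports our own pipeline wrote into
# REPORT_DIR, with names we generated ourselves, may reach the converter.
REPORT_DIR = "/var/reports"   # assumption: internal report output directory
GENERATED = set()             # filenames the internal report generator produced


def register_report(name: str) -> None:
    """Called by the internal report generator for each file it writes."""
    GENERATED.add(name)


def safe_to_convert(path: str) -> bool:
    """True only for files we created ourselves, inside REPORT_DIR."""
    real = os.path.realpath(path)  # resolve symlinks / '..' tricks
    return (os.path.dirname(real) == REPORT_DIR
            and os.path.basename(real) in GENERATED)


register_report("q3_summary.png")
assert safe_to_convert("/var/reports/q3_summary.png")
assert not safe_to_convert("/tmp/malicious.png")         # wrong directory
assert not safe_to_convert("/var/reports/attacker.png")  # not generated by us
```

    The point of the sketch is that the justification stops being prose in a spreadsheet and becomes a checkable invariant.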


  • You need a person with a lot of experience to get something useful from this bot,

    Not entirely true. You get a lot more useful things from the bots when they are driven by people with a lot of experience. The problem coming now is a magnified version of the “skript kiddiez” of the early Google days, when inexperienced people could just find exploits on the web and copy-paste them. Today, LLMs can actually find vulns and develop exploits for people who have no knowledge of the languages the exploits are written in.

    every time we actually measure, the results show that your experienced person will be quicker and better not using it at all, and doing the same work themselves.

    From my perspective, your data is out of date. I’ve been tracking the “usefulness” of frontier models in accelerating development speed for experienced people over the past 2 years. Two years ago, total waste of time. One year ago - equivocal, sometimes it accelerates an implementation, sometimes not. Six months ago, it was clearly helping more than hurting in most cases, and it has only continued to improve since then.

    Knowing what you are doing helps. Trusting that the LLM will help, helps: if you set out to show it’s a waste of time, a waste of time it will be. Lately, treating the LLM like a consultant who was just hired and is likely to disappear any day helps. Take the time to run all the formal processes: develop the requirements documentation, tests, etc. Yes, that “slows things down,” but not in the long run across realistic project life cycles, even with humans doing the work. Also along those lines: keep designs modular, with modules of reasonable complexity. Monolithic monster blocks of logic don’t maintain well for people either. LLM implementations start falling apart when their effective context windows are exceeded (and, in truth, so do people’s).


  • no CVE list, no CVSS distribution, no severity bucket, no disclosure timeline, no vendor-confirmed-novel table, no false-positive rate

    Yeah, that’s cooked data - it’s too easy to ask the LLM to give you the CVE list, the CVSS distribution / severity buckets, timelines, everything you might want.

    I have LLMs doing pull request reviews, and by default they just take potshots, but if you prompt them they will point directly to the files and line numbers where the problems they flag reside…



  • A vuln that doesn’t really give you anything, that you can only exploit locally, when already having elevated privileges? That’s going to be low priority for a fix.

    And yet, here I am, rebuilding a new interim image for our security team to scan so they can generate a spreadsheet with hundreds of lines of “items of concern” above our “threshold of concern.” Most of them get dismissed because of exactly the justifications you just gave: local exploit only, etc. But I have to read every one, tease out the “local exploit only” language, and quote it as the justification, over and over and over, every few months.

    Corporate anxiety is limitless.
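
    The repetitive part of that triage can be partly scripted. A minimal sketch (the CSV columns and the “local” wording here are hypothetical assumptions, not our scanner’s actual output format) that pre-fills the recurring “local exploit only” justification by quoting each finding’s own description:

```python
import csv
import io

# Hypothetical scanner export; real column names and wording will differ.
SCAN_CSV = """cve,severity,description
CVE-2024-0001,7.8,Local privilege escalation; local access required
CVE-2024-0002,9.1,Remote code execution via crafted packet
"""


def prefill_justifications(csv_text: str) -> list:
    """Quote the scanner's own 'local' language as the dismissal rationale."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    for row in rows:
        if "local" in row["description"].lower():
            # Reuse the finding's own wording as the justification text.
            row["justification"] = f"Local exploit only: \"{row['description']}\""
        else:
            row["justification"] = "NEEDS REVIEW"
    return rows


triaged = prefill_justifications(SCAN_CSV)
assert triaged[0]["justification"].startswith("Local exploit only")
assert triaged[1]["justification"] == "NEEDS REVIEW"
```

    It doesn’t remove the obligation to actually read each finding, but it surfaces which rows still need a human decision.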



  • most of the results are technically correct, but, within the context of the project, not something anyone’s going to take the time to fix.

    I don’t mind leaving “technically correct” vulnerabilities in place while there’s no known way to create an exploit. If you’ve got a vuln with a known exploit and are relying on “but nobody is ever going to actually try that on us” - then you’re part of the problem, a big part.


  • In other words, it’s like adding an automated security researcher to your team. Not a zero-day machine that’s too dangerous for the world.

    Missing the point? Hiring an elite human researcher isn’t easy or cheap; it’s beyond the means of the vast majority of people out there. A $20/month Claude Pro subscription? Not so much.

    The question for me: how much better is Mythos than Opus 4.6 or 4.7, or Sonnet for that matter? Those models, and similar ones from other companies, are already being effectively leveraged by threat actors. If Mythos reduces the time-and-money cost of finding a new zero-day by a factor of 10 versus Opus 4.7, that’s concerning. If it’s a factor of 1.1: meh. The world is going to have to learn how to deal with these things sooner rather than later, and that means the “white hats” will need superior funding to the “black hats,” along with cooperation to close the gaps they find, or the “black hats” are going to get a lot more annoying than they already are.