• makeshift0546@lemmy.today
    arrow-up
    36
    arrow-down
    3
    ·
    14 hours ago

    Have you actually done this? It’s usually the other way around: it gets pedantic or wants you to fix long-term problems in the codebase.

    • TheFogan@programming.dev
      English
      arrow-up
      9
      arrow-down
      1
      ·
      7 hours ago

      lol, the number of times I’ve seen AI do this:

      No wait that’s wrong try this

      Oh wait that’s also wrong that still doesn’t work try this.

      No still wouldn’t do it.

      And that’s on plain questions, without my asking it to correct itself (which tells me that internally they have some kind of… basically a step that reviews its own answers before they go out).

      Honestly, I do wonder if we’ll get to the point where

      a hello world program would just spin back and forth: “Rejected, demanded these fixes”, “Rejected, making it more like the original”, “Rejected, retrying”.

      “you have spent your entire $50,000 token budget… would you like to restock and keep going”.

      • Buddahriffic@lemmy.world
        arrow-up
        1
        ·
        2 hours ago

        You can train it on all the source code, metadata for that source code, and documentation you want, but it will never understand programming. It’s a text predictor that was trained on both sides of a bunch of debates. Contradictions mean nothing to it, but it usually only predicts what one side of the debate will say to champion its side, which means it will use confident and absolute language to “sell” whatever side of the debate it looks like the previous tokens are headed towards.

        It is impressive what it can output sometimes and it makes a decent debate/exploration partner, but it will always have a chance at predicting a useless series of tokens or contradicting the previous thing it just said because a) its training data only trains it to predict tokens from statistics, and b) its training data includes some of those contradictions directly.

        I have lost count of the times I’ve been “thinking out loud” about something with an LLM and realized something that contradicts what it is currently saying. Then I’ll add my new perspective and it agrees entirely, despite the contradiction. Sometimes it tries to resolve the contradiction, sometimes it just abandons what it said previously entirely, sometimes it adds more to the perspective that I hadn’t considered.

        That’s fine for just shooting the shit about some random topic but horrible for a tool intended to provide expertise and reliability, when the response matters because it feeds into something else and you want to automate it. Should a tool just inject “are you sure?” after each response? What if it makes it second guess something that was correct? What if it’s one of those debates and it will endlessly switch sides when it faces any opposition? That’s a waste of resources and time.
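
To make the “are you sure?” problem concrete, here is a minimal sketch of a bounded re-check loop. Everything in it is an assumption for illustration: `ask` is a hypothetical callable standing in for whatever LLM client you use, and comparing answers by exact string equality is a simplification (real answers would need some kind of normalization). It only shows one way to cap the back-and-forth and notice side-switching, not how any actual tool does it.

```python
from typing import Callable, List


def ask_with_recheck(ask: Callable[[str], str], question: str, max_rounds: int = 3) -> dict:
    """Ask once, then challenge the answer at most max_rounds times.

    `ask` is a hypothetical stand-in for an LLM call (prompt in, text out).
    Stops early when the answer survives a challenge unchanged, or when the
    model returns to an answer it already gave (it is just switching sides).
    """
    history: List[str] = [ask(question)]
    for _ in range(max_rounds):
        challenge = f"{question}\nYou answered: {history[-1]}\nAre you sure?"
        answer = ask(challenge)
        if answer == history[-1]:
            # The answer held up under the challenge: accept it.
            return {"answer": answer, "status": "stable", "history": history}
        if answer in history:
            # Back to an earlier answer: stop instead of burning more tokens.
            history.append(answer)
            return {"answer": None, "status": "oscillating", "history": history}
        history.append(answer)
    # Never repeated itself within the cap: treat the result as unreliable.
    return {"answer": None, "status": "gave_up", "history": history}
```

The cap and the oscillation check only bound the wasted time. They do nothing about the other worry above, that the challenge itself can talk the model out of an answer that was correct.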

        Funny thing is, I’m expecting this to eventually go back to scripting for automation. An LLM has a higher chance of outputting a script that does what you want (depending on the task) while you hold its hand than it does of consistently giving the correct output when it is thrown into an automated system directly. But you get “goodish” results much quicker by just putting LLMs everywhere, even if there’s some selection bias in the results (“didn’t work, didn’t work, oh it worked, great!”).

        • jj4211@lemmy.world
          arrow-up
          1
          ·
          2 hours ago

          Yep, seen this.

          Also, each iteration saying “ok, all problems are now addressed, the check should be fine, but running it just in case” (generates even more build errors than before). Rinse and repeat until my token quota is exhausted and I just code the good old fashioned way, no skin off my back. And I’m doing a ‘good job’ with utilization, despite having burned most of my quota on a failure that got thrown away.

      • Jakeroxs@sh.itjust.works
        arrow-up
        2
        arrow-down
        1
        ·
        6 hours ago

        The number of times I’ve had AI solve a problem my 30+ year senior couldn’t figure out in his own applications, which he’s been working on that entire time…

    • sudoMakeUser@sh.itjust.works
      arrow-up
      17
      ·
      11 hours ago

      GitHub’s Copilot review is pretty good. I thought it would just catch nitpick style things but it actually catches bugs and bad architecture.

      • makeshift0546@lemmy.today
        arrow-up
        5
        ·
        3 hours ago

        I love it as a first pass, it’s quite good. It really gets hung up on some of the ghosts of our codebase though haha

      • Hudell@lemmy.dbzer0.com
        English
        arrow-up
        13
        ·
        10 hours ago

        Copilot can review the code but is still pretty bad at reviewing the changes themselves. It misses a lot of potential issues and at the same time complains about many things that aren’t problems.

        • jj4211@lemmy.world
          arrow-up
          2
          ·
          2 hours ago

          Another thing is that it kind of instills a false confidence. Reviewers are getting lazy when the LLM gives an ‘LGTM’ and letting stuff through that bites us in the ass…