Anthropic/OpenAI may be spending more than $1000 for every $100 you pay them

Trilogy3452@lemmy.world · 15 hours ago

justOnePersistentKbinPlease@fedia.io · 13 hours ago

My first use of Claude this week, for code reviews only (since no LLM can be trusted to write a user story or test suite), had it gaslight me.

It marked down my code for using a specific practice to make some xml safer and easier to read.

When I tried things its way, it wanted me to change it back.

ramble81@lemmy.zip · 9 hours ago

I found it’s decent for some light QC but when it asks “do you want me to change it?” I’m like “nah, thanks for pointing it out but I’ll do it myself”

rozodru@piefed.world · 13 hours ago

oh it’s great isn’t it? you ask it for help on some code, it provides its solution, you try it and it doesn’t work so you respond with the error, it claims YOU wrote it wrong and then when you tell it “I just copied and pasted what you provided” it says “you’re right, i’m sorry.”

Claude is to the point now where it just starts hallucinating on the first prompt. it’s 100% unreliable now when before it was like 90%. no point in using it, it’s garbage. and Claude Code is just as bad now. If you or anyone is using Claude Code to develop ANYTHING I would highly suggest you stop right now because I can guarantee you with nearly 100% certainty that whatever shit it’s writing into your stuff isn’t going to work. period.

Crylos@lemmy.world · 13 hours ago

I use it a lot, and if you are getting these kinds of results you are either trolling, or just flat out not providing the details and guardrails required with your prompts.

I’ve been in software for decades, and if used correctly, yes it can accelerate velocity of building code out. 10x? No… if you are lucky and careful perhaps 2-4x.

As ALWAYS the human should be in the loop and is on the hook for any code generated.

tomjuggler@lemmy.world · 2 hours ago

Lots of anti-AI people in this thread it seems. I get it - personally I HATE the fake image generation! But I have to agree with you in terms of coding that using LLMs correctly can offer huge benefits.

Modern harnesses are getting more and more sophisticated and your mileage varies depending on how well you use them (like any complex tool). At the end of the day it’s still up to the developer to take the code and make sure it’s correct - no different from before where we used to copy code from Stack Overflow or other examples and modify them for our own use.

One thing I have to add - I honestly don’t understand why anyone would use Claude or ChatGPT at their ridiculous prices when DeepSeek exists…

Nighed@feddit.uk · 2 hours ago

The question comes down to cost. The actually good models are already expensive, yet still apparently subsidised. Once we have to pay the true cost they will only be worth using when you are truly stuck.

Lots of use for the simpler models for basic util creation and simpler cleanup refactor stuff though. Quite nice if it actually turns out like that.

justOnePersistentKbinPlease@fedia.io · 10 hours ago

I was using a set of template files designed for LLMs to review that project. It is absolutely the fault of Claude that it told me to do something one way, then told me to try another, and when I reverted it said that was the optimal approach.

Where I find it helps is in getting initial starts and as a start to code review. But in both cases the LLMs aren’t ever operating on their own and their feedback is filtered through me or another senior dev.

WYLD_STALLYNS@lemmy.dbzer0.com · 13 hours ago

Exactly, never trust an LLM to code. And if it argues back, explain why it’s wrong and that you have nothing but time and experience. Most tend to fold when you point out it’s not a free-thinking AI, it’s an entrapped corporate model they designed with preprogrammed biases. But I love arguing 😂.

Arrandee@lemmy.world · 13 hours ago

I’ve used Claude and Codex, and while both are based on untenable economics, I can at least attest that my use of Codex has yielded some productive results. Claude, so far, has delivered fuck all that’s useful to me.

SleeplessCityLights@programming.dev · 12 hours ago

I have found the opposite. Codex spits back mostly useless code that is twice the length it needs to be, with a bunch of unnecessary stuff, and Claude is the only thing I get useful output from.