I’ll give an example. At my previous company there was a program where you basically select a start date, an end date and the system, then press a button, and it reaches out to a database and pulls all the data that matches those parameters. The horrors of this were:
1. The queries were hard-coded.
2. They were stored in a configuration file, in XML format.
3. The queries were not one entry. There were four: a start, the part between the start date and the end date, the part between the end date and the system, and then the end part. All of these were then concatenated in the program, intermixed with variables.
4. This was then sent to the server as raw SQL, no ORM.
Here’s my favorite part. You obviously don’t want anyone modifying the configuration file, so they encrypted it. Now, I know what you’re thinking: at some point you’ll probably need to modify or add to the configuration, so you store an unencrypted version in a secure location. Nope! The program had the ability to encrypt and decrypt, but there were no visible buttons to access those functions. The program was written in WinForms. You had to open the program in Visual Studio and manually expand the size of the window (its size was locked in regular use), which revealed the buttons. Now run the program in debug mode. Press the decrypt button. DO NOT EXIT THE PROGRAM! Edit the file in a text editor. Save the file. Press the encrypt button. Copy the encrypted file to any other location on your computer. Close the program. Manually email the encrypted file to everyone who uses it.
I basically fix other people’s shitty code for a living (replacing it with my own shitty code). The “best” one was by a guy, I suppose a self-taught C programmer judging from how he wrote code, who wrote a complex Python program. I saw:
- a function called randomNumberGenerator that actually started a web server. While looking up a Python tutorial for something else, I found out why: he had copy-pasted the tutorial snippet and never bothered renaming the function
- a program whose job was to listen to all the other services and forward their messages to another service via UDP, BUT it had a maximum buffer size, so messages sometimes got truncated. I just put the listener directly in the target program and deleted it
- like another guy in this thread, he didn’t use Git. On my first day on the job they told me “yes, we need to check which machine has the latest code, because he SSHes into them and works there”. His version control was basically putting code on different machines
- lots of copied variables, because of C I suppose? Things like var = self.var
- camelCase Python (OK, this is just style in the end)
- files with 10k lines of code
- half the services were in Python 2, half in Python 3. Don’t ask me why
- variable names in his native language (not English, not the client’s language)
- single-letter variables; I fondly remember self.I (uppercase i)
- I remember an if a == a: (I left it there because lol)
- he added a license check that used the Ethernet MAC address. Too bad Ethernet was removed from the machine: his code hit an exception and ended up returning 00:00:00:00 as the MAC address, so all licenses worked on all machines
And many other things…
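The MAC-address license bug is a classic fail-open: when the check couldn’t read the hardware, it fell back to a placeholder value and carried on. The original code isn’t shown, so this is only a hypothetical Python sketch of the fail-closed alternative; `read_mac` and `LicenseError` are made-up names, and `read_mac` is assumed to raise `OSError` when no suitable interface exists.

```python
class LicenseError(Exception):
    """Raised when this machine is not (or cannot be shown to be) licensed."""


def check_license(licensed_macs: set[str], read_mac) -> str:
    """Fail-closed license check.

    `read_mac` is a hypothetical callable returning this machine's MAC address
    as a string, or raising OSError if no suitable interface exists.
    """
    try:
        mac = read_mac()
    except OSError as exc:
        # Fail closed: an unreadable MAC means "not licensed", never a
        # placeholder value that might happen to match something.
        raise LicenseError("cannot determine MAC address") from exc
    # Reject the all-zero placeholder explicitly, even if someone licensed it.
    if mac == "00:00:00:00:00:00" or mac not in licensed_macs:
        raise LicenseError(f"machine {mac} is not licensed")
    return mac
```

The design choice is simply that every error path ends in a refusal, so removing hardware can only ever lock the program, not unlock it.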
In another project I saw a backend running on the frontend; as in, this guy wrote the logic for a machine in the JavaScript running the screen’s user interface
One time, I had to request firewall access for a machine we were deploying to, and they had an Excel sheet to fill in your request. Not great, I figured, but whatever.
Then I asked who to send the Excel file to and they told me to open a pull request against a Git repo.
And then, with full pride, the guy tells me that they have an Ansible script which reads the Excel files during deployment and rolls out the firewall rules as specified. In effect, this meant:
- Of course, I had specified the values in the wrong format. It was just plaintext fields in that Excel, with no hint as to how to format them.
- We did have to go back and forth a few times, because their deployment would fail from the wrong format.
- Every time I changed something, they had to check that I wasn’t giving myself overly broad access. And because it’s an Excel file, they can’t really look at the diff. Every time, they have to open it and then maybe use the Excel version history to see what changed? I have no idea how they actually made that workable.
Yeah, the whole time I was thinking: please just let me edit an Ansible inventory file instead. I get that they have non-technical users, but believe it or not, it does not actually make things simpler if you expose the same technical fields in a spreadsheet and then still use a pull-request workflow and everything…
The corporate world runs on Excel. Never the best option, but everyone knows it, so…
Yep; I’ve seen Excel files that are like 10 MB because they’re a database in Excel
I’ve had legacy systems that would encrypt user passwords but also save the password confirmation field in plain text. There was a multitenant application that would allow frontend clients to query any table for any tenant, if you knew how to change a header. Oh, and an API I discovered that validated a pre-shared secret key using “contains”. Basically, if the secret key was “azh+37ukg”, you could send any single character like “z” and it would accept the request.
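A Python analog of the “contains” bug (the original API’s language isn’t stated), next to a whole-value comparison. The key is the example from the comment; in practice it would come from secure configuration. `hmac.compare_digest` is the standard-library helper for comparing secrets in a way designed to resist timing attacks.

```python
import hmac

# The example key from the story; a real key would come from secure config.
EXPECTED_KEY = "azh+37ukg"


def broken_check(sent_key: str) -> bool:
    # Substring test: "z" (and even the empty string) is "contained" in the key.
    return sent_key in EXPECTED_KEY


def fixed_check(sent_key: str) -> bool:
    # Compares the whole value; compare_digest is designed so that timing
    # does not reveal how many leading characters matched.
    return hmac.compare_digest(sent_key.encode(), EXPECTED_KEY.encode())
```

Note that the broken version also accepts an empty string, which is arguably worse than accepting “z”.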
Shit’s focked out here, mate.
I found code that calculated a single column in an HTML table. It was “last record created on”.
The algorithm was basically:

    foreach accountGroup
        foreach account in accountGroup
            foreach record in account.records
                if record.date > maxdate
                    maxdate = record.date

It basically loaded every database record (the basic unit of record in this DATA COLLECTION SYSTEM) to find the newest one.
Customers couldn’t understand why the page took a minute to load.
It was easily replaced with a SQL query to get the max and it dropped down to a few ms.
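To make the difference concrete, here is a small Python sketch using an in-memory SQLite database. The `records` table and its columns are hypothetical, and the nested account-group loops are collapsed into one scan since they don’t change the result: the slow version drags every row into the application, while the fast one asks the database for the maximum directly.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    # Hypothetical schema; dates are ISO strings, so string order = date order.
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE records (id INTEGER PRIMARY KEY, account_id INTEGER, created_on TEXT);
        INSERT INTO records (account_id, created_on) VALUES
            (1, '2021-03-01'), (1, '2022-07-15'), (2, '2020-11-30');
    """)
    return db


def last_created_slow(db: sqlite3.Connection):
    # Roughly what the original code did: fetch every record and scan it in the app.
    max_date = None
    for (created_on,) in db.execute("SELECT created_on FROM records"):
        if max_date is None or created_on > max_date:
            max_date = created_on
    return max_date


def last_created_fast(db: sqlite3.Connection):
    # Let the database compute the maximum; with an index on created_on
    # it typically doesn't even need a full table scan.
    (max_date,) = db.execute("SELECT MAX(created_on) FROM records").fetchone()
    return max_date
```

Both return the same value; the point is that only one row crosses the wire in the second version.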
The code was so hilariously stupid I left it commented out in the code so future developers could understand who built what they are maintaining.
A registration form and backend that would return the error “please choose more unique password” if you chose a password that was already stored (in plain text) in the database under another username.
I shit you not.
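For contrast, a minimal sketch of salted password hashing with Python’s standard `hashlib.pbkdf2_hmac`. Because each user gets a random salt, two users with the same password end up with different stored hashes, so a “password already taken” check couldn’t even be written. The iteration count is an assumed value, not a recommendation from the thread; tune it for your hardware.

```python
import hashlib
import hmac
import os

ITERATIONS = 600_000  # assumed work factor; tune for your hardware


def hash_password(password: str) -> tuple[bytes, bytes]:
    # A fresh random salt per password; store (salt, digest), never the password.
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, digest: bytes) -> bool:
    # Recompute with the stored salt and compare without short-circuiting.
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return hmac.compare_digest(candidate, digest)
```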
A program that HR had built so that all employees could view their payment receipts online.
The username was the company email address, and the password was a government personal ID code that you can look up online, that never changes, and you couldn’t update the password to something else.
So I told the HR director this was a bad idea. She told me I was overreacting until I showed her her own receipt; then she finally understood that this is a really fucking bad idea.
Okay, so then she put me in charge of debugging that program.
So I set up a meeting with the director of the company they had hired, and he came by with the developer: a 21-year-old girl who I think hadn’t finished college yet. Great start! Apparently it was her idea to do the authentication like that, so that explains a few things.
So we dive in to the code.
First of all, the “passwords” were stored in plain text: no hashing, no encryption, nothing. That wasn’t the worst part.
For authentication, she ran one query to check whether the user’s email existed. If that was true, step two was a second query to check whether the password existed anywhere in the table. If that was also true, the email was considered authenticated.
So let’s say, hypothetically, that they had actual passwords that people could change… I could still log in with anyone’s email and then use MY OWN password to authenticate.
This blew my mind so hard that I don’t think I’ve ever fully recovered; I still need treatment. The stupidity hurts.
I wouldn’t blame that on stupidity as much as on ignorance and naivety. Many people simply don’t think about anybody deliberately misusing their design. The idea that somebody could even want to access somebody else’s receipts didn’t occur to them. And if they were still doing their studies, they might not have known that you can “combine” SQL queries and ask for two things at once.
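The core bug is that the email check and the password check were never tied to the same row. Here is a minimal Python/SQLite sketch of the fix, assuming a hypothetical `users` table that stores a per-user salt and PBKDF2 hash (the iteration count is an assumption): look the user up once by email, then verify the password against that row only.

```python
import hashlib
import hmac
import os
import sqlite3

ITERATIONS = 600_000  # assumed PBKDF2 work factor


def _hash(password: str, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)


def make_db() -> sqlite3.Connection:
    # Hypothetical schema: no plain-text passwords anywhere.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE users (email TEXT PRIMARY KEY, salt BLOB, pw_hash BLOB)")
    return db


def add_user(db: sqlite3.Connection, email: str, password: str) -> None:
    salt = os.urandom(16)
    db.execute("INSERT INTO users VALUES (?, ?, ?)", (email, salt, _hash(password, salt)))


def login(db: sqlite3.Connection, email: str, password: str) -> bool:
    # One lookup by email; the password is then checked against THAT row only,
    # so another user's password can never satisfy the check.
    row = db.execute("SELECT salt, pw_hash FROM users WHERE email = ?", (email,)).fetchone()
    if row is None:
        return False
    salt, stored = row
    return hmac.compare_digest(_hash(password, salt), stored)
```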
I don’t blame the girl, but whoever chose her to design a system with sensitive information.
Long time ago, but by far the worst for me was when I inherited some code a previous programmer had written. Every variable was a breakfast item. So if biscuit > bacon then scrambledeggs = 10. Shit like that. It was a nightmare, and luckily I only had to deal with it infrequently.
Why do people do stuff like this? Is the logic not difficult enough to follow on its own without a secondary definition table to consult!? Fucking hell.
Java webapp. Customer facing. E-commerce application, so in PCI scope and dealt with credit card info and such.
There was one specific cookie that stored some site-wide preference for the customer. (Why not just put that preference in the database associated with the user? Because that would make too much sense is why.)
But the way they encoded the data to go into the cookie? Take the data and use the Java serialization framework (which is like Python’s “pickle” or Go’s “gob”) to turn it into a string. But that string has binary data in it, and raw binary data is kind of weird to put in a cookie, so you base64-encode the result. (The base64 encoding was the only sane step in the whole process.) Then you do the reverse when you receive the cookie back from the browser. (And no, there was no signature check or anything.)
The thing about the Java serialization framework, though, is that decoding back into Java objects runs code from whatever classes the byte stream names (their custom readObject methods and such). As in, arbitrary code execution. And there’s no checking in the deserialization part of the Java serialization framework until your code tries to cast the object to whatever type you’re expecting. By that point, the arbitrary code execution has already happened. In short, this left a gaping vulnerability that could easily have been used to extremely ill effect, like a payment information breach or some such.
So all a malicious user had to do to run arbitrary code on our application server was serialize something, base64 encode it, and then send it to our servers as a cookie value. (Insert nail biting here.)
When we found out that there was a severe vulnerability, I got the task of closing the hole. But the existing cookies had to continue to be honored. The boss wasn’t ok with just not honoring the old cookies and developing a new cookie format that didn’t involve the Java serialization framework.
So I went and learned enough about the internal workings of how the Java serialization framework turned a Java value into a binary blob to write custom code that worked for only the subset of the Java serialization format that we absolutely needed for this use case and no more. And my custom code did not allow for arbitrary code execution. It was weird and gross and I made sure to leave a great big comment talking about why we’d do such a thing. But it closed the vulnerability while still honoring all the existing cookies, making it so that customers didn’t lose the preference they’d set. I was proud of it, even though it was weird and gross.
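Since the comment compares Java serialization to Python’s pickle, here is a Python analog of the same idea (not the author’s Java code): decode only the subset you need and refuse everything else. Overriding `pickle.Unpickler.find_class` blocks any reference to a class or function, which is how malicious pickles get code to run; plain values such as ints need no globals and still load. `IntOnlyUnpickler` and `decode_legacy_cookie` are hypothetical names, and the sketch assumes the expected payload is an integer. The Python docs still recommend non-pickle formats when security matters.

```python
import base64
import io
import pickle


class IntOnlyUnpickler(pickle.Unpickler):
    """Refuse to resolve any global (class or function) while unpickling.

    Ints, strings, lists, etc. need no globals, so they still load; anything
    that references a callable (the usual __reduce__ attack) is rejected
    before that callable can be invoked.
    """

    def find_class(self, module, name):
        raise pickle.UnpicklingError(f"global {module}.{name} is forbidden")


def decode_legacy_cookie(value: str) -> int:
    raw = base64.b64decode(value, validate=True)
    obj = IntOnlyUnpickler(io.BytesIO(raw)).load()
    # Accept exactly int (bool is a subclass of int, so compare the type).
    if type(obj) is not int:
        raise ValueError(f"expected int, got {type(obj).__name__}")
    return obj
```

Old cookies produced by `base64.b64encode(pickle.dumps(42))` still decode to `42`, while a payload whose `__reduce__` points at any function fails with `UnpicklingError`.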
The value that was serialized to put into the cookie? A single Java int. Not a big POJO of any sort. Just a single solitary integer. They could just as well have “serialized” it using base-10 rather than using the Java serialization framework plus base64.
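The “just use base-10” alternative, sketched in Python. The original had no signature check; the HMAC here is an addition so the browser can’t silently change the value, and `SECRET_KEY`, `encode_pref` and `decode_pref` are hypothetical names. Parsing with `int()` can only ever yield an integer, so there is nothing to execute.

```python
import hashlib
import hmac

SECRET_KEY = b"replace-with-a-real-secret"  # hypothetical server-side key


def _sign(payload: str) -> str:
    return hmac.new(SECRET_KEY, payload.encode(), hashlib.sha256).hexdigest()


def encode_pref(value: int) -> str:
    # Cookie value looks like "7.<hex signature>".
    payload = str(value)
    return f"{payload}.{_sign(payload)}"


def decode_pref(cookie: str) -> int:
    payload, sep, sig = cookie.rpartition(".")
    if not sep:
        raise ValueError("malformed cookie")
    if not hmac.compare_digest(sig.encode(), _sign(payload).encode()):
        raise ValueError("bad signature")
    return int(payload)
```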
So, this is completely off topic, but some of the comments here reminded me of it:
An elderly family friend was spending a lot of her time using Photoshop to make whimsical collages and stuff to give as gifts to friends and family.
I discovered that when she wanted to add text to an image, she would type it out in Microsoft Word, print it, scan the printed page, then overlay the resulting image over the background with a 50% opacity.
I showed her the type tool in Photoshop and it blew her mind.
First of all, lack of an ORM isn’t bad. Using one or not using one isn’t a good or bad thing in itself. What’s bad is not sanitizing your query inputs, and you don’t need an ORM to do that.
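What “sanitizing without an ORM” usually means in practice is parameterized queries. A Python/SQLite sketch, using a hypothetical `measurements` table that loosely mirrors the start-date/end-date/system report from the first story: the SQL is one complete string with `?` placeholders, and the driver binds the values instead of splicing them into the text.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    # Hypothetical schema standing in for the start date / end date / system report.
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE measurements (system TEXT, taken_on TEXT, value REAL);
        INSERT INTO measurements VALUES
            ('alpha', '2023-01-05', 1.5),
            ('alpha', '2023-02-10', 2.5),
            ('beta',  '2023-01-20', 9.0);
    """)
    return db


def fetch_range(db: sqlite3.Connection, system: str, start: str, end: str) -> list:
    # Values are bound to the ? placeholders by the driver, not concatenated,
    # so a value like "alpha' OR '1'='1" is treated as data, not SQL.
    return db.execute(
        "SELECT taken_on, value FROM measurements "
        "WHERE system = ? AND taken_on BETWEEN ? AND ? "
        "ORDER BY taken_on",
        (system, start, end),
    ).fetchall()
```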
I think the worst thing I’ve seen is previous devs not realizing there’s a cost to opening a DB connection, especially back when DBs ran on spinning rust. The report page ran one query to get all the items to report on, then, for each row, ran another individual query to get that row’s details; it was probably one of the slowest reports I’ve ever seen. Every DB round trip took at least 0.1 seconds just to open the connection, run the query, send back the data and close the connection, so only about 10 rows per second could be returned. Thousands of rows per page had people waiting several minutes and tied up our app server. A quick refactor to run 2 queries instead of hundreds to thousands, and I was a hero for 10 minutes, until everyone forgot how bad it was before I fixed it.
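The pattern described here is usually called the “N+1 query” problem. A Python/SQLite sketch of the before and after, with hypothetical `items` and `details` tables and the simplifying assumption of one detail row per item. An in-memory database won’t show the latency; the point is the query count, which drops from N+1 to 2, matching the refactor in the comment.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    # Hypothetical schema: one details row per item.
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE details (item_id INTEGER REFERENCES items(id), note TEXT);
        INSERT INTO items VALUES (1, 'widget'), (2, 'gadget');
        INSERT INTO details VALUES (1, 'blue'), (2, 'red');
    """)
    return db


def report_n_plus_one(db: sqlite3.Connection) -> list:
    rows = []
    for item_id, name in db.execute("SELECT id, name FROM items ORDER BY id").fetchall():
        # One extra round trip per item: N + 1 queries in total.
        (note,) = db.execute("SELECT note FROM details WHERE item_id = ?", (item_id,)).fetchone()
        rows.append((name, note))
    return rows


def report_two_queries(db: sqlite3.Connection) -> list:
    items = db.execute("SELECT id, name FROM items ORDER BY id").fetchall()
    # A second query fetches every detail row at once; merge them in memory.
    notes = dict(db.execute("SELECT item_id, note FROM details").fetchall())
    return [(name, notes.get(item_id)) for item_id, name in items]
```

A single `JOIN` would cut it to one query; the two-query version just mirrors what the comment describes.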
