Ask HN: Inherited the worst code and tech team I have ever seen. How to fix it?
- this code generates more than 20 million dollars a year of revenue
- it runs on PHP
- it has been developed for 12 years directly on production with no source control ( hello index-new_2021-test-john_v2.php )
- it doesn't use composer or any dependency management. It's all require_once.
- it doesn't use any framework
- the routing is managed exclusively as rewrites in NGInX ( the NGInX config is around 10,000 lines )
- no code has ever been deleted. Things are just added. I gather the reason is that it was developed on production directly and deleting things is too risky.
- the database structure is the same mess, no migrations, etc... When adding a column, because of the volume of data, they add a new table with a join.
- JS and CSS is the same. Multiple versions of jQuery fighting each other depending on which page you are on, or even on the same page.
- no MVC pattern of course, or whatever pattern. No templating library. It's PHP 2003 style.
- In many places I see controller-like files making curl requests to their own REST API (via domain name, not localhost), doing OAuth authorizations, etc... Just to get the menu items or list of products...
- no caching ( but there is memcached but only used for sessions ...)
- team is 3 people, quite junior. One backend, one front, one iOS/android. Resistance to change is huge.
- productivity is abysmal which is understandable. The mess is just too huge to be able to build anything.
This business unit has a pretty aggressive roadmap as management and HQ has no real understanding of these blockers. And post COVID, budget is really tight.
I know a full rewrite is necessary, but how to balance it?
But before you re-write one line of code - get some testing in place. Or, a lot of testing. If you have end-to-end tests that run through every feature that is currently used by your customer base, then you have a baseline to safely make changes. You can delete code as long as the tests pass. You can change code as long as the tests pass.
Once you are at that point, start picking off pieces to modernize and improve.
Also, respect the team. Maybe they aren't doing what you would, but they are keeping this beast alive, and probably have invaluable knowledge of how to do so. Don't come in pushing for change... come in embracing that this beast of a codebase makes 20 million a year. So talk about how the team can improve it, and modernize their skills at the same time.
Because if you walk in, saying, "This all sucks, and so do you, let's throw it out", do you really have to wonder why you are hitting resistance?
Those seem like low hanging fruit that are unlikely to affect prod.
You should also probably spend a decent amount of time convincing management of the situation. If they're oblivious that's never going to go well.
I agree a full rewrite is a mistake and you have to instead fix bite-sized chunks. It also will help to do that if you start to invest in tooling, a deploy story and eventually tests (I'm assuming there are none). If I was making 20 million off some code I'd sure as heck prioritize testing stuff (at least laying the groundwork).
It's probably also worth determining how risk tolerant the product is; you could probably move faster cleaning up if it is something that can accept risk. If it's super critical, I'd seriously prioritize setting up regression testing in some form first.
2. Slowly start extracting code and making small functions. Document like crazy in the code as you learn. Keep the single file or close to it, and don't worry about frameworks yet.
3. Introduce unit tests with each new function if you can.
After all that is done, make a plan for next steps (framework, practices, replace tech etc).
Along the way, take the jr backend engineer under your wing, explain everything, and ensure they are a strong ally.
Call me crazy, but that project sounds like fun.
The best approach is to:
-Assess the situation
-Create a task list
-Decide what needs immediate attention
-Create a time line for it all
-Get feedback from team
-Add the business roadmap to your list
-With upper management work on a timeline
-Define your project with realistic times
Execute and manage the project.
BTW, this type of team and codebase is not out of the ordinary. Companies start to program with the idea that eventually the problems will be fixed yet it never happens. Upper management does not care because all they care about is reducing cost and getting the results they need. You're dealing with the results.
I sort of think... if you have to ask this here you might be in the wrong job? Was this a job that seemed like something else then became this? This sounds like a job for an experienced VP Engineering. It is a tough order. Wouldn't know how to do it myself. Lots of technical challenges, people challenges, growth challenges, and managing up and down.
The resistance to change is something you need to get to the bottom of. People are naturally resistant to change if they are comfortable, and we've all been through 'crappy' changes before at companies and been burned.
The solution might be to get them to state the problems and get them to suggest solutions. You are acting more like a facilitator than an architect or a boss. If one of them suggests using SVN or Git because they are pissed off their changes got lost last week, then it was their idea. No need to sell it.
This assumes the team feels like a unit. If the 3 are individualistic, then that should be sorted first. E.g. if Frank thinks it is a problem but no one else does, and they can't agree amongst themselves, then the idea is not sold yet.
Once you know more about what your team think the problems are and add in a pinch of your own intuitions you might be able to formulate confidently the problems, so you can manage their expectations.
In these types of situations, the problems are social and possibly political and rarely technical, even though the technical problems are the symptoms that present themselves so readily.
figure out what you want to fix first, and then fix that. then go to the next thing. but keep in mind - "management and HQ has no real understanding", and as far as they are concerned, what they have works.
if this doesn't sound like something you want to do, then find a new job. you are effectively the property manager for a run-down rental property. you aren't going to convince the owners to tear it down and build a new set of condos.
This isn't going to come off nicely, but your assumption that it needs a full rewrite, is in my eyes a bigger problem than the current mess itself.
The "very junior" devs who are "resistant" to change are potentially like that in your view for a reason. Because of the cluster they deal with I suspect the resistance is more they spend most of their time doing it XYZ way because that's the way they know how to get it done without it taking even more time.
What it sounds like to me is that this business could utilize someone at the table who can understand the past, current, and future business - and can tie those requirements in with the current environment with perhaps "modernizing" mixed in there.
Get some type of CI/devops thing going so you can deploy to a temporary test environment whenever you want. This applies to the data too so that means getting backups working. Don't forget email notifications and stuff like that.
Next comes some manner of automated testing. Nothing too flash, just try to cover as much of the codebase as possible so you can know if something has broken.
Go over the codebase looking for dramatic security problems. I bet there's some "stringified" SQL in there. Any hard coded passwords? Plaintext API calls?
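A first pass for "stringified" SQL can be automated with a crude text search. The sketch below is only a heuristic and not a real security audit: it flags any line where an SQL keyword is followed, later on the same line, by a PHP variable. The function name and the regular expression are illustrative assumptions; expect both false positives and misses (multi-line queries, for instance).

```python
import re

# Assumption: a query that is built on one line and has a PHP variable
# (e.g. $id or $_GET) somewhere after the SQL keyword is worth a human look.
SQL_WITH_VARIABLE = re.compile(
    r"\b(?:SELECT|INSERT\s+INTO|UPDATE|DELETE\s+FROM)\b.*\$[A-Za-z_]\w*",
    re.IGNORECASE,
)


def suspicious_sql_lines(php_source):
    """Return (line_number, stripped_line) pairs that look like interpolated SQL."""
    hits = []
    for number, line in enumerate(php_source.splitlines(), start=1):
        if SQL_WITH_VARIABLE.search(line):
            hits.append((number, line.strip()))
    return hits
```

Run it over every file and review the hits by hand; a parameterized query with `?` placeholders on the same line is not flagged, while string concatenation with request data is.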
And now everything else. You're going to be busy.
First, get everything in source control!
Next, make it possible to spin service up locally, pointing at production DB.
Then, get the db running locally.
Then get another server and get cd to that server, including creating the db, schema, and sample data.
Then add tests, run on pr, then code review, then auto deploy to new server.
This should stop the bleeding… no more index-new_2021-test-john_v2.php
Add tests and start deleting code.
Spin up a production server, load balance to it. When confident it works, blow away the old one and redeploy to it. Use the new server for blue/green deployments.
Write more tests for pages, clean up more code.
Pick a framework and use it for new pages, rewrite old pages only when major functionality changes. Don’t worry about multiple jquery versions on a page, lack of mvc, lack of framework, unless overhauling that page.
That gives you an annual maintenance cost which will include, say "every 2 years something goes badly wrong with the flargle blargle, and costs $10,000 to fix", or "every 3 days we have to clear out the wurble gurble to stop it all crashing".
Finally, you put together the same thing but for a re-written version, or even with some basic improvements as others have suggested, and hopefully you see a lower total cost of maintenance.
At that point, you can weigh up the cost of either a rewrite or incremental improvements in actual dollars.
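The comparison is plain arithmetic, roughly as sketched below. The $10,000 every 2 years and the 3-day chore come from the examples above; the $50 cost per cleanup and every name in the code are made-up placeholders to be replaced with your own numbers.

```python
from dataclasses import dataclass


@dataclass
class RecurringCost:
    name: str
    every_days: float      # how often the problem comes back
    cost_per_event: float  # dollars per occurrence

    def annual(self) -> float:
        # Occurrences per year times cost per occurrence.
        return 365 / self.every_days * self.cost_per_event


def annual_maintenance(costs):
    """Sum the yearly cost of all recurring problems."""
    return sum(c.annual() for c in costs)


def total_cost(upfront, annual, years):
    """One-off spend (e.g. a rewrite) plus maintenance over a time horizon."""
    return upfront + annual * years


current = [
    RecurringCost("flargle blargle failure", every_days=2 * 365, cost_per_event=10_000),
    # Assumption: each wurble gurble cleanup costs about $50 of someone's time.
    RecurringCost("clear out the wurble gurble", every_days=3, cost_per_event=50),
]
```

Build a second list for the rewritten or improved system, then compare `total_cost(0, annual_maintenance(current), years)` against `total_cost(rewrite_budget, annual_maintenance(improved), years)` for a horizon management cares about.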
This should be the thing that starts every conversation. Because IT WORKS for the intended purpose.
Someone else said it. Put everything in source control first.
And just fix things that directly impact that 20 Million dollars a year.
Then if you want, you can put together a small sub-team that would be responsible to transitioning certain pages into a framework. But don't rewrite the whole thing.
If you are not managing them directly and they don't want to do those kind of things because it sounds hard or foreign, then you can't really do anything about it.
From a business perspective, nothing is broken. In fact, they laid a golden goose.
> team is 3 people, quite junior. One backend, one front, one iOS/android. Resistance to change is huge.
My mistake, they didn't lay a golden goose--they built a money printer. The ROI here is insane.
> productivity is abysmal which is understandable. The mess is just too huge to be able to build anything.
But you just told me they built a $20M revenue product with 3 bozos. That sounds unbelievably productive.
> This business unit has a pretty aggressive roadmap as management and HQ has no real understanding of these blockers
You should consider quitting your job.
As far as the business is concerned, there are no problems... because well... they have a money printer, and your team seems not to care enough to advocate for change. Business people don't give a damn about code quality. They give a damn about value. If 2003 style PHP code does that, so be it. Forget a rewrite, why waste time and effort doing simple refactoring? To them, even that has negative financial value.
From their perspective, you're not being paid to make code easy to work with, you're being paid to ship product in a rat's nest. Maybe you could make a business case for why it's valuable to use source control, dependency management, a framework, routing outside of nginx, and so on... but it doesn't sound like any of that mattered on the road to $20M a year, so it will be very difficult to convince them otherwise especially if your teammates resist.
This, again, is why you should consider leaving.
Some developers don't mind spaghetti, cowboy coding. You do. Don't subject yourself to a work environment and work style that's incompatible with you, especially when your teammates don't care either. I guarantee you will hate your job.
But I would start by choosing how and whether to fix up the crown jewels, the database.
You say that instead of adding columns, team has been adding new tables instead. With such behaviours, it's possible your database is such a steaming pile of crap that you'll be unable to move at any pace at all until you fix the database. Certainly if management want e.g. reporting tools added, you'd be much better to fix the database first. On the other hand, if the new functionality doesn't require significant database interaction (maybe you're just tarting up the front end and adding some eye candy) then maybe you can leave it be. Unlikely I would imagine.
Do not however just leave the database as a steaming pile of crap, and at the same time start writing a whole lot of new code against it. Every shitty database design decision made over the previous years will echo down and make its ugly way into your new nice code. You will be better for the long run to normalise and rationalise the DB first.
The problem with this plan is corporate politics. Say that OP takes on this challenge. He makes a plan and carefully and patiently executes it. Say that in six months he's already fixed 30% of the problem, and by doing so he meaningfully improved the team's productivity.
The executives are happy. The disaster was averted, and now they can ask for more features and get them more quickly, which they do.
Congratulations, OP. You are now the team lead of a mediocre software project. You want to continue fixing the code beyond the 30%? Management will be happy for you to take it as a personal project. After all, you probably don't have anything to do on the weekend anyway.
You could stand strong and refuse to improve the infrastructure until the company explicitly prioritizes it. But then why would that job be better than just taking a random position in a FAANG company? The code quality will be better and so will the pay.
* Create a git repo from the code as it exists
* If the other team is still doing things live, create a workflow that copies the code from the prod server to git as-is nightly so you have visibility into changes. Here’s an opportunity for you to see maybe what the team gets stuck on or frustrated with, and you can build some lines of communication and most importantly some trust. You can suggest fixes and maybe even develop the leadership role you need.
* Get a staging instance up and running. If I had to guess why the team does things live, maybe the project is a huge pain to get sample data for. If that’s the case, figure out the schemas and build a sample data creation tool. Share with the team and demonstrate how they can make changes without having to risk breaking production (and for goodwill - it helps prevent them from having to work evenings, weekends, and vacations because prod goes down!)
* PHP isn’t so bad! Wordpress runs a huge chunk of the web with PHP!
* tailwind might be a cool way to slowly improve CSS - it can drop into a project better than other css frameworks IMO
* Pitch your way of fixing this to management while quoting the cost of a rebuild from different agencies. Throw in the cost of Accenture to rebuild or whatever to scare management a little. You are the most cost effective fix for now and they need to know that.
Or does this software "facilitate" $20 million of revenue, instead of generate it single handedly.
What if we're talking about a car sales website that 'generates' $20 million in revenue via selling 500 $40k cars?
After you've got a working time window for getting things right, prepare a workflow that should take half the time you've discussed, as it will probably take twice the time anticipated. (If you've negotiated 3 months for fixing the mess, assume you have only 1.5 months or even 1 month, and prepare 1 month's worth of work.)
Then I think the very first thing should be moving to Git (or another version control system), setting up development/staging environments and using CI/CD.
After making 100% sure the environments are separated, start writing tests. Perhaps not hundreds or thousands at this stage, but ones that catch the critical/big failures at least.
After that, start moving to a dependency manager and resolving multiple-version conflicts in the process.
Then find the most repeated parts of the code and start refactoring them.
As you have more time you can start organizing code more and more.
It sucks but it's not something that can't be fixed.
Also finally, given the work environment before you came, it might be a good idea to block pushes to the master/production branch and only accept it through PRs with all the tests requiring to pass, to prevent breaking anything in production.
I would: 1. Get it in source control without “fixing anything”. 2. Get a clone of the prod server up and running, with a clone of the db. 3. Put in something to log all of the request/response pairs. 4. Take snapshots of the database at several time points and note where they occur on the log history from number 3.
You now have the raw material to make test cases that verify the system works as it did before, bug for bug, when you refactor. If the same set of requests creates the same overall db changes and response messages, you “pass tests”.
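A minimal sketch of the replay half of that idea, assuming the logged request/response pairs have already been loaded into memory. `Recorded`, `replay` and the `send` callback are hypothetical names; in practice `send` would issue an HTTP request against the cloned server, and the database snapshot comparison (left out here) would run alongside it.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class Recorded:
    """One logged request together with the response production gave."""
    method: str
    path: str
    body: str
    status: int
    response: str


def replay(
    log: Iterable[Recorded],
    send: Callable[[str, str, str], Tuple[int, str]],
) -> List[str]:
    """Re-send every recorded request and describe each mismatch found."""
    failures = []
    for r in log:
        status, response = send(r.method, r.path, r.body)
        if (status, response) != (r.status, r.response):
            failures.append(
                f"{r.method} {r.path}: expected status {r.status}, got {status}"
                + ("" if response == r.response else " (response body differs)")
            )
    return failures
```

An empty list means the refactored code answered every recorded request exactly as the old code did. Responses containing timestamps, random tokens or session ids will need normalising before the comparison is meaningful.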
First thing to refactor is stochastic code. Make it consistent even if it’s a little slower so you can test.
Once you can refactor, you can do anything. Including a full rewrite but in steps that don’t break it.
If you try to rewrite it from scratch it will probably just never be deployable. But you can rewrite it safely in chunks with the above.
1. Convince the business team that these team members might leave and put the $20mn revenue at risk. There is no way you can make them learn and do things properly. Therefore, get a separate budget and hire a new, separate team. Do a full rewrite of the backend and plug the app and the new website into it. It would be a 1-2 year project with a high chance of failing outright (on a big bang release... stressful, with a large chance of you getting fired, but once done you can fire the oldies and give the business team a completely new setup and team) or partially failing (meaning a large part of the traffic moves to the new system but some parts remain, making the whole transition slow, painful, complex and never-ending).
2. Add strong, senior PHP members to the existing team. Ask the new senior members not to fight with the existing team but to train them. They would listen, as these guys would know more. Slowly add version control, staging/dev envs, a PHP framework for new code, caching, a CI/CD pipeline, an automated test suite built by an external agency, etc. This would be low risk, as the business team would see immediate benefits/speedups. Rewrite portions of code which are too rusty and remove code which is not required anymore. This would possibly take 5-6 years to complete, giving you ample job security while achieving results in a stable manner.
There's really only one way to help improve a codebase / development process in a situation like this: one small incremental step after another, for a very very very long time. If you don't think you can enjoy that and have the patience to stay with the problem for a few years, consider looking for another job.
You first need to fix up obvious brokenness: turn on error logging and warnings within fpm, next fix absolute path issues, next fix any containerization issues (deps, etc.) and containerize it, next roll out some sort of linter and formatter.
At this point you have a CI system with standardized formatting and linting. Now slowly part things out or do a full rewrite, as you can now read the code and make changes locally.
1. Commit the entire production codebase to git and push it to a host (GitHub would be easiest here)
2. Set up a cron that runs once every ten minutes and commits ALL changes (with a dummy commit message) and pushes the result
Now you have a repo that's capturing changes. If someone messes up you have a chance to recover. You can also keep track of what changes are being applied using the commit log.
You can put this in place without anyone having to change their current processes.
Obviously you should aim to get them to use git properly, with proper commit messages - and eventually with production deploys happening from your git repository rather than people editing files in production!
But you can get a lot of value straight away from using this trick.
It's basically a form of git scraping: https://simonwillison.net/2020/Oct/9/git-scraping/
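The snapshot job described in steps 1 and 2 can be sketched as a small Python function. This assumes the production directory is already a git repository with a remote and push credentials configured; the function name and the commit message format are illustrative choices, not anything from the post.

```python
import subprocess
from datetime import datetime, timezone


def snapshot(repo_dir, run=subprocess.run):
    """Commit and push whatever changed in repo_dir.

    Returns True if a commit was made, False if the tree was unchanged.
    `run` is injectable so the function can be exercised without git.
    """
    # Stage additions, modifications and deletions alike.
    run(["git", "add", "-A"], cwd=repo_dir, check=True)
    # `git diff --cached --quiet` exits 0 when nothing is staged, 1 otherwise.
    staged = run(["git", "diff", "--cached", "--quiet"], cwd=repo_dir)
    if staged.returncode == 0:
        return False
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%SZ")
    run(["git", "commit", "-m", f"Automatic snapshot {stamp}"], cwd=repo_dir, check=True)
    run(["git", "push"], cwd=repo_dir, check=True)
    return True
```

Call `snapshot()` with the path to the production code from a tiny script, and schedule that script in cron every ten minutes. Skipping the commit when nothing changed keeps the log readable, so each commit corresponds to an actual edit on the server.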
Second: Doing a full rewrite with a junior team is not going to end well. They’ll just make other mistakes in the rewritten app, and then you’ll be back where you started.
You need to gradually introduce better engineering practices, while at the same time keeping the project up and running (i.e. meeting business needs). I’d start with introducing revision control (git), then some static testing (phpstan, eslint), then some CI to run the test automatically, then unit/integration tests (phpunit), etc. These things should be introduced one at a time and over a timespan of months probably.
I’d also have a sort of long term technical vision to strive towards, like “we are going to move away from our home-written framework towards Laravel”, or “we are moving towards building the client with React Native”, or whatever you think is a good end outcome.
You also need to shield the team from upper management and let them just focus on the engineering stuff. This means you need to understand the business side, and advocate for your team and product in the rest of the organization.
You have a lot of work ahead of you. Be communicative and strive towards letting people and business grow. I can see you focus a lot on the technical aspects. Try to not let that consume too much of your attention, but try to shift towards business and people instead.
A full rewrite of a functional 12-year-old application? Yea, you're going to waste years and deliver something that is functionally worse than what you have. It took 12 years to build; it would realistically take years to rebuild. Fixing this will take years and honestly some serious skill.
What you want to do is build something in front of your mudball application. For the most part your application will be working. It's just a mudball.
Step 0. Make management and HQ understand the state of the application. To do this I would make a presentation explaining and showing best practices from various project docs and then show what you have. Without this step, everything else is pointless.
If they don't understand how bad it is, you will fail. Failure is the only option.
If the team is not willing to change and you're not able to force change then you're going to fail.
So once you have the ability to implement changes.
Step 1. Add version control.
Step 2. Add a deployment process to stop developing in production.
Step 3. Standardise the development env.
If you have views and not intermingled php & html:
Step 4. Start a new frontend and create endpoints that reuse the original code to return json for all the variables.
If you don't:
Step 4. Add views. Copy all the html into another file and then make a note of the variables.
Step 5. Start a new frontend and create endpoints that reuse the original code to return json for all the variables.
... Carry on moving things over to the new frontend until everything is in the frontend.
Probably a year later.
Step 6. When adding new functionality you can either rewrite that section, do a decorator approach, or edit the original functionality.
That's without fixing the database mess or infra mess.
I mean, if you see this as a fantastic opportunity to grow or whatever then fine, have at it.
However, you’re going to be fighting a two-front battle, both against the devs and against management, for widely different reasons. It’s going to take a toll on you.
Ask yourself if you really want to spend the next few years doing work you probably won’t see any recognition for.
Practically: cut the bleeding, get the current team at least using version control and working with a CI environment. That will be a lot of effort (been there before with a similar .Net product but much better team).
Then you're going to need significant resources to re-build on a modern architecture. I would simply go with releasing another product if that's at all possible. You clearly have some market and channel to sell into.
Just beware: this sounds like a problem which will take 3-5 years to solve and whose chance of success is dependent on organisational buy-in. So you need to ask yourself if you're willing to commit to that. If not, quit early.
The best solution - for me - ended up dropping them as a client. There was zero interest in change from both developers and management (no matter how senior).
We parted ways and I wished them good luck.
Occasionally I wonder what happened to the application containing 50,000 procedural PHP files. Yes, 50k. And no source control or off-server backup.
This is the key point. Why is there resistance to change if everything is as bad as you say? How do things look from the perspective of the developers?
A strategy you can use is to incorporate any refactor into the estimates for a "new feature" development with the idea being that if you have to touch this part of the codebase that it gets refactored.
In this case since there's no framework I suggest to have a framework gradually take over the functionality of the monolith and the fact all the routes are in nginx will actually help you here because you can just redirect the route to the new framework when the functionality is refactored and ported into the new framework.
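The routing decision behind that takeover can be modelled in a few lines. This is only a sketch of the logic; in the real system it would live in the nginx config as per-route rules rather than in Python, and the backend labels and example prefixes are made up.

```python
LEGACY = "legacy"
NEW = "new"


def pick_backend(path, migrated_prefixes):
    """Send a request to the new framework only if its route has been ported."""
    for prefix in migrated_prefixes:
        base = prefix.rstrip("/")
        # Match the prefix itself or anything below it, but not look-alikes
        # such as /productsold when only /products has been migrated.
        if path == prefix or path.startswith(base + "/"):
            return NEW
    return LEGACY
```

Everything starts out on the legacy side. Each time a feature is ported, its prefix is added to the migrated list, and removing it again is the rollback.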
Do not refactor the database as interoperability between the legacy project and the new project can fail although migrations should be executed in the new project.
What I do suggest is to get development, staging, pre-production and production environments going because you will have to write a lot of pure selenium tests to validate that you didn't break important features and that you did correctly recreate/support the expected functionality.
You can run these validation tests against a pre-production environment with a copy of production. This also gives you feedback if your migrations worked.
On the team, that's the hard part. If they walk out on you, you will lose all context of how this thing worked.
As precaution, get them to record a lot of video walkthroughs of the code as documentation and keep them on maintaining the old project while you educate them on how to work in the new system. The video walkthroughs will be around forever and is a good training base for new senior devs you bring in.
Last, make sure you have good analytics (amplitude for example) so you know which features are actually used. Features that nobody uses can just be deleted.
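Before a proper analytics tool is in place, the web server's access log already gives a rough answer. The sketch below assumes log lines contain a quoted request such as "GET /path HTTP/1.1", as in nginx's default format, and the function names are illustrative. With routing done through nginx rewrites, the requested path may not equal the PHP file name, so the list of known paths has to be mapped accordingly.

```python
import re
from collections import Counter

# Assumption: each log line contains a quoted request line, e.g.
# "GET /index.php?x=1 HTTP/1.1"
REQUEST = re.compile(
    r'"(?:GET|POST|PUT|DELETE|HEAD|PATCH|OPTIONS) (\S+) HTTP/[\d.]+"'
)


def hits_per_path(log_lines):
    """Count requests per path, ignoring query strings and unparsable lines."""
    counts = Counter()
    for line in log_lines:
        match = REQUEST.search(line)
        if match:
            counts[match.group(1).split("?", 1)[0]] += 1
    return counts


def never_requested(known_paths, counts):
    """Paths that exist in the codebase but never appear in the log."""
    return sorted(p for p in known_paths if counts[p] == 0)
```

Feed it several months of logs before trusting the result: a yearly report or a seasonal feature will look dead in a one-week sample.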
Over time, you will have ported all the functionality that mattered to the new project and feature development in the new project will go much faster (balancing out the time lost refactoring).
A business making 20 million/year should be able to afford a proper dev-team though, what are they doing with all that money?
You should be able to get budget for a team of 5 seniors and leave the juniors on maintenance of the old system.
Some of these things are terrible choices but some of these are just weird choices that aren't necessarily terrible or a minor inconvenience at most.
E.g. no source control - obviously that is terrible. But it's also trivial to rectify. You could have fixed that in less time than it took to write this post.
Otoh "it runs on php" - i know php aint cool anymore, but sheesh not being cool has no bearing on how maintainable something is.
> "it doesn't use composer or any dependency management. It's all require_once."
A weird choice, and one that's certainly a bit messy, but hardly the end of the world in and of itself.
>it doesn't use any framework
What really matters is if it's a mess of spaghetti code. You can do that with or without a framework.
> no caching ( but there is memcached but only used for sessions ...)
Is performance unacceptable? If not, then that sounds like the right choice (premature optimization)...
> the database structure is the same mess, no migrations, etc... When adding a column, because of the volume of data, they add a new table with a join.
Not ideal... but also pretty minor.
Anyways, my point is that what you're describing is definitely unideal, but on the scale of legacy nightmares seems not that bad.
- Large fraction of features are unused. Have internal analytics that will answer you which features/code paths are used and which are safe to delete/ignore. It's much easier to migrate xx% of features than have 1:1 parity.
- Lack of tests is a huge pain. Makes incremental migration near impossible. Find a workaround for it before jumping to migration (forcing huge code coverage increase for already submitted code never worked for me in the past)
- See if some parts can be proxied. Put proxies in place and migrate features behind it (in one past project, the logic was split between stored procedures in Oracle DB, backend code and js code -- which made it possible to proxy stored procedures and break the migration in milestones)
- Hackathons are a great tool for exploring options, uncovering blockers and dedicating a large chunk of focused time. Make it clear that the result is experimental, not that it must be merged to main. A nice way for introducing frameworks, vcs etc. without high friction.
The rest depends on the management support, the team's aptitude, intake of feature requests & bugs, the difficulty of maintenance etc. You are the best to judge how to approach there.
I inherited something similar 12 years ago, also cobbled together PHP, also no separation of code and rendering - making any sort of progress was painful.
As others have said there are a myriad of ways to extend code like this, encapsulating the old with a better facade. Splitting some pieces off - but it needs to be approached as a piecemeal project that takes a decent amount of time, but can be done in parallel with shipping new features.
Whatever else you do, I hope you and the organization figure out how to celebrate that those three people are generating 20 million dollars of revenue (or at least keeping part of the machinery that does that running).
"I know a full rewrite is necessary, but how to balance it?"
How much code is it? How much traffic does it receive?
We also introduced git as well as dev and staging tiers and some agile methodologies. Definitely do some of that first!
Now, as management and customers are happy, the backend can be refactored step by step. Here, more test coverage might come in handy.
So, I'd recommend to be a bit picky about where to create value. You can restructure the whole database and that'll be good for maintenance (and most likely performance) but management & customers won't literally "see" much. Ask the people with the money for their preferences, excite them to get more runway. Regarding "backend stuff": Think like a Microservice architect and identify components that are least strongly coupled and have a big (performance) impact. Work on those when management is happy and you've got plenty of budget.
Your job is to create value and reduce risk. Not to create something that's technically awesome ;)
At least half of the stuff you listed will probably never change. Congrats! Being the senior person means becoming comfortable with people making objectively worse decisions than you would, and putting the structure and architecture in place so that it still works anyway. As a bonus, most of those “objectively worse” decisions can be really good and better suited for the team than your decisions would have been ;).
Unless you have power at the executive level, or are brought in as an expensive consultant to make big changes , you are wasting your time.
I would tell you to stick around and shovel shit just to take cash home but from your post it doesn’t sound like you are happy there to begin with.
What you are seeing here is a symptom of leadership not valuing engineering, so trying to improve this requires a culture change from the top, which is highly unlikely from where you stand.
If the pay is really good, you might consider sticking with it for a bit and then move on. However if you feel like it will push you towards burnout, abandon ship ASAP.
My younger self would have stayed and tried to be the unsung hero but now that I’m older I chuckle at that foolishness.
Don’t be the silent hero.
> Resistance to change is huge.
These 2 quotes tell me they haven't yet recognized the grave danger and pain of their complexity. They will eventually, but for now neither management nor the team seem open to the radical change which they desperately need. Eventually collapse will come, but for now it's a no-win situation for you. Unless the money is insanely good and worth the stress, best path is to get the heck out.
What industry and type of business is this?
You have inherited working code that generates revenue but is in a state that makes it hard to develop new features and manage productively.
As you say, the roadmap is aggressive and management has no understanding of the situation, so you have already established what you have to do: explain to management what makes development difficult. Avoid statements like "this is the worst"; focus on what needs to be done to establish best practices and on where the quick wins for development velocity are. More expertise and less judgement is always a good idea. Then propose a realistic roadmap and start making the changes that need to be made.
If you stay you need to manage your relationship with the management team. This involves the usual reporting, lunches etc. You need to set up some sort of metrics immediately. Just quarterly might be sufficient. Nobody is going to care about bug fix counts; your metrics should be around features.
Testing and version control are a good place to start. But you are going to need to get them started there and you will pretty much need to instill good discipline. You will be herding cats for quite a while. If you can't get these two items going well in 3 months then abort and leave. You don't want to stick around for when the money printer stops working and nobody can figure out why.
At least from a technical perspective, the key to making this manageable is not a re-write, that’s probably the worst approach especially if you have little to no buy-in from above. From a business perspective, a re-write provides little to no benefit and will only be a large cost and time sink, so you will never get buy-in on that anyways.
The key here is slow, progressive improvement. For example, get it in source control; that’s a relatively simple task and provides an endless amount of benefit. The next step, which is a bit more complicated, is to get a way to run this in a local development environment.
Getting a local environment for this type of situation can certainly be tough, and you have to be prepared to accept what may be considered a “non-optimal” solution for it. Does your code have a bunch of hard-coded credentials and URLs, so that you could accidentally hit 3rd party services or databases from local and cause problems? The answer to that is NOT to try and extract all those things, because that will take a ton of time and you have no test environment. Instead, cut the container off from all internet access and add a proxy container that it can route outbound traffic through. Then you can explicitly control what it can reach and what it can’t, and progressively fix the hard-coded problems.
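To make the "explicitly control what it can reach" part concrete, here is a minimal sketch of the allow/deny decision such a proxy would make. It is not a proxy itself, and the host names and the `egress_allowed` / `audit` helpers are hypothetical, made up for illustration:

```python
from urllib.parse import urlsplit

# Hypothetical allowlist: the only hosts the local container may reach.
ALLOWED_HOSTS = {"localhost", "db.dev.internal"}
# Hypothetical suffix rule: any third-party sandbox under this domain is fine.
ALLOWED_SUFFIXES = (".sandbox.example.com",)


def egress_allowed(url: str) -> bool:
    """Return True only if the URL's host is explicitly allowlisted."""
    host = (urlsplit(url).hostname or "").lower()
    if not host:
        # No parseable host (e.g. missing scheme): deny by default.
        return False
    return host in ALLOWED_HOSTS or host.endswith(ALLOWED_SUFFIXES)


def audit(urls):
    """Split URLs into (allowed, blocked); blocked ones become the to-fix list."""
    allowed, blocked = [], []
    for url in urls:
        (allowed if egress_allowed(url) else blocked).append(url)
    return allowed, blocked
```

The point of deny-by-default is that every hard-coded production URL shows up as a blocked request you can fix at your own pace, instead of as an accidental call to a live service.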
Basically the key is to accept that shooting for “ideal” here is a mistake, and you have to sneak the small progressive improvements alongside meeting the business goals that have been set for the team.
In my experience, if you can sneak some simpler but very impactful changes in, then demonstrate how those help deliver on things, it will be easier to get buy-in. If you can point to being able to deliver a feature weeks ahead of previous estimates and attribute it to, say, having a sane deployment strategy or a local dev environment, the advantages become clearer from a business perspective. If you say “we need time to fix this” but have no data or concrete examples of how this helps the business, you won’t get buy-in.
1) A rewrite from scratch is almost always a bad idea, especially if the business side is doing just fine. By the way, when you want to sell a rewrite, you don't sell a rewrite, you sell an investment in a new product (with a new team) and a migration path; it's a different mindset, and you have to show business value in the new product (still ends up failing most of the time, but it has a better chance of getting approved).
2) You never ever try to change people (or yourself) directly. It's doomed to failure. You change the environment, then the environment changes the people (if the changes are slow and inertia is working for you, otherwise people just leave).
Since it would probably be too hard to change the environment by yourself, and given that your team seems fine with the status quo, my advice is to just manage things as they are while you look for another job. Otherwise my bet is that your life will be miserable.
So uh, good luck. You're going to be the one everyone hates.
I'd just quit in your shoes, to be completely honest. Your desire for a solid foundation will never be seen as anything but a roadblock to an organization that just wants more floors added to the house with reckless abandon for safety.
Any securities gained by improvements you champion will go unnoticed. You will be blamed when the inevitable downtime from molding a mountain of shit into less of a mountain of shit happens.
You are going to lose this fight. Please just quit and go work for a software engineering organization, you seem to have taken a job at a sausage factory for some reason. I'd also try to learn from that...
So let's stick to advice that is universal to all roles and I think most people who have been in similar situations would agree with. First, let's be clear about one thing: This situation isn't the least bit unusual. From the facts above it doesn't look very bad. The team is small, and you can all gather in the same room and communicate. The fact that there is no framework and no patterns in place is good given the circumstances, awful codebases based on ancient frameworks and legacy patterns are generally an order of magnitude more work to understand.
Second, be humble towards the team and the problem. After such a long time, there are bound to be details that you don't know, and you have to find out about them sooner rather than later. People may seem resistant to change, but understand their angle and work with them. They likely want their codebase to improve, too, even if they see other problems as more pressing. It all depends on what your role is, and whether you intend to help out with the actual work or not. But again, this is a small team with a shared goal.
Third, start with the lowest hanging fruit. Personal opinions come into play here, but I probably would look at operational issues early. Get monitoring in place. Test backups (yes, really). Some key metrics, both application wise (on some key processes such as login or payments) and operational (memory, open files, sockets). Learn about version control and start using it. Get proper test environments in place (including databases and mocked external integrations).
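For the application-side metrics, even something very small goes a long way. A rough sketch in Python, assuming you can wrap a key process such as login in a timer; the `ProcessMetrics` class and the `"login"` label are made up for illustration, and a real setup would ship these numbers to whatever monitoring you end up with:

```python
import time
from collections import defaultdict
from contextlib import contextmanager
from statistics import median


class ProcessMetrics:
    """Collects duration and success/failure per named process (in memory only)."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock
        self._samples = defaultdict(list)  # name -> [(seconds, succeeded), ...]

    @contextmanager
    def timed(self, name):
        start = self._clock()
        succeeded = True
        try:
            yield
        except Exception:
            succeeded = False
            raise  # never swallow the original error
        finally:
            self._samples[name].append((self._clock() - start, succeeded))

    def summary(self, name):
        rows = self._samples.get(name, [])
        durations = [seconds for seconds, _ in rows]
        return {
            "count": len(rows),
            "failures": sum(1 for _, succeeded in rows if not succeeded),
            "median_s": median(durations) if durations else None,
            "max_s": max(durations, default=None),
        }
```

Usage would look like `with metrics.timed("login"): do_login()`. Count, failures, median and max per key process is already enough to notice when something regresses after a change.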
Good luck! Things are probably not as bad as you think. This type of work is really quite rewarding, because results are quickly very visible to everyone.
I suggested a full rewrite and got fired in 3 weeks (actually it was a subcontractor role). I have always considered myself really good at presentations and at persuading executives to understand what I am doing and what I will be doing, but the situation was too much for me to take on. They didn't like the unrealistic 3-month roadmap to rewrite the whole thing, which from their point of view achieves nothing but still means paying the whole team (even though I was the only one). So I told them we were gradually improving it, and did the full rewrite underground on my own. It consumed ~13 hours of my every day, but I was happy and enjoying the birth of the product. Finally, after 10 weeks, I gave in, both to myself and to their frustration.
Regarding your problem, I strongly suggest first of all dumping your codebase into a git repo, adding some Cypress/Playwright tests to carefully probe the major functionality, building CI for these, and starting to gradually remove old version files. After that, just forget how messy it was and what you thought in the first place, consider this beast a perfect engineering gift (like the Linux kernel), then start making small changes and adapt yourself to it. Guide the team to follow your methodology for treating the code, and tell the executive team that the legacy codebase looks great but is too complex to move as quickly as a brand new startup project would.
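Before real browser tests exist, a cruder baseline can already catch breakage when old files get removed. This is a sketch of response fingerprinting, which is a simpler technique than Cypress/Playwright and not a replacement for them. The `fetch` callable is an assumption: it stands in for whatever returns the rendered page for a path on a staging copy, and all function names are made up for illustration:

```python
import hashlib
from typing import Callable, Dict, Iterable, List


def fingerprint(body: str) -> str:
    """Stable digest of a response body."""
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def record_baseline(paths: Iterable[str], fetch: Callable[[str], str]) -> Dict[str, str]:
    """Capture current behaviour: path -> digest of what the page returns today."""
    return {path: fingerprint(fetch(path)) for path in paths}


def changed_paths(baseline: Dict[str, str], fetch: Callable[[str], str]) -> List[str]:
    """Return the paths whose response no longer matches the recorded baseline."""
    return sorted(
        path for path, digest in baseline.items()
        if fingerprint(fetch(path)) != digest
    )
```

Record the baseline once, delete a suspected dead file, then run `changed_paths` again; an empty list means none of the probed pages changed. Pages with timestamps, CSRF tokens or other per-request content would need to be normalized before hashing, otherwise they will always show up as changed.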