Ask HN: Do I need AWS? Or am I thinking this wrong?
I'm pretty comfortable with AWS from work experience, but I also recall the insane bills we paid. Of course, the place I worked at was/is doing great so the insane bills aren't a problem for them (when we're talking billions in revenue, a few million per year on AWS is okay I guess). They also have many, many petabytes of data and millions of users, though I'd argue there is also a ton of over-engineering.
Now that I'm working on my own company, I'm curious about alternatives to AWS.
I'm keeping things simple, so I've got mostly Go services, pg + caching, and a Svelte webapp. I deployed my Go services on a low-ish end bare metal provider, and for now it is fine. Deployments are triggered via scripts, and so far so good. Is it sexy, using all the latest and greatest tech? No, it's just simple shell scripts. But it works.
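For the curious, the deploy script is roughly this shape (the "myapp" service name, paths, and hosts are illustrative, not my real setup; the dry-run knob is just there so you can test it safely):

```shell
#!/bin/sh
# Sketch of a minimal Go deploy: build locally, ship the binary, restart
# the systemd unit. "myapp" and all paths/hosts are illustrative.
set -eu

deploy() {
  host="$1"
  service="${2:-myapp}"   # hypothetical systemd unit name

  # With DRY_RUN=1, print commands instead of running them.
  run() { if [ "${DRY_RUN:-0}" = "1" ]; then echo "+ $*"; else "$@"; fi; }

  run go build -o "/tmp/$service" "./cmd/$service"   # static Go binary
  run scp "/tmp/$service" "$host:/tmp/$service"      # ship it
  run ssh "$host" "sudo install -m 0755 /tmp/$service /usr/local/bin/$service && sudo systemctl restart $service"
}
```

Nothing clever: if the build fails, nothing ships, and `set -eu` aborts on the first error.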
I also benchmarked each endpoint with tens of millions of records (not a whole lot but still) and I'm seeing a pretty good latency to throughput ratio. In fact, the performance is better than what we got during peak at work, and this setup is costing me tens of dollars a month.
That makes me wonder whether I'll ever really need AWS. If I ever need to go multi-region, I can just spin up a new machine there. A CDN covers all static content.
Am I wrong to think that I could probably scale like crazy and avoid AWS completely with my stack? Why should I pay hundreds/thousands per month plus a premium for bandwidth? I'm also enjoying staying sane avoiding IAM.
What are you doing for backups and redundancy? The main reason I use RDS and so forth is that I pay AWS to do that for me. I do have a script which backs up my database to an on-premises host so I can put it on a tape, but that is faintly ridiculous when AWS is so good.
What do you think would be more beneficial to your company in the short and medium term?
> I could probably scale like crazy and avoid AWS completely with my stack
You don't need AWS for that (although they do a good job convincing you otherwise).
AWS didn't invent anything. They just simplify (or make complex) systems in order to send you a hefty bill.
How baremetal are we talking? Hardware dies. Can you deal with the outage?
If your current setup starts to feel insufficiently robust, then there are several options to consider before AWS, namely things like Heroku, Render, and Fly.io. You'll hear critical reviews about all of these, so do your research first.
Personally, I have used Heroku in production at more than one place, and plan on using it again.
If you're not using AWS services, there is no good reason to stay with AWS.
But if you see yourself reimplementing AWS services on your own stack, you should fire up Excel and do the math on which is more cost-effective.
Your business is delivering product to your customers, not building and maintaining the technology to do so.
As a startup it's not about scaling.
Either you need to scale, in which case you should have the revenue or VC funding to do it, or you don't need it.
Choosing a cloud provider is about where to spend your resources: do you want to build features that make money, or do you want to manage your deploy scripts and build monitoring etc. yourself?
Using any cloud only makes sense if a) you don't know how to set things up yourself and b) your project is so big, that salaries for ops would be higher than cloud cost.
Also, be careful with b) since you still need people that know how to do stuff, otherwise, you will experience either data breach or long downtimes.
What most people won't admit when recommending cloud is that, yes, the cloud has all sorts of utility built in, like data replication, backups, scaling, etc., but things go wrong in the cloud as well, and if you don't have good knowledge of how things work, you will be in for a wild ride.
IMO, it is much better to learn to set things up yourself, even in a non-optimized way, and to have some knowledge of what to do when stuff begins to break.
> your project is so big, that salaries for ops would be higher than cloud cost.
Couldn't it be the other way around? At a lower scale, sacrificing significant amounts of time just to save a few hundred or a few thousand dollars might not be worth it. OTOH, if you're paying millions to AWS, hiring an extra person or two to save 20-50% (or whatever) might be a very good deal.
Location plays a role here. If you are in an industry where your people are paid $200k USD minimum, then cloud will probably pay off; if you can pay them 1/5 of that, then it is probably the other way around. Ahrefs' cloud story reminds me of this in some way.
Well, kinda. There are indeed some services that can cost more to self-host than cloud hosting, especially when it comes to geo-location, auto-scaling, and dealing with dead servers. But there are also companies that maintain data centers, etc. It depends.
This is so wrong.
Cloud is for DECLARING the setup; if you don't care, or you don't know how, then avoid the cloud.
On-premise you can always find dodgy scripts with no copies and no backups, someone praying every day who will quit if the service ever restarts, things that work but nobody knows why, and, worst of all, everyone afraid to update anything.
The cloud is that, but with Terraform or, God forbid, CloudFormation.
I don't know your experience, but mine is what I stated, and I have been through two on-prem-to-cloud migrations, one done with outside help and one without, one at a Fortune 500 company with a fairly large setup (not Google scale, but large). The general theme was that if you can't get your things straight on-prem, you surely won't be able to manage the cloud, as it adds an additional level of complexity on top of your usual setup.
I would change your b) to: your project needs a lot of scalability, and you will pay big salaries to make sure it is run correctly.
I’m biased, but I originally came from the sort of setup you have into AWS, and I’ve never looked back. The time I spent tinkering with bare metal servers is just wasted. AWS has figured it out. Yes, there’s a bit of a premium you’ll pay for that. But that’s your own time saved. Only you can tell what your time is worth. But as an engineer it’s probably a lot, and every hour you spend tinkering can probably pay for the whole AWS monthly bill.
For me, the strength of AWS is in scalability and reproducibility. I use AWS CDK to define infra as code. Which allows me to make changes with confidence. Test changes in other environments. Use CI/CD and pipelines even on small projects.
I only needed to figure this out once. Then all of it is reused across any project; from first commit to deployed infra is a matter of minutes. When I’m done I can tear it down in minutes.
AWS re-invents the wheel so that you don't have to!
You could also just use the wheel.
Changing something for the sake of changing something is meaningless. If whatever works for you works - you're fine, don't change a thing.
Though, I'd comment that you don't need to go all-in on AWS. You can use EC2 instances the same way as bare metal servers from Hetzner and get some cool benefits if you need them (and are OK with 10x+ extra cost): easy backups/snapshots, migrations, better server access and management (I like having SSM to connect to the server with MFA).
If you really want to use AWS, AWS Lightsail is the service to look into.
Probably unpopular opinion here, but my policy is,
If I can choose to not pay rent, I will choose to not pay rent.
I also considered AWS but then decided to go with Hetzner. For me it's a learning experience as well so I enjoy figuring out stuff. AWS would have cost 3-4x although I am using RDS for now.
I have scripts, but use Ansible to set up snapshots to spin new servers from, and Terraform to configure everything. The biggest downside, as I see it, is that _if_ I had to scale up the stack, there would be issues regarding auto-scaling and additional regions.
However, I'm not worried about that, and if it happens, I'll probably have money to move back to AWS - at least in part.
So my advice is: if it works and the servers stay up, keep it. AWS is no silver bullet, and moreover, if you have little to no revenue, it's better to keep costs down. As a small company, I think you also get a little more leeway regarding server downtime.
I help big companies reduce their cloud costs. Frequently I recommend moving Postgres and Kubernetes to on-prem.
When you’re spending over $1 million per year on either of those then you’ll save a lot of money by moving to on-prem.
The ideal cloud lifecycle is to iterate quickly on the cloud while you find product market fit, then when you know what you need start moving to on-prem to save money.
I don’t think you’ll ever need to migrate to AWS.
Is one million really the point where on prem becomes cheaper? I would think it’s much more like 10 million. Between having to pay server engineers and regional distribution and power costs I’d think it adds up to far more than 1 million. Where am I doing my math wrong?
In a large company the engineers, regional distribution, etc. are already there and paid for by other parts of the company, I think. Your math is otherwise correct.
If you can do things yourself or with your staff, you probably don't need AWS.
But when the cost of you doing something (either direct or opportunity cost) becomes higher than that of using aws (or another cloud provider), you should use them.
Yes, you can get really really really far without needing to scale if you spend a lot of energy and effort on optimisation, but that's time you're not doing biz dev, or building out capability.
And even now you are probably spending a lot of energy on things that are useless outside of saving money you'd pay to Amazon. As you grow, the cost:value of doing these things yourself or even in-house changes.
If you're successful and scaling, it's often waaaaay easier and cheaper to throw $$ at a problem short term than getting engineers to actually prioritise and look at it.
This is mostly FUD. A single 8-core server made in the last 10 years, with 64 GB RAM and SSD/NVMe drives, can saturate a 10 Gb/s link even with some DB lookups thrown into the mix. How many lifestyle businesses would even need more than this, aside from backups and redundancy?
Many people underestimate how much easier it is as a business to throw money at things than to actually solve possibly hard problems. "Database failover and scale to z1d.12xlarge" as a strategy is insanely cheap in comparison to spending days/weeks/months trying to prevent spikes from impacting performance.
As much as folks here might not want to hear it, throwing an army of mediocre (or, ideally, decent) developers at a problem and paying through the nose for managed infrastructure often winds up being a much (much!) cheaper way to arrive at a good outcome than a smaller number of more skilled engineers with a shoestring infrastructure budget.
With (if using Postgres) WAL and log shipping, ZFS send/receive snapshots every minute, and other dead-simple setups, a single developer can easily protect against hardware failure, have solid backups, and unless using an absolutely horrific stack, can easily handle a very large amount of business with the hardware I have mentioned.
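A minimal sketch of the pieces I mean (hostnames, dataset names, and paths are all illustrative; tools like sanoid/syncoid can automate the snapshot shipping):

```shell
# postgresql.conf -- continuous WAL archiving to a standby box:
#   archive_mode = on
#   archive_command = 'rsync -a %p standby:/var/lib/pg-wal-archive/%f'
#
# crontab on the primary -- minutely ZFS snapshots of the data directory,
# shipped incrementally to the standby:
#   * * * * * zfs snapshot tank/pgdata@min-$(date +\%Y\%m\%d\%H\%M)
#   * * * * * syncoid tank/pgdata standby:tank/pgdata
```

That's the whole idea: the WAL archive covers point-in-time recovery, and the snapshot stream covers hardware loss.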
Have you ever worked with mediocre developers? Have you ever been responsible for a built-out implementation of the sort you are talking about?
And, have you ever actually run intensive tasks on hardware where you owned the entirety of the system and could have visibility into the full OS? (Many VM providers oversubscribe their CPU and RAM, and, you have no visibility into memory bandwidth performance either.)
What you need to understand (at any given point in time) is:
- how is the complexity & scale of your system going? Make projections
- identify potential breaking points in this
- figure out roughly how long it'd take to migrate to e.g. AWS
- the cost of migration (not only the diff between your current costs and AWS costs, but the opportunity cost of all the projects you won't do while you're migrating)
- now you have your "buffer" figured out, as in, when you need to start acting.
It's good, don't touch it. If you need AWS you'll know.
A low-complexity org with a low-complexity tech stack (both good things) can serve a ton of users, and doesn't get much benefit from the huge scale, elasticity, and features of AWS.
In particular network transfer fees on AWS are ridiculous, IMO.
This all depends on product requirements. But the single simplest reason for choosing a cloud provider like AWS is reliability. If a region or machine goes down, how do you handle it? AWS has the best reliability among cloud providers, so by choosing them, the odds you even need to think about it are minimized. And they have answers for multi-AZ and multi-region. But if a few minutes of downtime every few months isn’t the end of the world, by all means save yourself the trouble until it is.
You are doing just fine. If you need to eventually scale or just add reliability, spend some time learning HAProxy and slap that in front of a couple of nodes.
If you’re concerned about security, spend some time learning SELinux. I’m assuming you’ve already done the reasonable defaults like public key required for ssh, no root login, fail2ban, etc.
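For reference, the sort of sshd_config baseline I mean (a config fragment, not a complete hardening guide):

```shell
# /etc/ssh/sshd_config -- reasonable defaults
#   PasswordAuthentication no
#   PermitRootLogin no
#   PubkeyAuthentication yes
#
# then install fail2ban (on Debian/Ubuntu its sshd jail is enabled out of
# the box) and apply the sshd changes with: systemctl reload sshd
```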
It sounds like you are doing things right!
When I pay for cloud infrastructure myself, my first stop is Hetzner because they combine low prices with good service. I do see some complaints on HN occasionally about Hetzner, so do some research. I have never had problems myself with them.
I worked at Google for a while ten years ago, so out of nostalgia I sometimes use GCP but I watch my spend.
You seem to manage fine. I do think most projects do not need AWS, and at work we run most of our infrastructure on our own servers (colo, we rent racks and buy servers).
Remember to consider the more annoying points that can often become hidden costs or hidden risks:
- backups (and more importantly, restores :) )
- maintenance and security (you are in charge of the lower levels too)
- decent authentication, authorization, and accounting (AAA)
- SLA, both 'in theory' (what the contract with your provider says) and 'in practice' (how much an interruption could cost you, and what happens if the provider does not respect the SLA at all)
- how you handle outages (ideally you want a plan for what to do when the obvious things go wrong, and then for the non-obvious things)
- how you transfer knowledge to a potential new employee. Is there only one person who knows how something works? (This is called the 'bus factor'.)
And yes, most of those also apply if you use AWS, but typically they would cover different parts or have different risks. And usually there is some associated product you can pay for to manage that risk.
If you find yourself constantly reinventing the wheel when AWS offers a managed service that handles that for you, it might be time to switch to AWS. Some examples might be: writing a complicated postgres backup solution instead of using RDS Automated Backups, or running your own Kubernetes control plane instead of using EKS. But if you have an infrastructure that is easy for yourself to maintain, then spend your effort generating value elsewhere.
Depending on your current provider, there may be advantages moving your existing bare metal server to EC2. For example, what happens if the drive on your server fails? You'll probably need to write a backup script and write documentation on how to restore from it. If that happened to an EC2, AWS would just restart your instance on a new host and your EBS volume would come with it, and automated snapshots can be set up with a single click. Security groups are simpler to set up than configuring a Linux firewall. Lastly, EC2 has support for automated horizontal scaling with on-demand pricing and Spot instances. But none of these are a must-have, they are conveniences, and you pay a premium cost for it. Hetzner has servers with 64 GB memory for €37 per month, and a similarly-specced EC2 will easily be 10x that cost.
> similarly-specced EC2 will easily be 10x that cost.
I actually tried this; it's closer to 200-300x. The number of premium add-on fees that appear as you build a higher-performing EC2 server is just stupid.
It's not so bad if you scale out horizontally instead of vertically. But that complicates the architecture when other providers don't charge crazy pricing for vertical scaling.
Can you give some specific examples of the add-on fees when vertically scaling EC2 instances?
Dedicated vCPUs require an "enable dedicated vCPUs" service that is $2/hr (per region). It starts billing you as soon as you enable a dedicated vCPU. I had no idea this existed till I saw the charges.
Max IOPS without paying for provisioned IOPS is 3,000 (the gp3 stock setting). Much lower than other providers, and it will cap your database performance. That's combined, too: 1,500 read and 1,500 write. Throughput is 125 MB/s.
Hetzner gives you 45k if memory serves, Vultr gives you a ton, Akamai cloud gives you at least 40k, and some instances give you 125k. With throughput in the >3 GB/s range.
Provisioned IOPS cost thousands per month to reach these levels via an io2 volume. io2 is capped at 45k if memory serves, unless you have a bare metal instance, which is a whole new pricing tier I didn't investigate.
I believe it is cheaper per IOP to go wide with gp3, complete with additional compute, than to provision io2. Which really kills the whole "go tall not wide" architecture strategy.
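Back-of-envelope with us-east-1 list prices as of this writing (verify on the pricing page, these change): gp3 includes 3,000 IOPS free, charges about $0.005 per extra provisioned IOPS-month, and caps at 16,000 IOPS per volume; io2 charges about $0.065/IOPS-month for the first 32,000 and about $0.046/IOPS-month for the next tier.

```shell
# ~48k IOPS by going wide: three maxed-out gp3 volumes
gp3=$(awk 'BEGIN { printf "%.0f", 3 * (16000 - 3000) * 0.005 }')

# ~45k IOPS by going tall: a single io2 volume (tiered per-IOPS pricing)
io2=$(awk 'BEGIN { printf "%.0f", 32000 * 0.065 + 13000 * 0.046 }')

echo "gp3 x3: \$${gp3}/mo   io2: \$${io2}/mo"   # IOPS charges only; storage extra
```

That's roughly $195/mo vs $2,678/mo for comparable total IOPS, before storage and compute, which is the "go wide, not tall" effect in numbers.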
Big price bump with dedicated vCPU encourages wide rather than tall also. You can buy a ton of compute for that $2/hr fee.
Moral of the story. Don't try to go tall on Amazon.
AWS has some complicated features that they don't really explain well. In your case, you're talking about "Dedicated vCPU" having a $2/hr fee, and this was confusing to me because I don't recall this fee existing. It turns out they have two completely different offerings for dedicated hardware: Dedicated Hosts and Dedicated Instances. The latter has the $2/hr fee, but I've only ever used the former which has no fee. I'm still not exactly sure what the advantages are for the latter and why those advantages cost $2/hr.
1: When you use a Dedicated Host, you won't see a "fee" on your bill, but you are charged more. As an example, in us-east-1, an m5 Dedicated Host costs 10% more than an m5.metal, AKA m5.24xlarge. It's about $0.40/hr more expensive. You'd think it would get cheaper since you're buying in bulk, but I can understand this actually maybe is not preferable for AWS, because it is difficult to find a completely unused server on demand.
> I'm keeping things simple, so I've got mostly Go services, pg + caching, and a Svelte webapp. I deployed my Go services on a low-ish end bare metal provider, and for now it is fine. Deployments are triggered via scripts, and so far so good. Is it sexy, using all the latest and greatest tech? No, it's just simple shell scripts. But it works.
Don't change a thing. This is perfect.
> Am I wrong to think that I could probably scale like crazy and avoid AWS completely with my stack? Why should I pay hundreds/thousands per month plus a premium for bandwidth? I'm also enjoying staying sane avoiding IAM.
You're not wrong at all. Check out the hardware stack for Stack Overflow as of 2016: https://nickcraver.com/blog/2016/02/17/stack-overflow-the-ar...
Don't overthink it. Focus on the software and its features. Focus on getting users and ramping up your MRR.
One thing I would point out is that the simplicity of StackOverflow's hardware stack means that it would also be very simple to host in a cloud provider like AWS.
I would also point out that StackOverflow isn't an incredibly good performer compared to a number of other websites. Check it out on PageSpeed Insights. I also used a couple other tools that suggest that it loads slower in some regions than others.
> Check it out on PageSpeed Insights. I also used a couple other tools that suggest that it loads slower in some regions than others.
And yet Stack Overflow has such strong SEO that outright cloning their content on a malware page will get you top 10 results.
I think people seriously overestimate the value of Web Core Vitals. At the end of the day a website is usually trying to deliver some value to a visitor: if you deliver more value by rejecting complexity, but sacrifice some speed to do it: users will still value the end result, and search engines will reward you.
Is StackOverflow slow? Maybe. Has this ever bothered me as a user? Nope.
I would not put much weight on those synthetic page speed analyses. It comes down to: is it fast enough? For users, the answer clearly was yes during all those years.
This is exactly right.
> One thing I would point out is that the simplicity of StackOverflow's hardware stack means that it would also be very simple to host in a cloud provider like AWS.
Yes, of course. Every cloud provider also does PaaS, but it is often not the most cost-effective option - often it is the most expensive one!
The cheapest option is often the one where you get reliant on cloud-specific SDKs, serverless, etc.
Simple, sure, but AWS still charges insane rates.
I can get 60k IOPS on a nanode at $5/mo rate. You actually get a random amount between 20k-60k depending on the type of machine they provision your nanode on.
You want 60k IOPS on AWS? Be prepared to pay like $4-5k/mo. Want multiple systems with it so you can cluster a reasonably performant database? Pay for each, and you need to special-request the ability to go past 100k IOPS in a region.
Want the privilege of a dedicated vCPU? It's $1500/mo per region just to turn on the feature.
I tried to use AWS just as you mention here and the cost was 100-1000x what it would cost me at akamai cloud or Hetzner. All these fees I was unfamiliar with popped out and the cost was crazy high.
The system I tried to provision was functionally similar to the bottom end dedicated akamai cloud offering and it was going to cost 15-20k/mo instead of $72. Over 200x. And that was just the hourly provisioned rate, not including egress and other bits...
An m7gd.medium looks like it would do all that inexpensively.
Not sure about ARM; a different ISA is a pain.
But the base configuration, for the money, is pretty sad compared to what you get else where. Scaling up is horrendously expensive compared to other providers.
Why you would bother with AWS for this, I don't even know. There is no draw to AWS if you are only using EC2. You couple your architecture to their specialized services and go all in, eating the platform risk and the long-term cost.
Or... don't even bother with AWS. It makes no sense to use just EC2.
These high costs are Amazon's way of telling you that you aren't architecting for the cloud. In my ~10 years doing cloud infrastructure on AWS, I've never worked at a company that had to purchase dedicated IOPS or vCPUs.
Which takes them off the list of providers you can do this with. They could have reasonable prices, but they don't.
You couple to AWS with your whole system architecture or you stay away.
If you tell me I need to go 250 miles per hour to get to the grocery store then I guess I have to buy a Bugatti.
The advantage of the cloud was supposed to be elastic compute, but what this thread is trying to claim is that's not the case.
Can't carry much in a Bugatti.
It's a real problem if you build on a VPS as your base. I was asked to prep such a system for scale on AWS and choked on the pricing. So we settled for what they offer at reasonable pricing and if further growth is needed, will need to hit eject on AWS and just go elsewhere.
I recommend starting elsewhere and saving yourself the migration, if you're just going to use EC2.
Most places will let you scale a VPS quite well, and there is a logical handoff to your own hardware, where you can scale to a system with 512 vCPUs and TBs of RAM for less than Amazon is charging for step two.
Or maybe you've never worked in certain problem domains? Not everything is a web app.
This. AWS can be as simple or complex as you want it to be.
Just not cheap.
It's definitely cheap(er) if you have very predictable load patterns and need to scale up and down fast.
Yeah, I have to strongly disagree with that. Sure, AWS can be stupidly expensive, however it can also save you a lot of money.
A few years ago, I was the VPoE for a small-ish startup in Denmark. I was the only ops person, and I was able to provide 100% uptime for a year, including a migration from k8s clusters, AWS AZ outages and whatnot. During that time, I also reduced our monthly spend from $15k to $5k. This was a fully auto-scaling, redundant and highly available system.
When I started, every night I got woken up by alerts (working alongside the CEO and another engineer to fix things). By the time I had stabilised things, the only thing waking us up were our providers having outages.
I used to run a similar business earlier, without AWS. We owned our own hardware and had ops people going to data centres. I can guarantee you we did not spend less than $12-13k/month (AWS fee + my salary) in that company. Think closer to $100-150k/month.
AWS can be cheaper when you factor in the cost of employees. It can also be stupidly expensive when you use it the wrong way.
Another example: I host a small service that gets very seldom use. Maybe 20-50 people discover and use it per month. I have the backend running on a Lambda, and it costs me about $0.50 per month. It took me 20 minutes to write the CloudFormation for that and push it through my CI pipeline to have it deployed. There is no way I could get cheaper hosting, uptime/availability, and faster time to market than with a Lambda.
If and when that service becomes more used where it warrants running full time, I’ll rewrite the request handling and throw it in my k8s cluster. But until then, I do believe this is the cheapest solution (for me).
I have a similar story, though from almost 10 years ago, so maybe things have changed. As VPoE for a small startup with mobile messaging apps (something like a couple million MAU and ~200M messages/month), I orchestrated a lift-and-shift from a top tier managed hosting provider to AWS EC2. The result was about a 60% drop in monthly spend on hosting. We used almost nothing except S3 and EC2, with some provisioned IOPs for our SQL backend.
That said, it was a long time ago, and I'm sure competition and prices have changed dramatically.
Yeah, that Lambda setup you mention is terribly cheap, and it is one of the very few extremely attractive services price-wise. But many people get lured into using other AWS services that aren't cheap, even though Amazon makes them seem so by blurring the total price: splitting it into hours, calculating egress separately, dependencies on services that are billable by the hour (it's not obvious when you start that a private network has the cost of a NAT GW attached to it). Using only Lambda is really cost-effective, but frankly, few people resist the temptation and use it the way you described.
> it's not obvious when you start that a private network has the cost of a NAT GW attached to it
It’s not actually required. And I don’t mean the completely unrealistic “if you run stuff on your private network that never needs to connect to the internet, then you don’t need a NAT gateway!”. I just mean you can implement routing to the internet in cheaper ways.
If you don’t need 100Gbps highly available outbound traffic routing like a NAT gateway provides… you run a t3.nano instance doing forwarding and NAT that will sustain 30Mbps and burst to 5Gbps for $3.80/mo. Just on raw bandwidth, unless you need more than a t3.large can push (~500Mbps continuous, 5Gbps burst), it’s cheaper to run an EC2 instance as your gateway rather than use a NAT gateway.
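Rough monthly math, using approximate us-east-1 list prices (verify current pricing before relying on this; normal egress charges apply to both options): a NAT gateway runs about $0.045/hr plus $0.045 per GB processed, while a t3.nano is about $0.0052/hr with no per-GB processing fee.

```shell
hours=730       # ~hours in a month
data_gb=500     # illustrative monthly traffic through the gateway

# NAT gateway: hourly charge plus per-GB data processing
natgw=$(awk -v h="$hours" -v g="$data_gb" 'BEGIN { printf "%.2f", h*0.045 + g*0.045 }')

# t3.nano NAT instance: hourly charge only
nano=$(awk -v h="$hours" 'BEGIN { printf "%.2f", h*0.0052 }')

echo "NAT gateway: \$${natgw}/mo   t3.nano NAT instance: \$${nano}/mo"
```

At these assumed numbers that's roughly $55/mo vs $3.80/mo, and the gap widens with traffic because only the NAT gateway bills per GB processed.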
> ... I also reduced our monthly spend from $15k to $5k. This was a fully auto-scaling, redundant and highly available system.
I believe you, but most people don't do this, sadly. You very clearly _get_ Cloud, but most don't, and financial optimisations often come much later down the line.
If you want to get your product out fast, it's a lot cheaper than paying a bunch of developers to roll up everything for you on bare metal.
Exactly this. The only thing unmentioned, but that you should be aware of, is the effort to deploy this stack from scratch. You can assume the DB survives, or doesn't, but the ability to go from zero to a running deployment from a recent restore is probably the only thing you need to really worry about. If you can get that down to less than an hour, or ideally a few minutes, then you're golden.
Don't be afraid to add CDN/caching when it's appropriate.
You can always buy a beefier instance, or a dedicated box.
Deal with the scale later, when it's truly needed. Give yourself permission, or plan to hire a team, to handle the scale issues later.
This is the most important advice for an early stage startup trying to scale.
Being able to automatically build production from a recent snapshot is a profoundly important point.
When we bootstrapped on the cheap we would always maintain a production infrastructure at one company and a test infrastructure at another with a relatively low TTL dns infrastructure hosted at a third party like dnsmadeeasy. Backups would be pushed to the second site. We then would “build our demo” from the backup data (stripping PII at runtime). But the demo served as a second clone production infrastructure. If the primary went down, it was always a quick restore and scale up at the secondary location with recent backups!
Worked really well, but you needed to make sure it was automated or it would blow up.
Definitely cheap and works like a charm.
I’m sure that’s over complicated for a lot of people, but it’s really just a few scripts and a couple of days to set up, and you can be up and rolling with a working system in the time it takes to change DNS.
There's nothing wrong with feeling this way, but you're comparing a very small company (your own) to a much larger corporation.
The appeal of AWS' "insane" bills is that they are not so "insane" when compared to the salaries required to maintain a homegrown stack in-house, especially as complexity increases.
It's not just application complexity, it's organizational complexity. Doing everything yourself is fine until that complexity increases and you start finding out that all your dozens or hundreds of expensive employees are spending a lot of time on home grown overhead.
For example, when you use shell scripts for your deploys, you can't as easily prove things to auditors like you can with a CI/CD system that's tied in to role based access control. That sort of thing will be dealbreaker for customers who expect your company to maintain industry compliance standards.
When you use a product like RDS you can tell your customers that Amazon is the one responsible for handling backups, security patching, etc.
Also keep in mind that anyone with a significant amount of AWS infrastructure should be buying reservations and savings plans. The sticker price of AWS is not what corporations pay.
As you grow your company, I encourage you to focus on the activities that make your business profitable and deliver value to customers. Every hour you spend upgrading a database or patching an operating system is an hour you could have been spending on developing a unique feature in your product that nobody else has.
For now, I'm sure using this setup is much better than getting something too complex for your size, but that can change quickly if you're lucky enough to reach a larger scale.
You make it sound as though there’s some obscene sprawl of interrelated tooling, barely held together with shell scripts, and only OP knows how it all works.
Nothing described is that weird, nor should it have difficulty horizontally scaling when and if the time comes - add another node, and front it with a load balancer.
The problem is that DevOps has shifted so heavily to Dev that “I’m running services natively on a Linux box” is somehow seen as Byzantine and arcane.
> when you use shell scripts for deploys… can’t as easily prove things to auditors as you can with a CI/CD system
What do you think a CI/CD system is running? Also, if /var/log/auth.log isn’t enough, there are other auditing systems available that could make this more granular.
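A shell deploy can leave an audit trail, too; a minimal sketch, where the log path and the deploy steps are placeholders:

```shell
#!/bin/sh
# deploy.sh — a minimal deploy wrapper that leaves an audit trail.
# AUDIT_LOG and the deploy steps are placeholders for illustration.
set -eu

AUDIT_LOG="${AUDIT_LOG:-./deploys.log}"
SHA="$(git rev-parse HEAD 2>/dev/null || echo unknown)"

# Record who deployed what, and when, before doing anything else.
printf '%s user=%s sha=%s\n' \
  "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$(whoami)" "$SHA" >> "$AUDIT_LOG"

# ... actual deploy steps go here (build, rsync, restart service) ...
```

Point an auditor at the log, or ship it somewhere append-only if they want more.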
> industry compliance standards
IME, these are a joke, and auditors routinely miss a dizzying amount of glaring problems, because they don’t look beyond records that humans generated.
> every hour you spend upgrading a database or patching an OS…
Patching the OS should be automated. If it isn’t, that’s on you. Ansible isn’t hard to learn.
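Even without Ansible, fleet patching is a loop over an inventory file; a sketch where `hosts.txt` and the apt commands are assumptions (adapt to your distro):

```shell
#!/bin/sh
# patch_all — run a package upgrade on every host in a plain-text inventory.
# hosts.txt (one hostname per line) and the apt commands are illustrative.
patch_all() {
  hosts_file="${1:-hosts.txt}"
  while read -r host; do
    [ -n "$host" ] || continue            # skip blank lines
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "+ ssh $host apt-get -y upgrade"   # preview only, no ssh
    else
      ssh "$host" 'sudo apt-get update -q && sudo apt-get -y upgrade'
    fi
  done < "$hosts_file"
}
```

Put it on a timer, or graduate to an Ansible playbook when the host list stops fitting in your head.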
As to the database, if you can't read the docs (both MySQL and Postgres have excellent documentation on these procedures) and follow them as written, frankly you shouldn't be dealing with RDS either. It's not as though AWS's docs aren't confusing, spread over a million pages, and sometimes contradictory.
Pardon the seething undertone, but as an ops-heavy SRE/DBRE, I’m very tired of seeing people lambast ops as being somehow beneath modern practices, not worthy of their time, or worse, not cost-effective. A well-written app can absolutely run just fine on a tiny server, and does not need the miasma of shit that is cloud-native. Computers are blindingly fast. Stop demanding infinitely-scaling vCPUs because optimization is hard. Stop pretending that your time is more valuable than minutiae like “my ORM is producing garbage queries resulting in hideous latency because I don’t understand SQL or schema design.”
Try to see the forest for the trees here: ask yourself, how does the business make money?
If I throw my application on some PaaS thing and put the database on RDS, let’s say I can hire 5 developers who know almost nothing about infrastructure to develop and deploy the application. Each developer working on the application delivers valuable business logic. Let’s say each developer makes $200,000 total compensation and they bring in $500,000 in revenue. This setup makes me $1,500,000 in operating income per year.
Now you’re saying I should find someone who knows enough about infrastructure architecture, database administration, and all the other bare-metal Linux fundamentals to manage it in-house, rather than paying AWS all this wasteful money.
So now I have 4 developers and 1 Ops/DBA person. My AWS bill was eliminated, saving $100,000 thanks to the infrastructure and database expert doing a wonderful job cost optimizing.
But now my 4 developers have one less person to deliver features that motivate customers to sign deals with us, so now we’re making $1,300,000 in operating income (losing $300k from the loss of a developer and gaining $100k back from cost savings).
Obviously this is a made-up scenario in my favor, but that’s basically how it works. Businesses shopping for employees can’t get every skill they want in a job description. One of the most important parts of business strategy is making trade-offs.
This scenario reminds me of a landlord I had who fixed a simple problem with my dishwasher by just replacing it with a new one. They didn’t give a shit that a little bit of extra knowledge and a spare part would have fixed it for, no exaggeration, 100x lower cost, because resolving the problem quickly with no risk was worth more to them. In any event, I was giving that landlord many times the cost of that dishwasher every month.
The compliance certifications are kind of like the real estate agent that has a big car payment on their Mercedes so that they can sell more expensive houses to wealthier clients. Yes, we all know that buying a Mercedes isn’t the most cost effective way to get from point A to B, but you can’t sell mansions if you show up in a Kia Rio.
So hire an SRE. You’re gonna have garbage / non-existent CI/CD, IaC, and all the other stuff that makes expansion (and audits, apparently) easier. Bonus, any decent SRE should also be able to code.
I don’t think you _need_ a DBA/DBRE initially, but I do think if you’re in the range of hiring five devs, one of them should know about infrastructure.
In the mid-1990s, a large ISP I am aware of rented "pizzabox" and larger Sun SPARC systems (like SPARCstation 2s, though even the larger systems ran the same OS) to corporate clients.
The team of 5 sysadmins set up, installed, secured, and managed 2,000 such systems, mostly running SunOS 4.1.x, which didn't have nearly the automation that we have now on newer Unix-based OSes. They did it using only the bash shell and the many scripts they wrote for managing the fleet. Guess the profitability of that kind of setup? 1 sysadmin per 400 systems... the kind of overhead people think exists, doesn't.
"A large ISP" will have the money to hire a team of 5 experienced sysadmins to be 24/7 on-call to handle 2000 systems.
A small-to-medium business won't have the $500k to $1M per year to pay the sysadmins' salaries.
For them AWS or other managed services are the correct choice.
Wouldn't it still be 1 sysadmin per 5 systems instead of 400? It's not like you can hire 1.25% of a sysadmin directly, while on AWS you can; you just pay a significant premium.
Sysadmin consulting exists.
Sounds like you've answered your own questions.
You have it figured out.
If you are worried about redundancy, get a second server at a different company with the server in a different part of the country, and back up to it every hour (incremental backups) with a full backup every night, or whatnot.
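That hourly-incremental/nightly-full rhythm needs nothing fancier than GNU tar's incremental mode; a sketch, where `DATA_DIR` and `BACKUP_DIR` are hypothetical paths (ship the archives to the second server afterwards):

```shell
#!/bin/sh
# Hourly incrementals plus a nightly full, using GNU tar's incremental mode.
# DATA_DIR and BACKUP_DIR are hypothetical; replace with your real paths.
DATA_DIR="${DATA_DIR:-./data}"
BACKUP_DIR="${BACKUP_DIR:-./backups}"
SNAR="$BACKUP_DIR/state.snar"   # tar's incremental state file
mkdir -p "$BACKUP_DIR"

full_backup() {
  rm -f "$SNAR"   # resetting the state file forces a full archive
  tar --listed-incremental="$SNAR" -czf \
    "$BACKUP_DIR/full-$(date +%F).tar.gz" "$DATA_DIR"
}

incremental_backup() {
  # Only files changed since the last run end up in the archive.
  tar --listed-incremental="$SNAR" -czf \
    "$BACKUP_DIR/inc-$(date +%F-%H%M%S).tar.gz" "$DATA_DIR"
}
```

Cron the incremental hourly and the full nightly; for Postgres specifically you'd dump or use WAL archiving rather than tarring live data files.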
You can definitely avoid the hyperscalers, but unless your ops are very simple, you'll end up paying for it with the cost of managing your infra yourself. Think about resiliency for example. What happens if the DC you're in goes down? Have you got another DC to direct traffic at? Do you need that? These sorts of questions are the ones that being in "the cloud" makes simpler to answer. I'm of the opinion that not every app needs 99.99999% uptime and not every app needs to be in the cloud though, so godspeed to you should you continue with this approach. There's no technical reason why you can't dramatically scale from this point.
Imo IAM aka OIDC is keeping it simple. Do you want to develop your own auth solution, or focus on features of your service?
For the rest, of course. AWS, GCP, Azure, all too expensive.
Chances are very high you won't even need more than 3 machines, if you'd like to play it safe, 1 if downtime is acceptable.
The challenge is long-term. You need to be able to switch easily after the EOL of your current OS version. Data is the expensive part: it requires constant backups and snapshots. If you can afford it, have a dedicated machine or cluster for data. This way you can swap out the web/API layer when you receive better offers over the years (cheaper hardware, EOL OS, etc.). Do think in blocks or components and layers, but I'm guessing you do that anyway.
What cloud does is it adds convenience for ridiculous costs.
Personally I have an incubation server where all trials and or developing projects are hosted. Once and if something takes off, it receives dedicated hardware. In the unlikely case that this dedicated setup isn't enough I'll start thinking about cloud. But even in that case, I'm pretty sure, a private cloud with dedicated personnel is cheaper than AWS etc
You pay a premium for:
- Lower chance of fatal bugs/downtime
- More reliable load balancing
- Proven backup strategies
- Not having to learn everything
I've spent 7 years at a "unicorn" and the first five years we used Digital Ocean VPSs for everything. My main takeaways are that the smaller providers don't have "real" load balancing and that you absolutely should not manage your own databases or logging/metrics, it's a pain in the ass unless you have a team for it. I spent countless hours on learning infrastructure instead of building the product. It worked out in the end, but we would have gotten there much quicker if we'd paid the premium.
If I did it again I'd still use VPSs, but with one of the big players and pay for dbs and observability.
How simple are you willing to go here? One thought that occurred to me was a reverse ssh proxy into your phone or a small always-on computer at your home. Perhaps both?
>I'm also enjoying staying sane avoiding IAM.
This sticks out to me. IAM isn't crazy, and having access controls in place is going to be the very first thing you want to do when you bring someone else on board. Maybe you're not there yet, but it would be a wise time investment to understand how these work, to the point where they don't feel confusing.
For a lifestyle business that's never going to grow into needing enterprise offerings, IAM is overkill, like k8s would be for HN or another simple web forum.
I disagree. IAM is like having a house with doors that lock. It doesn't matter how big or small your house is, you're going to want the ability to restrict who can go where and do what. It's part of basic security.
If they're not doing that, and their 1 server is talking to AWS resources, it means it's using superadmin credentials. If that 1 server is compromised, can you see why that would be a bad idea?
But does not using IAM mean that they are not doing anything to handle credentials securely?
To keep with the house analogy, not using IAM is like building your own locks for your house.
It might be enough for now, but if you grow big enough to be a target it's very likely your home-spun lock won't stand up to professionals.
I think it's more like figuring out where to put the locks, or installing them yourself, vs. building the locks.
Most things have accounts (database, servers). Are you using separate ones or a global admin?
To use the house analogy, it's more like a shed vs a mansion. How many doors are there? How many different keys are there for them, and is there a master key? If you have a shed, there's only going to be one key, for the door. Maaaaaybe a key for the safe inside. If it's an extravagant mansion with three kitchens and two garages, there's a key for the front door, a key for your bedroom, a key for the safe room, a key to the wine cellar, a key to the garage, a key to the IT closet, a key to the office, a key to the... you get the point.
If it's only ever going to be a shed, there's no need for the infrastructure to support that many keys.
Thanks, that makes sense. What does IAM offer that a stand-alone service like Keybase doesn’t? At what scale of project does it make sense to use one over another?
How do you secure AWS credentials on a compromised machine?
Any solution that you build is going to be more complicated and less secure than IAM. In IAM, your workload/server can have an identity. The software running on the server is issued temporary credentials as needed, and only has access to resources linked to the role. How do you do this without identity and access management? Roll your own because IAM is too crazy?
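For what it's worth, the moving part here is small: with a role attached to the instance, the app pulls short-lived credentials from the instance metadata service instead of storing keys anywhere. An illustration using IMDSv2 (this only works from inside an EC2 instance, so don't run it locally):

```shell
# Get a session token for IMDSv2, then ask which role is attached
# and fetch its temporary credentials. Runs only on an EC2 instance.
TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" \
  -H "X-aws-ec2-metadata-token-ttl-seconds: 300")
ROLE=$(curl -s -H "X-aws-ec2-metadata-token: $TOKEN" \
  "http://169.254.169.254/latest/meta-data/iam/security-credentials/")
curl -s -H "X-aws-ec2-metadata-token: $TOKEN" \
  "http://169.254.169.254/latest/meta-data/iam/security-credentials/$ROLE"
# The response is JSON with AccessKeyId, SecretAccessKey, a session Token,
# and an Expiration; the SDKs do this rotation for you automatically.
```

Compromising the box still only gets an attacker whatever that one role is allowed to touch, which is the whole point.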
You can restrict different services to different users, or if using containers, give access based on which container. IAM is not the only way to have shared, controlled access.
I'm not hugely familiar with IAM, but can't you get basically the same, for free, using Cloudflare Zero Trust or even Tailscale?
How many RPS are you serving? What about a caching layer for assets in the Svelte app?
37signals is living proof that nobody needs a public cloud.
And from my own experience, basically what all cloud providers are relying on is ops being lazy.
Set up some decent hardware in a good data center, set up a good platform on top of it, automate the hell out of it and with all current open source software you can run everything by yourself.
At least, that's something I am currently actively researching, because our cloud bills are going through the proverbial roof and I would rather spend that money on our own stuff than on Jeff's next house.
> And from my own experience, basically what all cloud providers are relying on is ops being lazy.
This makes no sense to me. I've worked jobs where the app was hosted entirely on-prem and jobs where it's hosted in AWS. By far, the AWS shops required more work. There are so many more limitations, random footguns, arbitrary rate limits, capacity issues, and "cost optimization" exercises, because it happens that what is simple and maintainable is expensive. Once you have a setup you like on-prem, it will hum along until you decide to break it.
S3 is one of the 7 technical wonders of the world, but I would rather run Postgres/Redis/Kafka/RabbitMQ on my own hardware than use the AWS managed service, given the choice.
The premise of cloud computing is click and click... and you magically have a fully operational database up and running without ever thinking about the important bits like scaling and storage, because that's all managed away for you (as an example).
That's what I mean with that.
Funny thing is that you can easily do that on your own hardware using a solution such as KubeDB just as well these days, after you get through the pain of getting Kubernetes on bare metal operational.
To be fair, 37signals still has terabytes of data in S3, according to DHH... so I don't know if it's fair to say that they're cloudless...
Terabytes of data in S3 is not a lot, and can be replicated easily on-prem.
Is this chatGPT?
You can choose a middle ground between AWS and bare metal, something like DO Apps. They provide managed deployments, DBs, object storage, etc.