2026-04-25
i have been building side projects for 6 years. before i knew how to code, i had an idea: there was a government site where you could look up a college and see which high schools its students came from. but there was no reverse lookup: pick a high school and see which colleges its students went to. even the school administrations didn't know where their students ended up. so, how about scraping all of it, saving it somewhere, and making it searchable by high school instead of college? i built this in 2.5 months with flask, html and sqlite and published it. i scraped the data with selenium and saved it to sqlite (at that time i didn't know about ssr or spa stuff, and i realized a year later i could have scraped it by literally making http requests and parsing the html; see the sketch below). i published it on godaddy, on a cpanel server with 512mb of ram, by watching tutorials. a couple of months after launch came high school selection day, and on some days i got 1k visitors per day. that was mindblowing. i made something, and strangers visited my website, spent 5 minutes there, and analyzed high schools. that made me realize i would be a builder.
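for the curious, a minimal sketch of the requests + html parsing version i wish i had written (the url and the table layout here are made up; the real site was different):

```python
# plain http requests + html parsing instead of a whole selenium browser.
import requests
from bs4 import BeautifulSoup

def scrape_placements(page: int) -> list[dict]:
    resp = requests.get(
        "https://example.gov/placements",  # hypothetical endpoint
        params={"page": page},
        timeout=10,
    )
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "html.parser")
    rows = []
    for tr in soup.select("table tr")[1:]:  # skip the header row
        cells = [td.get_text(strip=True) for td in tr.select("td")]
        rows.append({"high_school": cells[0], "college": cells[1]})
    return rows
```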
after this i started learning more about the web. i learned js and built some client-side toy stuff. then i accepted an internship at a startup. they said i needed to learn react. i had zero react knowledge, so for 2 weeks i watched tutorials and built mini components. then they shared the prod codebase with me, i figured out some parts by myself and edited them, and started developing with the team: git branches, merges, all of that. after some time the team (2 founders and me) was talking about terraform, ansible, aws, microservices, running python with bert embeddings and connecting it to a scala backend. that got me researching infrastructure as code and embeddings (this was mid 2022). i left the startup because of my university lectures (mathematical engineering at a technical university); i couldn't handle the internship and school at the same time.
i spent my free time on aws: building lambda functions, building backends with sst, building complex crud apps with dynamodb, lambda, cognito etc, making multi-regional FULLY SCALABLE STUFF BRO. but your database still lives in one region, so you make that whole round trip anyway, and you add cold start problems on top. to keep a function warm, you ping it so one instance stays alive (sketch below). why not just use a single ec2 behind an autoscaling group to solve the scale problem? anyway. after all that time building literally nothing, i took a break and decided i didn't want to be a software engineer. reading novels, literary magazines, poems, and solving partial differential equations.
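the keep-warm trick i'm complaining about looks roughly like this; the {"warmup": true} payload is my own convention here, not anything aws-standard:

```python
# a lambda keep-warm sketch: a scheduled eventbridge rule fires every
# ~5 minutes with {"warmup": true}, and the handler returns immediately
# so one container stays resident instead of cold starting.
import json

def handler(event, context):
    if isinstance(event, dict) and event.get("warmup"):
        return {"statusCode": 200, "body": "warm"}  # short-circuit the ping

    # ... real request handling would go here ...
    return {"statusCode": 200, "body": json.dumps({"ok": True})}
```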
like 4 months later, after a fight i got into (mom i swear he started it), i made some calls to my lawyer intern friends to get information. not a big deal, just asking about scenarios like "what if this happened". and that triggered a thought: what about someone with no lawyer friends? they need that info too, right? okay, talked to friends and decided to build a rag around it. this was 2023; gpt was dumb and hallucinated hard on case law and statutes, so hard. back in the hacker mindset, i went through all the law-related government websites, checked the network tab and the endpoints. made a request and boom: i could scrape 40 documents a minute. so i rented 10 droplets with digitalocean free credits, and scraping 10 million documents took 2-3 weeks. and i did it: postgres running on a hetzner bare metal vps, everything inserted there. then i spun up a gpu server with azure free tier credits, generated the embeddings, and saved them back to postgres.
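roughly what those droplet scrapers were doing; the endpoint and the response shape are hypothetical stand-ins for the real government apis:

```python
# hitting the json endpoints found in the network tab directly,
# a few requests in flight per droplet.
import requests
from concurrent.futures import ThreadPoolExecutor

API = "https://example.gov/api/decisions/{doc_id}"  # hypothetical url

def fetch(doc_id: int) -> dict:
    resp = requests.get(API.format(doc_id=doc_id), timeout=15)
    resp.raise_for_status()
    return resp.json()

def fetch_batch(ids: list[int]) -> list[dict]:
    # 40 docs/minute needs barely any parallelism per machine
    with ThreadPoolExecutor(max_workers=8) as pool:
        return list(pool.map(fetch, ids))
```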
the database got big. i tried running pgvector and it was slow as hell; the vectors were 1024-dim i think, so i looked at other options. i learned qdrant and weaviate and decided to go with hybrid search. but weaviate doesn't support turkish tokenizers and characters, so i went with separate elasticsearch + qdrant. when a user queries with filters, the same query goes to both. if elasticsearch returns a document that qdrant didn't, the backend layer reranks it into the results, and the same for the ones only qdrant returned. i built some crazy stuff for accuracy and even used a reranker. all of this took me about a year, more like a side project while going to university (the hardest year: advanced lectures plus the lectures left over from the year i was interning). i also published a paper on fuzzy logic and neural networks, and hit diamond in league of legends. all of this with zero consumer development. then i shared the project with my friends and boom: they queried things like "what is the 4th criminal law", or "give me everything containing this but not that". my whole pipeline was not optimized for those kinds of queries. i tried to educate my users, because they were used to gpt-3 and talking to an llm like that, but my chat wasn't one. i could have combined it with an llm api, but the cost would have been crazy. and i failed. i was renting a 60 dollar/month bare metal hetzner, and the system didn't support continuous scraping. as a poor student, i shut the website down after about 2 months.
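the merge step looked something like this. i'm using reciprocal rank fusion here as a stand-in for whatever my actual backend reranking did, and the index/collection names are made up:

```python
# hybrid search sketch: the same query goes to elasticsearch (keyword,
# turkish analyzer) and qdrant (dense vectors); the two result lists
# are merged with reciprocal rank fusion.
from elasticsearch import Elasticsearch
from qdrant_client import QdrantClient

es = Elasticsearch("http://localhost:9200")
qd = QdrantClient("localhost", port=6333)

def hybrid_search(text: str, vector: list[float], k: int = 20) -> list[str]:
    es_hits = es.search(
        index="laws",  # hypothetical index name
        query={"match": {"body": text}},
        size=k,
    )["hits"]["hits"]
    qd_hits = qd.search(collection_name="laws", query_vector=vector, limit=k)

    scores: dict[str, float] = {}
    for rank, hit in enumerate(es_hits):
        scores[hit["_id"]] = scores.get(hit["_id"], 0.0) + 1 / (60 + rank)
    for rank, point in enumerate(qd_hits):
        doc_id = str(point.id)
        scores[doc_id] = scores.get(doc_id, 0.0) + 1 / (60 + rank)
    # documents found by both engines bubble up; singles still survive
    return sorted(scores, key=scores.get, reverse=True)[:k]
```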
but through this last project i somehow reached the ceo of a serverless database provider company and we had a zoom call. i told him what i had built, and he gave me a 1-week task: "we have this search but accuracy is bad, fix it". that was it. i cloned the repo, he gave me 2 .env variables, and no messages after that. a week later we talked again: i had literally fixed their problem and made it more accurate. we kept going. 2 weeks later i accepted an internship at an autonomous unmanned aircraft company, on their artificial intelligence infrastructure team. i thought it would be cool af. 3 days after i accepted, the serverless db company made me a job offer. the pay was like 4x what the aircraft company offered (still not that much, but pretty decent for a student). he even said to do both. i accepted, but 2 weeks in, the turkish aircraft internship was taking 14+ hours per day with the commute, 4 days per week, on top of a couple of hard university lectures (complex analysis, optimization, thesis and more), and i needed to finish university. i couldn't keep up, so i dropped the db company. biggest mistake i've made in my whole life. the aircraft job then had its own problems (a whole other blog post), but i finished the 4-month internship there, finished my college lectures and went back home.
there were 2 options: go find a job at a bank or a local startup, or found my own company. i chose the second. i talked with my family and they were incredibly supportive. the idea was simple: i can scrape data and build an application around it. that's it. scrape data, store it, make vectors, and build search across all of it. what would the application layer be? idk, we'll see; probably not a chatbot, more like showing it in dashboards.
i started by literally using aws lambda to fetch data and push it to backblaze b2 (s3-compatible). yes, literal .json files, the dumbest thing i've ever made. i optimized the system to force cold starts for fresh ips after hitting rate limits, because i wanted it fast and didn't think much about infra. after burning 400 dollars of credit (i opened free tiers on my family members' accounts), i had about 15 million .json files in b2. then came reading those jsons and converting them to parquet; the reading alone took 2 weeks because backblaze is slow as hell. i wasted a lot of time exporting, converting to parquet, and still keeping the data in b2, and since i had scraped in 2 separate runs there were 2 different schemas, so merging was a pile of problems and conflicts. i am literally embarrassed writing this.
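the conversion, in miniature; b2 speaks the s3 api so boto3 works, but the endpoint and bucket here are placeholders:

```python
# drain .json objects out of b2 (s3-compatible) and batch them into
# a parquet file. the two scrape runs having different schemas is
# exactly the part this sketch glosses over.
import json
import boto3
import pyarrow as pa
import pyarrow.parquet as pq

s3 = boto3.client("s3", endpoint_url="https://s3.us-west-004.backblazeb2.com")

def jsons_to_parquet(bucket: str, keys: list[str], out_path: str) -> None:
    records = []
    for key in keys:
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        records.append(json.loads(body))
    pq.write_table(pa.Table.from_pylist(records), out_path)
```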
those b2 parquets got pushed to a google cloud bucket, and i used bigquery to convert the data structures and normalize everything (gcp free credits). then i decided to move to cloudflare r2 for better performance and zero bandwidth cost, because i kept moving data from one provider to another chasing free credits. at that point the system was: no real-time data scraping, no working database, nothing but 15 different buckets with shitty bucket naming (organizing them later took me a couple of days), absolutely no v1/v2 versioning, and some low-performance scrapers written as nodejs scripts. this was my 3rd-4th month. the problem was clear: no infra, no clear plan, just wanting to ship fast, overworking, and no checkpoints.
then i realized i couldn't serve 1 billion vectors from qdrant either: one big bare metal box isn't enough, and the data would keep growing. i could rent multiple servers for horizontal scaling, but each 64gb box costs about 60 euros, so as a no-money guy i looked for a different approach. first i tried lancedb, to run vector search directly at the s3 layer. after some tests i found a memory leak around 200m vectors, and with lancedb, updates mean appending vectors on top of the index; after enough inserts you have to rebuild the index, and rebuilding over that many vectors needs a lot of ram. for 1 billion you'd need 5 of these. so for every 200m vectors you can create a new table, search all 5 in parallel, and rerank afterwards. i made a microservice for it.
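the microservice idea, sketched, assuming 5 tables of ~200m vectors each (the uri and table names are mine):

```python
# search 5 lancedb tables in parallel and merge the hits by distance.
import lancedb
from concurrent.futures import ThreadPoolExecutor

db = lancedb.connect("s3://my-bucket/lance")  # placeholder uri

def search_shard(name: str, vector: list[float], k: int) -> list[dict]:
    tbl = db.open_table(name)
    return tbl.search(vector).limit(k).to_list()

def search_all(vector: list[float], k: int = 10) -> list[dict]:
    shards = [f"vectors_{i}" for i in range(5)]  # one table per ~200m vectors
    with ThreadPoolExecutor(max_workers=5) as pool:
        results = pool.map(lambda s: search_shard(s, vector, k), shards)
    merged = [hit for shard_hits in results for hit in shard_hits]
    # lancedb attaches an "_distance" field; smaller means closer
    return sorted(merged, key=lambda h: h["_distance"])[:k]
```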
but then i realized i don't need to query all 1 billion, because my texts are ~50% duplicates and the rest are very similar to each other, and for my project i don't need the single most similar text; i need something more like counts per topic. (you can see there was still no application layer in my mind; i was just building a data layer that could sit under any application layer.) i created 1 million cluster centers with a custom script using faiss etc. then i took the most representative texts for those 1 million clusters and sent them to a local 7b llm rented on runpod to label them. so i had about 1 million labeled clusters that could live in any db you like. for each incoming text i create an embedding, snap it to the nearest cluster via a faiss index, and take that cluster's label. and every text gets hashed: if a new text matches an old hash, skip the embedding and take the label directly. labels are never recreated, since there are only 1 million of them. i write to clickhouse in batches of 10k spread over time. up to this point i still had zero continuous data fetching.
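the hash-then-label flow in miniature. the cache is a plain dict and the embedder is a fake here; in the real pipeline both were backed by actual services:

```python
# duplicates are caught by hash and skip embedding entirely; new texts
# get embedded and snapped to the nearest cluster centroid in a faiss
# index, inheriting that cluster's llm-made label.
import hashlib
import numpy as np
import faiss

DIM = 1024

# demo setup: 1000 random centroids stand in for the real 1m clusters
centroids = np.random.default_rng(0).standard_normal((1000, DIM)).astype("float32")
index = faiss.IndexFlatIP(DIM)
index.add(centroids)
labels = [f"topic_{i}" for i in range(1000)]

hash_cache: dict[str, str] = {}  # text hash -> label (a real store in prod)

def embed(text: str) -> np.ndarray:
    # stand-in for the real gpu embedding model
    r = np.random.default_rng(abs(hash(text)) % (2**32))
    v = r.standard_normal(DIM).astype("float32")
    return v / np.linalg.norm(v)

def label_for(text: str) -> str:
    digest = hashlib.sha256(text.encode()).hexdigest()
    if digest in hash_cache:       # seen this exact text: skip the gpu
        return hash_cache[digest]
    vec = embed(text).reshape(1, DIM)
    _, ids = index.search(vec, 1)  # nearest centroid wins
    label = labels[ids[0][0]]
    hash_cache[digest] = label
    return label
```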
then i built the continuous part: rented t3a.nano instances at aws in 10 different regions with rotating ipv6 fleets. as i said, the scraper code was nodejs, built on a library with like 5k github stars. then claude gave me 100 dollars of credit, and i thought,
"how about making these libraries rewritten in rust."
yes, that magical thinking: rewrite it in rust. i opened claude web, gave it the cloned repo, and said exactly that. 30 minutes later it was ready, and almost one-shotted (to be fair, the nodejs library was mostly a wrapper, but still impressive). my scraper went from 300mb of ram to 40mb, so on a 512mb nano i could run 10 in parallel. that alone was like a 10x improvement. i used amazon linux or something like it that supported a 2gb ebs volume.
i set up the whole system with terraform.
on another server (this is a microservice, bro): postgres and rabbitmq (because it supports priority queues, which i think nats and kafka don't). the rust scrapers connect to rabbitmq and postgres, read messages from rabbitmq, and check the data hashes: if the same text already exists, we don't need to create embeddings. if it's new, it gets appended to a local gzipped .bin file, and a cron job pushes that to r2. another cron job spins up a runpod gpu server every 12 hours, which reads the data from r2, creates embeddings and labels, and pushes them back to r2; yet another cron job reads from r2 and pushes everything into kafka and clickhouse. as you can tell, there are a lot of moving parts, which i think is bad practice, and a debugging hell; i probably should have used upstash workflows or something like it. and as you can guess, i don't have a 24/7 gpu server running, because money, and clickhouse runs mounted on a hetzner storage box via juicefs to keep storage costs minimal.
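the ingest worker, roughly. the priority queue part is real rabbitmq (the x-max-priority argument); the queue name and payload are mine, and a set stands in for the postgres hash check:

```python
# consume from a rabbitmq priority queue, drop texts whose hash we've
# already seen, append new ones to a gzipped local file that a cron
# job later ships to r2.
import gzip
import hashlib
import pika

seen: set[str] = set()  # stand-in for the postgres hash table

conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
ch = conn.channel()
ch.queue_declare(queue="scraped", arguments={"x-max-priority": 10})

def on_message(channel, method, properties, body: bytes) -> None:
    digest = hashlib.sha256(body).hexdigest()
    if digest not in seen:  # new text -> queue it for embedding
        seen.add(digest)
        with gzip.open("batch.bin.gz", "ab") as f:
            f.write(body + b"\n")
    channel.basic_ack(delivery_tag=method.delivery_tag)

ch.basic_consume(queue="scraped", on_message_callback=on_message)
ch.start_consuming()
```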
and it started working. and boom: i thought my aws bill would be like 100 dollars of compute, since i was just renting vcpu, ram and ebs. it was more like 300 dollars, while i had 0 free credits. i realized i was scraping terabytes of data, gzipping it and pushing it to rabbitmq, but data egress is still expensive, because my bare metal and vps boxes are at hetzner: one for the scraping microservice, one for clickhouse + redpanda, one for the app backend layer. i spent a week thinking about how to reduce it. i asked the llms, and all 3 (gpt, claude, gemini) said "decrease the worker count" :D or "move to another provider", but nothing useful. then i asked the important question: are there any services with free-tier data egress? and boom: lightsail and cdn stuff. a cdn gets too expensive past 1tb, so i went with lightsail. all regional workers connect to their region's lightsail instance, and lightsail pushes the data to my backend. the cost dropped; since it all runs over https there's still some tcp overhead and egress, but i reduced it by about 80%. by that point the system had been running for 2 months and i was building the application layer.
****
i had only thought about and burned time on building this infra, nothing about the business. i built a dashboard-style saas; nobody cared, there are 15-year-old companies that already have it. i built an mcp and gave it to my friend, and it went like:
and i realized people have become stupid: they can't even think how to write a prompt or take an action in their own professional area. i also realized an mcp requires user-triggered work. i need to make it the dumbest way possible: if they have a company, just tell me your company, that's it, i'll send you reports. don't think, don't use your brain. i realized this in month 7 and started pivoting: dashboard → mcp → ai agent report. and if i'm sending reports, the workload is very predictable; one server is enough for me. what about downtime, read-only replicas, etc? bro, i'm making this a FREE tool.
i also realized a 20 dollar saas is not making me rich. say the market is 1 million people, and only 5% of them make money. if i reach all of them and they visit my website, 5% of those will sign up and 5% of the signups will pay me. as a "math engineer", that's 125 paying customers and 2,500 dollars a month. make it 40 dollars and that's 5k. after tax, servers, llm costs etc, that leaves like 3-3.5k max. i can make that as an employee, without the headache of running a business 24/7 or the broke-solopreneur situation.
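the funnel, spelled out:

```python
market = 1_000_000             # people in the niche
making_money = market * 0.05   # 50,000 who could actually pay
signups = making_money * 0.05  # 2,500 sign up
paying = signups * 0.05        # 125 pay

print(paying * 20)  # 2500.0 -> $2.5k/mo at $20
print(paying * 40)  # 5000.0 -> $5k/mo at $40
```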
-.-
i will go find a job. being a solopreneur is depressing when your friends don't understand what you're doing and you can't discuss it with anyone; sometimes i talk to an llm just to check whether i'm on a good or bad path. what am i even doing? i can't validate my technical abilities or know if i'm on the right track. it's a seniorless path: learning by doing, without ever seeing a senior-built infra project. and you can see it. with no code knowledge and no prior experience, i made a website that got 1k visitors per day. years of experience later, my most successful project is still my first one. i think business people are okay; they're not useless. i learned that.
when i zone out, it feels like i'm just repeating the same thing over and over: moving data around from one place to another. but when you look at it, 99% of software is actually just that.