This is in response to the code task for TPXImpact.
Note: I've added an extra endpoint to the API. Retrieving all the URLs isn't a scalable option past a few dozen URLs, so I added a pageable version, which is described further in the backend docs and can be tested with curl and from the front end.
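As a rough illustration of what such a pageable endpoint could look like with Spring Data MongoDB (the controller, repository, path and query parameters below are assumptions for the sketch, not necessarily the names used in this repo):

```java
import org.springframework.data.domain.Page;
import org.springframework.data.domain.Pageable;
import org.springframework.data.mongodb.repository.MongoRepository;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

// Sketch only. ShortUrl is the backend's Mongo document class (name assumed here).
// MongoRepository already exposes Page<ShortUrl> findAll(Pageable).
interface ShortUrlRepository extends MongoRepository<ShortUrl, String> {
}

@RestController
@RequestMapping("/api/urls")
class PagedUrlController {

    private final ShortUrlRepository repository;

    PagedUrlController(ShortUrlRepository repository) {
        this.repository = repository;
    }

    // e.g. GET /api/urls/paged?page=0&size=20
    @GetMapping("/paged")
    Page<ShortUrl> getPage(Pageable pageable) {
        return repository.findAll(pageable);
    }
}
```

Assuming the backend runs on port 8080, such a page could then be fetched with something like curl "http://localhost:8080/api/urls/paged?page=0&size=20" (the port and path are illustrative).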
Spin up the database using the Docker Compose file in this root folder (this requires a Docker environment, of course):
docker compose -f mongo.yml up
Next, run the front end and back end from their folders (instructions are in the corresponding README files). The front end requires npm and the back end requires Maven.
You can also run monitoring for the application; it has its own monitoring folder with instructions in its README.
A possible system design
Some brief points:
- Security wasn't added as it tends to obscure a coding test like this. However, it would clearly be required that only a registered user can create a URL, and only the creating user can delete it. To support this, an extra field could be added to the document to store which user created it (see the document sketch after this list). Cleanup functions could also delete URLs created by an expired user. Security would also be added to the database.
- For speed of development, Mongo autoconfiguration (with no security) has been used; this would have to change in order to expand the containerisation into a fully fleshed out system using Docker Compose and Kubernetes.
- As there would probably be much greater traffic on the read side, we could split the service in two: one for reading and one for writing. This would allow independent scaling. If the database is split into a primary and secondaries, then the write service would communicate with the primary and the read service with any of the secondaries.
- MongoDB was picked as no relational information is stored, no complex queries will be used, and it's easy to add extra fields (for example user, creation date, access count; see the document sketch after this list).
- The database could be partitioned on the shortUrlKey (consistent hashing) to distribute load evenly; see the consistent-hashing sketch after this list.
- The database could also be replicated to give greater resilience and reduced latency.
- A caching layer (with TTL/LRU eviction) could be added in front of the database, given the high read volume and hot lookups (see the cache sketch after this list).
- Improve the Grafana dashboards, specifically to track business metrics such as requests per short URL and cache hit ratio, and add alerting for warnings and errors.
- We may wish to track the number of hits each URL gets; the 302 redirect means the response is not cached by the browser, and whilst this creates more traffic, it tracks usage more accurately (see the redirect sketch after this list).
- To deploy this anywhere we need something like Kubernetes, which would allow us to describe the structure and desired state of the system, vital for robustness, possibly in conjunction with a managed offering such as EKS.
- To set up EKS (or whichever offering we go with) we should use an "infrastructure as code" tool such as Terraform.
- Load balancers should be used in front of the services (see above) when multiple instances are required.
- An API gateway should be used for architectural goodness such as rate limiting, authentication/authorization, etc.
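To illustrate the points above about ownership and extra fields, here is a hypothetical shape for the Mongo document; the collection and field names are assumptions, not necessarily those used in the submission. The createdBy field would support the delete-only-by-creator rule and the expired-user cleanup, and accessCount the hit tracking.

```java
import java.time.Instant;

import org.springframework.data.annotation.Id;
import org.springframework.data.mongodb.core.mapping.Document;

// Hypothetical document shape; the real collection and field names may differ.
@Document(collection = "shortUrls")
public class ShortUrl {

    @Id
    private String shortUrlKey;  // the shortened key, also a natural partition key
    private String targetUrl;    // the original long URL
    private String createdBy;    // possible extra field: owning user (auth/cleanup)
    private Instant createdAt;   // possible extra field: creation date
    private long accessCount;    // possible extra field: number of redirects served

    // getters and setters omitted for brevity
}
```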
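The partitioning point could be realised with a consistent-hash ring over the shortUrlKey. The toy sketch below (not part of the submitted code) maps a key to a partition, using virtual nodes so that adding or removing a partition only remaps the keys nearest to it on the ring:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.SortedMap;
import java.util.TreeMap;

// Toy consistent-hash ring: maps a shortUrlKey to one of N database partitions.
public class ConsistentHashRing {

    private static final int VIRTUAL_NODES = 100;
    private final SortedMap<Long, String> ring = new TreeMap<>();

    public void addPartition(String partitionName) {
        // Each partition is placed on the ring many times to smooth the distribution.
        for (int i = 0; i < VIRTUAL_NODES; i++) {
            ring.put(hash(partitionName + "#" + i), partitionName);
        }
    }

    public String partitionFor(String shortUrlKey) {
        long h = hash(shortUrlKey);
        SortedMap<Long, String> tail = ring.tailMap(h);
        // Wrap around to the first node if we are past the last point on the ring.
        return tail.isEmpty() ? ring.get(ring.firstKey()) : tail.get(tail.firstKey());
    }

    private static long hash(String key) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(key.getBytes(StandardCharsets.UTF_8));
            // Use the first 8 bytes of the digest as a long for the ring position.
            long h = 0;
            for (int i = 0; i < 8; i++) {
                h = (h << 8) | (digest[i] & 0xff);
            }
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```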
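For the caching layer, a shared cache such as Redis (or Caffeine in-process) would likely be the real choice; the toy sketch below just shows the TTL/LRU idea in plain Java.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.Map;

// Toy in-process LRU cache with a TTL, sitting in front of the database lookup.
public class UrlCache {

    private record Entry(String targetUrl, Instant expiresAt) {}

    private final Duration ttl;
    private final Map<String, Entry> cache;

    public UrlCache(int maxEntries, Duration ttl) {
        this.ttl = ttl;
        // accessOrder = true gives LRU eviction when capacity is exceeded.
        this.cache = new LinkedHashMap<String, Entry>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Entry> eldest) {
                return size() > maxEntries;
            }
        };
    }

    public synchronized String get(String shortUrlKey) {
        Entry entry = cache.get(shortUrlKey);
        if (entry == null || Instant.now().isAfter(entry.expiresAt())) {
            cache.remove(shortUrlKey);
            return null; // caller falls back to the database and then calls put()
        }
        return entry.targetUrl();
    }

    public synchronized void put(String shortUrlKey, String targetUrl) {
        cache.put(shortUrlKey, new Entry(targetUrl, Instant.now().plus(ttl)));
    }
}
```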
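And for the 302 redirect with hit tracking, a rough sketch of what the handler could look like, reusing the hypothetical ShortUrl document above and an assumed Spring Data repository (names and behaviour are illustrative; a production version would increment the counter atomically rather than read-modify-write):

```java
import java.net.URI;

import org.springframework.data.mongodb.repository.MongoRepository;
import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RestController;

// Assumed repository over the hypothetical ShortUrl document above.
interface ShortUrlRepository extends MongoRepository<ShortUrl, String> {
}

// Illustrative redirect handler: a 302 (Found) is not cached by the browser the way
// a permanent 301 would be, so every visit reaches the service and bumps the count.
@RestController
class RedirectController {

    private final ShortUrlRepository repository;

    RedirectController(ShortUrlRepository repository) {
        this.repository = repository;
    }

    @GetMapping("/{shortUrlKey}")
    ResponseEntity<Void> redirect(@PathVariable String shortUrlKey) {
        return repository.findById(shortUrlKey)
                .map(shortUrl -> {
                    shortUrl.setAccessCount(shortUrl.getAccessCount() + 1); // hit tracking
                    repository.save(shortUrl);
                    return ResponseEntity.status(HttpStatus.FOUND)          // 302 redirect
                            .location(URI.create(shortUrl.getTargetUrl()))
                            .<Void>build();
                })
                .orElse(ResponseEntity.notFound().build());
    }
}
```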