Implement crawling controller to fetch directory URLs containing OpenAPI definitions. - Implementing Queue-Based Architecture of Downloading Index Files from Common Crawl Server Using RabbitMQ NOTE: All code is contained within the downloadAndProcessIndexFilesInBackground() function #5
Open
priyanshu-kun wants to merge 12 commits into
added 8 commits
April 1, 2023 18:54
…e crawling process. Implement a crawling controller and create the Common Crawl driver.
…elay while fetching directories from cc server.
Contributor
vinitshahdeo
left a comment
@priyanshu-kun Please move the Dummy App to a separate branch - feature/backup-dummy-app
vinitshahdeo
suggested changes
Jun 22, 2023
Contributor
vinitshahdeo
left a comment
@priyanshu-kun I have completed the initial review; please take a look.
Contributor
@priyanshu-kun In order to prevent rate-limiting issues, you can explore back-off and sleep methods.
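One way to read the back-off and sleep suggestion is a retry wrapper whose wait doubles after each failed request. The block below is only a minimal sketch under that assumption: `sleep`, `backoffDelay`, `withBackoff`, and the default values (500 ms base, 30 s cap, 5 retries) are hypothetical names and numbers chosen for illustration, not code from this pull request.

```javascript
// Hypothetical helpers: none of these names come from the PR itself.

// Resolve after `ms` milliseconds.
function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Exponential back-off: baseMs, 2*baseMs, 4*baseMs, ... capped at maxMs.
function backoffDelay(attempt, baseMs = 500, maxMs = 30000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Run `operation` and retry on failure, sleeping longer before each retry.
// `operation` is any async function, e.g. one request to the Common Crawl server.
async function withBackoff(operation, options = {}) {
  const { retries = 5, baseMs = 500, maxMs = 30000, sleepFn = sleep } = options;
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await operation(attempt);
    } catch (err) {
      lastError = err;
      if (attempt === retries) break; // out of retries, give up
      await sleepFn(backoffDelay(attempt, baseMs, maxMs));
    }
  }
  throw lastError;
}
```

A call site might look like `await withBackoff(() => fetchDirectoryListing(url))`, where `fetchDirectoryListing` stands in for whatever function performs the request. Adding random jitter to the delay is a common refinement when several workers retry at once.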
added 2 commits
July 9, 2023 17:24
… Common Crawl Server Using RabbitMQ NOTE: All code is contained within the downloadAndProcessIndexFilesInBackground() function.
HimanshuS129
reviewed
Aug 5, 2023
return res.badRequest('Data source not provided');
}

try {
Contributor
No need for this try/catch: we already have one, and this block does not do any explicit error handling.
This pull request adds a crawling controller that fetches directory URLs linked to OpenAPI specifications, so that the OpenAPI definitions in those directories can be retrieved.
This commit implements a queue-based architecture for downloading index files from the Common Crawl server, with RabbitMQ as the message broker that manages the queue. All of the code for downloading and processing the index files in the background is in the downloadAndProcessIndexFilesInBackground() function.
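The general shape of such a producer/consumer split can be sketched as follows. This is not the code inside downloadAndProcessIndexFilesInBackground(); the queue name, the `{ url }` message shape, and the function names are assumptions made for illustration. The functions take an already-open channel (as returned by amqplib's `createChannel()`) so the sketch stays independent of how the connection is set up.

```javascript
// Sketch only: QUEUE_NAME, the { url } job shape, and all function names are hypothetical.
const QUEUE_NAME = 'cc-index-files';

// Serialize and deserialize one job (one index file URL per message).
function encodeJob(url) {
  return Buffer.from(JSON.stringify({ url }));
}

function decodeJob(buffer) {
  return JSON.parse(buffer.toString());
}

// Producer: publish one message per index file instead of downloading inline.
async function enqueueIndexFiles(channel, indexFileUrls) {
  await channel.assertQueue(QUEUE_NAME, { durable: true });
  for (const url of indexFileUrls) {
    channel.sendToQueue(QUEUE_NAME, encodeJob(url), { persistent: true });
  }
  return indexFileUrls.length;
}

// Consumer: handle one message at a time and acknowledge only after success.
async function startWorker(channel, processIndexFile) {
  await channel.assertQueue(QUEUE_NAME, { durable: true });
  await channel.prefetch(1); // at most one unacknowledged message per worker
  await channel.consume(QUEUE_NAME, async (msg) => {
    if (msg === null) return; // the broker cancelled this consumer
    try {
      const { url } = decodeJob(msg.content);
      await processIndexFile(url);
      channel.ack(msg);
    } catch (err) {
      // Reject without requeue so a bad message cannot loop forever.
      channel.nack(msg, false, false);
    }
  }, { noAck: false });
}

// Wiring with amqplib (needs `npm install amqplib` and a running broker):
//   const amqp = require('amqplib');
//   const conn = await amqp.connect('amqp://localhost');
//   const channel = await conn.createChannel();
```

With this split, the HTTP handler can call `enqueueIndexFiles` and return immediately, while one or more workers started with `startWorker` do the downloads. Rejecting without requeue drops a failed message unless a dead-letter exchange is configured; requeueing after a delay would be the alternative if failures are expected to be transient.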
Processing index files asynchronously through the queue keeps long-running downloads off the request path, so the server stays responsive and is not overloaded. The approach is also intended to improve performance and fault tolerance.
Using RabbitMQ and keeping the functionality inside the downloadAndProcessIndexFilesInBackground() function keeps the codebase organized and modular, which should make it easier to maintain and extend.