Return and search for word categories #84
Open
claw89 wants to merge 10 commits into suyashb95:master from
Word entries on Wiktionary are associated with various categories. This commit adds the list of a word's categories to the returned JSON structure.
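One way to collect a word's categories is to read the category box that MediaWiki renders at the bottom of every page. The sketch below assumes the standard MediaWiki markup (category links inside `<div id="mw-normal-catlinks">`, each pointing at `/wiki/Category:...`) and uses only the standard library's `html.parser`; `CategoryLinkParser`, `extract_categories` and the `word_info` dict are hypothetical names, not the code added in this PR.

```python
from html.parser import HTMLParser


class CategoryLinkParser(HTMLParser):
    """Collect visible category names from a MediaWiki page.

    Assumes categories are rendered as <a href="/wiki/Category:..."> links
    inside <div id="mw-normal-catlinks"> (hidden categories live elsewhere).
    """

    def __init__(self):
        super().__init__()
        self.categories = []
        self._div_depth = 0  # > 0 while inside the category box
        self._in_link = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            if self._div_depth:
                self._div_depth += 1  # nested div inside the box
            elif attrs.get("id") == "mw-normal-catlinks":
                self._div_depth = 1
        elif tag == "a" and self._div_depth:
            # Skip the leading "Categories" link, which points to Special:Categories.
            if (attrs.get("href") or "").startswith("/wiki/Category:"):
                self._in_link = True
                self.categories.append("")

    def handle_endtag(self, tag):
        if tag == "div" and self._div_depth:
            self._div_depth -= 1
        elif tag == "a":
            self._in_link = False

    def handle_data(self, data):
        if self._in_link:
            self.categories[-1] += data


def extract_categories(html):
    """Return the category names listed in a page's category box."""
    parser = CategoryLinkParser()
    parser.feed(html)
    parser.close()
    return parser.categories


# Sample markup in the shape MediaWiki produces.
SAMPLE_HTML = """
<div id="catlinks" class="catlinks"><div id="mw-normal-catlinks" class="mw-normal-catlinks">
<a href="/wiki/Special:Categories" title="Special:Categories">Categories</a>:
<ul><li><a href="/wiki/Category:English_lemmas" title="Category:English lemmas">English lemmas</a></li>
<li><a href="/wiki/Category:English_nouns" title="Category:English nouns">English nouns</a></li></ul>
</div></div>
"""

# Attach the categories to a (hypothetical) word-information dict.
word_info = {"categories": extract_categories(SAMPLE_HTML)}
print(word_info["categories"])  # ['English lemmas', 'English nouns']
```

Filtering on the `/wiki/Category:` prefix keeps the label link out of the result without depending on its text.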
The new function fetch_category returns the words listed under a given category, as a list.
A return_categories option, defaulting to False, is added to the fetch function. With return_categories=False, fetch returns the original word information in JSON format, unchanged; with return_categories=True, it returns a pair of the word information and a list of the word's categories. Keeping False as the default preserves the original return value, so the function still passes the unit test.
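The backward-compatible switch can be sketched as follows. `parse_page` is a hypothetical stand-in for the download-and-parse step (it is not part of WiktionaryParser), injected as a keyword-only argument so the example runs offline; the point is only that the default `return_categories=False` leaves the return value exactly as before.

```python
from typing import Callable, List, Tuple


def fetch(word: str,
          return_categories: bool = False,
          *,
          parse_page: Callable[[str], Tuple[List[dict], List[str]]]):
    """Sketch of a fetch() with an opt-in return_categories flag.

    parse_page is a hypothetical helper returning (word_info, categories).
    """
    word_info, categories = parse_page(word)
    if return_categories:
        return word_info, categories
    # Default path: same shape as before, so existing callers and tests are unaffected.
    return word_info


def fake_parse_page(word):
    # Placeholder data standing in for a parsed Wiktionary page.
    return [{"etymology": "placeholder", "definitions": []}], ["placeholder category"]


plain = fetch("example", parse_page=fake_parse_page)
info, categories = fetch("example", return_categories=True, parse_page=fake_parse_page)
```

Returning a tuple only on request keeps the old single-value contract; the trade-off is that the return type now depends on a flag, which callers have to keep in mind.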
Category pages on Wiktionary may have subcategories. This commit adds an option to return these subcategories as a list alongside the category's words, so fetch_category can now return a pair of lists (words, subcategories).
Revised the code that parses words on a category page so that it follows the same approach as the subcategory parsing.
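The category-page parsing described above (words for fetch_category, optional subcategories, and one shared approach for both) can be sketched with a single parser class parameterized by section. This assumes standard MediaWiki category markup, in which member pages are listed under `<div id="mw-pages">` and subcategories under `<div id="mw-subcategories">`; `CategorySectionParser`, `parse_category_page` and its `include_subcategories` flag are hypothetical names, not the PR's code.

```python
from html.parser import HTMLParser


class CategorySectionParser(HTMLParser):
    """Collect link texts from list items inside one section of a category page.

    section_id is "mw-pages" for member words or "mw-subcategories" for
    subcategories. Only links inside <li> elements are kept, which skips the
    "previous page"/"next page" navigation links.
    """

    def __init__(self, section_id):
        super().__init__()
        self.section_id = section_id
        self.items = []
        self._div_depth = 0  # > 0 while inside the target section
        self._li_depth = 0   # > 0 while inside a list item of that section
        self._in_link = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            if self._div_depth:
                self._div_depth += 1
            elif attrs.get("id") == self.section_id:
                self._div_depth = 1
        elif tag == "li" and self._div_depth:
            self._li_depth += 1
        elif (tag == "a" and self._li_depth
              and (attrs.get("href") or "").startswith("/wiki/")):
            self._in_link = True
            self.items.append("")

    def handle_endtag(self, tag):
        if tag == "div" and self._div_depth:
            self._div_depth -= 1
        elif tag == "li" and self._li_depth:
            self._li_depth -= 1
        elif tag == "a":
            self._in_link = False

    def handle_data(self, data):
        if self._in_link:
            self.items[-1] += data


def _collect(html, section_id):
    parser = CategorySectionParser(section_id)
    parser.feed(html)
    parser.close()
    return parser.items


def parse_category_page(html, include_subcategories=False):
    """Return the words, or a (words, subcategories) pair when requested."""
    words = _collect(html, "mw-pages")
    if include_subcategories:
        return words, _collect(html, "mw-subcategories")
    return words


SAMPLE = """
<div id="mw-subcategories"><h2>Subcategories</h2>
<div class="mw-category"><ul>
<li><a href="/wiki/Category:Example_subcategory">Example subcategory</a></li>
</ul></div></div>
<div id="mw-pages"><h2>Pages in category</h2>
<a href="/w/index.php?title=Category:Example&amp;pagefrom=b">next page</a>
<div class="mw-category"><ul>
<li><a href="/wiki/apple">apple</a></li>
<li><a href="/wiki/apricot">apricot</a></li>
</ul></div></div>
"""

print(parse_category_page(SAMPLE))  # ['apple', 'apricot']
print(parse_category_page(SAMPLE, include_subcategories=True))
# (['apple', 'apricot'], ['Example subcategory'])
```

Because words and subcategories go through the same class, a markup change only has to be handled in one place.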
Wiktionary shows at most 200 words on each category page. This commit makes fetch_category return all of a category's words by following the next-page links and updating self.soup with each subsequent page.
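Following the next-page links until none remain can be sketched as a simple loop. To keep it testable offline, the download, word-parsing and next-link steps are passed in as callables; `collect_all_words` and its parameters are hypothetical, whereas the commit itself re-assigns `self.soup` on each iteration. The `seen` set is a defensive addition that stops the loop if a page ever links back to one already visited.

```python
from typing import Callable, Iterable, List, Optional, TypeVar

Page = TypeVar("Page")


def collect_all_words(first_url: str,
                      fetch_page: Callable[[str], Page],
                      parse_words: Callable[[Page], Iterable[str]],
                      find_next_url: Callable[[Page], Optional[str]]) -> List[str]:
    """Gather the words from every page of a paginated category listing."""
    words: List[str] = []
    seen = set()
    url: Optional[str] = first_url
    while url is not None and url not in seen:
        seen.add(url)
        page = fetch_page(url)           # one page of up to 200 entries
        words.extend(parse_words(page))  # words listed on this page
        url = find_next_url(page)        # None on the last page
    return words


# Offline stand-in for three consecutive category pages.
FAKE_PAGES = {
    "page1": {"words": ["a1", "a2"], "next": "page2"},
    "page2": {"words": ["b1"], "next": "page3"},
    "page3": {"words": ["c1"], "next": None},
}

all_words = collect_all_words(
    "page1",
    fetch_page=FAKE_PAGES.__getitem__,
    parse_words=lambda page: page["words"],
    find_next_url=lambda page: page["next"],
)
print(all_words)  # ['a1', 'a2', 'b1', 'c1']
```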
Updated the README to include examples that use categories.
This commit fixes the parser_next_page_links function, which previously worked only for Category:English_phrasebook. The category name is now passed as an argument, so the function works for any category.
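A category-agnostic next-page lookup can be sketched by matching the link text and the `title` query parameter against the category passed in, rather than a hard-coded `Category:English_phrasebook`. It assumes MediaWiki's pagination links look like `<a href="/w/index.php?title=Category:...&pagefrom=...">next page</a>` (English interface); `NextPageLinkParser`, `find_next_page_url` and `BASE_URL` are hypothetical names. Requiring `pagefrom` keeps the word listing's pagination separate from the subcategory listing's, which MediaWiki pages with `subcatfrom` instead.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import parse_qs, urljoin, urlparse

BASE_URL = "https://en.wiktionary.org"


class NextPageLinkParser(HTMLParser):
    """Find the first "next page" link that paginates the given category's words."""

    def __init__(self, category: str):
        super().__init__()
        # MediaWiki treats spaces and underscores in titles as equivalent.
        self.target_title = "Category:" + category.replace("_", " ")
        self.next_href: Optional[str] = None
        self._href: Optional[str] = None
        self._text = ""

    def handle_starttag(self, tag, attrs):
        if tag == "a" and self.next_href is None:
            self._href = dict(attrs).get("href")  # already unescaped (&amp; -> &)
            self._text = ""

    def handle_data(self, data):
        if self._href is not None:
            self._text += data

    def handle_endtag(self, tag):
        if tag != "a" or self._href is None:
            return
        query = parse_qs(urlparse(self._href).query)
        titles = [t.replace("_", " ") for t in query.get("title", [])]
        if (self._text.strip() == "next page"
                and self.target_title in titles
                and "pagefrom" in query):
            self.next_href = self._href
        self._href = None


def find_next_page_url(html: str, category: str) -> Optional[str]:
    """Return the absolute URL of the next page of words, or None on the last page."""
    parser = NextPageLinkParser(category)
    parser.feed(html)
    parser.close()
    return urljoin(BASE_URL, parser.next_href) if parser.next_href else None


SAMPLE = ('<div id="mw-pages">'
          '<a href="/w/index.php?title=Category:Example_category&amp;pagefrom=beta#mw-pages"'
          ' title="Category:Example category">next page</a></div>')

print(find_next_page_url(SAMPLE, "Example_category"))
print(find_next_page_url(SAMPLE, "English_phrasebook"))  # None
```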
Supersedes #80
For my use case, I need to obtain information on all words in a particular category. For example, I might need to collect etymology information for all English words derived from the Bible.
The two main additions of this PR are:
Using these additions, the use case above can be completed with the following code:
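A minimal sketch of that workflow, not the author's snippet: it takes the two lookups as parameters, so it assumes only that the category lookup returns a list of words and that the word lookup returns a list of entry dicts with an `"etymology"` key (the entry shape WiktionaryParser's `fetch` returns). `collect_etymologies` is a hypothetical helper, and the stub lookups exist only to make the example runnable offline.

```python
from typing import Callable, Dict, List


def collect_etymologies(category: str,
                        fetch_category: Callable[[str], List[str]],
                        fetch_word: Callable[[str], List[dict]]) -> Dict[str, List[str]]:
    """Map every word in a category to the etymology text of each of its entries."""
    etymologies: Dict[str, List[str]] = {}
    for word in fetch_category(category):
        entries = fetch_word(word)  # one dict per etymology section of the page
        etymologies[word] = [entry.get("etymology", "") for entry in entries]
    return etymologies


# Offline stubs standing in for the real lookups.
def fake_fetch_category(category):
    return ["alpha", "beta"]


def fake_fetch_word(word):
    return [{"etymology": f"placeholder etymology of {word}", "definitions": []}]


result = collect_etymologies("Example category", fake_fetch_category, fake_fetch_word)
print(result)
```

Against the live site, the parser's category lookup and `fetch` would be passed in instead of the stubs; if the subcategory option is enabled, the category lookup returns a (words, subcategories) pair, so only its first element should be iterated.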