diff --git a/Final_Task.md b/Final_Task.md new file mode 100644 index 00000000..2e2e618a --- /dev/null +++ b/Final_Task.md @@ -0,0 +1,195 @@ +# Introduction to Python. Final task. +You are asked to implement a Python RSS reader using **Python 3.9**. + +The task consists of a few iterations. Do not start a new iteration until the previous one is implemented. + +## Common requirements. +* It is mandatory to use the `argparse` module. +* Codebase must be covered with unit tests with at least 50% coverage. It's a mandatory requirement. +* Your script should **not** require installation of other services such as MySQL server, +PostgreSQL, etc. (except Iteration 6). If it does require such programs, +they should be installed automatically by your script, without the user doing anything. +* In case of any errors the utility should print a human-readable +error explanation. Exception tracebacks in stdout are prohibited in the final version of the application. +* Docstrings are mandatory for all methods, classes, functions and modules. +* Code must correspond to `pep8` (use `pycodestyle` utility for self-check). + * You can set line length up to 120 symbols. +* Commit messages should provide correct and helpful information about changes in the commit. Messages like `Fix bug`, +`Tried to make workable`, `Temp commit` and `Finally works` are prohibited. +* All used third-party packages should be listed in the `requirements.txt` file and in installation files (`setup.py`, `setup.cfg`, etc.). +* You have to write a file with documentation. Everything must be documented: how to run scripts, how to run tests, how to install the library, etc. + +## [Iteration 1] One-shot command-line RSS reader. +RSS reader should be a command-line utility which receives an [RSS](https://en.wikipedia.org/wiki/RSS) URL and prints results in a human-readable format. + +You are free to choose the format of the news console output. 
The textbox below provides an example of how it can be implemented: + +```shell +$ rss_reader.py "https://news.yahoo.com/rss/" --limit 1 + +Feed: Yahoo News - Latest News & Headlines + +Title: Nestor heads into Georgia after tornados damage Florida +Date: Sun, 20 Oct 2019 04:21:44 +0300 +Link: https://news.yahoo.com/wet-weekend-tropical-storm-warnings-131131925.html + +[image 2: Nestor heads into Georgia after tornados damage Florida][2]Nestor raced across Georgia as a post-tropical cyclone late Saturday, hours after the former tropical storm spawned a tornado that damaged +homes and a school in central Florida while sparing areas of the Florida Panhandle devastated one year earlier by Hurricane Michael. The storm made landfall Saturday on St. Vincent Island, a nature preserve +off Florida's northern Gulf Coast in a lightly populated area of the state, the National Hurricane Center said. Nestor was expected to bring 1 to 3 inches of rain to drought-stricken inland areas on its +march across a swath of the U.S. Southeast. + + +Links: +[1]: https://news.yahoo.com/wet-weekend-tropical-storm-warnings-131131925.html (link) +[2]: http://l2.yimg.com/uu/api/res/1.2/Liyq2kH4HqlYHaS5BmZWpw--/YXBwaWQ9eXRhY2h5b247aD04Njt3PTEzMDs-/https://media.zenfs.com/en/ap.org/5ecc06358726cabef94585f99050f4f0 (image) + +``` + +Utility should provide the following interface: +```shell +usage: rss_reader.py [-h] [--version] [--json] [--verbose] [--limit LIMIT] + source + +Pure Python command-line RSS reader. + +positional arguments: + source RSS URL + +optional arguments: + -h, --help show this help message and exit + --version Print version info + --json Print result as JSON in stdout + --verbose Outputs verbose status messages + --limit LIMIT Limit news topics if this parameter provided + +``` + +In case of using `--json` argument your utility should convert the news into [JSON](https://en.wikipedia.org/wiki/JSON) format. 
+You should come up with the JSON structure on your own and describe it in the README.md file for your repository or in a separate documentation file. + + + +With the argument `--verbose` your program should print all logs in stdout. + +### Task clarification (I) + +1) If `--version` option is specified app should _just print its version_ and stop. +2) User should be able to use `--version` option without specifying RSS URL. For example: +``` +> python rss_reader.py --version +"Version 1.4" +``` +3) The version is supposed to change with every iteration. +4) If `--limit` is not specified, then user should get _all_ available news. +5) If `--limit` is larger than feed size then user should get _all_ available news. +6) `--verbose` should print logs _in the process_ of application running, _not after everything is done_. +7) Make sure that your app **has no encoding issues** (meaning symbols like `'`, etc.) when printing news to _stdout_. +8) Make sure that your app **has no encoding issues** (meaning symbols like `'`, etc.) when printing news to _stdout in JSON format_. +9) It is preferable to have different custom exceptions for different situations (if needed). +10) The `--limit` argument should also affect JSON generation. + + +## [Iteration 2] Distribution. + +* Utility should be wrapped into distribution package with `setuptools`. +* This package should export CLI utility named `rss-reader`. + + +### Task clarification (II) + +1) User should be able to run your application _both_ with and without installation of CLI utility, +meaning that this should work: + +``` +> python rss_reader.py ... +``` + +as well as this: + +``` +> rss_reader ... +``` +2) Make sure your second iteration works on a clean machine with Python 3.9. (!) +3) Keep in mind that installed CLI utility should have the same functionality, so do not forget to update dependencies and packages. + + +## [Iteration 3] News caching. +The RSS news should be stored in a local storage while reading. 
You can choose the way and format of this storage yourself. +Please describe it in a separate section of README.md or in the documentation. + +New optional argument `--date` must be added to your utility. It should take a date in `%Y%m%d` format. +For example: `--date 20191020` +Here date means the actual *publishing date*, not the date when you fetched the news. + +The cached news can be read with it. The news from the specified day will be printed out. +If no news is found, return an error. + +If the `--date` argument is not provided, the utility should work like in the previous iterations. + +### Task clarification (III) +1) Try to make your application cross-platform, meaning that it should work on both Linux and Windows. +For example when working with filesystem, try to use `os.path` lib instead of manually concatenating file paths. +2) `--date` should **not** require internet connection to fetch news from local cache. +3) User should be able to use `--date` without specifying RSS source. For example: +``` +> python rss_reader.py --date 20191206 +...... +``` +Or for second iteration (when installed using setuptools): +``` +> rss_reader --date 20191206 +...... +``` +4) If `--date` is specified _together with RSS source_, then app should get news _for this date_ from local cache that _were fetched from specified source_. +5) `--date` should work correctly with `--json`, `--limit`, `--verbose` and their different combinations. + +## [Iteration 4] Format converter. + +You should implement the conversion of news into at least two of the suggested formats: `.mobi`, `.epub`, `.fb2`, `.html`, `.pdf` + +A new optional argument must be added to your utility. This argument receives the path where the new file will be saved. The argument name should indicate which format will be generated. 
+ +For example: `--to-mobi` or `--to-fb2` or `--to-epub` + +You can choose yourself the way in which the news will be displayed, but the final text result should contain pictures and links, if they exist in the original article and if the format permits storing this type of data. + +### Task clarification (IV) + +Conversion options should work correctly together with all arguments that were implemented in Iterations 1-3. For example: +* Format conversion process should be influenced by `--limit`. +* If `--json` is specified together with conversion options, then JSON news should +be printed to stdout, and converted file should contain news in normal format. +* Logs from `--verbose` should be printed in stdout and not added to the resulting file. +* `--date` should also work correctly with format converter and should not require internet access. + +## * [Iteration 5] Output colorization. +> Note: An optional iteration, it is not necessary to implement it. You can move on with it only if all the previous iterations (from 1 to 4) are completely implemented. + +You should add new optional argument `--colorize`, that will print the result of the utility in colorized mode. + +*If the argument is not provided, the utility should work like in the previous iterations.* + +> Note: Take a look at the [colorize](https://pypi.org/project/colorize/) library + +## * [Iteration 6] Web-server. +> Note: An optional iteration, it is not necessary to implement it. You can move on with it only if all the previous iterations (from 1 to 4) are completely implemented. Introduction to Python course does not cover the topics that are needed for the implementation of this part. 
+ +There are several mandatory requirements in this iteration: +* `Docker` + `docker-compose` usage (at least 2 containers: one for web-application, one for DB) +* Web application should provide all the implemented in the previous parts of the task functionality, using the REST API: + * One-shot conversion from RSS to Human readable format + * Server-side news caching + * Conversion in epub, mobi, fb2 or other formats + +Feel free to choose the way of implementation, libraries and frameworks. (We suggest you `Django Rest Framework` + `PostgreSQL` combination) + +You can implement any functionality that you want. The only requirement is to add the description into README file or update project documentation, for example: +* authorization/authentication +* automatic scheduled news update +* adding new RSS sources using API + +--- +Implementations will be checked with the latest cPython interpreter of 3.9 branch. +--- + +> Always code as if the guy who ends up maintaining your code will be a violent psychopath who knows where you live. Code for readability. **John F. 
Woods** diff --git a/RSS reader/src/__pycache__/rss_reader.cpython-310.pyc b/RSS reader/src/__pycache__/rss_reader.cpython-310.pyc new file mode 100644 index 00000000..55491560 Binary files /dev/null and b/RSS reader/src/__pycache__/rss_reader.cpython-310.pyc differ diff --git a/RSS reader/src/__pycache__/work_xml.cpython-310.pyc b/RSS reader/src/__pycache__/work_xml.cpython-310.pyc new file mode 100644 index 00000000..66957900 Binary files /dev/null and b/RSS reader/src/__pycache__/work_xml.cpython-310.pyc differ diff --git a/RSS_reader/Noto_Sans/NotoSans-Black.ttf b/RSS_reader/Noto_Sans/NotoSans-Black.ttf new file mode 100644 index 00000000..298c2402 Binary files /dev/null and b/RSS_reader/Noto_Sans/NotoSans-Black.ttf differ diff --git a/RSS_reader/Noto_Sans/NotoSans-BlackItalic.ttf b/RSS_reader/Noto_Sans/NotoSans-BlackItalic.ttf new file mode 100644 index 00000000..5b5c4bab Binary files /dev/null and b/RSS_reader/Noto_Sans/NotoSans-BlackItalic.ttf differ diff --git a/RSS_reader/Noto_Sans/NotoSans-Bold.cw127.pkl b/RSS_reader/Noto_Sans/NotoSans-Bold.cw127.pkl new file mode 100644 index 00000000..80ea17ad Binary files /dev/null and b/RSS_reader/Noto_Sans/NotoSans-Bold.cw127.pkl differ diff --git a/RSS_reader/Noto_Sans/NotoSans-Bold.pkl b/RSS_reader/Noto_Sans/NotoSans-Bold.pkl new file mode 100644 index 00000000..cdc7ac4c Binary files /dev/null and b/RSS_reader/Noto_Sans/NotoSans-Bold.pkl differ diff --git a/RSS_reader/Noto_Sans/NotoSans-Bold.ttf b/RSS_reader/Noto_Sans/NotoSans-Bold.ttf new file mode 100644 index 00000000..3e68bc24 Binary files /dev/null and b/RSS_reader/Noto_Sans/NotoSans-Bold.ttf differ diff --git a/RSS_reader/Noto_Sans/NotoSans-BoldItalic.pkl b/RSS_reader/Noto_Sans/NotoSans-BoldItalic.pkl new file mode 100644 index 00000000..e0db9acf Binary files /dev/null and b/RSS_reader/Noto_Sans/NotoSans-BoldItalic.pkl differ diff --git a/RSS_reader/Noto_Sans/NotoSans-BoldItalic.ttf b/RSS_reader/Noto_Sans/NotoSans-BoldItalic.ttf new file mode 100644 index 
00000000..4b563517 Binary files /dev/null and b/RSS_reader/Noto_Sans/NotoSans-BoldItalic.ttf differ diff --git a/RSS_reader/Noto_Sans/NotoSans-ExtraBold.ttf b/RSS_reader/Noto_Sans/NotoSans-ExtraBold.ttf new file mode 100644 index 00000000..ce254440 Binary files /dev/null and b/RSS_reader/Noto_Sans/NotoSans-ExtraBold.ttf differ diff --git a/RSS_reader/Noto_Sans/NotoSans-ExtraBoldItalic.ttf b/RSS_reader/Noto_Sans/NotoSans-ExtraBoldItalic.ttf new file mode 100644 index 00000000..1575114c Binary files /dev/null and b/RSS_reader/Noto_Sans/NotoSans-ExtraBoldItalic.ttf differ diff --git a/RSS_reader/Noto_Sans/NotoSans-ExtraLight.ttf b/RSS_reader/Noto_Sans/NotoSans-ExtraLight.ttf new file mode 100644 index 00000000..ebddc566 Binary files /dev/null and b/RSS_reader/Noto_Sans/NotoSans-ExtraLight.ttf differ diff --git a/RSS_reader/Noto_Sans/NotoSans-ExtraLightItalic.ttf b/RSS_reader/Noto_Sans/NotoSans-ExtraLightItalic.ttf new file mode 100644 index 00000000..2e0a8763 Binary files /dev/null and b/RSS_reader/Noto_Sans/NotoSans-ExtraLightItalic.ttf differ diff --git a/RSS_reader/Noto_Sans/NotoSans-Italic.pkl b/RSS_reader/Noto_Sans/NotoSans-Italic.pkl new file mode 100644 index 00000000..7a7d4315 Binary files /dev/null and b/RSS_reader/Noto_Sans/NotoSans-Italic.pkl differ diff --git a/RSS_reader/Noto_Sans/NotoSans-Italic.ttf b/RSS_reader/Noto_Sans/NotoSans-Italic.ttf new file mode 100644 index 00000000..eedc5e45 Binary files /dev/null and b/RSS_reader/Noto_Sans/NotoSans-Italic.ttf differ diff --git a/RSS_reader/Noto_Sans/NotoSans-Light.ttf b/RSS_reader/Noto_Sans/NotoSans-Light.ttf new file mode 100644 index 00000000..9f9453e8 Binary files /dev/null and b/RSS_reader/Noto_Sans/NotoSans-Light.ttf differ diff --git a/RSS_reader/Noto_Sans/NotoSans-LightItalic.ttf b/RSS_reader/Noto_Sans/NotoSans-LightItalic.ttf new file mode 100644 index 00000000..0e67cb97 Binary files /dev/null and b/RSS_reader/Noto_Sans/NotoSans-LightItalic.ttf differ diff --git 
a/RSS_reader/Noto_Sans/NotoSans-Medium.ttf b/RSS_reader/Noto_Sans/NotoSans-Medium.ttf new file mode 100644 index 00000000..02dad4e2 Binary files /dev/null and b/RSS_reader/Noto_Sans/NotoSans-Medium.ttf differ diff --git a/RSS_reader/Noto_Sans/NotoSans-MediumItalic.ttf b/RSS_reader/Noto_Sans/NotoSans-MediumItalic.ttf new file mode 100644 index 00000000..def607c3 Binary files /dev/null and b/RSS_reader/Noto_Sans/NotoSans-MediumItalic.ttf differ diff --git a/RSS_reader/Noto_Sans/NotoSans-Regular.cw127.pkl b/RSS_reader/Noto_Sans/NotoSans-Regular.cw127.pkl new file mode 100644 index 00000000..0e7ce217 Binary files /dev/null and b/RSS_reader/Noto_Sans/NotoSans-Regular.cw127.pkl differ diff --git a/RSS_reader/Noto_Sans/NotoSans-Regular.pkl b/RSS_reader/Noto_Sans/NotoSans-Regular.pkl new file mode 100644 index 00000000..1b55db85 Binary files /dev/null and b/RSS_reader/Noto_Sans/NotoSans-Regular.pkl differ diff --git a/RSS_reader/Noto_Sans/NotoSans-Regular.ttf b/RSS_reader/Noto_Sans/NotoSans-Regular.ttf new file mode 100644 index 00000000..973bc2ed Binary files /dev/null and b/RSS_reader/Noto_Sans/NotoSans-Regular.ttf differ diff --git a/RSS_reader/Noto_Sans/NotoSans-SemiBold.ttf b/RSS_reader/Noto_Sans/NotoSans-SemiBold.ttf new file mode 100644 index 00000000..182ac5d9 Binary files /dev/null and b/RSS_reader/Noto_Sans/NotoSans-SemiBold.ttf differ diff --git a/RSS_reader/Noto_Sans/NotoSans-SemiBoldItalic.ttf b/RSS_reader/Noto_Sans/NotoSans-SemiBoldItalic.ttf new file mode 100644 index 00000000..ea5812a8 Binary files /dev/null and b/RSS_reader/Noto_Sans/NotoSans-SemiBoldItalic.ttf differ diff --git a/RSS_reader/Noto_Sans/NotoSans-Thin.ttf b/RSS_reader/Noto_Sans/NotoSans-Thin.ttf new file mode 100644 index 00000000..6d5ce81f Binary files /dev/null and b/RSS_reader/Noto_Sans/NotoSans-Thin.ttf differ diff --git a/RSS_reader/Noto_Sans/NotoSans-ThinItalic.ttf b/RSS_reader/Noto_Sans/NotoSans-ThinItalic.ttf new file mode 100644 index 00000000..04f52786 Binary files /dev/null and 
b/RSS_reader/Noto_Sans/NotoSans-ThinItalic.ttf differ diff --git a/RSS_reader/Noto_Sans/OFL.txt b/RSS_reader/Noto_Sans/OFL.txt new file mode 100644 index 00000000..90b73326 --- /dev/null +++ b/RSS_reader/Noto_Sans/OFL.txt @@ -0,0 +1,93 @@ +Copyright 2015-2021 Google LLC. All Rights Reserved. + +This Font Software is licensed under the SIL Open Font License, Version 1.1. +This license is copied below, and is also available with a FAQ at: +http://scripts.sil.org/OFL + + +----------------------------------------------------------- +SIL OPEN FONT LICENSE Version 1.1 - 26 February 2007 +----------------------------------------------------------- + +PREAMBLE +The goals of the Open Font License (OFL) are to stimulate worldwide +development of collaborative font projects, to support the font creation +efforts of academic and linguistic communities, and to provide a free and +open framework in which fonts may be shared and improved in partnership +with others. + +The OFL allows the licensed fonts to be used, studied, modified and +redistributed freely as long as they are not sold by themselves. The +fonts, including any derivative works, can be bundled, embedded, +redistributed and/or sold with any software provided that any reserved +names are not used by derivative works. The fonts and derivatives, +however, cannot be released under any other type of license. The +requirement for fonts to remain under this license does not apply +to any document created using the fonts or their derivatives. + +DEFINITIONS +"Font Software" refers to the set of files released by the Copyright +Holder(s) under this license and clearly marked as such. This may +include source files, build scripts and documentation. + +"Reserved Font Name" refers to any names specified as such after the +copyright statement(s). + +"Original Version" refers to the collection of Font Software components as +distributed by the Copyright Holder(s). 
+ +"Modified Version" refers to any derivative made by adding to, deleting, +or substituting -- in part or in whole -- any of the components of the +Original Version, by changing formats or by porting the Font Software to a +new environment. + +"Author" refers to any designer, engineer, programmer, technical +writer or other person who contributed to the Font Software. + +PERMISSION & CONDITIONS +Permission is hereby granted, free of charge, to any person obtaining +a copy of the Font Software, to use, study, copy, merge, embed, modify, +redistribute, and sell modified and unmodified copies of the Font +Software, subject to the following conditions: + +1) Neither the Font Software nor any of its individual components, +in Original or Modified Versions, may be sold by itself. + +2) Original or Modified Versions of the Font Software may be bundled, +redistributed and/or sold with any software, provided that each copy +contains the above copyright notice and this license. These can be +included either as stand-alone text files, human-readable headers or +in the appropriate machine-readable metadata fields within text or +binary files as long as those fields can be easily viewed by the user. + +3) No Modified Version of the Font Software may use the Reserved Font +Name(s) unless explicit written permission is granted by the corresponding +Copyright Holder. This restriction only applies to the primary font name as +presented to the users. + +4) The name(s) of the Copyright Holder(s) or the Author(s) of the Font +Software shall not be used to promote, endorse or advertise any +Modified Version, except to acknowledge the contribution(s) of the +Copyright Holder(s) and the Author(s) or with their explicit written +permission. + +5) The Font Software, modified or unmodified, in part or in whole, +must be distributed entirely under this license, and must not be +distributed under any other license. 
The requirement for fonts to +remain under this license does not apply to any document created +using the Font Software. + +TERMINATION +This license becomes null and void if any of the above conditions are +not met. + +DISCLAIMER +THE FONT SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, +EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTIES OF +MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT +OF COPYRIGHT, PATENT, TRADEMARK, OR OTHER RIGHT. IN NO EVENT SHALL THE +COPYRIGHT HOLDER BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, +INCLUDING ANY GENERAL, SPECIAL, INDIRECT, INCIDENTAL, OR CONSEQUENTIAL +DAMAGES, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING +FROM, OUT OF THE USE OR INABILITY TO USE THE FONT SOFTWARE OR FROM +OTHER DEALINGS IN THE FONT SOFTWARE. diff --git a/RSS_reader/README.md b/RSS_reader/README.md new file mode 100644 index 00000000..2279d979 --- /dev/null +++ b/RSS_reader/README.md @@ -0,0 +1,49 @@ +Thank you for choosing our program. +1. After downloading the program, go to the RSS_reader directory and +enter these commands in the console: + python3 setup.py sdist + pip install dist/rss_reader-1.4.tar.gz +2. Please install all the libraries listed in the requirements.txt file. +3. RSS reader is a command-line utility which receives an RSS URL and outputs the results in a human-readable format. +You can enter in the console: +example: rss_reader https://vse.sale/news/rss +or from directory src +example: python3 rss_reader.py https://vse.sale/news/rss +4. If you need a limited number of articles, enter --limit after the link +example: rss_reader https://vse.sale/news/rss --limit 3 +5. If you want to know the version of the application, enter --version +example: rss_reader https://vse.sale/news/rss --version +or example: rss_reader --version +6. If you want the news printed to the console in JSON format, enter --json after the link +example: rss_reader https://vse.sale/news/rss --json +7. 
If you want to write the information from the articles to a file in PDF or HTML format, +enter --pdf or --html after the link, followed by the required path to save to. +If your system is Linux: +example: rss_reader https://vse.sale/news/rss --pdf /home/m/PycharmProjects/Homework_new/RSS_reader +or if your system is Windows: +example: rss_reader https://vse.sale/news/rss --html C:\\Program Files\\RSS_reader\\ +8. If you want to get news from the cache, enter --date and +the date you need in the format '%Y%m%d', for example 20220627 (2022 is the year, 06 is the month, 27 is the day) +example: rss_reader https://vse.sale/news/rss --date 20220627 +or example: rss_reader --date 20220627 +The cache is stored in a JSON file. Example structure: +{ + '20220628': { + 'https://vse.sale/news/rss': { + 'The problem of blocked streets': { + 'title': 'The problem of blocked streets', + 'pubDate': 'Tue, 28 Jun 2022 19:52:00 +0300', + 'description': 'The celebration of the City Day was held at a high level.', + 'link': 'https://vse.sale/news/view/37519' + }, + 'Of blocked streets': { + 'title': 'Of blocked streets', + 'pubDate': 'Tue, 28 Jun 2022 19:52:00 +0300', + 'description': 'The celebration of the City Day was held at a high level.', + 'link': 'https://vse.sale/news/view/375171892' + } + } + } +} + + diff --git a/RSS_reader/requirements.txt b/RSS_reader/requirements.txt new file mode 100644 index 00000000..f6fd22a0 --- /dev/null +++ b/RSS_reader/requirements.txt @@ -0,0 +1,11 @@ +For the correct operation of the program, please install all these libraries: +1) argparse +2) requests +3) logging +4) dateutil (pip install python-dateutil) +5) bs4 +6) json +7) fpdf +8) lxml + +Thank you, and enjoy using the program. 
\ No newline at end of file diff --git a/RSS_reader/setup.py b/RSS_reader/setup.py new file mode 100644 index 00000000..8e9f6928 --- /dev/null +++ b/RSS_reader/setup.py @@ -0,0 +1,15 @@ +from setuptools import setup + +setup( + name='rss_reader', + version='1.4', + author="Mikhail Aliakseyenka", + author_email="aliakseyenkamikhail@gmail.com", + packages=['src'], + entry_points={ + 'console_scripts': [ + 'rss_reader = src:main', + ] + }, + include_package_data=True, +) \ No newline at end of file diff --git a/RSS_reader/src/__init__.py b/RSS_reader/src/__init__.py new file mode 100644 index 00000000..7668a494 --- /dev/null +++ b/RSS_reader/src/__init__.py @@ -0,0 +1,9 @@ +from . import rss_reader + + +def main(): + rss_reader.main() + + +if __name__ == "__main__": + main() \ No newline at end of file diff --git a/RSS_reader/src/__pycache__/__init__.cpython-310.pyc b/RSS_reader/src/__pycache__/__init__.cpython-310.pyc new file mode 100644 index 00000000..67b61b6b Binary files /dev/null and b/RSS_reader/src/__pycache__/__init__.cpython-310.pyc differ diff --git a/RSS_reader/src/__pycache__/rss_reader.cpython-310.pyc b/RSS_reader/src/__pycache__/rss_reader.cpython-310.pyc new file mode 100644 index 00000000..22e6cd24 Binary files /dev/null and b/RSS_reader/src/__pycache__/rss_reader.cpython-310.pyc differ diff --git a/RSS_reader/src/__pycache__/work_xml.cpython-310.pyc b/RSS_reader/src/__pycache__/work_xml.cpython-310.pyc new file mode 100644 index 00000000..6f0dbff2 Binary files /dev/null and b/RSS_reader/src/__pycache__/work_xml.cpython-310.pyc differ diff --git a/RSS_reader/src/rss_reader.py b/RSS_reader/src/rss_reader.py new file mode 100644 index 00000000..48e179bb --- /dev/null +++ b/RSS_reader/src/rss_reader.py @@ -0,0 +1,59 @@ +import argparse +import logging +from src import work_xml + + +def main(): + arg_parser = argparse.ArgumentParser(description="Pure Python command-line RSS reader.") + arg_parser.add_argument("source", nargs='?', default='', type=str, 
help="RSS URL") + arg_parser.add_argument("--version", action="store_true", help="Print version info") + arg_parser.add_argument("--json", action="store_true", help="Print result as JSON in stdout") + arg_parser.add_argument("--verbose", action="store_true", help="Outputs verbose status messages") + arg_parser.add_argument("--limit", type=int, help="Limit news topics if this parameter provided") + arg_parser.add_argument("--date", type=int, help="Outputs news from cache by date. Required format: 20220525") + arg_parser.add_argument("--html", type=str, help="Save result as an HTML file. Required: path to save to.") + arg_parser.add_argument("--pdf", type=str, help="Save result as a PDF file. Required: path to save to.") + + args = arg_parser.parse_args() + + xml_items = False + + if args.verbose: + logging.basicConfig(level=logging.INFO) + + try: + if args.version: + print(get_version()) + elif args.source == '': + if args.date: + xml_items = work_xml.get_cache_news(args.date) + else: + print("URL is required") + return False + elif args.date: + xml_items = work_xml.get_cache_news(args.date, args.source) + else: + xml_items = work_xml.take_xml_items(args.source, args.limit) + work_xml.set_cache_news(args.source, xml_items["items"]) + + if xml_items is False: + return False + + if args.json: + work_xml.generate_json(xml_items) + elif args.html: + work_xml.save_html(args.html, work_xml.generate_html(xml_items)) + elif args.pdf: + work_xml.generate_pdf(args.pdf, xml_items) + else: + work_xml.print_to_console(xml_items) + except (AttributeError, TypeError): # TypeError: take_xml_items returned False, so xml_items["items"] failed + print("Error, failed to get an attribute. 
Check correctness URL") + + +def get_version(): + return "Version 1.4" + + +if __name__ == "__main__": + main() diff --git a/RSS_reader/src/work_xml.py b/RSS_reader/src/work_xml.py new file mode 100644 index 00000000..33530914 --- /dev/null +++ b/RSS_reader/src/work_xml.py @@ -0,0 +1,262 @@ +from dateutil import parser +from fpdf import fpdf +from bs4 import BeautifulSoup, Tag +import requests +import json +import logging +import os + +system_path = os.path.dirname(os.path.abspath(__file__)) +cache_file = os.path.join(system_path, 'cache.json') + + +def take_xml_items(link, limit): # we accept the url and the limit and return the dict + logging.info("Take xml items started") + try: + r = requests.get(link) + soup = BeautifulSoup(r.content, features='xml') + + title = soup.find('channel').findChildren("title", recursive=False) # out name feed + if title: + title = title[0].get_text() + else: + title = '' + + items_temp = soup.findAll("item", limit=limit) + items = dict() + key = 0 + for item in items_temp: + items[key] = {'title': item.title.get_text(), + 'pubDate': item.pubDate.get_text(), + 'description': item.description.get_text() if item.description else 'No description', + 'link': item.link.get_text() if item.link else 'No link', } + key += 1 + logging.info("Take xml items finished successfully") + + return {'title': title, 'items': items} + except Exception as e: + print(f'This extraction job failed. 
See exceptions: {e}') + logging.info("Take xml items with exception") + return False + + +def print_to_console(xml_items): # output the fields to the console + logging.info("Print to console started") + try: + if 'title' in xml_items: + print(f"Feed: {xml_items['title']}") + + for item in xml_items['items'].values(): + print(f"Title: {item['title']}") + print(f"Date: {item['pubDate']}") + print(f"Link: {item['link']}") + print(f"Description: {item['description']}") + + logging.info("Print to console finished successfully") + except Exception as e: + print(f'This extraction job failed. See exceptions: {e}') + logging.info("Print to console finished with exception") + return False + + +def generate_json(xml_items): + logging.info("Generate json started") + try: + for item in xml_items['items'].values(): + json_item = json.dumps(item, indent=4) + print(json_item) + + logging.info("Generate json finished successfully") + + except Exception as e: + print(f'This extraction job failed. See exceptions: {e}') + logging.info("Generate json finished with exception") + return False + + +def read_cache_file(): + logging.info("Read cache file started") + try: + if os.path.exists(cache_file) and os.path.getsize(cache_file) > 0: + with open(cache_file, 'r') as file: + cache = json.load(file) + else: + logging.info("Read cache file: cache file is not exist") + return dict() + + logging.info("Read cache file finished successfully") + return cache + except Exception as e: + print(f'This extraction job failed. See exceptions: {e}') + logging.info("Read cache file finished with exception") + return False + + +def write_cache_file(cache): + logging.info("Write cache file started") + try: + with open(cache_file, 'w+') as file: + json.dump(cache, file, indent=4) + + logging.info("Write cache file finished successfully") + except Exception as e: + print(f'This extraction job failed. 
See exceptions: {e}') + logging.info("Write cache file finished with exception") + return False + + +def re_generate_cache(cache_object, source, items): + logging.info("Re-generate cache started") + try: + for item in items.values(): + date = parser.parse(item['pubDate']).strftime('%Y%m%d') + + if date not in cache_object: + cache_object[date] = dict() + + if source not in cache_object[date]: + cache_object[date][source] = dict() + + cache_object[date][source][item['title']] = item + logging.info("Re-generate cache finished successfully") + return cache_object + except Exception as e: + print(f'This extraction job failed. See exceptions: {e}') + logging.info("Re-generate cache finished with exception") + return False + + +def set_cache_news(source, items): + logging.info("Set cache news started") + try: + cache = read_cache_file() + if not cache: + cache = dict() + + cache = re_generate_cache(cache, source, items) + + write_cache_file(cache) + logging.info("Set cache news finished successfully") + return cache + + except Exception as e: + print(f'This extraction job failed. 
See exceptions: {e}') + logging.info("Set cache news finished with exception") + return False + + +def get_cache_news(date, source=False): + logging.info("Get cache news started") + try: + cache = read_cache_file() + date = str(date) + + if date in cache: + if source != False and source in cache[date]: + items = dict() + key = 0 + for item in cache[date][source].values(): + items[key] = item + key += 1 + logging.info("Get cache news by date and source finished successfully") + return {'items': items} + elif source != False: + print(f'News by date and source not found in cache') + logging.info("News by date and source not found in cache") + return False + else: + items = dict() + key = 0 + for key_source in cache[date]: + for item in cache[date][key_source].values(): + items[key] = item + key += 1 + logging.info("Get cache news by date finished successfully") + return {'items': items} + else: + print(f'News by date not found in cache') + logging.info("News by date not found in cache") + return False + + except Exception as e: + print(f'This extraction job failed. See exceptions: {e}') + logging.info("Get cache news finished with exception") + return False + + +def generate_html(xml_items): + logging.info("Generate html started") + soup = BeautifulSoup() + html = Tag(soup, name="html") + body = Tag(soup, name="body") + soup.append(html) + html.append(body) + + try: + for item in xml_items['items'].values(): + div = Tag(soup, name="div") + + h1 = Tag(soup, name="h1") + h1.string = item['title'] + pPubDate = Tag(soup, name="p") + pPubDate.string = item['pubDate'] + pdescription = Tag(soup, name="p") + pdescription.string = item['description'] + alink = Tag(soup, name="a") + alink.string = item['link'] + + div.append(h1) + div.append(pPubDate) + div.append(pdescription) + div.append(alink) + + body.append(div) + + return soup.prettify() + + except Exception as e: + print(f'This extraction job failed. 
See exceptions: {e}') + logging.info("Generate html finished with exception") + return False + + +def save_html(path, html_str): + logging.info("Save html started") + try: + with open(os.path.join(path, "HTML_file.html"), 'w', encoding='utf-8') as file: + file.write(html_str) + logging.info("Save html finished successfully") + + except Exception as e: + print(f'This extraction job failed. See exceptions: {e}') + logging.info("Save html finished with exception") + return False + + +def generate_pdf(path, xml_items): + logging.info("Generate pdf started") + + try: + pdf = fpdf.FPDF() + pdf.add_font("Sans", style="", fname="Noto_Sans/NotoSans-Regular.ttf", uni=True) + pdf.set_font("Sans", size=16) + + pdf.add_page() + + feed = xml_items.get('title', '') # news read from the cache has no feed title + pdf.cell(200, 10, feed, ln=1, align='C') + + for item in xml_items['items'].values(): + pdf.cell(200, 10, 'NEWS', ln=1, align='C') + pdf.multi_cell(200, 10, item['title']) + pdf.multi_cell(200, 10, item['pubDate']) + pdf.multi_cell(200, 10, item['description']) + pdf.multi_cell(200, 10, f"Link: {item['link']}") + + pdf.output(os.path.join(path, "PDF_file.pdf")) + logging.info("Generate pdf finished successfully") + + except Exception as e: + print(f'This extraction job failed. See exceptions: {e}') + logging.info("Generate pdf finished with exception") + return False \ No newline at end of file diff --git a/RSS_reader/tests/HTML_file.html b/RSS_reader/tests/HTML_file.html new file mode 100644 index 00000000..0796579d --- /dev/null +++ b/RSS_reader/tests/HTML_file.html @@ -0,0 +1,18 @@ + +
++ Tue, 28 Jun 2022 19:52:00 +0300 +
++ The City Day celebration was held at a high level: a packed program was put together, safety and control measures were observed, and litter was cleared promptly. +
+ + https://vse.sale/news/view/37519 + +