Draft
2 changes: 1 addition & 1 deletion .gitignore
@@ -93,7 +93,7 @@ celerybeat-schedule
.venv
src/venv/
env/
# venv/
venv/
ENV/
env.bak/
venv.bak/
8 changes: 0 additions & 8 deletions .pydevproject

This file was deleted.

674 changes: 0 additions & 674 deletions LICENSE

This file was deleted.

603 changes: 0 additions & 603 deletions LICENSE.txt

This file was deleted.

11 changes: 0 additions & 11 deletions Pipfile

This file was deleted.

20 changes: 10 additions & 10 deletions README.md
@@ -1,28 +1,28 @@
# PyHarvest
Extremely naive harvester. The key element is the postgresql database that manages resources in such a way that unique URI are kept in a seperate tables and only a reference (id) refers to it.
Extremely naive harvester. The key element is the PostgreSQL database, which manages resources so that each unique URI is kept in a separate table and statements refer to it only by reference (id).


## Data model

![ER model](store_er.png)

Unique resources are stored in resources table and given a unique id.
Unique resources are stored in the resources table and given a unique id.
Statements in which the object is a resource are stored in `t_resource`, which is therefore just a collection of ids (subject, predicate, object, context).
Statements in which the object is a literal are kept in `t_literal`, which stores the different aspects of the literal: its value, its type and its language.
The literal types are manages in `lit_type`, only the id of that type is stored in `t_literal`
The literal types are managed in `lit_type`; only the id of that type is stored in `t_literal`.

Two views ; `resource_triples` and `literal_triples` merge back the resources ids and their values.
Two views: `resource_triples` and `literal_triples` merge back the resources ids and their values.
Those views are also used to insert triples. The views have INSTEAD OF triggers that do the housekeeping of creating a new resource if it does not already exist in the `resources` table. The trigger on `literal_triples` also takes care of this job for literal types, if any. A NULL type is just a string.
## Example code
`main.py` shows an simple example of inserting triples into the database

## Example code

`main.py` shows a simple example of inserting triples into the database

To run this example, you must set the `GSIP_HARV_CON_STR` environment variable to the connection string (e.g. `host=localhost dbname=gsip user=gsip password=S3kret`).

## Running the app

* activate your virtualenv.
* activate your virtualenv: . venv/bin/activate
* run: pip install -r requirements.txt
* run: python app.py
* navigate to: http://127.0.0.1:5000/
* navigate to: http://127.0.0.1:8000/
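
The data model described in the README (unique URIs in `resources`, id-only statements in `t_resource`, and a `resource_triples` view whose INSTEAD OF trigger creates missing resources) can be illustrated with a minimal sketch. This is not the project's DDL: it uses an in-memory SQLite database instead of PostgreSQL so that it runs without a server, the column names (`uri`, `subject`, `predicate`, `object`) are assumptions, and the `context` column and the literal tables are left out for brevity.

```python
import sqlite3

# Hypothetical, simplified schema. Table and view names follow the README;
# column names and trigger body are assumptions made for illustration.
SCHEMA = """
CREATE TABLE resources (
    id  INTEGER PRIMARY KEY,
    uri TEXT NOT NULL UNIQUE          -- each URI is stored exactly once
);

CREATE TABLE t_resource (             -- statements hold ids only
    subject   INTEGER NOT NULL REFERENCES resources(id),
    predicate INTEGER NOT NULL REFERENCES resources(id),
    object    INTEGER NOT NULL REFERENCES resources(id)
);

-- The view merges the ids back with their URI values.
CREATE VIEW resource_triples AS
SELECT s.uri AS subject, p.uri AS predicate, o.uri AS object
FROM t_resource t
JOIN resources s ON s.id = t.subject
JOIN resources p ON p.id = t.predicate
JOIN resources o ON o.id = t.object;

-- Inserting into the view creates any missing resource, then stores the ids.
CREATE TRIGGER resource_triples_insert
INSTEAD OF INSERT ON resource_triples
BEGIN
    INSERT OR IGNORE INTO resources (uri) VALUES (NEW.subject);
    INSERT OR IGNORE INTO resources (uri) VALUES (NEW.predicate);
    INSERT OR IGNORE INTO resources (uri) VALUES (NEW.object);
    INSERT INTO t_resource (subject, predicate, object)
    SELECT s.id, p.id, o.id
    FROM resources s, resources p, resources o
    WHERE s.uri = NEW.subject
      AND p.uri = NEW.predicate
      AND o.uri = NEW.object;
END;
"""


def build_store():
    """Create an in-memory database with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def insert_triple(conn, subject, predicate, obj):
    """Insert a triple of URIs through the view; the trigger resolves ids."""
    conn.execute(
        "INSERT INTO resource_triples (subject, predicate, object) "
        "VALUES (?, ?, ?)",
        (subject, predicate, obj),
    )


conn = build_store()
insert_triple(conn, "http://example.org/a", "http://example.org/knows", "http://example.org/b")
insert_triple(conn, "http://example.org/b", "http://example.org/knows", "http://example.org/a")

resource_count = conn.execute("SELECT COUNT(*) FROM resources").fetchone()[0]
triple_count = conn.execute("SELECT COUNT(*) FROM resource_triples").fetchone()[0]
print(resource_count, triple_count)  # 3 unique URIs shared by 2 triples
```

The two triples reuse the same three URIs, so `resources` holds three rows while `t_resource` holds two rows of ids. In PostgreSQL the same idea would be written as a trigger function attached to the view, which is presumably closer to what the repository does.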