Author Disambiguation

This repository contains the Author Disambiguation Project.

The objective is to identify every article written by a specific author, given one title that the author is known to have written.

The folder R contains the following scripts:

Signatures.R: Prepares the dataset by querying a local MySQL database.
Model_Disambiguated_CV.R: Generates a cross-validated ensemble model using a subset of disambiguated signatures.
Model.R

The folder python contains models built using scikit-learn and word2vec. Name2Vec.bin is a skip-gram model that was trained with the following parameters:

min_word_count = 1
vector_size = 10
word_window = 5
worker_threads = 8
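
As a rough illustration of how these parameters might be passed to a word2vec trainer, the sketch below maps them onto gensim 4.x keyword names. The choice of gensim, the `train_name2vec` helper, and the shape of `sentences` (an iterable of token lists) are assumptions; the README names neither the library nor how the training corpus was tokenized.

```python
# Hyperparameters listed in the README, mapped onto gensim 4.x keyword names.
# The mapping is an assumption; the README does not name the training library.
NAME2VEC_PARAMS = {
    "min_count": 1,     # min_word_count
    "vector_size": 10,  # vector_size
    "window": 5,        # word_window
    "workers": 8,       # worker_threads
    "sg": 1,            # 1 selects skip-gram, 0 would select CBOW
}


def train_name2vec(sentences, out_path="Name2Vec.bin"):
    """Train a skip-gram model and save it in word2vec binary format.

    `sentences` is a hypothetical input: an iterable of token lists.
    """
    # Imported lazily so the parameter mapping can be inspected without gensim.
    from gensim.models import Word2Vec  # requires gensim >= 4.0

    model = Word2Vec(sentences=sentences, **NAME2VEC_PARAMS)
    model.wv.save_word2vec_format(out_path, binary=True)
    return model
```

With `vector_size = 10`, each token is embedded in a 10-dimensional space, and `min_count = 1` keeps tokens that appear only once in the corpus.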

The spark folder contains code to build data pipelines that generate signatures for (author, title) pairs. All preprocessing functions are in PreProcessingUtil.scala, and Features.scala contains the pipeline logic.
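
To give a sense of what a signature for an (author, title) pair could look like, here is a minimal sketch in plain Python. It does not reproduce the Scala code: the `normalize` and `make_signature` helpers, the field names, and the blocking key (last name token plus first initial) are all hypothetical choices made for illustration.

```python
import re
import unicodedata


def normalize(text):
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", ascii_only.lower())
    return " ".join(cleaned.split())


def make_signature(author, title):
    """Build a hypothetical signature record for one (author, title) pair."""
    name_tokens = normalize(author).split()
    return {
        "author": " ".join(name_tokens),
        # Last token plus first initial: an assumed blocking key, not the
        # repository's actual feature.
        "block": f"{name_tokens[-1]}_{name_tokens[0][0]}" if name_tokens else "",
        "title_tokens": normalize(title).split(),
    }
```

For example, `make_signature("José  Müller-Smith", "Deep Learning: A Review!")` yields the author string `"jose muller smith"`, the block key `"smith_j"`, and the title tokens `["deep", "learning", "a", "review"]`.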

The folder disambiguation-app contains:

The user guide for the application, developed in Rpres.
The Shiny app code in R.

