Forked from Luis Salamanca / democrasci_preprocWP1
105 commits behind the upstream repository.

Democrasci Project WP1 - Preprocessing

This repository collects all the functions, data, notebooks and tools that make up the first work package (WP1) of the Democrasci project. Broadly, it contains the following:

  • Data (data/democrasci/): all the data files, split into folders by year. These include the raw data files (the PDFs and metadata of each session) and the files produced by each of the preprocessing steps.
  • Functions (src/python): all the code in this repo is intended for preprocessing the raw data: curating it, annotating it, and extracting the meaningful information for later NLP analysis. For this reason the project is split into two repos, since preprocessing and the later analysis can be seen as separate entities.
  • Notebooks and tools (notebooks): several notebooks that facilitate the analysis of the intermediate steps carried out throughout WP1. With them we can explore the quality of the results and consider implementing improvements.
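
As a rough illustration of the per-year data layout described above, the following sketch lists the year folders under a data root and counts the files in each. It is not part of the repository code: the helper name `list_year_folders` is hypothetical, and it assumes that each year folder is named with four digits, which this README does not state.

```python
from pathlib import Path


def list_year_folders(data_root):
    """Return the per-year subfolders of data_root, sorted by year.

    Assumption (not confirmed by the README): each year folder is named
    with exactly four digits, e.g. "1891". Anything else is ignored.
    """
    root = Path(data_root)
    if not root.is_dir():
        return []
    years = [
        p for p in root.iterdir()
        if p.is_dir() and len(p.name) == 4 and p.name.isdigit()
    ]
    return sorted(years, key=lambda p: int(p.name))


if __name__ == "__main__":
    # "data/democrasci" is the data folder named in this README.
    for folder in list_year_folders("data/democrasci"):
        n_files = sum(1 for f in folder.iterdir() if f.is_file())
        print(f"{folder.name}: {n_files} files")
```

Run from the repository root, this prints one line per year folder found; if the folder naming differs, the filter in `list_year_folders` would need to be adapted.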

Last update 07/01/2019