Commit fadb0b13 authored by Luis Salamanca

Notebooks with basics for metadata usage

parent 5b70b3ab
%% Cell type:code id: tags:
``` python
%load_ext autoreload
%autoreload 2
import os
import sys

# Make the project's local modules importable
sys.path.append('../src/python/')
import def_classes as defc
import numpy as np
from pdf2image import convert_from_path, convert_from_bytes
import utils_proc
```
%% Output
The autoreload extension is already loaded. To reload it, use:
%reload_ext autoreload
%% Cell type:code id: tags:
``` python
year = 1978
folder_database = '../data/AB/'
```
%% Cell type:markdown id: tags:
## List the files inside the tar
%% Cell type:code id: tags:
``` python
name_tar = '00_rawpdfs'
list_docs = utils_proc.get_list(year, folder_database, name_tar)
print(len(list_docs[0]))
np.transpose(list_docs)
```
%% Output
1011
array([['./1978/20006356.pdf', '20006356'],
['./1978/20006357.pdf', '20006357'],
['./1978/20006358.pdf', '20006358'],
...,
['./1978/20007364.pdf', '20007364'],
['./1978/20007365.pdf', '20007365'],
['./1978/20007366.pdf', '20007366']], dtype='<U19')
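%% Cell type:markdown id: tags:
The internals of `utils_proc.get_list` are not shown in this notebook. Judging only from the output above, it seems to return two parallel lists: the member paths inside the tar and the matching document IDs. A minimal sketch of that idea using the standard `tarfile` module could look like the following. The function name `list_pdfs_in_tar` and its `tar_path` parameter are hypothetical, and how the project actually locates the archive from `year`, `folder_database` and `name_tar` is not assumed here.

```python
import os
import tarfile


def list_pdfs_in_tar(tar_path):
    """Return [member_paths, doc_ids] for every PDF file in a tar archive.

    Hypothetical helper for illustration; not the project's get_list.
    """
    paths, ids = [], []
    # "r:*" lets tarfile detect the compression (none, gz, bz2, xz) itself
    with tarfile.open(tar_path, "r:*") as tar:
        for member in tar.getmembers():
            if member.isfile() and member.name.lower().endswith(".pdf"):
                paths.append(member.name)
                # Document ID = file name without directory or extension
                doc_id = os.path.splitext(os.path.basename(member.name))[0]
                ids.append(doc_id)
    return [paths, ids]
```

Returning the two lists side by side matches the way the cell above uses the result: `len(list_docs[0])` counts the documents, and `np.transpose(list_docs)` pairs each path with its ID.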
%% Cell type:code id: tags:
``` python
# Choose a document ID (it should be one of those listed above)
iddoc = '20026449'
# Path of the PDF inside the tar: ./<year>/<iddoc>.pdf
input_file = f'./{year}/{iddoc}.pdf'
```
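%% Cell type:markdown id: tags:
The listing printed earlier has 1011 entries running from 20006356 to 20007366, which looks like a consecutive range, so the ID `20026449` used here does not appear to be among them. Before building a `Document`, it may be worth checking that the chosen ID really is in the listing. The sketch below assumes the `[paths, ids]` layout suggested by that output; `member_path` and `is_listed` are hypothetical helper names, not part of `utils_proc`.

```python
def member_path(year, iddoc):
    """Build the in-archive path './<year>/<iddoc>.pdf'."""
    return f"./{year}/{iddoc}.pdf"


def is_listed(year, iddoc, list_docs):
    """Return True if the document appears in a [paths, ids] listing.

    Assumes list_docs holds two parallel sequences: paths and IDs.
    """
    paths, ids = list_docs
    return iddoc in ids and member_path(year, iddoc) in paths
```

With the notebook's variables, `is_listed(year, iddoc, list_docs)` could be called right after `iddoc` is set, so that a mistyped ID is caught before the document is loaded.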
%% Cell type:code id: tags:
``` python
d1 = defc.Document(input_file, folder_database)
```
%% Cell type:code id: tags:
``` python
# List the attributes and methods available on the Document object
dir(d1)
```
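%% Cell type:markdown id: tags:
To see what a `Document` object exposes without relying on tab completion, its public names can be split into data attributes and methods. The sketch below uses only Python built-ins. Because the real `defc.Document` class is not shown here, `DemoDocument` is a stand-in invented for illustration, and `public_members` is a hypothetical helper name.

```python
def public_members(obj):
    """Split an object's public names into (attributes, methods)."""
    attributes, methods = [], []
    for name in dir(obj):  # dir() returns names in sorted order
        if name.startswith("_"):
            continue  # skip private and dunder names
        if callable(getattr(obj, name)):
            methods.append(name)
        else:
            attributes.append(name)
    return attributes, methods


class DemoDocument:
    """Stand-in for illustration only; NOT the real defc.Document."""

    def __init__(self, input_file, folder_database):
        self.input_file = input_file
        self.folder_database = folder_database

    def describe(self):
        return f"{self.input_file} in {self.folder_database}"
```

In the notebook, `public_members(d1)` would give the same kind of overview for the real object. Note that `getattr` evaluates properties, so an object with expensive or failing properties may need a more careful loop.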