Open
Is this repository an appropriate place to ask questions about this client: https://pypi.org/project/transkribus-client?
How can images be excluded from a download?
I found that there should be an option bNoImage (cf. https://github.com/Transkribus/TranskribusPyClient/blob/master/src/TranskribusPyClient/client.py).
But it is not clear to me where it should be passed in. Here is how the download was implemented by my predecessor:
import logging
import os
import shutil
from pathlib import Path
from typing import List
from classes.conf import ConverterConfig
from classes.logger import Logger
from tqdm import tqdm
from transkribus import TranskribusAPI
from transkribus.models import Collection, Document

CONF = ConverterConfig()
LOG = Logger().get_logger()


class TranskribusDownloader:
    """
    TranskribusDownloader is a wrapper for the unofficial transkribus-client
    (https://gitlab.com/arkindex/transkribus/-/blob/master/transkribus/api.py)
    """
    api: TranskribusAPI

    def downloadDocuments(
        self,
        colId: int,
        docIds: List[int],
        downloadDir: str,
        usePreviousDownload: bool
    ) -> List[Path]:
        """Downloads documents from Transkribus and returns
        a list of Paths of the directories containing the downloaded files.
        Attention! Removes every subdirectory of downloadDir first,
        unless usePreviousDownload is set.
        Args:
            colId (int): id of transkribus collection
            docIds (List[int]): ids of transkribus documents
            downloadDir (str): parent directory for downloads
            usePreviousDownload (bool): skip the download and reuse existing files
        Returns:
            List of Paths of the directories containing the downloaded files
        """
        directories = []
        downloadDir = Path(downloadDir)
        if not usePreviousDownload:
            # Start from a clean state: drop all previously downloaded documents
            for subdirectory in [f.path for f in os.scandir(downloadDir) if f.is_dir()]:
                shutil.rmtree(subdirectory)
        for docId in tqdm(docIds, desc='Downloading documents'):
            if not usePreviousDownload:
                collection: Collection = Collection(int(colId))
                doc: Document = Document(
                    collection,
                    int(docId)
                )
                LOG.info(
                    f'Downloading document {docId} from collection {colId}')
                doc.download(self.api, downloadDir)
            # The directory is reported in both cases (fresh or previous download)
            directories.append(Path(f'{downloadDir}/{docId}'))
        return directories

    def __init__(self):
        try:
            self.api = TranskribusAPI()
            self.api.login(CONF.transkribusUser(), CONF.transkribusPassword())
        except Exception:
            LOG.error('No transkribus credentials provided. Could not log in.')
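
Until it is clear whether the download call itself can skip images, a possible workaround is to delete the image files once the download has finished. The following is only a minimal sketch: the helper name `remove_images` and the list of file extensions are assumptions made for illustration, not part of transkribus-client or TranskribusPyClient, and the images are still transferred before they are removed.

```python
from pathlib import Path
from typing import Iterable, List

# Assumed set of image extensions; adjust to what the download really contains.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}


def remove_images(directories: Iterable[Path]) -> List[Path]:
    """Delete image files below each directory and return the removed paths."""
    removed = []
    for directory in directories:
        # sorted() materialises the listing before any file is deleted
        for path in sorted(Path(directory).rglob("*")):
            if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
                path.unlink()
                removed.append(path)
    return removed
```

Since `downloadDocuments` returns the list of document directories, its result could be passed straight to this helper, e.g. `remove_images(downloader.downloadDocuments(colId, docIds, downloadDir, False))`.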