RocketRML on GTFSbench CSV 1 #44

@Aklakan

Hi,

For an academic evaluation of RML tools, we are trying to reproduce the results for gtfs-bench CSV size=1 from this publication (PDF):

Arenas-Guerrero, Julián, et al. "Knowledge graph construction with R2RML and RML: an ETL system-based overview." CEUR Workshop Proceedings, Vol. 2873, 2021.

However, running your system either locally or via Docker, e.g. using

> docker run -e "NODE_OPTIONS=--max-old-space-size=16000" -v /home/user/datasets/gtfsbench/datasets/csv/1/.:/data t/rmldocker

results in out-of-memory errors [1] even on the smallest dataset, because at the end all output is accumulated in a single in-memory string rather than being streamed out.

It is unclear to me whether the authors of the publication reported this issue to you, and therefore whether a fix exists only on the authors' side or whether you already have a fix that simply has not made it back into this repository.
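To illustrate the difference between building one large string and streaming the output, here is a minimal Node.js sketch. It is not RocketRML's actual code: `toNTriple`, `serializeInMemory`, and `serializeStreaming` are hypothetical names, and the triple representation (an array of subject IRI, predicate IRI, and an already-serialized object term) is an assumption made for the example.

```javascript
const { once } = require("node:events");
const { finished } = require("node:stream/promises");

// Hypothetical helper: serialize one [subjectIri, predicateIri, objectTerm]
// triple as an N-Triples line. The object is assumed to be pre-serialized.
function toNTriple([s, p, o]) {
  return `<${s}> <${p}> ${o} .\n`;
}

// Pattern that exhausts the heap on large inputs: the complete output
// is held in one string before anything is written.
function serializeInMemory(triples) {
  let out = "";
  for (const t of triples) out += toNTriple(t);
  return out;
}

// Streaming alternative: each line is handed to a Writable immediately.
// When write() returns false, wait for "drain" so the stream's internal
// buffer stays bounded (backpressure).
async function serializeStreaming(triples, sink) {
  let count = 0;
  for (const t of triples) {
    if (!sink.write(toNTriple(t))) {
      await once(sink, "drain");
    }
    count++;
  }
  sink.end();
  await finished(sink); // resolves once all data has been flushed
  return count;
}
```

With a lazy input (for example a generator that yields triples) and a sink such as `fs.createWriteStream("out.nt")`, memory use in this sketch depends on the stream buffer size rather than on the total output size.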

Best regards,
Claus

[1] Stacktrace:

Perform joins..

<--- Last few GCs --->

[7:0x629c7e0]    46790 ms: Scavenge 7864.2 (8026.4) -> 7862.5 (8028.4) MB, 8.4 / 0.0 ms  (average mu = 0.149, current mu = 0.098) allocation failure; 
[7:0x629c7e0]    46805 ms: Scavenge 7869.1 (8031.4) -> 7867.2 (8040.4) MB, 8.9 / 0.0 ms  (average mu = 0.149, current mu = 0.098) allocation failure; 
[7:0x629c7e0]    54084 ms: Mark-Compact 7880.6 (8047.9) -> 7876.6 (8051.1) MB, 6874.2 / 0.0 ms  (average mu = 0.106, current mu = 0.070) allocation failure; GC in old space requested


<--- JS stacktrace --->

FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory
 1: 0xbe6ce0 node::Abort() [node]
 2: 0xaf28b4  [node]
 3: 0xdc9ae0 v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, v8::OOMDetails const&) [node]
 4: 0xdc9e96 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, v8::OOMDetails const&) [node]
 5: 0xfc8915  [node]
 6: 0xfdc045 v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [node]
 7: 0xfb7daf v8::internal::HeapAllocator::AllocateRawWithLightRetrySlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [node]
 8: 0xfb8df7 v8::internal::HeapAllocator::AllocateRawWithRetryOrFailSlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [node]
 9: 0xf9855a v8::internal::Factory::NewFillerObject(int, v8::internal::AllocationAlignment, v8::internal::AllocationType, v8::internal::AllocationOrigin) [node]
10: 0x13ac1cd v8::internal::Runtime_AllocateInOldGeneration(int, unsigned long*, v8::internal::Isolate*) [node]
11: 0x18323f9  [node]
Aborted (core dumped)
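The GC log above reports roughly 7.9 GB of heap in use at the time of the failure, which is below the 16000 MB passed via `NODE_OPTIONS`. Whether the option actually reaches the Node.js process inside the container is not something the trace confirms. A small check like the following, using Node's built-in `v8` module, could be run in the same environment to print the effective limit; `heapLimitMiB` is a hypothetical helper name.

```javascript
const v8 = require("node:v8");

// Effective V8 heap size limit of the current process, in MiB.
// heap_size_limit is reported by V8 in bytes.
function heapLimitMiB() {
  return Math.round(v8.getHeapStatistics().heap_size_limit / (1024 * 1024));
}

console.log(`heap_size_limit: ${heapLimitMiB()} MiB`);
```

If the printed value is far below the requested size, the setting is probably not being applied to the process that runs the mapping.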
