Skip to content
This repository was archived by the owner on Feb 7, 2019. It is now read-only.
This repository was archived by the owner on Feb 7, 2019. It is now read-only.

Missing delimiters and NULL placeholders #3

@wwentland

Description

@wwentland

xml2sql output is faulty for missing columns

If I run xml2sql i get the following output for revision:

$ head -n 6 revision.sql
-- xml2sql - MediaWiki XML to SQL converter
-- Table revision for PostgreSQL

COPY "revision" FROM STDIN;
287059  2   287059  roboti Nyongeza: [[tpi:Akiolosi]]   988 LaaknorBot  2009-08-19T03:20:58Z    1   0
32885   10  32885   +jamii  160 Baba Tabita 2007-02-02T10:14:01Z    1   0

If i try to import this i get the following error:

psql -h localhost -d test2 -U babilen -W < revision.sql
Password for user babilen: 
ERROR:  missing data for column "rev_len"
CONTEXT:  COPY revision, line 1: "287059    2   287059  roboti Nyongeza: [[tpi:Akiolosi]]   988 LaaknorBot  20090819032058  1   0"

The issue can be fixed if the following awk oneliner is applied to the txt files (or lines within revision.sql that hold values)

awk -v e=10 'BEGIN { FS = OFS = "\t"; e++ } NF > e { exit(1) } { t = ""; for (a = 0; a < e - NF; a++) t = t "\t\\N"; printf "%s%s\n", $0, t }'

The difference in output is:
287059 2 287059 roboti Nyongeza: [[tpi:Akiolosi]] 988 LaaknorBot 2009-08-19T03:20:58Z 1 0
287059 2 287059 roboti Nyongeza: [[tpi:Akiolosi]] 988 LaaknorBot 2009-08-19T03:20:58Z 1 0 \N \N

The \N are placeholders for NULL values which is OK since the respective columns (rev_len and rev_parent_id) are defined as nullable:

CREATE TABLE revision (
....
   rev_len         INTEGER          NULL,
   rev_parent_id   INTEGER          NULL
);

thanks

babilen

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions