Feature/ timeline id in clone section for Spilo#760
Feature/ timeline id in clone section for Spilo#760Sudeepta92 wants to merge 5 commits intozalando:masterfrom
Conversation
| parser = argparse.ArgumentParser(description="Script to clone from S3 with support for point-in-time-recovery") | ||
| parser.add_argument('--scope', required=True, help='target cluster name') | ||
| parser.add_argument('--datadir', required=True, help='target cluster postgres data directory') | ||
| parser.add_argument('--scope', required=True, |
There was a problem hiding this comment.
| parser.add_argument('--scope', required=True, | |
| parser.add_argument('--scope', required=True, |
| parser.add_argument('--datadir', required=True, help='target cluster postgres data directory') | ||
| parser.add_argument('--scope', required=True, | ||
| help='target cluster name') | ||
| parser.add_argument('--datadir', required=True, |
There was a problem hiding this comment.
| parser.add_argument('--datadir', required=True, | |
| parser.add_argument('--datadir', required=True, |
| help='the timestamp up to which recovery will proceed (including time zone)', | ||
| dest='recovery_target_time_string') | ||
| parser.add_argument('--dry-run', action='store_true', help='find a matching backup and build the wal-e ' | ||
| parser.add_argument('--dry-run', action='store_true', |
There was a problem hiding this comment.
| parser.add_argument('--dry-run', action='store_true', | |
| parser.add_argument('--dry-run', action='store_true', |
| parser.add_argument('--recovery-target-timeline', | ||
| help='the timeline up to which recovery will proceed. Leave empty for latest.', | ||
| dest='recovery_target_timeline', | ||
| type=lambda timeline_id: int(timeline_id,16)) |
There was a problem hiding this comment.
| type=lambda timeline_id: int(timeline_id,16)) | |
| type=lambda timeline_id: int(timeline_id, 16)) |
| recovery_target_time = None | ||
|
|
||
| return options(args.scope, args.datadir, recovery_target_time, args.dry_run) | ||
| if args.recovery_target_timeline == None: |
There was a problem hiding this comment.
| if args.recovery_target_timeline == None: | |
| if args.recovery_target_timeline is None: |
|
|
||
| return options(args.scope, args.datadir, recovery_target_time, recovery_target_timeline, args.dry_run) | ||
|
|
||
| def get_latest_timeline(): |
There was a problem hiding this comment.
| def get_latest_timeline(): | |
| def get_latest_timeline(): |
| latest_timeline_id = int(backup["name"][5:13], 16) | ||
| return latest_timeline_id | ||
|
|
||
| def build_wale_command(command, datadir=None, backup=None): |
There was a problem hiding this comment.
| def build_wale_command(command, datadir=None, backup=None): | |
| def build_wale_command(command, datadir=None, backup=None): |
| def get_latest_timeline(): | ||
| env = os.environ.copy() | ||
| backup_list = list_backups(env) | ||
| latest_timeline_id = int("00000000",16) |
There was a problem hiding this comment.
| latest_timeline_id = int("00000000",16) | |
| latest_timeline_id = int("00000000", 16) |
|
@Sudeepta92 let's step back and explain what problem are you trying to solve? The thing is that recovery_target_timeline has only very narrow use: https://www.postgresql.org/docs/current/continuous-archiving.html#BACKUP-TIMELINES
This is exactly what postgres-operator avoids: every time you deploy a new cluster it gets a new and unique archive location. That is, we will never have the situation when two recovery attempts are performed with the same archive location, what makes this feature effectively useless... |
@CyberDem0n I'm not sure if I understand you correctly. It's not always true that a unique archive location is created for a database. This can be because of two reasons. First, you can specify that there should be no prefix and suffix within the archive location path (S3), which leads to a location which only holds the cluster name and version and no UID. Second, you can do an "in place restore" (described here) This will lead to a new cluster UID but in the case above where the UID is not represented in the archive location, this will lead to the same problem. On the Operator repository, there is also a open PR to add the needed configuration to the postgresql manifest, they seem to be willing to implement this. (See here) When having UIDs in the backup location path, this makes it nearly impossible to automate a restore process and make backups selectable over some kind of webinterface, cause you probably will not have a history of all UIDs that a database might had in the past, therefore you cannot make the backups selectable anymore using an automation. I think this is the reason why one would disable the prefix / suffix stuff. Just my thoughts. Philip |
Well, if you shoot your own foot, it hurts badly. Just don't do it. Two different clusters must not be writing to the same archive location. |
They are not different clusters, just a restored one which takes place from the original one (inplace restore). And second, I don't understand why you use an offensive term like "shoot in the foot", it's the concept of timelines in Postgres, so not a shady workaround. Philip |
|
If the old cluster is still up and running and the new cluster is forked from the old one (with the new timeline), they are two different clusters.
If you are unlucky enough, both clusters will end up on the same timeline and comparable LSNs, and there will be clashes in WAL file names. Even if the old cluster is down, there will be a mess in the archive after PITR. I don't understand the offensiveness of the "shoot your own foot". It is a well-known idiom and in this context, it means that we shouldn't be creating problems in the first place in order to make a significant effort to solve these problems later. |
To modify Spilo image to process timeline id in the clone section.