Feature/ timeline id in clone section for Spilo by Sudeepta92 · Pull Request #760 · zalando/spilo

Sudeepta92 · 2022-07-19T08:48:43Z

To modify Spilo image to process timeline id in the clone section.

hughcapet · 2022-08-02T15:36:15Z

    parser = argparse.ArgumentParser(description="Script to clone from S3 with support for point-in-time-recovery")
-    parser.add_argument('--scope', required=True, help='target cluster name')
-    parser.add_argument('--datadir', required=True, help='target cluster postgres data directory')
+    parser.add_argument('--scope', required=True, 


Suggested change

parser.add_argument('--scope', required=True,

parser.add_argument('--scope', required=True,

hughcapet · 2022-08-02T15:36:32Z

-    parser.add_argument('--datadir', required=True, help='target cluster postgres data directory')
+    parser.add_argument('--scope', required=True, 
+                        help='target cluster name')
+    parser.add_argument('--datadir', required=True, 


Suggested change

parser.add_argument('--datadir', required=True,

parser.add_argument('--datadir', required=True,

hughcapet · 2022-08-02T15:36:57Z

                        help='the timestamp up to which recovery will proceed (including time zone)',
                        dest='recovery_target_time_string')
-    parser.add_argument('--dry-run', action='store_true', help='find a matching backup and build the wal-e '
+    parser.add_argument('--dry-run', action='store_true', 


Suggested change

parser.add_argument('--dry-run', action='store_true',

parser.add_argument('--dry-run', action='store_true',

hughcapet · 2022-08-02T15:37:22Z

+    parser.add_argument('--recovery-target-timeline',
+                        help='the timeline up to which recovery will proceed. Leave empty for latest.',
+                        dest='recovery_target_timeline',
+                        type=lambda timeline_id: int(timeline_id,16))


Suggested change

type=lambda timeline_id: int(timeline_id,16))

type=lambda timeline_id: int(timeline_id, 16))

hughcapet · 2022-08-02T15:37:56Z

        recovery_target_time = None

-    return options(args.scope, args.datadir, recovery_target_time, args.dry_run)
+    if args.recovery_target_timeline == None:


Suggested change

if args.recovery_target_timeline == None:

if args.recovery_target_timeline is None:

hughcapet · 2022-08-02T15:38:31Z

+
+    return options(args.scope, args.datadir, recovery_target_time, recovery_target_timeline, args.dry_run)

+def get_latest_timeline():


Suggested change

def get_latest_timeline():

def get_latest_timeline():

hughcapet · 2022-08-02T15:38:58Z

+            latest_timeline_id = int(backup["name"][5:13], 16)
+    return latest_timeline_id

 def build_wale_command(command, datadir=None, backup=None):


Suggested change

def build_wale_command(command, datadir=None, backup=None):

def build_wale_command(command, datadir=None, backup=None):

hughcapet · 2022-08-02T15:39:30Z

+def get_latest_timeline():
+    env = os.environ.copy()
+    backup_list = list_backups(env)
+    latest_timeline_id = int("00000000",16)


Suggested change

latest_timeline_id = int("00000000",16)

latest_timeline_id = int("00000000", 16)

CyberDem0n · 2022-08-03T12:49:14Z

@Sudeepta92 let's step back and explain what problem are you trying to solve?

The thing is that recovery_target_timeline has only very narrow use: https://www.postgresql.org/docs/current/continuous-archiving.html#BACKUP-TIMELINES

The default behavior of recovery is to recover to the latest timeline found in the archive. If you wish to recover to the timeline that was current when the base backup was taken or into a specific child timeline (that is, you want to return to some state that was itself generated after a recovery attempt), you need to specify current or the target timeline ID in recovery_target_timeline. You cannot recover into timelines that branched off earlier than the base backup.

This is exactly what postgres-operator avoids: every time you deploy a new cluster it gets a new and unique archive location. That is, we will never have the situation when two recovery attempts are performed with the same archive location, what makes this feature effectively useless...

thedatabaseme · 2022-08-30T10:18:52Z

This is exactly what postgres-operator avoids: every time you deploy a new cluster it gets a new and unique archive location. That is, we will never have the situation when two recovery attempts are performed with the same archive location, what makes this feature effectively useless...

@CyberDem0n I'm not sure if I understand you correctly. It's not always true that a unique archive location is created for a database. This can be because of two reasons. First, you can specify that there should be no prefix and suffix within the archive location path (S3), which leads to a location which only holds the cluster name and version and no UID. Second, you can do an "in place restore" (described here) This will lead to a new cluster UID but in the case above where the UID is not represented in the archive location, this will lead to the same problem.

On the Operator repository, there is also a open PR to add the needed configuration to the postgresql manifest, they seem to be willing to implement this. (See here)

When having UIDs in the backup location path, this makes it nearly impossible to automate a restore process and make backups selectable over some kind of webinterface, cause you probably will not have a history of all UIDs that a database might had in the past, therefore you cannot make the backups selectable anymore using an automation. I think this is the reason why one would disable the prefix / suffix stuff.

Just my thoughts.

Philip

CyberDem0n · 2022-08-30T10:26:28Z

you can specify that there should be no prefix and suffix within the archive location path (S3), which leads to a location which only holds the cluster name and version and no UID.

Well, if you shoot your own foot, it hurts badly. Just don't do it. Two different clusters must not be writing to the same archive location.

thedatabaseme · 2022-08-30T11:24:03Z

you can specify that there should be no prefix and suffix within the archive location path (S3), which leads to a location which only holds the cluster name and version and no UID.

Well, if you shoot your own foot, it hurts badly. Just don't do it. Two different clusters must not be writing to the same archive location.

They are not different clusters, just a restored one which takes place from the original one (inplace restore). And second, I don't understand why you use an offensive term like "shoot in the foot", it's the concept of timelines in Postgres, so not a shady workaround.

Philip

CyberDem0n · 2022-08-30T12:07:47Z

If the old cluster is still up and running and the new cluster is forked from the old one (with the new timeline), they are two different clusters.
They just have two common things:

cluster system identifier
and the ancestor

If you are unlucky enough, both clusters will end up on the same timeline and comparable LSNs, and there will be clashes in WAL file names.

Even if the old cluster is down, there will be a mess in the archive after PITR.

I don't understand the offensiveness of the "shoot your own foot". It is a well-known idiom and in this context, it means that we shouldn't be creating problems in the first place in order to make a significant effort to solve these problems later.

Sudeepta Patra added 4 commits July 19, 2022 08:45

add timeline in clone

da1c1e9

remove default for timeline

c51d485

remove extra space

4959475

add new line eof

95cb05d

hughcapet reviewed Jul 27, 2022

View reviewed changes

Comment thread postgres-appliance/bootstrap/clone_with_wale.py

Sudeepta92 marked this pull request as draft August 2, 2022 08:47

fixing bug to choose backup from right timeline

83170e0

Sudeepta92 marked this pull request as ready for review August 2, 2022 11:36

hughcapet requested changes Aug 3, 2022

View reviewed changes

FxKu mentioned this pull request Jan 5, 2023

Feature/timelineid in clone section zalando/postgres-operator#1961

Open

	parser.add_argument('--scope', required=True,
	parser.add_argument('--scope', required=True,

	parser.add_argument('--datadir', required=True,
	parser.add_argument('--datadir', required=True,

	parser.add_argument('--dry-run', action='store_true',
	parser.add_argument('--dry-run', action='store_true',

	type=lambda timeline_id: int(timeline_id,16))
	type=lambda timeline_id: int(timeline_id, 16))

	if args.recovery_target_timeline == None:
	if args.recovery_target_timeline is None:


		return options(args.scope, args.datadir, recovery_target_time, recovery_target_timeline, args.dry_run)

		def get_latest_timeline():

	def build_wale_command(command, datadir=None, backup=None):

	def build_wale_command(command, datadir=None, backup=None):

	latest_timeline_id = int("00000000",16)
	latest_timeline_id = int("00000000", 16)

Conversation

Sudeepta92 commented Jul 19, 2022

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

CyberDem0n commented Aug 3, 2022

Uh oh!

thedatabaseme commented Aug 30, 2022

Uh oh!

CyberDem0n commented Aug 30, 2022

Uh oh!

thedatabaseme commented Aug 30, 2022

Uh oh!

CyberDem0n commented Aug 30, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants