\documentclass[ngerman,a4paper]{report}
\usepackage[english,ngerman]{babel}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage{geometry}
\usepackage{caption}
\usepackage{hyperref}
\usepackage{MyriadPro}
%\usepackage{MinionPro}
\usepackage{graphicx}
%\geometry{verbose,tmargin=3cm,bmargin=3cm,lmargin=3cm,rmargin=3cm}
\usepackage{listings}
\usepackage{paralist}
\usepackage{stmaryrd}
\usepackage{color}
%\usepackage{floatflt}
\usepackage{amsmath}
%\usepackage{amssymb}
\usepackage{float}
\definecolor{dkgreen}{rgb}{0,0.6,0}
\definecolor{gray}{rgb}{0.5,0.5,0.5}
\definecolor{mauve}{rgb}{0.58,0,0.82}
\lstset{language=C,
numbers=left,
numberstyle=\tiny\color{gray},
stepnumber=1,
numbersep=5pt,
%basicstyle=\tiny,
%frame = single,
tabsize =2,
breaklines = true,
breakatwhitespace = false,
keywordstyle=\color{blue}, % keyword style
commentstyle=\color{dkgreen}, % comment style
stringstyle=\color{mauve}, % string literal style
literate=%
{Ö}{{\"O}}1
{Ä}{{\"A}}1
{Ü}{{\"U}}1
{ß}{{\ss}}2
{ü}{{\"u}}1
{ä}{{\"a}}1
{ö}{{\"o}}1
}
\selectlanguage{english}
\renewcommand{\familydefault}{\sfdefault}
\author{Hinnerk van Bruinehsen\\Tobias Höppner\\Tobias Famulla\\Johannes Dillmann\\Julian Dobmann\\Jens Fischer}
\title{Lecture Notes\\\Huge{Distributed Systems}}
\date{SoSe 2013}
\begin{document}
\maketitle
\tableofcontents
\chapter{Verteilte Systeme/Distributed Systems}
\section{Organization}
Lecture: Tue 10--12 (not on 23.04.)\\
Exercises: Thu 10--12\\
\subsection{Electronic Resources}
\begin{compactitem}
\item (kvv)
\item Website AG
\item Sakai
\end{compactitem}
\subsection{Exercises}
\begin{compactitem}
\item approx. 5 exercise sheets, every two weeks
\item group presentations on ``distributed systems''
\end{compactitem}
\subsection{Material/Contents}
\begin{compactitem}
\item[1st half] Distributed Systems (Tanenbaum, van Steen)
\begin{compactitem}
\item architecture
\item processes
\item communication
\item naming
\item synchronization
\item consistency
\item replication
\item fault tolerance
\end{compactitem}
\item[2nd half] Distributed Algorithms (Nancy Lynch)
\begin{compactitem}
\item synchronous network algorithms
\item network models (leader election, shortest path, distributed consensus, byzantine agreement)
\item asynchronous network algorithms (shared memory, mutual exclusion, resource allocation, consensus)
\item timing
\item network resource allocation
\item failure detectors
\end{compactitem}
\end{compactitem}
\chapter{Distributed Systems}
\textbf{Def:} A distributed system is a collection of independent computers that appears to its users as a single coherent system.
Characteristics:
\begin{compactitem}
\item autonomous components
\item appears as single system
\item communication is hidden
\item organisation is hidden \\(could be high-performance mainframe or sensor net)
\item a heterogeneous system offers a homogeneous look/interface
\end{compactitem}
\ \\
\ \\
There are 4 goals of distributed systems:
\begin{compactenum}
\item Making Resources Accessible
\item Distribution Transparency
\item Openness
\item Scalability
\end{compactenum}
\section{Making Resources Accessible}
\begin{compactitem}
\item provide users (and applications) access to remote resources (printer, storage, computing)
\item share resources in a controlled, efficient way
\end{compactitem}
\section{Distribution Transparency}
Hide the fact that processes and resources are physically distributed. A distributed system that is able to present itself to users and applications as if it were only a single computer system is said to be transparent.
\begin{compactitem}
\item transparency is desirable, but not always perfectly achievable
\item tradeoff between transparency and complexity, maintainability and performance
\end{compactitem}
\pagebreak
\textbf{Types of transparency:}
\begin{compactitem}
\item[\textbf{access}] hide differences in data representation and how a resource is accessed
\item[\textbf{location}] hide where a resource is located
\item[\textbf{migration}] hide that a resource may move to another location
\item[\textbf{relocation}] hide that a resource may be moved to another location while in use
\item[\textbf{replication}] hide that a resource is replicated
\item[\textbf{concurrency}] hide that a resource may be shared by several competing users
\item[\textbf{failure}] hide the failure and recovery of a resource
\end{compactitem}
\section{Openness}
\begin{compactitem}
\item An open distributed system is a system that offers services according to standard rules that describe the syntax and semantics of those services
\item service interfaces (syntax) specified using Interface Definition Language (IDL)
\item service specification (semantics) as text
\end{compactitem}
\section{Scalability}
Scalability is an important property; distributed systems should be scalable in:
\begin{compactitem}
\item[\textbf{size}] number of nodes, users, resources
\item[\textbf{geographic spread}] geographical distribution of users and resources (may lie far apart)
\item[\textbf{administration}] manageability, even if it spans many independent administrative organizations
\end{compactitem}
\subsection{Scaling in size}
If more users or resources need to be supported, we are often confronted with the limitations of centralized services, data, and algorithms. Scalability in size is limited by:
\begin{compactitem}
\item [\textbf{centralized services}] A single server for all users (bottleneck)
\item [\textbf{centralized data}] A single on-line telephone book or non-distributed DNS
\item [\textbf{centralized algorithms}] Doing routing based on \emph{complete} information
\end{compactitem}
\subsection{Geographical Scalability}
\begin{compactitem}
\item existing distributed systems were designed for LANs
\item $\rightarrow$ LAN-based systems often use synchronous communication
\item $\rightarrow$ LAN-based systems provide relatively reliable communication
\item $\rightarrow$ in LAN-based systems broadcasting is possible
\item all of this is not possible in WAN $\Rightarrow$ problems in geographical scalability
\end{compactitem}
\subsection{Scalability in administration}
How to scale a distributed system across multiple, independent administrative domains. Important security questions arise:
\begin{compactitem}
\item [\textbf{self-protection}] the system must protect itself against malicious attacks from the new domain
\item [\textbf{domain protection}] the new domain must protect itself against attacks from the system
\end{compactitem}
\subsection{Scaling techniques}
\begin{compactitem}
\item [\textbf{hiding communication latencies}] use only asynchronous communication
\item [\textbf{distribution}] split components into smaller parts and spread them across the system
\item [\textbf{replication}] of components; caching; enables load balancing
\end{compactitem}
\section{Pitfalls}
False assumptions that everyone makes when developing a distributed application for the first time:
\begin{compactenum}
\item The network is reliable.
\item The network is secure.
\item The network is homogeneous.
\item The topology does not change.
\item Latency is zero.
\item Bandwidth is infinite.
\item Transport cost is zero.
\item There is one administrator.
\end{compactenum}
\section{Types of distributed systems}
\subsection{Distributed Computing Systems}
\subsubsection{Cluster Computing Systems}
\begin{compactitem}
\item the underlying hardware consists of a collection of similar workstations or PCs
\item these are closely connected by means of a high-speed local-area network
\item each node runs the same operating system
\item used, for instance, to build supercomputers from off-the-shelf technology by hooking up a collection of relatively simple computers in a high-speed network
\end{compactitem}
\begin{figure}[h]
\centering
\includegraphics[width=200px]{gfx/cluster_computing.png}
\caption{cluster computing}
\label{img:cluster_comp}
\end{figure}
\subsubsection{Grid Computing Systems}
\begin{compactitem}
\item grid computing systems have a high degree of heterogeneity: no assumptions are made concerning hardware, operating systems, networks, administrative domains, security policies, etc.
\item virtual organization: resources from different organizations are brought together to allow the collaboration of a group of people or institutions
\item geographically distributed
\end{compactitem}
\subsection{Distributed Information Systems}
\subsubsection{Transaction Processing Systems}
A distributed system for processing transactions, e.g. a (distributed) database.
Transactions are defined through fulfilling the \textbf{ACID} (atomicity, consistency, isolation, durability) properties:
\begin{compactdesc}
\item[Atomic] To the outside world, the transaction happens indivisibly
\item[Consistent] The transaction does not violate system invariants
\item[Isolated] Concurrent transactions do not interfere with each other
\item[Durable] Once a transaction commits, the changes are permanent
\end{compactdesc}
\subsubsection{Enterprise Application Integration}
\begin{compactitem}
\item the more applications became decoupled from the databases they were built upon, the more evident it became that facilities were needed to integrate applications independent from their databases
\item application components should be able to communicate directly with each other and not merely by means of the request/reply behavior that was supported by transaction processing systems
\item main idea was that existing applications could directly exchange information via communication middleware
\item see RPC
\end{compactitem}
\subsubsection{Distributed Pervasive Systems}
\begin{compactitem}
\item The distributed systems we have been discussing so far are largely characterized by their stability: nodes are fixed and have a more or less permanent and high-quality connection to a network
\item in distributed pervasive systems, instability is the default behavior
\item often small, wireless, ad hoc, no administration
\item Examples: Home automation, Health systems, Sensor Networks
\end{compactitem}
\chapter{Architectures 1: Architectural styles}
\begin{compactitem}
\item how to split software into components\\
$\Rightarrow$ Software architecture (aka architectural styles)
\item how to build a system out of the components\\
$\Rightarrow$ System architecture
\end{compactitem}
\ \\
Middleware can help to create distribution transparency
\section{Layered architecture}
\begin{compactitem}
\item components are organized in a layered fashion
\item a component at layer $L_n$ is allowed to call (request) components at the underlying layer $L_{n-1}$, but not the other way around
\item control flows from layer to layer
\item request down, reply up
\item widely adopted in network stack
\end{compactitem}
\section{Object-based architectures}
\begin{compactitem}
\item interaction between components
\item components are connected through a (remote) procedure call mechanism
\item associated with client-server system architecture
\end{compactitem}
\begin{figure}[h]
\centering
\includegraphics[width=300px]{gfx/layeres_and_objects.png}
\caption{The (a) layered and (b) object-based architectural style.}
\label{img:layeres_and_objects}
\end{figure}
\section{Data-centered architectures}
\begin{compactitem}
\item processes communicate through a common (passive or active) repository
\item Example: networked applications with shared distributed file system in which all communication takes place through files
\item Example: Web-based distributed system: processes communicate through the use of shared database
\item Example: distributed database
\end{compactitem}
\section{Event-based architecture}
\begin{figure}[h]
\centering
\includegraphics[width=150px]{gfx/pub_sub.png}
\caption{publish-subscribe system}
\label{img:publish_subscribe}
\end{figure}
\begin{compactitem}
\item aka publish-subscribe systems
\item processes communicate essentially through the propagation of events
\item publisher announces events at broker
\item only those processes that subscribed will receive the events
\item $\Rightarrow$ loose coupling (publisher and subscriber need not know each other), decoupled in space
\item $\Rightarrow$ better scalability than client-server; parallel processing, caching
\end{compactitem}
\section{Shared data spaces}
Event-based and data-based can be combined $\Rightarrow$ shared Data space
\begin{figure}[h]
\centering
\includegraphics[width=150px]{gfx/shared_data_space.png}
\caption{shared data space}
\label{img:shared_data_space}
\end{figure}
\chapter{Architectures 2: System architectures}
\begin{compactitem}
\item \textbf{vertical distribution} (layering, multitiered architectures)
\begin{compactitem}
\item placing logically different components on different machines\\
$\Rightarrow$ centralized architectures / client-server
\end{compactitem}
\item \textbf{horizontal distribution}
\begin{compactitem}
\item replicated client/server operating on different data (different parts of the data)
\item all the machines fulfill in principle the same role\\
$\Rightarrow$ peer-to-peer systems
\end{compactitem}
\end{compactitem}
\section{Centralized architectures}
\subsection{Client - server}
\begin{figure}[h]
\centering
\includegraphics[width=200px]{gfx/cs_simple_wait.png}
\caption{client server simple waiting situation}
\label{img:cs_simple_wait}
\end{figure}
\begin{compactitem}
\item processes in a distributed system are divided into two (possibly overlapping) groups
\item \textbf{server:} a process implementing a specific service, for example, a file system service or a database service
\item \textbf{client:} a process that requests a service from a server by sending it a request and subsequently waiting for the server's reply
\item problems:
\begin{compactitem}
\item single point of failure
\item performance (server is bottleneck)
\item can request be repeated without harm? only if request is idempotent
\end{compactitem}
\end{compactitem}
\subsubsection{Application layering}
\begin{compactenum}
\item User interface
\begin{compactitem}
\item all that is necessary to directly interface with the user, such as display management
\item clients typically implement the user-interface level
\end{compactitem}
\item processing level
\begin{compactitem}
\item contains the core functionality of an application
\end{compactitem}
\item data level
\begin{compactitem}
\item contains the programs that maintain the actual data on which the applications operate
\end{compactitem}
\end{compactenum}
\subsubsection{Multitiered Architectures}
\begin{compactitem}
\item physically distribute a client-server application across several machines
\item direct consequence of dividing applications into a user-interface, processing components, and a data level
\item different tiers correspond directly with the logical organization of applications
\item three-tiered architecture: split user-interface, application server and database server
\item Example: Website
\begin{compactitem}
\item Web server acts as an entry point to a site
\item requests are passed to an application server where the actual processing takes place
\item the application server interacts with a database server
\end{compactitem}
\end{compactitem}
$\Rightarrow$ a lot of waiting\\
$\Rightarrow$ does not scale\\
\begin{figure}[h]
\centering
\includegraphics[width=\linewidth]{gfx/server_as_client.png}
\caption{Application layering: example of a server acting as client.}
\label{img:cs_app_layer}
\end{figure}
\section{Decentralized architectures}
\begin{compactitem}
\item processes that constitute a peer-to-peer system are (in principle) all equal
\item functions that need to be carried out are represented by every process
\item interaction between processes is symmetric: each process will act as a client and a server at the same time
\item how to organize the processes in an \textbf{overlay network}
\begin{compactitem}
\item a network in which the nodes are formed by the processes and the links represent the possible communication channels
\item hide physical structure by adding logical structure
\end{compactitem}
\end{compactitem}
\subsection{Structured P2P architectures}
\begin{compactitem}
\item the overlay network is constructed using a deterministic procedure
\item most-used procedure is distributed hash table (DHT)
\end{compactitem}
\subsubsection{DHT / Chord}
\begin{figure}[h]
\centering
\includegraphics[width=280px]{gfx/chord_mapping.png}
\caption{The mapping of data items onto nodes in Chord}
\label{img:chord_mapping}
\end{figure}
\begin{compactitem}
\item random 128-bit or 160-bit keys for data items and nodes; two or more duplicate keys are extremely unlikely
\item efficient and deterministic scheme that uniquely maps the key of a data item to the identifier of a node
\item when looking up a data item, the network address of the node responsible for that data item is returned
\item Chord system arranges items in a ring
\item data item $k$ is assigned to node with smallest identifier $id \geq k$
\begin{compactitem}
\item item 1 belongs to node 1 (see \ref{img:chord_mapping})
\item item 2 belongs to node 4 (see \ref{img:chord_mapping})
\end{compactitem}
\item \textbf{succ(k)}: for each key $k$, $succ(k)$ returns the identifier of the node that $k$ is assigned to
\item \textbf{lookup(k)}: to find data item $k$, the function $lookup(k)$ returns the address of $succ(k)$ in $\mathcal{O}(\log N)$
\item \textbf{membership management}
\begin{compactitem}
\item join
\begin{compactitem}
\item create SHA1 identifier $id$
\item $lookup(id)$ will return $succ(id)$
\item contact $succ(id)$ and $pred(id)$ and insert itself in between
\end{compactitem}
\item leave
\begin{compactitem}
\item node $id$ informs $succ(id)$ and $pred(id)$ and assigns its data to $succ(id)$
\end{compactitem}
\end{compactitem}
\end{compactitem}
\subsubsection{Content addressable network (CAN)}
\begin{figure}[h]
\centering
\includegraphics[width=350px]{gfx/CAN.png}
\caption{(a) The mapping of data items onto nodes in CAN. (b) Splitting a region when a node joins.}
\label{img:CAN}
\end{figure}
\begin{compactitem}
\item d-dimensional cartesian space
\item every node draws a random identifier (a point in the space)
\item the space is divided among the nodes accordingly
\item every data item is hashed to coordinates, which determine the node responsible for it
\item join\begin{compactitem}
\item select a random point
\item halve the region in which the point falls
\item assign the items to the new region owners
\end{compactitem}
\item leave\begin{compactitem}
\item one node takes over the region\\
$\Rightarrow$ reassign regions periodically
\end{compactitem}
\end{compactitem}
\subsection{Unstructured P2P Network}
\begin{compactitem}
\item rely on randomized algorithms for constructing an overlay network
\item the goal of many unstructured peer-to-peer systems is to construct an overlay network that resembles a random graph
\item basic model
\begin{compactitem}
\item each node maintains a list of $c$ neighbors
\item each neighbor represents a randomly chosen \emph{live} node from the current set of nodes
\item list of neighbors is also referred to as a \emph{partial view}
\item nodes regularly exchange entries from their partial view
\item an entry identifies another node in the network
\item and has an associated age (indicates how old the reference is)
\end{compactitem}
\end{compactitem}
\subsubsection{Basic algorithm for overlay construction}
\begin{compactdesc}
\item[Active thread] Select peer from partial view
\begin{compactitem}
\item[PUSH] \
\begin{compactitem}
\item select c/2 youngest entries+myself
\item send to peer
\end{compactitem}
\item[PULL] \
\begin{compactitem}
\item receive peer buffer
\item construct new partial view
\item increment age \\
\end{compactitem}
\end{compactitem}
\item[Passive Thread] Receive buffer from peer
\begin{compactitem}
\item[PULL] \
\begin{compactitem}
\item select c/2
\item send to peer
\item construct new partial view
\item increment age
\end{compactitem}
\end{compactitem}
\end{compactdesc}
\input{chapter/peersim.tex}
\input{chapter/proc.tex}
\chapter{Communication}
\begin{compactitem}
\item Communication in distributed systems is always based on low-level message passing as offered by the underlying network
\item message passing is harder than using primitives based on shared memory, as in nondistributed systems
\item low-level communication facilities of computer networks are in many ways not suitable due to their lack of distribution transparency.
\end{compactitem}
\section{RPC - Remote Procedure Call}
%ordinary procedure call, e.g. count = read(fd,buf,nbytes)
\begin{compactitem}
\item allow programs to call procedures located on other machines
\item When a process on machine A calls a procedure on machine B, the calling process on A is suspended, and execution of the called procedure takes place on B.
\item Remote procedure call uses stubs to pack parameters in message
\item client stub: packs the parameters into a message and requests that message to be sent to the server
\item server stub: transforms requests coming in over the network into local procedure calls
\item No message passing at all is visible to the programmer
\item neither client nor server need to be aware of the intermediate steps or the existence of the network
\end{compactitem}
\subsection*{A remote procedure call occurs in the following steps:}
\begin{compactenum}
\item The client procedure calls the client stub in the normal way.
\item The client stub builds a message and calls the local operating system.
\item The client's operating system sends the message to the remote operating system.
\item The remote operating system gives the message to the server stub.
\item The server stub unpacks the parameters and calls the server.
\item The server does the work and returns the result to the stub.
\item The server stub packs it in a message and calls its local operating system.
\item The server's OS sends the message to the client's operating system.
\item The client's operating system gives the message to the client stub.
\item The stub unpacks the result and returns to the client.
\end{compactenum}
\subsection*{Parameter Marshaling}
Packing parameters into a message is called \emph{parameter marshaling}.
\subsubsection*{Passing Value Parameters}
\begin{compactitem}
\item values are packed into messages (client) and unpacked from messages (server)
\item transferred byte-by-byte
\item as long as the client and server machines are identical this model works fine
\item in a large distributed system, it is common that multiple machine types are present
\item $\Rightarrow$ problems because of different character encodings (EBCDIC vs. ASCII), representations of integers (one's complement vs. two's complement), or endianness (little endian vs. big endian)
\end{compactitem}
\subsubsection*{Passing Reference Parameters}
\begin{compactitem}
\item extremely difficult
\item pointers are meaningful only within the address space of the process in which they are used
\item replace with copy/restore: copy the data structure, send it to the server, work on it, send it back, restore it at the client
\end{compactitem}
\section{Asynchronous RPC}
\begin{figure}[h]
\centering
\includegraphics[width=400px]{gfx/rpc.png}
\caption{a: synchronous b: asynchronous RPC}
\label{img:rpc}
\end{figure}
\begin{compactitem}
\item in conventional procedure calls, when a client calls a remote procedure, the client will block until a reply is returned
\item asynchronous RPCs: the server immediately sends a reply back to the client the moment the RPC request is received. Reply acts as an acknowledgment.
\item client will continue without further blocking as soon as it has received the server's acknowledgment
\item Examples: transferring money from one account to another, adding entries into a database, starting remote services, batch processing...
\item Asynchronous RPCs can also be useful when a reply will be returned but the client doesn't need to wait for it and can do nothing in the meantime
\item One-Way RPCs: the client does not wait for an acknowledgment from the server
\item deferred synchronous RPC: organize the communication between the client and server through two asynchronous RPCs
\end{compactitem}
\section{Message oriented communication}
General Idea: avoid synchronous communication which blocks sender (RPC)
\subsection{Message-Oriented Transient Communication}
transient: short-lived, temporary
\subsubsection{Berkeley Sockets}
A socket is a communication end point to which an application can write data that are to be sent out over the underlying network, and from which incoming data can be read. A socket forms an abstraction over the actual communication end point that is used by the local operating system for a specific transport protocol.
\begin{figure}[h]
\centering
\includegraphics[width=400px]{gfx/sockets.png}
\caption{Connection-oriented communication pattern using sockets}
\label{img:sockets}
\end{figure}
\begin{compactitem}
\item socket: create a new communication end point
\item bind: attach a local address to a socket
\item listen: announce willingness to accept connections
\item accept: block caller until a connection request arrives
\item connect: actively attempt to establish a connection
\item send: send some data over the connection
\item receive: receive some data over the connection
\item close: release the connection
\end{compactitem}
\subsubsection{Message-passing-interface (MPI)}
\begin{compactitem}
\item standard for message passing
\item designed for parallel applications
\item communication within groups of processes
\item A $(groupID, processID)$ pair uniquely identifies the source or destination of a message (used instead of a transport-level address)
\end{compactitem}
\subsection{Message-Oriented Persistent Communication}
aka Message-queuing-system, Message-oriented-middleware (MoM) \\
\begin{figure}[h]
\centering
\includegraphics[width=400px]{gfx/mom.png}
\caption{general organization of a message-queuing system with routers}
\label{img:mom}
\end{figure}
\begin{compactitem}
\item asynchronous persistent communication
\item offer intermediate-term storage capacity for messages, without requiring either the sender or receiver to be active during message transmission
\item transfer may take minutes, not milliseconds
\item applications communicate by inserting messages into queues
\item messages are only put into and read from local queues
\item the message-queuing system takes care that messages are transferred from their source to their destination queue
\item message carries destination address
\item queue managers
\begin{compactitem}
\item a queue manager interacts directly with the application that is sending or receiving a message
\item also special queue managers that operate as routers, or relays: they forward incoming messages to other queue managers
\end{compactitem}
\item message brokers
\begin{compactitem}
\item application-level gateway in a message-queuing system
\item convert incoming messages so that they can be understood by the destination application
\item transform messages of type A into type B, using a set of rules
\end{compactitem}
\item Examples: email, workflow, batch processing, queries across several databases
\end{compactitem}
\section{Stream Oriented Communication}
\begin{figure}[h]
\centering
\includegraphics[width=400px]{gfx/streams.png}
\caption{A general architecture for streaming stored multimedia data over a network}
\label{img:streams}
\end{figure}
\begin{compactitem}
\item form of communication in which timing plays a crucial role
\item in continuous media, the temporal relationships between different data items are fundamental to correctly interpreting what the data actually means
\item multimedia data will need to be compressed substantially in order to reduce the required network capacity
\item simple stream: consists of only a single sequence of data
\item complex stream: consists of several related (often time dependent) simple streams (substreams)
\item QoS
\begin{compactenum}
\item The required bit rate at which data should be transported.
\item The maximum delay until a session has been set up (i.e., when an application can start sending data).
\item The maximum end-to-end delay (i.e., how long it will take until a data unit makes it to a recipient).
\item The maximum delay variance, or jitter.
\item The maximum round-trip delay.
\end{compactenum}
\item synchronisation of streams
\begin{compactitem}
\item Synchronization of streams deals with maintaining temporal relations between streams
\item Synchronization takes place at the level of the data units of which a stream is made up
\item In other words, we can synchronize two streams only between data units
\item Example: Playing a movie in which the video stream needs to be synchronized with the audio
\end{compactitem}
\end{compactitem}
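The statement that streams can only be synchronized at the granularity of data units can be sketched by merging two timestamped substreams of a complex stream into one playout order. The timestamps and payload names are hypothetical.

```python
import heapq

def synchronize(substreams):
    """Merge related simple streams into playout order by timestamp.
    Synchronization happens between whole data units, never inside one."""
    return list(heapq.merge(*substreams, key=lambda unit: unit[0]))

# two substreams of one complex stream: (timestamp in ms, data unit)
video = [(0, "v0"), (40, "v1"), (80, "v2")]           # 25 fps video units
audio = [(0, "a0"), (20, "a1"), (40, "a2"), (60, "a3")]
playout = synchronize([video, audio])
```

Each substream must already be ordered internally; the merge only interleaves complete units, which is why finer-grained temporal relations (e.g. lip sync below one unit) must be handled by the encoding itself.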
\section{Multicast communication}
\subsection{Application Level Multicasting}
\begin{compactitem}
\item sending data to multiple receivers
\item For many years, this topic has belonged to the domain of network protocol
\item With the advent of peer-to-peer technology various application-level multicasting techniques have been introduced
\item The basic idea in application-level multicasting is that nodes organize into an overlay network, which is then used to disseminate information to its members
\item connections between nodes in the overlay network may cross several physical links, and as such, routing messages within the overlay may not be optimal in comparison to what could have been achieved by network-level routing
\end{compactitem}
\subsubsection*{Approaches to building the overlay}
\begin{compactitem}
\item tree: nodes may organize themselves directly into a tree, meaning that there is a unique (overlay) path between every pair of nodes
\item mesh network: nodes organize into a mesh network in which every node will have multiple neighbors and, in general, there exist multiple paths between every pair of nodes
\item mesh network generally provides higher robustness
\end{compactitem}
\
\subsubsection*{Example: Construct overlay tree for chord}
\begin{compactitem}
\item the node that wants to start a multicast randomly generates a 128-bit/160-bit key (mid)
\item lookup of succ(mid) finds node responsible for key mid\\
$\Rightarrow$ succ(mid) becomes root of tree
\item join multicast: a node P issues lookup(mid), creating a lookup message with a join request that is routed from P to succ(mid)
\item the request is forwarded by Q (forwarding it for the first time), so Q becomes a forwarder\\
$\Rightarrow$ P becomes a child of Q
\item the request is then forwarded by R, so R becomes a forwarder\\
$\Rightarrow$ Q becomes a child of R
\item if Q or R is already a forwarder, the request is not forwarded any further; the previous hop still becomes a child (e.g. Q becomes a child of R)
\item multicast: lookup(mid) sends message to the root\\
multicast from root
\end{compactitem}
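The join procedure above can be sketched as follows: a join request travels along the overlay route toward succ(mid), each node that forwards it for the first time becomes a forwarder and adopts the previous hop as its tree child, and an existing forwarder stops the request. Node names and the route lists are hypothetical.

```python
class Node:
    def __init__(self, ident):
        self.ident = ident
        self.children = set()   # children in the multicast tree
        self.forwarder = False  # already forwards messages for mid?

def join(route):
    """route: the overlay lookup path from the joining node to succ(mid).
    Walk the path; each forwarding node adopts the previous hop as a child,
    and the request stops at the first node that is already a forwarder."""
    for prev_hop, node in zip(route, route[1:]):
        node.children.add(prev_hop.ident)
        if node.forwarder:      # request is not forwarded any further
            break
        node.forwarder = True

p, q, r, root = Node("P"), Node("Q"), Node("R"), Node("root")
join([p, q, r, root])   # first join: the request travels P -> Q -> R -> root
p2 = Node("P2")
join([p2, q, r, root])  # Q is already a forwarder: the request stops at Q
```

After the second join, P2 hangs below Q without the request ever reaching R or the root, which keeps join traffic local to the existing tree.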
\begin{figure}[h]
\centering
\includegraphics[width=250px]{gfx/overlay_example.png}
\caption{The relation between links in an overlay and actual network-level routes}
\label{img:ovrlnt}
\end{figure}
\subsubsection*{Efficiency}
\begin{compactitem}
\item building a tree is not difficult once we have organized the nodes into an overlay
\item building an efficient tree may be difficult
\item The quality of an application-level multicast tree is generally measured by three different metrics
\begin{compactenum}
\item Link stress: defined per link and counts how often a packet crosses the same link
\item Stretch / Relative Delay Penalty (RDP)\\
ratio between the delay of two nodes in the overlay and the delay those two nodes would experience in the underlying network:
$\frac{\text{transmission time in overlay}}{\text{transmission time in underlying network}}$ \\
$\Rightarrow$ minimize the aggregated stretch, i.e. the average RDP over all node pairs
\item tree cost: global metric, generally related to minimizing the aggregated link costs \\
link cost = cost between end points\\
$\Rightarrow$ find minimal spanning tree
\end{compactenum}
\end{compactitem}
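The first two quality metrics can be computed directly from measured delays and routes; a minimal sketch, with all delay values and link names hypothetical:

```python
def rdp(overlay_delay, network_delay):
    """Relative Delay Penalty (stretch) for one node pair: the ratio of the
    delay through the overlay to the delay in the underlying network."""
    return overlay_delay / network_delay

def link_stress(overlay_paths):
    """Count how often each physical link is crossed when one multicast
    packet travels along every overlay path (each path = list of links)."""
    stress = {}
    for path in overlay_paths:
        for link in path:
            stress[link] = stress.get(link, 0) + 1
    return stress

# hypothetical delays: the overlay route takes 73 ms, the direct route 47 ms
penalty = rdp(73, 47)
# two overlay hops that both cross the physical link "Ra-Rb":
stress = link_stress([["Ra-Rb", "Rb-Rc"], ["Ra-Rb", "Rb-Rd"]])
```

A stress value above 1 on some link means the overlay sends the same packet over that physical link several times, which network-level multicast would avoid.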
\subsection{Gossip-based-communication}
\begin{compactitem}
\item epidemic behaviour: model information dissemination in a (large) distributed system after the spreading of infectious diseases
\item infected node: a node that holds data that it is willing to spread
\item susceptible: a node does not yet have the new data
\item removed: an updated node that is not willing or able to spread its data
\item we assume we can distinguish old from new data, for example, because it has been timestamped or versioned
\end{compactitem}
\subsubsection*{Anti-entropy-model}
A node P picks another node Q at random, and subsequently exchanges updates with Q.
There are three approaches to exchanging updates:
\begin{compactenum}
\item P only pushes its own updates to Q
\begin{compactitem}
\item updates can be propagated only by infected nodes
\item if many nodes are infected, the probability of each one selecting a susceptible node is relatively small
\item $\Rightarrow$ chances are that a particular node remains susceptible for a long period simply because it is not selected by an infected node
\end{compactitem}
\item P only pulls in new updates from Q
\begin{compactitem}
\item spreading updates is essentially triggered by susceptible nodes
\item high probability of contacting an infected node and pulling in the updates
\item $\Rightarrow$ pull-based approach works much better when many nodes are infected
\end{compactitem}
\item P and Q send updates to each other (i.e., a push-pull approach)
\begin{compactitem}
\item if only one node is infected, the push-pull approach works best
\end{compactitem}
\end{compactenum}
\
\begin{compactitem}
\item a round is a period in which every node has selected a neighbor at least once
\item number of rounds needed to spread an update $\approx \mathcal{O}(\log(N))$, with $N$ the number of nodes
\end{compactitem}
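The three exchange strategies and the $\mathcal{O}(\log N)$ round count can be illustrated by a small round-based simulation. This is a sketch under simplifying assumptions (every node picks exactly one uniformly random peer per round, decisions based on the state at the start of the round):

```python
import random

def anti_entropy(n, mode="push-pull", seed=1, max_rounds=1000):
    """Simulate anti-entropy rounds until every node holds the update.
    Returns the number of rounds needed (expected to grow like log n)."""
    rng = random.Random(seed)
    infected = [False] * n
    infected[0] = True                   # one initially updated node
    rounds = 0
    while not all(infected) and rounds < max_rounds:
        rounds += 1
        snapshot = infected[:]           # infection state at start of round
        for p in range(n):
            q = rng.randrange(n)         # P picks a random node Q
            if mode in ("push", "push-pull") and snapshot[p]:
                infected[q] = True       # P pushes its update to Q
            if mode in ("pull", "push-pull") and snapshot[q]:
                infected[p] = True       # P pulls the update from Q
    return rounds

rounds = anti_entropy(1024)              # push-pull on 1024 nodes
```

With 1024 nodes, push-pull typically finishes in roughly $\log_2(1024) = 10$ rounds plus a small tail, illustrating why the last susceptible nodes are what a pure push strategy struggles with.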
\subsubsection*{Rumor spreading, gossiping}
\begin{compactitem}
\item if node P has just been updated for data item x, it contacts an arbitrary other node Q and tries to push the update to Q
\item if Q was already updated by another node, P may lose (with some probability) interest in spreading the update any further
\item P then becomes removed
\item Fraction of nodes that never obtain the data: $s=e^{-(k+1)(1-s)}$\\
e.g. $k=4$: $\ln(s) = -(k+1)(1-s) \approx -4.97$\\
$\Rightarrow s \approx 0.007$\\
less than $0.7\,\%$ of the nodes remain without the data\\
\end{compactitem}
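The equation $s = e^{-(k+1)(1-s)}$ has no closed-form solution for $s$, but it can be solved numerically by fixed-point iteration, which reproduces the $k=4$ figure above:

```python
import math

def remaining_fraction(k, tol=1e-12):
    """Solve s = exp(-(k+1)(1-s)) by fixed-point iteration.
    s is the fraction of nodes that never receive the update when an
    updated node loses interest with probability 1/(k+1) per contact."""
    s = 0.5                                   # any start value in (0, 1)
    for _ in range(10_000):
        nxt = math.exp(-(k + 1) * (1 - s))
        if abs(nxt - s) < tol:
            return nxt
        s = nxt
    return s

s = remaining_fraction(4)                     # the k = 4 case from the text
```

The iteration converges to the small fixed point (the trivial solution $s=1$ is avoided by starting below it); for $k=4$ the result is about $0.007$, i.e. under $0.7\,\%$ of the nodes stay susceptible.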
\subsubsection*{Removing Data}
\begin{compactitem}
\item problem: deletion of a data item destroys all information on that item\\
$\Rightarrow$ when a data item is simply removed from a node, that node will eventually receive old copies of the data item and interpret those as updates on something it did not have before
\item $\Rightarrow$ record the deletion of a data item as just another update, and keep a record of that deletion
\end{compactitem}
\chapter{CHORD: Distributed Hash Tables}
Chapter in the Book:
\begin{compactitem}
\item Naming
\begin{compactitem}
\item Flat naming
\begin{compactitem}
\item Distributed Hash Tables
\end{compactitem}
\end{compactitem}
\end{compactitem}
\ \\
\begin{compactitem}
\item Chord uses an m-bit identifier space (128 or 160 bit) to assign randomly chosen identifiers to nodes as well as keys to specific entities
\item An entity with key $k$ falls under the jurisdiction of the node with the smallest identifier $id \geq k$ (referred to as $succ(k)$)
\end{compactitem}
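The identifier space and the succ(k) rule can be sketched directly. Chord itself hashes into the full 128/160-bit space; the tiny $m = 5$ space and the node identifiers below are hypothetical, chosen only for illustration:

```python
import hashlib
from bisect import bisect_left

M = 5  # toy identifier space: ids in [0, 2^5); Chord uses m = 128 or 160

def chord_id(name, m=M):
    """Hash an arbitrary name (node address, entity key, ...) to an m-bit id."""
    return int(hashlib.sha1(name.encode()).hexdigest(), 16) % (2 ** m)

def succ(k, node_ids):
    """succ(k): the node with the smallest identifier id >= k,
    wrapping around the ring when k is beyond the largest node id."""
    ids = sorted(node_ids)
    i = bisect_left(ids, k)
    return ids[i % len(ids)]

nodes = [1, 4, 9, 11, 14, 18, 20, 21, 28]  # hypothetical ring with m = 5
```

An entity with key 26 would be stored at node 28, and a key beyond the last node (e.g. 29) wraps around to node 1.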
\section{Joining}
\begin{compactitem}
\item node $p$ wants to join
\item $p$ requests lookup for $succ(p)$
\begin{compactitem}
\item actually, $p$ creates SHA1 identifier $id$, e.g. from its IP, which is then looked up
\end{compactitem}
\item contact $succ(id)$ and $pred(id)$ to join ring
\item create the finger table for $p$, either by calculating all the entries or by copying the finger table of $succ(p)$ and letting the stabilization protocol do the rest
\end{compactitem}
\section{Leaving}
\begin{compactitem}
\item node $p$ informs $succ(p)$ and $pred(p)$
\item $p's$ data is assigned to $succ(p)$
\end{compactitem}
\section{Routing}
\begin{compactitem}
\item searching, lookup
\item Resolve key $k$ to address of $succ(k)$
\end{compactitem}
\subsection{Option 1: Naive Approach (Linear search)}
\begin{compactitem}
\item each node $p$ keeps $succ(p+1)$ and $pred(p)$
\item each node forwards a request for key $k$ to its neighbor
\item if $pred(p) < k \leq p$: return $p$\\
$\Rightarrow$ not scalable
\end{compactitem}
\subsection{Option 2: Finger-table based lookup}
\begin{compactitem}
\item better solution: each Chord node maintains a \underline{finger table} of length m\\
$ \forall \ 1 \leq i \leq m : FT[i] = succ((p+2^{i-1}) \mod 2^m) $ \\
\item the $i$-th entry points to the first node succeeding $p$ by at least $2^{i-1}$
\item Example: Node 1: FT[$1$]=succ($p+2^{i-1}$) = succ($1+2^{1-1}$) = succ($1+1$) = succ($2$) \\ (smallest id, such that id $\geq$ 2)
\item to look up key $k$, node $p$ forwards the request to node $q = FT_p[j]$, where index $j$ in $p's$ finger table satisfies:\\
$FT_p[j] \leq k < FT_p[j+1]$
\item lookup generally requires $\mathcal{O}(\log(N))$ steps, with $N$ nodes in the system
\end{compactitem}
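Finger-table construction and the forwarding rule can be sketched together. This is a simplified model, not a full Chord implementation: the whole ring is known locally, and the node set ($m = 5$, nodes $\{1, 4, 9, 11, 14, 18, 20, 21, 28\}$) is hypothetical, chosen to match the small worked examples in this chapter.

```python
def succ(k, ids):
    """Node with the smallest id >= k, wrapping around the ring."""
    for n in sorted(ids):
        if n >= k:
            return n
    return min(ids)

def finger_table(p, ids, m):
    """FT_p[i] = succ((p + 2^(i-1)) mod 2^m) for 1 <= i <= m (stored 0-indexed)."""
    return [succ((p + 2 ** (i - 1)) % 2 ** m, ids) for i in range(1, m + 1)]

def between(x, a, b, inclusive=False):
    """x in the ring interval (a, b), or (a, b] when inclusive (handles wrap)."""
    if a < b:
        return a < x <= b if inclusive else a < x < b
    return (x > a or x <= b) if inclusive else (x > a or x < b)

def lookup(k, start, ids, m):
    """Follow finger tables from `start` until the node responsible for k is found."""
    tables = {p: finger_table(p, ids, m) for p in ids}
    p, hops = start, [start]
    while not between(k, p, succ(p + 1, ids), inclusive=True):
        nxt = succ(p + 1, ids)            # fall back to the plain successor
        for q in reversed(tables[p]):     # furthest finger not overshooting k
            if between(q, p, k):
                nxt = q
                break
        p = nxt
        hops.append(p)
    return succ(p + 1, ids), hops         # k in (p, succ(p)] => succ(p) holds k

responsible, hops = lookup(26, 1, [1, 4, 9, 11, 14, 18, 20, 21, 28], 5)
```

Resolving key 26 from node 1 on this ring hops through 18, 20 and 21 before reaching node 28, halving the remaining distance at each step, which is where the $\mathcal{O}(\log N)$ bound comes from.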
\section{Stabilization}
\begin{compactitem}
\item goals:
\begin{compactitem}
\item keep the finger tables up to date
\item make new nodes known to the rest of the network
\item preserve the structure of the ring
\end{compactitem}
\item two tasks, periodically run at each node
\begin{compactenum}
\item \lstinline$fix_fingers$
\begin{compactitem}
\item check (i.e. look up) one, some or all entries in the finger table
\item update the finger table accordingly
\end{compactitem}
\item \lstinline$stabilize$
\begin{compactitem}
\item lookup $pred(succ(p))$ (should be $p$)
\item otherwise: update successor and predecessor information in $p$ and the neighbors
\end{compactitem}
\end{compactenum}
\end{compactitem}
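The \lstinline$stabilize$ task can be sketched as a single repair step at one node: ask the successor for its predecessor, and if a new node sits in between, adjust the pointers. The ring below (node 14 having just joined between 9 and 20) is a hypothetical state for illustration.

```python
def between(x, a, b):
    """x in the open ring interval (a, b), handling wrap-around."""
    return a < x < b if a < b else x > a or x < b

def stabilize(p, succ_of, pred_of):
    """One stabilize step at node p: lookup pred(succ(p)); if it is not p,
    a node has joined in between, so repair successor/predecessor pointers."""
    x = pred_of[succ_of[p]]               # ask our successor for its predecessor
    if x != p and between(x, p, succ_of[p]):
        succ_of[p] = x                    # adopt the newly joined node as successor
    pred_of[succ_of[p]] = p               # notify the (possibly new) successor

# hypothetical ring 1 -> 9 -> 20 -> 1, into which node 14 has just joined:
succ_of = {1: 9, 9: 20, 20: 1, 14: 20}   # 14 already set its successor to 20
pred_of = {1: 20, 9: 1, 20: 14, 14: 9}   # 20 already learned about 14
stabilize(9, succ_of, pred_of)           # 9 notices pred(succ(9)) == 14 != 9
```

After the call, node 9 points at 14 and 14 knows 9 as its predecessor, so the ring structure is restored without any global coordination.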
\subsubsection{Example 1: Routing}
The example refers to Figure 5.4, p. 190, Tanenbaum.\\
Resolve $k = 26$ from node $1$: \\
$k=26 > FT_1[5] = 18 \Rightarrow$ forward the request to node $18 = FT_1[5]$
\begin{compactitem}
\item node $18$ selects node $20$, since $FT_{18}[2] \leq k < FT_{18}[3]$
\item node $20$ selects node $21$, since $FT_{20}[1] \leq k < FT_{20}[2]$
\item node $21$ forwards the request to its successor, node $28$, which is responsible for key $26$
\end{compactitem}
\ \\
\subsubsection{Example 2: Superpeers}
Assume a system for which $k$ bits of an $m$-bit identifier space have been reserved for assigning to superpeers. If identifiers are randomly assigned, how many superpeers can one expect to have in an N-node system? \\
There are $2^k$ identifiers reserved for superpeers, and $2^m - 2^k$ identifiers for normal nodes.\\
Probability that a node is a superpeer: $p = \frac{2^k}{2^m} = 2^{k-m}$, with $m$ the number of bits of the identifier space and $k$ the number of bits reserved for superpeers.\\
Expected number of superpeers among the $n$ actual nodes (not the maximum possible number): $E(X)= n\cdot p = n \cdot 2^{k-m}$\\
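The expectation is a one-line calculation; a sketch with hypothetical numbers ($m = 20$, $k = 10$, $n = 4096$):

```python
def expected_superpeers(n, m, k):
    """Expected number of superpeers among n nodes: each node draws one of
    the 2^m identifiers uniformly at random, and 2^k of those identifiers
    are reserved for (i.e. mark) superpeers."""
    p = 2 ** k / 2 ** m          # probability one node gets a superpeer id
    return n * p                 # E(X) = n * p for a binomial count

# hypothetical system: 20-bit space, 10 bits reserved, 4096 actual nodes
expected = expected_superpeers(4096, 20, 10)
```

With these numbers the system would contain about 4 superpeers on average, showing how $k$ tunes the superpeer density independently of the actual node count.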
%\item example2:\\
%\begin{figure}[h]
% \centering
% \includegraphics[width=250px]{gfx/Chord_Fingertable.png}
% \caption{Ring and Fingertable for Node 30}
% \label{img:ring}
%\end{figure}
%\begin{tabular}{|l|l|}
%\hline
%30 & \\
%\hline
%1 & 37\\
%\hline
%2 & 37\\
%\hline
%3 & 37\\
%\hline
%4 & 38\\
%\hline
%5 & 45\\
%\hline
%6 & 0\\
%\hline
%\end{tabular}