<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
<channel>
<title>DIPr Lab at PSU</title>
<link>https://diprlab.github.io/</link>
<atom:link href="https://diprlab.github.io/index.xml" rel="self" type="application/rss+xml" />
<description>DIPr Lab at PSU</description>
<generator>Hugo Blox Builder (https://hugoblox.com)</generator><language>en-us</language><lastBuildDate>Wed, 24 Jan 2024 00:00:00 +0000</lastBuildDate>
<image>
<url>https://diprlab.github.io/media/logo_hu_f67add51057eb433.png</url>
<title>DIPr Lab at PSU</title>
<link>https://diprlab.github.io/</link>
</image>
<item>
<title>Joining the lab</title>
<link>https://diprlab.github.io/wiki/join/</link>
<pubDate>Fri, 01 Nov 2024 00:00:00 +0000</pubDate>
<guid>https://diprlab.github.io/wiki/join/</guid>
<description><p>
<figure >
<div class="d-flex justify-content-center">
<div class="w-100" ><img alt="A bird landing" srcset="
/media/bird-landing_hu_f2c23b2db82270ad.webp 400w,
/media/bird-landing_hu_11d89950f9609ab3.webp 760w,
/media/bird-landing_hu_e596a193b06bc514.webp 1200w"
src="https://diprlab.github.io/media/bird-landing_hu_f2c23b2db82270ad.webp"
width="760"
height="505"
loading="lazy" data-zoomable /></div>
</div></figure>
</p>
<h2 id="graduate-students">Graduate Students</h2>
<p>If you are interested in applying to the PSU Computer Science graduate program, please check the information on our <a href="https://www.pdx.edu/academics/programs/graduate/computer-science" target="_blank" rel="noopener">Graduate Program</a>.
In your application, mention Dr. Primal Pappachan as a potential advisor and your application will be routed to me for consideration.
Graduate admissions are handled by the department’s graduate admissions committee, which processes all applications and decides on admissions for the entire department. Individual faculty members may not accept students on their own, but you can reach out to Dr. Primal by email if you are interested in applying and potentially joining the DIPr lab. Please see the details below on what to include in this email.</p>
<p><strong>Funding</strong>: PhD students are typically funded. Your tuition and fees will be paid, and you will receive a monthly stipend.</p>
<h3 id="masters-students">Master’s students</h3>
<p>Master’s students who are interested in conducting research in the lab are welcome to apply, provided they are in their second quarter of study. Applicants should be able to commit at least 10 hours per week to research. This opportunity is ideal for students looking to gain hands-on research experience and contribute to ongoing projects.</p>
<p><strong>Funding</strong>: M.S. students are not typically funded. In rare instances, it may be possible to pay an M.S. student an hourly wage; such opportunities are extremely limited and are reserved for students who have been in the lab for one or more quarters.</p>
<h2 id="undergraduate-students">Undergraduate Students</h2>
<p>Students majoring in Computer Science are encouraged to apply. Applicants should be able to commit at least 10 hours per week to research. This opportunity is ideal for students looking to gain hands-on research experience and contribute to ongoing projects.</p>
<ul>
<li>Previous research experience is not required.</li>
<li>Knowledge of programming languages (e.g., Java, Python), web development (e.g., HTML, JavaScript, React), and databases (e.g., MySQL, PostgreSQL) is a plus, as are demonstrated coding skills.</li>
<li>Strong communication skills (written and oral) are a plus.</li>
</ul>
<p><strong>Funding</strong>: Similar to M.S. students, undergraduate students generally do not receive funding. Limited opportunities may be available, typically reserved for those who have been in the lab for more than one quarter.</p>
<p>You may also apply through the <a href="https://www.pdx.edu/engineering/urmp" target="_blank" rel="noopener">Maseeh College Undergraduate Research &amp; Mentoring Program (URMP)</a>, listing Dr. Primal Pappachan as your faculty mentor. This 10-week program includes a stipend.</p>
<h2 id="highschool-students">High school students</h2>
<p>High school students in the Portland metropolitan area should apply through programs such as the <a href="https://computinginresearch.org/" target="_blank" rel="noopener">Institute for Computing in Research</a> or <a href="https://www.saturdayacademy.org/" target="_blank" rel="noopener">Saturday Academy</a> and mention their interest in working with Dr. Primal Pappachan in the DIPr lab.</p>
<h2 id="how-do-i-apply">How do I apply?</h2>
<!-- Note: This only applies to currently enrolled undergrad and M.S students at Portland State University. -->
<!-- The DIPr lab is focused on designing better data protection mechanisms for databases. Undergraduate researchers will start by contributing to ongoing projects listed in the [home page](/). The activities may involve developing software prototypes, participating in research meetings, reading research papers, presenting results etc. -->
<p><strong>Ph.D. applicants</strong>: When reaching out to Dr. Primal after completing the graduate application, please include: (1) your CV (including a link to your GitHub profile and website), (2) a description of your previous research experience and interests, and (3) specific information about which research projects in the lab interest you and why. Emails without this information may be ignored.</p>
<p><strong>M.S. applicants</strong>: This only applies to currently enrolled M.S. students. After going through the projects in the lab, if you are interested in applying to be part of the lab, send an email including: (1) your resume (PDF), (2) an unofficial copy of your first-year transcript (PDF), and (3) a few paragraphs explaining why you’d like to work in our lab. To write this well, I suggest you look at some of our <a href="https://diprlab.github.io/publication/">previous publications</a> to orient yourself to our current projects. Make the subject of your email &ldquo;Masters Application&rdquo; and send it to Dr. Primal. You are welcome to join the weekly group meetings (see <a href="https://diprlab.github.io/expectations">Expectations</a>) to learn more about the ongoing projects and connect with lab members.</p>
<p><strong>Undergrad applicants</strong>: This only applies to currently enrolled undergrad students. After going through the projects in the lab, if you are interested in applying to be part of the lab, send an email including: (1) your resume (PDF), (2) an unofficial copy of your transcript (PDF; freshmen can send their high school transcript), and (3) a few paragraphs explaining why you’d like to work in our lab. To write this well, I suggest you look at some of our <a href="https://diprlab.github.io/publication/">previous publications</a> to orient yourself to our current projects. Make the subject of your email “Undergrad Application” and send it to Dr. Primal. You are welcome to join the weekly group meetings (see <a href="https://diprlab.github.io/expectations">Expectations</a>) to learn more about the ongoing projects and connect with lab members.</p>
</description>
</item>
<item>
<title>Getting started</title>
<link>https://diprlab.github.io/wiki/onboarding/</link>
<pubDate>Fri, 01 Nov 2024 00:00:00 +0000</pubDate>
<guid>https://diprlab.github.io/wiki/onboarding/</guid>
<description><p>
<figure >
<div class="d-flex justify-content-center">
<div class="w-100" ><img alt="A bird taking off" srcset="
/media/bird-taking-off_hu_e63a0ba3ab8812ca.webp 400w,
/media/bird-taking-off_hu_705576c7b11f7c09.webp 760w,
/media/bird-taking-off_hu_2f42ad519b2a2763.webp 1200w"
src="https://diprlab.github.io/media/bird-taking-off_hu_e63a0ba3ab8812ca.webp"
width="728"
height="471"
loading="lazy" data-zoomable /></div>
</div></figure>
</p>
<p>Welcome to DIPr Lab! We are excited that you have decided to join our team! We hope that these onboarding resources, guidelines, and tips will make the first few steps easier.</p>
<h2 id="set-up-the-first-meeting">Set up the first meeting</h2>
<p>Check Primal&rsquo;s Google Calendar (<a href="https://support.google.com/calendar/answer/6294878?hl=en&amp;co=GENIE.Platform%3DDesktop" target="_blank" rel="noopener">See someone&rsquo;s calendar availability</a>) and propose a meeting time that works for both of you. Primal&rsquo;s office is room FAB 115-08.</p>
<h2 id="join-the-github-organization">Join the GitHub organization</h2>
<p>Our group has <a href="https://github.com/DIPrLab" target="_blank" rel="noopener">a GitHub organization account</a> to host public and private repos for the software we create for each of the research projects. You can learn more about how to use <a href="https://product.hubspot.com/blog/git-and-github-tutorial-for-beginners" target="_blank" rel="noopener">GitHub</a> in this tutorial. This is also the location for our group website, which is hosted through GitHub pages.</p>
<h2 id="add-yourself-to-the-groups-website">Add yourself to the group&rsquo;s website</h2>
<p>After you have joined the GitHub organization, you can add yourself as a member to <a href="https://github.com/DIPrLab/website" target="_blank" rel="noopener">the lab website</a> by following the instructions on the README of the website repo.</p>
<h2 id="join-our-zulip-instance">Join our Zulip instance</h2>
<p>We use Zulip for communication around research projects, general updates, and sharing random news of interest. You can join our Zulip instance by signing up <a href="https://dipr.zulip.cs.pdx.edu/join/zmjww3y3maoyyndcop746ppm/" target="_blank" rel="noopener">here</a>.</p>
<h2 id="request-to-be-added-to-the-shared-drive">Request to be added to the shared drive</h2>
<p>We use a shared Google Drive to manage common resources that are relevant to all members. To join this drive, send an email to <a href="mailto:primal@pdx.edu">Primal</a>.</p>
<h2 id="mailing-list">Mailing list</h2>
<p>Request to join the <a href="https://groups.google.com/a/pdx.edu/g/dipr-group" target="_blank" rel="noopener">DIPr lab Google Group</a> (visible only when you are logged into your PSU email), which is used for broadcasting information relevant to all group members.</p>
<h2 id="office-space">Office Space</h2>
<p>If you prefer to work in the department, please request a desk during your first meeting with Primal. Lab members sit either in the DIPr lab space (FAB 135-04) or in the shared graduate student cubicle space. Graduate students are expected to make use of their allocated desk in the DIPr lab space. Group meetings are held in one of the conference rooms.</p>
<h2 id="individual-developmentmentoring-plans">Individual Development/Mentoring plans</h2>
<p>Within the first few weeks of joining the DIPr Lab, you should work with Primal to develop a plan outlining your short-, medium-, and long-term goals. More on individual development plans can be found in <a href="https://diprlab.github.io/expectations">Expectations</a>.</p>
<h2 id="key-access">Key access</h2>
<p>If you require regular use of the lab space, you should discuss your need with Primal before applying for key access. Once you&rsquo;ve received permission to receive a lab key, you should fill out a <a href="https://www.pdx.edu/facilities/sites/facilities.web.wdt.pdx.edu/files/2022-09/Key%20Request%20Form%20-%20August%202022_0.pdf" target="_blank" rel="noopener">Key Authorization &amp; Request form</a> and email your request to <a href="mailto:keys@pdx.edu">keys@pdx.edu</a>. Key requests require an active <a href="https://www.pdx.edu/technology/odin-account" target="_blank" rel="noopener">ODIN Account</a>.</p>
<!--
- Printer
- OIT machine access -->
</description>
</item>
<item>
<title>Expectations</title>
<link>https://diprlab.github.io/wiki/expectations/</link>
<pubDate>Fri, 01 Nov 2024 00:00:00 +0000</pubDate>
<guid>https://diprlab.github.io/wiki/expectations/</guid>
<description><p>
<figure >
<div class="d-flex justify-content-center">
<div class="w-100" ><img alt="A bird in mid-flight" srcset="
/media/bird-mid-flight_hu_63522e7262220333.webp 400w,
/media/bird-mid-flight_hu_64961156bad421cd.webp 760w,
/media/bird-mid-flight_hu_8a5ea93f76e8e991.webp 1200w"
src="https://diprlab.github.io/media/bird-mid-flight_hu_63522e7262220333.webp"
width="760"
height="507"
loading="lazy" data-zoomable /></div>
</div></figure>
</p>
<h3 id="working-hours">Working hours</h3>
<p>You should create a working schedule that fits you, with the understanding that your ideal schedule may evolve over time. Depending on the nature of your appointment, you may be expected to work a specific minimum number of hours; check with Primal about this in the first meeting. Graduate students are generally expected to work an average of at least 40 hours per week. For M.S., undergrad, and high school students, this will vary depending on the project.
You are not expected to work on weekends and holidays. Consult with Primal and notify fellow lab members in advance of any planned absences during the week. As a student, you should feel free to create a work schedule that works for you while meeting the expectations of your role.</p>
<p>If you prefer to work remotely, this should be discussed with and approved by Primal and arranged in accordance with <a href="https://www.pdx.edu/human-resources/remote-work-guidelines" target="_blank" rel="noopener">Portland State University Policies on Remote Work</a>. All lab members are expected to attend certain in-person events.</p>
<p>Note: All female researchers are entitled to two days of leave when they are on their periods.</p>
<h3 id="individual-meetings">Individual Meetings</h3>
<p>Following the first meeting, we will set up a time for regular individual meetings. Together, we&rsquo;ll decide on the timing and frequency of these meetings, which will involve you, Primal, and anyone else working on the research project (e.g., PhD students, collaborators). To make these meetings productive, please prepare an agenda, as a document or slides, covering the following:</p>
<ul>
<li>Past: Key points from our previous meeting.</li>
<li>Present: Updates on what you have worked on since our last discussion.</li>
<li>Future: Your plan for upcoming tasks or areas where you need feedback.</li>
</ul>
<p>During these meetings, Primal will help with problem solving and provide constructive feedback and general support.</p>
<p>Within a day of the meeting, please share notes with Primal summarizing the main discussion points. This helps keep discussions fresh in everyone’s mind. Use <a href="https://docs.google.com/document/d/1cUKDqf5pdPJwzbc-Jb3oND1weKP1vjue1tYmVzZRLbk/edit?usp=sharing" target="_blank" rel="noopener">this provided template</a> to organize the meeting notes. Primal will review these notes and may reach out with clarifying questions or additional guidance on the tasks.</p>
<p>Do not cancel meetings with Primal if you feel that you have not made adequate progress on your research; these might be the most critical times to meet with a mentor.</p>
<h3 id="group-meetings">Group Meetings</h3>
<p>The schedule and venue for meetings will be determined at the beginning of the quarter and announced on the mailing list. At the beginning of each meeting, Primal will make group announcements, followed by a presentation from a student. All DIPr lab students are expected to present at least once during the quarter. This presentation can be one of the following:</p>
<ul>
<li>Research presentation: A detailed talk about your research project covering aspects such as motivation, approach, and evaluation.</li>
<li>Project workshopping: A brief presentation of the problem that you are working on followed by collaborative brainstorming for the remainder of the meeting.</li>
<li>Tutorial: A hands-on presentation on a tool or a new approach that could benefit the entire group.</li>
</ul>
<p>If you are the presenter, send a detailed draft of the presentation to Primal at least 3 days before the meeting. Primal will review it and give you feedback on the outline and individual slides.</p>
<p>We will use a shared Google Sheet to organize the presentation schedule. Active participation is encouraged, so please ask questions and engage in discussions. Non-members are welcome to attend to learn about our work and connect with lab members.</p>
<h4 id="celebrations-">Celebrations 🎉</h4>
<p>Once every month (typically at the first or last meeting of the month), we will celebrate with sweet treats (donuts, cupcakes, etc.) to recognize any achievements or life milestones of lab members. This includes, but is not limited to, paper acceptances, submissions, rejections (yes, we celebrate those too!), birthdays, and more.</p>
<h3 id="database-reading-group-dbrg-meetings">Database Reading Group (DBRG) meetings</h3>
<p>The Database Reading Group meets weekly to discuss papers related (broadly speaking) to database technology. Meeting times are decided at the beginning of the quarter. The regular meeting place is room 130 in PSU&rsquo;s Fourth Avenue Building. The group welcomes anyone, inside or outside of PSU, with an interest in the subject matter. Designated group members will lead the discussion each week. More on database reading group can be found in <a href="https://diprlab.github.io/dbrg">DBRG</a>.</p>
<h3 id="attendance">Attendance</h3>
<p>In-person attendance is expected for the following.</p>
<ul>
<li>Weekly group meetings</li>
<li>Individual meetings with Primal to discuss your research</li>
<li><a href="https://diprlab.github.io/dbrg">Database Reading Group meetings</a></li>
<li><a href="https://pdxscholar.library.pdx.edu/studentsymposium/" target="_blank" rel="noopener">Student Research Symposiums</a></li>
<li>Outreach events (such as <a href="https://www.pdx.edu/center-for-internship-mentoring-and-research/summer-research-academy" target="_blank" rel="noopener">Summer Research Academy</a>, <a href="https://www.pdx.edu/engineering/cyberpdx-cybersecurity-camp-native-indigenous-high-school-students" target="_blank" rel="noopener">CyberPDX</a>)</li>
</ul>
<h3 id="communication">Communication</h3>
<p>Email and Zulip are the primary communication media and should be viewed primarily as asynchronous channels. If you receive a message, you are not obligated to read or respond to it immediately (and you shouldn’t expect this when you’re sending, either). The expectation is that you will respond within 24 to 48 hours.</p>
<h3 id="authorship">Authorship</h3>
<p>Authorship is earned by someone who significantly contributes to the project (e.g., conceives the project, designs the solution, performs simulations or experiments and analyzes results, writes the paper). All authors must read, proofread, and sign off on the final version of the manuscript before submission.
Barring unusual circumstances, the lab policy is that students are first author on all work that they lead.</p>
<h3 id="guidelines-for-phd-and-ms-defense">Guidelines for PhD and MS Defense</h3>
<p>The university has guidelines for remote participation. Please note:</p>
<ul>
<li>All committee members must agree to remote participation in advance</li>
<li>Remote connections for committee members are expected to be both audio and video</li>
<li>Visual aids must be distributed in advance</li>
<li>All committee members must participate in the entire meeting</li>
<li>A draft of the thesis must be delivered to the committee at least two weeks in advance of defense</li>
</ul>
<p>Students must send their abstract and other required information to the CS Graduate Advisor (<a href="mailto:gccs@pdx.edu">gccs@pdx.edu</a>) at least two weeks before the defense.
Students should not book their room for the defense. The Graduate Advisor will do so once she has received the student&rsquo;s abstract.</p>
<p>The expectations for student milestones are outlined in greater detail <a href="https://docs.google.com/document/d/1-u5RtUlbHZvduDRpGGFtgLQlZ1Fy_GlsAqV9ssipXrk/" target="_blank" rel="noopener">in this document</a>.</p>
<h3 id="code-management">Code Management</h3>
<p>Code and data are important, and it is your responsibility to make sure that nothing is lost. Have a plan to keep your code and data safe and accessible to other members of the group.
Because we all work on related topics, we all benefit from using each other’s code and data (with appropriate acknowledgement and after requesting access).</p>
<h4 id="github">GitHub</h4>
<p>The DIPr lab has a dedicated <a href="https://github.com/DIPrLab" target="_blank" rel="noopener">GitHub account</a> for hosting public repositories, including the code we create and our group website (hosted on GitHub Pages). Unless specified otherwise, all files required to reproduce research results will be made publicly available on the DIPr lab’s GitHub repository. For double-blind review cases, public access may occur post-acceptance, but every published paper will link to a public GitHub repository with the corresponding code and data.</p>
<h4 id="backups">Backups</h4>
<p>Backing up data is crucial; nothing is more disheartening than losing months of work, simulations, or code updates. Each lab member is responsible for regularly backing up their own work, and Google Drive is our recommended cloud storage solution.</p>
<h4 id="ethical-management-of-data">Ethical management of data</h4>
<p>If your research involves data related to individuals, it is essential to follow ethical guidelines. Always consult with Primal if you are uncertain about the responsible handling of such data. Ethical standards are particularly important in these situations, so reach out if you need clarification.</p>
<h3 id="quarterly-evaluation-form">Quarterly evaluation form</h3>
<p>At the end of each quarter, you will receive <a href="https://docs.google.com/document/d/1lGzuArWoY6KrTF1ploBmYVpMKWPPwqMJIHcMmrLhJtg/edit?usp=sharing" target="_blank" rel="noopener">the following evaluation form</a> asking you to reflect on your progress and your experience as a mentee. We will set aside time during an individual meeting at the beginning of the following quarter to review your answers together. This is an opportunity for you to share any concerns you may have about your experience as a graduate student, whether they involve other students, faculty, or staff.</p>
<p>This discussion is also a chance to address any concerns you may have about my role as your advisor. If you need more guidance, would like more independence, or would prefer more frequent meetings, please let me know. Likewise, I’ll provide feedback on your progress, noting any areas where improvement is needed so we can address them proactively. This session is our time to address any concerns early, ensuring that we’re aligned on your goals and progress.</p>
<h3 id="individual-development-plans">Individual Development Plans</h3>
<p>Primal will work with each of you to develop an individual mentoring plan to ensure that your time in the lab advances your short-, medium-, and long-term goals. This is a useful planning document that helps align expectations. Graduate students will revisit the mentoring plan in individual meetings with Primal several times a year. Other lab members will revisit their plans at appropriate intervals (e.g., every 6 months). For this purpose, we will either modify a <a href="https://docs.google.com/document/d/1DdQA4rMAdrPlMW7LMsE5pfBtDGZgKRBl/edit" target="_blank" rel="noopener">template</a> (for all students) or use an online <a href="https://myidp.sciencecareers.org/" target="_blank" rel="noopener">tool</a> (geared towards PhD students and postdocs).</p>
<h2 id="take-care-of-yourself">Take care of yourself!</h2>
<p>As a student, you may experience a wide variety of challenges to your physical and mental health, which can interfere with learning or doing research. Help is available on campus, and an important aspect of taking care of yourself is learning how to ask for help.
Talk to Primal or any of the lab members, if you are struggling. Ask for help early. We cannot change the past, but can influence the future. Confidential <a href="https://www.pdx.edu/health-counseling/counseling" target="_blank" rel="noopener">counseling services</a> are available at PSU. Please refer to the <a href="https://www.pdx.edu/health-counseling/sites/studenthealthcounseling.web.wdt.pdx.edu/files/2023-03/Crisis%20Cards%20-%20Counseling.pdf" target="_blank" rel="noopener">Student Crisis Resource Card</a> for a list of phone numbers, contacts and support resources.</p>
</description>
</item>
<item>
<title>Lab Policies</title>
<link>https://diprlab.github.io/wiki/policies/</link>
<pubDate>Fri, 01 Nov 2024 00:00:00 +0000</pubDate>
<guid>https://diprlab.github.io/wiki/policies/</guid>
<description><p>
<figure >
<div class="d-flex justify-content-center">
<div class="w-100" ><img alt="A flock of birds" srcset="
/media/bird-flock_hu_75fe9f0e1b6545.webp 400w,
/media/bird-flock_hu_bc33fe29ac6fb790.webp 760w,
/media/bird-flock_hu_6a86b88820ddaf52.webp 1200w"
src="https://diprlab.github.io/media/bird-flock_hu_75fe9f0e1b6545.webp"
width="760"
height="506"
loading="lazy" data-zoomable /></div>
</div></figure>
</p>
<h3 id="diversity-and-inclusivity-policy">Diversity and Inclusivity Policy</h3>
<p>At DIPr lab, we aim to build and sustain a community in which everyone feels welcomed, respected, and intellectually stimulated. It is my intent to ensure that members from diverse backgrounds, including but not limited to race, color, national origin, language, sex, disability, age, sexual orientation, gender identity, and religion, feel welcome and included in this group. If you notice that any interactions in this group are not respectful of this diversity, please bring it to my attention. Any suggestions on how to improve the inclusivity of the lab policies are also much appreciated. If you have experienced or observed any discrimination, please report it and/or reach out to the support groups listed on PSU&rsquo;s <a href="https://www.pdx.edu/diversity/equity-compliance" target="_blank" rel="noopener">Equity and Compliance</a> website.</p>
<h3 id="reporting">Reporting</h3>
<p>DIPr lab aims to create a safe space for everyone. If you or someone you know has been harassed by a lab member, or if you have concerns, please contact Primal. If you do not wish to contact Primal, please contact the Department Chair, <a href="mailto:wuchi@cs.pdx.edu">Dr. Wu-chi Feng</a>, or the Dean of the College, <a href="mailto:joseph.bull@pdx.edu">Dr. Joseph Bull</a>.</p>
<p>Please remember that by way of his position at the university, Primal is a mandated reporter under Title IX. This means that he is not allowed to keep matters falling under Title IX confidential, and is required to disclose these incidents to the administration. You are welcome to discuss matters with Primal, but please keep this in mind when doing so. Primal will do his best to remind you of his responsibilities at the start of conversations anticipated to relate to these topics.
If you would rather share information about these matters with a PSU staff member who does not have these reporting responsibilities and can keep the information confidential, please use these campus resources:</p>
<ul>
<li>Confidential Advocates: 503-894-7982 or schedule online (for matters regarding sexual harassment and sexual and relationship violence)</li>
<li>Center for Student Health and Counseling: 1880 SW 6th Avenue #200; 503-725-2800</li>
</ul>
<p>You can also find additional resources on <a href="https://www.pdx.edu/sexual-assault/" target="_blank" rel="noopener">PSU’s Sexual Misconduct Response website</a>.</p>
<h4 id="discrimination-and-bias-incidents">Discrimination and Bias Incidents</h4>
<p>The Office of Equity and Compliance (OEC) addresses complaints of discrimination, discriminatory harassment, and sexual harassment against employees (faculty and staff). If you or someone you know believes they have been discriminated against, you may file a complaint. Someone from the OEC will then contact you to discuss how best to address it.</p>
<p>The Bias Review Team (BRT) gathers information on bias incidents that happen on and around campus, and gives resources and support to individuals who experience them. You can report a bias incident you experienced or learned about. A member of the BRT will contact you if you indicate you would like to be contacted.</p>
<h2 id="confidentiality-policy">Confidentiality Policy</h2>
<p>All communications within DIPr lab, including emails, discussions, and meetings involving research data or methodologies, should be treated with care and kept confidential. Information should be shared only through secure methods, and shared with third parties only after obtaining permission from the principal investigator or project leader. Public sharing of research findings should be coordinated with the principal investigator. This policy protects the integrity of our work and applies both during and after your involvement with the research group.</p>
<h2 id="privacy-policy">Privacy Policy</h2>
<p>The privacy of all members and collaborators of DIPr lab is respected and protected. Personal information, such as contact details and other identifying data, will be used solely for professional and administrative purposes and will not be shared with third parties without consent. Data collected as part of research will be handled in compliance with ethical standards and legal regulations, ensuring that participant identities are safeguarded, and that personal information remains confidential. Members of the research group are expected to handle any personal data they encounter in a manner that respects the privacy of all individuals involved in the research.</p>
</description>
</item>
<item>
<title>Offboarding</title>
<link>https://diprlab.github.io/wiki/offboarding/</link>
<pubDate>Fri, 01 Nov 2024 00:00:00 +0000</pubDate>
<guid>https://diprlab.github.io/wiki/offboarding/</guid>
<description><p>
<figure >
<div class="d-flex justify-content-center">
<div class="w-100" ><img alt="A bird flying away" srcset="
/media/bird-flying-away_hu_fa9bbd1321f6649.webp 400w,
/media/bird-flying-away_hu_d8a70d0dc9b9146e.webp 760w,
/media/bird-flying-away_hu_30a7be38a8600969.webp 1200w"
src="https://diprlab.github.io/media/bird-flying-away_hu_fa9bbd1321f6649.webp"
width="760"
height="507"
loading="lazy" data-zoomable /></div>
</div></figure>
</p>
<p>Everyone will eventually move on from the lab, whether to complete a degree, start a job, or pursue new opportunities. This is an exciting time! A clear offboarding process ensures that your work can continue seamlessly, that future collaborators have what they need, and that any remaining steps (e.g., publications, future projects) are clearly outlined.</p>
<p>(Credits to <a href="https://thefaylab.github.io/lab-manual/06-offboarding.html" target="_blank" rel="noopener">Fay lab</a>)</p>
<h2 id="exit-interview">Exit Interview</h2>
<p>Set up a dedicated time to meet with Primal to talk about your time in the lab and to go through the checklist below, confirming that each item has been completed. Beyond the checklist, topics to discuss include the best part of being on our team, whether you received the support you needed, and what we could improve when mentoring and training someone in your role in the future.</p>
<h2 id="project-documentation">Project Documentation</h2>
<p>Project work should be hosted in a repository under the organizational GitHub account.</p>
<p>Each project should have an easily found README file that helps others navigate and use your work and that gives contact information for the authors (plus any data creators and use restrictions, if the data are proprietary). Ideally, the README should also link to publications and presentations arising from the work.</p>
<h2 id="publications">Publications</h2>
<p>Science is not finished until it has been communicated. Ideally, you’ll have the chance to publish your results in a conference or journal. In your exit interview, coordinate with Primal on any remaining publications and set a submission timeline. Ensure that all publications and presentations from your projects are archived in the appropriate folder on the lab’s Google Drive and are listed on the lab website where appropriate.</p>
<h2 id="equipment">Equipment</h2>
<p>Ensure that any lab equipment you have been using (e.g., computer and peripherals) has been returned to the lab and that all office furniture is in place. Make sure any problems with equipment are documented and that Primal and the relevant department staff are informed so that they can be addressed.</p>
</description>
</item>
<item>
<title>References</title>
<link>https://diprlab.github.io/wiki/references/</link>
<pubDate>Fri, 01 Nov 2024 00:00:00 +0000</pubDate>
<guid>https://diprlab.github.io/wiki/references/</guid>
<description><p>Some of the content in this wiki is inspired by similar wikis from other labs.</p>
<ol>
<li><a href="https://atomslab.github.io/" target="_blank" rel="noopener">ATOMS lab</a></li>
<li><a href="https://damslabumbc.github.io/" target="_blank" rel="noopener">DAMS lab</a></li>
<li><a href="https://poldracklab.org/" target="_blank" rel="noopener">Poldrack lab</a></li>
<li><a href="https://thefaylab.github.io/" target="_blank" rel="noopener">Fay lab</a></li>
</ol>
</description>
</item>
<item>
<title>Winter 2026 Week 9</title>
<link>https://diprlab.github.io/dbrg/events/2026/winter/09/</link>
<pubDate>Fri, 06 Mar 2026 00:00:00 +0000</pubDate>
<guid>https://diprlab.github.io/dbrg/events/2026/winter/09/</guid>
<description><table>
<tr>
<td>Title</td>
<td>
BridgeScope: A Universal Toolkit for Bridging Large Language Models and Databases
</td>
</tr>
<tr>
<td>Authors</td>
<td>
Lianggui Weng, Dandan Liu, Rong Zhu, Bolin Ding, Jingren Zhou
</td>
</tr>
<tr>
<td>Abstract</td>
<td>
As large language models (LLMs) demonstrate increasingly powerful reasoning and orchestration capabilities, LLM-based agents are rapidly adopted for complex data-related tasks. Despite this progress, the current design of how LLMs interact with databases exhibits critical limitations in usability, security, privilege management, and data transmission efficiency. To address these challenges, we introduce BridgeScope, a universal toolkit that bridges LLMs and databases through three key innovations. First, it modularizes SQL operations into fine-grained tools for context retrieval, CRUD execution, and ACID-compliant transaction management. This design enables more precise, LLM-friendly controls over database functionality. Second, it aligns tool implementations with database privileges and user-defined security policies to steer LLMs away from unsafe or unauthorized operations, which not only safeguards database security but also enhances task execution efficiency by enabling early identification and termination of infeasible tasks. Third, it introduces a proxy mechanism that supports seamless data transfer between tools, thereby bypassing the transmission bottlenecks via LLMs. All of these designs are database-agnostic and can be transparently integrated with existing agent architectures. We also release an open-source implementation of BridgeScope for PostgreSQL. Evaluations on two novel benchmarks demonstrate that BridgeScope enables LLM agents to interact with databases more effectively. It reduces token usage by up to 80% through improved security awareness and uniquely supports data-intensive workflows beyond existing toolkits. These results establish BridgeScope as a robust foundation for next-generation intelligent data automation.
</td>
</tr>
</table>
</description>
</item>
<item>
<title>BL(u)E CRAB: Bluetooth Low Energy Connection Risk Assessment Benchmarking</title>
<link>https://diprlab.github.io/publication/blue_crab_thesis/</link>
<pubDate>Fri, 27 Feb 2026 00:00:00 +0000</pubDate>
<guid>https://diprlab.github.io/publication/blue_crab_thesis/</guid>
<description></description>
</item>
<item>
<title>Winter 2026 Week 8</title>
<link>https://diprlab.github.io/dbrg/events/2026/winter/08/</link>
<pubDate>Fri, 27 Feb 2026 00:00:00 +0000</pubDate>
<guid>https://diprlab.github.io/dbrg/events/2026/winter/08/</guid>
<description><table>
<tr>
<td>Title</td>
<td>
Algorithmic Data Minimization for Machine Learning over Internet-of-Things Data Streams
</td>
</tr>
<tr>
<td>Authors</td>
<td>
Ted Shaowang, Shinan Liu, Jonatas Marques, Nick Feamster, Sanjay Krishnan
</td>
</tr>
<tr>
<td>Abstract</td>
<td>
Machine learning can analyze vast amounts of data generated by IoT devices to identify patterns, make predictions, and enable real-time decision-making. This raises significant privacy concerns, necessitating the application of data minimization – a foundational principle in emerging data regulations, which mandates that service providers only collect data that is directly relevant and necessary for a specified purpose. Despite its importance, data minimization lacks a precise technical definition in the context of sensor data, where collections of weak signals make it challenging to apply a binary “relevant and necessary” rule. This paper provides a technical interpretation of data minimization in the context of sensor streams, explores practical methods for implementation, and addresses the challenges involved. Through our approach, we demonstrate that our framework can reduce user identifiability by up to 16.7% while maintaining accuracy loss below 1%, offering a viable path toward privacy-preserving IoT data processing.
</td>
</tr>
</table>
</description>
</item>
<item>
<title>UR2PhD program acceptance</title>
<link>https://diprlab.github.io/post/2026/ur2phd/</link>
<pubDate>Mon, 16 Feb 2026 00:00:00 +0000</pubDate>
<guid>https://diprlab.github.io/post/2026/ur2phd/</guid>
<description><p>About UR2PhD:</p>
<blockquote>
<p><a href="https://cra.org/ur2phd/" target="_blank" rel="noopener">UR2PhD (Undergraduate Research to PhD)</a> is a three-month, national virtual program run by the Computing Research Association that pairs undergraduate researchers with graduate student mentors in computing. The undergraduate research training course is a virtual, synchronous opportunity where undergraduate students receive support and training during their research with a faculty and graduate student mentor. It is designed for first-time researchers who want to strengthen their technical and communication skills in the context of research.</p></blockquote>
<p><a href="https://diprlab.github.io/author/ambika-vyas/">Ambika Vyas</a> and <a href="https://diprlab.github.io/author/nico-wood/">Nico Wood</a> were accepted into the Undergraduate Research Training Course for Spring 2026.</p>
<p>&ldquo;As an aspiring researcher, I&rsquo;m grateful for this opportunity and the mentorship I will receive. I have been with the DIPr lab for a few months and have enjoyed participating in the research environment and working with <a href="https://diprlab.github.io/author/primal-pappachan/">Primal Pappachan</a> &amp; <a href="https://diprlab.github.io/author/anadi-shakya/">Anadi Shakya</a>. Being able to continue this with additional support from computer science professionals at different institutions is something I&rsquo;m looking forward to.&rdquo; – <a href="https://diprlab.github.io/author/ambika-vyas/">Ambika Vyas</a></p>
<p>“As an undergraduate with an interest in research, I’m thankful for the opportunity provided by UR2PhD for a formal environment to build these skills and make connections with other researchers. Being new to the DIPr lab, I’m also very excited for the chance to work under the care of Anadi and Primal while Ambika and I progress through this course. I look forward to growing alongside each other and continuing to build community together going forward.” – <a href="https://diprlab.github.io/author/nico-wood/">Nico Wood</a></p>
</description>
</item>
<item>
<title>DIPr Lab at NorthWest Database Society (2026) Meeting</title>
<link>https://diprlab.github.io/post/2026/nwds/</link>
<pubDate>Fri, 13 Feb 2026 00:00:00 +0000</pubDate>
<guid>https://diprlab.github.io/post/2026/nwds/</guid>
<description><p><a href="https://db.cs.washington.edu/events/database_day/2026/database_day_2026.html" target="_blank" rel="noopener">The Northwest Database Society</a> Annual Meeting brings together researchers and practitioners from the greater Pacific Northwest for a day of technical talks and networking on the broad topic of data management systems.</p>
<p><a href="https://diprlab.github.io/author/orobosa-ekhator/">Orobosa Ekhator</a> attended the full-day event at the University of Washington, Seattle, where she presented a poster about the Clustering-Based Local Outlier Factor (CBLOF) classifier she developed for <a href="https://diprlab.github.io/project/bluecrab">BL(u)E CRAB</a>.</p>
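<p>The poster's exact classifier is not reproduced on this page. As a rough illustration of the general technique only, here is a minimal Python sketch of the textbook CBLOF scoring rule: cluster the data, split the clusters into "large" and "small" using the parameters <code>alpha</code> (coverage fraction) and <code>beta</code> (size-drop ratio), then score each point by its cluster's size times its distance to the nearest large-cluster centroid. The clustering step (k-means with farthest-first seeding), the default parameter values, the function names, and the 2-D toy features standing in for BL(u)E CRAB risk factors are all assumptions made for this sketch, not details of the lab's implementation.</p>

```python
# Hypothetical CBLOF sketch; names, defaults, and toy data are illustrative only.
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def euclidean(a: Point, b: Point) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points: List[Point], k: int, iters: int = 50) -> Tuple[List[List[float]], List[int]]:
    """Lloyd's k-means with deterministic farthest-first seeding (k is at least 1)."""
    centroids = [list(points[0])]
    for _ in range(k - 1):
        # Next seed: the point farthest from every centroid chosen so far.
        far = max(points, key=lambda p: min(euclidean(p, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid for each point.
        labels = [min(range(k), key=lambda c: euclidean(p, centroids[c])) for p in points]
        # Update step: move each non-empty cluster's centroid to its members' mean.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


def cblof_scores(points: List[Point], k: int = 3, alpha: float = 0.9, beta: float = 5.0) -> List[float]:
    """Size-weighted CBLOF score per point; higher means more outlying."""
    centroids, labels = kmeans(points, k)
    sizes = [labels.count(c) for c in range(k)]
    order = sorted(range(k), key=lambda c: sizes[c], reverse=True)

    # Grow the "large" set from the biggest cluster down; stop once it covers
    # alpha of the data, or where the next cluster is at least beta times smaller.
    large = set()
    covered = 0
    for i, c in enumerate(order):
        large.add(c)
        covered += sizes[c]
        if covered >= alpha * len(points):
            break
        if i + 1 != len(order):
            nxt = order[i + 1]
            if sizes[nxt] > 0 and sizes[c] / sizes[nxt] >= beta:
                break

    scores = []
    for p, c in zip(points, labels):
        if c in large:
            dist = euclidean(p, centroids[c])  # distance to its own large cluster
        else:
            dist = min(euclidean(p, centroids[j]) for j in large)  # nearest large cluster
        scores.append(sizes[c] * dist)
    return scores


# Toy 2-D feature vectors: two dense groups plus one isolated point.
DEMO_POINTS = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5),
    (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0), (10.5, 10.5),
    (50.0, 50.0),
]

if __name__ == "__main__":
    for point, score in zip(DEMO_POINTS, cblof_scores(DEMO_POINTS)):
        print(point, round(score, 2))
```

<p>On this toy data the isolated point lands in a small cluster and receives the highest score; turning scores into a suspicious/not-suspicious label would additionally need a threshold, which this sketch leaves out.</p>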
<p>
<figure >
<div class="d-flex justify-content-center">
<div class="w-100" ><img src="./poster.pdf" alt="CBLOF Poster" loading="lazy" data-zoomable /></div>
</div></figure>
</p>
</description>
</item>
<item>
<title>Winter 2026 Week 5</title>
<link>https://diprlab.github.io/dbrg/events/2026/winter/05/</link>
<pubDate>Fri, 06 Feb 2026 00:00:00 +0000</pubDate>
<guid>https://diprlab.github.io/dbrg/events/2026/winter/05/</guid>
<description><table>
<tr>
<td>Title</td>
<td>
I Can’t Believe It’s Not Yannakakis: Pragmatic Bitmap Filters in Microsoft SQL Server
</td>
</tr>
<tr>
<td>Authors</td>
<td>
Hangdong Zhao et al.
</td>
</tr>
<tr>
<td>Abstract</td>
<td>
The quest for optimal join processing has reignited interest in the Yannakakis algorithm, as researchers seek to realize its theoretical ideal in practice via bitmap filters instead of expensive semijoins. While this academic pursuit may seem distant from industrial practice, our investigation into production databases led to a startling discovery: over the last decade, Microsoft SQL Server has built an infrastructure for bitmap pre-filtering that subsumes the very spirit of Yannakakis! This is not a story of academia leading industry; but rather of industry practice, guided by pragmatic optimization, outpacing academic endeavors. This paper dissects this discovery. As a crucial contribution, we prove how SQL Server’s bitmap filters, pull-based execution, and Cascades optimizer conspire to not only consider, but often generate, instance-optimal plans, when it truly minimizes the estimated cost! Moreover, its rich plan search space reveals novel, largely overlooked pre-filtering opportunities on intermediate results, which approach strong semi-robust runtime for arbitrary join graphs. Instead of a verdict, this paper is an invitation: by exposing a system design that is long-hidden, we point our community towards a challenging yet promising research terrain.
</td>
</tr>
</table>
</description>
</item>
<item>
<title>Winter 2026 Week 4</title>
<link>https://diprlab.github.io/dbrg/events/2026/winter/04/</link>
<pubDate>Fri, 30 Jan 2026 00:00:00 +0000</pubDate>
<guid>https://diprlab.github.io/dbrg/events/2026/winter/04/</guid>
<description><table>
<tr>
<td>Title</td>
<td>
LOCATER: Cleaning WiFi Connectivity Datasets for Semantic Localization
</td>
</tr>
<tr>
<td>Authors</td>
<td>
Yiming Lin, Daokun Jiang, Roberto Yus, Georgios Bouloukakis, Andrew Chio, Sharad Mehrotra, Nalini Venkatasubramanian
</td>
</tr>
<tr>
<td>Abstract</td>
<td>
This paper explores the data cleaning challenges that arise in using WiFi connectivity data to locate users to semantic indoor locations such as buildings, regions, rooms. WiFi connectivity data consists of sporadic connections between devices and nearby WiFi access points (APs), each of which may cover a relatively large area within a building. Our system, entitled semantic LOCATion cleanER (LOCATER), postulates semantic localization as a series of data cleaning tasks - first, it treats the problem of determining the AP to which a device is connected between any two of its connection events as a missing value detection and repair problem. It then associates the device with the semantic subregion (e.g., a conference room in the region) by postulating it as a location disambiguation problem. LOCATER uses a bootstrapping semi-supervised learning method for coarse localization and a probabilistic method to achieve finer localization. The paper shows that LOCATER can achieve significantly high accuracy at both the coarse and fine levels.
</td>
</tr>
</table>
</description>
</item>
<item>
<title>Data Privacy Day 2026</title>
<link>https://diprlab.github.io/post/2026/data_privacy_day/</link>
<pubDate>Wed, 28 Jan 2026 00:00:00 +0000</pubDate>
<guid>https://diprlab.github.io/post/2026/data_privacy_day/</guid>
<description><div style="position: relative; width: 100%; height: 0; padding-top: 100.0000%;
padding-bottom: 0; box-shadow: 0 2px 8px 0 rgba(63,69,81,0.16); margin-top: 1.6em; margin-bottom: 0.9em; overflow: hidden;
border-radius: 8px; will-change: transform;">
<iframe loading="lazy"
style="position: absolute; width: 100%; height: 100%; top: 0; left: 0; border: none; padding: 0;margin: 0;"
src="https://www.canva.com/design/DAG_pGGI1_0/39aqM5FzT5fmZZ1JujzPqA/view?embed" allowfullscreen="allowfullscreen"
allow="fullscreen">
</iframe>
</div>
</description>
</item>
<item>
<title>Winter 2026 Week 2</title>
<link>https://diprlab.github.io/dbrg/events/2026/winter/02/</link>
<pubDate>Fri, 16 Jan 2026 00:00:00 +0000</pubDate>
<guid>https://diprlab.github.io/dbrg/events/2026/winter/02/</guid>
<description><table>
<tr>
<td>Title</td>
<td>
LLM-Driven Auto Configuration for Transient IoT Device Collaboration
</td>
</tr>
<tr>
<td>Authors</td>
<td>
Hetvi Shastri, Walid A. Hanafy, Li Wu, David Irwin, Mani Srivastava, Prashant Shenoy
</td>
</tr>
<tr>
<td>Abstract</td>
<td>
Today's Internet of Things (IoT) has evolved from simple sensing and actuation devices to those with embedded processing and intelligent services, enabling rich collaborations between users and their devices. However, enabling such collaboration becomes challenging when transient devices need to interact with host devices in temporarily visited environments. In such cases, fine-grained access control policies are necessary to ensure secure interactions; however, manually implementing them is often impractical for non-expert users. Moreover, at run-time, the system must automatically configure the devices and enforce such fine-grained access control rules. Additionally, the system must address the heterogeneity of devices.<br /><br />
In this paper, we present CollabIoT, a system that enables secure and seamless device collaboration in transient IoT environments. CollabIoT employs a Large language Model (LLM)-driven approach to convert users' high-level intents to fine-grained access control policies. To support secure and seamless device collaboration, CollabIoT adopts capability-based access control for authorization and uses lightweight proxies for policy enforcement, providing hardware-independent abstractions.<br /><br />
We implement a prototype of CollabIoT's policy generation and auto configuration pipelines and evaluate its efficacy on an IoT testbed and in large-scale emulated environments. We show that our LLM-based policy generation pipeline is able to generate functional and correct policies with 100% accuracy. At runtime, our evaluation shows that our system configures new devices in ~150 ms, and our proxy-based data plane incurs network overheads of up to 2 ms and access control overheads up to 0.3 ms.
</td>
</tr>
</table>
</description>
</item>
<item>
<title>Thesis Defense - Dylan Conklin</title>
<link>https://diprlab.github.io/post/2025/blue_crab_defense/</link>
<pubDate>Tue, 09 Dec 2025 00:00:00 +0000</pubDate>
<guid>https://diprlab.github.io/post/2025/blue_crab_defense/</guid>
<description><p><a href="https://diprlab.github.io/author/dylan-conklin/">Dylan Conklin</a> successfully defended his M.S. thesis on <a href="https://diprlab.github.io/project/bluecrab/">BL(u)E CRAB</a>.</p>
<p><strong>Committee</strong>: <a href="https://diprlab.github.io/author/primal-pappachan/">Primal Pappachan</a>, Roberto Yus, Bart Massey, Nirupama Bulusu, Wu-Chang Feng</p>
<p><strong>Abstract</strong>: The usage of Bluetooth Low Energy (BLE)-based tracker devices for stalking has become a salient privacy concern. Detecting unwanted or suspicious trackers is challenging due to their cross-platform compatibility issues, inconsistent detection methods, and lack of an industry-wide standard for detecting malicious devices. BL(u)E CRAB, Bluetooth Low Energy Connection Risk Assessment Benchmarking, scans and collects risk factors about nearby devices to classify them as suspicious or not. These risk factors include the number of encounters the user had with a device, the duration of time a device has been near the user, the distance a device has travelled with the user, the number of areas each device appeared in, the device&rsquo;s proximity to the user, and the stability of the device&rsquo;s signal strength. After collecting this information, BL(u)E CRAB uses one of several classifiers adapted to these risk metrics to determine whether a device is suspicious or not. We have integrated a multitude of new device classifier methods, including single- and multi-dimensional clustering methods. We evaluated these classifiers against existing methods using a diverse dataset of BLE tracker data in various real-world scenarios. The benchmark results show the efficacy of different classifiers in identifying suspicious BLE trackers. We also developed a full working prototype of BL(u)E CRAB that is an end-to-end solution that is easy to use, customizable, and can easily integrate other classifiers.</p>
<p>The thesis paper can be read <a href="https://diprlab.github.io/publication/blue_crab_thesis">here</a>.</p>
</description>
</item>
<item>
<title>Fall 2025 Week 9</title>
<link>https://diprlab.github.io/dbrg/events/2025/fall/09/</link>
<pubDate>Wed, 26 Nov 2025 00:00:00 +0000</pubDate>
<guid>https://diprlab.github.io/dbrg/events/2025/fall/09/</guid>
<description><table>
<tr>
<td>
Title
</td>
<td>
SIEVE: Effective Filtered Vector Search with Collection of Indexes
</td>
</tr>
<tr>
<td>
Authors
</td>
<td>
Zhaoheng Li, et al.
</td>
</tr>
<tr>
<td>
Abstract
</td>
<td>
Real-world tasks such as recommending videos tagged kids can be reduced to finding similar vectors associated with hard predicates. This task, filtered vector search, is challenging as prior state-of-the-art graph-based (unfiltered) similarity search techniques degenerate when hard constraints are considered: effective graph-based filtered similarity search relies on sufficient connectivity for reaching similar items within a few hops. To consider predicates, recent works propose modifying graph traversal to visit only items that satisfy predicates. However, they fail to offer the just-a-few-hops property for a wide range of predicates: they must restrict predicates significantly or lose efficiency if only few items satisfy predicates. <br /> <br />
We propose an opposite approach: instead of constraining traversal, we build many indexes each serving different predicate forms. For effective construction, we devise a three-dimensional analytical model capturing relationships among index size, search time, and recall, with which we follow a workload-aware approach to pack as many useful indexes as possible into a collection. At query time, the analytical model is employed yet again to discern the one that offers the fastest search at a given recall. We show superior performance and support on datasets with varying selectivities and forms: our approach achieves up to 8.06 x speedup while having as low as 1% build time versus other indexes, with less than 2.15 x memory of a standard HNSW graph and modest knowledge of past workloads.
</td>
</tr>
</table>
</description>
</item>
<item>
<title>PaPrica-PS: Fine-Grained, Dynamic Access Control Policy Enforcement for Pub/Sub Systems</title>
<link>https://diprlab.github.io/project/pubsubcontrol/</link>
<pubDate>Wed, 26 Nov 2025 00:00:00 +0000</pubDate>
<guid>https://diprlab.github.io/project/pubsubcontrol/</guid>
<description><p>High-volume publish/subscribe (pub/sub) systems include collections
of hardware and software components such as IoT sensors and the protocols
that connect them. Many of these have heretofore lacked robust security
and privacy controls by default despite there being significant security,
safety, and privacy implications driving the need to control access to
the data they generate and manage.</p>
<p>Such pub/sub-based systems power critical infrastructure
ranging from smart buildings
and factories to entire city-wide device networks.
In this project, we are developing a
fine-grained access control (FGAC) model and enforcement mechanism to
address this gap. Our proposed FGAC model builds upon
Attribute-Based Access Control (ABAC), defining access rules based
on MQTT message &ldquo;topics&rdquo;, the attributes of the publishers
and subscribers to those topics, and
ephemeral, per-message context information.</p>
<p>Our framework is platform-agnostic. For our experiments, we implemented the prototype
on an off-the-shelf, open-source MQTT pub/sub
system without altering the server's own code base.</p>
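<p>The model's policy language and enforcement hooks are not specified in detail here, so the following is only a hypothetical Python sketch of how ABAC-style rules keyed on MQTT topic filters might be evaluated. It assumes each rule has four parts (a topic filter, an action, a predicate over publisher/subscriber attributes, and a predicate over per-message context) and that a request is permitted if any rule matches and denied otherwise. The names <code>Rule</code> and <code>is_allowed</code> and the attributes <code>role</code>, <code>device_type</code>, and <code>hour</code> are illustrative; only the <code>+</code>/<code>#</code> wildcard behaviour follows the MQTT topic-filter rules, and special cases such as <code>$</code>-prefixed system topics are ignored.</p>

```python
# Hypothetical sketch: names and attributes are illustrative, not PaPrica-PS's API.
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Attributes = Dict[str, Any]
Predicate = Callable[[Attributes], bool]


def always(_attrs: Attributes) -> bool:
    return True


def topic_matches(topic_filter: str, topic: str) -> bool:
    """MQTT-style match: '+' is exactly one level, a trailing '#' is any remainder.

    Filters are assumed to be well-formed ('#' only as the last level).
    """
    f_levels = topic_filter.split("/")
    t_levels = topic.split("/")
    for i, level in enumerate(f_levels):
        if level == "#":
            return True  # also matches the parent level, e.g. "a/#" matches "a"
        if i >= len(t_levels):
            return False
        if level != "+" and level != t_levels[i]:
            return False
    return len(f_levels) == len(t_levels)


@dataclass
class Rule:
    topic_filter: str
    action: str                            # "publish" or "subscribe"
    subject_condition: Predicate           # over publisher/subscriber attributes
    context_condition: Predicate = always  # over ephemeral, per-message context


def is_allowed(rules: List[Rule], action: str, topic: str,
               subject: Attributes, context: Attributes) -> bool:
    """Permit if some rule matches on every dimension; otherwise deny by default."""
    return any(
        rule.action == action
        and topic_matches(rule.topic_filter, topic)
        and rule.subject_condition(subject)
        and rule.context_condition(context)
        for rule in rules
    )


EXAMPLE_RULES = [
    # Facilities staff may read temperatures, but only between 08:00 and 17:59.
    Rule("building/+/temperature", "subscribe",
         subject_condition=lambda s: s.get("role") == "facilities",
         context_condition=lambda c: c.get("hour") in range(8, 18)),
    # Registered sensors may publish anywhere under building/.
    Rule("building/#", "publish",
         subject_condition=lambda s: s.get("device_type") == "sensor"),
]
```

<p>Because the decision function does not depend on any particular broker, it could in principle be called from an authorization plugin or a proxy placed in front of an unmodified MQTT server; how PaPrica-PS actually integrates with the server is not described on this page.</p>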
</description>
</item>
<item>
<title>Fall 2025 Week 8</title>
<link>https://diprlab.github.io/dbrg/events/2025/fall/08/</link>
<pubDate>Wed, 19 Nov 2025 00:00:00 +0000</pubDate>
<guid>https://diprlab.github.io/dbrg/events/2025/fall/08/</guid>
<description><table>
<tr>
<td>
Title
</td>
<td>
Adaptive Differentially Private Structural Entropy Minimization for Unsupervised Social Event Detection
</td>
</tr>
<tr>
<td>
Authors
</td>
<td>
Zhiwei Yang, et al.
</td>
</tr>
<tr>
<td>
Abstract
</td>
<td>
Social event detection refers to extracting relevant message clusters from social media data streams to represent specific events in the real world. Social event detection is important in numerous areas, such as opinion analysis, social safety, and decision-making. Most current methods are supervised and require access to large amounts of data. These methods need prior knowledge of the events and carry a high risk of leaking sensitive information in the messages, making them less applicable in open-world settings. Therefore, conducting unsupervised detection while fully utilizing the rich information in the messages and protecting data privacy remains a significant challenge. To this end, we propose a novel social event detection framework, ADP-SEMEvent, an unsupervised social event detection method that prioritizes privacy. Specifically, ADP-SEMEvent is divided into two stages, i.e., the construction stage of the private message graph and the clustering stage of the private message graph. In the first stage, an adaptive differential privacy approach is used to construct a private message graph. In this process, our method can adaptively apply differential privacy based on the events occurring each day in an open environment to maximize the use of the privacy budget. In the second stage, to address the reduction in data utility caused by noise, a novel 2-dimensional structural entropy minimization algorithm based on optimal subgraphs is used to detect events in the message graph. The highlight of this process is unsupervised and does not compromise differential privacy. Extensive experiments on two public datasets demonstrate that ADP-SEMEvent can achieve detection performance comparable to state-of-the-art methods while maintaining reasonable privacy budget parameters.
</td>
</tr>
</table>
</description>
</item>
<item>
<title>Fall 2025 Week 7</title>
<link>https://diprlab.github.io/dbrg/events/2025/fall/07/</link>
<pubDate>Wed, 12 Nov 2025 00:00:00 +0000</pubDate>
<guid>https://diprlab.github.io/dbrg/events/2025/fall/07/</guid>
<description><table>
<tr>
<td>
Title
</td>
<td>
Scribe: How Meta transports terabytes per second in real time
</td>
</tr>
<tr>
<td>
Authors
</td>
<td>
Manos Karpathiotakis, et al.
</td>
</tr>
<tr>
<td>
Abstract
</td>
<td>
Millions of web servers and a multitude of applications are producing ever-increasing amounts of data in real time at Meta. Regardless of how data is generated and how it is processed, there is a need for infrastructure that can accommodate the transport of arbitrarily large data streams from their generation location to their processing location with low latency. <br /> <br />
This paper presents Scribe, a multi-tenant message queue service that natively supports the requirements of Meta’s data-intensive applications, ingesting > 15 TB/s and serving > 110 TB/s to its consumers. Scribe relies on a multi-hop write path and opportunistic data placement to maximise write availability, whereas its read path adapts replica placement and representation based on the incoming workload as a means to minimise resource consumption for both Scribe and its downstreams. The wide range of Scribe use cases can pick from a range of offered guarantees, based on the trade-offs favourable for each one.
</td>
</tr>
</table>
</description>
</item>
<item>
<title>Fall 2025 Week 6</title>
<link>https://diprlab.github.io/dbrg/events/2025/fall/06/</link>
<pubDate>Wed, 05 Nov 2025 00:00:00 +0000</pubDate>
<guid>https://diprlab.github.io/dbrg/events/2025/fall/06/</guid>
<description><table>
<tr>
<td>
Title
</td>
<td>
Delta Sharing: An Open Protocol for Cross-Platform Data Sharing
</td>
</tr>
<tr>
<td>
Authors
</td>
<td>
Krishna Puttaswamy, et al.
</td>
</tr>
<tr>
<td>
Abstract
</td>
<td>
Organizations across industries increasingly rely on sharing data to drive collaboration, innovation, and business performance. However, securely and efficiently sharing live data across diverse platforms and adhering to varying governance requirements remains a significant challenge. Traditional approaches, such as FTP and proprietary in-data-warehouse solutions, often fail to meet the demands of interoperability, cost, scalability, and low overhead. This paper introduces Delta Sharing, an open protocol we developed in collaboration with industry partners, to overcome these limitations. Delta Sharing leverages open formats like Delta Lake and Apache Parquet alongside simple HTTP APIs to enable seamless, secure, and live data sharing across heterogeneous systems. Since its launch in 2021, Delta Sharing has been adopted by over 4000 enterprises and supported by hundreds of major software and data vendors. We discuss the key challenges in developing Delta Sharing and how our design addresses them. We also present, to our knowledge, the first large-scale study of production data sharing workloads offering insights into this emerging data platform capability.
</td>
</tr>
</table>
</description>
</item>
<item>
<title>DIPr Lab at URMP Poster Competition</title>
<link>https://diprlab.github.io/post/2025/spencer_urmp/</link>
<pubDate>Thu, 25 Sep 2025 00:00:00 +0000</pubDate>
<guid>https://diprlab.github.io/post/2025/spencer_urmp/</guid>
<description><p>Portland State University&rsquo;s Undergraduate Research &amp; Mentor Program <a href="https://www.pdx.edu/engineering/urmp" target="_blank" rel="noopener">(URMP)</a> hosted a poster event that brought together student researchers to present their work. <a href="https://diprlab.github.io/author/spencer-henwood/">Spencer Henwood</a> presented a poster about securing publish/subscribe (pub/sub) systems with fine-grained access control (FGAC).</p>
<p>
<figure >
<div class="d-flex justify-content-center">
<div class="w-100" ><img src="./poster.pdf" alt="URMP Poster" loading="lazy" data-zoomable /></div>
</div></figure>
</p>
</description>
</item>
<item>
<title>UR2PhD program acceptance</title>
<link>https://diprlab.github.io/post/2025/ur2phd/</link>
<pubDate>Thu, 04 Sep 2025 00:00:00 +0000</pubDate>
<guid>https://diprlab.github.io/post/2025/ur2phd/</guid>
<description><p>About UR2PhD:</p>
<blockquote>
<p><a href="https://cra.org/ur2phd/" target="_blank" rel="noopener">UR2PhD (Undergraduate Research to PhD)</a> is a three-month, national virtual program run by the Computing Research Association that pairs undergraduate researchers with graduate student mentors in computing. As part of this, there is a structured mentorship course for graduate students. The course helps graduate mentors learn how to support undergraduate researchers effectively, with a focus on inclusive, research-based mentoring practices. Topics include how people learn in research settings, aligning expectations, articulating a mentoring philosophy, giving effective feedback, and helping mentees feel that they belong in the computing research community.</p></blockquote>
<p><a href="https://diprlab.github.io/author/orobosa-ekhator/">Orobosa Ekhator</a> and <a href="https://diprlab.github.io/author/anadi-shakya/">Anadi Shakya</a> got accepted into the Graduate Student Mentor Training Course for Fall 2025. The goal of the course is to provide mentors of undergraduate researchers with the essential skills necessary to build robust research settings.</p>
<p>&ldquo;As a PhD student, this is helping me build skills I’ll need to eventually lead my own research projects and create a lab culture where undergraduates feel welcomed, supported, and able to see themselves as future researchers.&rdquo; – Anadi</p></description>
</item>
<item>
<title>Summer 2025 Week 4</title>
<link>https://diprlab.github.io/dbrg/events/2025/summer/04/</link>
<pubDate>Wed, 20 Aug 2025 00:00:00 +0000</pubDate>
<guid>https://diprlab.github.io/dbrg/events/2025/summer/04/</guid>
<description><table>
<tr>
<td>
Title
</td>
<td>
TSB-UAD: An End-to-End Benchmark Suite for Univariate Time-Series Anomaly Detection
</td>
</tr>
<tr>
<td>
Authors
</td>
<td>
John Paparrizos, Yuhao Kang, Paul Boniol, Ruey S. Tsay, Themis Palpanas, Michael J. Franklin
</td>
</tr>
<tr>
<td>
Abstract
</td>
<td>
The detection of anomalies in time series has gained ample academic and industrial attention. However, no comprehensive benchmark exists to evaluate time-series anomaly detection methods. It is common to use (i) proprietary or synthetic data, often biased to support particular claims; or (ii) a limited collection of publicly available datasets. Consequently, we often observe methods performing exceptionally well in one dataset but surprisingly poorly in another, creating an illusion of progress. To address the issues above, we thoroughly studied over one hundred papers to identify, collect, process, and systematically format datasets proposed in the past decades. We summarize our effort in TSB-UAD, a new benchmark to ease the evaluation of univariate time-series anomaly detection methods. Overall, TSB-UAD contains 13766 time series with labeled anomalies spanning different domains with high variability of anomaly types, ratios, and sizes. TSB-UAD includes 18 previously proposed datasets containing 1980 time series and we contribute two collections of datasets. Specifically, we generate 958 time series using a principled methodology for transforming 126 time-series classification datasets into time series with labeled anomalies. In addition, we present data transformations with which we introduce new anomalies, resulting in 10828 time series with varying complexity for anomaly detection. Finally, we evaluate 12 representative methods demonstrating that TSB-UAD is a robust resource for assessing anomaly detection methods. We make our data and code available at www.timeseries.org/TSB-UAD. TSB-UAD provides a valuable, reproducible, and frequently updated resource to establish a leaderboard of univariate time-series anomaly detection methods.
</td>
</tr>
</table>
</description>
</item>
<item>
<title>Summer 2025 Week 3</title>
<link>https://diprlab.github.io/dbrg/events/2025/summer/03/</link>
<pubDate>Wed, 06 Aug 2025 00:00:00 +0000</pubDate>
<guid>https://diprlab.github.io/dbrg/events/2025/summer/03/</guid>
<description><table>
<tr>
<td>
Title
</td>
<td>
HoneyBee: Efficient Role-based Access Control for Vector Databases via Dynamic Partitioning
</td>
</tr>
<tr>
<td>
Authors
</td>
<td>
Hongbin Zhong, Matthew Lentz, Nina Narodytska, Adriana Szekeres, Kexin Rong
</td>
</tr>
<tr>
<td>
Abstract
</td>
<td>
As vector databases gain traction in enterprise applications, robust access control has become critical to safeguard sensitive data. Access control in these systems is often implemented through hybrid vector queries, which combine nearest neighbor search on vector data with relational predicates based on user permissions. However, existing approaches face significant trade-offs: creating dedicated indexes for each user minimizes query latency but introduces excessive storage redundancy, while building a single index and applying access control after vector search reduces storage overhead but suffers from poor recall and increased query latency. This paper introduces HoneyBee, a dynamic partitioning framework that bridges the gap between these approaches by leveraging the structure of Role-Based Access Control (RBAC) policies. RBAC, widely adopted in enterprise settings, groups users into roles and assigns permissions to those roles, creating a natural "thin waist" in the permission structure that is ideal for partitioning decisions. Specifically, HoneyBee produces overlapping partitions where vectors can be strategically replicated across different partitions to reduce query latency while controlling storage overhead. By introducing analytical models for the performance and recall of the vector search, HoneyBee formulates the partitioning strategy as a constrained optimization problem to dynamically balance storage, query efficiency, and recall. Evaluations on RBAC workloads demonstrate that HoneyBee reduces storage redundancy compared to role partitioning and achieves up to 6x faster query speeds than row-level security (RLS) with only 1.4x storage increase, offering a practical middle ground for secure and efficient vector search.
</td>
</tr>
</table>
</description>
</item>
<item>
<title>Summer 2025 Week 2</title>
<link>https://diprlab.github.io/dbrg/events/2025/summer/02/</link>
<pubDate>Wed, 23 Jul 2025 00:00:00 +0000</pubDate>
<guid>https://diprlab.github.io/dbrg/events/2025/summer/02/</guid>
<description><table>
<tr>
<td>
Title
</td>
<td>
An Elephant Under the Microscope: Analyzing the Interaction of Optimizer Components in PostgreSQL
</td>
</tr>
<tr>
<td>
Authors
</td>
<td>
Rico Bergmann, Claudio Hartmann, Dirk Habich, Wolfgang Lehner
</td>
</tr>
<tr>
<td>
Abstract
</td>
<td>
Despite an ever-growing corpus of novel query optimization strategies, the interaction of the core components of query optimizers is still not well understood. This situation can be problematic for two main reasons: On the one hand, this may cause surprising results when two components influence each other in an unexpected way. On the other hand, this can lead to wasted effort in regard to both engineering and research, e.g., when an improvement for one component is dwarfed or entirely canceled out by problems of another component. Therefore, we argue that making improvements to a single optimization component requires a thorough understanding of how these changes might affect the other components. To achieve this understanding, we present results of a comprehensive experimental analysis of the interplay in the traditional optimizer architecture using the widely-used PostgreSQL system as prime representative. Our evaluation and analysis revisit the core building blocks of such an optimizer, i.e. per-column statistics, cardinality estimation, cost model, and plan generation. In particular, we analyze how these building blocks influence each other and how they react when faced with faulty input, such as imprecise cardinality estimates. Based on our results, we draw novel conclusions and make recommendations on how these should be taken into account.
</td>
</tr>
</table>
</description>
</item>
<item>
<title>Summer 2025 Week 1</title>
<link>https://diprlab.github.io/dbrg/events/2025/summer/01/</link>
<pubDate>Wed, 09 Jul 2025 00:00:00 +0000</pubDate>
<guid>https://diprlab.github.io/dbrg/events/2025/summer/01/</guid>
<description><table>
<tr>
<td>
Title
</td>
<td>
Streaming Democratized: Ease Across the Latency Spectrum with Delayed View Semantics and Snowflake Dynamic Tables
</td>
</tr>
<tr>
<td>
Authors
</td>
<td>
Daniel Sotolongo, Daniel Mills, Tyler Akidau, Anirudh Santhiar, Attila-Péter Tóth, Botong Huang, Boyuan Zhang, Igor Belianski, Ling Geng, Matt Uhlar, Nikhil Shah, Olivia Zhou, Saras Nowak, Sasha Lionheart, Vlad Lifliand, Wendy Grus, Yiwen Zhu, Ankur Sharma, Dzmitry Pauliukevich, Enrico Sartorello, Ilaria Battiston, Ivan Kalev, Lawrence Benson, Leon Papke, Niklas Semmler, Till Merker, Yi Huang
</td>
</tr>
<tr>
<td>
Abstract
</td>
<td>
Streaming data pipelines remain challenging and expensive to build and maintain, despite significant advancements in stronger consistency, event time semantics, and SQL support over the last decade. Persistent obstacles continue to hinder usability, such as the need for manual incrementalization, semantic discrepancies across SQL implementations, and the lack of enterprise-grade operational features (e.g. granular access control, disaster recovery). While the rise of incremental view maintenance (IVM) as a way to integrate streaming with databases has been a huge step forward, transaction isolation in the presence of IVM remains underspecified, which leaves the maintenance of application-level invariants as a painful exercise for the user. Meanwhile, most streaming systems optimize for latencies of 100 milliseconds to 3 seconds, whereas many practical use cases are well-served by latencies ranging from seconds to tens of minutes.
<p>In this paper, we present delayed view semantics (DVS), a conceptual foundation that bridges the semantic gap between streaming and databases, and introduce Dynamic Tables, Snowflake&rsquo;s declarative streaming transformation primitive designed to democratize analytical stream processing. DVS formalizes the intuition that stream processing is primarily a technique to eagerly compute derived results asynchronously, while also addressing the need to reason about the resulting system end to end. Dynamic Tables then offer two key advantages: ease of use through DVS, enterprise-grade features, and simplicity; as well as scalable cost efficiency via IVM with an architecture designed for diverse latency requirements. We first develop extensions to transaction isolation that permit the preservation of invariants in streaming applications. We then detail the implementation challenges of Dynamic Tables and our experience operating it at scale. Finally, we share insights into user adoption and discuss our vision for the future of stream processing.</p>
</td>
</tr>
</table>
</description>
</item>
<item>
<title>Spring 2025 Week 9</title>
<link>https://diprlab.github.io/dbrg/events/2025/spring/09/</link>
<pubDate>Fri, 30 May 2025 00:00:00 +0000</pubDate>
<guid>https://diprlab.github.io/dbrg/events/2025/spring/09/</guid>
<description><table>
<tr>
<td>
Title
</td>
<td>
In-Database Time Series Clustering
</td>
</tr>
<tr>
<td>
Authors
</td>
<td>
Yunxiang Su, Kenny Ye Liang, Shaoxu Song
</td>
</tr>
<tr>
<td>
Abstract
</td>
<td>
Time series data are often clustered repeatedly across various time ranges to mine frequent subsequence patterns from different periods, which can further support downstream applications. Existing state-of-the-art (SOTA) time series clustering methods, such as K-Shape, can proficiently cluster time series data based on their shapes. However, the in-database time series clustering problem has been neglected, especially in IoT scenarios with large data volumes and high efficiency demands. Most time series databases employ LSM-Tree-based storage to support write-intensive workloads, which leaves the underlying data points out of order by timestamp. Therefore, to apply existing out-of-database methods, all data points must be fully loaded into memory and sorted chronologically. Additionally, out-of-database methods must cluster from scratch each time, making them inefficient when handling queries across different time ranges. In this work, we propose an in-database adaptation of the SOTA time series clustering method K-Shape. Moreover, because K-Shape cannot efficiently handle long time series, we propose Medoid-Shape, along with its in-database adaptation for further acceleration. Extensive experiments demonstrate the higher efficiency of our proposals, with comparable effectiveness. Notably, all proposals have already been implemented in an open-source commodity time series database, Apache IoTDB.
</td>
</tr>
</table>
</description>
</item>
<item>
<title>Spring 2025 Week 8</title>
<link>https://diprlab.github.io/dbrg/events/2025/spring/08/</link>
<pubDate>Fri, 23 May 2025 00:00:00 +0000</pubDate>
<guid>https://diprlab.github.io/dbrg/events/2025/spring/08/</guid>
<description><table>
<tr>
<td>
Title
</td>
<td>
Highly Efficient and Scalable Access Control Mechanism for IoT Devices in Pervasive Environments
</td>
</tr>
<tr>
<td>
Authors
</td>
<td>
Alian Yu, Jian Kang, Wei Jiang and Dan Lin
</td>
</tr>
<tr>
<td>
Abstract
</td>
<td>
With the continuous advancement of sensing, networking, controlling, and computing technologies, there is a growing number of IoT (Internet of Things) devices emerging that are expected to integrate into public infrastructure in the near future. However, the deployment of these smart devices in public venues presents new challenges for existing access control mechanisms, particularly in terms of efficiency. To address these challenges, we have developed a highly efficient and scalable access control mechanism that enables automatic and fine-grained access control management while incurring low overhead in large-scale settings. Our mechanism includes a dual-hierarchy access control structure and associated information retrieval algorithms, which we have used to develop a large-scale IoT device access control system called FACT+. FACT+ overcomes the efficiency issues of granting and inquiring access control status over millions of devices in pervasive environments. Additionally, our system offers a pay-and-consume scheme and plug-and-play device management for convenient adoption by service providers. We have conducted extensive experiments to demonstrate the practicality, effectiveness, and efficiency of our access control mechanism.
</td>
</tr>
</table>
</description>
</item>
<item>
<title>Spring 2025 Week 6</title>
<link>https://diprlab.github.io/dbrg/events/2025/spring/06/</link>
<pubDate>Fri, 09 May 2025 00:00:00 +0000</pubDate>
<guid>https://diprlab.github.io/dbrg/events/2025/spring/06/</guid>
<description><table>
<tr>
<td>