Draft
148 commits
75a2843
Initial setup for Kafka consumer groups.
Tiihott Dec 5, 2023
692061e
Implementing idempotent consumer pattern to the Kafka consumer groups…
Tiihott Dec 8, 2023
888d157
Implementing idempotent consumer pattern to the Kafka consumer groups…
Tiihott Dec 11, 2023
2ab05a4
Implementing Kafka consumed record processing using AVRO serializatio…
Tiihott Dec 12, 2023
9a8065c
Implementing AVRO serialization to Kafka record processing.
Tiihott Dec 13, 2023
5da4247
Implementing AVRO serialization to Kafka record processing, WIP2.
Tiihott Dec 14, 2023
200d958
Implementing HDFSWriter to Kafka consumer record processing.
Tiihott Dec 15, 2023
281d86c
Implementing HDFSWriter to Kafka consumer record processing (2).
Tiihott Dec 18, 2023
6f4b9ef
Implementing HDFSWriter to Kafka consumer record processing (3).
Tiihott Dec 19, 2023
00ce004
Testing configurations, setting up testcases, debugging dependency is…
Tiihott Dec 20, 2023
4e23a06
Testing and debugging kafka consumer processing, debugging dependency…
Tiihott Dec 22, 2023
818005e
Fixed issue in Kafka mock consumer processing, continued testing Avro…
Tiihott Dec 29, 2023
0bfe935
Added Kerberos authentication to HDFS access.
Tiihott Jan 3, 2024
de78d63
Added configurations for Kerberos authentication to HDFS access.
Tiihott Jan 4, 2024
0b15600
Added support to define mock data amount. Debugging AVRO-serializatio…
Tiihott Jan 5, 2024
df06744
Debugging issue on how AVRO-file size is tracked.
Tiihott Jan 8, 2024
8e58000
Changed AVRO-serialization to use approximated file size values inste…
Tiihott Jan 9, 2024
00d790e
Still having issues in the filesize of AVRO-serialization.
Tiihott Jan 9, 2024
f7e3169
Rolled back to using definitive file size values on avro-serializatio…
Tiihott Jan 10, 2024
7fdc758
Setting up HDFS access tests.
Tiihott Jan 10, 2024
cf9ff73
Setting up HDFS access tests (2).
Tiihott Jan 12, 2024
f995126
Added readme and issue template.
Tiihott Jan 15, 2024
c1db309
Fixed faulty origin and source parameter generation in DatabaseOutput…
Tiihott Jan 15, 2024
b6928e9
Replacing mxj_01 based performance metrics with dropwizard metrics im…
Tiihott Jan 16, 2024
eebc240
Implemented Dropwizard metrics to monitor Kafka consumer processing p…
Tiihott Jan 17, 2024
13d5489
Debugging faulty kafka topic partition ordering. (2)
Tiihott Jan 17, 2024
77faf23
Changed logic on how topic partitions are handled, each partition now…
Tiihott Jan 18, 2024
faefd9b
Altered Kafka MockConsumer to use subscription instead of assign method.
Tiihott Jan 22, 2024
484b6aa
Implementing fix to consumer group creation for partitions.
Tiihott Jan 23, 2024
0904280
Added config parameter for defining the number of consumers in a cons…
Tiihott Jan 24, 2024
a08b610
Fixed consumer group generation and its tests. Added example configur…
Tiihott Jan 25, 2024
82bd6f5
Cleaned up and improved tests.
Tiihott Jan 25, 2024
a4f6ed4
Improved commenting. Cleaned up code. Cleaned up class names.
Tiihott Jan 29, 2024
8848500
Added licenses to files. Removed obsolete TopicStatistics.java.
Tiihott Jan 30, 2024
bb66945
Replaced rlo_09 dependency with Kafka-clients.
Tiihott Jan 30, 2024
1713376
Setting up tests for HDFS write.
Tiihott Feb 1, 2024
6198f30
Setting up MiniCluster for HDFS testing. Debugging issues in HDFS tests.
Tiihott Feb 5, 2024
6c9061f
Fixed HDFS tests.
Tiihott Feb 6, 2024
a74131e
Created integration test for Kafka consumer groups, record processing…
Tiihott Feb 6, 2024
29f82e9
Cleaning up comments. Adding missing loggers to files.
Tiihott Feb 7, 2024
8ca381f
Cleaning up loggers. Added missing exception handling to Kerberized H…
Tiihott Feb 8, 2024
e836a1f
Testing FileSystem limitations on how pruning of the HDFS database ca…
Tiihott Feb 8, 2024
4c274c5
Testing FileSystem limitations on how pruning of the HDFS database ca…
Tiihott Feb 9, 2024
045ff95
Setting up pruning using MapReduce.
Tiihott Feb 12, 2024
d88b36e
Setting up pruning using avro-mapred.
Tiihott Feb 13, 2024
b0199a3
Setting up pruning using avro-mapred. (2)
Tiihott Feb 15, 2024
c9f330b
Setting up pruning using avro-mapred. (3)
Tiihott Feb 15, 2024
832ae6a
Attempt at optimizing the output of MapReduce.
Tiihott Feb 16, 2024
6af8eb4
Reverting back to using FileSystem modification timestamps for prunin…
Tiihott Feb 19, 2024
795c488
Parametrized the pruning cutoff epoch offset. Removed obsolete parame…
Tiihott Feb 20, 2024
0426a78
Added the missing kerberized access to HDFSPrune.java.
Tiihott Feb 21, 2024
989b9e1
Implementing fixes/improvements from code review. Fixes for comments …
Tiihott Feb 21, 2024
fb67b46
Implemented null object pattern for RecordOffsetObject (improvements …
Tiihott Feb 26, 2024
17dc709
Reverted back to using original modification timestamp functionality …
Tiihott Mar 13, 2024
ffe1942
Changed Offset class namings. Changed file system asset naming.
Tiihott Apr 4, 2024
1b845bc
Added maven compiler plugin version tag, simplified versioning for ha…
Tiihott May 14, 2024
3cb7adc
Renamed HDFSWriter.java to HDFSWrite.java.
Tiihott May 14, 2024
094c21c
Added HDFSRead.java for fetching offsets of the latest kafka records …
Tiihott May 14, 2024
2a39476
Implemented idempotent kafka consumer functionality using kafka consu…
Tiihott May 14, 2024
ac1e4e1
Fixed bug in topic partition offset map creation in HDFSRead.java. Ad…
Tiihott May 15, 2024
a1845f4
Changed logging to follow the Java logging standard (issue #12).
Tiihott May 16, 2024
0aa3119
Improved boolean value naming in DatabaseOutput.java.
Tiihott May 16, 2024
1d0ae48
Removed distribution management from pom.xml.
Tiihott May 27, 2024
2c50a71
Fixed typo.
Tiihott May 27, 2024
84b3234
Fixed boolean name.
Tiihott May 27, 2024
e2153fa
Uncommented out code in pom.xml.
Tiihott May 27, 2024
ea10fa9
Changed naming of message to payload in avro schema. Made changes to …
Tiihott May 28, 2024
96da584
Removed unused test file and related dependency.
Tiihott May 28, 2024
022832c
Cleaned up commenting. Removed duplicate code in HDFSWrite.java.
Tiihott May 28, 2024
a1fda67
Removed ANSI coloring from logging and fixed wrong log level usage.
Tiihott May 29, 2024
8721d82
Fixed missing and wrong licenses.
Tiihott May 29, 2024
45a69d9
Added spotless, enforcer and jacoco plugins and their requirements.
Tiihott May 29, 2024
aa39668
Spotless
Tiihott May 29, 2024
b12bb14
Implemented logger call guards.
Tiihott May 29, 2024
57c6805
Renamed MockKafkaConsumerFactoryTemp to MockKafkaConsumerFactory
Tiihott May 29, 2024
7536224
Added @VisibleForTesting to MockKafkaConsumerFactory
Tiihott May 30, 2024
0d536be
Added missing FIXME tag
Tiihott May 30, 2024
75cb618
Changed to using UncheckedIOException if necessary.
Tiihott May 30, 2024
e201470
Renamed KafkaController.java to HdfsDataIngestion.java
Tiihott May 31, 2024
70c6f26
Refactored Offset abstract class into an interface from abstract clas…
Tiihott May 31, 2024
2f64e8b
Moved DurationStatistics usage from class field to a local variable.
Tiihott Jun 3, 2024
d7cab0c
Removed redundant public modifiers.
Tiihott Jun 3, 2024
c1c455c
Refactored Offset, RecordOffset and NullOffset further by implementin…
Tiihott Jun 3, 2024
8ce5ca4
Deleted QueueUtilities.java and moved the contained methods to Writab…
Tiihott Jun 4, 2024
65342c2
Refactoring tests according to the code review. WIP
Tiihott Jun 4, 2024
18860a4
Added PruningTest.java (WIP). Minor tweak to HDFSPrune.prune() to ret…
Tiihott Jun 7, 2024
b015ca7
Spotless
Tiihott Jun 7, 2024
61c54a1
Expanded and improved tests in PruningTest.java
Tiihott Jun 7, 2024
9c3b842
Added ConfigTest.java.
Tiihott Jun 11, 2024
6dd01c4
Refactored KafkaConsumerTest.java. Added condition to ReadCoordinator…
Tiihott Jun 11, 2024
f67e592
Refactored CombinedFullTest.java. Spotless.
Tiihott Jun 12, 2024
7d1ad2f
Removed unused input arguments and parameters from HDFSWrite.java and…
Tiihott Jun 13, 2024
068419a
Removed assertion helper functions from CombinedFullTest.java.
Tiihott Jun 13, 2024
6235423
Refactoring HdfsTest.java and improved HDFSWrite.java exception handl…
Tiihott Jun 13, 2024
728a6de
Improved test assertion setups.
Tiihott Jun 17, 2024
dee838a
Separated tests in CombinedFullTest.java to Ingestion0FilesTest.java,…
Tiihott Jun 18, 2024
95d01d5
Separated tests in PruningTest.java to PruningNoFilesTest.java, Pruni…
Tiihott Jun 18, 2024
58b53d8
Removed helper functions and if-statement usage from Pruning tests.
Tiihott Jun 18, 2024
5c40bb9
Removed helper functions and if-statement usage from ingestion tests.
Tiihott Jun 18, 2024
8b9eda3
Added codeblock
Tiihott Jun 20, 2024
f57c4d7
Added example config.jaas, application.properties and log4j2.properti…
Tiihott Jun 20, 2024
54cc6b9
Added public visibility to Config() in Config.java.
Tiihott Jun 20, 2024
230960e
Fixed typo in configPath parameter initialization.
Tiihott Jun 20, 2024
7e981f1
Removed queueNamePrefix from config parameters, instead sourcing the …
Tiihott Jun 20, 2024
b0cc685
Removed duplicate example configuration files from test section.
Tiihott Jun 20, 2024
fdc919f
Set visibility to private on committedToHdfs() in DatabaseOutput.java.
Tiihott Jun 20, 2024
03e4c20
Renamed committedToHdfs to writeToHdfs. Added missing second & to a c…
Tiihott Jun 24, 2024
3385811
Added additional ingestion test for low maximum file size.
Tiihott Jun 24, 2024
678c2ae
Refactored HDFSPrune.java to use dependency injection for FileSystem …
Tiihott Jun 25, 2024
5b9025d
Refactored HDFSRead.java. Added dependency injection for FileSystem, …
Tiihott Jun 25, 2024
38de4b0
Renamed thread id parameter from testi to threadId.
Tiihott Jun 25, 2024
d6b7e7a
Removed unused stop() method.
Tiihott Jun 25, 2024
ae2b625
Changed RecordOffset json printer to use numeric values for partition…
Tiihott Jun 25, 2024
3458ee2
Added missing public visibilities.
Tiihott Jun 26, 2024
1ac75a3
Removed unneeded concatenation. Renamed single letter variable.
Tiihott Jun 26, 2024
75ff318
Fixed logging.
Tiihott Jun 26, 2024
f272ed3
Condensed assertions using nested loops.
Tiihott Jun 26, 2024
e57fc1f
Improved commenting on tests.
Tiihott Jun 26, 2024
053476e
Removed now unneeded additional FileSystem initializations from inges…
Tiihott Jun 27, 2024
90c0278
Implemented TestMiniClusterFactory.java and TestFileSystemFactory.jav…
Tiihott Jun 27, 2024
78d809e
Fixed logging.
Tiihott Jun 27, 2024
5df2c15
Implemented Main.java, rpm packaging and GitHub workflows.
Tiihott Jul 1, 2024
825ba1e
Fixed wrong json format usage in HdfsTest.java.
Tiihott Jul 1, 2024
9127524
Fixed logging.
Tiihott Jul 1, 2024
51c7f4c
Removed unneeded System.setProperty.
Tiihott Jul 1, 2024
0649753
Improved logging format.
Tiihott Jul 1, 2024
f3601d3
Fixed error in json printer format.
Tiihott Jul 1, 2024
f1d6344
Moved the pre-generated avro/hdfs files to test resources.
Tiihott Jul 2, 2024
3183e97
Implemented configuration tests. Implemented configuration checks. Ad…
Tiihott Jul 2, 2024
42337f2
Added more configuration checks. Changed default directory of applica…
Tiihott Jul 3, 2024
5a989b0
Updated README.adoc and example config.
Tiihott Jul 4, 2024
598cafa
Added rat plugin. Fixed workflow runs-on parameter. Added disable con…
Tiihott Jul 4, 2024
017932e
Merge pull request #11 from Tiihott/devbranch_idempotent
StrongestNumber9 Jul 4, 2024
882f233
Downgraded spotless to 2.30.0
Tiihott Jul 4, 2024
38a0e38
Downgraded spotless to 2.30.0
Tiihott Jul 4, 2024
dfdd034
Added exclusion for AVRO-generated file.
Tiihott Jul 4, 2024
7411708
Added exclusion to .gitignore
Tiihott Jul 4, 2024
1307439
Added exclusion for AVRO-generated file.
Tiihott Jul 4, 2024
578ddb0
Added exclusion to .gitignore
Tiihott Jul 4, 2024
c4e122f
Fix to rpm packaging configuration
Tiihott Jul 4, 2024
f0def37
Merge pull request #27 from Tiihott/devbranch_idempotent
StrongestNumber9 Jul 4, 2024
221bfde
Added configuration files to rpm packaging.
Tiihott Jul 4, 2024
0f50aa7
Adds systemd service file, new service user to rpm
StrongestNumber9 Jul 4, 2024
a44b70e
Fix for issue #32 by enabling automatic TGT renewal.
Tiihott Aug 6, 2024
2742be9
Replaced hadoop Configuration class usage with the child class HdfsCo…
Tiihott Aug 7, 2024
60cc06c
Added handling of NULL records and un-parseable records. (#35)
Tiihott Aug 20, 2024
18de5a1
apply spotless (#38)
kortemik Aug 20, 2024
220d0d5
Added encryption flags for configuration (#40)
Tiihott Aug 20, 2024
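Several commits above refer to an idempotent Kafka consumer and to fetching the offsets of the latest stored records. The general idea can be sketched as follows. This is a minimal illustration only, not the code in this pull request: the `IdempotentOffsetFilter` class, its `Record` type, and the `topic-partition` key format are all hypothetical names introduced for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of offset-based deduplication; not a class from cfe_39. */
public class IdempotentOffsetFilter {

    /** Minimal stand-in for a consumed Kafka record (hypothetical type). */
    public static final class Record {
        public final String topic;
        public final int partition;
        public final long offset;

        public Record(String topic, int partition, long offset) {
            this.topic = topic;
            this.partition = partition;
            this.offset = offset;
        }
    }

    // Highest offset already stored, keyed by "topic-partition" (assumed key format).
    private final Map<String, Long> lastStored = new HashMap<>();

    /** @param alreadyStored offsets recovered from the storage layer at startup */
    public IdempotentOffsetFilter(Map<String, Long> alreadyStored) {
        lastStored.putAll(alreadyStored);
    }

    /** Returns only the records that have not been stored yet and remembers their offsets. */
    public List<Record> unseen(List<Record> batch) {
        List<Record> fresh = new ArrayList<>();
        for (Record record : batch) {
            String key = record.topic + "-" + record.partition;
            Long last = lastStored.get(key);
            if (last == null || record.offset > last) {
                fresh.add(record);
                lastStored.put(key, record.offset);
            }
        }
        return fresh;
    }
}
```

With this approach, replaying a batch after a restart or a consumer group rebalance yields no new records, so re-delivery does not produce duplicate writes.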
20 changes: 10 additions & 10 deletions .github/ISSUE_TEMPLATE/bug_report.md
@@ -9,23 +9,23 @@ assignees: ''

**Describe the bug**
<!-- Please explain what happened and provide the context in which the bug occurred. -->

**Expected behavior**
<!-- Please tell us why you think the behavior is unexpected. If you can, please copy-paste logs or error messages you got while facing the bug. -->

**How to reproduce**
<!-- Please provide a step-by-step guide for reproducing the bug. Please keep in mind that non-reproducible issues will be closed. -->

**Screenshots**
<!-- If applicable, add screenshots to help explain your problem. -->

**Software version**
<!-- e.g. 3.1.4 -->

**Desktop (please complete the following information if relevant):**
- OS: <!-- [e.g. iOS] -->
- Browser: <!-- [e.g. chrome, safari] -->
- Version: <!-- [e.g. 22] -->
- OS: <!-- [e.g. iOS] -->
- Browser: <!-- [e.g. chrome, safari] -->
- Version: <!-- [e.g. 22] -->

**Additional context**
<!-- Add any other context about the problem here. -->
<!-- Add any other context about the problem here. -->
4 changes: 2 additions & 2 deletions .github/ISSUE_TEMPLATE/config.yml
@@ -5,7 +5,7 @@ contact_links:
about: Problems with Teragrep documentation
- name: Ask a question or get support
url: https://github.com/teragrep/cfe_39/discussions
about: Ask a question or request support
about: Ask a question or request support
- name: Report vulnerability
url: https://github.com/teragrep/teragrep/security/advisories/new
about: Privately report a security vulnerability
about: Privately report a security vulnerability
20 changes: 20 additions & 0 deletions .github/ISSUE_TEMPLATE/feature_requests.md
@@ -0,0 +1,20 @@
---
name: Feature request
about: Suggest an idea for this project
title: ''
labels: enhancement
assignees: ''

---

**Description**
<!-- Please briefly describe your feature idea. -->

**Use case or motivation behind the feature request**
<!-- Please tell us what you would like to happen. Rather than explaining the implementation process, we would appreciate hearing what you are trying to achieve with your feature. -->

**Related issues**
<!-- If there are any, please list issues that are associated with your feature request. -->

**Additional context**
<!-- Add any other context or screenshots about the feature request here. -->
2 changes: 1 addition & 1 deletion .github/ISSUE_TEMPLATE/tasks-and-meta.md
@@ -8,4 +8,4 @@ assignees: ''
---

**Description**
<!-- Add a short description and screen shots if needed. -->
<!-- Add a short description and screen shots if needed. -->
43 changes: 43 additions & 0 deletions .github/workflows/upload_release.yaml
@@ -0,0 +1,43 @@
name: Upload Release

on:
  release:
    types: [published]

jobs:
  upload:
    name: Upload
    runs-on: ubuntu-latest
    permissions:
      contents: write

    steps:
      - uses: actions/checkout@v3
        with:
          fetch-depth: 0

      - name: Set up JDK 8
        uses: actions/setup-java@v3
        with:
          java-version: '8'
          distribution: 'temurin'
          server-id: github
          settings-path: ${{ github.workspace }}


      - name: Package jar
        run: mvn --batch-mode -Drevision=${{ github.event.release.tag_name }} -Dsha1= -Dchangelist= clean package
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

      - name: Package rpm
        run: cd rpm/ && mvn --batch-mode -Drevision=${{ github.event.release.tag_name }} -Dsha1= -Dchangelist= -f rpm.pom.xml package
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

      - name: Attach rpm to release
        uses: softprops/action-gh-release@v1
        with:
          files: |
            rpm/target/rpm/com.teragrep-cfe_39/RPMS/noarch/com.teragrep-cfe_39-*.noarch.rpm
            target/cfe_39-*-jar-with-dependencies.jar
2 changes: 2 additions & 0 deletions .gitignore
@@ -15,3 +15,5 @@ buildNumber.properties
.project
# JDT-specific (Eclipse Java Development Tools)
.classpath

src/main/java/com/teragrep/cfe_39/avro/SyslogRecord.java
41 changes: 41 additions & 0 deletions README.adoc
@@ -0,0 +1,41 @@

# CFE_39

This is an HDFS data ingestion module for use with PTH_06.

## Features

Implements a near-real-time data source that allows reading the latest data from Kafka (the last few records), semi-recent data from HDFS (the last few days), and old data from the S3 archive.
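
The three tiers can be pictured as a simple routing decision. The sketch below is illustrative only and rests on assumptions: it treats both boundaries as time windows (the Kafka tier is described above in terms of records, not time), and the `TieredSource` class, the `Tier` enum and both window parameters are hypothetical names that do not come from this module.

```java
import java.time.Duration;

/** Hypothetical sketch of choosing a storage tier by data age; not part of cfe_39. */
public class TieredSource {

    public enum Tier { KAFKA, HDFS, S3_ARCHIVE }

    /**
     * @param age         how old the requested data is
     * @param kafkaWindow assumed maximum age still served from Kafka
     * @param hdfsWindow  assumed maximum age still served from HDFS
     */
    public static Tier tierFor(Duration age, Duration kafkaWindow, Duration hdfsWindow) {
        if (age.compareTo(kafkaWindow) <= 0) {
            return Tier.KAFKA;
        }
        if (age.compareTo(hdfsWindow) <= 0) {
            return Tier.HDFS;
        }
        return Tier.S3_ARCHIVE;
    }
}
```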

## Documentation

See the official documentation on https://docs.teragrep.com[docs.teragrep.com].

## How to compile and use

`mvn clean package`

To use this module, create the application.properties, config.jaas and log4j2.properties files.
By default, the application.properties file must be placed in the /opt/teragrep/cfe_39/etc/ directory.
The application.properties file defines the directory where the other files must be placed.

Example configuration files are available in the cfe_39/rpm/resources/ directory.
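
The lookup described above could be sketched roughly as follows. This is not the module's actual `Config` implementation: the `ConfigLocator` class is hypothetical, and `config.directory` is a made-up property key used only for illustration. Consult the example application.properties for the real key names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

/** Hypothetical sketch of locating the configuration files; not the cfe_39 Config class. */
public class ConfigLocator {

    // Default directory for application.properties, as stated in this README.
    public static final Path DEFAULT_DIR = Paths.get("/opt/teragrep/cfe_39/etc");

    /** Loads application.properties from the given directory. */
    public static Properties load(Path dir) {
        Path file = dir.resolve("application.properties");
        Properties props = new Properties();
        try (InputStream in = Files.newInputStream(file)) {
            props.load(in);
        } catch (IOException e) {
            throw new UncheckedIOException("Cannot read " + file, e);
        }
        return props;
    }

    /**
     * Resolves a companion file such as config.jaas or log4j2.properties.
     * "config.directory" is an ASSUMED key name, not one defined by cfe_39.
     */
    public static Path companion(Properties props, Path fallbackDir, String fileName) {
        String dir = props.getProperty("config.directory", fallbackDir.toString());
        return Paths.get(dir).resolve(fileName);
    }
}
```

For example, `ConfigLocator.companion(ConfigLocator.load(ConfigLocator.DEFAULT_DIR), ConfigLocator.DEFAULT_DIR, "config.jaas")` would resolve the JAAS file next to application.properties unless the properties file points elsewhere.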

## Contributing

You can get involved in our project by https://github.com/teragrep/cfe_39/issues/new/choose[opening an issue] or submitting a pull request.

Contribution requirements:

. *All changes must be accompanied by a new or changed test.* If you think testing is not required in your pull request, include a sufficient explanation of why you think so.
. Security checks must pass.
. Pull requests must align with the principles and http://www.extremeprogramming.org/values.html[values] of extreme programming.
. Pull requests must follow the principles of Object Thinking and Elegant Objects (EO).

Read more in our https://github.com/teragrep/teragrep/blob/main/contributing.adoc[Contributing Guideline].

### Contributor License Agreement

Contributors must sign the https://github.com/teragrep/teragrep/blob/main/cla.adoc[Teragrep Contributor License Agreement] before a pull request is accepted into the organization's repositories.

You need to submit the CLA only once. After submitting the CLA, you can contribute to all of Teragrep's repositories.