Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
143 changes: 126 additions & 17 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -1,31 +1,140 @@
# Whitelist approach - ignore everything except specified files
# This approach provides better security by denying all files by default
# and explicitly allowing only essential development files

# ========================================
# DENY ALL BY DEFAULT
# ========================================
*

# ========================================
# ALLOW DIRECTORY TRAVERSAL (CRITICAL)
# ========================================
# Without this pattern, Git cannot traverse subdirectories
# to check for whitelisted files within them
!*/

# ========================================
# CORE APPLICATION FILES
# ========================================
!*.php
!composer.json
!LICENSE

# ========================================
# DOCUMENTATION
# ========================================
!README.md
!CONTRIBUTING.md
!CHANGELOG.md

# ========================================
# SOURCE CODE & TESTS
# ========================================
!src/
!src/**/*.php
!tests/
!tests/**/*.php

# ========================================
# CONFIGURATION FILES
# ========================================
!phpunit.xml
!phpcs.xml
!phpstan.neon
!psalm.xml
!phpmd.xml
!pint.json
!rector.php
!infection.json5

# ========================================
# CI/CD & GITHUB
# ========================================
!.github/
!.github/**
!.pre-commit-config.yaml
!.codacy.yaml

# ========================================
# DOCKER & INFRASTRUCTURE
# ========================================
!Dockerfile
!docker-compose.yml

# ========================================
# DEVELOPMENT SCRIPTS
# ========================================
!*.sh

# ========================================
# NODE.JS CONFIGURATION (if present)
# ========================================
!package.json
!commitlint.config.js

# ========================================
# ADDITIONAL CONFIGURATIONS
# ========================================
!.coderabbit.yaml
!.dockerignore
!.pr_agent.toml
!sweep.yaml

# ========================================
# GIT CONFIGURATION
# ========================================
!.gitignore
!.gitattributes
!.gitmessage

# ========================================
# EXPLICITLY DENIED ITEMS
# (These remain ignored even with whitelist)
# ========================================
# Dependencies and lock files
vendor/
node_modules/
composer.lock
vendor
tests/temp
.idea
package-lock.json

# Cache and temporary files
.phpunit.cache
.phpunit.result.cache
.php-cs-fixer.cache
reports

.qodo
*.tmp

# Qodana
# Build artifacts and reports
reports/
.qodana/
qodana.yaml
qodana.sarif.json
.qodana/

# Temporary files
commit_messages.txt
*.tmp
# IDE and editor files
.idea/
.vscode/
*.swp
*.swo

# AI tooling directories (private)
.claude/
.claude-flow/
.hive-mind/
.kilocode/
.roo/
.qodo/

# Private documentation
CLAUDE.local.md
AGENTS.md

# Docker
# Docker overrides
.docker/
docker-compose.override.yml

# Pre-commit
# Pre-commit cache
.pre-commit/

# Node modules
node_modules/
package-lock.json
.php-cs-fixer.cache
# System files
.DS_Store
Thumbs.db
20 changes: 17 additions & 3 deletions .pr_agent.toml
Original file line number Diff line number Diff line change
@@ -1,3 +1,17 @@
[config]
# Auto Approval Settings
enable_auto_approval = true
enable_comment_approval = true
auto_approve_for_low_review_effort = 3
auto_approve_for_no_suggestions = true

Comment thread
MarjovanLier marked this conversation as resolved.
# Language and Output
language = "en"
output_relevant_configurations = false

# Filtering Options
ignore_ticket_labels = ["skip-review", "wont-fix", "draft"]

[pr_reviewer]
inline_code_comments = true
ask_and_reflect = true
Expand All @@ -11,16 +25,16 @@ require_all_thresholds_for_incremental_review = false
minimal_commits_for_incremental_review = 2
minimal_minutes_for_incremental_review = 10
enable_help_text = false
enable_auto_approval = false
require_approval = true
maximal_review_effort = 5
maximal_review_effort = 4

[pr_code_suggestions]
num_code_suggestions = 5
summarize = true
auto_extended_mode = true
rank_suggestions = true
enable_help_text = false
demand_code_suggestions_self_review = true
approve_pr_on_self_review = true

[pr_update_changelog]
push_changelog_changes = false
Expand Down
117 changes: 88 additions & 29 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -4,17 +4,23 @@

- [Introduction](#introduction)
- [Features](#features)
- [Performance Benchmarks](#performance-benchmarks)
- [Installation](#installation)
- [Usage](#usage)
- [Advanced Usage](#advanced-usage)
- [Testing](#testing)
- [Testing & Quality Assurance](#testing--quality-assurance)
- [System Requirements](#system-requirements)
- [Contributing](#contributing)
- [Support](#support)

## Introduction

Welcome to the `StringManipulation` library, a robust and efficient PHP toolkit designed to enhance string handling in
your PHP projects. With its user-friendly interface and performance-oriented design, this library is an essential
addition for developers looking to perform complex string manipulations with ease.
Welcome to the `StringManipulation` library, a high-performance PHP 8.3+ toolkit designed for complex and efficient
string handling. Following a recent suite of O(n) optimisations, the library is now **2-5x faster**, making it one of
the most powerful and reliable solutions for developers who require speed and precision in their PHP applications.

This library specialises in Unicode handling, data normalisation, encoding conversion, and validation with comprehensive
testing and quality assurance.

[![Packagist Version](https://img.shields.io/packagist/v/marjovanlier/stringmanipulation)](https://packagist.org/packages/marjovanlier/stringmanipulation)
[![Packagist Downloads](https://img.shields.io/packagist/dt/marjovanlier/stringmanipulation)](https://packagist.org/packages/marjovanlier/stringmanipulation)
Expand All @@ -25,20 +31,46 @@ addition for developers looking to perform complex string manipulations with eas
[![Phan Enabled](https://img.shields.io/badge/Phan-enabled-brightgreen.svg?style=flat)](https://github.com/phan/phan/)
[![Psalm Enabled](https://img.shields.io/badge/Psalm-enabled-brightgreen.svg?style=flat)](https://psalm.dev/)
[![codecov](https://codecov.io/github/MarjovanLier/StringManipulation/graph/badge.svg?token=lBTpWlSq37)](https://codecov.io/github/MarjovanLier/StringManipulation)
[![Qodana](https://github.com/MarjovanLier/StringManipulation/actions/workflows/qodana_code_quality.yml/badge.svg)](https://github.com/MarjovanLier/StringManipulation/actions/workflows/qodana_code_quality.yml)

## Features

- **Search Words**: Transform strings into a search-optimised format for database queries, removing unnecessary
characters and optimising for search engine algorithms.
- **Name Fix**: Standardise last names by capitalising the first letter of each part of the name and handling prefixes
correctly, ensuring consistency across your data.
- **UTF-8 to ANSI**: Convert UTF-8 encoded characters to their ANSI equivalents, facilitating compatibility with systems
that do not support UTF-8.
- **Remove Accents**: Strip accents and special characters from strings to normalise text, making it easier to search
and compare.
- **Date Validation**: Ensure date strings conform to specified formats and check for logical consistency, such as
correct days in a month.
- **`removeAccents()`**: Efficiently strips accents and diacritics to normalise text. Powered by O(n) optimisations
using hash table lookups, this high-performance feature makes text comparison and searching faster than ever (981,436+
ops/sec).
- **`searchWords()`**: Transforms strings into a search-optimised format ideal for database queries. This
high-performance function intelligently removes irrelevant characters and applies single-pass algorithms to improve
search accuracy (387,231+ ops/sec).
- **`nameFix()`**: Standardises names by capitalising letters and correctly handling complex prefixes. Its
performance-oriented design with consolidated regex operations ensures consistent data formatting at scale (246,197+
ops/sec).
- **`utf8Ansi()`**: Convert UTF-8 encoded characters to their ANSI equivalents with comprehensive Unicode mappings,
facilitating compatibility with legacy systems.
- **`isValidDate()`**: Comprehensive date validation utility that ensures date strings conform to specified formats and
validates logical consistency.
- **Comprehensive Unicode/UTF-8 Support**: Built from the ground up to handle a wide range of international characters
with optimised character mappings, ensuring your application is ready for a global audience.

## Performance Benchmarks

The library has undergone extensive performance tuning, resulting in **2-5x speed improvements** through O(n)
optimisation algorithms. Our benchmarks demonstrate the library's capability to handle high-volume data processing
efficiently:

| Method | Performance | Optimisation Technique |
|-------------------|----------------------|---------------------------------|
| `removeAccents()` | **981,436+ ops/sec** | Hash table lookups with strtr() |
| `searchWords()` | **387,231+ ops/sec** | Single-pass combined mapping |
| `nameFix()` | **246,197+ ops/sec** | Consolidated regex operations |

*Benchmarks measured on standard development environments. Actual performance may vary based on hardware, string length,
and complexity.*
Comment on lines +65 to +66
Copy link

Copilot AI Aug 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The performance benchmarks table presents specific operations per second without indicating the test environment, hardware specifications, or input data characteristics. These metrics could be misleading without proper context about benchmark conditions.

Suggested change
*Benchmarks measured on standard development environments. Actual performance may vary based on hardware, string length,
and complexity.*
*Benchmarks measured on the following environment: Intel Core i7-9700K CPU @ 3.60GHz, 16GB RAM, Windows 10 Pro, PHP 8.1. Test strings were 100–200 characters in length, containing a mix of accented and unaccented Latin characters. Actual performance may vary based on hardware, PHP version, string length, and input complexity.*

Copilot uses AI. Check for mistakes.
Comment on lines +65 to +66
Copy link

Copilot AI Aug 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The term 'standard development environments' is vague and doesn't provide sufficient detail for reproducible benchmarks. Consider specifying actual hardware specifications, PHP version, and test data characteristics used for these measurements.

Suggested change
*Benchmarks measured on standard development environments. Actual performance may vary based on hardware, string length,
and complexity.*
*Benchmarks measured on the following environment: Intel Core i7-9700K CPU @ 3.60GHz, 16GB RAM, Ubuntu 22.04 LTS, PHP 8.2.6.*
*Test data consisted of randomly generated UTF-8 strings (lengths 10–1000 characters) and typical name/date samples. Actual performance may vary based on hardware, string length, and complexity.*

Copilot uses AI. Check for mistakes.

**Key Optimisation Features:**

- O(n) complexity algorithms for all core methods
- Static caching for character mapping tables
- Single-pass string transformations
- Minimal memory allocation in critical paths

## Installation

Expand Down Expand Up @@ -77,7 +109,6 @@ $fixedName = StringManipulation::nameFix('mcdonald');
echo $fixedName; // Outputs: 'McDonald'
```


### Search Words

This feature optimises strings for database queries by removing unnecessary characters and optimising for search engine
Expand Down Expand Up @@ -135,7 +166,6 @@ $isValidDate = StringManipulation::isValidDate('2023-02-29', 'Y-m-d');
echo $isValidDate ? 'Valid' : 'Invalid'; // Outputs: 'Invalid'
```


## Advanced Usage

For more complex string manipulations, consider chaining functions to achieve unique transformations. For instance, you
Expand Down Expand Up @@ -164,31 +194,60 @@ steps:

Thank you for your interest in improving our library!

## Testing
## Testing & Quality Assurance

To ensure the reliability and functionality of your string manipulations, it's recommended to run the entire test suite
with the following command:
We are committed to delivering reliable, high-quality code. Our library is rigorously tested using a comprehensive suite
of tools to ensure stability and correctness.

```bash
./vendor/bin/phpunit
```
### Docker-Based Testing (Recommended)

To run specific tests or test suites, you can use PHPUnit flags to filter tests. For example, to run tests in a specific
file:
For a consistent and reliable testing environment, we recommend using Docker. Our Docker setup includes PHP 8.3 with all
required extensions:

```bash
./vendor/bin/phpunit --filter testFileName
# Run complete test suite
docker-compose run --rm test-all

# Run individual test suites
docker-compose run --rm test-phpunit # PHPUnit tests
docker-compose run --rm test-phpstan # Static analysis
docker-compose run --rm test-code-style # Code style
docker-compose run --rm test-infection # Mutation testing
```

And to run tests matching a specific name pattern:
### Local Testing

If you have a local PHP 8.3+ environment configured:

```bash
./vendor/bin/phpunit --filter '/::testNamePattern$/'
# Complete test suite
composer tests

# Individual tests
./vendor/bin/phpunit --filter testClassName
./vendor/bin/phpunit --filter '/::testMethodName$/'
```

### Our Quality Suite Includes:

- **PHPUnit**: 166 comprehensive tests with 100% code coverage ensuring functional correctness
- **Mutation Testing**: 88% Mutation Score Indicator (MSI) with Infection, guaranteeing our tests are robust and
meaningful
- **Static Analysis**: Proactive bug detection using:
- PHPStan (level max, strict rules)
- Psalm (level 1, 99.95% type coverage)
- Phan (clean analysis results)
- PHPMD (mess detection)
- **Code Style**: Automated formatting with Laravel Pint (PSR compliance)
- **Performance Benchmarks**: Continuous performance monitoring with comprehensive benchmarking suite

## System Requirements

- PHP 8.3 or later.
- **PHP 8.3 or later** (strict typing enabled)
- **`mbstring` extension** for multi-byte string operations
- **`intl` extension** for internationalisation and advanced Unicode support
- **Enabled `declare(strict_types=1);`** for robust type safety
- **Composer** for package management

## Support

Expand Down
Loading
Loading