135 changes: 135 additions & 0 deletions LOCKUP_QUICK_REFERENCE.md
@@ -0,0 +1,135 @@
# Service Lockup - Quick Reference Card

## 🚨 When Services Lock Up

### Step 1: Capture Diagnostics (DO THIS FIRST!)
```bash
./scripts/debug-lockup.sh
```
**DO NOT RESTART SERVICES UNTIL AFTER RUNNING THIS!**

### Step 2: Review the Report
```bash
# Find the latest report
ls -lt debug-logs/ | head -5

# View it
cat debug-logs/lockup-YYYYMMDD-HHMMSS.log
```

### Step 3: Restart Services
```bash
# Development
docker-compose restart

# Production
docker-compose -f docker-compose.prod.yml restart
```

---

## 📊 Continuous Monitoring

### Start Health Monitor
```bash
# Run in background (checks every 60 seconds)
./scripts/monitor-health.sh &

# Or with custom interval
./scripts/monitor-health.sh 30 &
```

### View Monitor Logs
```bash
tail -f logs/health-monitor.log
```

### Stop Monitor
```bash
pkill -f monitor-health.sh
```

---

## 🔍 What to Look For in Debug Reports

- **OOM Events**: Out of memory kills
- **High CPU/Memory**: Near 100% usage
- **DB Connections**: Count near max_connections
- **Long Queries**: Queries running for minutes
- **DB Locks**: Ungranted locks blocking operations
- **MQTT Issues**: Port not listening or process dead
- **Network Failures**: Services can't reach each other
- **Error Patterns**: Repeated errors in logs

---

## 🛠️ Quick Fixes

### Memory Issues
```bash
# Check memory usage
docker stats --no-stream

# Clean up Docker (removes ALL unused images, stopped containers, and unused networks; asks for confirmation)
docker system prune -a
```

### Disk Space Issues
```bash
# Check disk space
df -h

# Clean up old logs
find ./logs -name "*.log" -mtime +7 -delete
find ./debug-logs -name "*.log" -mtime +7 -delete
```

### Database Issues
```bash
# Check connections
docker-compose exec -T postgres psql -U meshtastic -d meshtastic_mapper -c "SELECT count(*) FROM pg_stat_activity;"

# Check long queries
docker-compose exec -T postgres psql -U meshtastic -d meshtastic_mapper -c "SELECT pid, now() - query_start as duration, query FROM pg_stat_activity WHERE state = 'active' ORDER BY duration DESC LIMIT 5;"
```
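
The connection count above only means something relative to the server's `max_connections` setting (PostgreSQL's default is 100), and it includes the `psql` session that runs the query. A small sketch of that comparison follows; `connectionPressure` and the 0.8 warning threshold are assumptions for illustration, not part of the debug scripts.

```typescript
// Hypothetical helper: relate the pg_stat_activity count to max_connections.
interface ConnectionPressure {
  ratio: number;      // fraction of max_connections in use
  nearLimit: boolean; // true once the assumed warning threshold is reached
}

function connectionPressure(
  active: number,
  maxConnections: number,
  warnAt: number = 0.8 // assumed threshold, tune for your deployment
): ConnectionPressure {
  if (maxConnections <= 0) {
    throw new RangeError("maxConnections must be positive");
  }
  const ratio = active / maxConnections;
  return { ratio, nearLimit: ratio >= warnAt };
}

console.log(connectionPressure(92, 100)); // { ratio: 0.92, nearLimit: true }
```

The actual limit can be read with `SHOW max_connections;` through the same `psql` invocation.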

### MQTT Issues
```bash
# Check MQTT status (the [m] keeps grep from matching its own process)
docker-compose exec mosquitto sh -c "ps aux | grep '[m]osquitto'"

# Check MQTT port
docker-compose exec mosquitto sh -c "netstat -tlnp | grep 1883"
```

---

## 📚 Full Documentation

- **Complete Guide**: `docs/DEBUGGING_SERVICE_LOCKUPS.md`
- **Implementation Summary**: `docs/fixes/SERVICE_LOCKUP_DEBUGGING.md`
- **Debug Script**: `scripts/debug-lockup.sh`
- **Monitor Script**: `scripts/monitor-health.sh`

---

## 💡 Prevention Tips

1. ✅ Resource limits now configured in docker-compose.yml
2. ✅ MQTT connection limits configured (max: 1000)
3. ✅ Health monitoring scripts available
4. 🔄 Run health monitor continuously in production
5. 🔄 Set up log rotation
6. 🔄 Review debug reports after each lockup

---

## 📞 Reporting Issues

When reporting lockup issues, include:
1. Debug report from `debug-logs/`
2. Health monitor logs (if running)
3. What was happening when the lockup occurred
4. Frequency of lockups
5. Any recent changes
13 changes: 11 additions & 2 deletions TODO.md
Expand Up @@ -5,23 +5,32 @@
| Priority | Description | Status |
|----------|-------------|--------|
| High | Decryption and Protobuf decoding are not working properly | ✅ Complete - Fixed encryption algorithm, nonce handling, and key management |
| High | Services locking up and requiring restart | 🔧 In Progress - Debug tools created, resource limits added |
| Medium | Network topology graph link is not working, it takes the user to the map | Complete |
-| Medium | Map center on user is not working | Non-Issue |
+| Medium | Map center on user is not working | ✅ Complete - Fixed MUI Tooltip warning |
| Medium | Map startup on user location not working | Non-Issue |
| Low | Hardware types are not complete and may even be wrong | Complete |
| Medium | Device telemetry does not appear to be saving | 🔧 Fixed - Added enhanced logging, needs testing |
| Medium | Device Neighbors not being recorded | 🔧 Fixed - Added NeighborInfo parsing and storage, needs testing |
| Low | Hardware names are not being properly shown on small node details window | Fixed |
| Low | Hardware names are not being properly shown on large node details window in overview and details tabs | Fixed |
| Low | In Node detail window on the Lora Config tab, There is a blue box at the bottom that is off the window | Not Started |
| Medium | Cluster count icons are not working correctly when you click on them or zoom in on the map | Not Started |

## Incomplete Features

| Priority | Description | Status |
|----------|-------------|--------|
| High | Docker deployment not complete | Not Started |
-| Medium | MQTT monitor statistics has issues like rounding, no numbers in messages by type and top nodes are showing the decimal value | Complete |
+| Medium | MQTT monitor statistics has issues like rounding, no numbers in messages by type and top nodes are showing the decimal value | Complete - Fixed decryption failures count and messages per minute calculation |

## Changes

| Priority | Description | Status |
|----------|-------------|--------|
| Low | About window needs to be restructured to represent the application, with the About Meshtastic section moved below the application's own About section; it should link to the GitHub repo for the application. The system information at the bottom is not properly represented. | ✅ Complete |
| Medium | Make node icons and cluster icons larger | Not Started |
| Low | Add option to map options to show node name on map | Not Started |

## New Features

Expand Down
2 changes: 1 addition & 1 deletion backend/package.json
@@ -1,6 +1,6 @@
{
"name": "meshtastic-node-mapper-backend",
-"version": "1.0.2",
+"version": "1.0.3",
"description": "Backend API for Meshtastic Node Mapper",
"main": "dist/index.js",
"scripts": {
Expand Down
File renamed without changes.
6 changes: 3 additions & 3 deletions backend/src/middleware/rateLimiting.ts
Expand Up @@ -122,21 +122,21 @@ export const rateLimiters = {
// Read operations - more lenient
read: createApiKeyAwareRateLimiter({
windowMs: 60 * 60 * 1000, // 1 hour
-max: process.env.NODE_ENV === 'development' ? 10000 : 5000, // Higher limit in dev
+max: process.env.NODE_ENV === 'development' ? 50000 : 5000, // Much higher limit in dev
message: 'Too many read requests. Please try again later.'
}),

// Write operations - more restrictive
write: createApiKeyAwareRateLimiter({
windowMs: 60 * 60 * 1000, // 1 hour
-max: 500,
+max: process.env.NODE_ENV === 'development' ? 5000 : 500, // Higher limit in dev
message: 'Too many write requests. Please try again later.'
}),

// Real-time data endpoints - very lenient for legitimate use
realtime: createApiKeyAwareRateLimiter({
windowMs: 60 * 1000, // 1 minute
-max: process.env.NODE_ENV === 'development' ? 500 : 200, // Higher limit in dev
+max: process.env.NODE_ENV === 'development' ? 5000 : 200, // Much higher limit in dev
message: 'Too many real-time requests. Please slow down.'
}),
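
Review note: the three limiter changes above share one pattern. The higher ceiling applies only when `NODE_ENV` is exactly `'development'`, so an unset or misspelled value falls back to the production limit. A minimal sketch of that selection logic follows, assuming a hypothetical `maxRequests` helper and `LIMITS` table that are not part of `rateLimiting.ts`; the numbers are copied from this diff.

```typescript
// Hypothetical illustration only; rateLimiting.ts inlines these ternaries.
interface LimitPair {
  development: number;
  production: number;
}

type LimiterKind = "read" | "write" | "realtime";

// New values from this diff (read/write per hour, realtime per minute).
const LIMITS: Record<LimiterKind, LimitPair> = {
  read: { development: 50000, production: 5000 },
  write: { development: 5000, production: 500 },
  realtime: { development: 5000, production: 200 },
};

function maxRequests(kind: LimiterKind, nodeEnv: string | undefined): number {
  // Anything other than the exact string 'development' gets the production limit.
  return nodeEnv === "development"
    ? LIMITS[kind].development
    : LIMITS[kind].production;
}

console.log(maxRequests("read", "development")); // 50000
console.log(maxRequests("write", undefined)); // 500
```

Keeping the numbers in one table would make the dev-to-production ratio (10x for read and write, 25x for realtime) easier to see and adjust in one place.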

Expand Down
114 changes: 107 additions & 7 deletions backend/src/services/mqtt-manager.service.ts
Expand Up @@ -254,14 +254,45 @@ export class MQTTManagerService extends EventEmitter {

// Store telemetry data
if (data.telemetry) {
-   await tx.telemetryReading.create({
-     data: {
-       ...data.telemetry,
-       nodeId: node.id,
-       data: data.telemetry.data as any // Cast to satisfy Prisma JSON type
+   try {
+     await tx.telemetryReading.create({
+       data: {
+         ...data.telemetry,
+         nodeId: node.id,
+         data: data.telemetry.data as any // Cast to satisfy Prisma JSON type
+       }
+     });
+     logger.info(`Stored ${data.telemetry.type} telemetry for node: ${data.nodeId}`);
+
+     // Also update the node's telemetry fields for quick access
+     if (data.telemetry.type === 'DEVICE_METRICS' && data.telemetry.data) {
+       const metrics = data.telemetry.data as any;
+       const updateData: any = {};
+
+       if (metrics.batteryLevel !== undefined) {
+         updateData.batteryLevel = metrics.batteryLevel;
+       }
+       if (metrics.voltage !== undefined) {
+         updateData.voltage = metrics.voltage;
+       }
+       if (metrics.channelUtilization !== undefined) {
+         updateData.channelUtilization = metrics.channelUtilization;
+       }
+       if (metrics.airUtilTx !== undefined) {
+         updateData.airUtilTx = metrics.airUtilTx;
+       }
+
+       if (Object.keys(updateData).length > 0) {
+         await tx.node.update({
+           where: { id: node.id },
+           data: updateData
+         });
+         logger.debug(`Updated node ${data.nodeId} with latest device metrics`);
+       }
      }
-   });
-   logger.debug(`Stored telemetry for node: ${data.nodeId}`);
+   } catch (error) {
+     logger.error(`Failed to store telemetry for node ${data.nodeId}:`, error);
+   }
}

// Store message data
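
Review note: the four `!== undefined` checks in the telemetry hunk above could be collapsed into a loop over the field names. The sketch below is one possible shape, not the code in this PR. `DeviceMetrics` and `pickDefinedMetrics` are hypothetical names, and unlike the original checks this version also skips `null` and non-numeric values.

```typescript
// Hypothetical refactoring sketch; field names are taken from the diff above.
interface DeviceMetrics {
  batteryLevel?: number;
  voltage?: number;
  channelUtilization?: number;
  airUtilTx?: number;
}

const METRIC_KEYS = [
  "batteryLevel",
  "voltage",
  "channelUtilization",
  "airUtilTx",
] as const;

function pickDefinedMetrics(metrics: Record<string, unknown>): DeviceMetrics {
  const update: DeviceMetrics = {};
  for (const key of METRIC_KEYS) {
    const value = metrics[key];
    // Stricter than `!== undefined`: only numeric values are copied.
    if (typeof value === "number") {
      update[key] = value;
    }
  }
  return update;
}

console.log(pickDefinedMetrics({ batteryLevel: 87, voltage: 4.1, uptimeSeconds: 1200 }));
// { batteryLevel: 87, voltage: 4.1 }
```

As in the original, the caller would skip the node update when the returned object has no keys.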
Expand Down Expand Up @@ -301,6 +332,75 @@ export class MQTTManagerService extends EventEmitter {
});
logger.debug(`Stored message from node: ${data.nodeId}`);
}

// Store neighbor data
if (data.neighbors && data.neighbors.length > 0) {
logger.debug(`Processing ${data.neighbors.length} neighbors for node: ${data.nodeId}`);

for (const neighborData of data.neighbors) {
// Find or create the neighbor node
let neighborNode = await tx.node.findUnique({
where: { nodeId: neighborData.neighborId }
});

// If neighbor node doesn't exist, create a minimal entry
if (!neighborNode) {
try {
neighborNode = await tx.node.create({
data: {
nodeId: neighborData.neighborId,
hexId: neighborData.neighborId.replace('!', ''),
networkId,
isOnline: true,
mqttConnected: false
}
});
logger.debug(`Created neighbor node: ${neighborData.neighborId}`);
} catch (error: any) {
// Handle race condition
if (error.code === 'P2002') {
neighborNode = await tx.node.findUnique({
where: { nodeId: neighborData.neighborId }
});
} else {
logger.error(`Failed to create neighbor node ${neighborData.neighborId}:`, error);
continue;
}
}
}

if (!neighborNode) {
logger.warn(`Could not create or find neighbor node: ${neighborData.neighborId}`);
continue;
}

// Upsert the neighbor relationship
try {
await tx.nodeNeighbor.upsert({
where: {
nodeId_neighborId: {
nodeId: node.id,
neighborId: neighborNode.id
}
},
update: {
snr: neighborData.snr,
lastHeard: neighborData.lastHeard,
updatedAt: new Date()
},
create: {
nodeId: node.id,
neighborId: neighborNode.id,
snr: neighborData.snr,
lastHeard: neighborData.lastHeard
}
});
logger.debug(`Stored neighbor relationship: ${data.nodeId} -> ${neighborData.neighborId}`);
} catch (error) {
logger.error(`Failed to store neighbor relationship for ${data.nodeId} -> ${neighborData.neighborId}:`, error);
}
}
}
}, {
maxWait: 5000, // Maximum time to wait for a transaction slot
timeout: 30000, // Maximum time for the transaction to complete
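
Review note: the neighbor handling above follows a find, create, re-find pattern. If `create` fails with Prisma's unique-constraint code `P2002`, another writer inserted the row first, so the row is looked up again. The sketch below reproduces that control flow against a hypothetical in-memory store so it runs without a database; `InMemoryNodes`, `UniqueViolation`, and `findOrCreateNode` are illustrative names, not Prisma APIs.

```typescript
interface NodeRow {
  id: number;
  nodeId: string;
  hexId: string;
}

// Stand-in for the error Prisma raises on a unique-constraint violation.
class UniqueViolation extends Error {
  readonly code = "P2002";
}

// Hypothetical stand-in for tx.node with a unique index on nodeId.
class InMemoryNodes {
  private rows = new Map<string, NodeRow>();
  private nextId = 1;

  async findUnique(nodeId: string): Promise<NodeRow | null> {
    return this.rows.get(nodeId) || null;
  }

  async create(nodeId: string): Promise<NodeRow> {
    if (this.rows.has(nodeId)) {
      throw new UniqueViolation(`duplicate nodeId ${nodeId}`);
    }
    // Same hexId derivation as the diff: drop the leading '!'.
    const row: NodeRow = { id: this.nextId++, nodeId, hexId: nodeId.replace("!", "") };
    this.rows.set(nodeId, row);
    return row;
  }
}

async function findOrCreateNode(store: InMemoryNodes, nodeId: string): Promise<NodeRow | null> {
  const existing = await store.findUnique(nodeId);
  if (existing) {
    return existing;
  }
  try {
    return await store.create(nodeId);
  } catch (error) {
    const isUniqueViolation =
      typeof error === "object" &&
      error !== null &&
      (error as { code?: unknown }).code === "P2002";
    if (isUniqueViolation) {
      // Lost the race: another caller created the row between find and create.
      return store.findUnique(nodeId);
    }
    throw error;
  }
}
```

One point worth verifying in the real code: on PostgreSQL, a failed statement inside a transaction normally aborts that transaction, so the re-find after `P2002` may not succeed while still inside `tx`. An `upsert` on the neighbor node, or creating it outside the transaction, would avoid relying on that.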
Expand Down