OjasSinh/predict-data-types

Predict Data Types


The Problem

When users upload CSV or JSON files, everything arrives as strings.

TypeScript and JavaScript can't help you here:

// ❌ TypeScript only knows static types
const userInput = "test@example.com"; // TypeScript thinks: string
const csvValue = "2024-01-01"; // TypeScript thinks: string
const formData = "42"; // TypeScript thinks: string

// TypeScript CANNOT detect these are email, date, and number at runtime

This library solves that problem with runtime type detection:

const { infer } = require("predict-data-types");

infer("test@example.com"); // → 'email' ✅
infer("2024-01-01"); // → 'date' ✅
infer("42"); // → 'number' ✅

infer(["true", "false", "true"]);
// → 'boolean' ✅

infer({ name: "Alice", age: "25", email: "alice@example.com" });
// → { name: 'string', age: 'number', email: 'email' } ✅

infer([
  { name: "Alice", age: "25" },
  { name: "Bob", age: "30" },
]);
// → { name: 'string', age: 'number' } ✅

One smart function. Any input type.


Zero-dependency package for automatic data type detection from strings, arrays, and JSON objects. Detects 18+ data types including primitives, emails, URLs, UUIDs, dates, IPs, colors, percentages, hashtags, mentions, and currency.

💡 Important: This library performs runtime type detection on string values, not static type checking. TypeScript is a compile-time type system for your code structure - this library analyzes actual data content at runtime. They solve completely different problems!

Features

  • Smart Type Inference: One infer() function handles strings, arrays, objects, and arrays of objects
  • 18+ Data Types: Primitives plus emails, phone numbers, URLs, UUIDs, dates, IPs, colors, percentages, currency, hashtags, MAC addresses, mentions, CRON, MIME types, and hashes
  • JSON Schema Generation: Automatically generate JSON Schema from objects (compatible with Ajv, etc.)
  • Type Constants: Use DataTypes for type-safe comparisons instead of string literals
  • CSV Support: Parse comma-separated values with optional headers
  • Zero Dependencies: Completely standalone, no external packages
  • TypeScript Support: Full type definitions included
  • 45+ Date Formats: Comprehensive date parsing including month names and timezones
  • Battle-Tested: 75+ comprehensive test cases

Installation

npm install predict-data-types

Quick Examples

Real-world use cases showing what you can build:

📊 CSV Import Tool

// Auto-detect column types and transform data
const employees = parseCSV(file); // All values are strings
const schema = infer(employees);
// → { name: 'string', email: 'email', salary: 'currency', hire_date: 'date' }

🎨 Form Builder

// Auto-generate form fields with correct input types
const userData = { email: 'alice@example.com', age: '25', website: 'https://alice.dev' };
const types = infer(userData);
// → { email: 'email', age: 'number', website: 'url' }
// Generate: <input type="email">, <input type="number">, <input type="url">
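
The mapping from detected types to form fields could look roughly like the sketch below. It does not call the library: it starts from a schema object in the shape shown above. The `buildInputs` helper and the type-to-input mapping are hypothetical names introduced for illustration, not part of predict-data-types.

```javascript
// Hypothetical mapping from detected type names to HTML input types.
// The keys are type names documented in this README; the mapping is an assumption.
const INPUT_TYPE_BY_DATA_TYPE = new Map([
  ["email", "email"],
  ["number", "number"],
  ["url", "url"],
  ["date", "date"],
  ["phone", "tel"],
  ["color", "color"],
  ["boolean", "checkbox"],
]);

// Escape characters that are unsafe inside an HTML attribute value.
function escapeAttr(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Build one <input> per schema field, falling back to type="text".
function buildInputs(schema) {
  return Object.entries(schema).map(([field, dataType]) => {
    const inputType = INPUT_TYPE_BY_DATA_TYPE.get(dataType) ?? "text";
    return `<input type="${inputType}" name="${escapeAttr(field)}">`;
  });
}

// Schema in the shape infer() is documented to return for the object above.
const formSchema = { email: "email", age: "number", website: "url" };
console.log(buildInputs(formSchema).join("\n"));
```

Unknown or unmapped types fall back to a plain text input, which is usually the safest default for user-facing forms.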

🌐 API Analyzer

// Generate JSON Schema and TypeScript interfaces from API responses
const response = await fetch('/api/users').then(r => r.json());
const jsonSchema = infer(response, Formats.JSONSCHEMA);
// Use with Ajv, joi, or generate TypeScript types

✅ Data Validator

// Validate imported data quality
const expected = { email: DataTypes.EMAIL, age: DataTypes.NUMBER };
const actual = infer(importedData);
// Detect mismatches, missing fields, wrong types
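
One possible way to detect those mismatches is to compare the two schema objects field by field. The sketch below is an illustration only: `diffSchemas` is a hypothetical helper, and it uses plain strings where real code would likely use the `DataTypes` constants.

```javascript
// Compare an expected schema with an inferred one and list the problems found.
function diffSchemas(expectedSchema, actualSchema) {
  const issues = [];
  for (const [field, expectedType] of Object.entries(expectedSchema)) {
    if (!Object.prototype.hasOwnProperty.call(actualSchema, field)) {
      // The field was expected but never appeared in the data.
      issues.push({ field, problem: "missing", expected: expectedType });
    } else if (actualSchema[field] !== expectedType) {
      // The field exists but was detected as a different type.
      issues.push({
        field,
        problem: "type-mismatch",
        expected: expectedType,
        actual: actualSchema[field],
      });
    }
  }
  return issues;
}

// Example schemas; the second stands in for the result of infer(importedData).
const expectedSchema = { email: "email", age: "number", joined: "date" };
const inferredSchema = { email: "email", age: "string" };
console.log(diffSchemas(expectedSchema, inferredSchema));
```

An empty result means every expected field is present with the expected type; extra fields in the inferred schema are ignored by this sketch.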

👉 See full runnable examples in examples/ directory

Supported Data Types

| Type | Examples |
|---|---|
| string | 'John', 'Hello World' |
| number | 42, 3.14, -17, 1e10 |
| boolean | true, false, yes, no |
| email | user@example.com |
| phone | 555-555-5555, (555) 555-5555 |
| url | https://example.com |
| uuid | 550e8400-e29b-41d4-a716-446655440000 |
| date | 2023-12-31, 31/12/2023 |
| ip | 192.168.1.1, 2001:0db8::1 |
| macaddress | 00:1B:63:84:45:E6, 00-1B-63-84-45-E6 |
| color | #FF0000, #fff |
| percentage | 50%, -25% |
| currency | $100, €50.99 |
| hashtag | #hello, #OpenSource, #dev_community |
| mention | @username, @user_name123, @john-doe |
| cron | 0 0 * * *, */5 * * * *, 0 9-17 * * 1-5 |
| array | [1, 2, 3] |
| object | {"name": "John"} |
| mime:image | image/png, image/jpeg, image/gif, image/svg+xml |
| mime:application | application/json, application/pdf, application/zip |
| mime:text | text/html, text/plain, text/css, text/javascript |
| mime:media | video/mp4, audio/mpeg |

Usage

Type Constants (Recommended)

Use DataTypes constants instead of string literals for type-safe comparisons:

const { infer, DataTypes } = require("predict-data-types");

const type = infer("test@example.com");

// ✅ Type-safe with constants
if (type === DataTypes.EMAIL) {
  console.log("Valid email!");
}

// ❌ Avoid string literals (error-prone)
if (type === "email") {
  // ...
}

// All available constants:
DataTypes.STRING      // 'string'
DataTypes.NUMBER      // 'number'
DataTypes.BOOLEAN     // 'boolean'
DataTypes.EMAIL       // 'email'
DataTypes.PHONE       // 'phone'
DataTypes.URL         // 'url'
DataTypes.UUID        // 'uuid'
DataTypes.DATE        // 'date'
DataTypes.ARRAY       // 'array'
DataTypes.OBJECT      // 'object'
DataTypes.IP          // 'ip'
DataTypes.MACADDRESS  // 'macaddress'
DataTypes.COLOR       // 'color'
DataTypes.PERCENTAGE  // 'percentage'
DataTypes.CURRENCY    // 'currency'
DataTypes.MENTION     // 'mention'
DataTypes.CRON        // 'cron'
DataTypes.HASHTAG     // 'hashtag'
// MIME types
DataTypes.MIME_IMAGE        // 'mime:image'
DataTypes.MIME_APPLICATION  // 'mime:application'
DataTypes.MIME_TEXT         // 'mime:text'
DataTypes.MIME_MEDIA        // 'mime:media'

Basic Example

const predictDataTypes = require("predict-data-types");

const text = "John, 30, true, john@example.com, 2023-01-01";
const types = predictDataTypes(text);

console.log(types);
// {
//   'John': 'string',
//   '30': 'number',
//   'true': 'boolean',
//   'john@example.com': 'email',
//   '2023-01-01': 'date'
// }

Smart infer() Function

The infer() function automatically adapts to any input type:

const { infer, DataTypes } = require("predict-data-types");

// Single value → DataType
infer("2024-01-01"); // → 'date'
infer("test@example.com"); // → 'email'
infer("@username"); // → 'mention'
infer("42"); // → 'number'
infer("#OpenSource"); // → 'hashtag'
infer(["#dev", "#opensource", "#community"]); // → 'hashtag'
// MIME types
infer('image/png');          // 'mime:image'
infer('image/jpeg');         // 'mime:image'
infer('application/json');   // 'mime:application'
infer('application/pdf');    // 'mime:application'
infer('text/html');          // 'mime:text'
infer('text/css');           // 'mime:text'
infer('video/mp4');          // 'mime:media'
infer('audio/mpeg');         // 'mime:media'

// Ambiguous 3-char values (can be hex color or hashtag)
infer("#bad"); // → 'color' (default: hex takes priority)
infer("#bad", "none", { preferHashtagOver3CharHex: true }); // → 'hashtag'

// Array of values → Common DataType
infer(["1", "2", "3"]); // → 'number'
infer(["true", "false", "yes"]); // → 'boolean'

// Object → Schema
infer({
  name: "Alice",
  age: "25",
  active: "true",
});
// → { name: 'string', age: 'number', active: 'boolean' }

// Array of objects → Schema
infer([
  { name: "Alice", age: "25", email: "alice@example.com" },
  { name: "Bob", age: "30", email: "bob@example.com" },
]);

// → { name: 'string', age: 'number', email: 'email' }
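
The ambiguity behind `preferHashtagOver3CharHex` can be illustrated with two regular expressions. This is a sketch, not the library's detection logic: the hashtag pattern is the one shown in the JSON Schema hashtag example in this README, while the 3-digit hex pattern and the `isAmbiguous` helper are assumptions made for illustration.

```javascript
// Assumed pattern for short hex colors such as "#fff".
const HEX3 = /^#[0-9a-fA-F]{3}$/;
// Hashtag pattern as shown in this README's JSON Schema hashtag example.
const HASHTAG = /^#[A-Za-z0-9_]+$/;

// A value is ambiguous when it satisfies both patterns at once.
function isAmbiguous(value) {
  return HEX3.test(value) && HASHTAG.test(value);
}

for (const value of ["#bad", "#ace", "#dev", "#FF0000"]) {
  console.log(value, isAmbiguous(value));
}
// #bad true
// #ace true
// #dev false   ("v" is not a hex digit)
// #FF0000 false (six digits, so not a 3-character value)
```

Only values made of exactly three hex digits fall into the overlap, which is why the option is scoped to 3-character values.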

JSON Schema Format

Generate standard JSON Schema for validation libraries (Ajv, etc.):

const { infer, Formats } = require("predict-data-types");

const data = {
  name: "Alice",
  age: "25",
  email: "alice@example.com",
  website: "https://example.com",
};

// Simple format (default)
infer(data);
// → { name: 'string', age: 'number', email: 'email', website: 'url' }

// JSON Schema format
infer(data, Formats.JSONSCHEMA);
// → {
//     type: 'object',
//     properties: {
//       name: { type: 'string' },
//       age: { type: 'number' },
//       email: { type: 'string', format: 'email' },
//       website: { type: 'string', format: 'uri' }
//     },
//     required: ['name', 'age', 'email', 'website']
//   }

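// Use with validation libraries
// Ajv v8 does not bundle the 'email' and 'uri' formats; ajv-formats adds them
const Ajv = require("ajv");
const addFormats = require("ajv-formats");
const ajv = new Ajv();
addFormats(ajv);

const schema = infer(data, Formats.JSONSCHEMA);
const validate = ajv.compile(schema);
const valid = validate({
  name: "Bob",
  age: 30,
  email: "bob@example.com",
  website: "https://bob.dev",
});
// valid is true when the object matches; otherwise inspect validate.errors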
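
Beyond validation, a generated JSON Schema can also be turned into a TypeScript interface. The sketch below assumes a schema in the shape shown above (`type`, `properties`, `required`) and handles only primitive property types. `toInterface` and `tsTypeFor` are hypothetical helpers, not part of this library.

```javascript
// Map a JSON Schema property to a TypeScript type name (primitive cases only).
function tsTypeFor(prop) {
  switch (prop.type) {
    case "number":
    case "integer":
      return "number";
    case "boolean":
      return "boolean";
    case "string":
      return "string";
    default:
      return "unknown";
  }
}

// Render an interface declaration; fields not listed in `required` become optional.
function toInterface(name, jsonSchema) {
  const required = new Set(jsonSchema.required || []);
  const lines = Object.entries(jsonSchema.properties || {}).map(
    ([key, prop]) => `  ${key}${required.has(key) ? "" : "?"}: ${tsTypeFor(prop)};`
  );
  return `interface ${name} {\n${lines.join("\n")}\n}`;
}

// Example schema written by hand in the documented JSON Schema shape.
const userSchema = {
  type: "object",
  properties: {
    name: { type: "string" },
    age: { type: "number" },
    email: { type: "string", format: "email" },
  },
  required: ["name", "email"],
};
console.log(toInterface("User", userSchema));
```

Property names that are not valid identifiers would need quoting, which this sketch leaves out for brevity.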

CSV with Headers

const csvData = `name,age,active,email
John,30,true,john@example.com`;

const types = predictDataTypes(csvData, true);
// {
//   'name': 'string',
//   'age': 'number',
//   'active': 'boolean',
//   'email': 'email'
// }
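
Once column types are known, the string values can be converted to native JavaScript values. The following is a minimal sketch under stated assumptions: `castValue` and `castRow` are hypothetical helpers, only numbers and booleans are converted, and treating "yes" as `true` is an assumption based on the boolean examples in the table of supported types. Dates are left as strings because the correct parsing depends on the source format.

```javascript
// Convert one string value according to a detected type name.
function castValue(value, dataType) {
  switch (dataType) {
    case "number":
      return Number(value);
    case "boolean":
      // Assumption: "true" and "yes" map to true, everything else to false.
      return ["true", "yes"].includes(value.trim().toLowerCase());
    default:
      // Emails, dates, URLs and other types stay as strings in this sketch.
      return value;
  }
}

// Apply castValue to every field of a row, using the schema to pick the type.
function castRow(row, schema) {
  const result = {};
  for (const [field, value] of Object.entries(row)) {
    result[field] = castValue(value, schema[field]);
  }
  return result;
}

// Schema in the shape shown above; the row holds the raw CSV strings.
const columnTypes = { name: "string", age: "number", active: "boolean", email: "email" };
const rawRow = { name: "John", age: "30", active: "true", email: "john@example.com" };
console.log(castRow(rawRow, columnTypes));
// { name: 'John', age: 30, active: true, email: 'john@example.com' }
```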

📚 Complete Examples

The examples/ directory contains full, runnable code for real-world scenarios:

  • CSV Import - Parse CSV files, auto-detect types, transform data to proper JavaScript types
  • Form Builder - Dynamically generate HTML forms with correct input types and validation
  • API Analyzer - Generate JSON Schemas, TypeScript interfaces, and API documentation
  • Data Validation - Validate imported data quality and detect type mismatches

Each example includes:

  • ✅ Complete runnable code with detailed comments
  • ✅ Real-world use cases and scenarios
  • ✅ Sample data files where applicable

Run any example:

cd examples/csv-import
node example.js

Complex Data

const { infer } = require('predict-data-types');

const complexString = "192.168.1.1, #FF0000, 50%, $100, 2023-12-31";
const types = infer(complexString.split(', ').map(v => ({ value: v })));
// { value: 'ip' } // Takes the most specific type found

// Or analyze each value separately:
const values = "192.168.1.1, #FF0000, 50%, $100, 2023-12-31".split(', ');
values.forEach(val => {
  console.log(`${val}: ${infer(val)}`);
});
// 192.168.1.1: ip
// #FF0000: color
// 50%: percentage
// $100: currency
// 2023-12-31: date

API

infer(input, format?, options?)

The main function - handles any input type:

Parameters:

  • input (string | string[] | Object | Object[]): Value(s) to analyze
  • format (optional): Output format - Formats.NONE (default) or Formats.JSONSCHEMA
  • options (optional): Configuration options
    • preferHashtagOver3CharHex (boolean, default: false): When true, treats ambiguous 3-character values like #bad, #ace as hashtags instead of hex colors

Returns:

  • DataType (string) - for single values and arrays of values
  • Schema (Object) - for objects and arrays of objects
  • JSONSchema (Object) - when format is Formats.JSONSCHEMA

Examples:

const { infer, Formats, DataTypes } = require('predict-data-types');

// Single values
infer("42"); // → 'number'
infer("test@example.com"); // → 'email'

// Arrays
infer(["1", "2", "3"]); // → 'number'

// Objects
infer({ age: "25", email: "test@example.com" });
// → { age: 'number', email: 'email' }

// Arrays of objects
infer([{ age: "25" }, { age: "30" }]);
// → { age: 'number' }

// JSON Schema format
infer({ name: "Alice", age: "25" }, Formats.JSONSCHEMA);
// → { type: 'object', properties: {...}, required: [...] }

// Hashtag field example
infer({ tag: "#OpenSource" }, Formats.JSONSCHEMA);
// → {
//     type: 'object',
//     properties: {
//       tag: { type: 'string', pattern: '^#[A-Za-z0-9_]+$' }
//     },
//     required: ['tag']
//   }

Constants

DataTypes - Type-safe constants for comparisons:

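DataTypes.STRING, DataTypes.NUMBER, DataTypes.BOOLEAN, DataTypes.EMAIL,
DataTypes.PHONE, DataTypes.URL, DataTypes.UUID, DataTypes.DATE,
DataTypes.IP, DataTypes.MACADDRESS, DataTypes.COLOR, DataTypes.PERCENTAGE,
DataTypes.CURRENCY, DataTypes.HASHTAG, DataTypes.MENTION, DataTypes.CRON,
DataTypes.ARRAY, DataTypes.OBJECT,
DataTypes.MIME_IMAGE, DataTypes.MIME_APPLICATION, DataTypes.MIME_TEXT, DataTypes.MIME_MEDIA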

Formats - Output format constants:

Formats.NONE        // Default simple schema
Formats.JSONSCHEMA  // JSON Schema format

Legacy API

predictDataTypes(input, firstRowIsHeader) - For CSV strings only (use infer() instead)


Parameters:

  • input (string): Comma-separated string to analyze
  • firstRowIsHeader (boolean): Treat first row as headers (default: false)

Returns: Object mapping field names/values to their data types

Example:

const types = predictDataTypes('name,age\nAlice,25', true);
// { name: 'string', age: 'number' }

Note: This function is maintained for backwards compatibility. New code should use infer().
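
When moving from the legacy function to `infer()`, a CSV string first has to become an array of objects. The sketch below shows one naive way to do that. `csvToObjects` is a hypothetical helper, and it assumes simple CSV with no quoted fields or embedded commas; a real CSV parser is the better choice for production data.

```javascript
// Naive CSV parser: first line is the header, no quoted-field support.
function csvToObjects(text) {
  const [headerLine, ...lines] = text.trim().split(/\r?\n/);
  const headers = headerLine.split(",").map((h) => h.trim());
  return lines
    .filter((line) => line.trim() !== "")
    .map((line) => {
      const cells = line.split(",").map((c) => c.trim());
      // Missing trailing cells become empty strings.
      return Object.fromEntries(headers.map((h, i) => [h, cells[i] ?? ""]));
    });
}

const csvRows = csvToObjects("name,age\nAlice,25\nBob,30");
console.log(csvRows);
// [ { name: 'Alice', age: '25' }, { name: 'Bob', age: '30' } ]
```

The resulting array of objects has the same shape as the "array of objects" input that `infer()` is documented to accept.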

TypeScript vs. This Library

Common Misconception: "Doesn't TypeScript already do this?"

No! TypeScript and this library serve completely different purposes:

| Feature | TypeScript | This Library |
|---|---|---|
| When it works | Compile-time | Runtime |
| What it checks | Your code structure | Actual data content |
| Scope | Static type annotations | Dynamic string analysis |
| Use case | Prevent coding errors | Analyze user-provided data |

Example:

// TypeScript
const value: string = "test@example.com";
// TypeScript knows: "value is a string"
// TypeScript DOESN'T know: "value contains an email address"

// This Library
const type = infer("test@example.com");
// Returns: 'email' ✅
// Detects the ACTUAL CONTENT at runtime

When to use this library:

  • 📊 Users upload CSV/Excel files
  • 🌐 API responses with unknown structure
  • 📝 Form data that needs validation
  • 🔄 ETL pipelines processing raw data
  • 🎨 Dynamic form/UI generation

TypeScript can't help with any of these - you need runtime type detection!

Development

npm test              # Run tests
npm run test:coverage # Run tests with coverage
npm run lint          # Check code quality
npm run lint:fix      # Fix lint issues

License

MIT License - see LICENSE file for details.

Contributing

See CONTRIBUTING.md for contribution guidelines.


Author: Melih Birim

About

A simple npm package that predicts data types for comma-separated values, including JSON objects, and validates URLs, phone numbers, email addresses, and geolocation data within string values.
