    Repositories list

    • mpire
      A Python package for easy multiprocessing, but faster than multiprocessing
      Python
      MIT License
      40 · 0 · 0 · 0 · Updated Sep 10, 2021
    • hydra
      Hydra is a framework for elegantly configuring complex applications
      Python
      MIT License
      812 · 0 · 0 · 0 · Updated Mar 25, 2021
    • klio
      Smarter data pipelines for audio.
      Python
      Apache License 2.0
      54 · 0 · 0 · 0 · Updated Oct 16, 2020
    • Neuraxle
      Build neat pipelines with the right abstractions to do AutoML. Let your pipeline steps have hyperparameter spaces. Enable checkpoints to cut duplicate calculati…
      Python
      Apache License 2.0
      63 · 0 · 0 · 0 · Updated Dec 19, 2019
    • faust
      Python Stream Processing
      Python
      Other
      536 · 0 · 0 · 0 · Updated Aug 4, 2018
    • thredo
      Python
      MIT License
      17 · 0 · 0 · 0 · Updated Aug 1, 2018
    • bloop
      A hot bloop for your productivity
      Scala
      Apache License 2.0
      213 · 0 · 0 · 0 · Updated Jun 5, 2018
    • Stream Framework is a Python library, which allows you to build news feed, activity streams and notification systems using Cassandra and/or Redis. The authors o…
      Python
      Other
      533 · 0 · 0 · 0 · Updated Jan 21, 2018
    • fireant
      Data analysis and reporting tool for quick access to custom charts and tables in Jupyter Notebooks and in the shell.
      Python
      Apache License 2.0
      20 · 0 · 0 · 0 · Updated Nov 30, 2017
    • gain
      Web crawling framework based on asyncio for everyone.
      Python
      GNU General Public License v3.0
      205 · 0 · 0 · 0 · Updated Jun 19, 2017
    • Persimmon
      A visual dataflow programming language for sklearn
      Python
      MIT License
      41 · 0 · 0 · 0 · Updated May 18, 2017
    • ufora
      Compiled, automatically parallel Python for data science
      Python
      Apache License 2.0
      28 · 0 · 0 · 0 · Updated May 27, 2016
    • dask
      Task scheduling and blocked algorithms for parallel processing
      Python
      BSD 3-Clause "New" or "Revised" License
      1.8k · 0 · 0 · 0 · Updated Jul 6, 2015
    • disque
      Disque is a distributed message broker
      C
      BSD 3-Clause "New" or "Revised" License
      536 · 0 · 0 · 0 · Updated Apr 30, 2015
    • HiBench
      HiBench is a Hadoop benchmark suite.
      Java
      Other
      769 · 0 · 0 · 0 · Updated Apr 17, 2015
    • This repository holds the Amazon Elastic MapReduce sample bootstrap actions
      Python
      Other
      303 · 0 · 0 · 0 · Updated Apr 14, 2015
    • rabit
      Reliable Allreduce and Broadcast Interface for distributed machine learning
      C++
      BSD 3-Clause "New" or "Revised" License
      181 · 0 · 0 · 0 · Updated Mar 27, 2015
    • crawler4j
      Open Source Web Crawler for Java
      Java
      1.9k · 0 · 0 · 0 · Updated Mar 4, 2015
    • grpc-java
      The Java gRPC implementation. HTTP/2 based RPC
      Java
      BSD 3-Clause "New" or "Revised" License
      4k · 0 · 0 · 0 · Updated Feb 26, 2015
    • Java
      Other
      124 · 0 · 0 · 0 · Updated Feb 25, 2015
    • Spark and Redshift integration
      Scala
      Apache License 2.0
      346 · 0 · 0 · 0 · Updated Feb 6, 2015
    • Scala
      72 · 0 · 0 · 0 · Updated Jan 31, 2015
    • A tool for managing Apache Kafka.
      Scala
      Apache License 2.0
      2.5k · 0 · 0 · 0 · Updated Jan 29, 2015
    • flink
      Mirror of Apache Flink
      Java
      Apache License 2.0
      14k · 0 · 0 · 0 · Updated Jan 16, 2015
    • Runs embedded, in-memory Apache Kafka instances. Helpful for integration testing.
      Scala
      Other
      13 · 0 · 0 · 0 · Updated Dec 3, 2014
    • dataduct
      DataPipeline for humans.
      Python
      Other
      82 · 0 · 0 · 0 · Updated Nov 28, 2014
    • A Python module to scrape several search engines (like Google, Yandex, Bing, Duckduckgo, Baidu and others) by using proxies (socks4/5, http proxy) and with many…
      Python
      755 · 0 · 0 · 0 · Updated Nov 20, 2014
    • pyspider
      A Powerful Spider System with Web UI
      Python
      Apache License 2.0
      3.7k · 0 · 0 · 0 · Updated Nov 18, 2014
    • samoa
      SAMOA (Scalable Advanced Massive Online Analysis) is a distributed streaming machine learning (ML) framework that contains a programming abstraction for distribu…
      Java
      Apache License 2.0
      77 · 0 · 0 · 0 · Updated Nov 8, 2014
    • kangaroo
      Hadoop utilities for Kafka
      Java
      Apache License 2.0
      35 · 0 · 0 · 0 · Updated Oct 23, 2014