Xandra BI Toolkit powered by ML released to Open Source

We are happy to announce that will be partially releasing our Python Business Intelligence Toolkit powered by machine learning algorithms to open-source.

Idea

The idea behind the toolkit is to provide an easy way for companies to arrange, process, visualise business data. Due to machine learning algorithms applied, users will be able so solve prediction, classification and clustering problems.

The visual part will also be a priority for us so the users are capable of conducting quick review.

Development

The development is done in Python using pandas, seaborn and, of course sk-learn libraries.  Since the product will bear a graceful name, we will be putting our best effort create modular architecture, lightweight code-style and test coverage.

Fine-tuning parameters will also be made easily using settings file.

{
"dataset_path" : "trained_all.csv",
"dataset_separator" : ";",
"columns_to_remove": ["Unnamed: 0", "Autoclass", "Color 1", "Color 2", "Image", "Images", "Description", "Overview" ],
"columns_to_encode":["Category"],
"columns_to_do_tfidf":["Product name"],
"should_purify" : true,
"problem" : "clustering",
"clustering_settings": {
  "algorithm" : "kmeans",
  "number_of_cluster" : 30,
  "target_column" : "Cluster"

},

"rows_to_debug": 5
}

The following design patterns will be used:

  • Pipeline / Chain of responsibility – in order to build pipeline of execution.
  • Abstract factory – to dynamically generate objects responsible for the picked algorithms
  • Decorator – to provide additional functionality to existing classes
  • MVC – to serve as architectural pattern for web applications later on

Roadmap

At this point data preprocessing is implemented: label encoding, tf-idf textual fields transformations, excessive columns removal.

The steps to follow are:

  • To implement clustering algorithms
  • To implement classification algorithms
  • To implement regression algorithms
  • To add visualization
  • To add support of different datasources (.txt, SQL etc)
  • To wrap inside web application

Please follow out Github repo or contact us at [email protected]