React / React Native developer

Cyber Whale¬†ūüźč¬†is looking for:¬†

For a range of amazing projects in a context of vertical and horizontal expansion, Cyber Whale  Рan international company with R&D unit in Moldova is looking for

React / React Native developer

Requirements:

  • Strong JS experience. ¬†
  • Strong knowledge of React js.¬†
  • Node JS server side scripting and deployment¬†
  • CSS + HTML , clean and aligned markup creation skill is a must.¬†
  • Knowledge of React Native would be an advantage.¬†
  • Experience building and working with CI/CD

 

What you’ll gett

  • Pleasant atmosphere for personal and professional growth
  • Nice salary
  • Nice management
  • Flexible hours
  • Interesting assignments and proper advice

 

Send your CV to¬†[email protected], [email protected]

 

 

PickOnePic

Privacy Policy

built the PickOnePic app as a Free app. This SERVICE is provided by at no cost and is intended for use as is.

This page is used to inform visitors regarding my policies with the collection, use, and disclosure of Personal Information if anyone decided to use my Service.

If you choose to use my Service, then you agree to the collection and use of information in relation to this policy. The Personal Information that I collect is used for providing and improving the Service. I will not use or share your information with anyone except as described in this Privacy Policy.

The terms used in this Privacy Policy have the same meanings as in our Terms and Conditions, which is accessible at PickOnePic unless otherwise defined in this Privacy Policy.

Information Collection and Use

For a better experience, while using our Service, I may require you to provide us with certain personally identifiable information. The information that I request will be retained on your device and is not collected by me in any way.

The app does use third party services that may collect information used to identify you.

Link to privacy policy of third party service providers used by the app

Log Data

I want to inform you that whenever you use my Service, in a case of an error in the app I collect data and information (through third party products) on your phone called Log Data. This Log Data may include information such as your device Internet Protocol (‚ÄúIP‚ÄĚ) address, device name, operating system version, the configuration of the app when utilizing my Service, the time and date of your use of the Service, and other statistics.

Cookies

Cookies are files with a small amount of data that are commonly used as anonymous unique identifiers. These are sent to your browser from the websites that you visit and are stored on your device’s internal memory.

This Service does not use these ‚Äúcookies‚ÄĚ explicitly. However, the app may use third party code and libraries that use ‚Äúcookies‚ÄĚ to collect information and improve their services. You have the option to either accept or refuse these cookies and know when a cookie is being sent to your device. If you choose to refuse our cookies, you may not be able to use some portions of this Service.

Service Providers

I may employ third-party companies and individuals due to the following reasons:

  • To facilitate our Service;
  • To provide the Service on our behalf;
  • To perform Service-related services; or
  • To assist us in analyzing how our Service is used.

I want to inform users of this Service that these third parties have access to your Personal Information. The reason is to perform the tasks assigned to them on our behalf. However, they are obligated not to disclose or use the information for any other purpose.

Security

I value your trust in providing us your Personal Information, thus we are striving to use commercially acceptable means of protecting it. But remember that no method of transmission over the internet, or method of electronic storage is 100% secure and reliable, and I cannot guarantee its absolute security.

Links to Other Sites

This Service may contain links to other sites. If you click on a third-party link, you will be directed to that site. Note that these external sites are not operated by me. Therefore, I strongly advise you to review the Privacy Policy of these websites. I have no control over and assume no responsibility for the content, privacy policies, or practices of any third-party sites or services.

Children’s Privacy

These Services do not address anyone under the age of 13. I do not knowingly collect personally identifiable information from children under 13. In the case I discover that a child under 13 has provided me with personal information, I immediately delete this from our servers. If you are a parent or guardian and you are aware that your child has provided us with personal information, please contact me so that I will be able to do necessary actions.

Changes to This Privacy Policy

I may update our Privacy Policy from time to time. Thus, you are advised to review this page periodically for any changes. I will notify you of any changes by posting the new Privacy Policy on this page. These changes are effective immediately after they are posted on this page.

Contact Us

If you have any questions or suggestions about my Privacy Policy, do not hesitate to contact me at [email protected]

Cyber Whale hits IT Park in Republic of Moldova

Proud to announce that Cyber Whale LLC has been incorporated in Republic of Moldova and entered the IT Park.

The benefits of being present in the IT Park are the following:

  • The¬†unified tax rate which is just 7% of the sales (VAT not included).
  • A straightforward procedure to become a member of the park.
  • Easier reporting (just 1 monthly tax report , instead of 4 reports).
  • A great opportunity for investors.
  • 0% salary tax for employees, 0% medical tax, 0% social tax – all included in 7% tax rate.

Cyber Whale is a digital agency rendering Digital and Creative services as well as Machine Learning and Business Intelligence services, operating worldwide from Republic of Moldova.

How to write trained Word2Vec model to CSV with DeepLearning4j

I used DeepLearning4j to train word2vec model. Then I had to save the dictionary to CSV so I can run some clustering algorithms on it.

Sounded like a simple task, but it took a while, and here is the code to do this:

 

   private void writeIndexToCsv(String csvFileName, Word2Vec model) {

        CSVWriter writer = null;
        try {
            writer = new CSVWriter(new FileWriter(csvFileName));
        } catch (IOException e) {
            e.printStackTrace();
        }

        VocabCache<VocabWord> vocCache =  model.vocab();
        Collection<VocabWord> wrds = vocCache.vocabWords();

        for(VocabWord w : wrds) {
            String s = w.getWord();
            System.out.println("Looking into the word:");
            System.out.println(s);
            StringBuilder sb = new StringBuilder();
            sb.append(s).append(",");
            double[] wordVector = model.getWordVector(s);
            for(int i = 0; i < wordVector.length; i++) {
                sb.append(wordVector[i]).append(",");
            }

            writer.writeNext(sb.toString().split(","), false);
        }

        try {
            writer.close();
        } catch (IOException e) {
            e.printStackTrace();
        }

    }

Xanda BI Toolkit: clustering

In the previous post we introduced the toolkit release to open source and the general idea behind the project, now I would like to share clustering implementation.

At this point we implemented 3 clustering algorithms:

  • K-means
  • DBSCAN
  • Hierarchical clustering

K-means

Very straight-forward algorithm

#clustering algorithms
class KMeansAlgorithm(Step):
    def __init__(self):
        self.params = settings["clustering_settings"]["kmeans_params"]
        self.newColumn = settings["clustering_settings"]["target_column"]

    def execute(self, df):
        pprint(self.__class__.__name__)
        pprint(inspect.stack()[0][3])

        km = KMeans(**self.params)
        km.fit(df)
        clusters = km.labels_.tolist()
        df[self.newColumn] = clusters
        pprint(df.head(settings["rows_to_debug"]))
        return df

K-means is memory-friendly and provides good output resulrs.

DBSCAN

Although DBSCAN is noise reduction based algorithm it is capable to self-organise clusters.

class DBScanAlgorithm(Step):
    def __init__(self):
        self.params = settings["clustering_settings"]["dbscan_params"]
        self.newColumn = settings["clustering_settings"]["target_column"]

    def execute(self, df):
        pprint(self.__class__.__name__)
        pprint(inspect.stack()[0][3])


        loc_df = StandardScaler().fit_transform(df)
        db = DBSCAN(**self.params).fit(loc_df)
        core_samples_mask = np.zeros_like(db.labels_, dtype=bool)
        core_samples_mask[db.core_sample_indices_] = True
        clusters = db.labels_.tolist()
        print(clusters)


        loc_df[self.newColumn] = clusters
        pprint(df.head(settings["rows_to_debug"]))

        return loc_df

Xandra BI Toolkit powered by ML released to Open Source

We are happy to announce that will be partially releasing our Python Business Intelligence Toolkit powered by machine learning algorithms to open-source.

Idea

The idea behind the toolkit is to provide an easy way for companies to arrange, process, visualise business data. Due to machine learning algorithms applied, users will be able so solve prediction, classification and clustering problems.

The visual part will also be a priority for us so the users are capable of conducting quick review.

Development

The development is done in Python using pandas, seaborn and, of course sk-learn libraries.  Since the product will bear a graceful name, we will be putting our best effort create modular architecture, lightweight code-style and test coverage.

Fine-tuning parameters will also be made easily using settings file.

{
"dataset_path" : "trained_all.csv",
"dataset_separator" : ";",
"columns_to_remove": ["Unnamed: 0", "Autoclass", "Color 1", "Color 2", "Image", "Images", "Description", "Overview" ],
"columns_to_encode":["Category"],
"columns_to_do_tfidf":["Product name"],
"should_purify" : true,
"problem" : "clustering",
"clustering_settings": {
  "algorithm" : "kmeans",
  "number_of_cluster" : 30,
  "target_column" : "Cluster"

},

"rows_to_debug": 5
}

The following design patterns will be used:

  • Pipeline / Chain of responsibility – in order to build pipeline of execution.
  • Abstract factory – to dynamically generate objects responsible for the picked algorithms
  • Decorator – to provide additional functionality to existing classes
  • MVC – to serve as architectural pattern for web applications later on

Roadmap

At this point data preprocessing is implemented: label encoding, tf-idf textual fields transformations, excessive columns removal.

The steps to follow are:

  • To¬†implement clustering algorithms
  • To implement classification algorithms
  • To implement regression algorithms
  • To add visualization
  • To add support of different datasources (.txt, SQL etc)
  • To wrap inside web application

Please follow¬†out Github repo or contact us at [email protected]

 

 

 

5 programming languages to fall in love with on St. Valentine’s Day.

Saint Valentine’s Day is a holiday of love not only toward your beloved one or family, but also to things like… programming languages. We would like to outline 5 programming languages to fall in love with on St. ¬†Valentine’s Day.

Python

The list of reasons to love Python is infinite:

  • Prevents you from writing Spaghetti code by not compiling without proper¬†indents.
  • Very easy to get started.
  • Multiple tutorials and mobile apps to learn Python on the¬†run.
  • Great web frameworks like Django.
  • List of powerful packages. Just anything from csv to machine learning packages.
  • Easy to install, don’t need IDE.

Scala

Scala is not new and is growing and deemed as a future replacement to Java

  • Unlike Java has a lightweight syntax
  • Is 100% JVM compatible, so you can reuse existing modules.
  • Has great web framework called Play.
  • Implements functional programming paradigm.
  • Syntaxis sugar.

Angular 2

  • Best JS framework, great support, huge community
  • A lot of technologies relying on it. (i.e. Ionic 2).
  • Great data binding.
  • Improved version of Angular 1, with a better approach (not backward compatible).

C#

Old but good language that still dominates the charts.

  • Extremely popular with tons of examples and huge community.
  • Soon to be 100% cross-platform via .NET Core.
  • Excellent business-oriented web framework ASP.NET.
  • Great ORM frameworks, test frameworks.
  • Quite backward compatible, you will not drown with legacy code.

Kotlin

  • Very fresh and lightweight.
  • 100% JVM compatible
  • Out of box in IntelliJ IDEA because…
  • Kotlin created by developers at JetBrains and that these folks know to how to master¬†a language. Just imagine, for so many years they studied thoroughly languages like Java, Groovy, Scala, and they surely have tons of “inspiration” to come up with a good programming language.

Let us program for you in any of this language, let us know at [email protected]

Happy St. Valentine’s Day!

 

How to parse dynamic HTML content using Python

In the previous tutorial we learning how to parse HTML in Python. In the Python tutorial we are going to learn to to parse dynamic HTML content generated by JavaScript, jQuery, Ajax, Angular or other dynamic pages technology.

What’s the problem with parsing dynamic HTML content in Python and in general?

The problem is that when you request contents of a HTML page, you are presented HTML, CSS and scripts returned from the server. If the page is dynamic, what you get is only a couple of scripts that are meant to be interpreted by your browser that, in its turn, will eventually display HTML content for a user.

That leads us to the idea that we should first render the page and then grab its HTML. Also it should take some time to render the page since sometimes the content is quite “heavy” and it takes some time to load it.

So, along with pure Python we should use some kind of UI component and in particular a Web View or some kind of Web frame.

One of the options is to use Qt for Python and to handle page rendering events and another one (which I honestly prefer more) is to use selenium for python.

So, let’s get down to writing some code but before that let’s elaborate and approach.

  1. Open web view with URL.
  2. Wait untill the page is loaded. Often the criteria here is a loaded div of some class.
  3. Grab the rendered HTML.
  4. Process it further using beautiful soup

You will need Chrome Web Driver to run the web view.

Also you will have to install selenium as well as libs from previous tutorial:

pip install selenium

So here is the Python code to parse dynamic content:

#import selenium compnents, urllib, beautiful soup
from bs4 import BeautifulSoup
from selenium import webdriver
from urllib import urlopen
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By


#url - the url to fetch dynamic content from.
#delay - second for web view to wait
#block_name - id of the tag to be loaded as criteria for page loaded state.
def fetchHtmlForThePage(url, delay, block_name):
	#supply the local path of web driver.
	#in this example we use chrome driver
	browser = webdriver.Chrome('/Applications/chromedriver')
	#open the browser with the URL
	#a browser windows will appear for a little while
	browser.get(url)
	try:
	#check for presence of the element you're looking for
		element_present = EC.presence_of_element_located((By.ID, block_name))
		WebDriverWait(browser, delay).until(element_present)

	#unless found, catch the exception
	except TimeoutException:
		print "Loading took too much time!"	

	#grab the rendered HTML
	html = browser.page_source
	#close the browser
	browser.quit()
	#return html
	return html


#call the fetching function we created
html = fetchHtmlForThePage(url, 5, 're-Searchresult')
#grab HTML document
soup = BeautifulSoup(html)
#process it further as you wish.....
#.....
processFetchedUrls(soup, path)
	

So here how to parse dynamic HTML content generated with JavaScript with the of Python.

Visit us to get help with your Python challenge of let us know if can help you with your digital needs.

How to parse emails from HTML in Python

In this tutorial we are going to get an idea of how to parse emails from HTML using Python.

Python is a scripting language easy to get started and is perfect for tasks like parsing emails.

So let’s elaborate an approach of how parsing works:

  1. Initialize a queue of URLs. The first item will be the initial URL.
  2. Initialize a set of already visited URL to avoid repetitions.
  3. Start parsing the current URL from the queue.
  4. Add the URL to processed URLs.
  5. Extract the whole HTML, search for an email pattern using a regex.
  6. If one or multiple emails were found, write to CSV.
  7. Loop through <a> tags found.
  8. Check if URL is relative or absolute.
  9. Check if URL is already in the processed URLs set. If not, add to the processing queue
  10. Repeat from step 3.

Before launching the script don’t forget to install proper libraries.

Using command line do:

pip install requests
pip install urlparse
pip install csv
pip install beautifulsoup4

Once you have the libraries installed, you’ll be able to check the script.

from bs4 import BeautifulSoup
import requests
import requests.exceptions
from urlparse import urlparse
from urlparse import urlsplit
from collections import deque
import re
import csv

#initialize CSV writer and filename
cw = csv.writer(open("Singa.csv",'a'), delimiter=',')
# a queue of urls, start
new_urls = deque(['https://foundersgrid.com/50-singapore-startups/'])

# a set of urls that we have already crawled
processed_urls = set()

# a set of crawled emails
emails = set()

# process urls one by one until we exhaust the queue
while len(new_urls):

    #extract the last one from queue
	url = new_urls.popleft()
	#mark as visited by adding to proccessed URLs
	processed_urls.add(url)

    # break down the extract the base url to resolve relative links
	parts = urlsplit(url)
	base_url = "{0.scheme}://{0.netloc}".format(parts)
	path = url[:url.rfind('/')+1] if '/' in parts.path else url

    # get url's content
	#handle exception if any
	try:
		response = requests.get(url)
	except (requests.exceptions.MissingSchema, requests.exceptions.ConnectionError):
        # skip pages with errors
		continue

    # extract all email addresses and add them into the resulting set
	new_emails = set(re.findall(r"[a-z0-9\.\-+_][email protected][a-z0-9\.\-+_]+\.[a-z]+", response.text, re.I))
	emails.update(new_emails)
	print new_emails
	#write to CSV the new mails.
	#alternatively you can write the emails set to CSV after parsing
	for em in new_emails:
		cw.writerow([em,])

    # create a beutiful soup object as representation of the html page
	soup = BeautifulSoup(response.text)

    # walk through a anchords
	for anchor in soup.find_all("a"):
        # extract link url from the anchor
		link = anchor.attrs["href"] if "href" in anchor.attrs else ''
        # resolve relative links
		if link.startswith('/'):
			link = base_url + link
		elif not link.startswith('http'):
			link = path + link
        # add the new url to the queue if it was not enqueued nor processed yet
		if not link in new_urls and not link in processed_urls:
			new_urls.append(link)

As you can see, parsing emails in Python is rather a simple task.

If you have any questions on this tutorial, you can contact us [email protected]

Also, if you need assistance with data collection or any other digital service, please let us know.

Don’t forget to share the tutorial and visit us at¬†https://cyberwhale.tech

PS. In the next tutorial we will discuss how to parse dynamic HTML content using Python.

Update XML node in Python

I like python because it’s minimalistic and elegant.
Let’s see how to update an XML node using ElementTree.

We use CD catalog in XML as a datasource.

<?xml version="1.0" encoding="iso-8859-1" ?>
<?xml-stylesheet type="text/xsl" href="cdcatalog.xsl"?>
 <catalog>
<cd>
  <title>empire burlesque</title> 
  <artist>bob dylan</artist> 
  <country>usa</country> 
  <company>columbia</company> 
  <price>10.90</price> 
  <year>1985</year> 
  </cd>
 <cd>
  <title>hide your heart</title> 
  <artist>bonnie tyler</artist> 
  <country>uk</country> 
  <company>cbs records</company> 
  <price>9.90</price> 
  <year>1988</year> 
  </cd>
 <cd>
  <title>greatest hits</title> 
  <artist>dolly parton</artist> 
  <country>usa</country> 
  <company>rca</company> 
  <price>9.90</price> 
  <year>1982</year> 
  </cd>
</catalog>

Here is the python script itself.

import xml.etree.ElementTree as ET	

#parse XML file
tree = ET.parse('catalog_.xml')

#get root
root = tree.getroot()
#iterate over each price node (which is subchild of cd node)
for price in root.iter('price'):
	#get the price of CD, multiply 10
	new_price = float(price.text) * 10
	#update the text (value) of the node
	price.text = str(new_price)
	#add 'updated' attribute to mark node updated=yes
	price.set('updated', 'yes')

#can also use the same file if you want to directly update file.
tree.write('catalog_new.xml')

And the output is the following:

<catalog>
<cd>
  <title>empire burlesque</title> 
  <artist>bob dylan</artist> 
  <country>usa</country> 
  <company>columbia</company> 
  <price updated="yes">109.0</price> 
  <year>1985</year> 
  </cd>
 <cd>
  <title>hide your heart</title> 
  <artist>bonnie tyler</artist> 
  <country>uk</country> 
  <company>cbs records</company> 
  <price updated="yes">99.0</price> 
  <year>1988</year> 
  </cd>
 <cd>
  <title>greatest hits</title> 
  <artist>dolly parton</artist> 
  <country>usa</country> 
  <company>rca</company> 
  <price updated="yes">99.0</price> 
  <year>1982</year> 
  </cd>
</catalog>