How to write trained Word2Vec model to CSV with DeepLearning4j

I used DeepLearning4j to train word2vec model. Then I had to save the dictionary to CSV so I can run some clustering algorithms on it.

Sounded like a simple task, but it took a while, and here is the code to do this:

 

   private void writeIndexToCsv(String csvFileName, Word2Vec model) {

        CSVWriter writer = null;
        try {
            writer = new CSVWriter(new FileWriter(csvFileName));
        } catch (IOException e) {
            e.printStackTrace();
        }

        VocabCache<VocabWord> vocCache =  model.vocab();
        Collection<VocabWord> wrds = vocCache.vocabWords();

        for(VocabWord w : wrds) {
            String s = w.getWord();
            System.out.println("Looking into the word:");
            System.out.println(s);
            StringBuilder sb = new StringBuilder();
            sb.append(s).append(",");
            double[] wordVector = model.getWordVector(s);
            for(int i = 0; i < wordVector.length; i++) {
                sb.append(wordVector[i]).append(",");
            }

            writer.writeNext(sb.toString().split(","), false);
        }

        try {
            writer.close();
        } catch (IOException e) {
            e.printStackTrace();
        }

    }

5 programming languages to fall in love with on St. Valentine’s Day.

Saint Valentine’s Day is a holiday of love not only toward your beloved one or family, but also to things like… programming languages. We would like to outline 5 programming languages to fall in love with on St.  Valentine’s Day.

Python

The list of reasons to love Python is infinite:

  • Prevents you from writing Spaghetti code by not compiling without proper indents.
  • Very easy to get started.
  • Multiple tutorials and mobile apps to learn Python on the run.
  • Great web frameworks like Django.
  • List of powerful packages. Just anything from csv to machine learning packages.
  • Easy to install, don’t need IDE.

Scala

Scala is not new and is growing and deemed as a future replacement to Java

  • Unlike Java has a lightweight syntax
  • Is 100% JVM compatible, so you can reuse existing modules.
  • Has great web framework called Play.
  • Implements functional programming paradigm.
  • Syntaxis sugar.

Angular 2

  • Best JS framework, great support, huge community
  • A lot of technologies relying on it. (i.e. Ionic 2).
  • Great data binding.
  • Improved version of Angular 1, with a better approach (not backward compatible).

C#

Old but good language that still dominates the charts.

  • Extremely popular with tons of examples and huge community.
  • Soon to be 100% cross-platform via .NET Core.
  • Excellent business-oriented web framework ASP.NET.
  • Great ORM frameworks, test frameworks.
  • Quite backward compatible, you will not drown with legacy code.

Kotlin

  • Very fresh and lightweight.
  • 100% JVM compatible
  • Out of box in IntelliJ IDEA because…
  • Kotlin created by developers at JetBrains and that these folks know to how to master a language. Just imagine, for so many years they studied thoroughly languages like Java, Groovy, Scala, and they surely have tons of “inspiration” to come up with a good programming language.

Let us program for you in any of this language, let us know at [email protected]

Happy St. Valentine’s Day!

 

Update XML node in Python

I like python because it’s minimalistic and elegant.
Let’s see how to update an XML node using ElementTree.

We use CD catalog in XML as a datasource.

<?xml version="1.0" encoding="iso-8859-1" ?>
<?xml-stylesheet type="text/xsl" href="cdcatalog.xsl"?>
 <catalog>
<cd>
  <title>empire burlesque</title> 
  <artist>bob dylan</artist> 
  <country>usa</country> 
  <company>columbia</company> 
  <price>10.90</price> 
  <year>1985</year> 
  </cd>
 <cd>
  <title>hide your heart</title> 
  <artist>bonnie tyler</artist> 
  <country>uk</country> 
  <company>cbs records</company> 
  <price>9.90</price> 
  <year>1988</year> 
  </cd>
 <cd>
  <title>greatest hits</title> 
  <artist>dolly parton</artist> 
  <country>usa</country> 
  <company>rca</company> 
  <price>9.90</price> 
  <year>1982</year> 
  </cd>
</catalog>

Here is the python script itself.

import xml.etree.ElementTree as ET	

#parse XML file
tree = ET.parse('catalog_.xml')

#get root
root = tree.getroot()
#iterate over each price node (which is subchild of cd node)
for price in root.iter('price'):
	#get the price of CD, multiply 10
	new_price = float(price.text) * 10
	#update the text (value) of the node
	price.text = str(new_price)
	#add 'updated' attribute to mark node updated=yes
	price.set('updated', 'yes')

#can also use the same file if you want to directly update file.
tree.write('catalog_new.xml')

And the output is the following:

<catalog>
<cd>
  <title>empire burlesque</title> 
  <artist>bob dylan</artist> 
  <country>usa</country> 
  <company>columbia</company> 
  <price updated="yes">109.0</price> 
  <year>1985</year> 
  </cd>
 <cd>
  <title>hide your heart</title> 
  <artist>bonnie tyler</artist> 
  <country>uk</country> 
  <company>cbs records</company> 
  <price updated="yes">99.0</price> 
  <year>1988</year> 
  </cd>
 <cd>
  <title>greatest hits</title> 
  <artist>dolly parton</artist> 
  <country>usa</country> 
  <company>rca</company> 
  <price updated="yes">99.0</price> 
  <year>1982</year> 
  </cd>
</catalog>

Remove duplicate lines from a file in Scala

How to remove duplicate lines from csv or txt file?

The answer is quite straightforward: you basically need BufferedReader and BufferedWriter, and this also works for large files quite well.

 

 def removeDuplicatesFromFile(fileName : String) {

    val reader = new BufferedReader(new FileReader(fileName))
    val lines = new mutable.HashSet[String]()
    var line: String = null
    while ({line = reader.readLine; line != null}) {
      lines.add(line)
    }
    reader.close

    val writer = new BufferedWriter(new FileWriter(fileName))
    for (unique <- lines) {
      writer.write(unique)
      writer.newLine()
    }
    writer.close

  }

Top 5 useful Java Libs

Java is an advanced language, but nonetheless there are libs to make life even more easier. We would like to share 5 useful libs to help you with projects of different kind.

FileUtils – Apache Commons

Small but a very useful lib to help you deal with files. Simplifies working with files in a great way, making you productive and avoiding boilerplate code.

FileUtils.readLines(new File("myfile.txt"));

String Utils – Apache Commons

Also small but powerful library. Has all string methods you always lack.

String title = StringUtils.substringBetween(someText, "The", "end");

Jsoup Library

This is the best Java library for parsing HTML and XML, or other markup in general.

Document doc = Jsoup.connect("http://example.com/").get();
String title = doc.title();

OpenCSV

Parsing CSV is a trivial task, but sometimes still cause trouble. OpenCSV is a minimalistic library to help you with this.

CSVReader reader = new CSVReader(new FileReader("yourfile.csv"));
     String [] nextLine;
     while ((nextLine = reader.readNext()) != null) {
        // nextLine[] is an array of values from the line
        System.out.println(nextLine[0] + nextLine[1] + "etc...");
     }

org.json

You usually do a lot of networking in Java, but what you really need is a good JSON parser/manager. Org.json is a popular and minimalistic Java library for operating with JSON data.

String str = "{ \"firstName\": \"Vladimir\", \"age\": 30 }";
JSONObject obj = new JSONObject(str);
String n = obj.getString("firstName");
int a = obj.getInt("age");
System.out.println(n + " " + a);  // prints "Vladimir 30"

We would also point out other libs like fasterxml, FileNameUtils and Unirest.

Hope you’ll find these minimalistic java libs helpful and powerful.

Anyway you can check with us to see if we can help you develop your java application.