How to write trained Word2Vec model to CSV with DeepLearning4j

I used DeepLearning4j to train a word2vec model. Then I had to save the dictionary to CSV so I could run some clustering algorithms on it.

It sounded like a simple task, but it took a while to get right. Here is the code:

 

    import java.io.FileWriter;
    import java.io.IOException;
    import java.util.Collection;

    import com.opencsv.CSVWriter;
    import org.deeplearning4j.models.word2vec.VocabWord;
    import org.deeplearning4j.models.word2vec.Word2Vec;
    import org.deeplearning4j.models.word2vec.wordstore.VocabCache;

    private void writeIndexToCsv(String csvFileName, Word2Vec model) {

        VocabCache<VocabWord> vocCache = model.vocab();
        Collection<VocabWord> wrds = vocCache.vocabWords();

        // try-with-resources closes the writer even if writing fails,
        // and avoids calling methods on a writer that was never opened
        try (CSVWriter writer = new CSVWriter(new FileWriter(csvFileName))) {
            for (VocabWord w : wrds) {
                String s = w.getWord();
                double[] wordVector = model.getWordVector(s);

                // One row per word: the word itself, then its vector components
                String[] row = new String[wordVector.length + 1];
                row[0] = s;
                for (int i = 0; i < wordVector.length; i++) {
                    row[i + 1] = String.valueOf(wordVector[i]);
                }

                // false = do not wrap every field in quotes
                writer.writeNext(row, false);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
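Since the CSV is meant to feed a clustering step, it may help to see how the rows could be read back. The following is a minimal sketch using only the JDK; the class `WordVectorCsv` and its method `parseRows` are hypothetical names, not part of DeepLearning4j or OpenCSV, and it assumes every row has the form `word,v1,v2,...` with no commas or quotes inside the word.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of DeepLearning4j or OpenCSV.
class WordVectorCsv {

    // Turns rows of the form "word,v1,v2,...,vn" into a word -> vector map.
    // Assumes the word itself contains no commas or quotes.
    static Map<String, double[]> parseRows(List<String> lines) {
        Map<String, double[]> vectors = new LinkedHashMap<>();
        for (String line : lines) {
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            String[] parts = line.split(",");
            double[] vector = new double[parts.length - 1];
            for (int i = 1; i < parts.length; i++) {
                vector[i - 1] = Double.parseDouble(parts[i]);
            }
            vectors.put(parts[0], vector);
        }
        return vectors;
    }

    public static void main(String[] args) {
        Map<String, double[]> vectors =
                parseRows(List.of("king,0.1,0.2", "queen,0.3,0.4"));
        System.out.println(vectors.keySet()); // prints "[king, queen]"
    }
}
```

In practice the lines would come from something like `Files.readAllLines` on the CSV file, and the resulting map can be handed to whatever clustering code you use.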

Remove duplicate lines from a file in Scala

How do you remove duplicate lines from a CSV or TXT file?

The answer is quite straightforward: you basically need a BufferedReader and a BufferedWriter. This also works well for large files, as long as the set of unique lines fits in memory.

 

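  import java.io.{BufferedReader, BufferedWriter, FileReader, FileWriter}
  import scala.collection.mutable

  def removeDuplicatesFromFile(fileName: String): Unit = {

    // LinkedHashSet drops duplicates and keeps the first-seen order of lines
    // (a plain HashSet would write them back in arbitrary order)
    val lines = new mutable.LinkedHashSet[String]()

    val reader = new BufferedReader(new FileReader(fileName))
    try {
      var line: String = null
      while ({line = reader.readLine(); line != null}) {
        lines.add(line)
      }
    } finally {
      reader.close()
    }

    // Overwrite the same file with the unique lines only
    val writer = new BufferedWriter(new FileWriter(fileName))
    try {
      for (unique <- lines) {
        writer.write(unique)
        writer.newLine()
      }
    } finally {
      writer.close()
    }
  }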
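For readers working in plain Java, the same idea can be sketched with `java.nio.file.Files`. This is only an illustration under a few assumptions: the class name `DuplicateLineRemover` is made up, the file is read as UTF-8, and a `LinkedHashSet` is used so that the first occurrence of each line keeps its original position.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical class name; uses only the JDK.
class DuplicateLineRemover {

    static void removeDuplicatesFromFile(Path file) throws IOException {
        // LinkedHashSet keeps only the first occurrence of each line, in order
        Set<String> unique = new LinkedHashSet<>();
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                unique.add(line);
            }
        }
        // Overwrites the input file with the de-duplicated lines
        Files.write(file, unique, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("dedupe", ".txt");
        Files.write(file, List.of("a", "b", "a"), StandardCharsets.UTF_8);
        removeDuplicatesFromFile(file);
        System.out.println(Files.readAllLines(file, StandardCharsets.UTF_8)); // prints "[a, b]"
        Files.delete(file);
    }
}
```

Only the unique lines are held in memory, so a file with many repeated lines needs far less memory than its size on disk suggests.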

Top 5 useful Java Libs

Java is an advanced language, but there are still libs that make life even easier. We would like to share 5 useful libs that can help you with different kinds of projects.

FileUtils – Apache Commons IO

A small but very useful lib for dealing with files. It greatly simplifies file handling, keeps you productive and spares you the boilerplate code.

List<String> lines = FileUtils.readLines(new File("myfile.txt"), StandardCharsets.UTF_8);

StringUtils – Apache Commons Lang

Another small but powerful library. It has the string methods you always find missing in the JDK.

String title = StringUtils.substringBetween(someText, "The", "end");

Jsoup Library

In our view this is the best Java library for parsing HTML, and it can parse XML as well.

Document doc = Jsoup.connect("http://example.com/").get();
String title = doc.title();

OpenCSV

Parsing CSV looks like a trivial task, but it can still cause trouble, for example with quoted fields that contain commas. OpenCSV is a minimalistic library that helps you with this.

CSVReader reader = new CSVReader(new FileReader("yourfile.csv"));
String[] nextLine;
while ((nextLine = reader.readNext()) != null) {
    // nextLine[] is an array of values from the line
    System.out.println(nextLine[0] + nextLine[1] + "etc...");
}
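To see why a plain `String.split(",")` is not enough for CSV, consider a quoted field that contains a comma. The sketch below uses only the JDK and is not how OpenCSV is implemented; `CsvQuoteDemo` and `splitCsvLine` are made-up names, and the splitter only covers double-quoted fields and `""` escapes on a single line.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; not part of OpenCSV.
class CsvQuoteDemo {

    // Minimal quote-aware splitter: handles commas inside double-quoted
    // fields and "" as an escaped quote. Multi-line fields are not supported.
    static List<String> splitCsvLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"'); // "" inside quotes is a literal quote
                        i++;
                    } else {
                        inQuotes = false;    // closing quote
                    }
                } else {
                    current.append(c);
                }
            } else if (c == '"') {
                inQuotes = true;             // opening quote
            } else if (c == ',') {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());      // last field
        return fields;
    }

    public static void main(String[] args) {
        String line = "1,\"Smith, John\",42";
        System.out.println(line.split(",").length);      // prints "4"
        System.out.println(splitCsvLine(line).size());   // prints "3"
        System.out.println(splitCsvLine(line).get(1));   // prints "Smith, John"
    }
}
```

A dedicated CSV library takes care of these cases, plus the ones this sketch skips, so you do not have to maintain such code yourself.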

org.json

Java applications often talk to web services, and for that you need a good JSON parser. org.json is a popular and minimalistic Java library for working with JSON data.

String str = "{ \"firstName\": \"Vladimir\", \"age\": 30 }";
JSONObject obj = new JSONObject(str);
String n = obj.getString("firstName");
int a = obj.getInt("age");
System.out.println(n + " " + a);  // prints "Vladimir 30"

We would also point out other libs like Jackson (FasterXML), FilenameUtils and Unirest.

We hope you'll find these minimalistic Java libs helpful and powerful.

You can also get in touch with us to see if we can help you develop your Java application.