Remove duplicate lines from a file in Scala

How do you remove duplicate lines from a CSV or TXT file?

The answer is quite straightforward: read the file with a BufferedReader, collect the lines in a set, and write them back with a BufferedWriter. This also works well for large files, as long as the set of unique lines fits in memory.

 

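  import java.io.{BufferedReader, BufferedWriter, FileReader, FileWriter}
  import scala.collection.mutable

  def removeDuplicatesFromFile(fileName: String): Unit = {

    // Read every line into a set; LinkedHashSet keeps the first-seen order
    val reader = new BufferedReader(new FileReader(fileName))
    val lines = new mutable.LinkedHashSet[String]()
    try {
      var line = reader.readLine()
      while (line != null) {
        lines.add(line)
        line = reader.readLine()
      }
    } finally reader.close()

    // Overwrite the same file with the unique lines
    val writer = new BufferedWriter(new FileWriter(fileName))
    try {
      for (unique <- lines) {
        writer.write(unique)
        writer.newLine()
      }
    } finally writer.close()

  }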

Top 5 useful Java Libs

Java is an advanced language, but there are nonetheless libs that make life even easier. We would like to share 5 useful libs to help you with projects of different kinds.

FileUtils – Apache Commons

A small but very useful class from Apache Commons IO for dealing with files. It greatly simplifies common file operations, which keeps you productive and spares you boilerplate code.

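// Read the whole file into a list of lines (throws IOException)
List<String> lines = FileUtils.readLines(new File("myfile.txt"), StandardCharsets.UTF_8);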

StringUtils – Apache Commons

Another small but powerful library, part of Apache Commons Lang. It has the string methods you always find missing from the JDK.

String title = StringUtils.substringBetween(someText, "The", "end");

Jsoup Library

Jsoup is one of the most popular Java libraries for fetching and parsing HTML, and it can parse XML as well.

Document doc = Jsoup.connect("http://example.com/").get();
String title = doc.title();

OpenCSV

Parsing CSV looks like a trivial task, but quoted fields, embedded commas and escaped characters can still cause trouble. OpenCSV is a minimalistic library that helps you with this.

CSVReader reader = new CSVReader(new FileReader("yourfile.csv"));
String[] nextLine;
while ((nextLine = reader.readNext()) != null) {
    // nextLine[] is an array of values from the line
    System.out.println(nextLine[0] + nextLine[1] + "etc...");
}
// release the underlying file handle
reader.close();

org.json

If you do a lot of networking in Java, sooner or later you need a good JSON parser. org.json is a popular, minimalistic Java library for working with JSON data.

String str = "{ \"firstName\": \"Vladimir\", \"age\": 30 }";
JSONObject obj = new JSONObject(str);
String n = obj.getString("firstName");
int a = obj.getInt("age");
System.out.println(n + " " + a);  // prints "Vladimir 30"

We would also point out other libs such as FasterXML Jackson, FilenameUtils and Unirest.

We hope you find these minimalistic Java libs helpful and powerful.

You can also get in touch with us to see if we can help you develop your Java application.

Python networking example

Here is a small example demonstrating GET requests in Python using the requests library. Install it first:

pip install requests

And the code itself:


# import library
import requests

# prepare parameters
parameters = {'date': '2000:2010', 'format': 'xml'}

# prepare URL
url = 'http://api.worldbank.org/countries/br/indicators/SP.POP.TOTL'

# call get method and save data into the response
r = requests.get(url, params=parameters)

# print the URL including the params
print(r.url)

# check the status code
status_code = r.status_code

# if the request failed print a message, else print response headers and text
if status_code != 200:
    print('Request Failed')
else:
    print(r.headers['content-type'])
    # print(r.json()) for JSON responses
    print(r.text)
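
To see how a params dictionary ends up in the request URL without touching the network, here is a minimal sketch that uses only the standard library (Python 3). It approximates what requests does when it builds the URL and is not its actual implementation. The helper name build_url is hypothetical, and the sketch assumes the query parameter key is 'date' (without a trailing colon).

```python
from urllib.parse import urlencode


def build_url(base_url, params):
    """Append URL-encoded query parameters to base_url (hypothetical helper)."""
    query = urlencode(params)
    if not query:
        # nothing to append
        return base_url
    # use '&' if the base URL already carries a query string, '?' otherwise
    separator = '&' if '?' in base_url else '?'
    return base_url + separator + query


url = 'http://api.worldbank.org/countries/br/indicators/SP.POP.TOTL'
parameters = {'date': '2000:2010', 'format': 'xml'}

# the colon in the date range is percent-encoded as %3A
print(build_url(url, parameters))
```

This is handy for checking a query string while debugging, before sending the real request.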

Who we are

We are Cyber Whale. We are here to deliver high-quality digital services at affordable prices.

We operate worldwide, making our customers happy.

Some of our services are:

  • Mobile application development
  • Web apps, rich content web apps
  • Cloud deployment
  • Data mining, business intelligence services
  • Quality Assurance and help desk

Visit https://cyberwhale.tech for more info.