Remove duplicate lines from a file in Scala

How do you remove duplicate lines from a CSV or TXT file?

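The approach is straightforward. Read the file line by line with a BufferedReader and add each line to a set, which keeps a single copy of every line. Then write the set back to the same file with a BufferedWriter. The file is streamed through buffers rather than read into one big string, so this copes with fairly large files. However, every distinct line is still kept in memory, so memory use grows with the number of unique lines.
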
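    import java.io.{BufferedReader, BufferedWriter, FileReader, FileWriter}
    import scala.collection.mutable

    def removeDuplicatesFromFile(fileName: String): Unit = {
      // LinkedHashSet drops duplicates but keeps the first occurrence of
      // each line in its original order (a plain HashSet would shuffle them)
      val lines = new mutable.LinkedHashSet[String]()

      val reader = new BufferedReader(new FileReader(fileName))
      try {
        var line: String = reader.readLine()
        while (line != null) {
          lines.add(line)
          line = reader.readLine()
        }
      } finally reader.close()

      // Overwrite the original file with the unique lines
      val writer = new BufferedWriter(new FileWriter(fileName))
      try {
        for (unique <- lines) {
          writer.write(unique)
          writer.newLine()
        }
      } finally writer.close()
    }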
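
If you would rather not build the whole set before writing anything, a single-pass variant can write each line the first time it is seen. The sketch below is an illustration, not the author's code. The names DedupLines, dedupToFile and dedupInPlace are hypothetical, and it assumes the input is UTF-8. It uses java.nio.file.Files instead of FileReader/FileWriter so the charset is explicit. It also writes to a temporary file and only then replaces the original, so a failure part-way through leaves the source file untouched.

```scala
import java.io.{BufferedReader, BufferedWriter}
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path, StandardCopyOption}
import scala.collection.mutable

// Hypothetical helper object; assumes UTF-8 encoded input.
object DedupLines {
  /** Copies `input` to `output`, keeping only the first occurrence of each line.
    * Returns the number of lines written. Only the distinct lines are held in
    * memory; the file content itself is streamed through the reader and writer. */
  def dedupToFile(input: Path, output: Path): Int = {
    val seen = mutable.HashSet.empty[String]
    var written = 0
    val reader: BufferedReader = Files.newBufferedReader(input, StandardCharsets.UTF_8)
    try {
      val writer: BufferedWriter = Files.newBufferedWriter(output, StandardCharsets.UTF_8)
      try {
        var line = reader.readLine()
        while (line != null) {
          // HashSet.add returns true only the first time a given line is added
          if (seen.add(line)) {
            writer.write(line)
            writer.newLine()
            written += 1
          }
          line = reader.readLine()
        }
      } finally writer.close()
    } finally reader.close()
    written
  }

  /** In-place variant: writes to a temporary file in the same directory,
    * then moves it over the original only after the copy succeeded. */
  def dedupInPlace(file: Path): Int = {
    val absolute = file.toAbsolutePath
    val tmp = Files.createTempFile(absolute.getParent, "dedup-", ".tmp")
    try {
      val count = dedupToFile(absolute, tmp)
      Files.move(tmp, absolute, StandardCopyOption.REPLACE_EXISTING)
      count
    } finally Files.deleteIfExists(tmp) // no-op once the move has succeeded
  }
}
```

Usage might look like `DedupLines.dedupInPlace(java.nio.file.Paths.get("data.csv"))`. Memory is still bounded by the number of distinct lines rather than the file size, just as in the original version. The difference is that output starts flowing immediately and the original order is preserved.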
