Splitting a csv file using bash in linux

This is an updated version, with the typos fixed, of a script I found in this blog post: http://www.geekology.co.za/blog/2009/02/bash-script-to-split-single-csv-file-into-multiple-files-with-headers/

The script splits a CSV file into chunks of 300 lines and also prepends the original header line from the CSV file to each chunk, so that every chunk is a complete CSV file in its own right.

#!/bin/bash

# check that exactly one input filename was passed as a command line argument:
if [ $# -ne 1 ]; then
    echo "Please specify the name of a file to split!"
    exit 1
fi

# create a directory to store the output (no error if it already exists):
mkdir -p output

# create a temporary file containing the header without the content:
head -n 1 "$1" > header.csv

# create a temporary file containing the content without the header:
tail -n +2 "$1" > content.csv

# split the content file into multiple files of 300 lines each:
split -l 300 content.csv output/data_

# loop through the new split files, adding the header and a '.csv' extension:
for f in output/data_*; do
    case "$f" in *.csv) continue ;; esac   # skip chunks left over from an earlier run
    cat header.csv "$f" > "$f.csv"
    rm "$f"
done

# remove the temporary files:
rm header.csv content.csv
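After running the script, a quick way to confirm the result is to check that every chunk starts with the original header and to count the data rows in each one. This is a minimal sketch: `check_chunks` is a hypothetical helper name, and it assumes the chunks are `.csv` files in a single directory (`output/` by default, as in the script above).

```shell
#!/bin/bash
# check_chunks ORIGINAL.csv [OUTPUT_DIR]
# Hypothetical helper: warns on stderr about any chunk whose first line is not
# the original header, then prints how many data rows each chunk holds.
check_chunks() {
    local input=$1
    local dir=${2:-output}
    local header f rows

    header=$(head -n 1 "$input")
    for f in "$dir"/*.csv; do
        [ -e "$f" ] || continue              # the glob matched nothing
        if [ "$(head -n 1 "$f")" != "$header" ]; then
            echo "header mismatch: $f" >&2
        fi
        # wc -l counts newline characters, so a final line with no trailing
        # newline is not counted; subtract 1 for the header line
        rows=$(( $(wc -l < "$f") - 1 ))
        echo "$f: $rows data row(s)"
    done
}
```

Save it to a file, `source` it, then run something like `check_chunks mydata.csv` from the directory you ran the split in.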

You can change the 300 in the split command to any number you want, but 300 is the ideal number of records for importing into a SugarCRM setup on a Fasthosts site. If you use any other hosting, SugarCRM may run out of memory at a different level, so just tweak that number a little.
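If you find yourself tweaking that number a lot, it can be passed as an argument instead of edited into the script. The sketch below swaps `head`/`tail`/`split` for a single pass of POSIX `awk`, which reads the input once and writes the header into each new chunk as it opens it. `split_csv` is a hypothetical function name, and unlike `split` it names the files `data_0001.csv`, `data_0002.csv` and so on; the default of 300 simply mirrors the script above.

```shell
#!/bin/bash
# split_csv INPUT.csv [LINES_PER_CHUNK] [OUTPUT_DIR]
# Hypothetical single-pass alternative to the head/tail/split script:
# every output file gets the header line plus up to LINES_PER_CHUNK data rows.
split_csv() {
    local input=$1
    local chunk=${2:-300}     # data rows per output file
    local dir=${3:-output}

    mkdir -p "$dir"
    awk -v chunk="$chunk" -v prefix="$dir/data_" '
        NR == 1 { header = $0; next }             # remember the header line
        (NR - 2) % chunk == 0 {                   # first data row of a new chunk
            if (out != "") close(out)             # finish the previous file
            out = sprintf("%s%04d.csv", prefix, ++n)
            print header > out                    # new file starts with the header
        }
        { print > out }                           # copy the data row
    ' "$input"
}
```

For example, `split_csv contacts.csv 300` for a SugarCRM import, or `split_csv contacts.csv 50000` for Excel (`contacts.csv` is just a placeholder name).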

For a non-SugarCRM use, for example loading into Excel, you could use a value of 50,000 to beat the roughly 64,000-row barrier (Excel 2003 and earlier allow at most 65,536 rows per worksheet).
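To see in advance how many files a given chunk size will produce, divide the number of data rows by the chunk size and round up. `count_chunks` is a hypothetical helper sketching that arithmetic; it assumes the file ends with a newline, because `wc -l` counts newline characters.

```shell
#!/bin/bash
# count_chunks INPUT.csv LINES_PER_CHUNK
# Hypothetical helper: prints how many output files a split will produce.
count_chunks() {
    local rows
    rows=$(( $(wc -l < "$1") - 1 ))        # data rows, excluding the header
    echo $(( (rows + $2 - 1) / $2 ))       # integer ceiling of rows / chunk
}
```

For example, 120,000 data rows with a chunk size of 50,000 gives (120,000 + 49,999) / 50,000 = 3 files in integer arithmetic.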

By the way, anyone who wants to analyse a 64k-line Excel file and make business-critical decisions based on it is an idiot.

85 thoughts on “Splitting a csv file using bash in linux”

  1. I don’t order anything from anyone unless they deliver to my work address. Most will do it now… if you have an understanding workplace!
    The City-Link depot in Preston really sucks. If you phone Play.com, you may be able to get them to contact the City-Link depot for you and arrange delivery somewhere else… if they can be bothered!

    • I was told I had to contact City-Link, yet City-Link’s FAQ says you have to contact the sender to do this.

      It’s just going back to the sender and I’m getting my money back. They should have used Parcelforce.

  2. Unfortunately, they will only deliver to my billing address. For “security” purposes.

    I won’t be travelling to Preston for security purposes also, so it’s going back to sender.

  3. Yes. It would be cool if they put a message on their blog to let us know what is going on and when they expect it to be fixed.

    Instead, I’ll just have to sit here at work refreshing the page every 10 minutes or so…

  4. It’s working on Tor on and off, which is f***ing weird. That’s why I asked if any other Aussies are having problems. Maybe it’s a web filter thing… maybe not, but I’m gonna get to the bottom of this!!!

  5. A used or refurbished Mac mini is available for around 300-400 bucks. If this is too much, I think you are not serious about developing iPhone/iPod/iPad software.

    • I am thinking about that. The issue is the legality of developing on a PC, but a Mac mini is on the cards for me.

      The problem is: if my friend X wants to play with my code to learn, does he have to buy a Mac especially for that reason?

      And would Apple benefit from 10x more coders?

  6. I’ve just discovered that Play.com is now using City-Link for their deliveries, and this is my last purchase from them. I live in Edinburgh, and the nearest depot (which is named Edinburgh Depot) is located in Livingston, some 15 miles away from Edinburgh with no public transport going there. So bye bye Play, and f*** Shitty-Link.

    • I have been thinking of setting up a database of courier shops and depots, to help people choose an online store based on how easily they can get its deliveries.

  7. I used DoubleCommand to remap that weird key which looks like the tab symbol, but pointing upwards, to be the much-needed ‘del’ key which all PCs rightly have.
    What is that vertical tab supposed to do anyway? I noticed it’s missing from the last MacBook Pro I saw.

  11. I’ve used every major holiday search engine over the last few days, and they all have major flaws. This one contains most of them. Here are some features on my wish list:
    1) Departure Airport: Either allow multiple selections, or have an ‘All North West Airports’ (etc.) option.
    2) Destination. Weird as it sounds, not everyone has a specific destination in mind. Other criteria may be more important, such as star rating, TripAdvisor average rating, or specific facilities offered. As most sites you’ll be searching will likely require a destination, the only solution I can think of is caching, but the bandwidth you’d need to poll all combinations would likely get you banned from using their services, or at least incur a hefty bill from your server provider/ISP.
    3) Destination: should only be populated once the departure point is known. There is no point searching, for example, Manchester to Chiang Mai, only to be told ‘No results’. If there is no direct way to fly there, remove Chiang Mai from the list of destinations once Manchester is chosen. If indirect flight options are searched for, this will get very complex to implement, I’m sure.
    4) Star rating should always be a minimum star rating. No rating is ever too high; it is only the price that can be too high.
    5) Minimum price per person seems redundant, as star rating and destination should determine that. Maximum price is much more important. Some sites have a range, which is useful.
    6) Activities available. Pretty much impossible to implement without the cooperation of all holiday companies. Need to be able to search for, and display info on, the activities available, and whether they are on-site, off-site, included in the holiday or extra.

    My current holiday search would be:

    Any North West Airport
    to
    Any Indian Ocean or Far East resort
    4* minimum
    All Inclusive
    Dinghy sailing available (preferably onsite and included in cost)
    Spa facilities available onsite (preferably included)

    I believe this is the type of search which is sadly lacking at the moment. Unfortunately, many of these features are impossible with the data made available by holiday companies.

    Now, if anyone knows where I should go in Sept/Oct, feel free to tweet me! @GregWoods

    • 1) Departure Airport: Either allow multiple selections, or have an ‘All North West Airports’ (etc.) option.
      Done – I took this out on my dev host as it was being a bit cumbersome, but now that we have a nice production server it’s gone straight back in. Good call.
      2 & 6) Probably out of the question, unfortunately.
      3) It does do this… it updates to valid routes. Some routes only have holidays on certain days, hence you’re always going to get ‘no results’ from time to time.
      4) Done – Good call.
      5) This is basically pagination and could go at the top right of the results… I dunno, I like it where it is. Implementing a range is on the to-do list!

      Thanks so much for your feedback!

  12. I’d like to point out that this site claims to be fast, yet contains tonnes of bloat.

    The use of 10 JavaScript libraries (jQuery, Cufon and their related libraries), each of which took 1-2 seconds to load from your “beefy VPS”, is quite unnecessary. Most of these libraries can be loaded from a CDN, which could reduce load times through caching and localised server farms.
    Using jQuery for auto-movement functionality on the search box is pointless. CSS2 positioning could achieve the same effect in every browser except IE6, with a lot less browser strain.

    Then there is the pointless Flash image slideshow, which reloads the 5 images every cycle.

    As well as all the flaws pointed out by Greg Woods, I would also like to point out that the search results are sorted by price, and that the user is given no option for sorting preference. It would be nice to sort by star rating, type of board, etc.

    All in all, the web design is poor, web development is sloppy and bloated, and the search has large flaws.

    • I’ve taken your advice on reducing header requests. I’ve read this in a lot of places but always thought it’d only make a difference to massive sites reducing their bandwidth and would never make a noticeable difference to clients’ load times. It’s like putting all your images in one image and loading divs with altered background positions, right? Like Google does. Nonetheless… a good suggestion is a good suggestion, so I’ve used a CDN and put everything in one JS file on my side.
      Not too worried about browser strain. The scrollfollow library is easy to use and customizable and is popular in the community. Slate it if you will, but I think it’s cute.
      I’ll pass on your comments on design to the designer, although I’m not sure how helpful he’ll find your general distaste.
      The search retains the last search options. If you want to filter by star rating, change the star rating and press search again.

      My comments on speed were in relation to the search. The data center supplying the feed is amazingly fast, and our VPS is doing the job of parsing and presenting it. This site does outperform the large comparison sites in this respect when I’ve tested it. I can’t believe there is any amount of bloat that can’t be made negligible with hardware. It’s just a web page and the net is getting pretty fast these days. Although, the thought of having my code described as bloated is a bit shameful… :S

  13. Jason – next time I’m hiring I’ll give you a call. Has anyone else found a cheaper UAT method? Costs me thousand$.

  14. I’ll admit that this IS an exciting alternative to the OLPC, but my concern is with the broader tenet: that technology plays a critical role in educating the third world masses.

    Undoubtedly, children worldwide have a right to and need for a basic education. But we don’t seem to get past the idealized notion that computers & Internet are magic bullets for that education long enough to notice that technology has NOT bolstered educational efforts here in the U.S.

    I’ve seen little to quantify that exposure to technology has had a truly positive effect on American children – in fact, it could be argued with some success that it’s actually dumbing them down.

    Perhaps we’d be better off focusing on more basic educational needs like teachers, books, and a safe place to study for developing countries. Catch that and the techie stuff will take care of itself.

  16. Aaayyyeeaayyeewaayyeee. That stuff is by far the hottest sauce I’ve ever had! Makes Tabasco taste like mayo. However, my views may be influenced by the fact I once bought it, mistaking it for sweet chilli sauce, and poured half a jar into the wok with my prawn stir-fry. I nearly died. Not one to waste good king prawns, I rinsed the whole meal through a sieve, added more soy sauce, and ate. It was still so hot my eyes bled! Let this be a lesson to you all.

  17. You mean like the dialog in KDE4? There is an input field where you just put the name of the application and it will open the file with this application.

    If you don’t know the name, there is a list of all installed applications, grouped by category like Administration, Development, Games, Internet, Multimedia, etc.

    Then if you like, you can set it as the default application for this file type.

    Here is an example
    http://www.flickr.com/photos/53303115@N02/4992814625/

  19. Wouldn’t work. The physical storage capacity of the sorting office is 3-5 days. They’d run out of room.

  22. 0.o:~$ apt-cache show highlight

    Package: highlight

    Description: Universal source code to formatted text converter

    A utility that converts sourcecode to HTML, XHTML, RTF, LaTeX, TeX, SVG, XML or terminal escape sequences with syntax highlighting. It supports several programming and markup languages. Language descriptions are configurable and support regular expressions. The utility offers indentation and reformatting capabilities. It is easily possible to create new language definitions and colour themes.

    Homepage: http://www.andre-simon.de

  24. R3ally c00l stuff.
    It saved me an hour of coding time.

    On form_dropdown()
    modified $status_options to $fruit_options

    Thank U very much!!

    • That’s what the second example is doing. The first example was the wrong way to do it, and the second shows how to put the query result into the array and use the dropdown helper.

  25. If your site gets any serious traffic at all, do not do this. Your disk I/O will shoot through the roof. Make an array in a file and load that file to populate your arrays.

    • I don’t see what makes you think that. The disk I/O is only being hit by the controller. All the loop is doing is reordering the array into a format that can be read by the dropdown helper.

    • I heard it was quite a bit slower to use Pygments over CodeRay on railscasts.org.

      But as I’m planning to create a state-based code editor, that code wouldn’t be running at the time, so I’m sure it would do better on the command line than it can on the web.

      I’ll be sure to have a look at it.

  26. Dude, first, u can append any gemset name to ur ruby version on the fly – so it’s just one line:

    rvm use ruby-1.8.7-p249@rails3

    secondly u can create a .rvmrc file in each project folder containing this command (change gemset name of course) which will automatically run when switching to this folder.

    http://rvm.beginrescueend.com/workflow/rvmrc/

    cheers!

  27. How about just typing this?

    rvm use 1.8.7@rails3 --create

    That will switch to Ruby 1.8.7 (latest patch) in the Rails 3 gemset and create the gemset if needed. Toss that in a “.rvmrc” file to have it run automatically when you enter that project.

  28. RVM will do all the gemset switching for you every time you cd into a project if you create an .rvmrc in your project directory.

    rvm --rvmrc --create RUBY@GEMSET

    You can also switch setups with the shorthand

    rvm RUBY@GEMSET

  29. I use .rvmrc files for automatic switching. And I use bundler with --path to vendor all of the project’s gems, --binstubs for keeping executables nearby, and appropriate $PATH settings. My rails3 gemset is there only for setting up new rails3 apps.

  30. Don’t take this the wrong way, but it would have been nice if you had a TL;DR version at the top so I could have skipped your train of thought.
