Merging and sorting files on Linux

There are quite a few ways to merge and sort text files on Linux, but how to go about it depends on what you are trying to accomplish: whether you simply want to put the content of multiple files into one big file, or organize it in some way that makes it easier to use. In this post, we’ll look at some commands for sorting and merging file contents and focus on how the results differ.

Using cat

If all you want to do is pull a group of files together into a single file, the cat command is an easy choice. Type “cat” and then list the files on the command line in the order in which you want them included in the merged file. Redirect the output of the command to the file you want to create. If a file with the specified name already exists, it will be overwritten by the one you are creating. For example:

$ cat firstfile secondfile thirdfile > newfile

If you want to add the content of a series of files to an existing file rather than overwrite it, just change the > to >>.

$ cat firstfile secondfile thirdfile >> updated_file
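
To see the difference between > and >> side by side, here is a small sketch that works in a scratch directory. The file contents are made up for illustration, and both target files start out holding a line of old content.

```shell
# Work in a scratch directory so nothing real is overwritten.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

# Hypothetical sample files, one line each.
echo "first"  > firstfile
echo "second" > secondfile
echo "old content" > newfile
echo "old content" > updated_file

# ">" replaces whatever newfile held before.
cat firstfile secondfile > newfile

# ">>" keeps the existing content and adds to the end.
cat firstfile secondfile >> updated_file

echo "--- newfile";      cat newfile
echo "--- updated_file"; cat updated_file
```

After this runs, newfile holds only the two merged lines, while updated_file still begins with its old line.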

If the files you are merging follow some convenient naming convention, the task can be even simpler. You won’t have to include all of the file names if you can specify them using a shell wildcard pattern. For example, if the files all end with the word “file” as in the example above, you could do something like this:

$ cat *file > allfiles

Note that the command shown above will add file contents in alphanumeric order. On Linux, in a typical locale such as en_US.UTF-8, a file named “filea” would be added before one named “fileA”, but after one named “file7”. After all, we don’t just have to think “ABCDE” when we’re dealing with an alphanumeric sequence; we have to think “0123456789aAbBcCdDeE”. The exact order depends on your locale settings, so you can always use a command like “ls *file” to view the order in which the files will be added before merging them.

NOTE: It’s a good idea to first make sure that your command includes all of the files that you want in the merged file and no others, especially when you’re using a wildcard like “*”. And don’t forget that the merged files will still exist as separate files, which you might want to delete once the merge has been verified.
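
One way to follow that advice is to expand the wildcard once, look at the result, and then merge exactly that list. The sketch below does this with hypothetical file names (file7, fileB, filea) and forces the C locale, which is an assumption made only so the order is predictable (digits, then uppercase, then lowercase). In other locales the order may differ.

```shell
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

# Assumption: force the C locale so the expansion order is predictable.
LC_ALL=C
export LC_ALL

# Hypothetical file names chosen to show the ordering.
for name in filea fileB file7; do
    echo "from $name" > "$name"
done

# Expand the wildcard once and keep the result in the positional parameters.
set -- file?

# Preview exactly which files will be merged, and in what order.
printf 'will merge: %s\n' "$@"

# Merge the same list that was just previewed. The output name does not
# match the pattern, so it cannot be picked up as an input.
cat "$@" > merged.out
cat merged.out
```

Because the preview and the merge use the same expanded list, a file created in between cannot slip into the result unnoticed.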

Merging files by age

If you want to merge your files based on the age of each file rather than by file names, use a command like this one:

$ for file in `ls -tr myfile.*`; do cat "$file" >> BigFile.$$; done

Using the -tr options (t=time, r=reverse) will result in a list of files in oldest-first age order. This can be useful, for example, if you’re keeping a log of certain activities and want the content added in the order in which the activities were performed.

The $$ in the command above represents the process ID of the shell in which you run the command. It’s completely unnecessary to use this, but it makes it nearly impossible that you will inadvertently add onto the end of an existing file instead of creating a new one. If you use $$, the resultant file might look like this:

$ ls -l BigFile.*
-rw-rw-r-- 1 justme justme   931725 Aug  6 12:36 BigFile.582914
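
Here is a self-contained sketch of the age-based merge. The three files and their contents are hypothetical, and their modification times are set by hand with touch -t so that the age order clearly differs from the name order.

```shell
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

# Hypothetical log pieces. Their names sort one way, their ages another.
echo "step 1" > myfile.c
echo "step 2" > myfile.a
echo "step 3" > myfile.b

# Set modification times explicitly (format: CCYYMMDDhhmm) so the example
# does not depend on how quickly the files were created.
touch -t 202008010900 myfile.c
touch -t 202008011000 myfile.a
touch -t 202008011100 myfile.b

# $$ is the process ID of the running shell, which gives a fresh file name.
target="BigFile.$$"

# ls -tr lists oldest first. Splitting ls output into words is fine for
# simple names like these but would break on names containing spaces.
for file in $(ls -tr myfile.*); do
    cat "$file" >> "$target"
done

cat "$target"
```

The merged file lists the steps in the order the files were last modified, not in the order of their names.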

Merging and sorting files

Linux provides some interesting ways to sort file content before or after the merge.

Sorting content alphabetically

If you want the merged file content to be sorted, you can sort the overall content with a command like this:

$ cat myfile.1 myfile.2 myfile.3 | sort > newfile

If you want to keep the content grouped by file, sort each file before adding it to the new file with a command like this:

$ for file in `ls myfile.?`; do sort "$file" >> newfile; done
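
The difference between the two approaches is easiest to see on a tiny example. This sketch uses two hypothetical files and forces the C locale (an assumption, for predictable ordering). It also loops over the wildcard directly instead of going through ls, which gives the same result for simple names like these.

```shell
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1
LC_ALL=C; export LC_ALL   # assumption: C locale for predictable ordering

# Hypothetical inputs, deliberately unsorted.
printf 'pear\napple\n'   > myfile.1
printf 'zebra\nbanana\n' > myfile.2

# Sort everything together: lines from both files are interleaved.
cat myfile.1 myfile.2 | sort > all_sorted

# Sort each file on its own, then append: lines stay grouped by file.
for file in myfile.?; do
    sort "$file" >> grouped_sorted
done

echo "--- all_sorted";     cat all_sorted
echo "--- grouped_sorted"; cat grouped_sorted
```

In all_sorted, “banana” lands between “apple” and “pear”. In grouped_sorted, it stays with the other line from myfile.2.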

Sorting files numerically

To sort file contents numerically, use the -n option with sort. This option is useful only if the lines in your files start with numbers. Keep in mind that, in the default order, “02” would be considered smaller than “1”. Use the -n option when you want to ensure that lines are sorted in numeric order.

$ cat myfile.1 myfile.2 myfile.3 | sort -n > xyz
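
As a quick illustration of how the two orderings differ, this sketch sorts the same made-up list of numbers both ways. The C locale is set on each sort call as an assumption to keep the default ordering predictable.

```shell
# Hypothetical input: one number per line, including a zero-padded value.
numbers=$(printf '10\n9\n02\n1\n')

# Default sort compares the lines as text, character by character.
text_order=$(printf '%s\n' "$numbers" | LC_ALL=C sort)

# -n compares the lines by numeric value instead.
numeric_order=$(printf '%s\n' "$numbers" | LC_ALL=C sort -n)

echo "--- default order"; echo "$text_order"
echo "--- numeric order"; echo "$numeric_order"
```

In the default order, “02” comes first and “9” comes last. With -n, the values run from 1 up to 10.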

The -n option also allows you to sort file contents by date if the lines in the files start with dates in a format like “2020-11-03” or “2020/11/03” (year, month, day format). This works because the fields are zero-padded and ordered from largest unit to smallest, so plain text order already matches date order. Sorting by dates in other formats will be tricky and will require far more complex commands.
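
A minimal sketch of date sorting, using made-up log lines that begin with year-month-day dates:

```shell
# Hypothetical log lines that begin with zero-padded, year-first dates.
entries=$(printf '%s\n' \
    '2020-11-03 entry C' \
    '2019-12-25 entry A' \
    '2020-01-15 entry B')

# sort -n reads the leading number (the year). Lines with the same year
# are then ordered by comparing the whole line, which puts the months
# and days in order because they are zero-padded.
by_date=$(printf '%s\n' "$entries" | LC_ALL=C sort -n)

echo "$by_date"
```

The entries come out oldest first, from the 2019 line to the November 2020 line.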

Using paste

The paste command allows you to join the contents of files on a line-by-line basis. When you use this command, the first line of the merged file will contain the first line of each of the files being merged. Here’s an example in which I’ve used capital letters to make it easy to see where the lines came from:

$ cat file.a
A one
A two
A three

$ paste file.a file.b file.c
A one   B one   C one
A two   B two   C two
A three B three C thee
        B four  C four
                C five

Redirect the output to another file to save it:

$ paste file.a file.b file.c > merged_content
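
When the files have different lengths, paste leaves the shorter file’s column empty. The sketch below uses two hypothetical files and turns each tab into a “|” so the empty column is visible.

```shell
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

# Hypothetical inputs of different lengths.
printf 'A one\nA two\n'          > file.a
printf 'B one\nB two\nB three\n' > file.b

# Columns are separated by a tab. Once file.a runs out, its column is
# left empty, so the last row starts with a tab.
paste file.a file.b > merged_content

# Show the tabs explicitly by turning each one into a "|".
tr '\t' '|' < merged_content
```

The last row begins with the separator, which shows that file.a had nothing left to contribute.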

Alternatively, you can paste files together such that the content of each file is joined in a single line. This requires use of the -s (serial) option. Notice how the output this time shows each file’s content on its own line:

$ paste -s file.a file.b file.c
A one   A two   A three
B one   B two   B three B four
C one   C two   C thee  C four  C five
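
The -s option can be combined with -d, which replaces the default tab separator with a character of your choice. This sketch uses two hypothetical files and a comma as the separator.

```shell
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

# Hypothetical inputs.
printf 'A one\nA two\nA three\n' > file.a
printf 'B one\nB two\n'          > file.b

# -s writes one output line per input file; -d swaps the default tab
# separator for another character (a comma here).
paste -s -d ',' file.a file.b > one_line_per_file

cat one_line_per_file
```

Each input file becomes a single comma-separated line in the output.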

Using join

Another command for merging files is join. The join command allows you to merge the content of two files based on a common field. For example, you might have one file that contains phone numbers for a group of coworkers and another that contains their personal email addresses, and they’re both listed by the individuals’ names. You can use join to create a file with both phone numbers and email addresses.

One important restriction is that the files must have their lines listed in the same order and include the join field in each file.

Here’s an example command:

$ join phone_numbers email_addresses
Sandra 555-456-1234 [email protected]
Pedro 555-540-5405
John 555-333-1234 [email protected]
Nemo 555-123-4567 [email protected]

In this example, the first field (first names) must exist in each file even if the additional information is missing, or the command will fail with an error. Sorting the contents is helpful and probably a lot easier to manage, but is not required as long as the order is consistent.
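
If you’d rather not rely on the two files happening to share the same order, one approach is to sort both on the join field first. The sketch below does that with made-up names, phone numbers and example.com addresses, and also shows the -a 1 option, which keeps lines from the first file that have no match in the second. The C locale is an assumption here, chosen so that sort and join agree on ordering.

```shell
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1
LC_ALL=C; export LC_ALL   # assumption: same ordering rules for sort and join

# Hypothetical data; the names, numbers and addresses are made up.
printf 'Sandra 555-0101\nPedro 555-0102\nJohn 555-0103\n'   > phone_numbers
printf 'John john@example.com\nSandra sandra@example.com\n' > email_addresses

# Sort both files on the first field, which join uses as its key by default.
sort -k 1,1 phone_numbers   > phones.sorted
sort -k 1,1 email_addresses > emails.sorted

# Default behaviour: only names found in both files are printed.
join phones.sorted emails.sorted > matched

# -a 1 also prints lines from the first file that have no match.
join -a 1 phones.sorted emails.sorted > everyone

echo "--- matched";  cat matched
echo "--- everyone"; cat everyone
```

With -a 1, Pedro still appears in the output with just his phone number, even though the email file has no line for him.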

Wrap-Up

You have a lot of options on Linux for merging and sorting data stored in separate files. These choices can make some otherwise tedious tasks surprisingly easy.

Copyright © 2020 , Inc.
