Most clusters run a Linux operating system (OS). Other examples of operating systems are Windows and macOS. Linux operating systems are also known as “distributions” because they are essentially collections of software packages that are “distributed” to users. Popular Linux distributions include Ubuntu, Fedora, Red Hat, CentOS, and openSUSE, but there are many more (see for example distrowatch for an overview of all distributions).
The (mis)use of the name
Within this software collection, almost always a package called Linux is found, which is the “kernel”. This particular piece of software is very important, because it handles the connection between the hardware and all the other pieces of software. However, when people speak about Linux, they typically mean a distribution containing Linux. It is like calling an airplane by the name of its engine, which is a bit awkward, but this is just how it is.
Most users never interact with the kernel directly; they experience the pieces of software that provide the user interface (UI). UIs come in two flavours: the Graphical User Interface (GUI) and the Command Line Interface (CLI). When you install a Linux distribution on your own computer, it typically comes with a GUI or desktop environment, e.g. GNOME or KDE. Clusters typically offer only a CLI, which is basically a terminal (“window”) that presents a prompt where you can type a command.
Linux file structure
- Directories (folders) are delimited with a / (instead of a
- The topmost directory (or root) is called /. Hard drives, other media, or even remote file systems can be mounted anywhere. For example, a USB drive is commonly mounted at /media/mystick. In contrast to Windows where each drive has a different name in the file tree (e.g.
- All characters can be used in directory and file names, but it is best not to use exotic characters (e.g.
- File (and directory) names starting with a . are hidden, and are not visible by default.
- Files (and directories) have owners and permissions, preventing misuse or accidental removal.
- Each user has his or her own home folder, which is typically located at /home/username.
To interact with the Linux operating system a Shell is used. In this command line environment, commands given by the user are interpreted by the Shell. Several Shells exist, each with its own syntax and built-in commands. One of the most popular is the Bash-shell.
Before introducing several features of the Bash-shell it is useful to discuss the basic controls. In principle the only form of control is through the keyboard. The exception is copying and pasting (parts) of commands, which is exclusively done using the mouse.
Ctrl+v, etc. have a different meaning (see below). Specifically, highlighted text is automatically copied to the clipboard. It is pasted using the middle mouse button. Alternatively, the paste command can be reached through the right mouse button. Several other basic controls are listed below.
Each command follows a prompt that is displayed by the terminal. For example:
A command is followed by return to execute the command.
TAB, Bash will try to auto-complete your typed command; pressing TAB twice will print auto-complete suggestions.
To stop a command, Ctrl+c is used. It is advised not to use Ctrl+s which, respectively, sleeps or freezes a command.
To move the location of the cursor (by definition only possible inside the current command) the keys , , End can be used.
To move through the history the keys and are used. Alternatively, the history can be searched using Ctrl+r followed by keywords. To progress through the selection, use Ctrl+r. Note that pressing return executes the selected command directly. Use the to edit the selected command instead.
Ctrl+d is used, this is equivalent to typing
In general a command consists of three parts: the command, options, and input arguments. Without going into detail, we consider an example. The command
[[email protected] ~]$ tar -czp -f outputname.tar.gz foldername
creates a compressed archive. This command can be divided as follows
prompt $ command <options> arguments
# prompt:   [[email protected] ~]$
# command:  tar
# options:  -czp -f outputname.tar.gz
# argument: foldername
From this, we observe that different parts of the command are separated by spaces. Also, we observe that options begin with a “-”. Furthermore, some options require an argument. As is observed for the -f option, the argument directly follows the option. Finally, options are commonly combined. In the command above the options -c, -z, and -p are grouped to -czp.
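As a small sketch of how options are grouped and how an option takes its own argument, the following builds an archive twice, once with grouped options and once with every option written separately, and then lists the contents of both. The names demo_folder, grouped.tar.gz, and separate.tar.gz are hypothetical and chosen only for illustration; everything happens in a temporary directory.

```shell
#!/bin/bash
# Work in a throw-away directory so that no existing files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical example folder containing a single file.
mkdir demo_folder
echo "some data" > demo_folder/data.txt

# Grouped options: -c (create), -z (gzip), -p (preserve permissions).
# The -f option takes the archive name as its argument.
tar -czp -f grouped.tar.gz demo_folder

# The same command with each option written separately.
tar -c -z -p -f separate.tar.gz demo_folder

# List the contents of both archives (-t lists instead of creating).
grouped_listing=$(tar -tzf grouped.tar.gz)
separate_listing=$(tar -tzf separate.tar.gz)
echo "$grouped_listing"
```

Both archives should list the same entries, which illustrates that grouping is only a shorthand.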
Most commands have a manual page. This page is found using
[[email protected] ~]$ man commandname
This opens a simple text viewer. Using the / , PageDown, and the scroll wheel on the mouse one can scroll through the manual page. To search the manual use / followed by your query, and n to progress through the search results. To close the editor type
man command prompts accept the same commands as the
Alternatively (or sometimes exclusively), a (short) manual page can often be printed to the screen. This is provided by the command itself, i.e.
Several useful commands are listed, the most important ones are elaborated in the following sections.
| Command | Description |
|---|---|
| pwd | print the current working directory |
| ls | list directory contents |
| du | report disk usage of files |
| find | search and find files |
| mkdir | make a directory |
| cp | copy files (and directories with the -r option) |
| mv | move (rename) files and directories |
| rm | remove files (and directories with the -r option) |
| cat | concatenate files and print on the standard output |
| head | print the first few lines of a file |
| tail | print the last few lines of a file |
| grep | Globally search a Regular Expression and Print, use this for simple output filtering |
| less | a text-file viewer |
| vi | a text-file editor |
| top | display Linux tasks |
| ps | report a process status list |
| which | shows the full path of (shell) commands |
| chmod | change file’s permissions |
The change directory (cd) command can be used to navigate through the file tree by changing the current directory. Let us use an example of a file tree such as displayed above. Typically the terminal will start in the user’s home folder:
where the current directory is indicated between brackets: [ ... ]. Notice that [ ~ ] is the abbreviation of [ /home/username ]. We can now change directory by typing
where the change of directory is specified in absolute sense. Alternatively, we can use a relative file path to do the same. In a relative file path definition use
./ to denote the current directory
../ to denote the directory one level up
../../ to denote the directory two levels up
The previous command could therefore also be specified as follows
./ is not strictly necessary, i.e.
is equivalent. If we would now like to change the directory to ~/sim/sub2 we could use a relative path definition:
Notice that it is convenient to use relative file paths inside code, as they do not depend on where the surrounding file tree is located. For example, if ../sub2/ had been included in a code, the code is not influenced by changing test. In contrast, if we had used an absolute path, the code would fail. This is particularly important when running the same code or script on different machines (running on different platforms), such as a desktop computer and a cluster.
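The difference between absolute and relative paths can be tried out safely in a scratch tree. The sketch below assumes a layout with a sim directory holding sub1 and sub2, created under a temporary directory rather than the home folder; the variable names base, here, and still_here are introduced only for this example.

```shell
#!/bin/bash
# Build a small throw-away tree: <tmp>/sim/sub1 and <tmp>/sim/sub2.
base=$(mktemp -d)
mkdir -p "$base/sim/sub1" "$base/sim/sub2"

# Absolute path: valid regardless of the current directory.
cd "$base/sim/sub1" || exit 1
pwd

# Relative path: one directory up, then down into sub2.
cd ../sub2 || exit 1
here=$(basename "$PWD")
echo "now in: $here"

# ./ denotes the current directory, so this changes nothing.
cd ./ || exit 1
still_here=$(basename "$PWD")
```

The relative step ../sub2 keeps working if the whole sim tree is moved elsewhere, whereas the absolute path would have to be updated.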
The contents (files and directories) of the current directory are listed in “matrix” format using
[[email protected] ~]$ ls
Depending on the shell and the terminal that are used, executable files, files, and folders are highlighted differently. By specifying (optional) input arguments, the contents of directories other than the current directory are listed. For the example above
[[email protected] ~]$ ls ~/sim/sub1
would list one file, output.log.
More detailed file information can be obtained using the -lh option. For example
[[email protected] ~]$ ls -lh ~/sim/sub1
would output for example
-rw-rw-r-- 1 exuser exgroup 26K Sep 18 11:57 output.log
whereby the columns indicate:
- time/date modified
Or more specifically
In Linux each file/directory/link has permissions. In the output of ls -l these permissions break down as follows:
a. -   -/d/l
b. rw- user
c. rw- group
d. r-- other
Herein, the first item specifies if the item is a file (-), a directory (d), or link (l). The next three groups specify the permissions of the file’s owner, its group (both specified in 3.), and other users. Herein r corresponds to read permission, w to write permission, and x to execute permission. In this case the user exuser is allowed to read and write the file. The same permission resides with users in the group exgroup, while other users may only read the file.
From this it follows that an executable in Linux is nothing more than a file (e.g. plain text) with the right permissions. The extension is in principle meaningless. The file can be made executable using the command chmod, e.g.
[[email protected] ~]$ chmod u+x output.log
More information is found online.
The permissions can be directly specified (instead of added or removed) using a numerical notation:
- 4 = r (read)
- 2 = w (write)
- 1 = x (execute)
The desired permissions are set by adding the numerical value of those permissions you would like to allow. For example:
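A minimal sketch of the numeric notation, run on a scratch file (the file name demo.txt is hypothetical): each of the three digits is the sum of the permissions granted to the user, the group, and other users, in that order. The last step uses the symbolic notation for comparison.

```shell
#!/bin/bash
# Use a temporary directory so that no existing files are affected.
cd "$(mktemp -d)" || exit 1
touch demo.txt

# 6 = 4+2 (rw-) for the user, 4 (r--) for the group and for others.
chmod 644 demo.txt
perms_644=$(ls -l demo.txt | cut -c1-10)
echo "$perms_644"

# 7 = 4+2+1 (rwx), 5 = 4+1 (r-x), 4 (r--).
chmod 754 demo.txt
perms_754=$(ls -l demo.txt | cut -c1-10)
echo "$perms_754"

# Symbolic notation adds a single permission instead of setting all of them.
chmod o+x demo.txt
perms_sym=$(ls -l demo.txt | cut -c1-10)
echo "$perms_sym"
```

The numeric form sets all nine permission bits at once, while the symbolic form changes only the bits that are named.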
The number of directories and links inside the item. For a file the counter is always equal to one.
The user and group name to which the file belongs.
The size of the file. Because we have used the -h option, this is in human readable format (i.e. kilo-, mega-, giga-, or terabytes).
The time and date of the last modification to the file.
The file name
The copy (cp), remove (rm), and move (mv) commands are used for file operations; directories are created using mkdir.
To copy a file:
[[email protected] ~]$ cp source destination
For example, to make a backup of the output.log file, used as an example in the previous section, in the same folder:
[[email protected] ~]$ cp ~/sim/sub1/output.log ~/sim/sub1/output.bak
If this command is issued from the ~/sim/sub1 directory, relative paths suffice:
[[email protected] sub1]$ cp output.log output.bak
If a directory is copied, the -r (recursive) option should be specified to also copy all the content of the directory. For example:
[[email protected] ~]$ cp -r ~/sim/sub2 ~/sim/sub3
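The copy commands can be tried without risk in a scratch tree. This sketch assumes the sim/sub1 and sim/sub2 layout used in the examples, recreated under a temporary directory; the file input.txt and the variable base are hypothetical additions for the demonstration.

```shell
#!/bin/bash
# Recreate the example layout under a temporary directory.
base=$(mktemp -d)
mkdir -p "$base/sim/sub1" "$base/sim/sub2"
echo "step 1 done" > "$base/sim/sub1/output.log"
echo "input" > "$base/sim/sub2/input.txt"

# Copy a single file, using relative names inside sub1.
cd "$base/sim/sub1" || exit 1
cp output.log output.bak

# -r copies the directory together with its content.
# Because sub3 does not exist yet, it becomes a copy of sub2.
cp -r ../sub2 ../sub3

ls "$base/sim/sub3"
```

Note that if the destination directory already exists, cp -r places the copy inside it instead of creating it.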
Analogous to the copy command, a file is removed using
[[email protected] ~]$ rm filename
To remove a directory use
[[email protected] ~]$ rm -r directoryname
Notice that, in principle, removed files cannot be recovered, i.e. there is no such thing as a recycle bin when removing files from the command line. For convenience, wild cards can be used. One example of a wild card is *. Simply said, the * replaces zero or more characters. For example, to remove all .log files in the ~/sim/sub1 directory:
[[email protected] sub1]$ rm *.log
which in this case would remove only output.log. In contrast, the command
[[email protected] ~]$ rm -r ~/sim/sub*
would remove all the directories beginning with sub, which in this case would be both the directories sub1 and sub2, including all their content.
Never use the command
[[email protected] ~] $ rm -r *.*
since it permanently removes every non-hidden file and directory in the current directory whose name contains a dot, including all the content of those directories. Patterns that begin with a dot are riskier still: depending on the shell, .* can also match .. (the parent directory), so a recursive command may act on higher directories (current versions of rm refuse to remove . and .. themselves, but other recursive commands offer no such protection). This mistake is typically made by DOS users, where *.* has a different meaning. In a Linux environment, rm -r * is usually the intended command, i.e. empty the current directory.
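A cautious habit is to look at what a wild card matches before handing it to rm. The sketch below stores the matches in a Bash array and prints them instead of deleting anything; the file names and the variable names matches and match_count are hypothetical, and default Bash settings (no dotglob option) are assumed.

```shell
#!/bin/bash
# Scratch directory with a few files, one hidden file, and one directory.
cd "$(mktemp -d)" || exit 1
touch output.log notes.txt README .hidden
mkdir data.d

# Expand the pattern into an array instead of passing it to rm.
matches=( *.* )
printf 'would remove: %s\n' "${matches[@]}"
match_count=${#matches[@]}
```

Only once the printed list is what you expect should the same pattern be used with rm. Replacing rm by echo or ls in front of the pattern gives the same preview.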
To move a file to a different location (or to rename a file) the following command is used (for files and directories)
[[email protected] ~]$ mv source destination
For example, to rename the output.log file to output.txt:
[[email protected] sub1]$ mv output.log output.txt
To move this file to the sub2 directory:
[[email protected] sub1]$ mv output.log ../sub2/output.txt
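Renaming and moving can be combined or done in two steps. The following sketch performs them one after the other in a temporary copy of the example layout (the variable base is hypothetical), so that the effect of each step is visible.

```shell
#!/bin/bash
# Recreate the example layout under a temporary directory.
base=$(mktemp -d)
mkdir -p "$base/sim/sub1" "$base/sim/sub2"
echo "log line" > "$base/sim/sub1/output.log"
cd "$base/sim/sub1" || exit 1

# Step 1: rename the file within the same directory.
mv output.log output.txt

# Step 2: move it to the sibling directory, keeping its name.
mv output.txt ../sub2/

ls ../sub2
```

After both steps the file exists only in sub2; unlike cp, mv leaves no copy behind.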
Redirecting output is a powerful capability of (among others) Bash. This way the output that is printed to standard output (i.e. the screen) can be intercepted and used differently. The output can be transferred to another command using |, or it can be stored to a file using > or appended to a file using >>.
For example, to find the lines in which error messages are included in the file output.log, we could use:
[[email protected] sub1]$ cat output.log | grep -n "error"
The cat command outputs the contents of the output.log file. The | intercepts this output and forwards it to the grep command, which prints the lines matching the pattern error (including the line numbers, because of the -n option).
These lines can be stored to a file
[[email protected] sub1]$ cat output.log | grep -n "error" > error.log
To get the current directory as the top line of the file, we do
[[email protected] sub1]$ pwd > error.log
which empties or creates the file error.log and writes the current working directory to it. The error lines are then appended to the file by
[[email protected] sub1]$ cat output.log | grep -n "error" >> error.log
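The whole sequence can be reproduced on a small, made-up log file. In this sketch the contents of output.log are hypothetical, and grep reads the file directly, which gives the same result as piping it through cat.

```shell
#!/bin/bash
cd "$(mktemp -d)" || exit 1

# Hypothetical log file with two lines that contain "error".
printf '%s\n' "start" "error: disk full" "running" "error: timeout" > output.log

# > creates the file, or empties it if it already exists.
pwd > error.log

# >> appends to the file instead of overwriting it.
grep -n "error" output.log >> error.log

# Inspect the result: one line from pwd plus two matching lines.
line_count=$(grep -c "" error.log)
second_line=$(sed -n '2p' error.log)
cat error.log
```

Using > for the second command as well would have overwritten the directory line, which is why the order of > and >> matters here.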
As a final note, the Bash shell considers two outputs, stdout and stderr. Any program can write to these outputs, and typically both are shown in the terminal window. It is possible to redirect each output differently, but this is considered outside the scope of this document.
Bash commands, some of which are introduced above, can be combined in a script. Such a script is an executable plain-text file. Below, we consider a very simple script myscript. We first make the file and give the user executable permissions, e.g. by
We then edit the file’s contents to
#!/bin/bash
#
# This is a very simple script
varname="Hello world"
echo $varname
In this script, the first line selects the interpreter that runs the script, in this case Bash. Except for this shell definition on the first line, anything that follows a # is a comment and is not evaluated. The last two lines are the only lines of code. In the first of these, the string "Hello world" is assigned to the variable varname. In the second, the echo command prints the value of varname, and thus "Hello world", to the screen. The variable name is preceded by a $ to obtain the value of the variable.
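The steps of creating, enabling, and running such a script can be sketched as follows. Writing the file with a here-document is an assumption made to keep the example self-contained (an editor works just as well), and the variable script_output is introduced only for this example.

```shell
#!/bin/bash
cd "$(mktemp -d)" || exit 1

# Write the script to a file. The quoted 'EOF' prevents $varname
# from being expanded while the file is being written.
cat > myscript << 'EOF'
#!/bin/bash
#
# This is a very simple script
varname="Hello world"
echo "$varname"
EOF

# Give the user execute permission, then run the script.
chmod u+x myscript
script_output=$(./myscript)
echo "$script_output"
```

The ./ in front of the name is needed because the current directory is normally not searched for commands.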
If a script is often used, it can be useful to make it a “global” script, such that it can be used in the same way as for example cd. To this end, it is common to create a directory bin in the home folder:
[[email protected] ~]$ mkdir ~/bin
Next, Bash has to look for executable files in this directory. To this end, we add the new directory to the PATH variable:
[[email protected] ~]$ export PATH=$HOME/bin:$PATH
$HOME is equivalent to
Beware that copy/pasting code from this page may not transfer correctly.
To avoid having to specify this after every new login, this (and other commands) can be added to the file ~/.bashrc. This file is read each time an interactive Bash shell starts. It is commonly of the following format:
# .bashrc
# Source global definitions
if [ -f /etc/bashrc ]; then
    . /etc/bashrc
fi
export PATH=$HOME/bin:$PATH
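Because this file may be read more than once in a session (for example in nested shells), the same directory can end up in PATH several times. One possible refinement, offered as a sketch rather than a required setup, is to add the directory only when it is not yet present; the variable path_after_first exists only to demonstrate the effect.

```shell
#!/bin/bash
# Prepend $HOME/bin to PATH only if it is not already listed.
# Wrapping PATH in colons makes the first and last entries match too.
case ":$PATH:" in
    *":$HOME/bin:"*) ;;                      # already present: do nothing
    *) export PATH="$HOME/bin:$PATH" ;;      # otherwise prepend it
esac
path_after_first=$PATH

# Running the same check a second time leaves PATH unchanged.
case ":$PATH:" in
    *":$HOME/bin:"*) ;;
    *) export PATH="$HOME/bin:$PATH" ;;
esac
```

The plain export line shown above works as well; the guard merely keeps PATH from growing with duplicates.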