Frequently asked questions
How can I get time on Melbourne Bioinformatics (formerly VLSCI) clusters?
Please use our application form.
How do I see how much disk space I have used, or have left?
Use the command mydisk to see how much disk space is used and how much is left in your home directory. If you are in several projects, all of them are listed.
I cannot find the application or binary I need, but I believe it is installed.
Melbourne Bioinformatics uses a tool called modules to set up your environment for any particular application. While we arrange for some modules to be loaded by default, you will probably need to load the ones you want to use yourself. You can see a full list of applications supported by modules by typing:
module avail
You can then choose to load or unload the default version of one of those modules. For example, to load gcc, use:
module load gcc
You can pick a specific version of the software by specifying the full name. For example:
module load gcc/4.4.3
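As a rough sketch of a typical session, the function below lists the gcc versions on offer, loads the default, shows what is loaded, and unloads it again. The function name show_gcc_modules is hypothetical, and the guard at the top is only there so the snippet behaves sensibly on a machine that has no modules installed; the versions you see will depend on the system.

```shell
# Hypothetical helper: walk through a typical modules session for gcc.
show_gcc_modules() {
    if ! command -v module >/dev/null 2>&1; then
        echo "module command not found: run this on a cluster login node"
        return 1
    fi
    module avail gcc      # list the gcc versions that modules knows about
    module load gcc       # load the default version
    module list           # show everything currently loaded
    module unload gcc     # remove gcc from your environment again
}
```

Run show_gcc_modules at the prompt after logging in. The same four subcommands work for any other module name.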
The software I need to use has some license restrictions and I find I cannot use it.
Some applications installed on the Melbourne Bioinformatics systems are subject to license restrictions. These restrictions may limit who can use them (e.g. academic use only) or require you to cite the application when you publish work that used it. When you sign your personal account application form, you agree to comply with any such restrictions. Melbourne Bioinformatics keeps a register of specific applications and of who has formally agreed to the terms that relate to each one. Please send an email to the help desk to be added to that register. You need to state clearly that you have read the license restrictions and agree to comply with them.
Unfortunately, the restrictions associated with some applications prevent some people from being allowed to use them at all.
I am in two (or more) different projects. How do I handle that?
Melbourne Bioinformatics resources are made available to projects rather than to individuals. Normally, your personal home directory sits within the project disk space. However, if you are associated with more than one project, that home directory can only exist within one of the project spaces. This means that before you log in you need to let the system know which project you want to use. All usage and permissions will be set up based on the project the system thinks you are logged in under.
To change your default project, log in to the Melbourne Bioinformatics project management site, select My Projects, and click Make default on the project you want to log in under. You need to do this the first time you log in to a new project so that a new home directory is set up for you.
For example, assume that you are in two projects, VR5200 and VR9999, and your user name is jsmith. Your home directory is probably under /vlsci/VR5200, but you are now working on VR9999. When you log on, you will find (using the pwd command) that you are in /vlsci/VR5200/jsmith. If you save work from VR9999 in this location, your disk usage will be taken from VR5200 and any queued jobs will also be charged to your default project.
To avoid allocating disk and CPU usage to the wrong account, you can use the online tool to change your default project before logging in. It is possible to work with multiple projects without changing your default project, but you will need to be careful with group permissions and keep track of which project you are charging jobs to. The safest way to launch jobs is to specify explicitly which project to charge. For instance, when you launch jobs on the cluster, add the line #SBATCH -A VR9999 to your job script so the CPU usage is assigned to the correct project.
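A minimal job script along these lines might look like the sketch below, which charges the job to VR9999 from the example above. The walltime value is purely illustrative. SLURM_JOB_ACCOUNT is the variable Slurm sets inside a job to name the account being charged; the fallback text is only there so the script can also be run by hand outside the scheduler.

```shell
#!/bin/bash
# Charge this job to VR9999 rather than to the default project.
#SBATCH -A VR9999
# Illustrative walltime request of ten minutes.
#SBATCH --time=00:10:00

# Inside a job, Slurm exports the charged account as SLURM_JOB_ACCOUNT.
# Outside a job the variable is unset, so fall back to a placeholder.
charged_to="${SLURM_JOB_ACCOUNT:-not running under Slurm}"
echo "Account charged: ${charged_to}"
echo "Working directory: $(pwd)"
```

Submitting this with sbatch and checking the first line of the job's output is a quick way to confirm that the usage went to the project you intended.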
Why do I not have a home directory?!
There are two situations where you might be asking this question.
You're a new user.
You've just created a Melbourne Bioinformatics account. You want to upload some files (e.g. using WinSCP or another file transfer application) that you'll subsequently use in your compute jobs. But when you attempt to upload them you can't find a home directory for your account, so there is nowhere to upload your files.
When Melbourne Bioinformatics accounts are created, a corresponding home directory is not created immediately. The home directory is created upon your first login to a Melbourne Bioinformatics system using ssh. On Mac or Linux you can connect by typing the following into any terminal application:
ssh <yourusername>@<hostname>
On Windows, you can use PuTTY to connect via ssh.
During your first login via ssh, you will see the following message:
Could not chdir to home directory /vlsci/<yourproject>/<yourusername>: No such file or directory
Home directory created
The first line shows that your shell initialisation scripts couldn't find your home directory, and the second shows that it was subsequently created. Once this is done, you may exit your ssh session (type exit) and then resume uploading files.
The second possible scenario:
You've joined an additional project.
You've already been a member of one project (say, PROJ1) and have a home directory at /vlsci/PROJ1/<yourusername>. You've now joined a new project (PROJ2), but you don't find a home directory for yourself at /vlsci/PROJ2/<yourusername>, even after logging in via ssh. Where is your home directory for PROJ2?
On our systems, your home directory is considered to be inside your default project. When you are in only one project, that project must be the default. Once you join a second (or subsequent) project, you must then decide which you wish to be your default.
To check or adjust your default project:
* Go to https://my.vlsci.org.au/karaage/profile/accounts/
* Click your username in the "Account" column.
* At the bottom of the page your projects are listed. Click "Make Default" on the row of the project that is to be your default.
Once you set a project to be your default, at your next login your home directory will be considered to be inside that project. For example, if you set your default project to be PROJ2 and there is no home directory at /vlsci/PROJ2/<yourusername>, then this directory will be created at your next login.
Note that if you change your default project, and thus your home directory location, you won't see any settings you have stored in your previous home directory, for example your personal .ssh/config or your .vimrc. You may want to copy whichever of these files is important to you from your previous home directory to your new one so you can keep the settings you've configured.
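One way to carry those settings across is sketched below. The function name copy_settings is hypothetical, and it only handles the two files mentioned above; it deliberately refuses to overwrite anything that already exists in the new home directory.

```shell
# copy_settings OLD_HOME NEW_HOME  (hypothetical helper)
# Copies .vimrc and .ssh/config when they exist in OLD_HOME,
# and never overwrites a file already present in NEW_HOME.
copy_settings() {
    local old_home="$1" new_home="$2"

    if [ -f "$old_home/.vimrc" ] && [ ! -e "$new_home/.vimrc" ]; then
        cp -p "$old_home/.vimrc" "$new_home/.vimrc"
    fi

    if [ -f "$old_home/.ssh/config" ] && [ ! -e "$new_home/.ssh/config" ]; then
        mkdir -p "$new_home/.ssh"
        chmod 700 "$new_home/.ssh"    # ssh expects this directory to be private
        cp -p "$old_home/.ssh/config" "$new_home/.ssh/config"
    fi
}

# Example with the PROJ1 and PROJ2 paths used above:
# copy_settings "/vlsci/PROJ1/$USER" "/vlsci/PROJ2/$USER"
```

The example call is left commented out so that nothing is copied until you have checked both paths.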
See also changing your default project.
Warning: in some scenarios you may have queued jobs which reference paths inside your home directory. If you change your default project and thus your home directory before these jobs begin to run, you may see unexpected results.
Running jobs under the scheduler
My job is taking longer than I expected. It might run out of walltime.
Jobs that run out of walltime will be automatically killed by the scheduler. As soon as you think this might happen, send a message to firstname.lastname@example.org telling us the job number and how much extra time it might need. We can sometimes extend the walltime and will do so if we can.
My job won't run, what is the problem?
There are a number of potential problems.
A common problem is to have scheduled a job on a machine for which your project does not have permission to run jobs.
I know I have permission for the machine but my job is still deferred: Melbourne Bioinformatics puts a limit on how many jobs and CPUs you can use simultaneously. Use the mylimits command to see what those limits are.
I don't have many jobs running and am hardly using any CPUs but my job is still deferred: If you have requested special hardware (e.g. a lot of memory, or many CPUs on the same node), you may simply have to wait a bit longer until those resources become available. Occasionally, people ask for resources that will never become available, such as more cores on one node, or more RAM, than any node has. The scheduler, ever patient, will keep trying to run your jobs in the vain hope that we add some more hardware. Not a good idea! Please scancel the job and reconfigure it.
When will my job start? If you have jobs that are not starting, run squeue -u username --start (replace username with your username). This will list all your pending jobs with their scheduled start times. Only the first five non-blocked jobs per user are considered for scheduling, so you will see at most five of your jobs with a scheduled start time. If none of your pending jobs has a start time, please investigate the other issues mentioned here.
My job is stuck in the Idle queue and there are plenty of CPUs unused: Sadly, sometimes having a suitable number of CPUs available is not enough. The scheduler manages a number of resources as well as CPU, most importantly memory. If a running job is using a lot of the memory on a node, it can leave none for the other CPUs. This means that those other CPUs cannot be used productively. Smart users tune their memory requirements so they can sneak in and use these idle CPUs, but if your job really needs more memory than is available there is not much that can be done about it.
Please don't hesitate to ask the help desk what is going on. It is quite possible that there is a problem that’s easily fixed if you bring it to our attention, or that we can suggest an alternative and possibly more productive way to run your jobs.
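Before contacting the help desk, it can help to gather the scheduler's own view of your pending jobs. The sketch below wraps two squeue calls in a function; the name why_pending is hypothetical, and the output format string is just one reasonable choice (job ID, name, state, and the reason the scheduler gives for holding the job).

```shell
# why_pending [USERNAME]  (hypothetical helper; defaults to the current user)
why_pending() {
    local user="${1:-$(id -un)}"

    if ! command -v squeue >/dev/null 2>&1; then
        echo "squeue not found: run this on a cluster login node" >&2
        return 1
    fi

    # Pending jobs with the scheduler's estimated start times.
    squeue -u "$user" --start

    # Pending jobs with the reason (%r) each one is still waiting.
    squeue -u "$user" -t PENDING -o "%.10i %.20j %.10T %r"
}
```

Including the output of why_pending in your message to the help desk usually saves a round of questions.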
Data and file space
Is my data backed up?
Data stored on our systems is backed up. However, we strongly recommend that you keep your own backups of important data. The storage system uses RAID and has redundant disks, but that may not be enough if we are unlucky. Melbourne Bioinformatics may be able to recover files you have accidentally deleted recently; please contact the help desk as soon as you realise you need to recover a deleted file. Note that this is not the same as a backup protecting against systems failure. If Melbourne Bioinformatics has an unlikely combination of disk failures, your files could be lost if you don't have your own backup system in place.
I need a large data set for my research. How might that work?
Melbourne Bioinformatics may be able to download and maintain the data set for you. In many cases, several users may need the same data and it is clearly a good idea to have just one copy to conserve disk space and network bandwidth. The assumption here is that the data concerned is publicly available and you need only read access. Please contact the help desk.
My project has a data set that all project members need to use. Where should I put it?
Just above your home directory, you will see the home directories of other members of your project. You will also see a directory called shared at that level. All your project members will be able to write into the shared directory, and files written there will, by default, be set so that other project members can use them. Space taken up by files in this shared directory contributes to your project's total allowed disk space usage.
Note that files you create under the shared directory will be readable by other members of your group but not, by default, writable. You can change their mode with the chmod command. For example, to ensure other group members can write to a file called myfile, you would use chmod g+w myfile. You can also alter the default mode that newly created files get using umask. These commands need to be used with some care!
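The effect of both commands can be seen safely in a scratch directory, as in the sketch below. The helper mode_of and the scratch directory are illustrative only; the starting umask of 022 is set explicitly here and is an assumption about what a typical default looks like, not a statement about how the shared directory is configured.

```shell
# Hypothetical helper: print just the permission string of a file.
mode_of() { ls -l "$1" | cut -c1-10; }

# Use a scratch directory so no real project files are touched.
demo_dir=$(mktemp -d)

# With a umask of 022, new files are readable but not writable by the group.
umask 022
touch "$demo_dir/myfile"
mode_of "$demo_dir/myfile"      # -rw-r--r--

# Grant group write access to this one file.
chmod g+w "$demo_dir/myfile"
mode_of "$demo_dir/myfile"      # -rw-rw-r--

# Change the default for files created later in this shell session.
umask 002
touch "$demo_dir/newfile"
mode_of "$demo_dir/newfile"     # -rw-rw-r--
```

A umask set this way lasts only for the current shell session, which makes it a low-risk way to experiment before changing anything permanently.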
How do I see how much disk space I have available?
At Melbourne Bioinformatics, disk space is granted to projects, so you share the disk space with other members of your project. Each project has a disk volume and you can find the status of that volume with the mydisk command.
What happens if I exceed my disk quota?
If the disk volume for your project is full, then neither you nor applications running under your name will be able to write to the disk. Other members of your project will also not be able to write.
Why do I need to tell the system how much memory I need?
Memory is a limited resource. If we tell the scheduler how much each job needs, it can fit jobs into slots on the system most efficiently. If you don't define your memory needs, the scheduler assumes you need the default of 2GB. That’s not a lot and many applications need more. If you actually try and use more than the scheduler thinks you should, it will kill your job. On the other hand, if you ask for a lot more than you need, you make it harder for the scheduler to squeeze you in and you waste resources. So it's important to get it right.
Remember, if you ask for more memory per core than that core's fair share of what’s available (typically 4GB, but depending on the system, as high as 16GB per core), then you probably make some other core on that node unavailable. Under those conditions, we need to charge your quota for those under-utilized cores as well as the ones you are actually using. Sad but true. Conversely, if you use less than typical memory per core, you often find that the scheduler can squeeze your job into one of those under-utilized cores and it gets to run straight away.
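To make the arithmetic concrete, the sketch below estimates how many cores a request ties up: the larger of the cores you asked for and your memory request divided by the per-core fair share, rounded up. The function name cores_occupied and the formula itself are assumptions made for illustration; the actual accounting rule used on the systems may differ.

```shell
# cores_occupied MEM_GB CORES [FAIR_SHARE_GB]  (hypothetical helper)
# ASSUMPTION: illustrative formula only, not the site's real charging rule.
# FAIR_SHARE_GB defaults to 4, the typical per-core figure quoted above.
cores_occupied() {
    local mem_gb="$1" cores="$2" share_gb="${3:-4}"

    # Cores' worth of memory requested, rounded up to a whole core.
    local by_memory=$(( (mem_gb + share_gb - 1) / share_gb ))

    if [ "$by_memory" -gt "$cores" ]; then
        echo "$by_memory"
    else
        echo "$cores"
    fi
}

cores_occupied 12 1       # 1 core asking for 12 GB at 4 GB per core -> 3
cores_occupied 2 1        # a 2 GB request fits within one core's share -> 1
cores_occupied 12 1 16    # the same 12 GB on a 16 GB-per-core system -> 1
```

Under this reading, the first request would be charged as three cores even though only one is doing any work, which is why trimming a memory request to what the job really needs tends to pay off.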
How do I tell the system how much memory I need?
Please see the section on managing memory.