This is the current SDSC Storage Resource Broker (SRB) Frequently Asked Questions (FAQ) page. The intent is to provide a good introduction to the SRB as well as more extensive and complete information and links for more experienced evaluators, administrators, and users. This page had been seriously out of date (circa April 1998) until mid-2003, but is now actively maintained.
The SDSC Storage Resource Broker (SRB) provides the abstraction mechanisms needed to implement data grids, digital libraries, and persistent archives for data sharing, data publication, and data preservation.
Many people, using only a subset of its features, find that the SRB's most compelling function is its use as a global file system. Users of multiple distributed computing systems find it an essential tool for accessing files easily and quickly from various locations. With its parallel I/O capabilities, the SRB transfers files as quickly as, and often faster than, other common transfer mechanisms.
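As a minimal sketch of this kind of use (the collection and file names below are illustrative, not from an actual installation):

    # Authenticate to the SRB using the settings in ~/.MdasEnv
    Sinit
    # Store a local file under a logical collection name
    Sput results.dat /home/jdoe.sdsc/experiments
    # From any other machine with the SRB clients, list and retrieve it
    Sls /home/jdoe.sdsc/experiments
    Sget /home/jdoe.sdsc/experiments/results.dat .
    # End the session
    Sexit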
Generally, the SRB is as secure as the DBMS used to store the MCAT and the physical resources used to store the data. User identity is as secure as the client host system.
Since the SRB server runs as a non-root user, a compromised server does not directly expose the host OS. This is a significant advantage over software systems that must run as root.
Starting in April 2004, SRB releases include a paper describing how to run a secure SRB system: readme.dir/srb-security.html.
Starting with SRB 2.1 (late May, 2003), we provide a mechanism by which SRB data files can be encrypted for both network transmission and storage. This system provides security against network eavesdropping for the data objects exchanged via the SRB and also improves the security of the data objects as they reside on any of the various physical resources. The system was implemented to be efficient, although encryption and decryption are inherently compute-intensive operations, so some performance penalty is unavoidable. See http://www.npaci.edu/dice/srb/SecureAndOrCompressedData.html for more information.
For small files, transfers can be a little slower due to the additional interaction with the MCAT (especially with a remote MCAT), but you can use Containers and/or Sbload (bulk load) and Sbunload (bulk unload) to speed these up greatly. For more information, see the man pages and the Container questions in this FAQ. We are working toward enhancing Sget and Sput to do bulk operations on non-container files to speed them up as well.
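For example, a bulk load and unload of a directory of small files might look like this sketch (the paths are illustrative, and the exact argument order may differ by release; see the Sbload and Sbunload man pages):

    # Bulk-load a local directory of small files into an SRB collection
    Sbload smallfiles /home/jdoe.sdsc/smallfiles
    # Later, bulk-unload the collection back to a local directory
    Sbunload /home/jdoe.sdsc/smallfiles localcopy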
The Matrix API can be used to define multiple SRB commands (and non-SRB grid services) as a single dataflow process and execute it on multiple servers. Matrix is available as a (SOAP/WSDL) web service. Matrix client programming for the SRB is made simple by a developer-friendly Java API with a gentle learning curve.
Hence, each client provides its own way of handling and managing metadata. One of our goals is to provide uniform functionality across all client interfaces, but this requires a large amount of programming effort that we are unable to dedicate at this time.
For cutting and pasting, there are utilities in both MySRB and the Scommands for copying metadata from one SRB object to another, from one SRB collection to another, and from one SRB collection to an SRB object. This differs from user-level cutting and pasting in that it is done internally to the SRB, not at the user GUI.
Another unique way of associating metadata with SRB objects is to perform automatic extraction inside the SRB and store the results in the MCAT. This is done by writing simple templates (basically rules) that identify the metadata values in the SRB object, extract them, and store them as attribute-value pairs in the MCAT. We have written such templates for multiple file formats, including DICOM, FITS, email, NSFAwardAbstracts, and HTML files. Extraction can be launched through MySRB or through the Scommands.
The logical name space is the set of names of collections (directories) and data objects (files) maintained by the SRB. Users see and interact with the logical name space, while the physical location is handled by the SRB system and administrators. The SRB adds this logical name space on top of the physical name space, and derives much of its power and functionality from that separation.
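To make the distinction concrete, here is a hypothetical example (both paths are illustrative):

    # The logical path is all any SRB client ever sees or uses:
    Sls /home/jdoe.sdsc/projects
    # Behind the scenes, the SRB might store an object at a physical path like
    #   /SRBVault/jdoe/12/34/run42.dat   (a Unix vault)
    # or in an HPSS archive; moving it there does not change the logical name.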
Also see "Replicated Data Management user SRB" (GGF-4, February, 2002) at http://www.npaci.edu/dice/Pubs/SRBReplication.ppt.
The SRB keeps all the information about how the container is laid out in its Metadata Catalog (MCAT) and uses it when retrieving individual files. One can also modify and delete files in a container as though one were performing these operations on normal files; the SRB takes care of the details.
To answer a related question: the container is not "made" on the desktop and then loaded into the SRB; instead, it is constructed in situ on the resource. Containers are normally assigned a logical resource that has two physical components: an archive resource such as HPSS or roadnet-sam, and a cache resource such as a Unix file system (e.g., roadnet-unix). All construction, file access, and modification are done on the cache resource, while full or no-longer-needed containers are stored on the archive resource.
Hence, the archive sees a single file, and the construction is done on the cache resource (not on the user's desktop, but on a resource controlled by the SRB) before the container enters the archive.
Containers grow in size and are pinched off into physical pieces by the SRB, so a container might appear very large but actually consist of multiple smaller physical files. We normally recommend a pinch-off size of around 100 or 200 MBytes, though it can also be in the GB range. This is akin to blocks in a tape system.
What this means is that the user sees one container into which they "put" their data, but, like a goods train, the container is physically divided. Individual files are, of course, much smaller than the container size. For example, in one of our collections we have containers of around 50 MBytes storing files of about 2 MBytes each, so each container holds about 25 files in its physical blocks.
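A typical container workflow with the Scommands might look like the following sketch. The container name, files, and collection are illustrative, and the exact options vary by release; check the Smkcont, Sput, and Slscont man pages for your version.

    # Create a container (it is built in situ on its logical resource)
    Smkcont mycontainer
    # Put small files into the container instead of storing them individually
    Sput -c mycontainer image1.dcm /home/jdoe.sdsc/images
    Sput -c mycontainer image2.dcm /home/jdoe.sdsc/images
    # List existing containers
    Slscont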
For 3.0 (September, 2003), we plan to release a Federated MCAT capability, where complete MCAT-enabled SRB systems can be federated with one another. Each MCAT member of such a federation is called an SRB Zone.
In the simplest configuration, a site uses only the SRB client components (client utilities, GUI applications, and libraries) and relies on SRB servers running at remote sites or hosts. An SRB client can connect to a specific (possibly remote) SRB server and access data objects under the control of that server and/or other servers in the federation. With this client-only setup, one cannot access any data object at the local site through the SRB.

In the second setup, a site runs an SRB server locally but without any MCAT service. The local SRB server provides access to local resources and contacts another SRB server that has MCAT service to retrieve the metadata about data objects.

In the third configuration, a site runs both an SRB server and an MCAT database locally. In all cases, any client can talk to any SRB server and need not necessarily talk to a local or 'nearest' server.
As for the files stored under the SRB, one can back them up in multiple ways. The first and easiest is to back up the storage resource directory (for example, the SRBVault directory) as an incremental backup. Depending upon your system, you can do this on the fly or during preventive maintenance (PM) windows; weekly PMs are helpful.
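For instance, an incremental backup of a Unix vault directory could be as simple as an rsync to a backup host (the paths and host are illustrative):

    # Only files changed since the last run are transferred; -a preserves
    # ownership, permissions, and timestamps
    rsync -a /SRBVault/ backuphost:/backups/SRBVault/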
A second strategy is to make sure that there are replicated copies of each file in two distributed storage systems that, ideally, share no hardware and are geographically separated. This can be done either under user control (replicating only those files that are needed) or under srbAdmin control (possible with the upcoming 3.0.2 release), which will replicate all modified files to a particular backup resource.
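Under user control, a replication is a single command; a sketch (the resource name is illustrative, and the option syntax may vary by release; see the Sreplicate man page):

    # Make a second copy of an object on a geographically separate resource
    Sreplicate -S backup-unix /home/jdoe.sdsc/experiments/results.dat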
A third strategy, using the zoneSRB, is to run a backup zone at a remote site and back up to that zone from your own. We are testing and finalizing some protocols for doing this.
On the SRB server side, you must open the port that the srbMaster is listening on, plus at least 100 configurable ports.
The srbMaster listens on the port defined in srb.h or specified in the srbPort environment variable (often set in the runsrb script). By default, srb.h has DefaultPort "5544", but this can be changed via the configure --enable-srbport=value option. Regardless of the DefaultPort value, the srbMaster will listen on the port specified in the srbPort environment variable if it is defined. You can edit the runsrb script to change this.
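For example, the relevant lines in runsrb might look like this (the port number is illustrative):

    # In runsrb: override the compiled-in DefaultPort for this server
    srbPort=6613
    export srbPort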
(The clients also need to know the port number to connect to. The Scommands default to the value in srb.h, or use the number specified on the srbPort line of each user's ~/.MdasEnv file.)
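A user's ~/.MdasEnv might look roughly like the following; all values here are illustrative, and the exact set of entries depends on the release and the local installation:

    # ~/.MdasEnv -- SRB client settings (illustrative values)
    srbUser 'jdoe'
    mdasDomainName 'sdsc'
    srbHost 'srb.example.edu'
    srbPort '5544'
    mdasCollectionName '/home/jdoe.sdsc'
    defaultResource 'sdsc-unix'
    AUTH_SCHEME 'ENCRYPT1'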
By default, the configurable ports are 20000 to 20199 (see mk/mk.config COMM_PORT_NUM_START and COMM_PORT_NUM_COUNT). You can change them via configure, for example: ./configure --enable-commstart=21000 --enable-commnum=200
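On a Linux server using iptables, opening the default ports might look like this sketch (adjust the numbers to match your configure settings and firewall tool):

    # Allow the srbMaster port and the parallel-I/O port range
    iptables -A INPUT -p tcp --dport 5544 -j ACCEPT
    iptables -A INPUT -p tcp --dport 20000:20199 -j ACCEPT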
On the SRB client side, if you use the Sput or Sget -m option (server-initiated connection for parallel I/O), the client's firewall needs at least 16 configurable ports open. These are the same ports as the server uses, i.e., 20000 to 20199 by default, and they can be changed via configure, for example: ./configure --enable-commstart=21000 --enable-commnum=16
But starting with 3.1.1, users can use the -M option (client-initiated connection for parallel I/O), which does not require opening ports on the client side.
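For example (the file and collection names are illustrative):

    # Server-initiated parallel transfer; client firewall must allow the comm ports
    Sput -m bigfile.dat /home/jdoe.sdsc/data
    # Client-initiated parallel transfer (3.1.1 and later); no client ports needed
    Sput -M bigfile.dat /home/jdoe.sdsc/data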
For suggestions, comments, or questions concerning this FAQ, email Wayne Schroeder at schroede@sdsc.edu or the srb group at srb@sdsc.edu.
Last Modified on June 3, 2004