Deploying the Virtual Analysis Facility
=======================================

Thanks to CernVM and PROOF on Demand, it is possible to deploy a
ready-to-use Virtual Analysis Facility on your cloud (either public or
private) or even on your desktop computer.

On the server side, "configuring" the Virtual Analysis Facility is
simply a matter of starting a certain number of CernVM virtual machines
that will become part of your PROOF cluster. CernVM uses
contextualization to specialize each virtual machine to be either a head
node or a worker node.

The Virtual Analysis Facility comes preconfigured with:

- an HTCondor cluster capable of running PROOF on Demand

- certificate authentication

- your experiment's software (if available on CernVM-FS)

Obtain the CernVM image and contextualization
---------------------------------------------

### Download the CernVM bare image

The Virtual Analysis Facility currently works with *CernVM Batch 2.7.1
64-bit*. You therefore need this CernVM image available either on your
local hard disk (for a desktop deployment) or in your cloud's image
repository.

> For convenience we provide direct links to the working versions:
>
> - [CernVM 2.7.1 batch 64-bit for
>
> - [CernVM 2.7.1 batch 64-bit for
>
> Images are gzipped. In most cases you'll need to gunzip them before
> registering them in your image repository.
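
As a rough illustration of the unpacking step, the sketch below decompresses a gzipped image while leaving the original archive in place. The function name `unpack_image` and any file names are placeholders, not part of CernVM. Registering the resulting file is specific to each cloud and is not shown.

```shell
#!/bin/sh
# Decompress a gzipped image and keep the .gz archive next to it.
# $1 = path to the gzipped image (placeholder name, e.g. some-image.hdd.gz)
unpack_image() {
    src=$1
    dst=${src%.gz}
    # Verify the archive before unpacking, so a truncated download is caught early.
    gzip -t "$src" || return 1
    # gzip -dc writes to stdout, so the original archive is not removed.
    gzip -dc "$src" > "$dst" || return 1
    # Print the path of the unpacked image.
    printf '%s\n' "$dst"
}
```

Calling `unpack_image` on a downloaded archive prints the path of the unpacked file, which is the one to register.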

### Create VM configuration profiles

CernVM images are base images supporting boot-time customization via
configuration profiles called "contexts". Contexts can be created
through the [CernVM Online](https://cernvm-online.cern.ch/) website.
The site is immediately accessible if you have a CERN account.

Go to your [CernVM Online
Dashboard](https://cernvm-online.cern.ch/dashboard), click on the
**Create new context...** dropdown and select **Virtual Analysis Facility

There are only a few parameters to configure.

: A name for your context (such as *VAF Master for ATLAS*). Any
  name

: Use this to configure either a *master* or a *slave*.

VAF master (only available when configuring a slave)
: IP address or FQDN of the Virtual Analysis Facility master.

: Choose between *ALICE LDAP* (useful only for ALICE users) or *Pool
  accounts* (good for authenticating all the other Grid users).

Num. pool accounts (only available when using pool accounts auth)
: Number of pool accounts to create.

: A URL specifying the proxy server for CernVM-FS, such as
  be automatically discovered.

HTCondor shared secret
: VMs that are part of the same cluster should have the same value in
  this field. It is used to mutually authenticate VMs and it is used
  like a

: Current profile will be saved on the [CernVM Online
  information there to be publicly available to other users, type in
  a value for protecting the context with an encryption password.
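
One possible way to produce a value for the *HTCondor shared secret* field is to generate a random string locally and paste it into both the master and the slave context. The sketch below is only a suggestion. It assumes the field accepts an arbitrary string of hexadecimal characters, which is not confirmed here, and `make_secret` is a hypothetical helper name.

```shell
#!/bin/sh
# Print a random 64-character hexadecimal string.
make_secret() {
    # Read 32 random bytes, hex-encode them with od, then strip the
    # spaces and newlines that od inserts between bytes.
    head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n'
    # Terminate the output with a single newline.
    echo
}

make_secret
```

Generate the value once and reuse it for every VM of the same cluster, since all of them must share it.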

You will have to create a profile for the **master** and one for the
**slave**. Since most of the configuration variables are the same (like
the *HTCondor shared secret*), you can create one, clone it and change
only what's

Deploy it on the cloud
----------------------

Provided you have access to a cloud API, you'll need to instantiate a
number of CernVM batch images with proper contextualization: one for
the master, and as many as you want as slaves.

CernVM supports contextualization through the "user data" field
supported by all cloud infrastructures.

Each cloud infrastructure has a different method of setting the "user
data". The following description will focus on:

- OpenStack (such as the [CERN Agile
infrastructure](https:
the open [Eucalyptus](http:
clouds support such interface and tools

### Download the CernVM Online contextualizations

Go to the CernVM Online Dashboard page where you previously customized
the contexts for your master and your slaves.

Click on the rightmost button on the line of the desired context and
select **Get rendered context** from the dropdown: save the output to a
text file (such as `my_vaf_context.txt`, the name we will use in the
examples that follow). This file will subsequently be passed as the
so-called "user-data" file to the cloud API.

> Repeat the operation for both the master context and the slave

### OpenStack API: nova

Example of a CernVM instantiation using `nova`:

    --image cernvm-batch-node-2.6.0-4-1-x86_64 \
    --key-name my_default_keyparir \
    --user-data my_vaf_context.txt \
    Name-Of-My-New-VM

The `--user-data` option requires the context file we've just
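
To show how one master and several slaves could be started in one go, here is a minimal sketch that prints (or, with `DRY_RUN=0`, runs) one `nova boot` command per VM. Several things are assumptions rather than part of the original instructions: the VM names `vaf-master` and `vaf-slave-N`, the context file names `my_vaf_master_context.txt` and `my_vaf_slave_context.txt`, the helper names, and the `m1.xlarge` flavor passed through `--flavor`. It also assumes the slave context already contains the address of the master.

```shell
#!/bin/sh
# Print, or run, the nova commands for one master and N slaves.
# Commands are only printed unless DRY_RUN is set to 0.
DRY_RUN=${DRY_RUN:-1}
IMAGE=${IMAGE:-cernvm-batch-node-2.6.0-4-1-x86_64}
FLAVOR=${FLAVOR:-m1.xlarge}        # assumed flavor name
KEY=${KEY:-my_default_keyparir}    # key pair name as spelled in the example

boot_vm() {
    # $1 = VM name, $2 = rendered context file for that VM
    name=$1
    context=$2
    if [ "$DRY_RUN" = 1 ]; then
        echo "nova boot --flavor $FLAVOR --image $IMAGE --key-name $KEY --user-data $context $name"
    else
        nova boot --flavor "$FLAVOR" --image "$IMAGE" \
            --key-name "$KEY" --user-data "$context" "$name"
    fi
}

launch_cluster() {
    # $1 = number of slaves; file and VM names below are hypothetical.
    n_slaves=$1
    boot_vm vaf-master my_vaf_master_context.txt
    i=1
    while [ "$i" -le "$n_slaves" ]; do
        boot_vm "vaf-slave-$i" my_vaf_slave_context.txt
        i=$((i + 1))
    done
}

launch_cluster 2
```

Running it in the default dry-run mode is a cheap way to review the commands before anything is instantiated.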

### EC2 API: euca-tools

Example of a CernVM instantiation using `euca-tools`:

    --instance-type m1.xlarge \
    --key my_default_keyparir \
    --user-data-file my_vaf_context.txt \
    cernvm-batch-node-2.6.0-4-1-x86_64

The `--user-data-file` option takes the context file we've just downloaded.

An example VM definition follows:

    EC2_USER_DATA="<base64_encoded_string>",
    IMAGE="cernvm-batch-node-2.6.0-4-1-x86_64",
    NAME="CernVM-VAF-Node"
    NETWORK="My-OpenNebula-VNet" ]

The `<base64_encoded_string>` placeholder must be replaced with the
base64 encoding of the whole downloaded context definition. You can
obtain it by running:

    cat my_vaf_context.txt | base64 | tr -d '\n'
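
Since a damaged or line-wrapped string would only show up as a problem at boot time, it can be worth checking the encoding first. The sketch below wraps the same pipeline in a function and verifies that decoding gives back the original file. The helper names are hypothetical, and `base64 -d` is assumed to be the decode flag, as in GNU coreutils (some systems spell it differently).

```shell
#!/bin/sh
# Encode a context file as one line of base64 (same pipeline as above).
encode_context() {
    base64 < "$1" | tr -d '\n'
}

# Succeed only if decoding the encoded string reproduces the file exactly.
verify_context() {
    # cmp -s compares the decoded stream on stdin ("-") with the file.
    encode_context "$1" | base64 -d | cmp -s - "$1"
}
```

For example, `verify_context my_vaf_context.txt && encode_context my_vaf_context.txt` prints the string only when the round trip is lossless.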

Network security groups
-----------------------

In order to make the Virtual Analysis Facility work properly, the
firewall of your infrastructure must be configured to allow some

Some ports need to allow "external" connections, while other ports can
safely be opened only to connections from other nodes of the
Virtual Analysis Facility.

### Ports to open on all nodes

: Allow **TCP and UDP range 9600-9700** only between nodes of the Virtual

Only HTCondor and PoD communication is needed between the nodes. No HTCondor
ports need to be opened to the world.

### Additional ports to open on the front end node

: Allow **TCP 443** from all

: Allow **TCP 22** from all

No other ports need to be opened from the outside. Your definition of
*allow from all* might vary.
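
On an EC2-compatible cloud, the rules above could be expressed with `euca-authorize` roughly as follows. This is a sketch under assumptions: the security group names `vaf-nodes` and `vaf-head` are hypothetical, `0.0.0.0/0` is just one reading of *allow from all*, and the flag spellings should be checked against your euca2ools version. The script only prints the commands so they can be reviewed before use.

```shell
#!/bin/sh
# Print euca-authorize commands matching the port list above.
NODES_GROUP=${NODES_GROUP:-vaf-nodes}   # hypothetical group holding all nodes
HEAD_GROUP=${HEAD_GROUP:-vaf-head}      # hypothetical group for the front end node
ALLOW_FROM=${ALLOW_FROM:-0.0.0.0/0}     # adapt to your meaning of "allow from all"

print_rules() {
    # TCP and UDP 9600-9700: only from members of the same group (-o).
    for proto in tcp udp; do
        echo "euca-authorize -P $proto -p 9600-9700 -o $NODES_GROUP $NODES_GROUP"
    done
    # Front end node: TCP 443 and TCP 22 from the chosen address range (-s).
    for port in 443 22; do
        echo "euca-authorize -P tcp -p $port -s $ALLOW_FROM $HEAD_GROUP"
    done
}

print_rules
```

Restricting `ALLOW_FROM` to your institute's address range is one way to narrow what *allow from all* means.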