Sunday, 28 June 2015

AWS - Liferay Cluster configuration ( auto scaling capable )


Default Liferay clusters use multicast communications for dynamic member discovery, relying on EhCache peer discovery over multicast groups. In fact, if you are running several Liferay clusters in the same network you should make sure they aren't sharing multicast groups (configured in portal.properties under the multicast.group.address and multicast.group.port keys).

By contrast, cloud platforms don't allow multicast traffic between server instances. As a consequence, Liferay's default Cluster Link setup doesn't work (well, instances will start, but cache and index changes will not be replicated across nodes). There is no difference in that respect between AWS, Azure, Google, etc.

Lots of articles on the Internet explain how to configure a Liferay cluster, and probably one of the most detailed has become part of the Liferay documentation. There are even articles explaining how to configure a Liferay cluster using unicast traffic. But I haven't found any article explaining how to configure a real auto scaling group with a single cluster configuration shared by all nodes, so new instances can be cloned and started automatically in any order.

As explained in those articles, several of Liferay's modules need to be specifically configured to work in a clustered environment:

  • Database: there are no special needs regarding database setup to enable a Liferay cluster apart from using the same database instance across all Liferay nodes. But if you are looking for an unbreakable Liferay you can explore database cluster options. There are also horizontal scaling options supported by Liferay, such as database sharding. For our purpose, we have set up an Amazon RDS server with a Multi-AZ deployment to reduce maintenance downtime.
  • Document Library: must be shared across cluster instances. In a typical hosted cluster an NFS volume would be used, but Liferay provides the perfect fit for our Amazon deployment: it supports Amazon S3 buckets. Even if you already have a running Liferay instance and want to migrate data to S3, this article explains the process.
  • Quartz: since Liferay 6.1 there is no need for an explicit cluster configuration. When Liferay detects that Cluster Link is enabled it creates all the required database tables automatically.
  • Index: Lucene indexes must be kept in sync across the cluster, usually by enabling lucene.replicate.write=true when Cluster Link is enabled. A more refined and scalable option is using Solr as an external indexing server.
  • Cluster Link: Liferay's cluster communication mechanism. It's enabled with cluster.link.enabled=true and it shares EhCache's peer discovery mechanism, based on JGroups, to discover cluster members.
  • EhCache: also uses JGroups to discover peers, which needs to be configured to work with unicast-only channels.

All these topics have already been covered in the linked articles, but there is a flaw in all of them: the JGroups configuration. The TCPPING section needs a static list of the initial members of the cluster. But JGroups supports other implementations of the PING operation that don't need to know that list of members in advance. Two of them fit perfectly in our case:
  • S3_PING, which uses an Amazon S3 bucket as the shared membership registry.
  • JDBC_PING, which uses a database table for the same purpose.
Both implementations use a shared resource (an S3 bucket or a database table) to let cluster members register themselves and discover the other members. As the database is the only non-clustered system in our platform and we want to save as much CPU time as possible, we've chosen S3_PING with an S3 bucket created ad hoc.
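The only prerequisite on the AWS side is that the bucket exists before the first node starts; S3_PING takes care of the rest. A minimal sketch with the AWS CLI (the bucket name and region are placeholders, and we assume credentials with permission to create buckets are already configured):

# Create the ad hoc bucket S3_PING will use to register cluster members (placeholder name/region)
aws s3 mb s3://my-liferay-jgroups-bucket --region eu-west-1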

Having that in mind, by adding the following lines (replacing the actual DB server values and the path to the Liferay installation with environment-specific values) to every node's portal-ext.properties ...

cluster.link.enabled=true
cluster.link.autodetect.address=YOUR_DB_HOST:YOUR_DB_PORT
cluster.link.channel.properties.control=PATH_TO_LIFERAY_WEBAPP/WEB-INF/classes/jgroups/tcp.xml
cluster.link.channel.properties.transport.0=PATH_TO_LIFERAY_WEBAPP/WEB-INF/classes/jgroups/tcp.xml

ehcache.bootstrap.cache.loader.factory=com.liferay.portal.cache.ehcache.JGroupsBootstrapCacheLoaderFactory
ehcache.cache.event.listener.factory=net.sf.ehcache.distribution.jgroups.JGroupsCacheReplicatorFactory
ehcache.cache.manager.peer.provider.factory=net.sf.ehcache.distribution.jgroups.JGroupsCacheManagerPeerProviderFactory

ehcache.multi.vm.config.location.peerProviderProperties=/jgroups/tcp.xml
ehcache.multi.vm.config.location=/ehcache/liferay-multi-vm-clustered.xml

net.sf.ehcache.configurationResourceName.peerProviderProperties=/jgroups/tcp.xml
net.sf.ehcache.configurationResourceName=/ehcache/hibernate-clustered.xml

lucene.replicate.write=true
index.search.writer.max.queue.size=9999999

dl.store.impl=com.liferay.portlet.documentlibrary.store.S3Store
dl.store.s3.access.key=S3_ACCESS_KEY
dl.store.s3.secret.key=S3_SECRET_KEY
dl.store.s3.bucket.name=S3_DL_BUCKET_NAME


... and creating a file  tcp.xml  under   PATH_TO_LIFERAY_WEBAPP/WEB-INF/classes/jgroups/ with this content ...

<config xmlns="urn:org:jgroups"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="urn:org:jgroups http://www.jgroups.org/schema/JGroups-3.1.xsd">

    <TCP singleton_name="liferay"
         bind_port="7800"
         loopback="false"
         recv_buf_size="${tcp.recv_buf_size:5M}"
         send_buf_size="${tcp.send_buf_size:640K}"
         max_bundle_size="64K"
         max_bundle_timeout="30"
         enable_bundling="true"
         use_send_queues="true"
         sock_conn_timeout="300"

         timer_type="old"
         timer.min_threads="4"
         timer.max_threads="10"
         timer.keep_alive_time="3000"
         timer.queue_max_size="500"

         thread_pool.enabled="true"
         thread_pool.min_threads="1"
         thread_pool.max_threads="10"
         thread_pool.keep_alive_time="5000"
         thread_pool.queue_enabled="false"
         thread_pool.queue_max_size="100"
         thread_pool.rejection_policy="discard"

         oob_thread_pool.enabled="true"
         oob_thread_pool.min_threads="1"
         oob_thread_pool.max_threads="8"
         oob_thread_pool.keep_alive_time="5000"
         oob_thread_pool.queue_enabled="false"
         oob_thread_pool.queue_max_size="100"
         oob_thread_pool.rejection_policy="discard"/>

    <S3_PING location="S3_JGROUPS_BUCKET_NAME" access_key="S3_ACCESS_KEY"
             secret_access_key="S3_SECRET_KEY" timeout="2000"
             num_initial_members="2"/>


    <MERGE2  min_interval="10000"
             max_interval="30000"/>
    <FD_SOCK/>
    <FD timeout="3000" max_tries="3" />
    <VERIFY_SUSPECT timeout="1500"  />
    <BARRIER />
    <pbcast.NAKACK2 use_mcast_xmit="false"
                   discard_delivered_msgs="true"/>
    <UNICAST />
    <pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000"
                   max_bytes="4M"/>
    <pbcast.GMS print_local_addr="true" join_timeout="3000"
                view_bundling="true"/>
    <UFC max_credits="2M"
         min_threshold="0.4"/>
    <MFC max_credits="2M"
         min_threshold="0.4"/>
    <FRAG2 frag_size="60K"  />
    <pbcast.STATE_TRANSFER/>

</config>


... Liferay should be able to run in an AWS EC2 cluster without any instance-specific configuration, making these instances auto scaling capable.
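A quick sanity check that discovery is working: shortly after startup every node should have registered itself in the S3_PING bucket, so listing it should show one entry per live node (bucket name as used in tcp.xml; this check is optional and not part of the setup):

# Each running node should appear as an object under the cluster name prefix
aws s3 ls s3://S3_JGROUPS_BUCKET_NAME/ --recursive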

The main drawback of this solution is that S3 traffic will cost around $4 a month per cluster node: each node polls the bucket every 3 seconds, which generates around 900K requests a month to S3.

Luckily there is a third option: if the use of a Solr cluster based on SolrCloud is under consideration, we have developed an implementation of the JGroups PING operation over ZooKeeper, faster and more reliable than any other implementation, available on our GitHub and ready to be downloaded from the Maven Central Repository. Our next article will dig deeper into how it works.

Finally, Liferay must be marked as <distributable/> in web.xml, the application server can be configured to replicate HttpSessions between nodes (it's not strictly necessary) and the Elastic Load Balancer should be configured with session affinity based on the JSESSIONID cookie. By the way, this configuration should run on both Liferay 6.1 and 6.2, CE and EE.
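The ELB session affinity can be configured from the console or, as a hedged sketch, with the classic ELB CLI (the load balancer name, policy name and listener port below are placeholders for your own values):

# Stickiness policy tied to the JSESSIONID cookie issued by Liferay
aws elb create-app-cookie-stickiness-policy --load-balancer-name liferay-elb --policy-name liferay-jsessionid --cookie-name JSESSIONID

# Attach the policy to the HTTP listener
aws elb set-load-balancer-policies-of-listener --load-balancer-name liferay-elb --load-balancer-port 80 --policy-names liferay-jsessionid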


Saturday, 25 April 2015

AWS - Setting up a VPC with public and private subnets (III)


This is a network diagram of our Liferay cluster: one ELB, two web servers configured as reverse proxies balancing traffic to the application servers, and an Amazon RDS instance as the database of choice. The Liferay document library will be stored in an S3 bucket.


There are other factors to be considered in this network deployment. Availability zones and their impact on our portal will be one of them. If you are looking for help setting up a Liferay cluster with higher availability requirements, we will be happy to help (for a fistful of dollars, of course).

Wednesday, 15 April 2015

AWS - Setting up a VPC with public and private subnets (II)


We have made some more changes in our installation since our previous article.

First, the best option to avoid paying Elastic IP fees while the instances are stopped (and they will stay stopped for many hours, as the free tier only gives a total of 750 hours a month of computing time to share among all instances) while still assigning a DNS name to the machine is to create a DDNS account at noip.com and set up a client. AWS provides this howto (with a couple of missing points):

# Install noip client
[ec2-user@ip-10-0-0-130 ~]$ sudo yum install epel-release
[ec2-user@ip-10-0-0-130 ~]$ sudo yum-config-manager --enable epel
[ec2-user@ip-10-0-0-130 ~]$ sudo yum install -y noip

# Configure it
[ec2-user@ip-10-0-0-130 ~]$ sudo noip2 -C

# Set up noip as a startup service
[ec2-user@ip-10-0-0-130 ~]$ sudo chkconfig noip on
[ec2-user@ip-10-0-0-130 ~]$ sudo service noip start


As we are going to use an Elastic Load Balancer to balance traffic among our web servers, we have configured an additional Security Group and slightly modified the existing configuration (all internal traffic allowed for now); a CLI equivalent is sketched after the listing:

NAT Instance  -  INBOUND:   ALLOW SSH  (22) TRAFFIC FROM  0.0.0.0/0
                 INBOUND:   ALLOW ANY       TRAFFIC FROM  10.0.0.0/16  ( our VPC ) 
                 OUTBOUND:  ALLOW ANY       TRAFFIC TO    0.0.0.0/0  

Load Balancer -  INBOUND:   ALLOW HTTP (80) TRAFFIC FROM  0.0.0.0/0
                 OUTBOUND:  ALLOW ANY       TRAFFIC TO    0.0.0.0/0  

Default SG    -  INBOUND:   ALLOW ANY       TRAFFIC FROM  10.0.0.0/16
                 OUTBOUND:  ALLOW ANY       TRAFFIC TO    10.0.0.0/16  
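The same rules can be created from the AWS CLI. A hedged sketch, with the security group IDs as placeholders for the real ones:

# NAT instance security group (sg-xxxxxxxx is a placeholder): SSH from anywhere, any traffic from inside the VPC
aws ec2 authorize-security-group-ingress --group-id sg-xxxxxxxx --protocol tcp --port 22 --cidr 0.0.0.0/0
aws ec2 authorize-security-group-ingress --group-id sg-xxxxxxxx --protocol=-1 --cidr 10.0.0.0/16

# Load balancer security group (sg-yyyyyyyy is a placeholder): HTTP from anywhere
aws ec2 authorize-security-group-ingress --group-id sg-yyyyyyyy --protocol tcp --port 80 --cidr 0.0.0.0/0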


The next article will be delayed a while until we sort out with Amazon Customer Services the absurd quota of only 2 simultaneous instances they have applied to our account.

Monday, 13 April 2015

AWS - Setting up a VPC with public and private subnets ( in AWS Free Tier )


One of our clients asked us how easy it would be to build an auto scaling Liferay cluster in AWS. The answer may seem simple, but it is far from it.

AWS doesn't allow multicast traffic between EC2 instances (each one of the virtual servers running in EC2), even when those instances belong to the same subnet of a VPC. So Liferay's Cluster Link host autodiscovery won't work. And the only straightforward alternative is configuring dedicated unicast TCP communication between nodes for Cluster Link, which requires all node IPs to be explicitly configured. And that makes auto scaling difficult.

After googling a while we found that there are some options in newer versions of JGroups which could allow a simpler configuration for new cluster nodes.

As we haven't found any reference about this kind of configuration for Liferay... What better way of spending some spare time than a proof of concept?

First step is setting up the test scenario:
  • configure a VPC in AWS with public and private networks
  • two load-balanced web servers in the public network
  • two Liferay nodes in the private network
with only one restriction: spend no money ( thanks to Amazon Free Tier ).

In this article we will focus on the first step: setting up the network.

Our test scenario is perfectly described in the Amazon help. Obviously they don't give a lot of detail on how to stay within the free tier. In fact, Amazon offers a VPC setup wizard, but the NAT instance it creates is not a free one.

The main problem for us is space. We'll need at least 5 servers (2 web servers, 2 app servers and a NAT instance). But all the images (AMIs) directly provided by Amazon are 8 GiB in size, so we would exceed the 30 GiB free-tier limit for EBS volumes.

Luckily we found an old and unique image (ami-6f3b465f) of a minimal Amazon Linux which is only 2 GiB, runs on HVM and whose root volume is an EBS GP2 volume, so it fits perfectly into a free t2.micro instance. I haven't done any check on the AMI, so please keep that in mind before using it if you are worried about the security of your servers.

We have initially created five instances:
  • 2 x  t2.micro instances with 3 GiB space in (10.0.1.0/24) for the web servers.
  • 2 x  t2.micro instances with 6 GiB space in (10.0.1.0/24) for the app servers.
  • 1 x  t2.micro instance with  3 GiB space in  (10.0.0.0/24) for the NAT server.

Only 21 GiB reserved for now, and we even have some space left for creating a third app server instance. No DBMS instance is reserved, as it can be created as an Amazon RDS instance with its own additional 20 GiB quota.
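For reference, a hedged sketch of how one of the 6 GiB app server instances could be launched with the AWS CLI (the subnet ID and key name are placeholders, and the root device name may differ per AMI):

# Launch an app server on the minimal AMI with a 6 GiB gp2 root volume
aws ec2 run-instances --image-id ami-6f3b465f --count 1 --instance-type t2.micro \
    --key-name key-file --subnet-id subnet-xxxxxxxx \
    --block-device-mappings '[{"DeviceName":"/dev/xvda","Ebs":{"VolumeSize":6,"VolumeType":"gp2"}}]'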

It is important to create the NAT instance with an assigned public IP to avoid unused Elastic IP costs when the machine is not running. After granting ourselves access through the security group configuration, we only need to enable IP forwarding on the server.

# Access the EC2 NAT instance via its public IP using the .pem key pair (example for an OS X machine)
macstar:~ trenddevs$ ssh -A -i key-file.pem ec2-user@publicip

# Enable IPv4 forwarding now, make it permanent and apply the changes
[ec2-user@ip-10-0-0-130 ~]$ sudo sysctl -w net.ipv4.ip_forward=1
[ec2-user@ip-10-0-0-130 ~]$ sudo vi /etc/sysctl.conf    #(set net.ipv4.ip_forward=1 in the file)
[ec2-user@ip-10-0-0-130 ~]$ sudo service network restart

# Enable IP masquerading in iptables and make the rule persistent 
[ec2-user@ip-10-0-0-130 ~]$ sudo iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
[ec2-user@ip-10-0-0-130 ~]$ sudo service iptables save


The last step is disabling the Source/Destination Check in the AWS console; with the check enabled, the instance drops any traffic whose source or destination IP is not its own, which breaks NAT. This can also be done from the CLI, as sketched below.
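A hedged CLI equivalent of that console step (the instance ID is a placeholder):

# Disable the source/destination check so the NAT instance can forward traffic
aws ec2 modify-instance-attribute --instance-id i-xxxxxxxx --no-source-dest-check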

After all these changes, a new SSH jump to one of the internal network servers should let us check that it can ping www.google.com.

[ec2-user@ip-10-0-0-130 ~]$ ssh 10.0.1.159
Last login: Wed Apr 15 20:20:09 2015 from ip-10-0-0-130.us-west-2.compute.internal

       __|  __|_  )
       _|  (     /   Amazon Linux AMI
      ___|\___|___|

https://aws.amazon.com/amazon-linux-ami/2015.03-release-notes/
[ec2-user@ip-10-0-1-159 ~]$ ping www.google.com
PING www.google.com (173.194.33.147) 56(84) bytes of data.
64 bytes from sea09s17-in-f19.1e100.net (173.194.33.147): icmp_seq=1 ttl=52 time=7.66 ms
64 bytes from sea09s17-in-f19.1e100.net (173.194.33.147): icmp_seq=2 ttl=52 time=7.15 ms

Easy stuff for now. Next step, configuring the webservers.


Thursday, 9 April 2015

Externally hosted @font-face problems in Firefox


A quick one today.

While setting up our website some time ago, we realised that one of our custom @font-face fonts wasn't working at all in Firefox, although it was working perfectly in Safari and Chrome. The only difference with the other fonts used in the page was that it was loaded from an external domain. And in fact that was the reason!

The W3C CSS3 specification defines font fetching requirements and their implications in section 4.9. It warns that fonts will typically not be loaded cross-origin unless authors specifically take steps to permit cross-origin loads. Sites can explicitly allow cross-site loading of font data using the Access-Control-Allow-Origin HTTP header.

As Firefox follows the CSS3 specification to the letter (unlike Chrome or Safari, which didn't have the problem), we had to change this default behaviour.

For our Apache HTTP Server the solution was as easy as following the specification and defining the header in our VirtualHost entry.

<VirtualHost *:80>
  ServerName www.trenddevs.co.uk

  # other config...

  Header set Access-Control-Allow-Origin "*"

</VirtualHost>
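To confirm the header is actually being sent, a quick curl against the server hosting the font is enough (the font path below is just an example, use a real URL from your site):

# The response headers should include: Access-Control-Allow-Origin: *
curl -sI http://www.trenddevs.co.uk/fonts/custom-font.woff | grep -i access-control-allow-origin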

Tuesday, 31 March 2015

Integrating our Blogger-powered blog!!


This blog is powered by Google's Blogger. It's quite obvious if you pay attention to our favicon (yes, we need a proper favicon) or if you are familiar with the structure of other Blogger blogs. And we don't want to hide it. It's a good tool: no need to care about hosting or backup issues, plenty of space to store images and, best of all, it's free.

AND we wanted to make our blog part of our website, under a single domain, with a seamless integration between them.

A little tweaking of one of Blogger's default templates solved the design problem (apart from a font problem in Firefox that we will explain in another post).

We REALLY wanted to serve the blog under our website domain and not under a subdomain. Our website is not that big, so having richer content would improve our SEO. But Blogger doesn't offer that kind of customisation, only the option of defining a custom domain. So we had to do some hacking in our web server.

So our OBJECTIVE was to serve our blog under http://www.trenddevs.co.uk/blog/ instead of under http://trenddevs.blogspot.co.uk/ (the default Blogger domain) or under http://blog.trenddevs.co.uk/ (the only customisation option offered by Blogger).

We achieved that by configuring mod_proxy_http and mod_proxy_html in our Apache HTTP Server 2.4 for Linux:
  • mod_proxy_http is part of the default Apache installation in CentOS, but
  • mod_proxy_html needs an additional package, installed via sudo yum install mod_proxy_html (a quick check that both modules end up loaded is shown right after this list).
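A quick, hedged way to confirm both modules are loaded after installing and enabling them (run on the web server):

# Both proxy_http_module and proxy_html_module should appear in the output
apachectl -M | grep -i proxy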
Once both modules were installed and configured in Apache, we added some lines to our VirtualHost:

<VirtualHost *:80>
  ServerName www.trenddevs.co.uk
    
  # other config...

  RewriteEngine on
  RewriteRule ^/blog$ http://www.trenddevs.co.uk/blog/ [R=302]

  ProxyPass /blog/ http://trenddevs.blogspot.co.uk/
  ProxyPassReverse /blog/ http://trenddevs.blogspot.co.uk/

  <Location /blog>
     ProxyHTMLURLMap http://trenddevs.blogspot.co.uk/ /blog/
     ProxyHTMLURLMap http://trenddevs.blogspot.com/ /blog/
     SetOutputFilter  proxy-html
     RequestHeader    unset  Accept-Encoding
  </Location>

</VirtualHost>

The ProxyPass and ProxyPassReverse lines tell mod_proxy_http to take all requests to /blog/ and forward them to the real Blogger domain. The <Location /blog> block configures mod_proxy_html to rewrite the responses to those proxied requests so that all links in the HTML point back to our domain. That creates the effect of the blog being hosted by ourselves.

As a bonus, the RewriteEngine and RewriteRule lines enable mod_rewrite and redirect requests for http://www.trenddevs.co.uk/blog to http://www.trenddevs.co.uk/blog/.

Not a big deal, but it works!

Saturday, 28 March 2015

Welcome !!

This blog was created with the main motivation of sharing some of our daily work experience with the community and letting you know a little more about our small company.

We started our solo adventure almost one year ago, after more than ten years of professional experience in IT. We grabbed the bull by its horns and decided to fly away! And after one year of hard work it's time to dedicate a small part of our efforts to making our company bigger and starting our own projects! And this blog is one of those projects.

Did you know that all our website images were taken by ourselves at The National Museum of Computing, one of the hidden wonders of Bletchley Park, home of the Codebreakers?

Can you see us at the other side of the screen?

We really hope to see you soon again!