Quandis now hosts significant infrastructure in the Amazon Web Services cloud. Key reasons include:
Amazon has data centers in different regions around the world. Quandis currently has infrastructure in us-east-1 (Virginia) and us-west-1 (California). Each region's data centers are split into availability zones (which should be thought of as different buildings). Best practice for load-balanced environments is to place the load-balanced instances in different availability zones. Quandis currently leverages:
Networking starts with the creation of a virtual private cloud (VPC), which is a block of private IP addresses. Quandis uses 10.0.* (10.0.0.0/16 in CIDR notation). Think of a VPC as a firewall. Within a VPC, Quandis has several subnets, and can use security groups and routing tables to control what traffic the VPC "firewall" will allow between subnets and to or from the public internet. For example:
Each virtual machine in AWS is called an instance, and normally is assigned only private IP addresses.
Security groups are sets of firewall rules applied to instances or their network interfaces. They control inbound and outbound traffic by protocol, port, and source or destination. When configuring new subnets, the following security groups are relevant:
By default, Amazon Web Services limits the number of dedicated public IP addresses (Elastic IPs) to five per region per account. One can get dynamically assigned public IP addresses, but there is no guarantee that an instance will retain the same dynamically assigned public IP over time. Thus, one cannot reliably use dynamically allocated public IPs for DNS entries.
Amazon provides a solution for this problem, as long as Amazon is hosting our domain. As of this writing, quandis.net is hosted by the Amazon Route 53 domain name service. Route 53 allows us to map a third-level domain to a load balancer without assigning a public IP address. The record is simply an 'alias' that points to the load balancer's DNS name. If the load balancer is assigned new IP addresses (which can happen as new instances are added to the load balancer), no DNS modifications need to be made.
Thus, production instances hosted in AWS will leverage this feature, meaning their host names will need to end with 'quandis.net'.
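As a minimal sketch of what such an alias looks like, the following builds the change batch that Route 53's ChangeResourceRecordSets API accepts for an alias record. The host name, load balancer DNS name, and hosted zone ID below are hypothetical placeholders, not Quandis values. Note that an alias record carries no IP address and no TTL.

```python
import json


def alias_change_batch(record_name, lb_dns_name, lb_hosted_zone_id):
    """Build a Route 53 change batch aliasing record_name to a load balancer.

    lb_hosted_zone_id is the hosted zone ID of the load balancer itself
    (reported by AWS for each load balancer), not the ID of the quandis.net zone.
    """
    return {
        "Comment": "Alias a host name to a load balancer",
        "Changes": [
            {
                "Action": "UPSERT",  # create the record, or update it if it exists
                "ResourceRecordSet": {
                    "Name": record_name,
                    "Type": "A",
                    # Alias records point at a DNS name instead of listing IPs
                    "AliasTarget": {
                        "HostedZoneId": lb_hosted_zone_id,
                        "DNSName": lb_dns_name,
                        "EvaluateTargetHealth": False,
                    },
                },
            }
        ],
    }


if __name__ == "__main__":
    # All three arguments are hypothetical placeholders.
    batch = alias_change_batch(
        "app.quandis.net.",
        "example-lb-123456.us-east-1.elb.amazonaws.com.",
        "ZEXAMPLELBZONE",
    )
    print(json.dumps(batch, indent=2))
```

With boto3 installed, a batch like this would be passed as the ChangeBatch argument of `boto3.client("route53").change_resource_record_sets(...)`, together with the hosted zone ID of quandis.net.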
Unfortunately, one cannot assign a third-level domain directly to an instance; only to a load balancer. Thus, to create a permanently reliable DNS entry for a UAT site, we either need to:
Private instances need 2 NICs
Instances that have only a private IP address cannot initiate outbound internet traffic unless they are on a subnet that routes through a NAT. We initially configured instances on the 10.0.0.* subnet, only to find that geocoding (which hits Google) failed. This led us to create instances on the 10.0.2.* subnet (so they could geocode). Ultimately, production web servers should be configured with two NICs: one to a subnet routing to a NAT (for outbound traffic), and one to a subnet routing to an internet gateway (to handle inbound traffic from load balancers).
For our code that initiates outbound web traffic (via HttpWebRequest, such as geocoding), we need to tell Windows to route such traffic through the NAT NIC. (Nic nat patty whack?) This is accomplished by adding permanent routes from the command prompt:
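The exact routes depend on the subnet layout. As a hedged sketch, the helper below formats a persistent Windows route of the form `route -p add <destination> mask <netmask> <gateway>`. The destination (a default route) and the gateway 10.0.2.1 are assumptions for illustration: AWS reserves the first host address of each subnet for the VPC router, so 10.0.2.1 would be the router address on the NAT-routed 10.0.2.* subnet if that subnet is 10.0.2.0/24.

```python
import ipaddress


def persistent_route_command(destination_cidr, gateway):
    """Format a Windows 'route -p add' command for the given destination.

    destination_cidr: network in CIDR form, e.g. "0.0.0.0/0" for a default route.
    gateway: next-hop address reachable through the NIC that should carry the traffic.
    """
    network = ipaddress.ip_network(destination_cidr)
    next_hop = ipaddress.ip_address(gateway)  # validates the address
    # -p makes the route persistent across reboots
    return f"route -p add {network.network_address} mask {network.netmask} {next_hop}"


if __name__ == "__main__":
    # Assumed values: default route via the VPC router on the NAT-routed subnet.
    print(persistent_route_command("0.0.0.0/0", "10.0.2.1"))
    # prints: route -p add 0.0.0.0 mask 0.0.0.0 10.0.2.1
```

The printed command would be run from an elevated command prompt on the instance; `route print` lists the resulting persistent routes.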
Load balancers must use subnets that route to an internet gateway
Load balancers must communicate with instances on subnets that route through an internet gateway (10.0.3.* or 10.0.5.*). When we configured the load balancer to use the 10.0.2.* subnet (which routes through a NAT), no sites responded. This led us to create the 10.0.3.* subnet, and add a NIC to the production instances. With two NICs, they can both respond to inbound load balancer traffic, and route outbound requests to Google for geocoding.
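The two-NIC rule can be expressed as a small check. This is a sketch only: it assumes each subnet named above is a /24 inside 10.0.0.0/16, and the instance addresses are hypothetical. It classifies each NIC by its subnet's route target and reports whether an instance can both initiate outbound traffic (through a NAT) and be reached by a load balancer (through an internet gateway).

```python
import ipaddress

VPC = ipaddress.ip_network("10.0.0.0/16")

# Route targets as described in the text; the /24 prefix lengths are an assumption.
SUBNET_ROUTES = {
    ipaddress.ip_network("10.0.2.0/24"): "nat",
    ipaddress.ip_network("10.0.3.0/24"): "internet-gateway",
    ipaddress.ip_network("10.0.5.0/24"): "internet-gateway",
}


def route_target(ip):
    """Return the route target for the subnet containing ip, or None if unmapped."""
    addr = ipaddress.ip_address(ip)
    if addr not in VPC:
        raise ValueError(f"{ip} is outside the VPC range {VPC}")
    for subnet, target in SUBNET_ROUTES.items():
        if addr in subnet:
            return target
    return None


def check_instance(nic_ips):
    """Report which traffic directions an instance's NICs can serve."""
    targets = {route_target(ip) for ip in nic_ips}
    return {
        "can_reach_internet": "nat" in targets,
        "reachable_from_load_balancer": "internet-gateway" in targets,
    }


if __name__ == "__main__":
    # Hypothetical addresses: one NIC only, then one NIC on each kind of subnet.
    print(check_instance(["10.0.2.15"]))
    print(check_instance(["10.0.2.15", "10.0.3.15"]))
```

An instance with only a 10.0.2.* address passes the outbound check but not the load balancer check, which matches the behavior described above; adding a 10.0.3.* NIC satisfies both.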
Even after changing the subnets the load balancer was using, we found the site would sporadically be offline. This appears to have been a caching issue: