My experience going through the full lifecycle of a massive platform
I have been working for a publicly traded company since 2001. I moved up in my career doing many interesting things, but nothing like what I was doing until a couple of days ago. A couple of years ago my ex-boss (from my previous department) cherry-picked people from different teams around the company to start an ambitious IoT project. Luckily (?) I was one of them.
Obviously I cannot talk about the project specifics because it is DARPA-like classified, but I can talk about what I learned walking the IoT path inside a big corporation.
I believe my team and I screwed up so many times that we learned a thing or two. In this first post I will talk about what we learned when setting up our team. After that I’ll talk about choosing the best tool for the job, and in the last part I will talk about software maintenance and support.
You might be super awesome at what you have been doing until today, or you may think of yourself as the Papa Smurf of code, but let me tell you something: you will face things that you have never faced before, and the most dangerous part is that you have no idea what they are. Because IoT is relatively new, the industry is still building its experts, and right now they are heads down working at a super cool startup or at companies like Google or Amazon, and quite probably that is not your company. Everybody else will need to become an expert... or go back to doing web apps in Ruby on Rails.
Right away you will face High Availability, Load Balancing, Big Data Processing (Streaming, Batch Layer, Speed Layer), Business Analytics, Predictive Analysis, Workflows, Machine Learning and Legacy Systems. So do like we did: buy some books, get some training and google a lot.
IoT platforms available today are quite different from the ones available four years ago. Things are way easier, cheaper and faster. Even so, you will still have to build applications, processes and schedulers, and those applications will have to support 1000+ events per second.
To achieve this, you will need to assemble your team with people who are responsible, have a sense of ownership and, above all, can learn. There is no class on high availability that will help you. There is no online webinar that will show you how to scale to 1000 servers, but there are hundreds of hours of blog reading, breaking things, testing things and getting it wrong.
Your team is everything; the people you have are more important than the product itself. If you don’t have the right people, you just have a shiny, poor-quality product with no future.
If I were to start all over again, my team would have these roles.
Your DevOps engineers will set up your networks and your security, and they will do it in a way that if anything gets broken or lost, it can be regenerated, since they actually code the infrastructure. Having the servers and services coded gives you a predictable, high-quality, testable, annotatable environment. Can you imagine provisioning 100 servers manually when your load increases because of a traffic spike? With a script it takes minutes, and it can also be triggered by a traffic monitoring event.
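As a rough illustration of a scale-up triggered by a traffic monitoring event, here is a minimal Python sketch. Everything in it is an assumption made for the example: the function names, the 50 events per second that one server is assumed to handle, the 2 to 100 server limits, and the `provision` callback, which stands in for whatever provisioning script or tool your team actually uses.

```python
import math


def desired_server_count(events_per_second, capacity_per_server,
                         min_servers=2, max_servers=100):
    """Return how many servers the observed load calls for.

    All limits are hypothetical defaults chosen for this example.
    """
    if capacity_per_server <= 0:
        raise ValueError("capacity_per_server must be positive")
    needed = math.ceil(events_per_second / capacity_per_server)
    # Clamp the result between the configured floor and ceiling.
    return max(min_servers, min(max_servers, needed))


def on_traffic_event(current_servers, events_per_second, provision,
                     capacity_per_server=50):
    """React to a monitoring event by scaling up if the load requires it.

    `provision` is a hypothetical callback that receives the number of
    extra servers to create. This sketch only scales up, never down.
    """
    target = desired_server_count(events_per_second, capacity_per_server)
    if target > current_servers:
        provision(target - current_servers)
        return target
    return current_servers


if __name__ == "__main__":
    # 1000 events per second on 5 servers: ask for the missing ones.
    new_total = on_traffic_event(5, 1000, lambda n: print(f"provisioning {n} servers"))
    print(f"fleet size is now {new_total}")
```

The point of keeping the decision in a small pure function is that it can be tested like any other code, which is the same property the coded infrastructure gives you.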
You and I can play DevOps, but you want someone with a strong sysadmin background if you want to do something serious and not show up in the news because someone hacked you.
My organization has UX and middleware developers. I was one of the middleware boys, but I actually don’t agree with having many different disciplines of developers. I believe that every developer should be able to work from the front end to the back end, but not every developer should be able to design a distributed transactional system or a UX flow. You will have specialized people for that, let’s say the more senior people on your team. But you get the idea: everybody should be able to touch any part of the code.
You should not have dedicated testers.
Your testers should be your own developers, and here is where a lot of people will disagree. A developer’s job is to deliver a working piece of software that has customer value. Traditional quality assurance people (manual testers) will make developers lazy, since developers will toss software over the wall for testers to test without fully verifying its functionality.
If you make developers responsible for their own code, your quality will skyrocket. Nobody likes getting the call in the middle of the night or being posted on the “wall of shame”.
The developer should automate 100% of their use cases (including the exception cases) and also provide unit tests to ensure they are not breaking parts of the code with changes here and there. I always say the same thing: if I have one dollar to spend on testing, I will spend it on end-to-end (or integration) tests and not unit tests. Why? Because an end-to-end test will tell you whether your customer can subscribe to your product, can actually pay for it, or can create that report that is key for their business.
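To make the idea concrete, here is a small Python sketch of what such an end-to-end test could look like. The `Shop` class is a hypothetical in-memory stand-in invented for this example (a real test would drive your deployed product instead), and the customer names and amounts are made up. The first test walks the whole customer journey of subscribing, paying and creating a report; the second covers one exception case.

```python
class Shop:
    """Hypothetical in-memory product, used only to illustrate the test."""

    def __init__(self):
        self.subscribers = set()
        self.payments = {}

    def subscribe(self, customer):
        self.subscribers.add(customer)

    def pay(self, customer, amount):
        if customer not in self.subscribers:
            raise PermissionError("customer is not subscribed")
        if amount <= 0:
            raise ValueError("amount must be positive")
        self.payments[customer] = self.payments.get(customer, 0) + amount

    def create_report(self, customer):
        if customer not in self.payments:
            raise PermissionError("no payment on record")
        return {"customer": customer, "total_paid": self.payments[customer]}


def test_customer_journey():
    # End to end: subscribe, pay, then create the report the customer needs.
    shop = Shop()
    shop.subscribe("alice")
    shop.pay("alice", 30)
    report = shop.create_report("alice")
    assert report == {"customer": "alice", "total_paid": 30}


def test_unsubscribed_customer_cannot_pay():
    # Exception case: paying without a subscription must be rejected.
    shop = Shop()
    try:
        shop.pay("bob", 30)
    except PermissionError:
        return
    raise AssertionError("expected PermissionError")


if __name__ == "__main__":
    test_customer_journey()
    test_unsubscribed_customer_cannot_pay()
    print("all journeys passed")
```

A unit test would check `pay` or `create_report` in isolation; the journey test is the one that fails when the pieces stop working together.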
Next post: Part 2, The right tool for the job
Rodrigo Asensio is Manager of Solution Architecture at Amazon Web Services. He has more than 20 years of experience designing and operating distributed solutions. He is currently responsible for a team in the Enterprise segment helping large clients accelerate their adoption of the cloud and optimize the utilization of their resources.