Big data and analytics are taking centre stage in many smart city projects all over the world. A smart city, after all, is about using information and communication technology to better manage a city’s assets and infrastructure. It is also about leveraging technology to improve the quality of life for citizens, such as improving individual mobility, enhancing connectivity or improving the environment.
However, many government or public sector organisations that are building the smart cities of tomorrow are still not able to fully take advantage of their data today. This can be due to a number of reasons, such as the limitations of their existing IT infrastructure or constraints from working within a traditionally designed government setup where information is stored in silos.
Public sector organisations like these need an enterprise data hub (EDH) solution that can help them manage and operationalise big data across departments and within agencies, supporting both current and future needs. An improved data management architecture, such as an EDH built on and powered by Apache Hadoop, can move governments towards an environment where big data is securely shared, processed and analysed.
When implementing an EDH, public sector organisations have a couple of key concerns. First, they need to comply with stringent regulatory mandates, and second, they need to ensure data security.
The open source Hadoop platform is flexible, scalable and secure. Here are the five most important questions that public sector organisations need to ask before implementing an EDH with Hadoop. These questions are aimed at helping agencies prepare and build out enterprise data solutions.
How do I prepare a secure foundation for Hadoop?
Government agencies — including those in the defence, financial and healthcare sectors — are seeing a critical advantage in analysing and using data, especially when it comes to accessing large, historical data sets.
Public sector IT leaders may have concerns about Hadoop because it is an open source software solution. That is, Hadoop is freely available, and developers are free to study, change and distribute the software framework. The spirit of open source means that Hadoop is developed in a collaborative, public manner. It also means that Hadoop embraces innovation, which can sometimes translate into new and updated components being released every month, if not every day.
Nevertheless, while the technology may evolve and change quickly, many tried-and-true security rules and best practices for hardening system foundations can keep these changes under control and maintain effective security and governance. Even more important is that this can be done without hampering innovation.
To maintain data security, public sector organisations that adopt Hadoop need to apply well-known security measures to the underlying infrastructure and systems. For example:
- turning off services that are not required,
- restricting access to users,
- limiting super-user permissions,
- locking down network ports and protocols,
- enabling auditing, logging and monitoring,
- applying the latest OS security patches,
- using centralised corporate authentication, and
- enabling encryption in transit.
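To make these steps concrete, here is a minimal sketch of what a few of them might look like on a typical systemd-based Linux cluster node. The service names, ports and commands are illustrative assumptions, not a definitive hardening script; adapt them to your own distribution and security baseline.

```shell
# Illustrative hardening commands for a cluster node (assumes a
# systemd/firewalld-based Linux distribution; names are examples only).

# Turn off services that are not required
sudo systemctl disable --now cups.service

# Lock down network ports: expose only what the cluster needs,
# e.g. the NameNode RPC port, and drop unneeded services
sudo firewall-cmd --permanent --remove-service=dhcpv6-client
sudo firewall-cmd --permanent --add-port=8020/tcp
sudo firewall-cmd --reload

# Limit super-user permissions: no direct root login over SSH
sudo sed -i 's/^#\?PermitRootLogin.*/PermitRootLogin no/' /etc/ssh/sshd_config
sudo systemctl reload sshd

# Apply the latest OS security patches
sudo yum update --security -y
```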
What type of perimeter security is in place?
With a Hadoop cluster installed on a secure platform, the next question revolves around perimeter security: who can access the Hadoop cluster, from where, and how are users authenticated?
Perimeter security restricts users by requiring entry through a secure gateway, over secured networks and with approved credentials. Just as agencies need multiple data sources and multiple frameworks to truly instil a data-driven workflow within their organisations, government leaders also need a network perimeter that is secure yet agile enough to handle a variety of workforce needs.
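In Hadoop deployments, this perimeter is usually built on Kerberos authentication, often combined with a locked-down edge (gateway) node that is the only entry point to the cluster. A hypothetical user session might look like the following; the realm, principal and paths are assumptions for illustration.

```shell
# Assumed setup: users reach the cluster only through an edge node,
# and the cluster requires Kerberos authentication for all services.

# Obtain a Kerberos ticket using an approved credential (a keytab here)
kinit -kt /etc/security/keytabs/alice.keytab alice@AGENCY.GOV

# Confirm the ticket was granted
klist

# Only with a valid ticket will HDFS serve this user's requests
hdfs dfs -ls /data/shared
```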
What security regulations must I meet?
There are two kinds of organisations when it comes to compliance: those that have to be compliant and those that choose to follow compliance guidelines.
Those that must be compliant are usually operating under a mandate such as FISMA (the Federal Information Security Management Act), which establishes the required controls and regulations, including data encryption. Data encryption is the safety lock on the most sensitive data an organisation holds.
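As one example of meeting an encryption mandate, HDFS offers transparent at-rest encryption through encryption zones backed by a key management server. The sketch below assumes a configured Hadoop KMS; the key and directory names are hypothetical.

```shell
# Assumes HDFS transparent encryption with a Key Management Server (KMS)
# is already configured; key and directory names are examples only.

# Create an encryption key in the KMS
hadoop key create citizenRecordsKey

# Create the directory and turn it into an encryption zone
hdfs dfs -mkdir /data/citizen_records
hdfs crypto -createZone -keyName citizenRecordsKey -path /data/citizen_records

# Files written here are now encrypted at rest, transparently to users
hdfs crypto -listZones
```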
As for those that are following compliance guidelines, they typically do so to establish differentiation, mitigate risks and promote a culture and mindset of security.
However, public sector organisations need to keep in mind that compliance is not just about technology; it is also about people and processes. Organisations first need to have a security culture in place. For instance, users need to consistently adhere to simple security guidelines, such as encrypting sensitive data and locking devices with secure passwords.
Who are the ‘need to know’ users on the Hadoop platform?
It is important for a public sector organisation to only share data internally on a need-to-know basis. This is, however, where many public sector agencies struggle the most. There are sub-groups and divisions built into larger agencies, and with increased organisational complexity comes increased difficulty in monitoring and controlling access to data.
The power to bring data together, like that of a Hadoop-powered EDH, also comes with a challenge: who are the ‘need-to-know’ users within a large organisation that require access to critical data?
Solutions like Apache Sentry, which enables fine-grained, role-based access control over data sets, can be useful here. Users are defined by ‘need-to-know’ roles rather than organisational structures. Essentially, Sentry provides the central authorisation framework that gives the Hadoop platform the ability to store sensitive data while granting secure access to the agency’s ‘need-to-know’ users.
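A sketch of how such roles might be defined with Sentry, issued as SQL through HiveServer2, follows. The role, group and database names are hypothetical, and the connection string will differ per deployment.

```shell
# Hypothetical Sentry role setup via beeline against a Kerberos-secured
# HiveServer2 (connection details and names are examples only).
beeline -u "jdbc:hive2://hive-host:10000/default;principal=hive/_HOST@AGENCY.GOV" -e "
  CREATE ROLE transport_analyst;
  GRANT ROLE transport_analyst TO GROUP transport_analysts;
  GRANT SELECT ON DATABASE transit_data TO ROLE transport_analyst;
"
```

Access is then granted by adding a user to the right group, not by touching individual data sets, which keeps the ‘need-to-know’ mapping auditable in one place.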
How do I monitor and audit Hadoop security after it goes live?
Auditing the Hadoop platform is the final key piece of an effective and secure data practice. Auditing lets planners see how users have been using the platform and flag any suspicious anomalies. Tools that inspect the Hadoop logs and enable predictive tracking can help detect bad behaviour and address it before it becomes a bigger threat.
Simply running a data audit and indexing data can help identify new data, along with the security permissions and policies that need to be applied to it. It is about gaining visibility into data usage and the routes data takes. Public sector teams need this visibility to know where data is coming from and how it is being used, at all times and without exception.
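As a simple illustration of this visibility, HDFS writes an audit record for every access in its NameNode audit log. Even basic inspection of that log can surface anomalies such as repeated denied requests. The log excerpt below is fabricated for illustration; the user names, IPs and paths are invented.

```shell
# Count denied HDFS accesses per user from an audit log.
# The sample lines follow the standard hdfs-audit.log format,
# but their contents are invented for illustration.
cat > sample-hdfs-audit.log <<'EOF'
2016-05-01 09:14:02,113 INFO FSNamesystem.audit: allowed=true ugi=alice (auth:KERBEROS) ip=/10.1.2.3 cmd=open src=/data/shared/report.csv dst=null perm=null
2016-05-01 09:14:05,991 INFO FSNamesystem.audit: allowed=false ugi=bob (auth:KERBEROS) ip=/10.1.2.9 cmd=open src=/data/citizen_records/raw dst=null perm=null
2016-05-01 09:14:06,214 INFO FSNamesystem.audit: allowed=false ugi=bob (auth:KERBEROS) ip=/10.1.2.9 cmd=listStatus src=/data/citizen_records dst=null perm=null
EOF

# Flag users with denied requests -- candidates for closer review
grep 'allowed=false' sample-hdfs-audit.log \
  | sed 's/.*ugi=\([^ ]*\).*/\1/' \
  | sort | uniq -c
```

In production this inspection would be continuous and automated, for example by shipping the audit logs into a search or alerting tool, but the principle is the same: every access leaves a record that can be queried.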
Safeguarding data in a smart city
The current data explosion will continue and public sector organisations will have to manage more and more data. This is especially so in a smart city environment where citizens and the government are more connected than ever before, and where data powers everything from the public transport system to the water and waste management system.
A Hadoop-based EDH, offering flexibility, scalability and security, allows public sector organisations to be future-ready today. IT leaders in the public sector need to know that compliance and security can be ensured in Hadoop. Moreover, continued innovation in the platform will allow public sector organisations to strengthen that security over time.
Daniel Ng, Senior Director, APAC, Cloudera is an end-in-mind strategist, championing business and technology values for customers, from SMBs to Enterprises.
Lead image: dearza / 123RF Stock Photo
Image in-text: adiruch / 123RF Stock Photo