Despite the whole company needing to work remotely due to coronavirus, the Techspert internship still went ahead. We weren’t going to let a pandemic stop us from nurturing the next generation of STEM pioneers!
In this post, Joshua Snyder, Matthew Macfarlane and Shreshth Malik share their Techspert remote internship experience. They decided to spend their summer break interning with us before embarking on their different ventures.
An intro to our trio
Hi, I’m Matthew! After completing my MSc in Artificial Intelligence at the University of St Andrews, Scotland, I wanted to get some practical experience to bolster my theoretical knowledge. I enjoy having the freedom to explore different ways of solving a problem, which the Techspert internship offered. After the internship, I’m pursuing a Ph.D. in Deep Reinforcement Learning at the University of Amsterdam.
Hello, I’m Josh! I was drawn to the Techspert internship because I wanted to get experience with Amazon Web Services (AWS), programming alongside senior developers, and implementing a project that helps a business achieve its goals – all things I was able to do during my time here. Since completing my Master’s in Maths at the University of Cambridge and the internship, I’ve started my own company and created TimeNavi, an app that helps people manage their time more effectively.
Hi, I’m Shreshth! After completing my Integrated Master’s in Physics at the University of Cambridge, I wanted to experience using machine learning in business and bridge the gap between tech in academia and tech in industry. The Techspert internship was the perfect opportunity to do this. I was also attracted to the fact that the project I worked on would be put into production. After the internship, I’m doing a Master’s in Machine Learning at University College London.
Deep diving with the Search Team
Now that we’ve introduced ourselves, let’s dive straight into our internships! We’re members of Search, the team that develops the AI technology that finds world-leading experts. Unlike Ashwath and Toby, the tech interns on the Connect team who worked together on the same projects, we were each given an individual task.
Matthew is going to kick us off!
Making sense of unstructured data
Techspert has developed artificial intelligence (AI) search technology that trawls the web, analyzing and comprehending the content of web pages to extract useful expert profiles from them. To increase the volume of content the AI can process, my project was to develop a method for extracting expert profiles from unstructured sources, such as news articles. Although such text is easy for humans to understand, it is difficult for a machine to process because it has no predefined data model or structure that a computer can rely on to make sense of it.
So, I explored and employed a range of natural language processing models that are trained on huge amounts of data to understand human language without being explicitly programmed to do so. These models, none of which I had used before, covered tasks including question answering, reading comprehension (see the screenshot below for an example), and coreference resolution, the task of finding all the expressions in a text that refer to the same entity.
An example of reading comprehension from AllenNLP.
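To make coreference resolution a little more concrete, here is a deliberately naive sketch that links each pronoun to the most recently mentioned person. It is not one of the trained models Matthew used; the pronoun list, the example sentence, and the assumption that person names have already been found (for instance by a named-entity step) are all illustrative.

```python
import re

# Pronouns this toy resolver tries to link to a person (an assumption for English text).
PRONOUNS = {"he", "him", "his", "she", "her", "hers", "they", "them", "their"}


def resolve_pronouns(text, people):
    """Cluster each pronoun with the most recently mentioned person.

    `people` is assumed to come from an earlier named-entity step; this
    recency heuristic is a toy, not a trained coreference model.
    """
    clusters = {name: [] for name in people}
    # Match known names first (longest first), otherwise fall back to single words.
    alternatives = [r"\b" + re.escape(n) + r"\b" for n in sorted(people, key=len, reverse=True)]
    pattern = re.compile("|".join(alternatives + [r"\b\w+\b"]))
    last_person = None
    for match in pattern.finditer(text):
        token = match.group(0)
        if token in clusters:
            # A direct mention of a person starts or extends that person's cluster.
            last_person = token
            clusters[token].append(token)
        elif token.lower() in PRONOUNS and last_person is not None:
            # Link the pronoun to whoever was mentioned most recently.
            clusters[last_person].append(token)
    return clusters


# Hypothetical news-style snippet.
TEXT = (
    "Dr Jane Smith leads the virology lab at Example University. "
    "She has published widely on coronaviruses, and her team advises Example Health."
)
print(resolve_pronouns(TEXT, ["Jane Smith"]))
# → {'Jane Smith': ['Jane Smith', 'She', 'her']}
```

Grouping “She” and “her” with “Jane Smith” is what lets facts from later sentences be attached to the right profile. A recency rule breaks easily (for example, when two people are mentioned in one sentence), which is why learned models are used in practice.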
I also got to work with cloud services and implement my models in the cloud for the first time, which was a great learning experience.
“Getting to learn and use Amazon Web Services throughout the project was a great opportunity that made the internship particularly valuable to me. It was really exciting to get stuff working in the cloud!”
Matthew Macfarlane, Techspert intern, 2020
After lots of testing, analyzing performance, and checking the accuracy of different models to see which performed best, I developed a model that extracts information from unstructured text to create comprehensive, accurate expert profiles.
As someone who enjoys problem-solving and investigating different solutions, I found it encouraging to have the freedom to explore and develop something that has now been implemented in the company’s tech.
Shreshth, over to you!
Mission multilabel classification
My project was to simplify the process of classifying expertise under specific categories, so that expert profiles can be searched easily and accurately using a defined set of keywords.
This sounds simple, but there are sometimes hundreds of thousands of possible categories. For instance, when searching for subject-matter experts under the tag “coronavirus”, we also want results for experts tagged under COVID, COVID-19, respiratory disease, pulmonology, pulmonologist, virology, virologist, infectious disease, immunologist, immunology… You get the picture.
To achieve this, I developed a machine learning model for multilabel classification. It learns what characterizes each category and assigns a set of skills to every label (or category) that is relevant to it. This kind of model is useful when there are two or more classes and each item we want to classify may belong to several classes at the same time.
A simple illustration of multilabel classification via the University of California
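One common way to set up multilabel classification is binary relevance: train one independent yes/no classifier per label, so a single item can switch on zero, one, or several labels. The sketch below does this with a perceptron over bag-of-words features. It is not Shreshth’s model; the labels, the skill descriptions, and the choice of a perceptron are illustrative assumptions.

```python
from collections import defaultdict


def tokenize(text):
    """Lower-case whitespace tokenizer (a deliberately simple assumption)."""
    return set(text.lower().split())


class OneVsRestPerceptron:
    """Binary relevance: one independent perceptron per label."""

    def __init__(self, labels):
        self.labels = sorted(labels)
        self.weights = {label: defaultdict(float) for label in self.labels}
        self.bias = {label: 0.0 for label in self.labels}

    def _score(self, label, tokens):
        w = self.weights[label]
        return self.bias[label] + sum(w.get(t, 0.0) for t in tokens)

    def fit(self, examples, max_epochs=1000):
        """Train until an epoch passes with every label correct on every example."""
        for _ in range(max_epochs):
            mistakes = 0
            for text, gold in examples:
                tokens = tokenize(text)
                for label in self.labels:
                    target = 1 if label in gold else -1
                    predicted = 1 if self._score(label, tokens) > 0 else -1
                    if predicted != target:
                        # Standard perceptron update for this label only.
                        mistakes += 1
                        for t in tokens:
                            self.weights[label][t] += target
                        self.bias[label] += target
            if mistakes == 0:
                return True  # converged on the training data
        return False

    def predict(self, text):
        """Return every label whose score is positive (possibly none, possibly several)."""
        tokens = tokenize(text)
        return {label for label in self.labels if self._score(label, tokens) > 0}


# Hypothetical training data: skill descriptions with one or more category labels.
EXAMPLES = [
    ("virologist sequencing coronavirus genomes", {"virology", "infectious disease"}),
    ("pulmonologist treating covid pneumonia", {"pulmonology", "infectious disease"}),
    ("pulmonologist treating asthma", {"pulmonology"}),
    ("virologist studying plant viruses", {"virology"}),
]

model = OneVsRestPerceptron({"virology", "pulmonology", "infectious disease"})
converged = model.fit(EXAMPLES)
print(sorted(model.predict("virologist sequencing coronavirus genomes")))
# → ['infectious disease', 'virology']
```

Because each label has its own decision, the first example comes back with two categories at once, which a single-label (pick-one) classifier could not express. With hundreds of thousands of categories, a real system would need far richer features and models than this.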
To build the classifier and complete this project, I used various techniques and technologies for the first time in the fields of machine learning, software engineering, and cloud computing.
By the end of the internship, I had the multilabel classifier working. It’s motivating to know that, as an intern, I built something that makes Techspert’s tech even better.
Josh, it’s your go!
Crawling using reinforcement learning
If the title of my section has left you scratching your head, it should make sense by the end. techspert.io has developed a computer program called a Crawler that scans the web, reading and indexing everything it finds to produce high-quality, data-backed expert profiles. My project was to train the Crawler to find the right pages faster and to ignore pages that won’t help it produce profiles.
For example, most people know to click on a website’s “Meet the team” page to see people’s profiles, but the Crawler doesn’t. Instead, it sifts through all the pages on a website until it reaches the “Meet the team” page, which gives it the profile information it needs, and then it keeps scanning additional pages. Time the Crawler spends analyzing irrelevant pages is wasted, and therefore costly.
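To put a number on that waste, the sketch below models an unguided crawler as a plain breadth-first crawl over a made-up eight-page website. The site layout and the breadth-first order are assumptions for illustration; the post doesn’t say how the Crawler orders its pages.

```python
from collections import deque

# Hypothetical toy website (page -> pages it links to); "team" holds the profiles.
SITE = {
    "home": ["about", "blog", "careers"],
    "about": ["home", "team", "contact"],
    "blog": ["home", "post-1", "post-2"],
    "post-1": ["blog"],
    "post-2": ["blog"],
    "careers": ["home"],
    "contact": ["home"],
    "team": [],
}


def exhaustive_crawl(start="home"):
    """Breadth-first crawl that fetches every reachable page, as an unguided crawler would."""
    seen = {start}
    queue = deque([start])
    fetched = []
    while queue:
        page = queue.popleft()
        fetched.append(page)  # stand-in for downloading and analyzing the page
        for link in SITE[page]:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return fetched


order = exhaustive_crawl()
print(order)
print("profiles found on fetch", order.index("team") + 1, "of", len(order))
# → profiles found on fetch 5 of 8
```

Only one of the eight fetched pages contains profiles; the ideal route on this toy site is home, then about, then team.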
I used reinforcement learning to get the Crawler to analyze only content that will help it generate profiles. Reinforcement learning is a subfield of machine learning in which a computer learns which actions to take within an environment to maximize its reward, which may be given after each action or only once a certain goal has been reached. In my problem, the reward is finding information that can be used to create expert profiles. Now, when the Crawler is hunting for expert profiles, it knows which paths lead to them.
This image via the Society of Robotics and Automation sums up reinforcement learning nicely.
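As a minimal sketch of the idea, the code below runs tabular Q-learning on a made-up eight-page website. Everything here is an illustrative assumption rather than the Crawler’s real setup: the site, the rewards (+1 for reaching the hypothetical `team` page, -0.01 for every other click), and the hyperparameters `alpha`, `gamma`, and `episodes`. Because Q-learning is off-policy, the agent can explore by clicking links at random and still learn which clicks lead to profiles.

```python
import random
from collections import defaultdict

# Hypothetical toy website: each page maps to the pages it links to.
SITE = {
    "home": ["about", "blog", "careers"],
    "about": ["home", "team", "contact"],
    "blog": ["home", "post-1", "post-2"],
    "post-1": ["blog"],
    "post-2": ["blog"],
    "careers": ["home"],
    "contact": ["home"],
    "team": [],  # the "Meet the team" page holding the profiles
}
PROFILE_PAGE = "team"
GOAL_REWARD = 1.0  # assumed reward for reaching profile information
STEP_COST = -0.01  # assumed small penalty for every wasted page fetch


def step(link):
    """Follow a link and return (next_page, reward, done)."""
    if link == PROFILE_PAGE:
        return link, GOAL_REWARD, True
    return link, STEP_COST, False


def train(episodes=500, alpha=0.5, gamma=0.9, max_steps=20, seed=0):
    """Tabular Q-learning with a uniformly random (exploratory) behaviour policy."""
    rng = random.Random(seed)
    q = defaultdict(float)  # q[(page, link)] -> estimated value of clicking link on page
    for _ in range(episodes):
        page = "home"
        for _ in range(max_steps):
            link = rng.choice(SITE[page])
            nxt, reward, done = step(link)
            best_next = 0.0 if done else max(q[(nxt, a)] for a in SITE[nxt])
            # Move the estimate towards the one-step bootstrapped target.
            q[(page, link)] += alpha * (reward + gamma * best_next - q[(page, link)])
            if done:
                break
            page = nxt
    return q


def greedy_path(q, start="home", max_steps=10):
    """Follow the highest-valued link from each page until profiles are found."""
    path = [start]
    page = start
    while page != PROFILE_PAGE and len(path) <= max_steps:
        page = max(SITE[page], key=lambda link: q[(page, link)])
        path.append(page)
    return path


print(greedy_path(train()))
# → ['home', 'about', 'team']
```

The per-click cost combined with discounting (`gamma`) makes shorter routes to profiles worth more, so after training the greedy crawler skips the blog and careers pages entirely.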
Along with using cloud computing and language processing algorithms, I also got to experience how machine learning workflows are developed professionally to deliver predictions that are reproducible and can thus be trusted.
Although we worked on our projects independently, we were never isolated or alone and that’s thanks to a range of initiatives set up at Techspert to encourage collaboration and inclusion.
“The internship taught me a lot about how I want to run my company one day and the importance of a positive company culture. Techspertians are engaged, committed, and excited about what they do. This energy made the experience that much more enjoyable!”
Joshua Snyder, Techspert intern, 2020
Our internship highlights
We’ve spent a lot of time talking about our projects, so let’s finish by covering the other things we enjoyed about our internships.
- Learning to program as part of a team and write good-quality code, which isn’t something you learn when working on small projects by yourself.
- The supportive learning environment. We feel comfortable going to someone for support if we’re struggling with something.
- The monthly company-wide all-hands meetings. These are a great way to see how everyone plays a crucial role in the company and to see the value of transparency in action.
- The high level of trust given to interns. We’re treated as equals and aren’t tasked with menial activities.
- The team is smart, helpful, friendly, and welcoming. This made all the difference.
“If you have passion for technology, this is the perfect place to explore it independently and also have the backing of knowledgeable, experienced people.”
Shreshth Malik, Techspert intern, 2020