Aman's Blog - Entrepreneurship and Technology: July 2019

Sunday 28 July 2019

How much time will it take to create Quora?

If you really want to build a product that scales to 100 million users, it will be a major undertaking, and I am sure you will face many challenges, both technical and non-technical.

Technical challenges:

Advanced machine learning algorithms
Scaling

Non-technical challenges:

Acquiring customers
Keeping the product in line with what customers need

These challenges may sound trivial, but as a tech guy, I can say with confidence that scaling to 100 million people is definitely not a piece of cake. When you have a hundred thousand users, you are all good. When a hundred million users are using your app, things start breaking and you really need some groundbreaking technology to handle this volume of traffic.

However, if you are looking for something simpler, a Quora-like question-and-answer website, that is quite doable, and I have built one (with the help of a couple of friends). With some effort, anyone can do it.

Here are the screenshots:

The technology behind it:

Django (Python) based backend
Frontend in Bootstrap
MySQL as database
Hosted on Heroku (using my free tier of Heroku)
Used Google OAuth for authentication
TinyMCE for forms and formatting
RAKE for basic NLP/keyword extraction
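
For readers unfamiliar with RAKE (Rapid Automatic Keyword Extraction), here is a minimal sketch of the idea in plain Python. It is not the library the site used. The tiny stopword list and the function names `candidate_phrases` and `rake_keywords` are assumptions made for illustration, and a real stopword list would be much larger.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list (an assumption; real lists are far larger).
STOPWORDS = {"a", "an", "and", "are", "as", "at", "can", "do", "for", "how",
             "i", "in", "is", "it", "my", "of", "on", "the", "to", "what", "with"}


def candidate_phrases(text):
    """Split text into candidate phrases at punctuation and stopwords."""
    phrases = []
    for fragment in re.split(r"[^\w\s]+", text.lower()):
        current = []
        for word in fragment.split():
            if word in STOPWORDS:
                if current:
                    phrases.append(tuple(current))
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(tuple(current))
    return phrases


def rake_keywords(text, top_n=3):
    """Rank phrases by the sum of degree(word) / frequency(word)."""
    phrases = candidate_phrases(text)
    freq = defaultdict(int)
    degree = defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            # Degree counts co-occurring words in the phrase, the word included.
            degree[word] += len(phrase)
    scores = {" ".join(p): sum(degree[w] / freq[w] for w in p)
              for p in set(phrases)}
    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [phrase for phrase, _ in ranked[:top_n]]
```

For example, `rake_keywords("How do I scale a Django app on Heroku with MySQL?")` returns `['django app', 'heroku', 'mysql']`: longer phrases of uncommon words score higher, which is why such output can be a reasonable first guess at topics for a question.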

The database schema is described below:

UserProfile table:


# on_delete is a required argument from Django 2.0 onwards
user = models.OneToOneField(User, primary_key=True, on_delete=models.CASCADE)
avatar = models.ImageField(null=True, upload_to=generate_filename, default="../default_avatar.png")
bio = models.CharField(max_length=50, null=True)
followers = models.ManyToManyField(User, related_name='following')
following = models.ManyToManyField(User, related_name='followers')

Topic table:


name = models.CharField(max_length=50)
url = models.CharField(max_length=100, primary_key=True)
followers = models.ManyToManyField(User, related_name='topic_followers')

Questions table:


text = models.CharField(max_length=100)
time = models.DateTimeField(default=timezone.now)
asked_by = models.ForeignKey(UserProfile, on_delete=models.SET_NULL,
                             null=True, db_index=True)
url = models.CharField(max_length=100, primary_key=True)
details = models.CharField(max_length=200)
topics = models.ManyToManyField(Topic, related_name='topic_questions')
followers = models.ManyToManyField(User, related_name='question_followers')

Answers table (yo is the term I used for upvote :P):


question_url = models.ForeignKey(Question, on_delete=models.CASCADE)
answered_by = models.ForeignKey(
    UserProfile, db_index=True, on_delete=models.SET_NULL, null=True)
question_text = models.CharField(max_length=100)
text = models.TextField()
time = models.DateTimeField(default=timezone.now)
yoers = models.ManyToManyField(User, related_name='yoers')

Comments table (answers have comments):


text = models.CharField(max_length=200)
time = models.DateTimeField(default=timezone.now)
commented_by = models.ForeignKey(
    User, on_delete=models.SET_NULL, null=True)
answer = models.ForeignKey(Answer, on_delete=models.CASCADE, db_index=True, related_name='comments')

One interesting piece remains: the newsfeed algorithm.

As can be seen, the newsfeed has 3 parts:

Latest Questions/Answers:


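user = request.user
# Newest FEED_COUNT questions and newest FEED_COUNT answers
latest_questions = Question.objects.order_by('-time')[:FEED_COUNT]
# select_related follows the question_url foreign key in the same query
latest_answers = Answer.objects.select_related(
    'question_url').order_by('-time')[:FEED_COUNT]
# Combine both lists and sort the result newest-first
latest_qa = list(latest_questions) + list(latest_answers)
latest_qa.sort(key=lambda x: x.time, reverse=True)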
 
yo_list, yo_count_list = utils_get_yo_info(latest_qa, user)
latest_qa_with_yos = zip(latest_qa, yo_list, yo_count_list)
 
return render(request, 'home/latestqa.html', {
    'latest_qa_with_yos': latest_qa_with_yos,
    'user': user,
    'domain': settings.DOMAIN_NAME})

Here is a brief explanation of the code above:

Generate a list of latest 20 or so questions
Generate a list of latest 20 or so answers
Sort them in the reverse order of time after combining the 2 lists
Show them as the latest Q/A newsfeed
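
The four steps above can be sketched without Django. This is a plain-Python illustration, assuming in-memory lists instead of querysets. The `Item` class and `latest_feed` function are hypothetical names, not part of the site's code.

```python
from dataclasses import dataclass
from datetime import datetime
from heapq import merge


@dataclass
class Item:
    kind: str       # "question" or "answer"
    text: str
    time: datetime


def latest_feed(questions, answers, feed_count):
    """Take the newest feed_count of each kind and merge them newest-first."""
    newest_q = sorted(questions, key=lambda x: x.time, reverse=True)[:feed_count]
    newest_a = sorted(answers, key=lambda x: x.time, reverse=True)[:feed_count]
    # Both inputs are already sorted newest-first, so a linear merge is enough;
    # no second full sort of the combined list is needed.
    return list(merge(newest_q, newest_a, key=lambda x: x.time, reverse=True))
```

Since both inputs arrive sorted, `heapq.merge` does the combining step in one pass. With only 20 or so items per list, the difference from sorting the concatenated list is negligible, so this is a matter of taste rather than performance.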

Topics you like:


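# Topics followed by the current user (reverse accessor of Topic.followers)
topics = request.user.topic_followers.all()
# A question tagged with several followed topics matches once per topic, so
# de-duplicate in the query (before slicing) to keep FEED_COUNT distinct items
topic_questions = list(Question.objects.filter(
    topics__in=topics).distinct().order_by('-time')[:FEED_COUNT])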
 
yo_list, yo_count_list = utils_get_yo_info(topic_questions, request.user)
topic_questions_with_yos = zip(topic_questions, yo_list, yo_count_list)
 
return render(request, 'home/topicsyoulike.html',
              {'topic_questions_with_yos': topic_questions_with_yos})

This is quite straightforward: a simple database query plus some post-processing.
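
One subtlety is worth spelling out. Filtering questions through a many-to-many relation can return the same question once per matching topic, so the order of "remove duplicates" and "keep the newest N" matters. The sketch below shows the difference in plain Python, using hypothetical `(question_id, time)` pairs in place of database rows. In Django, the usual way to deduplicate first is to call `.distinct()` on the queryset before slicing.

```python
def truncate_then_dedupe(rows, limit):
    """Keep the newest `limit` rows, then drop repeats (may return too few)."""
    newest = sorted(rows, key=lambda r: r[1], reverse=True)[:limit]
    ids = []
    for question_id, _time in newest:
        if question_id not in ids:
            ids.append(question_id)
    return ids


def dedupe_then_truncate(rows, limit):
    """Drop repeats while scanning newest-first, stopping at `limit` ids."""
    seen = set()
    ids = []
    for question_id, _time in sorted(rows, key=lambda r: r[1], reverse=True):
        if question_id in seen:
            continue
        seen.add(question_id)
        ids.append(question_id)
        if len(ids) == limit:
            break
    return ids
```

With rows `[("q3", 3), ("q3", 3), ("q2", 2), ("q1", 1)]` and a limit of 2, the first function returns only `['q3']`, while the second returns `['q3', 'q2']`.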

People you follow:


following = request.user.following.all()
answers = list(set(Answer.objects.filter(
    answered_by__in=following).order_by('-time')[:FEED_COUNT]))
answers.sort(key=lambda x: x.time, reverse=True)
 
yo_list, yo_count_list = utils_get_yo_info(answers, request.user)
answers_with_yos = zip(answers, yo_list, yo_count_list)
 
return render(request, 'home/peopleyoufollow.html',
              {'answers_with_yos': answers_with_yos})

Again, this is also quite straightforward to understand.
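
All three views call `utils_get_yo_info`, which is not shown in this post. Purely as a guess at its shape, a helper like this could return, for each feed item, whether the current user has "yo"-ed it and how many yos it has. The sketch below is hypothetical: `FeedItem`, `get_yo_info`, and the use of usernames in a set are assumptions, and the real helper would work with the `yoers` many-to-many field instead.

```python
from dataclasses import dataclass, field


@dataclass
class FeedItem:
    text: str
    # Usernames of people who upvoted ("yo"-ed) this item; hypothetical
    # stand-in for the yoers relation on the Answers table.
    yoers: set = field(default_factory=set)


def get_yo_info(items, username):
    """Return two parallel lists: did this user yo each item, and yo counts."""
    yo_list = [username in item.yoers for item in items]
    yo_count_list = [len(item.yoers) for item in items]
    return yo_list, yo_count_list
```

The two lists line up index by index with the feed, which is why the views can `zip` them together with the items. Note that in Python 3 `zip` returns a one-shot iterator, so wrapping it in `list(...)` is safer if a template loops over the result more than once.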

All of these newsfeed ‘algorithms’ were written by me and are in a very raw form. None of them is really ‘intelligent’ and I am sure many of the database queries can be optimized significantly. But again, this wasn’t supposed to be Quora, it was rather supposed to be Quora-like.

Note that the site has been abandoned. It was just an academic project (no commercialization) and the questions and answers you see in the screenshots are because it was tested by my friends a couple of days back. It took a few weeks to complete the web-app.

The name SoclWebApp was randomly chosen (Socl == (Soc)ialize + (L)earn). The site isn’t indexed by search engines and you won’t be able to find it on Google/elsewhere. However, I would be more than happy to answer any questions :)

Saturday 27 July 2019

Aman Goel recommends these Machine Learning Resources

Machine Learning is currently the hottest domain in the market. Big tech companies like Google, Facebook, Microsoft, Amazon, and Apple are investing heavily in it. The market is huge, and the salaries are much higher than those in traditional Web Development or App Development.

The reason for the higher salaries of Machine Learning Engineers is simply that there is a huge shortage of outstanding ones. While there are thousands of online resources that claim to help you become a Data Scientist, most of them are focused on the very basics. Today, in order to differentiate yourself, you not only have to learn the basics but also develop a deep specialization in the domain.

I am listing here some of the best resources for you to get started with Machine Learning and build a successful career in it. Let's begin.

Prerequisites: Machine Learning requires not only basic programming knowledge but also comfort with the concepts taught in typical Linear Algebra and Calculus courses. Why? Because in Machine Learning, data is represented in the form of matrices, so you should be comfortable with the most common matrix operations like addition, subtraction, and multiplication. Some algorithms also require knowledge of Eigenvalues and Eigenvectors - yeah, the scariest part! As for Calculus, you'd be doing some fancy differentiation operations on matrices, so you should have a solid understanding of core Calculus fundamentals. So, if you have not taken a course on Linear Algebra and Calculus, better do it before you start with ML, or else the most you'd learn is from sklearn.linear_model import LinearRegression.
- Mathematics for Machine Learning: Linear Algebra
- Mathematics for Machine Learning: Multivariate Calculus
Phase 1: as a beginner in Machine Learning, you should focus on building a solid foundation in the most basic Machine Learning Algorithms. The most important algorithms include Linear Regression, Logistic Regression, Support Vector Machines, and Neural Networks. All of these algorithms are covered in the excellent course by Andrew Ng on Coursera. The upside of the course is that it teaches you the fundamentals in a highly intuitive way. The downside is that it uses Octave as a programming language. Octave isn't a Machine Learning industry standard. Python is.
Phase 2: once you are through with the basics of Machine Learning algorithms, your focus should be on the implementation. Remember, knowing the theory is good. Knowing the theory and being able to implement it is better. Udacity has a great course on Machine Learning. The course focuses on the implementation of various Machine Learning algorithms in Python.
Phase 3: now that you are comfortable with the basic algorithms as well as their implementation, wouldn't it be cool to implement some projects which you can showcase on your resume? Eduonix has a fantastic course, Learn Machine Learning by Building Projects. The course focuses on writing actual code and building some great projects which you can put on your resume. You can talk about these projects in your job interviews. Talking about a project has far more impact than talking about just the vanilla courses. Moreover, you can showcase these projects on your personal web-page as well.
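
To make the prerequisites concrete, here is a minimal sketch of Linear Regression fitted by gradient descent in plain Python, with no libraries. It is an illustration under simple assumptions (one input feature, a fixed learning rate, a hypothetical function name `fit_line`), not how any particular library implements it. The gradients come from differentiating the mean squared error (Calculus), and the sums are dot products in disguise (Linear Algebra).

```python
def fit_line(xs, ys, lr=0.05, steps=2000):
    """Fit y = w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Prediction error for every data point under the current w and b.
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Partial derivatives of mean((w*x + b - y)^2) with respect to w and b.
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        # Step against the gradient.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b
```

Calling `fit_line([0, 1, 2, 3, 4], [1, 3, 5, 7, 9])` recovers a slope close to 2 and an intercept close to 1. If the learning rate is set too high for the data, the updates diverge instead of converging, which is exactly the kind of behavior the maths helps you reason about.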

To conclude, follow a step-by-step methodology to learn the domain of Machine Learning. Do not skip the steps. Follow the sequence in order to get the best results and you will surely succeed.

Here are some great additional resources:

All the best!

Thursday 18 July 2019

Advice for those students who learned nothing in their 4 years of college, but now wish to work at Google/Facebook

The 4 years of college are the best time to explore and learn new things, the reason being that you do not have any responsibilities and so, you can put your heart and soul into learning. However, given that you have already wasted the college years, you should effectively “re-live” them.

A typical template for a Computer Science student looks like the following:

Year 1: Basics of Computer Science and Programming. The year typically starts with a couple of introductory programming courses where the key objective is to learn the fundamentals of Computer Programming in at least 1 well-known programming language. Typically, most colleges teach C/C++, Java, Python, etc.
Year 2: The focus during the 2nd year is on developing ‘problem-solving’ thinking. Key courses include Data Structures and Algorithms, Discrete Mathematics and possibly a course on Probability and Statistics. Most good colleges also have a course on Software Systems Lab which is a hands-on course where you get to explore your ‘hacker’ mind and try out dozens of new tools. The key objective of the 2nd year is to develop a problem-solving mindset and learn effective Google Search.
Year 3: The focus during the 3rd year is to learn the basics of Operating Systems, Databases and Computer Networks and at the same time, get a flavor of basic Software Engineering - Web Development, Mobile App Development, etc. The aim is to complete at least 4 good projects, 2 each in Web and Mobile Development. The 3rd year often ends in a summer internship where you can apply what you have learned so far in a real-world scenario.
Year 4: The focus during the 4th year is to learn advanced skills and technologies, particularly those around Machine Learning, Artificial Intelligence, or any other technology that is new in the market.

Keeping the above template in mind, if you feel that you have wasted all 4 years of college, you can follow these steps to get back on track:

Budget 12 complete months of your life for learning and self-improvement. During these 12 months, you would be implementing a “shortened” version of the above 4-year-long template.
Since you are squeezing 4 years of learning into 12 months, focus solely on executing the template. Do not take up a job in parallel, unless you see a future in the job. Do not prepare for a competitive exam in parallel unless you see a future in it. Spend 12 continuous and dedicated months.

Now, let’s see what these 12 months would look like:

Month 1 and 2: spend time learning the basics of Python Programming. Python is simple and easy to learn. Here are some suggested courses:

Month 3 and 4: spend time learning Data Structures and Algorithms. A good starting point would be Algorithms Part 1. The course may be a bit slow, but you can always fast-forward through the parts you understand quickly. Spend time reading the book provided along with the course. Through this course, you would learn not only Data Structures and Algorithms but also the basics of Java. In parallel, start solving problems on SPOJ. Solve at least the first 50 problems. Aim for 100.
Month 5, 6 and 7: once you are through with the basics of Programming and Data Structures and Algorithms, move to Web and Android Development. Here are the resources I’d personally recommend:

Month 8, 9 and 10: now that you know the basics of programming, problem-solving and development, you are in the right position to move to advanced technologies. Here are the recommended resources:

Month 11 and 12: by this time, you’d feel a lot more confident about your skills and you will also have great projects to showcase on your resume, thanks to the courses above. You’d be in a position to apply to various companies. During these final 2 months, you should spend a lot of time preparing for interviews and applying to companies. Here are some great resources for interview preparation:

Cracking the Coding Interview
InterviewBit
LinkedIn: a great platform to build your network
AngelList: an excellent platform to apply for jobs at startups

If you wholeheartedly follow the above 12-month-long plan, you surely can develop the right skills in Computer Science and Programming and start with a tech job. You may not necessarily land at your dream company, but you can get a great start. Later, as you learn and progress and build your network, you can consider moving to larger tech companies - Google, Facebook, Microsoft, Amazon, Apple, etc.

To summarize, I’d say that rather than focusing on Google, Facebook or any other tech company, focus on building the right skills. Focus on improving your problem-solving skills and your fundamentals in Computer Science. Work on projects that you can talk about during the interviews. Work hard and you will succeed.

A quote I love

Hard work beats talent when talent doesn’t work hard.